00:07 x512[m]: How is NVK video decoding going? Any chance it gets merged this year?
00:28 esdrastarsis[d]: @ dwlsalmeida
01:36 TheHypervisor[m]: Are there any other Matrix.org users here? I'm not in any other rooms, I'd like to ask if matrix is down for anyone else.
01:51 x512[m]: I am on Matrix.
02:12 TheHypervisor[m]: It's back online
02:12 TheHypervisor[m]: Matrix.org was offline for a little bit
03:08 airlied[d]: karolherbst[d]: does gh100 have it? We have a few of those in beaker if you can get nouveau to work 🙂
03:48 gfxstrand[d]: x512[m]: I'm hoping to merge for 25.3. It's pretty high up my list once I crawl out of my current hole of super urgent stuff.
04:03 x512[m]: gfxstrand[d]: Does NVK support regular Vulkan opaque FD buffer sharing (not DMABUF and no modifiers)?
04:04 gfxstrand[d]: Yes. It's still dma-buf, we just pretend it's opaque and follow the opaque rules. If we need to split the NVKMD interface or add a flag for openrm, we can do that.
04:05 gfxstrand[d]: But sharing between NVK and the blob likely won't work because we have different image layout code and make different heuristic choices.
04:31 mhenning[d]: oh no, the heuristic is part of the interface?
04:31 mhenning[d]: we never get to change the heuristic again?
06:11 karolherbst[d]: airlied[d]: only some of it, blackwell added more
06:13 karolherbst[d]: but I should test the coop matrix mr on hopper anyway...
08:59 karolherbst[d]: my RA MR looking good so far: `Pass: 835707, Fail: 1, Warn: 1, Skip: 872270, Timeout: 18, Flake: 3, Duration: 1:06:42, Remaining: 44:39`
08:59 karolherbst[d]: ehh *phi
09:46 karolherbst[d]: okay.. the "Fail" is `dEQP-VK.info.device_extensions` because: Fail (Unknown extension VK_KHR_present_id2) 🙃
09:50 karolherbst[d]: just checking if the timeouts all pass
09:52 karolherbst[d]: okay.. I guess games will ship with RTXNTC at some point and dxvk will somehow support it, right?
09:52 karolherbst[d]: oh well.. guess coop matrix will also be required for gaming at some point 🙃
09:53 karolherbst[d]: at least if you care about performance
10:52 mangodev[d]: karolherbst[d]: wait what?
10:52 mangodev[d]: i'm curious about why a compute thing would be so wanted for gaming
10:53 karolherbst[d]: texture compression apparently
10:53 karolherbst[d]: but also frame generation
10:53 mangodev[d]: funky
10:56 karolherbst[d]: I tried to play a game with frame gen at some point and it was horrible
10:56 mohamexiety[d]: coop matrix is also used for FSR4 support on linux
10:57 snowycoder[d]: I played The Talos Principle 2 entirely on framegen, it was an experience
11:00 karolherbst[d]: oof
11:00 karolherbst[d]: with or without input prediction?
11:01 karolherbst[d]: though with talos it might be fine (tm)
11:02 snowycoder[d]: No idea, I only had 1660 so there wasn't support for DLSS either
11:02 snowycoder[d]: Sometimes I needed to stop to just let things average out😂
11:03 asuasuasu[d]: (on amd) but i also played some talos II with fsr framegen but i think i had frame pacing issues because it felt stuttery
11:08 snowycoder[d]: snowycoder[d]: I chased those issues down to an `isetp` <-> `membar` conflict I guess?
11:08 snowycoder[d]: I now have the shader like this:
11:08 snowycoder[d]: ...
11:08 snowycoder[d]: nop delay=32 x3
11:08 snowycoder[d]: p0 = isetp.ne.and r0, rz
11:08 snowycoder[d]: membar.gl
11:08 snowycoder[d]: nop delay=32 x3
11:08 snowycoder[d]: If the first isetp is set with a delay >= 12 the test succeeds, otherwise it fails.
11:08 snowycoder[d]: I have no idea how this can even happen tbh, the isetp isn't even touching memory
11:54 gfxstrand[d]: mhenning[d]: That's what `driverUUID` is for.
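The `driverUUID` point can be made concrete. For opaque FD handles, the Vulkan spec requires the exporter and importer to report the same `driverUUID` and `deviceUUID` (both from `VkPhysicalDeviceIDProperties`), so a layout heuristic only has to stay stable within one driver build. A minimal sketch of that compatibility check follows; the `DeviceId` class and `can_share_opaque_fd` helper are hypothetical names for illustration, not NVK or Vulkan API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceId:
    """Hypothetical stand-in for the two UUIDs in VkPhysicalDeviceIDProperties."""
    driver_uuid: bytes  # 16 bytes (VK_UUID_SIZE), identifies the driver build
    device_uuid: bytes  # 16 bytes (VK_UUID_SIZE), identifies the physical device


def can_share_opaque_fd(exporter: DeviceId, importer: DeviceId) -> bool:
    """Opaque FD sharing is only valid when both UUIDs match on both sides."""
    return (exporter.driver_uuid == importer.driver_uuid
            and exporter.device_uuid == importer.device_uuid)
```

Under this rule, two processes on the same NVK build and GPU pass the check, while NVK paired with a different driver on the same GPU does not, which is why differing image layout heuristics between drivers are not an interface break for opaque handles.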
12:39 snowycoder[d]: How much would you hate it if I introduced a function similar to `exec_latency` that signals how many cycles we should wait before issuing the current instruction? Say, `prev_exec_latency`?
12:39 snowycoder[d]: Asking for a friend :3
12:45 gfxstrand[d]: I mean, if we need it... 🤷
12:46 gfxstrand[d]: But I'm so confused...
12:47 gfxstrand[d]: Does membar just want everything idle?
12:47 snowycoder[d]: Hacking in a 13-cycle wait in every instruction before `membar` fixes all my tests (except `dEQP-VK.info.device_extensions` but that is under a Somebody Else's Problem field)
12:47 gfxstrand[d]: That seems more likely than a fixed 12 cycles
12:48 snowycoder[d]: I guess it does, but how can we enforce everything to be idle?
12:49 gfxstrand[d]: Insert a fake dependency, like it reads every register.
12:50 gfxstrand[d]: Similar to what we do at control flow edges
12:57 snowycoder[d]: Wait, if it only waits for everything to be idle, shouldn't we just need a 9-10 cycle delay for isetp (according to `instr_latency`)?
12:57 snowycoder[d]: Why 12 cycles?
12:59 gfxstrand[d]: Good question
12:59 gfxstrand[d]: Maybe there's a memory op before it which is what we're really waiting on?
13:01 snowycoder[d]: I added an enormous quantity of nops before and after the isetp/membar case to isolate the issue (something around 200 cycle delays) and it still happened
13:05 gfxstrand[d]: 😩
13:06 gfxstrand[d]: It's also possible isetp is 12 cycles
13:11 snowycoder[d]: So, isetp is 12 cycles, hw automatically fixes it for RaW/WaW but not for, erm, membar-hazards?
13:12 snowycoder[d]: This should be measurable using S2R
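The "13-cycle wait before `membar`" hack described above can be sketched as a small pass over an instruction list. This is only an illustration under stated assumptions: instructions are modeled as `(opcode, delay)` pairs where `delay` is the number of cycles to wait after issuing that instruction, the function name `pad_before_membar` is made up, and the 13-cycle constant is the empirical value from the chat, not a documented hardware number. It is not NAK code.

```python
# Empirical value from the discussion above; an assumption, not a hardware spec.
MEMBAR_MIN_GAP = 13


def pad_before_membar(instrs, min_gap=MEMBAR_MIN_GAP):
    """Return a copy of instrs where the instruction right before each
    membar waits at least min_gap cycles before the membar is issued.

    instrs is a list of (opcode, delay) pairs.
    """
    out = []
    for i, (op, delay) in enumerate(instrs):
        next_op = instrs[i + 1][0] if i + 1 < len(instrs) else None
        if next_op is not None and next_op.startswith("membar"):
            # Only ever raise the delay; a longer existing delay is kept.
            delay = max(delay, min_gap)
        out.append((op, delay))
    return out
```

For example, `pad_before_membar([("isetp.ne.and", 1), ("membar.gl", 1)])` raises the `isetp` delay to 13 and leaves the `membar` untouched. A fixed gap like this is a workaround; the fake-dependency approach suggested earlier would instead make `membar` wait on whatever is actually in flight.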
14:10 karolherbst[d]: CTS run with the GL driver: `Pass: 108686, Fail: 388, Crash: 12, Warn: 17, Skip: 10617, Timeout: 1, Missing: 1741, Flake: 38, Duration: 10:22, Remaining: 0` not as bad as I thought it would be 🙃
14:10 snowycoder[d]: snowycoder[d]: Never mind, s2r doesn't seem to have that precision :/
14:10 karolherbst[d]: missing seems to be gtf stuff I'm too lazy to set up
15:11 karolherbst[d]: and without the patches it's at `Pass: 99165, Fail: 262, Crash: 9647, Warn: 18, Skip: 10613, Timeout: 2, Missing: 1741, Flake: 52, Duration: 13:49, Remaining: 0` nice...
15:11 karolherbst[d]: which makes sense, because I also fixed something
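To see what the patches changed between the two GL CTS runs quoted above, the summary lines can be parsed and subtracted. A small sketch, assuming the summary format is exactly the comma-separated `Name: count` list shown in the chat; `parse_summary` and `delta` are illustrative helper names, not part of any CTS runner:

```python
WITH_PATCHES = ("Pass: 108686, Fail: 388, Crash: 12, Warn: 17, Skip: 10617, "
                "Timeout: 1, Missing: 1741, Flake: 38, Duration: 10:22, Remaining: 0")
WITHOUT_PATCHES = ("Pass: 99165, Fail: 262, Crash: 9647, Warn: 18, Skip: 10613, "
                   "Timeout: 2, Missing: 1741, Flake: 52, Duration: 13:49, Remaining: 0")


def parse_summary(line):
    """Turn a 'Name: count, ...' summary into a dict of integer counts."""
    counts = {}
    for field in line.split(", "):
        name, _, value = field.partition(": ")
        if name in ("Duration", "Remaining"):
            continue  # times, not test counts
        counts[name] = int(value)
    return counts


def delta(before, after):
    """Per-category change in test counts going from before to after."""
    return {name: after[name] - before[name] for name in after}


CHANGE = delta(parse_summary(WITHOUT_PATCHES), parse_summary(WITH_PATCHES))
```

With these two lines, `CHANGE["Crash"]` is -9635, `CHANGE["Pass"]` is 9521 and `CHANGE["Fail"]` is 126, while both runs cover the same 121500 tests in total: most of the former crashes now pass, and a smaller number surface as ordinary failures.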
15:15 montjoie: hello, I don't see rtx50xx on https://nouveau.freedesktop.org/FeatureMatrix.html
16:13 snowycoder[d]: montjoie: That is the nouveau matrix, nouveau (the old opengl driver) does not support newer cards.
16:13 snowycoder[d]: NVK does and you can use opengl through Zink+NVK
16:13 snowycoder[d]: (also, that feature matrix is really old and nobody touched it in a while)
16:14 snowycoder[d]: Ah, Blackwell is a bit different, the support is in the dev branch and should land in 25.2 if I'm not mistaken
16:36 TheHypervisor[m]: <snowycoder[d]> "montjoie: That is the nouveau..." <- Nouveau is also the kernel module in use by NVK, no?
16:38 TheHypervisor[m]: Also I would take the 3d features listed as done on the matrix for Turing and newer cards to mean their OpenGL driver has some functionality, no?
16:46 snowycoder[d]: TheHypervisor[m]: Yes, nouveau is the same kernel module used by NVK, but I don't know of any place with an updated feature matrix.
16:46 snowycoder[d]: Some time ago there was a discussion on this channel that the cards should just work, without the need to look up a feature matrix.
16:47 snowycoder[d]: TheHypervisor[m]: The nv50 opengl driver does support newer cards, but from mesa 25.1 all newer cards use Zink+NVK: https://www.collabora.com/news-and-blog/news-and-events/goodbye-nouveau-gl-hello-zink.html
21:02 snowycoder[d]: snowycoder[d]: I failed to measure instruction latencies using S2R, I wrote things down in a gitlab issue and I'll continue another day
21:34 mhenning[d]: yeah, it can be tricky to measure numbers that make sense
21:47 TheHypervisor[m]: <snowycoder[d]> "Some time ago there was a..." <- Well you still need to know to enable the GSP in the kernel module.
21:47 TheHypervisor[m]: Are there any architectures that got both Nouveau GL and NVK? I imagine Zink+NVK is faster because probably a more complete implementation or something I dunno.
21:48 TheHypervisor[m]: *Enable the GSP via a kernel parameter
21:53 orowith2os[d]: Isn't GSP enabled by default everywhere now?
21:54 orowith2os[d]: TheHypervisor[m]: Before Zink was enabled, it was technically nouveau-gl and NVK. You can also run the compositor on Zink+NVK and apps on nouveau-gl by accident, especially when it comes to Flatpak apps and older Mesa distributions.
22:01 mhenning[d]: orowith2os[d]: No, it's default on ada+ but it's still off by default on turing/ampere
22:02 mhenning[d]: I'm working on changing that
22:02 orowith2os[d]: what's wrong with it right now up to before Ada?
22:03 mhenning[d]: everything kepler through ada has both nouveau gl and nvk available
22:03 mhenning[d]: orowith2os[d]: nothing's wrong with it I just need to talk the kernel people into accepting a patch
22:33 airlied[d]: I think most distros have it enabled by default
22:35 mhenning[d]: arch has it off, fedora has it on
22:35 mhenning[d]: not sure about debian-based distros
22:36 airlied[d]: huh? https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/blob/main/config?ref_type=heads#L7084
22:37 tiredchiku[d]: was gonna say, arch has had it on by default for quite a while now
22:37 mhenning[d]: oh, I missed that somehow
22:39 airlied[d]: can't find it on debian, but no idea if debian ships a new enough kernel
22:43 airlied[d]: ubuntu does still seem to be `n`, alright
23:41 gfxstrand[d]: snowycoder[d]: I wonder if we can use membar to test instruction latencies. 🤔
23:51 gfxstrand[d]: TheHypervisor[m]: You should ignore the features matrix. It doesn't apply to Zink+NVK. But I'm fairly sure Zink+NVK supports more than nouveau GL pretty much across the board. There might be an extension or two that's missing but I doubt there'll be anything you'll miss.
23:53 gfxstrand[d]: I guess that means I should update the wiki some more to make it clear that the features matrix is old news. 😅
23:55 snowycoder[d]: gfxstrand[d]: You think that membar stops in-flight ops?
23:55 gfxstrand[d]: I have no idea!
23:56 gfxstrand[d]: But it might be interesting to experiment with