00:08airlied[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33521 be gentle 😛
07:32marysaka[d]: airlied[d]: Do you want me to fire a CTS on tu11x to test stuff there?
07:36airlied[d]: Oh that might be useful
07:38marysaka[d]: yeah I saw your comments about tu11x and was wondering 😄
07:39marysaka[d]: on a related note, coop matrix "works" on tu11x, but I disabled it because it's ~50 times slower than on regular tu10x (and 5 times slower than fp16 on tu11x if you compare those)
07:54airlied[d]: Do NVIDIA advertise it?
08:01marysaka[d]: airlied[d]: No they disable it... I only noticed it worked because I swapped the wrong GPU once and was wondering why it was taking so long
08:20airlied[d]: I'm considering trying to type the spreadsheets of waits into rust, will see if I feel the same after I sleep on it
08:21karolherbst[d]: big mood
08:29asdqueerfromeu[d]: marysaka[d]: So would lavapipe be faster there?
09:02gfxstrand[d]: airlied[d]: Jeff just told me yesterday, "You really don't want to enable it". 😂
09:43marysaka[d]: yeah no it's really not worth it :sweating:
11:42mohamexiety[d]: dwlsalmeida: this should take care of the plane tiling issues for video if you want to give it a try: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33453
11:43dwlsalmeida[d]: mohamexiety[d]: hey there, I saw you working on that, thanks a ton!
11:43dwlsalmeida[d]: I am a bit swamped with a few things this week
11:44dwlsalmeida[d]: But I will get to that
11:44mohamexiety[d]: all good yeah
15:27avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1339618876723564605/Capture_decran_2024-06-21_131318.png?ex=67af60cd&is=67ae0f4d&hm=02c349c94decf4d5e93c58d9570676df052baf3512a973fbf33b141c8eeafd1f&
15:27avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1339618877289664586/Capture_decran_2024-06-21_140346.png?ex=67af60cd&is=67ae0f4d&hm=80c974765bfc5e3767fac07092b93e5d6e38201f8a95041613e438e96d99dd03&
15:27avhe[d]: mohamexiety[d]: Hey you might know something about this, I have a bug related to layout for small vertical resolutions (<=16px). The luma plane is fine, but the chroma plane (note that this is subsampled 4:2:0 so it is really 8px tall) has evident issues. The thing is, this bug reproduces on official nvidia drivers, attached are two screenshots of a video on official Windows drivers, played on mpv
15:27avhe[d]: and vlc, and decoded via d3d11va.
15:28avhe[d]: I have suspicions that this is a hardware issue. This issue is probably related as well: <https://github.com/devkitPro/deko3d/issues/10>
15:29avhe[d]: (I also reported it to nvidia but they didn't get back to me)
15:37mohamexiety[d]: hmm this is weird. my initial guess would be tiling is configured wrong (since it's 8px, it should be one GOB tall, not two) but then why does it always use 2?
15:38mohamexiety[d]: since luma is fine, I'd guess 16px is fine, it's just anything smaller
15:41avhe[d]: Well, nvdec cannot output less than 2 gobs vertically
15:42avhe[d]: As far as I know at least (see eg here: <https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/nvdec_drv.h#L378>, every codec setup has a similar field)
15:44avhe[d]: Another thing is that on drivers where the frame data does a roundtrip to system memory before going to the graphics engine (eg cuvid/nvdec on mpv), the bug goes away. So my impression is that fetching is broken only in the 3d engine
15:44avhe[d]: Since the copy is done by the dma engine
15:53mohamexiety[d]: in that case yeah I would guess that it's 3d engine specific as well, when you have a 2 GOB layout image but the height is actually less than that. I'd lean into it being a hardware issue as well, but I think it should be fixable driver side
15:56mohamexiety[d]: haven't actually run into it with the stuff I did before though so no experience with this issue 😦
15:59avhe[d]: Not sure how that would be fixable in software, unless you copy to a 1-gob surface
16:01avhe[d]: But anyway the issue is pretty much an edge case. If I ever get a user report about this, I'm comfortable just bumping the height constraint and aborting the hardware decode lol
16:02mohamexiety[d]: avhe[d]: yup, and it probably wouldn't be problematic given the overhead shouldn't be all that noticeable
16:05chillyigloo: Are there any current projects for using Nouveau in GPGPU applications? Following up on that, are there any prospects to integrate pieces of the open-source nvidia driver into Nouveau to expand GPGPU with Nouveau?
16:13DodoGTA: chillyigloo: I haven't heard much about CUDA (but Vulkan can technically be used for compute and the NVK user-mode driver supports that API)
16:25chillyigloo: I see, are there any current work items to integrate parts of the open-sourced bits of the nvidia driver into Nouveau?
16:34gfxstrand[d]: dwlsalmeida[d]: Merged now so rebase at will
16:37redsheep[d]: chillyigloo: What nvidia open sourced is their kernel driver. There's ongoing work to make it possible to configure nvk to work with it, but once that is done it will probably be more for debugging than anything else and won't be a default any time soon.
16:38redsheep[d]: Also you could maybe get somewhere right now with opencl on rusticl on zink on nvk on nouveau for compute, but don't expect anything like cuda performance
16:53chillyigloo: redsheep: Thanks, I'll look into trying that out. Mostly just trying to get more involved in driver dev with more exposure to GPU drivers
18:24glehmann: if anyone is able to collect stats on nvk, I would like to know if https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33498 has a positive effect
18:25glehmann: as far as I understand, nvidia has 1bit registers for booleans, similar to amd
18:27gfxstrand[d]: Yeah, we have predicate registers.
18:27gfxstrand[d]: I think there's a DRM shim that works with NVK.
19:12marysaka[d]: airlied[d]: Still ~1h30 to go but no failures related to fp16 so far
19:12karolherbst[d]: who dis
19:12karolherbst[d]: :ferrisPeekPing:
19:14airlied[d]: That coop perf test explodes in a register coalesce btw, I started trying to make sense of the hmma scheduling, not sure I'm there yet 🙂
19:15karolherbst[d]: oh no
19:15marysaka[d]: yeah it's probably going to be interesting :sweating:
19:15karolherbst[d]: are you ignoring the bypass stuff for now? 😄
19:35airlied[d]: Totally
19:35airlied[d]: Hmma also has a special chaining provision I think it is
19:37karolherbst[d]: yeah
20:24gfxstrand[d]: airlied[d]: What register coalesce?
20:27airlied[d]: it hits the debug assert on src_reg.comps in AssignRegBlocks::try_coalesce
20:27airlied[d]: failing instead blows up in the kernel with misaligned registers
20:38gfxstrand[d]: Uh oh...
22:15marysaka[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1339721734479155351/02_13_25_failures.csv?ex=67afc098&is=67ae6f18&hm=93c645308b1888ef06e527a1a28ce911a1f56daa11ccb7006eb68d96467081aa&
22:15marysaka[d]: Nothing around fp16 but we have a lot of dgc failures :aki_thonk: (testing with vk-main.txt on 1.4.1.2)
22:43airlied[d]: I assume they aren't new
22:48marysaka[d]: unlikely to be related, yes. I haven't done a full run on latest CTS in a while