IRC Logs of #nouveau on irc.freenode.net for 2025-04-03

01:44 mangodev: hmmmm
01:44 mangodev: was finally able to catch the soft crash that discord has had with nvk
01:44 mangodev: seems to be a failed syscall?
01:45 tiredchiku[d]: logs <a:bellybongocat:1215716434005467267>
01:45 mangodev: weird that the logs work today
01:45 mangodev: last time i tried it was a dead page
01:46 mangodev: how do you search that in the channel logs?
01:47 mangodev: looks to be a message id, but idk how to access it
01:48 airlied[d]: oh no movs from c[] either
01:49 mhenning[d]: yeah, that makes sense to me. probably constbufs are always an ldc now
02:11 mangodev: on a related note… is there a nouveau discord? i've seen notions about it here and there, seems like a hotspot for testers
02:12 mangodev: have seen it brought up both here and in gitlab (https://gitlab.freedesktop.org/mesa/mesa/-/issues/10763 https://gitlab.freedesktop.org/mesa/mesa/-/issues/12506)
02:18 tiredchiku[d]: no, but there is an unofficial fd.o server where the nouveau IRC channel is bridged over
02:19 tiredchiku[d]: https://discord.gg/ZjDvapU4
02:21 tiredchiku[d]: the quote from the first link is from "Linux Gaming Dev", where all the wine/proton guys hang out
02:22 tiredchiku[d]: https://discord.gg/linuxgamingdev
02:22 tiredchiku[d]: and zmike is zmike, posts memes just about everywhere :p
02:27 tiredchiku[d]: but you're not missing out on any nouveau specific testing by not being in those, everything happens here, maybe you'd miss out on screenshots of games running on NVK
02:28 mangodev: ah
02:28 mangodev: good to know
02:29 mangodev: mainly just wanted to know places where it's easier to find old messages, since i tend to miss a lot of conversation :P
02:33 mangodev[d]: tiredchiku[d]: *i see what you did there*
02:33 mangodev[d]: wait
02:33 mangodev[d]: *THAT'S WHAT THE `[d]` MEANT THE WHOLE TIME??*
02:33 mangodev[d]: :|
02:35 dwfreed: yes, [d] for discord user
02:37 tiredchiku[d]: :P
02:37 tiredchiku[d]: and [m] for matrix
02:37 mangodev[d]: ahhhhhh makes a lot more sense
02:38 mangodev[d]: ngl i thought `[d]` was for developer/contributor :/
02:38 tiredchiku[d]: nah hehe
02:38 tiredchiku[d]: https://tenor.com/view/see-i-pulled-a-sneaky-sneaky-on-gif-13247968
02:39 mangodev[d]: tiredchiku[d]: also… that was an emote? 😭
02:39 mangodev[d]: lmao
02:40 mangodev[d]: it looked like you posted a mysterious message link or something
02:40 mangodev[d]: as if it had already been talked about before
02:41 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357183171648684062/image.png?ex=67ef46d3&is=67edf553&hm=324db1458550d946c5a5f4dc10f03e724483c2bf1d5d3730dd47413abb33d5e1&
02:41 mangodev[d]: tiredchiku[d]: here are those logs btw
02:41 mangodev[d]: my top suspect is on animated gifs
02:41 mangodev[d]: the logs have massively slowed down as i disabled autoplay gifs
02:43 tiredchiku[d]: and you're using nvk with zink for the whole desktop?
02:43 mangodev[d]: i think https://gitlab.freedesktop.org/mesa/mesa/-/commit/58f8143da3ef5ba6afa8d55e4ccb4b08014ab9e8 caused it, as [etnaviv is the only driver with an impl for it](https://gitlab.freedesktop.org/mesa/mesa/-/commit/042138093f6e48134f073199b3d800161f258baf)
02:44 mangodev[d]: my hypothesis is that discord sees the format, and thus tries to use it
02:44 mangodev[d]: i'd have to test with some other application to see if this is discord-specific or electron-related
02:44 mangodev[d]: or worse, chromium related
02:45 mangodev[d]: as i don't see why chromium *wouldn't* have this issue too if electron does as well
02:48 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357184869855264859/image.png?ex=67ef4868&is=67edf6e8&hm=54f8f6ee861c3e4fff0acc493578b1ab569000a499b2421a2ba6e4a8165fdd7c&
02:48 mangodev[d]: ngl these issues are making me want to test drive labwc
02:49 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357185259157852291/image.png?ex=67ef48c5&is=67edf745&hm=f5f96a43caaab8e748dfb800e8684db0229d61b8c74129ddf9bcb759569caea8&
02:49 mangodev[d]: this stack trace of the discord soft crash is so long that i had to pipe it to vscode and codesnap it :|
02:49 tiredchiku[d]: mangodev[d]: possibly, yeah
02:50 tiredchiku[d]: I'm not very familiar with how mesa exposes image formats tbf
02:50 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357185588406521967/image.png?ex=67ef4913&is=67edf793&hm=4661feccfc9c45e5648c83c8f8d795976cf5f1d306f9e625b1f909cef5cc55fa&
02:50 mangodev[d]: also this error right after (in reverse because `--reverse`)
02:51 mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357185770485452810/image.png?ex=67ef493e&is=67edf7be&hm=df281315b7f6aa86966b0634b1ec7aefdc7582076f95a7a1f1116476c1e9d227&
02:51 mangodev[d]: which is weird because i have this disabled
02:55 tiredchiku[d]: :doomthink:
03:46 airlied[d]: ldc also seems to have some sort of shift on the cb.offset value, my branch has some hackery
04:15 airlied[d]: and I think tld is moved completely
04:19 mhenning[d]: airlied[d]: yeah, I saw that patch.
04:20 mhenning[d]: and yeah, the texture instruction forms have all moved around
04:21 mhenning[d]: this has the opcode changes: https://gitlab.freedesktop.org/mhenning/re/-/snippets/7832
04:22 mhenning[d]: At a glance, bindless textures might have just moved opcodes but I'm not sure cbuf textures are around any more
04:22 mhenning[d]: so there might be more fundamental changes there
04:23 mhenning[d]: planning to hack at it more tomorrow
04:25 airlied[d]: oh nice that might help me track down some more things
04:26 airlied[d]: the ldc offset only seems to apply to uniform version, but I think I need better ways to poke it
04:30 mhenning[d]: I've been writing tests against the disassembler, if that helps at all https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34334
04:32 mangodev[d]: i'm curious
04:32 mangodev[d]: what's the whole thing with the instruction timing stuff as of late? what is it all for?
04:38 mhenning[d]: the instruction latencies? the goal is to improve performance (some of it might also help with correctness)
04:39 orowith2os[d]: But nak is already written in Rust, surely it's already correct /j
04:40 mangodev[d]: mhenning[d]: i'd think most of it is to do with the test suite, but how does a table of instructions and their execution times help the performance of the live driver? does the compiler try choosing ways to compile calls based off of those execution times?
05:05 mhenning[d]: the short answer is that the compiler is in charge of scheduling when instructions happen, so it needs to know how long they take
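Note: a minimal sketch of why a scheduler needs a latency table. It is a toy in-order model, not NAK's actual scheduler; the opcode names, the `LATENCY` values and the `issue_cycles` helper are all made up for illustration.

```python
# Hypothetical latencies in cycles; real values are hardware-specific.
LATENCY = {"ld": 20, "mul": 4, "add": 4}


def issue_cycles(instrs):
    """Return the earliest issue cycle of each instruction.

    instrs is a list of (op, dst, srcs). The toy model issues in order,
    at most one instruction per cycle, and an instruction cannot issue
    before all of its source registers have been produced.
    """
    ready = {}   # register name -> cycle at which its value is available
    cycles = []
    cycle = 0    # next free issue slot
    for op, dst, srcs in instrs:
        start = max([cycle] + [ready.get(s, 0) for s in srcs])
        cycles.append(start)
        ready[dst] = start + LATENCY[op]
        cycle = start + 1
    return cycles


program = [
    ("ld", "r0", []),
    ("mul", "r1", ["r0"]),
    ("ld", "r2", []),
    ("add", "r3", ["r1", "r2"]),
]
# Same instructions, with the independent load hoisted above the multiply.
reordered = [program[0], program[2], program[1], program[3]]

print(issue_cycles(program))    # [0, 20, 21, 41]
print(issue_cycles(reordered))  # [0, 1, 20, 24]
```

In this model the reordered program issues its last instruction at cycle 24 instead of 41, and the compiler can only find that ordering if it knows how long each instruction takes.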
05:31 airlied[d]: in order to implement cooperative matrix support properly I need proper latency infrastructure
05:32 airlied[d]: and Red Hat had a lot of the info under NDA
05:47 airlied[d]: branch also changes by the looks of it
06:04 mangodev[d]: so uhhhhh
06:04 mangodev[d]: obs is screwed rn :P
06:05 mangodev[d]: mangodev[d]: obs says "failed to initialize muxer" when trying to record because it uses nv12
06:05 mangodev[d]: i assume it's trying to use the hardware for nv12, but fails
08:42 snowycoder[d]: gfxstrand[d]: Did it (with also other minor edits) https://gitlab.freedesktop.org/nouveau/nv-shader-tools/-/merge_requests/8
08:43 snowycoder[d]: Also, MR9 adds support for sm32 even though I'm the only one to use it (and I haven't spent much time for scheduling)
12:45 gfxstrand[d]: snowycoder[d]: That's fine. We really don't need things to be perfect with the tool.
14:35 gfxstrand[d]: snowycoder[d]: Merged. I also fixed it up yesterday to try and find the newest CUDA version.
15:12 snowycoder[d]: I tried to encode SuLd/SuSt (using your patches for format transforming).
15:12 snowycoder[d]: The bad news is that there are 0 tests that pass
15:12 snowycoder[d]: The "good" news is that they weren't passing even with old codegen and we get the same error type (just a black image)
15:14 snowycoder[d]: But this assert (that I disabled) might be useful "Kepler image lowering requires image params to be loaded from the descriptor set which we don't currently support."
15:14 snowycoder[d]: I just have no idea of how to support it
15:26 snowycoder[d]: gfxstrand[d]: Nice, I added a quick-fix to force a CUDA home through env-vars because I need the oldest one.
16:37 gfxstrand[d]: Heh. Yeah, we should have a flag or ENV var for that.
16:57 snowycoder[d]: It's in the PR
19:50 ristovski[d]: I wish nvidia implemented the V/F curve nvapi in the GSP (thus making it usable in Linux). Even with the indirect undervolt (which applies a global overclock across all pstates) I drop ~10W at 30% GPU usage, with very little to no performance drop. Seems like one could really dial this down for efficiency with the proper V/F curve
19:56 ristovski[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357443541202112552/image.png?ex=67f03950&is=67eee7d0&hm=52399f8915bf3a6db23174c8266875450a7de71d205d56ec607363c6be30046c&
19:56 ristovski[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357443541458227350/image.png?ex=67f03950&is=67eee7d0&hm=744ad706f8a2dc58ac7d996fd8168083120b7d7863bc7f6f2bcbb6c6d96232f6&
19:56 ristovski[d]: (GFX clock doesn't take into account the +200MHz OC for some reason)
20:31 paulwalker: hey my mouse cursor has really bad fb corruption, is there a fix? this is the only issue I am having with nouveau
20:31 paulwalker: gnome 48 wayland, 3070 ti
20:35 redsheep[d]: It's an issue with a 32x32 hardware cursor plane
20:35 redsheep[d]: Looks like mutter lets you use a software cursor with MUTTER_DEBUG_DISABLE_HW_CURSORS
20:36 paulwalker: thanks
21:19 djdeath3483[d]: gfxstrand[d]: planning to implement maintenance8?
21:20 djdeath3483[d]: gfxstrand[d]: can nvidia's HW do non-uniform offsets? 🙂
21:21 karolherbst[d]: it can, but it's gonna be slow
21:21 gfxstrand[d]: Probably. Haven't looked at it, though.
21:21 gfxstrand[d]: Non-uniform offsets for what?
21:21 djdeath3483[d]: for texture operations
21:22 gfxstrand[d]: We have lowering. :shrug_anim:
21:22 djdeath3483[d]: cute
21:22 karolherbst[d]: tld4 offsets you mean?
21:22 djdeath3483[d]: it's not up to scratch
21:23 djdeath3483[d]: karolherbst[d]: not sure what that is
21:23 djdeath3483[d]: I just ran into a HW issue on Intel which I didn't notice initially
21:23 djdeath3483[d]: because of multiple CIs down
21:23 gfxstrand[d]: I would have to look at things in more detail. If we have offsets in hardware, then non-uniform is supported.
21:23 karolherbst[d]: I'd assume the hardware doesn't care if the offset param to tex isn't uniform
21:23 djdeath3483[d]: lowering is my only option so far
21:23 karolherbst[d]: though the 4 offset thing is a constant
21:23 djdeath3483[d]: karolherbst[d]: not anymore
21:24 karolherbst[d]: in hardware
21:24 djdeath3483[d]: okay
21:24 karolherbst[d]: but that's the packed offset stuff where you do 4 offsets? in one op or this weird thing, I don't know
21:24 karolherbst[d]: tld4 is the name
21:24 djdeath3483[d]: I added support for non-uniform and that worked on 4 HW generations
21:24 djdeath3483[d]: but one is broken
21:25 djdeath3483[d]: the sampler cache key doesn't seem to include the offset is my theory
21:25 karolherbst[d]: ahh
21:25 djdeath3483[d]: and I guess I'm just passing the other generations by luck
21:26 djdeath3483[d]: so fixing up the lower_offset of nir_lower_tex()
21:26 karolherbst[d]: I meant `nir_texop_tg4` above
21:26 karolherbst[d]: that exists natively in hardware on nvidia
21:26 djdeath3483[d]: but still running into issues with bias
21:27 karolherbst[d]: `int8_t tg4_offsets[4][2];`
21:27 djdeath3483[d]: karolherbst[d]: same for intel, but those new dynamic & non-uniform offsets got added everywhere
21:27 karolherbst[d]: I thought nvidia was the only one having that 😄
21:27 djdeath3483[d]: and lower_offset in nir_lower_tex only deals with LOD0
21:27 karolherbst[d]: but if it's not that multi offset stuff, it's just a normal gpr source on nvidia
21:28 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/compiler/nak_nir_lower_tex.c?ref_type=heads#L155
21:28 gfxstrand[d]: Yeah, it should all "just work"
21:29 gfxstrand[d]: It's all normal GPRs
21:31 djdeath3483[d]: good for you I guess
21:32 djdeath3483[d]: looks like nobody needs lower_offset it seems
21:33 gfxstrand[d]: Yeah, we lower projectors and a bunch of txd cases and that's about it. The hardware just does the things.
21:36 djdeath3483[d]: I'm not quite sure how I'm supposed to deal with bias
21:36 djdeath3483[d]: I just added it to lod
21:36 djdeath3483[d]: then put it in the size query so that it scales the offsets to the lod of the texture operations
21:37 djdeath3483[d]: but that's failing a bunch of tests with bias
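Note: a rough sketch of the idea behind lowering a texel offset into a normalized coordinate, assuming the offset is divided by the size of the mip level being sampled. The helpers `mip_size` and `lower_offset` are hypothetical and are not the `nir_lower_tex()` code; wrapping, filtering edge cases and bias are deliberately left out.

```python
def mip_size(base_size, lod):
    """Size in texels of mip level `lod` for a texture with base_size texels."""
    return max(1, base_size >> lod)


def lower_offset(coord, offset, base_size, lod=0):
    """Fold an integer texel offset into a normalized coordinate.

    One texel at level `lod` covers 1 / mip_size(base_size, lod) of the
    normalized range, so the offset is scaled by that amount.
    """
    return coord + offset / mip_size(base_size, lod)


# A one-texel offset on a 256-texel-wide texture:
print(lower_offset(0.5, 1, 256, lod=0))  # 0.50390625
print(lower_offset(0.5, 1, 256, lod=2))  # 0.515625
```

The same texel offset moves the coordinate four times as far at LOD 2 as at LOD 0, which is why a lowering that only queries the LOD 0 size does not cover explicit-LOD operations.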
21:37 airlied[d]: okay fixed bra
21:42 mhenning[d]: I'm working on texture ops right now and they're a little annoying
21:45 airlied[d]: also need to figure out QMD so I can launch a compute shader
22:37 mhenning[d]: okay, texture encodings should be a lot better on my branch
22:37 mhenning[d]: I disabled cbuf textures on blackwell though and some details around lod offsets are probably broken
22:44 x512[m]: Is Vulkan supposed to support sending command buffers from different threads to the same queue in the wrong order, but synchronized with timeline semaphores?
22:45 Lynne: yes, that's what we do in ffmpeg
22:50 x512[m]: It will work with DRM syncobjs because it will block the thread until all wait dependencies are scheduled, but I wonder how it is handled in the Nvidia proprietary Vulkan driver.
22:51 Lynne: err, it can't do that
22:51 Lynne: unless the blocking happens elsewhere
22:51 Lynne: queue submissions are externally synchronized, so you need to wait on a mutex to even submit from different threads
22:52 x512[m]: So ffmpeg is not Vulkan spec compliant?
22:52 Lynne: no, we wait on a mutex
22:53 x512[m]: I mean ordering submits from different threads.
22:53 Lynne: before each vkQueueSubmit2 to make sure no other thread is submitting
22:53 Lynne: no, we wait on timeline semaphores on any dependencies
22:54 Lynne: ah... we wait for a queue to finish execution on the previously submitted buffer before we start writing a new command buffer
22:55 x512[m]: Imagine there are submissions s1 and s2. s2 depends on s1 with a timeline semaphore. s1 and s2 are submitted from different threads to the same queue. It is possible that s2 will be submitted before s1 and it will deadlock because the timeline semaphore will never be signaled.
22:56 x512[m]: DRM syncobj solves the problem by blocking the thread that submits s2 until s1 is submitted, but I am not sure that this behavior is guaranteed by Vulkan.
22:58 Lynne: nevermind, we do enforce correct ordering for decoding in ffmpeg
23:04 mhenning[d]: x512[m]: I think that behavior is required by vulkan
23:10 x512[m]: https://docs.vulkan.org/samples/latest/samples/extensions/timeline_semaphore/README.html#_out_of_order_submission_fallbacks_for_single_queue_implementations
23:11 x512[m]: It says that such deadlocks are possible...
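Note: a toy model of the s1/s2 scenario described above, assuming a single queue that executes strictly in submission order. It does not describe the Vulkan spec or any real driver; `run_queue` and `defer_until_signal_pending` are hypothetical helpers, and each submission is a `(name, wait_value, signal_value)` tuple on one timeline.

```python
def run_queue(submissions):
    """Run submissions in order on one in-order queue.

    Returns (executed_names, stalled). The queue stalls when its head
    waits on a timeline value that nothing executed so far has signaled.
    """
    timeline = 0
    executed = []
    for name, wait_value, signal_value in submissions:
        if timeline < wait_value:
            return executed, True
        executed.append(name)
        timeline = max(timeline, signal_value)
    return executed, False


def defer_until_signal_pending(arrivals):
    """Hold each submission back until its wait value has a signal queued.

    Loosely mimics blocking the submitting thread until the dependency
    has been submitted; returns the order that reaches the queue.
    """
    pending_signal = 0
    held, order = [], []
    for sub in arrivals:
        held.append(sub)
        progressed = True
        while progressed:
            progressed = False
            for h in list(held):
                if h[1] <= pending_signal:
                    held.remove(h)
                    order.append(h)
                    pending_signal = max(pending_signal, h[2])
                    progressed = True
    return order


s1 = ("s1", 0, 1)  # waits for nothing, signals value 1
s2 = ("s2", 1, 2)  # waits for value 1, signals value 2

print(run_queue([s1, s2]))  # (['s1', 's2'], False)
print(run_queue([s2, s1]))  # ([], True)
print(run_queue(defer_until_signal_pending([s2, s1])))  # (['s1', 's2'], False)
```

With s2 arriving first, the in-order queue stalls before executing anything, while holding s2 back until s1 has been submitted restores a workable order.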
23:13 leftmostcat[d]: I'm building mesa with `RUSTC=clippy-driver` since I've been doing some work on rusticl, but as a result, I get a bunch of warnings from nouveau. A lot of those are from automated bindings, but would an MR just for cleaning up clippy on nouveau be helpful, or a whole lot of churn for nothing?
23:13 mhenning[d]: ah, maybe I'm wrong. I'm not very well versed in timeline semaphore semantics
23:13 Lynne: maybe one day firmware submission and queue management would save us from this, as long as there's enough queues to resolve dependencies
23:18 Lynne: or rather resumable execution
23:19 orowith2os[d]: leftmostcat[d]: I'm assuming the warnings come from things like type names?
23:20 orowith2os[d]: :akipeek:
23:20 orowith2os[d]: For stuff that's not only syntax, I'd assume disabling those warnings is best
23:20 orowith2os[d]: If it can be fixed in the bindings generator, that's preferable
23:23 gfxstrand[d]: leftmostcat[d]: If it's bindgen, I'm not sure how much we can do about it. If it's from parse_header.py, we should consider fixing it.
23:25 leftmostcat[d]: Sorry, that was unclear. There are a lot that I assume aren't tractable because of bindgen, but there are a bunch that are just various style things in the handwritten code (`return foo;` at the end of a function, etc.). Though there are a few at a glance that could be handled with slightly more work in the templates in `struct_parser.py`.
23:27 leftmostcat[d]: Just don't want to drop an MR that's a whole bunch of non-functional changes without checking in first. 🙂
23:35 gfxstrand[d]: Go ahead and make an MR. If I don't like a patch, I'll say so.
23:42 gfxstrand[d]: I know when Karol did that exercise a while back there were a handful of things which I felt made the code worse. But also, it's been a while.