09:50ilgaz: hello all. Is this nouveau bug or KDE/wayland bug? https://paste.opensuse.org/70027008
09:57ilgaz: BRB disabled powertop
10:34ilgaz: karolherbst: my wayland/kde days are over :-) https://bugs.kde.org/show_bug.cgi?id=458291
10:34ilgaz: I will wait for the new Mesa/Nouveau/KDE to hit opensuse tumbleweed
12:16karolherbst: Ilgaz: this is on nv50 with newest mesa?
12:17karolherbst: I found a few regressions, which I am already looking into. Maybe nvc0 is affected as well, but I didn't notice anything there yet
12:19karolherbst: Ilgaz: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18236 should fix it
12:23Ilgaz: karolherbst: says NVAC (nvidia 9400 here). I mailed the guy who maintains the next version of nouveau on opensuse, mentioning your patch; no response yet
12:23Ilgaz: newest mesa on opensuse (22.3 I guess) freezes my system
12:24RSpliet: Not entirely surprising, NVAC is special in that it's an IGP.
12:24RSpliet: Stolen RAM, more like G9x than like NVA3/5/8
13:22karolherbst: RSpliet: yeah.. but also mesa broke nv50 :P
13:45RSpliet: or in the case of NVAC: broke it even harder :>
13:54Ilgaz: I am not experienced in opensuse packaging. nouveau is too complex for me. Otherwise I would package it and apply all those patches
14:02RSpliet: ilgaz: there's only one way to gain experience ;-) Don't know much about OpenSUSE packaging, but I've dealt with Fedora's RPM building tools. IIRC OpenSUSE uses RPM as well, so I bet the workflow is pretty similar and there are bound to be guides on the internet for how to build RPMs on OpenSUSE
14:02RSpliet: In this case, you want to grab a mesa SRPM and add the above patches to it. Just mesa, no other components required.
14:49karolherbst: RSpliet, ilgaz: it would be easier to just review and merge the patch :P
15:35karolherbst: HdkR: soo... any good idea how helper invocations work on nvidia hardware? Atm it feels like those are magically enabled when needed as atm I don't see any way of configuring it explicitly.. or maybe nvidia has a silly name for them...
16:34HdkR: karolherbst: "magic"
16:34karolherbst: I see that...
16:35karolherbst: HdkR: we are more confused why const buffer and global loads don't seem to work in helper invocs
16:41HdkR: loads not happening shouldn't affect anything missing in a helper?
16:41karolherbst: HdkR: well.. they do if the load decides on when to quite a loop
16:42karolherbst: *quit
16:43karolherbst: I think at this point we are more confused about why we are getting helper invocations at all
16:43karolherbst: it's e.g. the dEQP-VK.glsl.indexing.matrix_subscript.mat3x4_dynamic_loop_write_static_read_fragment vulkan test, which doesn't seem to do very much
16:44HdkR: Should only get a helper invocation if the shader is doing a demote with kill or w/e
16:45karolherbst: it doesn't
16:46karolherbst: HdkR: what if you draw a triangle, would pixels on the edge be helper invocs?
16:46HdkR: I think so
16:46karolherbst: ahh
16:46HdkR: Since if you need the derivatives then you still need the helper invocations
16:47karolherbst: right...
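
[Editor's sketch] The point above, that pixels next to a triangle edge can run as helper invocations because derivatives are computed per 2x2 quad, can be illustrated with a toy rasterizer. This is a minimal sketch under stated assumptions, not how any NVIDIA hardware actually rasterizes: the edge-function test, the rule that pixels exactly on an edge count as covered, and all names (`covered`, `classify`, `TRIANGLE`) are hypothetical and chosen for illustration only.

```python
# Toy model: a quad is launched if at least one of its four pixels is covered.
# Covered pixels are "active"; the remaining pixels of a launched quad are
# "helper" invocations that exist only so derivatives can be computed.

def edge(ax, ay, bx, by, px, py):
    """Signed area test: which side of edge a->b the point p lies on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covered(tri, x, y):
    """True if the centre of pixel (x, y) is inside the triangle.
    Assumption: points exactly on an edge count as covered (real hardware
    applies fill rules instead)."""
    (ax, ay), (bx, by), (cx, cy) = tri
    px, py = x + 0.5, y + 0.5
    w0 = edge(ax, ay, bx, by, px, py)
    w1 = edge(bx, by, cx, cy, px, py)
    w2 = edge(cx, cy, ax, ay, px, py)
    all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
    all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
    return all_non_negative or all_non_positive

def classify(tri, width, height):
    """Map each launched pixel to 'active' or 'helper'."""
    result = {}
    for qy in range(0, height, 2):
        for qx in range(0, width, 2):
            pixels = [(qx + dx, qy + dy) for dy in (0, 1) for dx in (0, 1)]
            hits = [covered(tri, x, y) for x, y in pixels]
            if not any(hits):
                continue  # fully uncovered quad: nothing is launched
            for pixel, hit in zip(pixels, hits):
                result[pixel] = "active" if hit else "helper"
    return result

TRIANGLE = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
PIXELS = classify(TRIANGLE, 4, 4)
HELPERS = sorted(p for p, kind in PIXELS.items() if kind == "helper")
```

In this toy setup the only helper pixels are the ones sharing a quad with covered pixels along the diagonal edge; the fully uncovered quad never runs at all.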
16:47karolherbst: so let's assume we are getting helpers for real, why are our const buffer loads not happening?
16:47karolherbst: I know that the hw skips writes in helpers
16:47karolherbst: but I got no information on loads being skipped
16:47karolherbst: unless there is a weirdo knob to disable them
16:48karolherbst: anyway, 3d headers are public :P
16:48karolherbst: but at least I couldn't find anything
16:48karolherbst: and the shader header also doesn't seem to contain such knobs
16:55HdkR: Not even sure if the shader header needs a knob for this
16:56karolherbst: mhhh
16:59karolherbst: HdkR: it still doesn't make much sense. I mean, the hw would have to magically know, that not loading random stuff in a helper invocation is safe
16:59karolherbst: which in the general case isn't
17:00karolherbst: what the vk test is doing is to load a value from memory and the iterator variable is checked against that value each iteration
17:00karolherbst: so I am a little confused how the hw decides that skipping loads is actually the way to go here...
17:03HdkR: Some "magic" in there I believe :P
17:24karolherbst: HdkR: it's weird.. soo.. with GL I see that all threads do loads from const buffers
17:24karolherbst: but if a quad only has helpers it's stopped altogether, which makes sense
17:27HdkR: karolherbst: Right, you're likely hitting the issue where a partial quad is enabled
17:28karolherbst: sooo
17:29karolherbst: I see loads from ubos happening, but not loads from ssbos
17:31karolherbst: HdkR: https://gist.githubusercontent.com/karolherbst/82ad045c3d06b4fe0f1f6e50d48a8215/raw/3b00d0963e548aef4ebeef4dbee02a237cf82e77/gistfile1.txt
17:32karolherbst: given that, it would explain why that vulkan test runs into an infinite loop, because the global load is garbage
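
[Editor's sketch] Why a skipped or garbage load in a single helper invocation can hang the whole quad can be shown with a small lockstep simulation. This is a sketch under assumptions, not a model of the real hardware: it assumes all lanes of a quad keep iterating until every lane's loop condition is false, and it uses 0xFFFFFFFF as a stand-in for whatever a failed global load might return. `run_quad`, `GARBAGE` and `MAX_STEPS` are hypothetical names.

```python
MAX_STEPS = 1000          # safety cap so the sketch itself always terminates
GARBAGE = 0xFFFFFFFF      # assumed stand-in for the result of a skipped load

def run_quad(loaded_bounds, max_steps=MAX_STEPS):
    """Simulate `for (i = 0; i < bound; i++)` across the lanes of one quad.

    `loaded_bounds` holds the loop bound each lane read from memory.
    Returns the number of iterations until every lane has left the loop,
    or None if the quad is still spinning when the cap is reached.
    """
    counters = [0] * len(loaded_bounds)
    for step in range(max_steps):
        running = [i < bound for i, bound in zip(counters, loaded_bounds)]
        if not any(running):
            return step
        # Only lanes still inside the loop advance their iterator.
        counters = [i + 1 if r else i for i, r in zip(counters, running)]
    return None

GOOD_QUAD = run_quad([3, 3, 3, 3])          # every lane loaded the real bound
BAD_QUAD = run_quad([3, 3, 3, GARBAGE])     # one helper lane got garbage
```

With four good loads the quad finishes after three iterations; with one garbage bound the active lanes are done quickly but the quad as a whole never reaches its exit within the cap.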
17:37karolherbst: HdkR: ahh yeah.. I think our issue is, that we use global loads for vulkan ubos :(
17:39HdkR: Vulkan is also different because it supports both demote and kill at this point
17:40karolherbst: "fun"
17:40karolherbst: but the test really isn't doing any of that
17:40karolherbst: it just draws two triangles
17:44HdkR: I don't quite recall. When I was investigating it I mostly found that behaviour didn't really change between Maxwell->Volta so I didn't think too hard about it :P
17:44karolherbst: right...
17:49karolherbst: HdkR: is there maybe some weirdo caching magic happening?
17:51HdkR: I don't think so. Helper invocations are very tightly coupled to the executing quads in the active thread mask. No way some sort of caching should affect that
17:52karolherbst: ohh.. I was more thinking that helper invocs rely on the active thread to actually do the load or something weird like this
17:54HdkR: Nothing like that, would be a bit too weird to break Nvidia's execution model
17:54karolherbst: yeah...
17:54karolherbst: but you see the piglit file and I see that the ballot on the ssbo load is only correct in the active thread :(
17:54karolherbst: maybe it's different with nvidia...
18:41karolherbst: HdkR: heh soo.. it's behaving differently on nouveau vs nvidia :(
18:41HdkR: :)
19:09karolherbst: HdkR: could it also be something very dumb like LD vs LDG?
19:10HdkR: Don't think so
19:10karolherbst: because atm it looks like there is nothing in the header to change that and nothing in the 3d class
19:16TimurTabi: karolherbst: I think I've been able to "solve" my missing DCB table problem by requesting only systems that have a "GeForce" GPU in them. I'm not 100% convinced that all the systems in the farm are labeled correctly, but it does seem to be working so far.
19:16karolherbst: TimurTabi: okay.. still annoying though, so it means there are GPUs without that table and we have to deal with it
19:17TimurTabi: Those should only be Tesla SKUs, I think.
19:19karolherbst: most likely
19:25karolherbst: it's not LDG either...
19:39TimurTabi: LDG?
19:41karolherbst: LD but only for global memory...
19:41karolherbst: LD can load from all memory locations if it hits a window
19:45TimurTabi: What's "LD"?
19:45karolherbst: load
19:45TimurTabi: I still don't understand.
20:00HdkR: karolherbst: Indeed, its behaviour is quite similar so I wouldn't expect it to be an issue
20:02karolherbst: none of this makes any sense :(
20:09karolherbst: anybody up for reviewing some trivial nv50 regression? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18236
23:20karolherbst: HdkR: I hope it's not something evil like on turing+ only the ldg with uniform reg encoding loads in helper invocs and the others don't?...
23:24HdkR: Well that's entirely different, uniform will load as long as you have any active threads. If you have no active threads at all then that's just a dead program at that point :P
23:26karolherbst: sure...
23:26karolherbst: but I meant LDG [UR4] vs LDG [R4]
23:28HdkR: Indeed, but with LDG to uniform, you're going to have at least one active thread in the warp. So it'll load no matter what
23:28HdkR: Since only 1 load will occur for the whole warp there
23:28karolherbst: ohh.. I meant it as the source, not dest
23:29karolherbst: anyway.. quads with only dead threads aren't doing anything, that's fine
23:29karolherbst: the issue we have is, that LDs/LDGs in helper invocs in quads with active threads don't do anything
23:29karolherbst: uhm.. "active pixels" I should say
23:30karolherbst: HdkR: like this shader_test file: https://gist.githubusercontent.com/karolherbst/033939feeca07c05fbb1fce940a5bfb2/raw/df8cfdbad643a39f8bb6286fa30e4a862560fbe1/gistfile1.txt
23:30karolherbst: on nvidia we get expected values
23:30karolherbst: with nouveau that second test returns 1
23:31karolherbst: it's strange
23:31karolherbst: I already suspect this to be something super stupid.. like "D3D vs GL mode"
23:32HdkR: You might be missing a bit in the SPH for how to consider helper invocations, but that would likely require dumping that from the blob and doing a bitcompare
23:32HdkR: Since the full state isn't in the public documentation there I don't believe
23:33karolherbst: yeah.. maybe
23:33karolherbst: but we looked at the blobs headers :)
23:35HdkR: Surely a comparison of what the blob does versus nouveau will showcase some differences and then you can just play whack-a-mole until you find one that works?
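
[Editor's sketch] The bit comparison suggested here could look roughly like the following. This is only an illustration: it assumes both shader headers have already been dumped as equal-length lists of 32-bit words, and the function name `differing_bits` and the sample values are made up, not real header contents.

```python
def differing_bits(ours, theirs):
    """Return (word_index, bit_index) for every bit that differs between
    two header dumps given as equal-length lists of integer words."""
    if len(ours) != len(theirs):
        raise ValueError("header dumps must have the same number of words")
    diffs = []
    for index, (a, b) in enumerate(zip(ours, theirs)):
        delta = a ^ b          # set bits mark positions where the words differ
        bit = 0
        while delta:
            if delta & 1:
                diffs.append((index, bit))
            delta >>= 1
            bit += 1
    return diffs

# Placeholder data, not real header words.
SAMPLE_DIFF = differing_bits([0b1010, 0x0], [0b1000, 1 << 31])
```

Each reported position is then a candidate to toggle on the nouveau side and retest.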
23:36karolherbst: I already sent an email to andy though :D
23:36karolherbst: but also asked for updated shader header docs
23:36karolherbst: guess I'll flip random header bits until something changes...
23:37HdkR: That's quite a lot of bits to flip through
23:37karolherbst: nah
23:37karolherbst: most of them are known
23:37karolherbst: 90% is like input/output config
23:43karolherbst: anyway.. most bits are actually documented :(
23:43karolherbst: and there are like.. 4 bits left excluding the new header fields of turing+
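
[Editor's sketch] With only about four undocumented bits left, trying every combination is cheap: 15 non-empty subsets. A minimal sketch of enumerating them follows. The (word, bit) positions and the two-word header below are placeholders, since the log does not say which bits are unknown, and `header_variants` is a hypothetical helper.

```python
from itertools import combinations

def header_variants(header_words, unknown_bits):
    """Yield (flipped_bits, new_words) for every non-empty subset of
    `unknown_bits`, where each entry is a (word_index, bit_index) pair.
    The input list is never modified."""
    for size in range(1, len(unknown_bits) + 1):
        for subset in combinations(unknown_bits, size):
            words = list(header_words)
            for word_index, bit_index in subset:
                words[word_index] ^= 1 << bit_index
            yield subset, words

# Placeholder positions and header contents, for illustration only.
UNKNOWN_BITS = [(0, 1), (0, 2), (1, 0), (1, 5)]
BASE_HEADER = [0, 0]
VARIANTS = list(header_variants(BASE_HEADER, UNKNOWN_BITS))
```

Each variant would be uploaded in place of the normal header and the helper-invocation test rerun, noting which subset (if any) changes the result.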