16:43 gfxstrand[d]: How do I tell Proton to pretend I'm an AMD GPU so the game doesn't try to use NVAPI that doesn't work?
16:43 gfxstrand[d]: `PROTON_DISABLE_NVAPI=1` just makes the game crash.
16:45 pixelcluster[d]: try `PROTON_HIDE_NVIDIA_GPU=1`?
16:48 gfxstrand[d]: Thanks
16:49 gfxstrand[d]: Trying to figure out how to get *Dragon Age: The Veilguard* to launch.
16:50 gfxstrand[d]: It seems to just sit and spin forever
16:50 gfxstrand[d]: Oh, shit. My GPU fell off the bus. Should have checked dmesg
16:51 gfxstrand[d]: I should probably be trying this on the desktop where things are a tad more robust
17:01 gfxstrand[d]: Okay, now that I have my GPU back and Nvidia shit disabled I'm getting illegal instruction encoding errors. That's gonna be annoying to track down...
17:03 gfxstrand[d]: I mean, once I track down which shader is failing it's usually pretty easy to fix, but finding that needle in the haystack is the hard part. 😭
17:04 gfxstrand[d]: At least I have enough that I can file a bug now
17:09 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12183
17:11 gfxstrand[d]: I really wish I had good tooling for narrowing that down within a game. It's not too bad in a CTS test but in a full game it's a PITA. As long as it renders *something* and doesn't crash, you can get a RenderDoc trace and go from there. If it crashes it's way more annoying.
17:12 gfxstrand[d]: Maybe I can hack up NVK to log a bunch of stuff like shaders uploaded and pushbufs. That might let me narrow it down.
17:12 gfxstrand[d]: I'll need to dump a mess of stuff, though.
17:13 pixelcluster[d]: do you not get a program counter or anything?
17:13 gfxstrand[d]: It's a GPU crash. I get very little
17:13 gfxstrand[d]: If there is a way to get the PC out of the GPU, we don't know what it is.
17:13 pixelcluster[d]: oh that's sad
17:13 gfxstrand[d]: Not as nice as AMD, I'm afraid
17:24 notthatclippy[d]: Did you try Ben's new nouveau tree and the latest gsp.bin? Should give you more info.
17:24 gfxstrand[d]: I think I can come up with something, though. I just need to log a pile of stuff to files somewhere. Every shader upload, every pushbuf. Maybe even split pushbufs harder so I can figure out exactly which one hung.
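
The logging idea above can be sketched as follows. This is a minimal illustration in Python, not NVK code: `DumpLog`, the file naming scheme, and the `index.jsonl` layout are all hypothetical. Each shader upload or pushbuf is written to its own numbered file, and an index line is flushed to disk right away so the record survives a GPU hang.

```python
import hashlib
import json
import os
from pathlib import Path


class DumpLog:
    """Hypothetical helper: one file per blob plus a JSON-lines index."""

    def __init__(self, out_dir):
        self.out_dir = Path(out_dir)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        self.index_path = self.out_dir / "index.jsonl"
        self.seq = 0  # submission order, used to find the last blob before a hang

    def record(self, kind, data):
        """Dump one blob (kind is e.g. "shader" or "pushbuf"); return its path."""
        digest = hashlib.sha256(data).hexdigest()[:16]
        path = self.out_dir / f"{self.seq:06d}-{kind}-{digest}.bin"
        path.write_bytes(data)
        entry = {
            "seq": self.seq,
            "kind": kind,
            "sha256_16": digest,
            "size": len(data),
            "file": path.name,
        }
        # Append and sync immediately so the index is on disk if the GPU hangs.
        with self.index_path.open("a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.seq += 1
        return path
```

With a layout like this, the last entries in the index before the hang point at the candidate pushbuf and the shaders uploaded ahead of it.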
17:25 gfxstrand[d]: notthatclippy[d]: No, I haven't. Where do I find it all?
17:26 notthatclippy[d]: gfxstrand[d]: https://discord.com/channels/1033216351990456371/1034184951790305330/1307923889040654360
17:27 notthatclippy[d]: I think it has the new log entries already added as well.. lemme check, might need a small patch on top of it.
17:32 notthatclippy[d]: Yep, it's there. Just look for this in dmesg, has more info now: https://gitlab.freedesktop.org/bskeggs/nouveau/-/blob/01.03-r565/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r565/fifo.c?ref_type=heads#L39
17:34 gfxstrand[d]: Yeah, but I'm not getting faults. I'm getting illegal instruction encoding errors
17:34 gfxstrand[d]: What I really need is a 64-bit instruction address of the bad instruction
17:35 gfxstrand[d]: I guess that might be possible if I figured out how to write exception handlers. (I think that's a thing?)
17:37 notthatclippy[d]: Meh. This info does exist and can be fetched from GSP, but nouveau KMD doesn't do it even in the latest tree...
17:38 notthatclippy[d]: It might be an easy patch if we combine it with a "send to NV to decode and get plaintext back" workflow.
17:39 notthatclippy[d]: tl;dr there's an RPC you can send to GSP and get a lot of the diagnostics in binary protobuf format, and then decode in userspace. But we don't publish the full protobuf spec for _reasons_ and this bit is part of the not-published bucket.
17:41 gfxstrand[d]: Yeah, if we could get something out in debugfs or similar, it'd be useful.
17:43 notthatclippy[d]: I'll get back to you on that in about 30 hours. Don't go wasting too much time till then.
17:45 notthatclippy[d]: Alternatively... what you could do is run the NV driver stack but hijack the SASS write and replace with your own. Then, when it goes boom, run `nvidia-bug-report.sh`
17:47 skeggsb9778[d]: gfxstrand[d]: do you get "rc" messages + your channel being killed when it happens?
17:48 skeggsb9778[d]: if so, with r565 if you have nouveau.debug=gsp=debug, you'll (probably) see a whole bunch of POST_NOCAT_RECORD along with the RC_TRIGGERED message from gsp, that appear to have related info in them (though you'll have to look at a hex dump in dmesg...)
17:48 skeggsb9778[d]: maybe it'll be hiding in there somewhere, i still need to look into what that POST_NOCAT_RECORD is about properly - i've just silenced it for now
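
To pick those records out of a long dmesg, a small filter along these lines might help. It is only a sketch: the two marker strings come from the messages above, but the exact log line format is not known here, so the code does plain substring matching and optionally keeps a few following lines in case the hex dump spills over. `filter_gsp_lines` and its `after` parameter are hypothetical names.

```python
MARKERS = ("RC_TRIGGERED", "POST_NOCAT_RECORD")


def filter_gsp_lines(lines, markers=MARKERS, after=0):
    """Keep lines containing any marker, plus `after` lines following each match."""
    out = []
    remaining = 0
    for line in lines:
        line = line.rstrip("\n")
        if any(m in line for m in markers):
            out.append(line)
            remaining = after  # restart the trailing-context counter
        elif remaining > 0:
            out.append(line)
            remaining -= 1
    return out
```

Feeding it the lines of a saved dmesg capture (for example an open file object) with a modest `after` value should keep each hex dump next to the message it belongs to.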
17:50 gfxstrand[d]: I'm building the 565 kernel now
17:52 gfxstrand[d]: Then I need to figure out how to in-place update my firmware
17:53 skeggsb9778[d]: there's a linux-firmware tree on the same gitlab if that helps
18:01 gfxstrand[d]: Yeah. But I'll probably just copy the files over because I don't want to muck about with building my own Fedora package.
18:54 mhenning[d]: gfxstrand[d]: I have a hack that runs every nak shader binary through nvdisasm and checks if it failed with an error or not, as a check for instruction encoding issues
18:55 mhenning[d]: it gives some false positives because of the graphics instructions that are missing from nvdisasm, but it might be helpful
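
The kind of check described above can be sketched roughly like this. It is not the code from the linked branch: `shader_disassembles` and `nvdisasm_cmd` are hypothetical names, and the `--binary <SMxy>` flag for raw binaries is an assumption that should be checked against `nvdisasm --help` for the installed CUDA toolkit. The disassembler command is passed in as a list so it can be swapped out.

```python
import subprocess
import tempfile
from pathlib import Path


def nvdisasm_cmd(sm):
    """Assumed invocation for raw shader binaries; verify the flag locally."""
    return ["nvdisasm", "--binary", sm]


def shader_disassembles(blob, cmd):
    """Run `cmd <file>` on the blob; return (ok, stderr).

    ok is True when the disassembler exits with status 0.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "shader.bin"
        path.write_bytes(blob)
        proc = subprocess.run(cmd + [str(path)], capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr
```

As noted above, a failure is only a hint: graphics instructions that nvdisasm does not know about will also be reported as errors.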
18:57 mhenning[d]: I've also been thinking about writing unit tests of our instruction encodings that encode an instruction, decode it with nvdisasm, and then check if the output is what we expect
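
A round-trip test of that shape could be structured as below. This is a sketch under assumptions: `run_encoding_tests` is a hypothetical harness, and `encode` and `disassemble` are caller-supplied callables standing in for the NAK encoder and an nvdisasm wrapper, neither of which is shown here. Only whitespace is normalized before comparing.

```python
def normalize(text):
    """Collapse runs of whitespace so layout differences do not fail a test."""
    return " ".join(text.split())


def run_encoding_tests(cases, encode, disassemble):
    """Encode each instruction, disassemble it, and compare with the expectation.

    cases: iterable of (instruction, expected_text) pairs.
    Returns a list of (instruction, expected_text, actual_text) failures.
    """
    failures = []
    for instr, expected in cases:
        got = disassemble(encode(instr))
        if normalize(got) != normalize(expected):
            failures.append((instr, expected, got))
    return failures
```

Returning the full list of failures instead of stopping at the first mismatch makes it easier to see whether one encoding bug affects a whole family of instructions.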
19:08 gfxstrand[d]: Oh, that might help. Got it in a branch?
19:09 mhenning[d]: gfxstrand[d]: https://gitlab.freedesktop.org/mhenning/mesa/-/commit/751fcb42bc841daa98b237c665252aeb30dd9731
19:14 gfxstrand[d]: Sweet, thanks!
19:14 gfxstrand[d]: I'll give it a try later
19:15 gfxstrand[d]: I'm "not working" right now so things are pretty ad-hoc.