00:02fdobridge: <leopard1907> Alan Wake 2 time 🐸
00:04fdobridge: <Sid> sadly it does not get Miles Morales' main menu rendering
00:04fdobridge: <Sid> still infinitely loading after the intro videos
00:05fdobridge: <redsheep> I don't have that game, also mesh shaders aren't merged yet
00:08fdobridge: <Sid> I do have it, but yeah
00:08fdobridge: <Sid> mesh shaders
00:12fdobridge: <leopard1907> Ah, rip
00:40fdobridge: <redsheep> Hmm talos 2 just tells me I don't have dx12 support.
00:42fdobridge: <redsheep> Same goes for death stranding. Maybe these are 12_1 games
00:49fdobridge: <redsheep> Hey dying light 2 works though, that one is a bit of a surprise.
00:50fdobridge: <redsheep> Deathloop also just says it can't do a dx12 device
00:59fdobridge: <leopard1907> Dying Light 2 is DX11 iirc
01:00fdobridge: <redsheep> It has two renderers. I changed the setting and confirmed it was using dx12 via mangohud
01:01fdobridge: <redsheep> I would be testing these more rapidly but dx12 games take a million years to download 😛
01:05fdobridge: <redsheep> On the face of it though it seems generally either dx12 games now work, or are probably blocked on features again which is a much better place to be
01:07fdobridge: <leopard1907> Does RT work on NVK?
01:07fdobridge: <redsheep> No
01:08fdobridge: <leopard1907> 👍
01:26fdobridge: <esdrastarsis> theres some work on bvh iirc
01:30fdobridge: <redsheep> Horizon zero dawn almost works, aside from something where it appears to be failing to clear the previous frame when rendering foliage which is kind of nauseating to look at in motion
01:37fdobridge: <redsheep> Did miles morales kick out anything in dmesg?
01:41fdobridge: <redsheep> Cyberpunk 2077 fails to render part of the menus and loading the benchmark dies partway through and kicks out this in dmesg:
01:41fdobridge: <redsheep> ```[11376.476818] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:184 type:13 scope:1 part:233
01:41fdobridge: <redsheep> [11376.476828] nouveau 0000:01:00.0: fifo:c00000:0017:00b8:[GameThread[104763]] errored - disabling channel
01:41fdobridge: <redsheep> [11376.476834] nouveau 0000:01:00.0: GameThread[104763]: channel 184 killed!
01:41fdobridge: <redsheep> ```
01:44fdobridge: <Sid> didn't check
01:44fdobridge: <Sid> having brekkie, will check after
02:10fdobridge: <redsheep> Helldivers 2 is another fail on dx12 device creation
02:17fdobridge: <redsheep> Hmm. Spiderman remastered seems like another case where it is failing on dx12 features, but it's doing so quietly and the logs aren't entirely clear.
02:18fdobridge: <Sid> probably what I'm facing with Miles Morales too
02:18fdobridge: <Sid> and no, nothing in the dmesg
02:26fdobridge: <redsheep> Ok last dx12 game I could think of easily from my steam library, lego builder's journey is working just fine
02:27fdobridge: <redsheep> Something like half of dx12 games seem to work now, and I suspect once we can have 12_1 that will be more like 80-90%
03:01fdobridge: <mohamexiety> Oh.
03:01fdobridge: <mohamexiety> It built before I pushed but there was a meson buildfile merge conflict and I think that went wrong so I'll recheck in a bit. Thanks!
06:47fdobridge: <ahuillet> I've seen that too, but did not take time to understand why. Any idea?
06:48fdobridge: <ahuillet> missing a feature level?
06:59fdobridge: <redsheep> I can't be sure where exactly the games are stuck without some more tinkering, but in any case we're still missing two features for 12_1
06:59fdobridge: <redsheep> It would make sense to me that all of these games want that feature level
07:01fdobridge: <redsheep> Does any hardware even exist that advertises 12_0 but not 12_1?
07:20fdobridge: <!DodoNVK (she) 🇱🇹> Literally every RADV-supported GPU before 🔺 did the ACO-specific PSI magic
07:23fdobridge: <redsheep> Yeah I was just looking at how lots of gcn hardware is 12_0, for instance the rx470 which is listed as the minimum spec for Talos 2. Clearly this isn't so simple, but either way adding the 12_1 features might fix it, and all the hardware NVK works on is capable of it
07:25fdobridge: <redsheep> Conservative rasterization looks close, but there's no MR around for VK_EXT_fragment_shader_interlock
07:29fdobridge: <redsheep> Oh also @asdqueerfromeu I think shaderSharedInt64Atomics can be checked on https://gitlab.freedesktop.org/mesa/mesa/-/issues/9479
07:30fdobridge: <!DodoNVK (she) 🇱🇹> Have you checked the source code? If not then you should check it and see the TODO for that feature
07:46fdobridge: <redsheep> So the issue for it was just closed by mistake?
07:47fdobridge: <redsheep> I did find what you were referring to, that todo doesn't say anything more
07:50fdobridge: <!DodoNVK (she) 🇱🇹> https://gitlab.freedesktop.org/mesa/mesa/-/commit/9b60a1c00e938bfeb4e3e2419960fa1c9e00c77a
07:50fdobridge: <!DodoNVK (she) 🇱🇹> `Shared atomics don't seem to work, though, for some reason.`
07:58fdobridge: <!DodoNVK (she) 🇱🇹> Is there anything before these lines?
07:58fdobridge: <!DodoNVK (she) 🇱🇹> 13 is `ROBUST_CHANNEL_GR_EXCEPTION`
07:59fdobridge: <redsheep> Ah, missed that the issue was not specific to shared.
08:00fdobridge: <redsheep> Nothing before that in dmesg, no
08:09fdobridge: <!DodoNVK (she) 🇱🇹> Can you get Proton logs for this?
08:14fdobridge: <redsheep> It worked to add VKD3D_CONFIG=no_upload_hvv but if that's not enough info I can get logs
08:15fdobridge: <!DodoNVK (she) 🇱🇹> Can you disable that option and get a log?
08:15fdobridge: <redsheep> Sure
08:24fdobridge: <redsheep> To be entirely clear, in order to get Control into this state I have to add -dx12 to the launch options, it does not default to trying to use vkd3d-proton for whatever reason
08:24fdobridge: <redsheep> https://cdn.discordapp.com/attachments/1034184951790305330/1231883389338914856/steam-870780.log?ex=663893cf&is=66261ecf&hm=27f6946f7282bf10195557631db4f130f535ba40608c3ea96c1c67687b4df83c&
08:25fdobridge: <redsheep> I think control just doesn't use the dx12 renderer at all unless you either force it or turn on raytracing features
08:36fdobridge: <!DodoNVK (she) 🇱🇹> PCGW mentions it being selectable in the launcher too
08:44fdobridge: <redsheep> Mine doesn't display a launcher, but now that you mention it I used to see that option, yeah
08:51fdobridge: <georgeouzou> Turing proprietary driver seems to spit out ~200 lines of shader assembly for fragment shader interlock. (edited)
08:52fdobridge: <ahuillet> do you know what gen fragment shader interlock appeared in? Turing?
08:52fdobridge: <marysaka> Maxwell B
08:52fdobridge: <marysaka> If I remember correctly
08:52fdobridge: <ahuillet> maybe @notthatclippy can help decode that further, otherwise I'll take a look. I assume "13" matches our Xid numbers
08:53fdobridge: <ahuillet> thanks
08:59fdobridge: <!DodoNVK (she) 🇱🇹> That's not an Xid but `ROBUST_CHANNEL_GR_EXCEPTION` according to OGK
09:06fdobridge: <mtijanic> You can get the error info as protobuf out of GSP theoretically. This has a bit of the structure: <https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/generated/g_rc_pb.h>
09:08fdobridge: <ahuillet> which matches Xid 13
09:08fdobridge: <gfxstrand> Improve your performance with this one neat trick!
09:10fdobridge: <gfxstrand> That one shouldn't be hard if you want to take a crack at it. We need to lower to a cmpxchg loop.
09:11fdobridge: <gfxstrand> We may even have NIR lowering for it already. I'm not sure.
09:13fdobridge: <gfxstrand> Doesn't look like it
09:14fdobridge: <gfxstrand> Should be a good first NIR pass for someone
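The suggestion above (presumably about the 64-bit shared-memory atomics discussed earlier) is to lower an atomic operation the hardware lacks into a compare-and-swap loop. A real NIR pass would emit that loop in shader IR, and it assumes a 64-bit compare-exchange is available, which the suggestion implies. The C11 sketch below only illustrates the loop shape on the CPU. The names `op64_fn`, `op_add`, `op_imin` and `atomic_op64_via_cas` are hypothetical and are not Mesa code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Combining operation applied inside the CAS loop (e.g. add, min, max). */
typedef int64_t (*op64_fn)(int64_t current, int64_t operand);

static int64_t op_add(int64_t a, int64_t b) { return a + b; }
static int64_t op_imin(int64_t a, int64_t b) { return a < b ? a : b; }

/* Emulates a 64-bit atomic read-modify-write using only compare-and-swap.
 * Returns the value observed before the update, as native atomics do. */
static int64_t atomic_op64_via_cas(_Atomic int64_t *addr, int64_t operand,
                                   op64_fn op)
{
    int64_t expected = atomic_load(addr);
    for (;;) {
        int64_t desired = op(expected, operand);
        /* On success the swap happened and 'expected' still holds the old
         * value. On failure another thread changed *addr first; 'expected'
         * is reloaded with the current value and we recompute and retry. */
        if (atomic_compare_exchange_weak(addr, &expected, desired))
            return expected;
    }
}
```

The weak form of compare-exchange is fine here because a spurious failure simply costs one more trip around the loop.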
09:43fdobridge: <!DodoNVK (she) 🇱🇹> I'm surprised the game is able to consume all 24 GB of VRAM (because the VRAM limit was reached, vkd3d-proton tries to allocate memory without the DEVICE_LOCAL bit, but that also fails)
09:43fdobridge: <mohamexiety> fixed @redsheep. thanks again for the headsup!
09:43fdobridge: <mohamexiety> also @gfxstrand: added the format_mod_linear patch too
09:48fdobridge: <redsheep> Something odd is going on for sure, control definitely does not need that much vram
09:50fdobridge: <redsheep> I'll give that some testing tomorrow, I'd like to see how well a zink session works
09:51fdobridge: <!DodoNVK (she) 🇱🇹> That's where https://gitlab.freedesktop.org/drm/nouveau/-/issues/336 could be really useful
09:51fdobridge: <redsheep> Oh, also gamescope probably would be working with that too
09:52fdobridge: <mohamexiety> I am not sure what's remaining tbh. I know the OGL side of things has some stuff left, but I'm not sure if there's anything major left to do on the Vk side
09:52fdobridge: <mohamexiety> also for game testing, one dx12 game you could try testing is forza horizon 5 since I heard the sparse usage there is a bit rough on drivers
09:52fdobridge: <mohamexiety> (if you have it ofc)
09:53fdobridge: <redsheep> If you're referring to modifiers shouldn't I just get all that taken care of by using zink?
09:53fdobridge: <redsheep> I do have Forza horizon 5, that's a good idea
09:54fdobridge: <redsheep> I breezed past it when I was installing games like crazy because it's so huge
09:54fdobridge: <dadschoorse> I think someone wrote something like this for ir3, but it's not merged yet
09:54fdobridge: <mohamexiety> yeah, meant what's missing for modifiers in general
09:55fdobridge: <mohamexiety> yeah...
09:55fdobridge: <mohamexiety> iirc it was like 150 GB or something
10:00fdobridge: <redsheep> Yeah I filled my 1 TB Linux ssd right to the top with dx12 titles during testing, then removed some and kept going. Kinda upset my roommates by downloading steam games nonstop for like 5 straight hours lol
10:06fdobridge: <!DodoNVK (she) 🇱🇹> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28851
10:08fdobridge: <dadschoorse> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27776
10:19fdobridge: <gfxstrand> Thanks for digging that up!
10:32fdobridge: <gfxstrand> Ugh... Zink surfaceless doesn't work anymore. 😭
11:39fdobridge: <Sid> yeah we inherited all the Xids with the GSP
11:40fdobridge: <Sid> time to switch proton over to bleeding-edge-debug, and move back to an -rc kernel
11:53fdobridge: <karolherbst🐧🦀> one simple MR to the gallium driver: `342.5 thousand draws/second -> 40.9 million draws/second` :blobcatnotlikethis: It's about multi draws, but still.. games heavily use those
11:55fdobridge: <karolherbst🐧🦀> and the cause was just doing silly things.. I can't wait for a new driver or just using zink by default 😄
11:55fdobridge: <!DodoNVK (she) 🇱🇹> zmike-level improvement momento
11:56fdobridge: <karolherbst🐧🦀> yeah.. apparently if you do a full state validation inside your `draw_vbo` callback and use `util_draw_multi` which calls it for each draw in a multi draw, you get bad performance
11:57fdobridge: <karolherbst🐧🦀> (and the driver is inherently thread unsafe, so the solution to make it thread-safe also makes the situation worse)
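The reported jump from 342.5 thousand to 40.9 million draws/second is roughly 119x, or from about 2.9 µs to about 24 ns per draw. The cause described above is that a per-draw callback (`draw_vbo`) did full state validation and the multi-draw path (`util_draw_multi`, per the chat) invoked it once per draw. The toy C sketch below contrasts that shape with validating once per multi-draw. `toy_ctx`, `validate_state`, `draw_multi_slow` and `draw_multi_fast` are hypothetical stand-ins, not the nouveau gallium driver's code, and the sketch ignores the thread-safety concerns mentioned above.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified driver context. */
struct toy_ctx {
    unsigned dirty;       /* bitmask of state groups needing re-emission */
    size_t validations;   /* how many times full validation ran */
    size_t draws_emitted;
};

struct toy_draw { unsigned start, count; };

static void validate_state(struct toy_ctx *ctx)
{
    ctx->validations++;
    /* ...a real driver would walk every state group and emit it here... */
    ctx->dirty = 0;
}

static void emit_draw(struct toy_ctx *ctx, const struct toy_draw *d)
{
    (void)d; /* a real driver would emit the draw command for d */
    ctx->draws_emitted++;
}

/* Slow shape: the single-draw callback always validates everything, and the
 * multi-draw path just loops over that callback. */
static void draw_single_slow(struct toy_ctx *ctx, const struct toy_draw *d)
{
    validate_state(ctx);
    emit_draw(ctx, d);
}

static void draw_multi_slow(struct toy_ctx *ctx,
                            const struct toy_draw *draws, size_t n)
{
    for (size_t i = 0; i < n; i++)
        draw_single_slow(ctx, &draws[i]);
}

/* Faster shape: state cannot change between the draws of one multi-draw,
 * so validate at most once (only if something is dirty), then emit all. */
static void draw_multi_fast(struct toy_ctx *ctx,
                            const struct toy_draw *draws, size_t n)
{
    if (ctx->dirty)
        validate_state(ctx);
    for (size_t i = 0; i < n; i++)
        emit_draw(ctx, &draws[i]);
}
```

With 100 draws in one multi-draw call, the slow shape validates 100 times while the fast shape validates once, and not at all on a later call if no state changed in between.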
12:01fdobridge: <gfxstrand> Or maybe desktop GL can't do surfaceless?
12:01fdobridge: <karolherbst🐧🦀> it should in theory
12:02fdobridge: <karolherbst🐧🦀> via gbm
12:02fdobridge: <karolherbst🐧🦀> I think you need to convince the CTS to use that or something...
12:03fdobridge: <gfxstrand> In theory, I've turned all that on
12:03fdobridge: <Sid> faith, did you ever get a chance to look at msaa 16 samples again?
12:03fdobridge: <karolherbst🐧🦀> @gfxstrand I always used to make the windows hidden, maybe that's required?
12:04fdobridge: <karolherbst🐧🦀> `--deqp-visibility=hidden`
12:04fdobridge: <karolherbst🐧🦀> maybe also need to change the surface type
12:05fdobridge: <karolherbst🐧🦀> make sure `GBM_LIBRARIES` is set in cmake anyway
12:05fdobridge: <karolherbst🐧🦀> or maybe it is GLES only.... I honestly have no idea tbh
12:12fdobridge: <!DodoNVK (she) 🇱🇹> I don't think it's really important right now (RADV only has limited support for it which doesn't even work on my Vega GPU)
12:18fdobridge: <Sid> sparse16 is required for FL12_0 though
12:18fdobridge: <Sid> surely radv has something in place for that
12:20fdobridge: <!DodoNVK (she) 🇱🇹> Check radv_physical_device.c and you'll be surprised
12:22fdobridge: <Sid> can you just tell me what it is instead of derailing what I'm doing right now just for this
12:22fdobridge: <Sid> thanks
12:25fdobridge: <gfxstrand> I keep poking at it but it blows up and I don't know why
12:26fdobridge: <Sid> maybe the nvidia guys can help us out here?
12:40fdobridge: <!DodoNVK (she) 🇱🇹> None of the samples stuff is enabled
12:40fdobridge: <!DodoNVK (she) 🇱🇹> For sparse
12:40fdobridge: <Sid> huh
12:41fdobridge: <karolherbst🐧🦀> msaa x16 is just cursed
12:41fdobridge: <dadschoorse> radv also doesn't support 16x msaa
12:41fdobridge: <dadschoorse> the hw doesn't really either. at least images don't support 16 samples, the rasterizer kind of does I think
12:41fdobridge: <karolherbst🐧🦀> yeah.... fake it or just ignore it
12:42fdobridge: <dadschoorse> I think only intel has real 16x msaa
12:42HdkR: Faking it is what gives Quadros 64x msaa :D
12:43fdobridge: <mohamexiety> yeeep
12:43fdobridge: <karolherbst🐧🦀> well.. there is a MSAA16 PTE thing on nvidia
12:43fdobridge: <Sid> I'm just confused as to why dxvk isn't picking FL12 on NVK then
12:43fdobridge: <Sid> picking/detecting
12:43fdobridge: <dadschoorse> fragment shader interlock?
12:43fdobridge: <dadschoorse> conservative raster?
12:43fdobridge: <Sid> ah
12:44fdobridge: <Sid> makes sense
12:44fdobridge: <Sid> those weren't listed on the tracker issue, so I forgot about them :p
12:44fdobridge: <Sid> https://gitlab.freedesktop.org/mesa/mesa/-/issues/9478#fl12_0-feature-level-12
12:49fdobridge: <gfxstrand> I just tried this and I'm seeing piles of errors, too. I'm also seeing lots of
12:49fdobridge: <gfxstrand> ```
12:49fdobridge: <gfxstrand> ERROR - dEQP error: MESA: error: ../src/nouveau/vulkan/nvk_device_memory.c:203: VK_ERROR_OUT_OF_DEVICE_MEMORY
12:49fdobridge: <gfxstrand> ERROR - dEQP error: MESA: error: zink: couldn't allocate memory: heap=0 size=1073741824
12:49fdobridge: <gfxstrand> ```
12:49fdobridge: <Sid> oh speaking of out of device memory
12:49fdobridge: <zmike.> hm I wasn't seeing that
12:50fdobridge: <zmike.> but I'm still running without rebar
12:50fdobridge: <Sid> for some reason vkd3d-proton with rebar has games pop up an out of video memory error as well
12:50fdobridge: <Sid> gfxrecon tells me there's a few VK_INCOMPLETEs on vkGetPhysicalDeviceSurfacePresentModesKHR
12:54fdobridge: <gfxstrand> I'm seeing that with rebar disabled (ish)
12:56fdobridge: <mtijanic> Ugh. With 535 gsp.bin you really don't get a whole lot of useful info here. We could do another roundtrip to GSP to get more info. Does anyone know what nouveau prints for pre-GSP? And/or what additional info would be useful here?
12:56fdobridge: <karolherbst🐧🦀> so updating to a newer GSP version would allow us to get more debugging info, better error messages, whatever?
12:57fdobridge: <zmike.> dunno then
13:02fdobridge: <mtijanic> Well, the info is kept on GSP in a protobuf entry. With 550 that entry is sent along with the RC event in the `rpc_rc_triggered_v17_02`. With 535 you can still get the same data by asking the GSP for a dump of the protobuf journal and then finding your entry. It's potentially a lot of extra traffic, an extra roundtrip and some more complexity on the kernel, but the data is available.
13:03fdobridge: <mtijanic> 555 will put some more data there outside of protobuf, so maybe enough to not need all that crap.
13:04fdobridge: <mtijanic> We could theoretically add more info, in protobuf or directly into the structure, but the lead time on that change making it into a .bin file and then nouveau updating to use that bin file is at least 6 months; better make do with what we have for now (but I can note down any requests)
13:04fdobridge: <karolherbst🐧🦀> maybe we could just make those things available to userspace and deal with it with some tooling
13:05fdobridge: <mtijanic> That's how we do it. Only protobuf encoder exists in kernel/GSP, and is given to userspace as a blob to decode.
13:07fdobridge: <mtijanic> In the short term, we could hack up a minimal decoder to just get the desired fields in the kernel and print them out.
13:08fdobridge: <mtijanic> Not a great solution, but if it accelerates userspace development.. ¯\_(ツ)_/¯
13:14fdobridge: <karolherbst🐧🦀> yeah... no point in doing it in the kernel if userspace might as well
13:20fdobridge: <mtijanic> The problem then is that you need some debug collection utility. It's no big deal for eg Faith when developing - try a thing, it goes boom, run a diagnostics tool to see why.. But if end users see an issue with their game, they can send you dmesg output, but it would take a fair bit of back and forth to pry all these diag things from them.
13:21fdobridge: <karolherbst🐧🦀> yeah...
13:21fdobridge: <mtijanic> We use `nvidia-bug-report.sh` that users need to run to file bugs with us, and it collects all this stuff, but that's not idiomatic for upstream drivers. Dunno. Not a fun decision to make.
13:22fdobridge: <karolherbst🐧🦀> maybe the buffer could be attached to the ioctl and mesa deals with it?
13:22fdobridge: <karolherbst🐧🦀> like if you submit and the kernel goes "there was an error, here look at it:..." it might be good enough
13:22fdobridge: <mtijanic> But there's no ioctl active at the time you get the fault
13:22fdobridge: <karolherbst🐧🦀> could be retrieved on next submit
13:22fdobridge: <karolherbst🐧🦀> or something
13:22fdobridge: <karolherbst🐧🦀> dunno...
13:23fdobridge: <mtijanic> Does mesa know if a runlist failed?
13:24fdobridge: <ahuillet> also, to be honest, us UMD engineers don't always decode the RM blob that's in there :)
13:24fdobridge: <karolherbst🐧🦀> it can
13:24fdobridge: <karolherbst🐧🦀> well
13:24fdobridge: <dadschoorse> actually, those are optional feature in the 11_x feature levels
13:24fdobridge: <karolherbst🐧🦀> not with the new ioctl.. it still kinda can though
13:24fdobridge: <karolherbst🐧🦀> mesa can submit and immediately wait on the syncobj
13:27fdobridge: <mtijanic> Could mesa periodically poll an ioctl to get error info (if any)? Or mmap a buffer where the kernel will write the protobuf and check there, to avoid syscall overhead.
13:29fdobridge: <karolherbst🐧🦀> yeah, it could
13:29fdobridge: <karolherbst🐧🦀> like if the context gets killed, the ioctls will fail
13:29fdobridge: <Joshie with Max-Q Design> We specifically avoided hang/error query stuff in radeonsi and radv
13:29fdobridge: <Joshie with Max-Q Design> The only thing we ever check is if cs submit fails
13:29fdobridge: <karolherbst🐧🦀> we could either have a new ioctl to fetch those errors, or something
13:30fdobridge: <Joshie with Max-Q Design> We have a set of retvals for who is guilty/whatnot on the submit ioctl
13:30fdobridge: <mtijanic> Yeah, I don't think you want to poll an ioctl in general. At least not at any kind of frequency.
13:30fdobridge: <Joshie with Max-Q Design> Just don't do what AMDGPU used to do and not say anything if soft recovery happened and pretend stuff was fine
13:30fdobridge: <mtijanic> Extra syscall per frame is likely noticeably painful. If you do it every X (seconds, frames), then you might get stutter
13:31fdobridge: <Joshie with Max-Q Design> I actually can't believe Daenzer and Christian were advocating for that
13:31fdobridge: <pixelcluster> don’t even start, or else people are going to complain and say they want that behavior
13:32fdobridge: <Joshie with Max-Q Design> "my gl 2.x app is fine if we do that" (edited)
13:32fdobridge: <ahuillet> oh it's not a GL vs. Vk thing necessarily, a sufficiently complex GL app would have the same problem
13:32fdobridge: <pixelcluster> compositors are fine with it
13:32fdobridge: <Joshie with Max-Q Design> No they aren't
13:32fdobridge: <ahuillet> in fact I probably have seen more GL stutter bugs than Vk stutter bugs at this point (sure: selection bias)
13:33fdobridge: <Joshie with Max-Q Design> Gamescope is absolutely not fine with that
13:33fdobridge: <ahuillet> but basically anything that can ever potentially stutter will stutter in some use case that somebody will absolutely be relying on, per the driver engineer's theorem
13:33fdobridge: <pixelcluster> huh
13:33fdobridge: <pixelcluster> what garbage vram will cause gamescope to hang?
13:33fdobridge: <pixelcluster> imagine running anything other than ~~compositors~~ mutter on your gpu :frog_turtle:
13:34fdobridge: <karolherbst🐧🦀> yeah... we already know it in the hot path if the context was nuked. But a new ioctl to fetch errors _if_ you know the context was nuked should be good enough
13:34fdobridge: <ahuillet> (in fact, case in point, the blob had massive stutter bug reports a few years back due to some error logging happening every X seconds)
13:34fdobridge: <karolherbst🐧🦀> "get the error of my nuked context" or something
13:35fdobridge: <pixelcluster> oh amdgpu has that too for pagefaults ye
13:35fdobridge: <mtijanic> Yes, totally. If you already know something is not right, you can take the slow path and get the data, decode and display error messages inline. That totally works.
13:35fdobridge: <ahuillet> anyhow, why is this supposed to be UMD? can't the kernel get more info from GSP at the time where it prints [11376.476818] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:184 type:13 scope:1 part:233 ?
13:36fdobridge: <karolherbst🐧🦀> the kernel is already complex as it is, don't really want code there you can do in userspace
13:36fdobridge: <ahuillet> the blob dumps a bit more than this to dmesg actually and it's often pretty good
13:36fdobridge: <karolherbst🐧🦀> but if it's trivial to get more info, sure
13:36fdobridge: <karolherbst🐧🦀> why not
13:36fdobridge: <mtijanic> Yes, but the logic to decode them is not kernel-friendly.
13:36fdobridge: <karolherbst🐧🦀> I'd just not decode arbitrary buffers in the kernel
13:37fdobridge: <karolherbst🐧🦀> especially if it comes from untrusted code 😛
13:37fdobridge: <mtijanic> Also, I'm not sure if - yeah, I was just gonna say that.
13:37fdobridge: <ahuillet> @notthatclippy when we report what method crashed in what channel belonging to what PID, is that decoding not kernel-friendly? I thought it would just be peeking a couple registers?
13:37fdobridge: <karolherbst🐧🦀> parsing the vbios is already bad enough
13:37fdobridge: <karolherbst🐧🦀> (and I'm sure there are enough buffer overflows possible with crafted vbios)
13:38fdobridge: <ahuillet> evil maid replaces your VBIOS
13:38fdobridge: <mtijanic> Bwahaha. I don't think that's quite a reasonable threat model. But point well taken.
13:38fdobridge: <mtijanic> GSP could technically detect if it was running nouveau and craft invalid protobuf. It's perfectly fair to defend against that.
13:39fdobridge: <ahuillet> let's maybe not leak our competitive perf tricks
13:39fdobridge: <ahuillet> (or bitcoin wallet addresses)
13:39fdobridge: <mtijanic> `kekw.gif`
13:40fdobridge: <mtijanic> nvidia.ko doesn't have to worry about that since if we were gonna do nasty things to the kernel, we'd just do them in the kernel. But even then, we don't actually _decode_ these things.
13:42fdobridge: <karolherbst🐧🦀> (or state actors planting code, nvidia isn't even aware of)
13:42fdobridge: <mtijanic> Anyway, I think the approach above sounds perfectly sane - mesa detects a nuked context however it does that, makes an ioctl to get raw protobuf data, that ioctl goes to GSP to collect the data, sends it back. Mesa decodes it and prints the info
13:42fdobridge: <karolherbst🐧🦀> aware of as in: doesn't know it's a back door
13:42fdobridge: <karolherbst🐧🦀> but sure, the decoder should also be bug free, but I think it's also bad to need to update your kernel to get a better decoder
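The approach converged on above keeps the hot path free of polling: submission only reports success or "context lost", and only after a failure does userspace make an extra call to fetch the raw protobuf error data collected from GSP, which it then decodes itself. The C sketch below models that control flow with a fake kernel. Everything in it (`fake_kernel`, `fake_submit`, `fake_fetch_error`, `submit_and_report`, the result enum) is hypothetical; no such nouveau uAPI exists, and the protobuf decoding step is left as a comment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model only: this is not a real nouveau interface. */
enum submit_result { SUBMIT_OK = 0, SUBMIT_CONTEXT_LOST = 1 };

struct fake_kernel {
    bool channel_killed;
    const unsigned char *error_blob; /* raw protobuf bytes, opaque here */
    size_t error_blob_len;
    unsigned fetch_calls;            /* counts trips down the slow path */
};

/* Hot path: submission only says "ok" or "context lost". */
static enum submit_result fake_submit(struct fake_kernel *k)
{
    return k->channel_killed ? SUBMIT_CONTEXT_LOST : SUBMIT_OK;
}

/* Slow path, standing in for a hypothetical "get the error of my nuked
 * context" call that would round-trip to GSP and copy out the raw blob. */
static size_t fake_fetch_error(struct fake_kernel *k,
                               unsigned char *buf, size_t cap)
{
    k->fetch_calls++;
    size_t n = k->error_blob_len < cap ? k->error_blob_len : cap;
    if (n > 0)
        memcpy(buf, k->error_blob, n);
    return n;
}

/* Userspace submit wrapper: no polling and no extra syscall per frame.
 * Error info is fetched only after the submission has already failed. */
static enum submit_result submit_and_report(struct fake_kernel *k,
                                            char *msg, size_t msg_cap)
{
    enum submit_result r = fake_submit(k);
    if (r == SUBMIT_OK)
        return r;

    unsigned char blob[256];
    size_t len = fake_fetch_error(k, blob, sizeof(blob));
    /* A real implementation would decode the protobuf here and print the
     * relevant fields; this sketch only reports how much data came back. */
    snprintf(msg, msg_cap, "context lost, %zu bytes of error info", len);
    return r;
}
```

Because the fetch happens only once a failure is already known, its cost (an extra round trip to GSP) never lands on the per-frame path, which addresses the stutter concern raised earlier.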
14:21fdobridge: <gfxstrand> Looks like running with `--deqp-surface-width=256 --deqp-surface-height=256` fixes the OOMs
14:25fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1231974192879239178/message.txt?ex=6627c4e0&is=66267360&hm=197ca3fee89579f906aeb5ebf653e6548f0eedc1acd527478e808ec1f8db1aff&
14:29fdobridge: <!DodoNVK (she) 🇱🇹> ```445.082:0138:02f8:warn:vkd3d-proton:vkd3d_allocate_device_memory: Memory allocation failed, falling back to system memory.
14:29fdobridge: <!DodoNVK (she) 🇱🇹> 445.082:0138:02f8:err:vkd3d-proton:vkd3d_allocate_device_memory: Failed to allocate device memory (size 16777216, type_flags 0x7, type_mask 0x6).```
14:52fdobridge: <pavlo_kozlenko> it test witch zink?
15:00fdobridge: <zmike.> Ah, my runner script does use that, which explains why I don't see it
15:03fdobridge: <zmike.> CI also uses that size
15:06fdobridge: <gfxstrand> @zmike. I'm a bit confused as to why none of Zink's `QueueBindSparse()` calls have a wait semaphore.
15:06fdobridge: <gfxstrand> Shouldn't we be waiting on whatever is currently in the batch?
15:06fdobridge: <zmike.> Uh
15:07fdobridge: <zmike.> I think the wait is supposed to be implicit based on batching
15:13fdobridge: <gfxstrand> Yeah, binds don't implicitly wait on submits
15:13fdobridge: <gfxstrand> Or vice versa
15:14fdobridge: <gfxstrand> Bind and submit are basically different queues that happen to be part of the same `VkQueue`. It's a bad API design.
15:15fdobridge: <zmike.> No I meant that the existing ordering of sparse binds with queue waits should enforce completion
15:16fdobridge: <zmike.> iirc
15:16fdobridge: <gfxstrand> What kinds of queue waits? Waiting on fences?
15:16fdobridge: <zmike.> Yes
15:16fdobridge: <gfxstrand> Hm... Okay...
15:16fdobridge: <zmike.> Or semaphores
15:17fdobridge: <gfxstrand> So you're waiting in Zink on the CPU before submitting any binds?
15:18fdobridge: <zmike.> No, iirc sparse binding triggers semaphores on cmdbuf queue submit, and rebinds/unbinds trigger more semaphores
15:18fdobridge: <zmike.> Something like that
15:18fdobridge: <zmike.> I'd have to go back and look since it's been a while
15:18fdobridge: <zmike.> It's all semaphores
15:19fdobridge: <zmike.> Just maybe not where/how you're expecting them
15:19fdobridge: <gfxstrand> kk
15:19fdobridge: <gfxstrand> Yeah, I'm seeing lots of WaitSemaphore
15:25fdobridge: <gfxstrand> The good news is that the results are stable. I did a 2nd run and got exactly the same set of crash/flake
15:50fdobridge: <zmike.> that is good news
16:37fdobridge: <!DodoNVK (she) 🇱🇹> There's also this MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28850
16:44fdobridge: <gfxstrand> I'm starting to think that this may be a really good testcase for @airlied to look into RE random MMU faults.
16:46fdobridge: <gfxstrand> It's looking very much like a pressure thing
17:10fdobridge: <Sid> @gfxstrand `It interacts with descriptor buffer. It doesn't require it.` That's confusing. How can it not require it if it interacts with descriptor buffer? /genuine question
17:11fdobridge: <Sid> also sorry for the noise :p was going through the issues to look for something to do
17:11fdobridge: <gfxstrand> No worries. You reminded me to close a stale issue. 😅
17:13fdobridge: <gfxstrand> Extensions can interact with each other without a strict dependency. In this case, it added a bunch of `2` versions of descriptor binding commands. *If* you implement descriptor buffer, it has a `2` version of the descriptor buffer binding command. If you don't, you can just implement a no-op version.
17:13fdobridge: <Sid> oh, fancy
17:13fdobridge: <Sid> thanks
17:26fdobridge: <zmike.> You mean the sparse thing?
17:43fdobridge: <gfxstrand> Yeah
18:45fdobridge: <zmike.> nice
18:52fdobridge: <airlied> @notthatclippy are the protobuf contents documented? Or do we need a decoder ring?
19:30fdobridge: <airlied> I suspect we'd end up decoding some of that in kernel if we can
19:47fdobridge: <mtijanic> Documented is a stretch. But all the fields are named at least. Enum values might be missing.
19:58fdobridge: <airlied> I think getting fault info is probably more urgent anyways
21:28fdobridge: <airlied> so tinygrad just merged direct to nvidia ioctl programming, if anyone wants to know how to get nvk on nvidia going
21:54fdobridge: <marysaka> From a quick look they seem to hardcode Ada :aki_thonk:
22:08fdobridge: <airlied> yeah they only care about one GPU in their configurations so far, but the work submit and channel setup was more sample code we could use for nvk if someone wanted
22:15fdobridge: <karolherbst🐧🦀> though the issue isn't how to do it, it's mostly just a silly catch up game
22:16fdobridge: <karolherbst🐧🦀> and I also think we don't want it out of project reasons
22:20fdobridge: <redsheep> It would be useful for testing. It could just be left as an unsupported configuration, right? Don't accept bug reports for it and such
22:22fdobridge: <karolherbst🐧🦀> sure, but nobody showed up willing to do as much afaik
22:22fdobridge: <karolherbst🐧🦀> and I honestly don't think it's that useful for testing
22:22fdobridge: <karolherbst🐧🦀> like what would you test with it?
22:23fdobridge: <karolherbst🐧🦀> and also.. some distributions will just enable that by default anyway
22:23fdobridge: <karolherbst🐧🦀> (and then we have to deal with the bug reports)
22:24fdobridge: <redsheep> Well, if it kicks more useful errors that would be nice. It would also help with isolating issues, and with testing things the nouveau kmd doesn't support yet
22:24fdobridge: <karolherbst🐧🦀> well.. it doesn't
22:25fdobridge: <karolherbst🐧🦀> and the latter isn't really all that relevant
22:25fdobridge: <karolherbst🐧🦀> as most of that is used in closed source userspace of nvidia
22:25fdobridge: <karolherbst🐧🦀> I mean.. people can come up with 1000 _theoretical_ arguments on why it's useful
22:25fdobridge: <karolherbst🐧🦀> but I'm more asking about actual practical and real ones
22:27fdobridge: <karolherbst🐧🦀> but most of the GPU programming is done in userspace anyway and the kernel side really doesn't do all that much
22:28fdobridge: <redsheep> I likely won't be the one to maintain it or wade through the bug reports, so I'm not at all saying you're wrong not to want it
22:28fdobridge: <karolherbst🐧🦀> one reason was performance, but that's kinda solved with GSP
22:28fdobridge: <karolherbst🐧🦀> and most of the perf difference comes from bad userspace
22:28fdobridge: <karolherbst🐧🦀> well.. not bad
22:28fdobridge: <karolherbst🐧🦀> like not optimal
22:28fdobridge: <redsheep> Hmm. The kernel code itself and how you interface with it can impact performance though, right?
22:29fdobridge: <karolherbst🐧🦀> hardly, unless you submit a lot
22:30fdobridge: <karolherbst🐧🦀> but then you'd rather look at GPU idle counters
22:30fdobridge: <karolherbst🐧🦀> and if you are actually CPU bottlenecked
22:30fdobridge: <karolherbst🐧🦀> and you don't need to run on nvidia's kernel driver to figure that out
22:31fdobridge: <redsheep> Wasn't Faith saying one motivation would be performance counters?
22:31fdobridge: <redsheep> Or, at least could be
22:31fdobridge: <karolherbst🐧🦀> performance counters are mostly done in userspace
22:31fdobridge: <karolherbst🐧🦀> there are GPU idle counters, but those are trivial to set up
22:31fdobridge: <karolherbst🐧🦀> someone just needs to write the code, but I think GSP has the interfaces for it anyway
22:34fdobridge: <redsheep> Hopefully there will be more room for that kind of work before long, it feels like there's probably starting to be a light at the end of the tunnel for the feature work
22:34fdobridge: <redsheep> Modifiers and working vkd3d-proton are really huge steps forward.
23:08fdobridge: <airlied> my main reason for it would be that nvk keeps working if someone installs the nvidia kernel driver
23:47fdobridge: <bylaws> It's very useful for isolating kmd issues with turnip on android, though perhaps that's less of an issue with gsp on nv (edited)
23:48fdobridge: <karolherbst🐧🦀> which... is a lot of work because the UAPI isn't stable and distros will ship outdated mesa
23:48fdobridge: <karolherbst🐧🦀> so I doubt it's of any practical use
23:49fdobridge: <karolherbst🐧🦀> and a support nightmare for us if users do rely on it or something. I'd just rather not
23:49fdobridge: <karolherbst🐧🦀> yeah... but you also don't have to keep up with an ever changing UAPI every couple of weeks
23:50fdobridge: <karolherbst🐧🦀> I think the UAPI on android is de-facto stable?
23:50fdobridge: <bylaws> Kgsl is yeah
23:50fdobridge: <karolherbst🐧🦀> nvidia isn't 🥲
23:50fdobridge: <bylaws> Tegra nv kinda is :)
23:50fdobridge: <karolherbst🐧🦀> well...
23:50fdobridge: <bylaws> But tegra is too special
23:50fdobridge: <karolherbst🐧🦀> yeah
23:52fdobridge: <karolherbst🐧🦀> I mean, I can totally relate to the sentiment of being able to do it, I just don't think it's either practical or beneficial to us (also in regards to pushing nvidia to actually go the upstream route)
23:52fdobridge: <karolherbst🐧🦀> so even strategically it's not a great idea
23:52fdobridge: <karolherbst🐧🦀> and I kinda don't want to have news on "nvk is able to run on the open source nvidia kernel driver" either
23:53fdobridge: <karolherbst🐧🦀> so it's some toy project with 0 practical benefits and tons of downsides
23:53fdobridge: <bylaws> Yeah, definitely a different case than tu
23:54fdobridge: <karolherbst🐧🦀> it kinda made sense pre GSP, but at that time I already knew it's going to happen
23:54fdobridge: <bylaws> Where you can't even use the upstream kernel driver in most cases you use kgsl
23:54fdobridge: <karolherbst🐧🦀> soooo...
23:54fdobridge: <airlied> the uapi we use currently hasn't changed in years I don't think
23:54fdobridge: <airlied> it's not like they rev it every week
23:55fdobridge: <airlied> like it's mostly just map userd into userspace and command submit to it, WSI integration might be tricky
23:55fdobridge: <karolherbst🐧🦀> how'd we do syncobjs on it?
23:55fdobridge: <karolherbst🐧🦀> well.. we wouldn't I guess
23:56fdobridge: <karolherbst🐧🦀> but that also means all the fencing/semaphore code would also need to be custom
23:56fdobridge: <airlied> yeah I'm guessing there would be some impedance mismatches 🙂
23:56fdobridge: <karolherbst🐧🦀> maybe that part of the UAPI is fairly stable, but then you also deal with the compute side of things and nvidia-uvm is kinda a mess
23:57fdobridge: <karolherbst🐧🦀> and the uapi does sometimes change there in regards to VM management
23:57fdobridge: <karolherbst🐧🦀> maybe less so these days
23:57fdobridge: <karolherbst🐧🦀> but the point is, there is no guarantee and it does change and it does take a while until it reaches users on Debian/Ubuntu
23:57fdobridge: <karolherbst🐧🦀> so I doubt it's of any use unless you run like... arch or so
23:57fdobridge: <airlied> yeah uvm is a bit more painful alright
23:58fdobridge: <karolherbst🐧🦀> or compile mesa from main
23:58fdobridge: <karolherbst🐧🦀> (or flatpaks even)
23:58fdobridge: <karolherbst🐧🦀> well.. flatpaks would be a nightmare then
23:59fdobridge: <karolherbst🐧🦀> at which point, given all those issues, I don't think we are doing ourselves a big favor by having that code, especially for a "nvk keeps working if you move kernel drivers" use case