00:13fdobridge: <airlied> @gfxstrand you don't see any CTS exits with ResourceError errors due to running out of device memory?
00:36fdobridge: <gfxstrand> I haven't seen one in a bit I don't think, no.
00:36fdobridge: <gfxstrand> I fixed the massive leak.
00:43fdobridge: <airlied> guess I should track that down first then
00:52fdobridge: <airlied> hmm so it's doing a bunch of 1k allocations and we end up making 64k ones
01:29fdobridge: <gfxstrand> That's possible... I would think we shouldn't align up to 64k unless the size is at least 64k.
01:30fdobridge: <gfxstrand> Everything else should round up to a GPU page.
01:30fdobridge: <gfxstrand> The only reason to go 64k is so we can place big tiled images in it but those need at least 64k of memory.
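The sizing rule described in the three messages above can be sketched as follows. This is a minimal illustration and not the NVK code: the 4 KiB GPU page size is an assumption (taken from the "going 4k" remark that follows), and the names `GPU_PAGE`, `BIG_PAGE`, `align_up`, and `align_alloc_size` are hypothetical.

```python
GPU_PAGE = 4 * 1024    # assumed GPU page size (hypothetical constant)
BIG_PAGE = 64 * 1024   # 64k granule, only useful for big tiled images


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def align_alloc_size(size: int) -> int:
    """Pick the allocation granule from the requested size.

    Requests of at least 64k are rounded up to 64k; everything
    smaller is only rounded up to a GPU page.
    """
    if size <= 0:
        raise ValueError("allocation size must be positive")
    alignment = BIG_PAGE if size >= BIG_PAGE else GPU_PAGE
    return align_up(size, alignment)
```

Under this rule a 1k request costs one 4k page instead of a 64k block, which is the difference being discussed for the many small allocations.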
01:32fdobridge: <airlied> yeah though going 4k didn't fix the problem, going to dig in the kernel some more
01:40fdobridge: <airlied> okay the kernel seems to be using 1MB
01:44fdobridge: <gfxstrand> 🫤
01:45fdobridge: <gfxstrand> I did push a change recently to start propagating OOM where we weren't before.
01:45fdobridge: <gfxstrand> I tweaked VMA ranges a bit while I was at it. I may have screwed that up.
01:49fdobridge: <gfxstrand> It's possible we're allocating from one heap and returning to the other. You'd only see that in a long run, though.
01:54fdobridge: <airlied> oh I think we lost a patch I wrote ages ago
01:58fdobridge: <gfxstrand> Oh?
01:58fdobridge: <airlied> you even reviewed it 🙂
01:58fdobridge: <airlied> https://patchwork.freedesktop.org/patch/552331/
01:58fdobridge: <gfxstrand> That's possible
01:58airlied: dakr: ^ care to pick that up for fixes?
01:58airlied: maybe add a stable tag to it
02:00fdobridge: <gfxstrand> Yeah, that'll do it. 🙃 That also explains why I haven't seen this in a while. I was running a pile of hacks (including that one) when I did my CTS run. I switched to 6.5rc something a while ago.
02:02fdobridge: <gfxstrand> I should cherry-pick that into my build
03:05fdobridge: <gfxstrand> @asdqueerfromeu https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26383
03:05fdobridge: <gfxstrand> That should make testing MUCH easier. With the instance version in the json set to 1.1, `MESA_VK_VERSION_OVERRIDE` should work properly now.
03:05fdobridge: <gfxstrand> Also, may as well enable 1.1 by default since we have all the features and are passing almost all the tests.
03:06fdobridge: <gfxstrand> RIP my CTS runtime. 😭
03:07fdobridge: <airlied> something strange is happening with ffmpeg and GetPhysicalDeviceFeatures2, it's like the loader isn't aliasing things properly to GetPhysicalDeviceFeature2KHR etc
03:08fdobridge: <gfxstrand> Yeah, real 1.1 may fix that
03:08fdobridge: <gfxstrand> 1.0 has so many problems.
03:12fdobridge: <airlied> I'll rebase onto that when marges
03:13fdobridge: <airlied> yeah that helps it a lot
03:15fdobridge: <airlied> my cts run died with Test case 'dEQP-VK.glsl.derivate.dfdxsubgroup.fbo.float_mediump'..
03:15fdobridge: <airlied> thread '<unnamed>' panicked at ../src/nouveau/compiler/nak_from_nir.rs:2064:18:
03:15fdobridge: <airlied> Unsupported intrinsic instruction: quad_broadcast
03:15fdobridge: <airlied> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
03:15fdobridge: <airlied> fatal runtime error: failed to initiate panic, error 5
03:18fdobridge: <gfxstrand> Uh... what? We've had `quad_broadcast` for a while.
03:18fdobridge: <gfxstrand> I guess nvk/conformance needs a rebase.
03:19fdobridge: <gfxstrand> Running 8/16-bit now.
03:19fdobridge: <gfxstrand> I think I have enough NIR magic to not blow up UBOs.
03:40fdobridge: <gfxstrand> And... opt_algebraic is cleverer than me... 🙃
06:22fdobridge: <airlied> Test case 'dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops'..
06:22fdobridge: <airlied> thread '<unnamed>' panicked at ../src/nouveau/compiler/nak_from_nir.rs:173:9:
06:22fdobridge: <airlied> assertion failed: idx < 16
06:22fdobridge: <airlied> note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
06:22fdobridge: <airlied> fatal runtime error: failed to initiate panic, error 5
06:22fdobridge: <airlied> that was with the vk11 branch
11:29fdobridge: <udoprog> Maybe this might be interesting to someone working on GNOME.
11:29fdobridge: <udoprog> So [I've written a patch against gdm](<https://gist.github.com/udoprog/1f0a156be3d95d9297019c65b595b255>) which allows for seats to be excluded, with it you can do something like this in `/etc/gdm/custom.conf`:
11:29fdobridge: <udoprog> ```
11:29fdobridge: <udoprog> [daemon]
11:29fdobridge: <udoprog> # other settings
11:29fdobridge: <udoprog> ExcludedSeats=seat-nvidia
11:29fdobridge: <udoprog> ```
11:29fdobridge: <udoprog> With this we get the ability to use seats to exclude graphical devices from being eagerly used by gdm (else it uses anything which passes `sd_seat_can_graphical`). And if we want to start a gdm session with a particular device we can just assign it to a seat which is not excluded:
11:29fdobridge: <udoprog> ```sh
11:29fdobridge: <udoprog> sudo loginctl attach seat-nvidia2 /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card0
11:29fdobridge: <udoprog> ```
11:29fdobridge: <udoprog> Getting rid of `gdm` as a user of the driver can be achieved by assigning it back to `seat-nvidia`, so it's fairly easy to reload the driver. Now a downside is that I've yet to figure out how to conveniently share input devices across seats, so you'll have to have two sets. But the benefit is that `seat0` is not disturbed by any of this. I'm using it with integrated graphics.
11:30fdobridge: <udoprog> I'm gonna try to write a small utility to launch a new session as well (if gnome-session can't already do that), so I don't have to use gdm at all with the excluded seat.
14:01fdobridge: <!DodoNVK (she) 🇱🇹> I'm recompiling my GSP kernel with a newer kernel version (hopefully I'll be able to retest GSP stuff fairly soon)
14:21fdobridge: <esdrastarsis> Just use linux-mainline from chaotic-aur (if you are using arch)
14:32karolherbst: dakr, airlied: do you think you have time to look into https://gitlab.freedesktop.org/nouveau/mesa/-/issues/84 ? I honestly have no idea what's going wrong there, just that some context is invalid and uhh.. that's pretty much it
15:00fdobridge: <!DodoNVK (she) 🇱🇹> The SuperTuxKart OpenGL GSP issue still appears :cursedgears:
15:00fdobridge: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1179074369180422245/message.txt?ex=65787593&is=65660093&hm=d47d5ca50e209e0556481802992641be47a51e1f7ec84c3bfcca940e2e56a58a&
15:52fdobridge: <!DodoNVK (she) 🇱🇹> NFS Most Wanted 2012 at 1080p with 4x SSAA (so basically 2160p) and ambient occlusion disabled :triangle_nvk:
15:52fdobridge: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1179087312957747300/Screenshot_20231128_175037.png?ex=657881a1&is=65660ca1&hm=2a7729c8fd1f3ab93a29cef7ef77252d0b53485f16df703a5dc26925f74a7164&
16:00fdobridge: <gfxstrand> Ah, yes. That's the test that uses too much control-flow and I need to figure out how to spill warp barriers to sort it out. That's another reason to run 1.0.
16:13fdobridge: <karolherbst🐧🦀> what do you still need there btw?
16:13fdobridge: <karolherbst🐧🦀> you only need `BMOV` for that
16:14fdobridge: <karolherbst🐧🦀> you have a `BMOV.32 Ra Ba` and a `BMOV.32 Ba Ra` variant
16:16fdobridge: <karolherbst🐧🦀> you can also use `BMOV.32 Ba URa` in case you know it was uniform, but not sure the additional mov into uniform is worth the trouble
16:20fdobridge: <!DodoNVK (she) 🇱🇹> I managed to freeze Overwatch 2 🐸
16:20fdobridge: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1179094589349568553/Screenshot_20231128_181935.png?ex=65788868&is=65661368&hm=abe4e654bb5d12f3515345182666e59c2c272430b580f24dfd58d92c5f2bd3d4&
16:23fdobridge: <gfxstrand> Yeah, but what does BMOV move? That's what I don't understand and what scares me a bit.
16:24Ilgaz: I may have outdone myself this time. gnome-wayland works perfectly, a night and day difference from other DEs, with no errors in dmesg, and yet I installed gnome-terminal and it displays horribly garbled text, though it looks normal in "top view". I even have a video of it. nouveau on an nvidia 9400 here
16:26fdobridge: <karolherbst🐧🦀> the value of the source?
16:26fdobridge: <gfxstrand> But what kind of "value" is it? Is it just the re-convergence target?
16:27fdobridge: <karolherbst🐧🦀> how is that relevant? the barrier registers are just 32 bit values of whatever
16:27fdobridge: <karolherbst🐧🦀> they are not any more special than gprs
16:28fdobridge: <karolherbst🐧🦀> you use the 16 special barrier registers to put special values into them
16:29fdobridge: <karolherbst🐧🦀> but I think in general those are just thread masks
16:29fdobridge: <karolherbst🐧🦀> (or well.. most values will be)
16:30fdobridge: <karolherbst🐧🦀> `BSSY` sets a thread mask
16:30fdobridge: <gfxstrand> If it's just a thread mask then why does BSSY take a merge IP?
16:30fdobridge: <karolherbst🐧🦀> debugging
16:30fdobridge: <karolherbst🐧🦀> the hw ignores it
16:30fdobridge: <gfxstrand> Okay...
16:31fdobridge: <karolherbst🐧🦀> it's for your debugger to know what is the intended target 😄
16:31fdobridge: <gfxstrand> Also, according to https://gitlab.freedesktop.org/mhenning/re/-/blob/main/opclass/opclass75?ref_type=heads, it looks like they're maybe 64-bit?
16:31fdobridge: <karolherbst🐧🦀> (or disassembler)
16:31fdobridge: <!DodoNVK (she) 🇱🇹> The performance has improved with GSP but there's a lot of freezing/stuttering and weird object flashes (and also the game recently died)
16:31fdobridge: <karolherbst🐧🦀> that's useful for setting 64 bit values
16:31fdobridge: <karolherbst🐧🦀> check the `TSSemantic` enum in codegen
16:31fdobridge: <karolherbst🐧🦀> `TS_ATEXIT_PC_LO` and `TS_ATEXIT_PC_HI` e.g.
16:31fdobridge: <karolherbst🐧🦀> also `TS_TRAP_RETURN_PC_LO`/`_HI`
16:32fdobridge: <gfxstrand> Ah
16:32fdobridge: <gfxstrand> Okay, that makes more sense.
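The `_LO`/`_HI` naming above suggests a 64-bit value carried in two 32-bit registers. A minimal arithmetic sketch of that split, assuming the pair holds the low and high halves of one 64-bit program counter (the helper names `split_u64` and `join_u64` are hypothetical and not from codegen):

```python
MASK32 = 0xFFFF_FFFF


def split_u64(value: int) -> tuple[int, int]:
    """Split a 64-bit value into (lo, hi) 32-bit halves."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    return value & MASK32, (value >> 32) & MASK32


def join_u64(lo: int, hi: int) -> int:
    """Rebuild the 64-bit value from its two 32-bit halves."""
    return ((hi & MASK32) << 32) | (lo & MASK32)
```

Each half fits a 32-bit register, so a 64-bit move variant would only be a convenience for writing both halves at once.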
16:34fdobridge: <karolherbst🐧🦀> `.CLEAR` only works on the general purpose barrier registers btw
16:34fdobridge: <karolherbst🐧🦀> I don't think there are any other restrictions where you can only use the registers but not the special ones
16:36Ilgaz: OK I see really psychedelic things with gnome-terminal. all default settings zero extensions. Should I report this to gnome/distro/freedesktop? https://photos.app.goo.gl/ypr2bG2XMZz95DEZ9
16:37fdobridge: <!DodoNVK (she) 🇱🇹> It died with the same GR class error for some reason 🤔
16:38fdobridge: <karolherbst🐧🦀> @gfxstrand I already use that stuff in codegen for cube texgrad emulation and stuff... and I think if we want to move gallium to it NAK will have to handle that as well :ferrisUpsideDown: Maybe I just make NAK CL only for now...
16:39fdobridge: <karolherbst🐧🦀> mhh 3d as well I think
16:39fdobridge: <gfxstrand> NAK already passes all the Vulkan texture tests. What's the problem?
16:40fdobridge: <karolherbst🐧🦀> or is there general nir lowering for that?
16:40fdobridge: <gfxstrand> NIR
16:40fdobridge: <karolherbst🐧🦀> I see
16:40fdobridge: <karolherbst🐧🦀> might be not optimal though 😄
16:40fdobridge: <gfxstrand> codegen's use of NIR is in the dark ages.
16:40fdobridge: <karolherbst🐧🦀> sure.. but I'm also sure that NIR's lowering is quite slow
16:40fdobridge: <karolherbst🐧🦀> maybe it's fine...
16:40fdobridge: <gfxstrand> Why? What NV HW tricks are being played that are so important?
16:41fdobridge: <karolherbst🐧🦀> yeah, more or less
16:41fdobridge: <karolherbst🐧🦀> you can make use of quad operations to speed it up
16:41fdobridge: <karolherbst🐧🦀> so you force quad operation mode via barriers, do your quad based lowering, and restore the thread masks
16:42fdobridge: <gfxstrand> Yeah, maybe
16:42fdobridge: <gfxstrand> But also meh
16:42fdobridge: <gfxstrand> Like, prove to me that that changes app FPS before I'm going to care enough to carry NAK code for it.
16:42fdobridge: <karolherbst🐧🦀> yeah...
16:43fdobridge: <karolherbst🐧🦀> I suspect there are none
16:44fdobridge: <karolherbst🐧🦀> but yeah, I kinda forgot that it's required? to have that in vulkan and that nir already lowers it...
16:44fdobridge: <gfxstrand> I mean, it's not required to have NIR lower it.
16:44fdobridge: <gfxstrand> But NV isn't that unique in needing it lowered.
16:45fdobridge: <karolherbst🐧🦀> fair enough
16:51fdobridge: <gfxstrand> Ugh... bmov.clear is not SSA-friendly...
16:51fdobridge: <karolherbst🐧🦀> not very 😄
16:52fdobridge: <karolherbst🐧🦀> but I also don't really get the point of it
16:53fdobridge: <karolherbst🐧🦀> well.. could optimize a `BMOV B0 0x0` away
16:53fdobridge: <karolherbst🐧🦀> but also not sure what you'd do with a 0 thread mask..
16:54fdobridge: <gfxstrand> Ugh.... BSSY isn't SSA-friendly, either. 🤦🏻♀️
16:54fdobridge: <gfxstrand> Maybe I should just use VOTE?
16:54fdobridge: <karolherbst🐧🦀> huh?
16:54fdobridge: <karolherbst🐧🦀> what are you trying to do?
16:54fdobridge: <gfxstrand> I'm trying to turn barriers into SSA values so I can spill them.
16:54fdobridge: <karolherbst🐧🦀> you only need BMOV for that
16:55fdobridge: <gfxstrand> Yes, BMOV is the instruction. I need compiler theory to make everything actually work.
16:56fdobridge: <karolherbst🐧🦀> mhh.. actually `BSSY` really just updates certain bits.. pain
16:56fdobridge: <gfxstrand> Yeah
16:56fdobridge: <gfxstrand> The usual blob pattern is `bmov.clear` followed by `bssy`.
16:56fdobridge: <karolherbst🐧🦀> yeah.. makes sense
16:57fdobridge: <gfxstrand> I *think* `bmov.clear` copies the source and clears it in one step. Okay, cool. But that's a very SSA-unfriendly operation.
16:57fdobridge: <karolherbst🐧🦀> yeah
16:57fdobridge: <karolherbst🐧🦀> deal with it post RA?
16:57fdobridge: <gfxstrand> I can special-case it in RA if I need to
16:58fdobridge: <gfxstrand> That's the other option. Come up with something that's SSA-friendly and then lower post-RA.
16:58fdobridge: <karolherbst🐧🦀> I mean.. you'd only need to set the barrier to 0 before a bssy to make it legal
16:58fdobridge: <karolherbst🐧🦀> so SSA BSSY -> BMOV B0 0 + BSSY
16:58fdobridge: <gfxstrand> Even there, BSSY is SSA-unfriendly because it updates the register, not read and write.
16:58fdobridge: <karolherbst🐧🦀> and then optimize to `.CLEAR` post RA
16:58fdobridge: <karolherbst🐧🦀> sure, but if it's 0...
16:59fdobridge: <gfxstrand> Yeah, so I could make `bssy` implicitly turn into `bmov.clear` followed by `bssy` post-RA and then combine that with the previous `bmov` if there is one and the regs line up.
16:59fdobridge: <karolherbst🐧🦀> yeah
17:00fdobridge: <karolherbst🐧🦀> sounds like the least painful way
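The post-RA plan agreed on above can be sketched over a toy instruction list. This is only an illustration under stated assumptions, not NAK code: instructions are modelled as plain tuples, `bmov.clear Rx, Bn` is assumed to copy the barrier into a GPR and zero it in one step (as guessed in the chat), and the function name `lower_bssy` is hypothetical.

```python
from typing import List, Tuple

Instr = Tuple[str, ...]  # toy form: (opcode, operand, ...)


def lower_bssy(instrs: List[Instr]) -> List[Instr]:
    """Make every `bssy` start from a zeroed barrier register.

    Default:  bssy Bn              ->  bmov Bn, 0 ; bssy Bn
    Folded:   bmov Rx, Bn ; bssy Bn ->  bmov.clear Rx, Bn ; bssy Bn
    The fold only applies when the preceding instruction copies the
    same barrier register into a GPR (the "regs line up" case).
    """
    out: List[Instr] = []
    for ins in instrs:
        if ins[0] != "bssy":
            out.append(ins)
            continue
        bar = ins[1]
        prev = out[-1] if out else None
        if (
            prev is not None
            and prev[0] == "bmov"
            and prev[1].startswith("R")  # destination is a GPR
            and prev[2] == bar           # source is the same barrier
        ):
            # Copy-and-clear in one instruction (assumed semantics).
            out[-1] = ("bmov.clear", prev[1], bar)
        else:
            # No matching copy: explicitly zero the barrier first.
            out.append(("bmov", bar, "0"))
        out.append(ins)
    return out
```

For example, `lower_bssy([("bmov", "R1", "B0"), ("bssy", "B0")])` yields the `bmov.clear` followed by `bssy` pair, while a lone `("bssy", "B2")` gets an explicit zeroing move in front of it.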
17:01fdobridge: <karolherbst🐧🦀> so `BSSY` is really just `BOR` :ferrisUpsideDown:
17:01fdobridge: <gfxstrand> BOR?
17:01fdobridge: <karolherbst🐧🦀> barrier or
17:01fdobridge: <gfxstrand> Ah
17:02fdobridge: <karolherbst🐧🦀> well..
17:02fdobridge: <karolherbst🐧🦀> there is also `MACTIVE` :ferrisUpsideDown:
17:02fdobridge: <karolherbst🐧🦀> however
17:03fdobridge: <karolherbst🐧🦀> there are those THREAD_STATE barrier regs
17:03fdobridge: <karolherbst🐧🦀> 5 of them
17:03fdobridge: <karolherbst🐧🦀> and they seem to contain random stuff
17:03fdobridge: <karolherbst🐧🦀> and the barrier instructions seem to make use of those things internally
17:32fdobridge: <gobrosse> _someone_ had to try it
17:32fdobridge: <gobrosse> https://cdn.discordapp.com/attachments/1034184951790305330/1179112668423004242/message.txt?ex=6578993f&is=6566243f&hm=d1e9026b013ab7468c14586447fa12d7054499e8d666248e3bbe31bf184247a8&
17:32fdobridge: <gobrosse> i tried NVK on a 32-bit system
17:33fdobridge: <karolherbst🐧🦀> pain
17:33fdobridge: <gobrosse> the kernel driver isn't happy
17:33fdobridge: <gobrosse> kernel `6.7.0-rc2`, built from the tarball over a weekend
17:34fdobridge: <gobrosse> unacceptable, radv/amdgpu works fine on i686 /s (edited)
17:34fdobridge: <gobrosse> is this even meant to work though 😛
17:34fdobridge: <karolherbst🐧🦀> I wouldn't be surprised if some of the RPC stuff doesn't really work on 32 bit 😄
17:34karolherbst: dakr: if you are bored... it seems like 32 bit kernel builds are busted
17:35fdobridge: <gobrosse> (i'm not kidding about amd by the way, I ran one of my test shaders on an actual Pentium III a while back...)
17:42fdobridge: <airlied> I don't really care about 32 bit working
17:42fdobridge: <airlied> If someone figures it out and sends patches
17:43fdobridge: <gobrosse> as long as the GL stuff isn't regressed I think that's fair
17:43fdobridge: <gobrosse> i don't really care either it's just a funny thing to try
18:06fdobridge: <karolherbst🐧🦀> so GL isn't impacted there?
18:24fdobridge: <illwieckz> this is a feature! 🫣
18:24fdobridge: <illwieckz> damn, discord got me, I was answering to:
18:24fdobridge: <illwieckz> https://discord.com/channels/1033216351990456371/1034184951790305330/1179098431172661418
18:25fdobridge: <illwieckz> the message was perfectly placed at the bottom like if it was the last one 😄
20:32fdobridge: <!DodoNVK (she) 🇱🇹> https://gitlab.freedesktop.org/drm/nouveau/-/issues/279
22:15fdobridge: <gobrosse> the machine in question doesn't run a DE so it's hard to tell, eglinfo seems to default to LLVMpipe tho (edited)
22:45fdobridge: <airlied> Does 32 bit work non gsp?