00:03fdobridge: <esdrastarsis> maybe NV_ERR_NOT_READY?
00:30fdobridge: <airlied> seems like it
00:31fdobridge: <airlied> I wonder should we retry in that instance
01:54Liver_K: Where does fdobridge bridge to?
01:55fdobridge: <esdrastarsis> Discord
06:13fdobridge: <airlied> okay I've written the code to in theory decode an h264 I-frame; in practice the gpu just kills my channel 😛
06:20fdobridge: <orowith2os> `dbg!()` time? :v
06:54udoprog: So IIUC, GSP support using nvidia firmware is landing in 6.7, so I decided to take one of the release candidates and nouveau for a spin. Once booted into the system, how do I check if it's working?
06:58airlied: you have to install the firmware from linux-firmware and then if you have a pre-ada card add nouveau.config=NvGspRm=1 to kernel command line
07:00udoprog: And after that? How can I check if the GSP is working? GPU clock speeds? I looked through the patch and I didn't immediately spot any kernel diagnostics being emitted once it boots I could look for.
07:02airlied: yeah there isn't anything too obvious; some bios parsing goes away
07:05udoprog: all right, thanks
08:06fdobridge: <airlied> okay, not hanging the decode engine, and I can see some red in the output so something is decoding
15:13fdobridge: <gfxstrand> Doing this with NVK or VAAPI?
15:27fdobridge: <!DodoNVK (she) 🇱🇹> Doesn't matter if zink supports VA-API :frog_pregnant:
17:02fdobridge: <!DodoNVK (she) 🇱🇹> Does anyone have a TU116 GPU? I want you to do some nouveau GSP testing 🐸
17:16udoprog: Oh, is there a Discord server I could join instead? Just saw fdobridge.
17:27DodoGTA: udoprog: https://discord.gg/cfByQFvg2q
17:28udoprog: <3
17:40fdobridge: <udoprog> So let's say I want to start working on some problems I'm having right now with nouveau. The first thing I was pondering was how to set up a development environment. My idea so far has been to do PCI passthrough and work with a virtual machine so that the host environment can stay up while the driver is being reloaded. Does this work or are there other suggestions for how to do this? Like a devbox with a kvm switch?
17:45fdobridge: <gfxstrand> PSA: I just rebased the sm50 branch and ran rustfmt on the whole thing. Outstanding MRs may need to be rebased. I pulled in @dwlsalmeida's and @marysaka's first, though.
17:46fdobridge: <gfxstrand> Are you wanting to hack on userspace or kernel? If userspace, the kernel is fairly stable on new GPUs and you shouldn't need the VM.
17:46fdobridge: <gfxstrand> As long as you're running a recent kernel
17:49fdobridge: <udoprog> Both probably, right now I want to investigate a warn I get in the kernel and the broader usecase of dmabuf passing in pipewire. But I also just want to conveniently boot up a bleeding edge kernel on a separate environment.
17:52fdobridge: <udoprog> Both probably, right now I want to investigate a warn I get in the kernel and the broader usecase of dmabuf passing in pipewire. But I also just want to conveniently boot up a bleeding edge / patched kernel on a separate environment. (edited)
17:56fdobridge: <gfxstrand> Yeah, a VM with passthrough should work for that, I think.
17:56fdobridge: <orowith2os> for userspace, containers? :)
17:57fdobridge: <orowith2os> lets you mess around a *lot* without worry of messing anything up
17:57fdobridge: <orowith2os> distrobox would let you get up and running fairly quickly, feels almost like a bare metal distro
17:57fdobridge: <udoprog> Yeah, sounds like an idea. Is there a way to conveniently build the relevant userspace parts from source into a container?
17:58fdobridge: <gfxstrand> That won't work if you want to monkey with kernels.
17:58fdobridge: <orowith2os> hence why I said, for userspace
17:59fdobridge: <gfxstrand> For userspace, just build mesa in a directory and use `meson devenv`
17:59fdobridge: <orowith2os> would also be helpful, since you can move the tarball of the container into a VM if you need kernel shenanigans
17:59fdobridge: <gfxstrand> No need to deal with whole containers unless you're trying to test the whole x/wayland/whatever stack with a hacked up driver.
18:00fdobridge: <udoprog> Neat, so that just sets environment variables in a shell to point towards build-from-source mesa?
18:00fdobridge: <orowith2os> bingo
18:02fdobridge: <orowith2os> I find it helps when I need to dev *anything*, especially considering I like to run immutable systems and having the build deps of.... everything on any host system feels icky
18:03fdobridge: <udoprog> All right, that should work for userspace, cheers!
18:03fdobridge: <orowith2os> and I can change envs on a dime, so if something doesn't like Arch, Fedora, or Ubuntu, I can move to what I need
18:03fdobridge: <gfxstrand> That's fine if that's your jam. I avoid containers like the plague unless they solve a very real problem for me like crazy ruby deps or something.
18:03fdobridge: <clangcat> Yup, you can also do it for Vulkan if you want to work on Vulkan drivers too.
18:03fdobridge: <!DodoNVK (she) 🇱🇹> Crazy Ruby deps?
18:04fdobridge: <orowith2os> *looks at ten container envs currently running*
18:04fdobridge: <gfxstrand> I have a container just for building the Vulkan spec because it uses recent asciidoctor + extensions which no one actually has in their distro.
18:04fdobridge: <gfxstrand> But if I'm just hacking on Mesa? Nah, stock Linux (any distro) is fine.
18:10fdobridge: <clangcat> Mesa's build system is very nice; for the most part it just works.
18:11fdobridge: <udoprog> So I just want to check, recently (last 5 years or so maybe?) I've not been able to unload nvidia and load the nouveau kernel modules cleanly on a running system. It seems to lead to stuff like the gpu hanging. Is that supposed to work?
18:14fdobridge: <clangcat> It probably should, as I know Optimus laptops can do it (not sure how well it should work). I think they use a tool called bbswitch; no idea if that'll work in your situation.
18:15fdobridge: <clangcat> Though can't say for sure as I just use Nouveau cause I don't care about performance.
18:18fdobridge: <udoprog> Well, if I can get it to work I was thinking of just doing a multi-seat setup, with the environment I'm working on running on my integrated gpu.
18:19fdobridge: <udoprog> I've had difficulties getting PCI passthrough to work on my current mobo, and there's no space in my chassis to move the current card around much for trial and error.
18:20fdobridge: <clangcat> Yea I can't do it with my motherboard, at least not the initial way I planned, as my GPUs are all in one IOMMU group. There is probably still a way to do it but I don't care too much tbh.
18:23fdobridge: <udoprog> Similar issue. It's a pain.
18:24fdobridge: <clangcat> Depends on use case I guess. I don't care a lot; I just do it for OS dev sometimes.
18:34fdobridge: <pac85> Uh yeah, iirc they provide one for building it.
18:34fdobridge: <pac85> Without it, it needs tons of ruby stuff and some java thing as well iirc
18:34fdobridge: <pac85> https://github.com/KhronosGroup/Vulkan-Docs/blob/main/scripts/runDocker
18:34fdobridge: <pac85> <https://github.com/KhronosGroup/Vulkan-Docs/blob/main/scripts/runDocker> (edited)
18:49fdobridge: <airlied> NVK so far, though ffmpeg has some ext reqs we aren't providing yet, so just using the simple CTS test
18:50fdobridge: <gfxstrand> What exts?
18:52fdobridge: <!DodoNVK (she) 🇱🇹> What's the current status of VA-API on zink? :frog_gears:
18:53fdobridge: <airlied> Don't know, it could also be a vulkan version problem
18:54fdobridge: <!DodoNVK (she) 🇱🇹> Feel free to use my hacks then :triangle_nvk:
18:55fdobridge: <airlied> It failed to start saying timeline semaphores weren't supported, and when I started debugging it I got lost in some getphysicaldevicefeatures mess
18:56fdobridge: <airlied> Like the loader was calling gpdf1 instead of gpdf2
18:56fdobridge: <airlied> Vaapi on zink needs Vulkan exts to do it properly
18:57fdobridge: <!DodoNVK (she) 🇱🇹> Which is obvious (how well does it work with the current Vulkan video extensions though?)
19:01fdobridge: <!DodoNVK (she) 🇱🇹> Timeline semaphores are a hard requirement for FFmpeg Vulkan (it's checking for the Vulkan 1.2 feature though so you should use my Vulkan 1.3 exposure hacks)
19:04fdobridge: <!DodoNVK (she) 🇱🇹> I just saw your Vulkan video testing branch for NVK so I guess I can try hanging the GPU myself :cursedgears:
19:09fdobridge: <airlied> Yeah I hacked NVK to 1.3. But it probably needs a bit more than that, maybe today, after I write P frames
19:11fdobridge: <!DodoNVK (she) 🇱🇹> I wonder where I can get the experimental kernel changes though (one commit mentions kernel changes as a requirement)
19:16fdobridge: <airlied> Yeah I haven't pushed it anywhere yet since it's just a hack to get me going; needs actual thought applied. I'll post it in a bit
19:19fdobridge: <gfxstrand> I'm doing a Vulkan 1.1 run now. It might be time to turn it on.
19:19fdobridge: <gfxstrand> At least for Turing+
19:21fdobridge: <gfxstrand> Unfortunately, I can't do an actual CTS submission because I can't get through a run with GSP. 😭
19:22fdobridge: <!DodoNVK (she) 🇱🇹> Is it because of some race conditions?
19:23fdobridge: <gfxstrand> No, something appears messed up with sync objects.
19:23fdobridge: <gfxstrand> IDK what
19:24fdobridge: <gfxstrand> I get to the synchronization semaphore import/export tests and it just locks up.
19:24fdobridge: <gfxstrand> It's totally reproducible, too.... with a full CTS run. 😭
19:24fdobridge: <gfxstrand> I haven't narrowed it down at all.
19:26fdobridge: <airlied> Nothing in dmesg?
19:26fdobridge: <airlied> Like the interaction of gsp and syncobjs is quite minimal
19:27fdobridge: <gfxstrand> Nothing instructive
19:27fdobridge: <gfxstrand> It's been a bit since I've tried to do a full run, though.
19:27fdobridge: <airlied> Have you got a branch? I can give it a run on one of my machines
19:28fdobridge: <!DodoNVK (she) 🇱🇹> GSP is a mess (TU117 is suffering from GR class errors)
19:29fdobridge: <gfxstrand> nvk/conformance
19:31fdobridge: <airlied> Is there an issue filed?
19:32fdobridge: <!DodoNVK (she) 🇱🇹> Now that I have another person reproducing the issue with their TU117 on a 6.7 rc kernel I can definitely open a bug for that
19:33fdobridge: <!DodoNVK (she) 🇱🇹> Would some kernel parameters help give extra information?
19:33fdobridge: <airlied> Yes but file the issue first then can work out what is best
19:34fdobridge: <airlied> I'm sure I have tu117 somewhere
19:39fdobridge: <gfxstrand> What does the LDC subop do?
19:41fdobridge: <gfxstrand> Hrm... Maybe it affects how the immediate (if any) is interpreted?
19:43fdobridge: <karolherbst🐧🦀> it does
19:43fdobridge: <karolherbst🐧🦀> it controls how the overflow into the next cb works
19:44fdobridge: <karolherbst🐧🦀> more even
19:44fdobridge: <karolherbst🐧🦀> only the default one works with bindless ubos btw
19:45fdobridge: <gfxstrand> Ah
19:46fdobridge: <gfxstrand> I don't think I care about one CB flowing into another
19:46fdobridge: <gfxstrand> At least not right now
19:46fdobridge: <karolherbst🐧🦀> `.IL` basically treats overflows as the next cb
19:46fdobridge: <karolherbst🐧🦀> `.IS` splits the input into two 16 bit values and the sum of the bottom can't overflow the index
19:47fdobridge: <karolherbst🐧🦀> `.ISL` same as `.IS`, just that the cb index is checked against the limit of 14?
19:48fdobridge: <karolherbst🐧🦀> mhhh
19:50fdobridge: <karolherbst🐧🦀> yeah.. the overflow really only matters for real edge case indirects where it's a bit faster
19:50fdobridge: <airlied> that branch has a revert 1.1 on top, should I drop that, or will it die regardless?
19:50fdobridge: <karolherbst🐧🦀> I think it matters more for robustness and what value needs to be returned for an OOB access
19:51fdobridge: <karolherbst🐧🦀> OOB on constant buffers generally return 0, unless they overflow into the next index
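The behavior being described above could be sketched as a toy model: the default mode returns 0 for any out-of-bounds constant-buffer access, while `.IL` lets an overflowing offset spill into the next cb index. This is purely an illustration inferred from this conversation, not from hardware documentation; the `ldc` function, its signature, and the mode names are made up for the sketch.

```python
def ldc(cbs, cb_index, offset, mode="default"):
    """Toy model of an LDC constant-buffer load at word granularity.

    cbs: dict mapping cb index -> list of words.
    default mode: any out-of-bounds access returns 0 (no trap).
    "IL" mode: an offset past the end of one cb reads from the next cb.
    """
    cb = cbs.get(cb_index, [])
    if mode == "default":
        return cb[offset] if 0 <= offset < len(cb) else 0
    if mode == "IL":
        # Overflow flows into the next constant buffer.
        while offset >= len(cb):
            offset -= len(cb)
            cb_index += 1
            cb = cbs.get(cb_index, [])
            if not cb:
                return 0  # ran off the end of all bound cbs
        return cb[offset]
    raise ValueError(f"unknown mode {mode!r}")
```

For example, with two two-word cbs bound, an offset of 2 into cb 0 returns 0 in the default mode but reads word 0 of cb 1 in `.IL` mode.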
19:51fdobridge: <gfxstrand> It'll probably work with 11
19:51fdobridge: <gfxstrand> But go ahead and keep it at 1.0 because that's what I think I did for my last run
19:51fdobridge: <gfxstrand> And it'll be faster
19:52fdobridge: <airlied> so it hangs with just run-deqp or do I need to do a proper submission run?
19:52fdobridge: <!DodoNVK (she) 🇱🇹> The entire NVK trio is active right now :triangle_nvk:
19:53fdobridge: <airlied> https://paste.centos.org/view/raw/ac20f769 is the kernel hack
19:53fdobridge: <gfxstrand> Proper submission run
19:54fdobridge: <gfxstrand> It takes like 2-3 hours before it dies. 😭
19:54fdobridge: <airlied> cries into single thread
19:55fdobridge: <!DodoNVK (she) 🇱🇹> Literally a one-line change (totally upstreamable /s)
19:56fdobridge: <orowith2os> is it not possible to skip ahead to the bits you want?
19:56fdobridge: <orowith2os> then when it doesn't die from there, restart from the beginning
19:56fdobridge: <gfxstrand> The group of tests it dies on works fine by itself. I haven't spent the time to bisect a minimum precursor set.
19:57fdobridge: <orowith2os> I see
19:57fdobridge: <karolherbst🐧🦀> tried increasing the test group size with deqp-runner?
19:59fdobridge: <gfxstrand> All of dEQP-VK.synchronization.* passes
19:59fdobridge: <gfxstrand> So it needs to be bigger groups than that
20:02fdobridge: <karolherbst🐧🦀> I mean.. deqp-runner kinda runs tests in random order, no? So my hope is that with bigger sizes (maybe 10k?) it makes it more likely to hit the issue through it
20:04fdobridge: <gfxstrand> Unrelated: Why does `ldc.u16` throw a misaligned addr error for things aligned to 2B? 🤡
20:05fdobridge: <karolherbst🐧🦀> sure the offset is aligned to 2b?
20:05fdobridge: <gfxstrand> fairly
20:06fdobridge: <karolherbst🐧🦀> is it a negative number?
20:07fdobridge: <karolherbst🐧🦀> mhh though I think the base address from the reg is treated as unsigned...
20:12fdobridge: <gfxstrand> Not as far as I know
20:13fdobridge: <gfxstrand> My 8/16-bit branch is failing exactly one test. 🤡
20:13fdobridge: <gfxstrand> Oh, it's in a loop... I wonder if this is somehow helper-related. 🤔
20:24fdobridge: <airlied> Test case 'dEQP-VK.memory.allocation.basic.size_4KiB.forward.count_4000'..
20:24fdobridge: <airlied> MESA: error: ../src/nouveau/vulkan/nvk_device_memory.c:212: VK_ERROR_OUT_OF_DEVICE_MEMORY
20:24fdobridge: <airlied> ResourceError (res: VK_ERROR_OUT_OF_DEVICE_MEMORY at vktMemoryAllocationTests.cpp:505)
20:24fdobridge: <airlied> bleh
20:32fdobridge: <gfxstrand> `ldc.u8` seems to handle unaligned things fine. 🤡
20:32fdobridge: <gfxstrand> Does `ldc.u16` secretly require 4B alignment?
20:32fdobridge: <gfxstrand> That would be silly
20:32fdobridge: <gfxstrand> But also entirely believable.
20:43fdobridge: <airlied> I plugged in a tu117, seems as fine as any other card with GSP here
20:44fdobridge: <airlied> display works, parallel deqp run isn't dying in flames
20:52fdobridge: <karolherbst🐧🦀> not afaik
20:53fdobridge: <karolherbst🐧🦀> have you double checked with the nvidia disassembler?
20:55fdobridge: <karolherbst🐧🦀> mhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
20:55fdobridge: <karolherbst🐧🦀> @gfxstrand it sounds like that the offset needs to be 4 byte aligned
20:56fdobridge: <karolherbst🐧🦀> but I think that's implied by the encoding?
20:56fdobridge: <karolherbst🐧🦀> like the two lower bits just don't exist in the offset or something?
20:56fdobridge: <karolherbst🐧🦀> maybe I misremember that
20:57fdobridge: <karolherbst🐧🦀> though I think `LDC` has the full 16 bits?
20:57fdobridge: <karolherbst🐧🦀> yeah...
20:57fdobridge: <karolherbst🐧🦀> `LDC` doesn't have that restriction
20:59fdobridge: <!DodoNVK (she) 🇱🇹> How many rays could NVK trace before causing a GPU hang once RT support gets implemented? 🔦
21:36fdobridge: <gfxstrand> Well, it doesn't have an encoding restriction but it sure seems to require 4B alignment internally.
21:36fdobridge: <gfxstrand> If I split all the way to `ldc.u8` it handles unaligned things just fine.
21:41fdobridge: <karolherbst🐧🦀> weird...
21:42fdobridge: <karolherbst🐧🦀> are you sure your encoding is correct?
21:42HdkR: Sounds about right. LDC restrictions suck :)
21:43fdobridge: <gfxstrand> Pretty sure. The disassembler gives me the right thing back, anyway.
21:43fdobridge: <karolherbst🐧🦀> annoying...
21:43fdobridge: <karolherbst🐧🦀> let me craft some CL kernel and see what happens
21:44fdobridge: <gfxstrand> Yeah... I'm about to give up on `ldc.u16`. like, who needs it anyway? I can shift AND and shift.
21:44fdobridge: <gfxstrand> Yeah... I'm about to give up on `ldc.u16`. like, who needs it anyway? I can AND and shift. (edited)
21:44fdobridge: <gfxstrand> It only matters for constant offsets anyway.
21:47fdobridge: <karolherbst🐧🦀> ohh.. so indirects are fine?
21:47fdobridge: <gfxstrand> It's one of those things that's super annoying but ultimately doesn't matter once you've decided how to paint the shed.
21:48fdobridge: <gfxstrand> Nope. Indirects are still busted. I've tried with unaligned+indirect, aligned+indirect, and 0+indirect.
21:48fdobridge: <!DodoNVK (she) 🇱🇹> An OpenCL kernel of Linux would be pretty ironic
21:49fdobridge: <gfxstrand> UBO indirects just aren't that common and adding 2-3 ALU ops isn't going to kill performance.
22:03fdobridge: <karolherbst🐧🦀> ```
22:03fdobridge: <karolherbst🐧🦀> test:
22:03fdobridge: <karolherbst🐧🦀> /*0000*/ IMAD.MOV.U32 R1, RZ, RZ, c[0x0][0x28] ;
22:03fdobridge: <karolherbst🐧🦀> /*0010*/ IMAD.MOV.U32 R6, RZ, RZ, c[0x0][0x178] ;
22:03fdobridge: <karolherbst🐧🦀> /*0020*/ ULDC.64 UR4, c[0x0][0x170] ;
22:03fdobridge: <karolherbst🐧🦀> /*0030*/ IMAD.MOV.U32 R7, RZ, RZ, 0x1 ;
22:03fdobridge: <karolherbst🐧🦀> /*0040*/ SHF.L.U32 R2, R6, 0x1, RZ ;
22:03fdobridge: <karolherbst🐧🦀> /*0050*/ IADD3 R0, R2.reuse, c[0x0][0x168], RZ ;
22:03fdobridge: <karolherbst🐧🦀> /*0060*/ IADD3 R4, R2, c[0x0][0x160], RZ ;
22:03fdobridge: <karolherbst🐧🦀> /*0070*/ LDC.U16.IL R0, c[0x0][R0] ;
22:03fdobridge: <karolherbst🐧🦀> /*0080*/ LDC.U16.IL R3, c[0x0][R4] ;
22:03fdobridge: <karolherbst🐧🦀> /*0090*/ IADD3 R5, R0, R3, RZ ;
22:03fdobridge: <karolherbst🐧🦀> /*00a0*/ SHF.L.U64.HI R3, R6, R7, c[0x0][0x17c] ;
22:03fdobridge: <karolherbst🐧🦀> /*00b0*/ STG.E.U16.SYS [R2.64+UR4], R5 ;
22:03fdobridge: <karolherbst🐧🦀> /*00c0*/ EXIT ;
22:03fdobridge: <karolherbst🐧🦀> .L_x_0:
22:03fdobridge: <karolherbst🐧🦀> /*00d0*/ BRA `(.L_x_0);
22:03fdobridge: <karolherbst🐧🦀> /*00e0*/ NOP;
22:03fdobridge: <karolherbst🐧🦀> /*00f0*/ NOP
22:03fdobridge: <karolherbst🐧🦀> ```
22:03fdobridge: <karolherbst🐧🦀> mhh
22:03fdobridge: <karolherbst🐧🦀> let me add an offset and see what changes
22:03fdobridge: <karolherbst🐧🦀> ` /*0080*/ LDC.U16.IL R3, c[0x0][R4+0x2] ;`
22:03fdobridge: <karolherbst🐧🦀> 🤷
22:04fdobridge: <karolherbst🐧🦀> ohhh
22:04fdobridge: <karolherbst🐧🦀> @gfxstrand maybe it works with `.IL`?
22:07fdobridge: <gfxstrand> That could be. I can try that after a bit.
22:08fdobridge: <karolherbst🐧🦀> they probably use `.IL` in CL for other reasons though 😄
22:08fdobridge: <karolherbst🐧🦀> actually...
22:08fdobridge: <karolherbst🐧🦀> kinda smart
22:08fdobridge: <karolherbst🐧🦀> doesn't need to deal with the pain of `vec2`
22:08fdobridge: <gfxstrand> IDK
22:08fdobridge: <gfxstrand> Yeah
22:09fdobridge: <gfxstrand> And if you don't care about bounds checking...
22:09fdobridge: <karolherbst🐧🦀> well.. the hardware bound checks anyway
22:09fdobridge: <gfxstrand> I can poke at the Vulkan blob, too
22:09fdobridge: <karolherbst🐧🦀> hardware returns 0 for any OOB cb access
22:09fdobridge: <gfxstrand> Not with .IL, not if you actually have separate bindings.
22:10fdobridge: <karolherbst🐧🦀> does the hardware complain?
22:10fdobridge: <gfxstrand> I mean, it'll give you 0 if you run past the end of cb14 or whatever
22:10fdobridge: <karolherbst🐧🦀> yeah, sure
22:10fdobridge: <karolherbst🐧🦀> that's what I meant 😄
22:10fdobridge: <karolherbst🐧🦀> it won't trap I mean
22:14fdobridge: <airlied> okay tu117 made it through a parallel deqp run with about the same amount of damage as any other card, so I don't think there is anything gsp specific wrong with tu117
22:15fdobridge: <!DodoNVK (she) 🇱🇹> How about testing SuperTuxKart with zink?
22:16fdobridge: <!DodoNVK (she) 🇱🇹> Or DXVK v1.10.0+ (both of these cases trigger an error for me)?
22:19fdobridge: <!DodoNVK (she) 🇱🇹> Actually I meant the OpenGL driver (maybe I should get some rest)
22:43fdobridge: <airlied> Yeah, I just doubt it's a gsp-specific problem; it might just be reporting a problem we miss