00:00 gfxstrand[d]: Yeah, we have separate OpIAdd3 and OpIAdd3X simply to handle the semantic differences.
00:00 snowycoder[d]: gfxstrand[d]: `LDSLK P0, R2, [R1]`: load and lock address R1, if successful store result in R2 and P0=T, else P0=F (R2=?)
00:00 snowycoder[d]: `STSCUL P1, [R1], R2`: store conditionally and unlock address R1 with value R2 (if condition P1)
00:00 gfxstrand[d]: Structs are cheap. Confusion is not.
00:02 gfxstrand[d]: snowycoder[d]: Okay, it's the load/lock pattern. Yeah, let's make new ops.
00:06 gfxstrand[d]: The annoying bit is that the NIR `ldslk_nv` intrinsic will have to return a vec5 and we'll have to do the predicate to int dance. But you know that drill already.
00:07 gfxstrand[d]: Well, not a vec5 since there aren't vec4 atomics. But one more than the actual number of components.
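The load/lock pattern described above can be modelled on the CPU to make the semantics concrete. This is only a single-threaded sketch: `struct shared_word` and the `model_*` functions are hypothetical names, not NAK or NIR code, and real hardware resolves the lock between threads. The boolean returned by `model_ldslk` plays the role of the extra component the intrinsic would return in addition to the loaded value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one shared-memory word with a hardware lock bit. */
struct shared_word {
    uint32_t value;
    bool locked;
};

/* Models LDSLK: try to take the lock; on success load the value. */
static bool model_ldslk(struct shared_word *w, uint32_t *out)
{
    if (w->locked)
        return false;   /* P0 = F, *out is left untouched (the "R2=?" case) */
    w->locked = true;
    *out = w->value;
    return true;        /* P0 = T */
}

/* Models STSCUL: if pred is set, store the value and release the lock. */
static void model_stscul(struct shared_word *w, uint32_t v, bool pred)
{
    if (!pred)
        return;
    w->value = v;
    w->locked = false;
}

/* A shared-memory atomic add built from the two ops: retry until the
 * lock was taken, and only store when it was. Returns the old value. */
static uint32_t model_shared_atomic_add(struct shared_word *w, uint32_t delta)
{
    uint32_t old = 0;
    bool got_lock;
    do {
        got_lock = model_ldslk(w, &old);
        model_stscul(w, old + delta, got_lock);
    } while (!got_lock);
    return old;
}
```

In this model the loop only terminates if the word is eventually unlocked, which is why the store is predicated on the lock having been taken.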
00:07 redsheep[d]: Does Blackwell even work on the kernel side yet with the latest branch and firmware? I don't keep up on the mailing list if something was said there
00:08 gfxstrand[d]: redsheep[d]: There's a GB branch but I haven't tried it.
00:13 matt_schwartz[d]: yeah to clarify, thats the branch i was trying above w/ blackwell. 03.01-gb20x
00:23 yusisamerican: a
00:26 yusisamerican: gfxstrand[d]: nvkmd is very highly linked with the vulkan error handling system, for non-vulkan drivers should it just be yanked out or do you not want those drivers to use nvkmd?
00:27 snowycoder[d]: gfxstrand[d]: Fortunately we already have an opt pass for that😂
00:31 gfxstrand[d]: yusisamerican: Depends. What are you trying to do?
00:33 yusisamerican: Im trying to use it instead of what we had in the winsys for a gallium driver, for heap allocations
00:33 gfxstrand[d]: For that, I wouldn't bother.
00:34 yusisamerican: alright
00:34 gfxstrand[d]: Eventually, NVK is probably going to run on as many as 4 different kernels. But I think a gallium driver can just target Nouveau.
00:36 gfxstrand[d]: If we wanted to extract the Vulkan logic, we probably could. It would just mean that we have to pull the error handling up a level or two.
00:36 yusisamerican: (**if**) the gallium driver gets merged, just add a patch to revert the vma removals in winsys?
00:37 gfxstrand[d]: Unless you're planning on implementing sparse, you don't really need the VM bind stuff.
00:37 gfxstrand[d]: Though the new submit ioctl is a lot nicer.
00:38 gfxstrand[d]: But also it doesn't have implicit sync which is annoying for GL
00:40 yusisamerican: The old ioctl was/is annoying for GL as well
00:40 gfxstrand[d]: Yeah...
00:40 gfxstrand[d]: You have to track all the BOs
00:41 gfxstrand[d]: But at least you get implicit sync. :shrug_anim:
00:41 yusisamerican: id rather lose implicit sync...90% of the reason to write a new gallium driver is EXEC
00:45 gfxstrand[d]: gfxstrand[d]: I would need to do an audit of all `vk_error()` in NVKMD. It could be that it's not too bad to remove the Vulkan errors and replace them with something else.
00:46 gfxstrand[d]: I kinda never liked plumbing Vulkan in there but didn't have a better plan at the time.
00:47 gfxstrand[d]: So maybe we can rip it out.
00:52 redsheep[d]: gfxstrand[d]: 4? Nouveau, Nova, ogk... The Nvidia kmd on windows?
00:52 gfxstrand[d]: Yup
00:52 gfxstrand[d]: We'll see how many of those actually happen
00:53 redsheep[d]: Hopefully those last two will turn out to be really similar
00:53 redsheep[d]: And hopefully nouveau and Nova aren't too similar 😛
00:53 gfxstrand[d]: The modern nouveau API isn't bad
00:55 yusisamerican: Then with haiku it would be 5...
01:04 butterflies[d]: Does somebody happen to know the difference between MOMC and MIMC access counters on NV?
01:13 airlied[d]: // Whether this counter refers to outbound accesses to remote GPUs or
01:13 airlied[d]: // sysmem (MIMC), or it refers to inbound accesses from CPU or a non-peer
01:13 airlied[d]: // GPU (whose accesses are routed through the CPU, too) to vidmem (MOMC)
03:15 skeggsb9778[d]: matt_schwartz[d]: the display issues need a patch to core drm from airlied[d], as well as a patch to nouveau to enable it
03:16 skeggsb9778[d]: the other warnings i might have fixed already - i'll push a new branch soon
03:16 skeggsb9778[d]: (i'll put airlied's patches in there too)
06:21 tiredchiku[d]: hm
06:21 tiredchiku[d]: I might have to redo the entire device initialization loop here
06:22 tiredchiku[d]: because to get the hardware classes I need an hSubdevice handle
06:23 tiredchiku[d]: but I can't get that in my `nvkmd_nvrm_try_create_pdev` since the nv root device is allocated in `nvkmd_nvrm_create_dev`
06:24 tiredchiku[d]: but `nvkmd_pdev.dev_info` wants all the classes up front
06:27 tiredchiku[d]: or maybe I can initialize things in `nvkmd_nvrm_try_create_pdev` and just store them in a temporary struct to pass onto `nvkmd_nvrm_create_dev`, which might be easier
06:33 tiredchiku[d]: anyway, I've got things hardcoded for now and should figure out how to get this prototype working..
07:23 airlied[d]: in my latest regalloc battle, r0..4 = ldsm.16.m8n8.x4 [r30+0x20]
07:23 airlied[d]: par_copy r14 = r0, r15 = r1, r22 = r2, r23 = r3
07:23 airlied[d]: r0..4 = ldsm.16.m8n8.x4 [r30+0x520]
07:23 airlied[d]: par_copy r28 = r0, r32 = r1, r34 = r2, r35 = r3
07:23 airlied[d]: r0..4 = ldsm.16.m8n8.x4 [r30+0xa20]
07:23 airlied[d]: par_copy r66 = r0, r67 = r1, r70 = r2, r71 = r3
07:23 airlied[d]: r0..4 = ldsm.16.m8n8.x4 [r30+0xf20]
07:23 airlied[d]: par_copy r74 = r0, r75 = r1, r78 = r2, r79 = r3
07:30 tiredchiku[d]: ```c
07:30 tiredchiku[d]: NvU32 nvRmApiMapMemory(NvRmApi *api, NvU32 hDevice, NvU32 hMemory, NvU64 offset, NvU64 length, NvU32 flags, NvRmApiMapping *mapping)
07:30 tiredchiku[d]: {
07:30 tiredchiku[d]: mapping->address = NULL;
07:30 tiredchiku[d]: int memFd = open(api->nodeName, O_RDWR | O_CLOEXEC);
07:30 tiredchiku[d]: if (memFd < 0) {
07:30 tiredchiku[d]: return NV_ERR_GENERIC;
07:30 tiredchiku[d]: }
07:30 tiredchiku[d]: nv_ioctl_nvos33_parameters_with_fd p = {
07:30 tiredchiku[d]: .params = {
07:30 tiredchiku[d]: .hClient = api->hClient,
07:30 tiredchiku[d]: .hDevice = hDevice,
07:30 tiredchiku[d]: .hMemory = hMemory,
07:30 tiredchiku[d]: .offset = offset,
07:30 tiredchiku[d]: .length = length,
07:30 tiredchiku[d]: .pLinearAddress = 0,
07:30 tiredchiku[d]: .status = 0,
07:30 tiredchiku[d]: .flags = flags
07:30 tiredchiku[d]: },
07:30 tiredchiku[d]: .fd = memFd
07:30 tiredchiku[d]: };
07:30 tiredchiku[d]: int ret = nvRmIoctl(api->fd, NV_ESC_RM_MAP_MEMORY, &p, sizeof(p));
07:30 tiredchiku[d]: if (ret < 0) {
07:30 tiredchiku[d]: p.params.status = NV_ERR_GENERIC;
07:30 tiredchiku[d]: goto done1;
07:30 tiredchiku[d]: }
07:30 tiredchiku[d]: if (p.params.status != NV_OK) {
07:30 tiredchiku[d]: goto done1;
07:30 tiredchiku[d]: }
07:30 tiredchiku[d]: mapping->stubLinearAddress = (void*)(uintptr_t)p.params.pLinearAddress;
07:30 tiredchiku[d]: mapping->address = (void*)mmap(0, length, PROT_READ|PROT_WRITE, MAP_SHARED, memFd, 0);
07:30 tiredchiku[d]: if (mapping->address == MAP_FAILED) {
07:30 tiredchiku[d]: p.params.status = NV_ERR_GENERIC;
07:30 tiredchiku[d]: goto done1;
07:30 tiredchiku[d]: }
07:30 tiredchiku[d]: mapping->size = length;
07:30 tiredchiku[d]: done1:
07:30 tiredchiku[d]: close(memFd);
07:30 tiredchiku[d]: return p.params.status;
07:30 tiredchiku[d]: }```
07:30 tiredchiku[d]: spot the error
07:30 tiredchiku[d]: ||nvRmIoctl should be getting `memFd`, not `api->fd`||
07:34 tiredchiku[d]: which gets me NV_ERR_INVALID_CLIENT
07:34 tiredchiku[d]: progress!
08:49 avhe[d]: tiredchiku[d]: i don't know about this, NV_ESC_RM_MAP_MEMORY is NV_CTL_DEVICE_ONLY which means that will fail when mapping from /dev/nvidiaN
08:49 tiredchiku[d]: it is?
08:50 tiredchiku[d]: huh
08:51 tiredchiku[d]: so it is
08:51 tiredchiku[d]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/c5e439fea4fe81c78d52b95419c30cabe44e48fd/src/nvidia/arch/nvalloc/unix/src/escape.c#L507
08:51 tiredchiku[d]: :sigh:
09:29 djdeath3483[d]: gfxstrand[d]: : so you don't have any link optimization in nvk for pipelines?
09:41 tiredchiku[d]: avhe[d]: https://github.com/averne/Envideo/blob/933de833badd2842519791cfa04bf9e66e39ad9c/src/nvidia/device.cpp#L529
09:41 tiredchiku[d]: does that ternary not pick the device fd if its dealing with not-system memory?
09:42 avhe[d]: yes but the NV_ESC_RM_MAP_MEMORY ioctl always happens on the nvidiactl node that was created at init
09:42 avhe[d]: (which matches what cuda does)
09:50 tiredchiku[d]: interesting
09:50 tiredchiku[d]: oh
09:51 tiredchiku[d]: so the device fd is passed along as the params, but the ioctl itself happens on the ctl fd
09:51 tiredchiku[d]: got it
09:51 tiredchiku[d]: took me long enough
12:46 tiredchiku[d]: :LETSFUCKINGCACO:: :LETSFUCKINGGO: :LETSFUCKINGO: :PogDuck:
12:48 tiredchiku[d]: no longer failing in ioctls!
13:07 tiredchiku[d]: I am truly amazing
13:07 tiredchiku[d]: first I was failing in mmap
13:08 tiredchiku[d]: now I'm failing in munmap
13:08 tiredchiku[d]: :ha:
13:08 tiredchiku[d]: but at least the kernel module is no longer complaining about things
13:10 gfxstrand[d]: djdeath3483[d]: Not at the moment. I should add some.
13:11 djdeath3483[d]: gfxstrand[d]: That includes not even knowing how many rendertargets you have?
13:12 gfxstrand[d]: tiredchiku[d]: Yeah, we may have to do something like that. We have similar issues with nouveau. 😒 As long as the context itself is tied to the `ctx` object, that's the important thing.
13:12 asdqueerfromeu[d]: NVK fell behind Venus in terms of extension support a few weeks ago 📉
13:12 djdeath3483[d]: When the shader writes gl_FragColor and you need to replicate on all rts
13:12 gfxstrand[d]: djdeath3483[d]: Yeah, we don't currently care about dead-coding render targets.
13:12 gfxstrand[d]: Vulkan doesn't have `gl_FragColor`.
13:13 djdeath3483[d]: Yeah, but zink does 😭
13:15 gfxstrand[d]: Not that we ever see.
13:15 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/compiler/nak_nir.c?ref_type=heads#L377
13:25 tiredchiku[d]: gfxstrand[d]: I don't need to create the context to get everything, but I do have to create the logical device
13:27 gfxstrand[d]: That's probably okay to put in the pdev
13:29 tiredchiku[d]: okie
13:30 tiredchiku[d]: right now I'm investigating the mystery of the invalid pointer
13:30 tiredchiku[d]: `munmap_chunk(): invalid pointer`
13:36 tiredchiku[d]: probably just double free memes
13:41 tiredchiku[d]: uhh
13:41 tiredchiku[d]: why is the fd 0 :doomthink:
13:46 tiredchiku[d]: why is it trying to close /dev/pts/2 :wat:
13:49 tiredchiku[d]: rude..
13:49 tiredchiku[d]: konsole crashed while I was gdb'ing
13:52 tiredchiku[d]: oh I'm failing on `nvk_upload_queue_init(dev, &dev->upload);`
13:52 tiredchiku[d]: wow
14:03 tiredchiku[d]: ah
14:03 tiredchiku[d]: context creation fails
14:20 tiredchiku[d]: semaphore surface creation specifically
15:31 tiredchiku[d]: !
15:32 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1353753356195336272/f8edz2W.png?ex=67e2cc8f&is=67e17b0f&hm=953faea8aaf37bd4b9e3f51ba76894f3da57070522a5fad8f7544a33f1dc90f8&
15:32 tiredchiku[d]: the window never shows up, but it also doesn't crash
15:35 tiredchiku[d]: :wahoo:
15:37 tiredchiku[d]: I think I know the issue
15:37 esdrastarsis[d]: tiredchiku[d]: Are you using the haiku dev tree?
15:37 tiredchiku[d]: kinda
15:37 tiredchiku[d]: mostly, yeah
15:37 tiredchiku[d]: `ioctl(5, DRM_IOCTL_AUTH_MAGIC, 0x7fff460e69c4) = -1 EACCES (Permission denied)`
15:38 tiredchiku[d]: esdrastarsis[d]: they dropped libdrm, I'm trying to reintegrate it
15:38 tiredchiku[d]: `lrwx------ 1 sidpr sidpr 64 Mar 24 21:07 5 -> /dev/dri/card0`
15:39 tiredchiku[d]: wsi init is failing
15:42 tiredchiku[d]: zmike[d]: how does zink manage that ioctl on nvprop?
15:44 zmike[d]: what
15:44 zmike[d]: tf ?
15:46 tiredchiku[d]: :myy_TinyGiggle:
15:46 tiredchiku[d]: trying to get nvk running on openrm
15:46 tiredchiku[d]: I consistently see it trying to poke /dev/dri/card0 with that ioctl
15:47 tiredchiku[d]: but I'm guessing zink doesn't interact with the kernel directly..
15:47 tiredchiku[d]: apologies for the ping 😅
16:00 tiredchiku[d]: airlied[d]: oh
16:00 tiredchiku[d]: ..that sounds like having nvrm as a drm driver isn't exactly possible
16:05 f_: karolherbst: sorry for the unrelated ping but did you reach out to oftc about your discord bridge? Just curious
16:05 karolherbst: nope
16:05 karolherbst: I should at some point, but as long as it's below 50 connections it's fine
16:06 gfxstrand[d]: skeggsb9778[d]: FYI: Your branch doesn't build on aarch64. It's an easy fix but it doesn't build out of the box
16:06 gfxstrand[d]: tiredchiku[d]: Congrats! :transtada128x128:
16:06 tiredchiku[d]: not yet :ahh:
16:06 tiredchiku[d]: I need to get the window up
16:06 tiredchiku[d]: tiredchiku[d]: need to figure this out
16:07 gfxstrand[d]: Ugh
16:07 gfxstrand[d]: Is that ioctl happening on nvidiactl, nvidia0, or card0?
16:07 tiredchiku[d]: card0 gets EACCES
16:07 tiredchiku[d]: nvidiactl has `[Mon Mar 24 21:26:48 2025] NVRM: RmIoctl: unknown NVRM ioctl command: 0x11`
16:11 tiredchiku[d]: tiredchiku[d]: EINVAL (nvctl)
16:12 f_: karolherbst: right thanks
16:20 tiredchiku[d]: gfxstrand[d]: I could do what haiku did and bypass drm for openrm
16:26 tiredchiku[d]: though I think this is more of an ahuillet or notthatclippy[d] question
16:37 gfxstrand[d]: It returning -EACCES is kinda fine
16:37 gfxstrand[d]: That just means you're not the DRM master
16:37 tiredchiku[d]: hm
16:38 tiredchiku[d]: fair enough, I did gdb a bit too
16:38 tiredchiku[d]: alloc is failing on `NV_MMU_PTE_KIND_C32_2CBA`
16:38 tiredchiku[d]: with NV_ERR_ILLEGAL_ACTION
16:39 tiredchiku[d]: silently failing, that is
16:46 tiredchiku[d]: marysaka[d]: more for you :3
16:46 tiredchiku[d]: UNKNOWN CMDID: da0002
16:47 tiredchiku[d]: this is confusing
16:47 tiredchiku[d]: or rather, I am confused
16:48 tiredchiku[d]: I can't find anything that explicitly requests that
16:48 avhe[d]: NV_SEMAPHORE_SURFACE_CTRL_CMD_BIND_CHANNEL
16:48 avhe[d]: just grep ogkm
16:51 tiredchiku[d]: :salute:
16:53 avhe[d]: NV_SEMAPHORE_SURFACE looks pretty convenient
16:53 avhe[d]: i didn't know it was a thing and implemented my own version
17:01 karolherbst[d]: it's funny how often nvidia flips between "dedicated int alu" and "mixed alu all the way"
17:01 karolherbst[d]: apparently blackwell is mixed without dedicated things again
17:01 karolherbst[d]: ehh dedicated fp alu I mean
17:05 mhenning[d]: yeah, it feels like they've been flipping back and forth each generation lately
17:06 notthatclippy[d]: tiredchiku[d]: Hey, sorry, I wasn't paying attention the past few days but I see a lot of pings and a lot of things happening since then. Is there any question for me that's still pending?
17:08 tiredchiku[d]: notthatclippy[d]: indeed I do
17:08 tiredchiku[d]: going through gdb I spotted
17:08 tiredchiku[d]: Thread 1 "main" hit Breakpoint 1, nvRmApiAlloc (api=api@entry=0x7fffffffa470, hParent=<optimized out>, hObject=hObject@entry=0x5555556700c0, hClass=hClass@entry=218, pAllocParams=pAllocParams@entry=0x7fffffffa480) at ../mesa/src/nouveau/vulkan/nvkmd/nvrm/nvRmApi.c:38
17:08 tiredchiku[d]: 38 if (ret < 0) {
17:08 tiredchiku[d]: (gdb) print ret
17:08 tiredchiku[d]: $10 = 22
17:08 tiredchiku[d]: (gdb) print p
17:10 tiredchiku[d]: now I gather gdb's print puts out ints, so I converted 22 to hex, which gave me 0x16
17:10 tiredchiku[d]: which corresponds to the NV_ERR_ILLEGAL_ACTION status code
17:11 tiredchiku[d]: likewise, 218 (the hClass) appears to be 0xDA
17:11 notthatclippy[d]: Which branch is this in?
17:12 tiredchiku[d]: 570.133.07
17:12 tiredchiku[d]: the latest released
17:12 tiredchiku[d]: (this is for nvk on openrm)
17:12 tiredchiku[d]: the only whole word match for 0xda I get is for `NV_MMU_PTE_KIND_C32_2CBA`
17:13 notthatclippy[d]: Think I found it. This, right? <https://github.com/X547/mesa/blob/mesa-nvk/src/nouveau/vulkan/nvkmd/nvrm/nvRmApi.c>
17:13 tiredchiku[d]: indeed (albeit slightly different)
17:13 tiredchiku[d]: just corrected to always be directed at the ctl fd
17:14 notthatclippy[d]: Just above, you have ```c
17:14 notthatclippy[d]: static int nvRmIoctl(int fd, NvU32 cmd, void *pParams, NvU32 paramsSize)
17:14 notthatclippy[d]: {
17:14 notthatclippy[d]: int res;
17:14 notthatclippy[d]: do {
17:14 notthatclippy[d]: res = ioctl(fd, _IOC(IOC_INOUT, NV_IOCTL_MAGIC, cmd, paramsSize), pParams);
17:14 notthatclippy[d]: if (res < 0) {
17:14 notthatclippy[d]: res = errno;
17:14 notthatclippy[d]: }
17:14 notthatclippy[d]: } while ((res == EINTR || res == EAGAIN));
17:14 notthatclippy[d]: return res;
17:14 notthatclippy[d]: }
17:14 notthatclippy[d]: ``` which means that the `ret` will not be an NV_STATUS code but rather one of these errno ones.
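One possible way to make a later `ret < 0` check meaningful is to have the wrapper return a negated errno instead of a positive one. The sketch below is not the code from the linked tree: `ioctl_retry` is a hypothetical name, and it assumes callers want the kernel-style "negative errno on failure, non-negative on success" convention. The RM status field in the parameter struct still has to be checked separately.

```c
#include <errno.h>
#include <sys/ioctl.h>

/* Hypothetical wrapper: retries on EINTR/EAGAIN and returns -errno on
 * failure, so that `ret < 0` reliably means the ioctl itself failed. */
static int ioctl_retry(int fd, unsigned long request, void *arg)
{
    int res;
    do {
        res = ioctl(fd, request, arg);
    } while (res < 0 && (errno == EINTR || errno == EAGAIN));

    /* errno is only meaningful when ioctl() returned a negative value. */
    return res < 0 ? -errno : res;
}
```

With this convention a failed call on a bad file descriptor yields `-EBADF` rather than the positive value 9, which a `ret < 0` test would otherwise miss.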
17:15 tiredchiku[d]: oh
17:15 notthatclippy[d]: As for 0xDA, this is `NV_SEMAPHORE_SURFACE` in cl00da.h
17:15 tiredchiku[d]: right
17:16 tiredchiku[d]: https://github.com/X547/mesa/blob/mesa-nvk/src/nouveau/vulkan/nvkmd/nvrm/nvRmSemSurf.c#L69
17:18 tiredchiku[d]: tiredchiku[d]: meaning I'm getting an EINVAL :doomthink:
17:21 notthatclippy[d]: That's not surprising, because: <https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/kernel-open/nvidia/nv.c#L2694-L2698>
17:21 notthatclippy[d]: What matters is what is in the status register afterwards?
17:22 tiredchiku[d]: let me check
17:22 notthatclippy[d]: i.e.
17:22 notthatclippy[d]: ```c
17:22 notthatclippy[d]: int ret = nvRmIoctl(api->fd, NV_ESC_RM_ALLOC, &p, sizeof(p));
17:22 notthatclippy[d]: if (ret < 0) {
17:22 notthatclippy[d]: return NV_ERR_GENERIC;
17:22 notthatclippy[d]: }
17:22 notthatclippy[d]: *hObject = p.hObjectNew;
17:22 notthatclippy[d]: return p.status;
17:22 notthatclippy[d]: ``` `ret` is EINVAL, but what is `p.status`?
17:22 tiredchiku[d]: that says 0
17:22 tiredchiku[d]: NV_OK
17:24 tiredchiku[d]: however, hObjectNew is also 0
17:26 notthatclippy[d]: It could be that it failed before that point, or that the debugger is lying to you. Can you try something like
17:26 notthatclippy[d]: ```diff
17:26 notthatclippy[d]: NvU32 nvRmApiAlloc(NvRmApi *api, NvU32 hParent, NvU32 *hObject, NvU32 hClass, void *pAllocParams)
17:26 notthatclippy[d]: {
17:26 notthatclippy[d]: NVOS21_PARAMETERS p = {
17:26 notthatclippy[d]: .hRoot = api->hClient,
17:26 notthatclippy[d]: .hObjectParent = hParent,
17:26 notthatclippy[d]: .hObjectNew = *hObject,
17:26 notthatclippy[d]: .hClass = hClass,
17:26 notthatclippy[d]: + .status = 0x12345678,
17:26 notthatclippy[d]: .pAllocParms = pAllocParams
17:26 notthatclippy[d]: };
17:26 notthatclippy[d]: int ret = nvRmIoctl(api->fd, NV_ESC_RM_ALLOC, &p, sizeof(p));
17:26 notthatclippy[d]: if (ret < 0) {
17:26 notthatclippy[d]: + printf("xxx status=0x%08x, ret=%d\n", p.status, ret);
17:26 notthatclippy[d]: return NV_ERR_GENERIC;
17:26 notthatclippy[d]: }
17:26 notthatclippy[d]: *hObject = p.hObjectNew;
17:26 notthatclippy[d]: return p.status;
17:26 notthatclippy[d]: }
17:26 notthatclippy[d]: ```?
17:26 notthatclippy[d]: (thanks discord for that lovely whitespace formatting)
17:26 tiredchiku[d]: will try :salute:
17:27 tiredchiku[d]: will also go through the debugger again
17:27 tiredchiku[d]: before trying that
17:27 notthatclippy[d]: Honestly, use bpftrace for all your tracing and debugging needs
17:28 tiredchiku[d]: hmm
17:36 tiredchiku[d]: but yeah, I've got to a point where things.. hang
17:36 tiredchiku[d]: vulkaninfo too
17:36 tiredchiku[d]: even this thing https://github.com/necrashter/minimal-vulkan-compute-shader
17:37 tiredchiku[d]: trying to figure out where and why it hangs now
17:37 notthatclippy[d]: tiredchiku[d]: FYI: ```
17:37 notthatclippy[d]: struct NVOS21_PARAMETERS {
17:37 notthatclippy[d]: uint32_t hRoot;
17:37 notthatclippy[d]: uint32_t hObjectParent;
17:37 notthatclippy[d]: uint32_t hObjectNew;
17:37 notthatclippy[d]: uint32_t hClass;
17:37 notthatclippy[d]: uint64_t pAllocParms;
17:37 notthatclippy[d]: uint32_t paramsSize;
17:37 notthatclippy[d]: uint32_t status;
17:37 notthatclippy[d]: };
17:37 notthatclippy[d]: kprobe:nvidia_unlocked_ioctl {
17:37 notthatclippy[d]: if ((arg1 & 0xff) == 0x2B) {
17:37 notthatclippy[d]: @ptr[tid] = arg2;
17:37 notthatclippy[d]: }
17:37 notthatclippy[d]: }
17:37 notthatclippy[d]: kretprobe:nvidia_unlocked_ioctl / @ptr[tid] / {
17:37 notthatclippy[d]: $p = uptr((struct NVOS21_PARAMETERS*)@ptr[tid]);
17:37 notthatclippy[d]: if ($p->hClass == 0xDA) {
17:37 notthatclippy[d]: printf("Alloc(NV_SEMAPHORE_SURFACE) - status=0x%x ret=%d\n", $p->status, retval);
17:37 notthatclippy[d]: }
17:37 notthatclippy[d]: delete(@ptr[tid]);
17:37 notthatclippy[d]: }```
17:37 tiredchiku[d]: yeah, I'll have to learn how to use bpftrace 😅
17:38 notthatclippy[d]: Start here: <https://github.com/bpftrace/bpftrace/blob/master/docs/tutorial_one_liners.md>
17:39 tiredchiku[d]: :salute:
17:39 tiredchiku[d]: will do tomorrow, it's 2309 here
17:46 tiredchiku[d]: hmm
17:47 tiredchiku[d]: it looks like the semaphore isn't updating
17:49 avhe[d]: notthatclippy[d]: one unrelated question if you don't mind: does unmapping host->device memory (NV_ESC_RM_UNMAP_MEMORY) trigger an L2 flush?
17:50 avhe[d]: on the tegra kernel it does, and i rely on this behavior (<https://github.com/alliedvision/linux_nvidia_jetson/blob/l4t-32.7.2/kernel/nvgpu/drivers/gpu/nvgpu/common/mm/gmmu.c#L787-L796>)
17:51 avhe[d]: checking the tegra vgpu stuff (which i think basically means gsp?), it seems to imply this is done firmware-side <https://github.com/alliedvision/linux_nvidia_jetson/blob/l4t-32.7.2/kernel/nvgpu/drivers/gpu/nvgpu/vgpu/mm_vgpu.c#L114>
17:55 notthatclippy[d]: ```c
17:55 notthatclippy[d]: // TODO: investigate whether the tegra wbinvd flush is really necessary, seems only useful for SYSMEM_COH
17:55 notthatclippy[d]: memdescFlushCpuCaches(pGpu, pMemDesc);
17:55 notthatclippy[d]: ``` <https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/src/kernel/gpu/bus/arch/maxwell/kern_bus_gm107.c#L2988-L2989>
17:55 notthatclippy[d]: 🤷
17:57 notthatclippy[d]: But yes, I _think_ everything gets flushed on unmap. Not certain, it's been a long time since I looked at this.
17:57 avhe[d]: ah, by L2 here i meant the GPU one
17:58 marysaka[d]: tiredchiku[d]: bpftrace is quite a lifesaver sometimes, it takes a bit to get used to but it's quite useful 😄
17:58 notthatclippy[d]: I think gsp will handle that when it gets the RPC. I'll check tomorrow. Ping me if I don't respond
17:58 avhe[d]: thanks
17:59 tiredchiku[d]: yeah, it's not exiting this for loop
17:59 tiredchiku[d]: https://github.com/X547/mesa/blob/mesa-nvk/src/nouveau/vulkan/nvkmd/nvrm/nvkmd_nvrm_ctx.c#L354-L372
18:00 tiredchiku[d]: semaphore isn't updating
18:11 tiredchiku[d]: bah
18:12 tiredchiku[d]: I was sending the semaphore stuff to the wrong FD
18:12 tiredchiku[d]: fixed it, now I get status 59 on alloc for NV_SEMAPHORE_SURFACE
18:12 tiredchiku[d]: `NV_ERR_OPERATING_SYSTEM` - Generic operating system error
18:12 tiredchiku[d]: amazing
18:22 tiredchiku[d]: ```DEBUG: BEFORE IOCTL NV_ESC_RM_ALLOC(NVOS21_PARAMETERS { hRoot: 3251635282, hObjectParent: 3404726273, hObjectNew: 0, hClass: 218, pAllocParms: 0x7ffcb4a225e0, paramsSize: 0, status: 0 } (unknown class_id: see cl00da.h))
18:22 tiredchiku[d]: DEBUG: AFTER IOCTL NV_ESC_RM_ALLOC(NVOS21_PARAMETERS { hRoot: 3251635282, hObjectParent: 3404726273, hObjectNew: 3404726297, hClass: 218, pAllocParms: 0x7ffcb4a225e0, paramsSize: 0, status: 59 } (unknown class_id: see cl00da.h))```
18:22 tiredchiku[d]: :tired:
18:27 avhe[d]: 59 is NV_ERR_INVALID_PARAMETER
18:27 tiredchiku[d]: wait
18:28 avhe[d]: yeah hex vs dec
18:28 tiredchiku[d]: envyhooks outputs status in decimal?
18:29 tiredchiku[d]: :o
18:29 avhe[d]: check semsurfConstruct_IMPL, it has several conditions on which it returns that code
18:30 tiredchiku[d]: :neko_salute:
18:31 mhenning[d]: tiredchiku[d]: I think it's all decimal unless it starts with 0x
18:31 tiredchiku[d]: right, that makes sense
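The decimal/hex mix-up above is easy to check mechanically. The helpers below are a small sketch with hypothetical names (`parse_status`, `format_hex`); they only use standard C and do not look anything up in the NVIDIA headers. Converting the values from this conversation: decimal 22 is 0x16, decimal 218 is 0xda, and decimal 59 is 0x3b, not 0x59.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a logged number: decimal unless it starts with 0x.
 * Note: with base 0, strtoul also treats a leading "0" as octal. */
static unsigned long parse_status(const char *text)
{
    return strtoul(text, NULL, 0);
}

/* Format a code as hex so it can be compared against header values. */
static void format_hex(unsigned long code, char *buf, size_t len)
{
    snprintf(buf, len, "0x%lx", code);
}
```

Printing every status through one formatter like this avoids having to guess which base a given log line used.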
18:33 tiredchiku[d]: anyway
18:33 tiredchiku[d]: this is for tomorrow
18:33 tiredchiku[d]: thanks for the lead avhe[d] :saigeheart:
19:11 asdqueerfromeu[d]: Apparently open-gpu-doc misses quite a few class headers that the OGK source code has 🤔
19:18 airlied[d]: gfxstrand[d]: did you see my earlier regalloc paste before all the ogkm? It seems when we do some parcopy we pick the same 4 regs 0..3, I assume there is some method ra's use to avoid that
19:19 airlied[d]: Because using the same 4 regs causes stalls
19:23 airlied[d]: I was going to just throw in a circular offset that moves each time and see if it helps
19:24 mhenning[d]: Yeah, there are a variety of heuristics we can use to help that. Round-robin register selection is one of them
19:26 mhenning[d]: The par copy is needed in the general case, but we can be a little naive in terms of how we select registers right now, which can create more scheduling issues and par copies than strictly necessary
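The round-robin idea mentioned above can be sketched as a search for a free, 4-aligned group of registers that starts from a moving offset instead of always from r0. This is an illustration only, not NAK's allocator: `struct reg_file`, `NUM_REGS`, and both functions are hypothetical, and the 64-register file size is chosen just so the state fits in one bitmask.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 64u   /* hypothetical register file size for the sketch */

struct reg_file {
    uint64_t used;      /* bit i set means register i is occupied */
    unsigned next_base; /* where the next search starts (multiple of 4) */
};

/* Find a free, 4-aligned group of 4 registers, starting at next_base and
 * wrapping around. Returns the base register, or -1 if none is free. */
static int alloc_vec4_round_robin(struct reg_file *rf)
{
    for (unsigned i = 0; i < NUM_REGS; i += 4) {
        unsigned base = (rf->next_base + i) % NUM_REGS;
        uint64_t mask = UINT64_C(0xf) << base;
        if ((rf->used & mask) == 0) {
            rf->used |= mask;
            /* Move the start point so the next vec4 lands elsewhere. */
            rf->next_base = (base + 4) % NUM_REGS;
            return (int)base;
        }
    }
    return -1;
}

/* Release a vec4 previously returned by alloc_vec4_round_robin(). */
static void free_vec4(struct reg_file *rf, unsigned base)
{
    rf->used &= ~(UINT64_C(0xf) << base);
}
```

Because the start point advances after every allocation, a vec4 that was just freed is not immediately handed out again, which is the property that should reduce back-to-back reuse of the same four registers.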
19:34 gfxstrand[d]: airlied[d]: What does it look like before RA?
19:38 airlied[d]: Will paste it when I have it, but from memory it's sane but with ever increasing register numbers in the 200s
19:46 gfxstrand[d]: What baffles me is why it's inserting parallel copies in the first place.
19:46 gfxstrand[d]: I see nothing that would cause 0..4 to be a nice range that it definitely wants to use
19:47 mhenning[d]: Vector regs, if the components aren't allocated right
19:48 mhenning[d]: if 0..4 is always free, I think we'll always grab the lowest available regs right now
19:48 gfxstrand[d]: Yeah but it's the destination and there are clearly available regs because that's what it immediately copies to
19:48 gfxstrand[d]: So I guess our search for a free vec4 is failing?
19:50 mhenning[d]: The dest regs are aligned wrong. r14 isn't aligned by 4
19:51 gfxstrand[d]: Ugh... Yeah, okay, things are more scattered than I thought when I first looked at it.
19:51 gfxstrand[d]: Looks like things are pretty fragmented
19:53 airlied[d]: https://paste.centos.org/view/raw/d37aecd3 is probably it before that
19:53 airlied[d]: yes it's definitely fragmented, just wondering if anyone has ever done a buddy allocator for regalloc 😛
19:55 mhenning[d]: I think the real question is what happens after it that foils the vector heuristics
19:58 mhenning[d]: airlied[d]: Section 5.3 of colombet's thesis talks about some heuristics like that https://theses.hal.science/tel-00764405v1/file/thesis.pdf
19:58 mhenning[d]: Right now, we have only the "Aggressive pre-coalescing" heuristic from there implemented
19:59 mhenning[d]: I'm not sure he treats vector registers directly there though
20:16 airlied[d]: okay I hacked in a quick move the range up everytime, seems to avoid most of it
20:16 airlied[d]: https://paste.centos.org/view/raw/5ac22ba4 with a lot of hackery the shader is approaching a happy place, though still a lot of movs
20:26 airlied[d]: though I went back to 18TFlops so need to work out why
20:31 airlied[d]: I thought it was the register stall, but doesn't seem to be
20:48 sney: hi, I'm attempting to test nvk on a Turing (GTX 1660 Super) device, on debian trixie with mesa 25.0.1. for some reason the nouveau module is refusing to load the gsp firmware, even though it's present on disk: https://paste.debian.net/1365023/
20:49 sney: I hoped adding the "nouveau.debug=DEVINIT=debug" might shed some light but it didn't as you can see. any hints to figure out what's wrong here?
20:49 Sid127: sney: what's your kernel version?
20:49 sney: 6.12.19
20:50 Sid127: oh, says that in the log, my bad
20:50 sney: :)
20:52 Sid127: can you quickly check zcat /proc/config.gz | grep -iE nouveau_gsp_default ?
20:53 sney: # CONFIG_DRM_NOUVEAU_GSP_DEFAULT is not set
20:53 sney: hmm. guess I'm building a kernel?
20:53 Sid127: righto, add nouveau.config=NvGspRm=1 to your kernel options and it _should_ work
20:54 sney: I did add that one via /etc/modprobe.d but maybe it didn't take
20:56 Sid127: would depend on how and when the kernel loads the nouveau module, and whether or not that file is included in the initramfs
20:56 sney: indeed. I'll try it in grub and see.
20:59 sney: alas, https://paste.debian.net/1365027/
21:01 Sid127: could you also check if it exists in /lib/firmware/nvidia/tu116/gsp/gsp-535.113.01.bin
21:02 Sid127: errno -2 says ENOENT, meaning wherever the kernel's looking for firmware doesn't have it
21:02 sney: it does, that's what the dpkg -S output was in my first paste. dpkg will only show that for packages that are installed
21:02 sney: although, it's a symlink: lrwxrwxrwx 1 root root 34 Dec 19 16:13 /lib/firmware/nvidia/tu116/gsp/gsp-535.113.01.bin -> ../../tu102/gsp/gsp-535.113.01.bin
21:03 Sid127: yeah, was just making sure, since the dpkg -S output mentioned /usr/lib
21:04 sney: usrmerge means slightly misleading output from dpkg sometimes, I barely notice it anymore
21:05 Sid127: still doesn't hurt to make sure :P
21:06 Sid127: hm
21:06 Sid127: personally I'm at a loss
21:07 sney: there was a sid bug a few months ago that was creating huge initrds because these redundant firmware blobs weren't shipping as symlinks, so now I'm wondering if that wasn't a bug and so I'm going to try it with the actual binary in that location
21:07 Sid127: that shouldn't be an issue
21:07 sney: I agree but it doesn't hurt to make sure :D
21:08 Sid127: :D
21:10 Sid127: on my system running arch it's symlinks and .zst compressed, yet it doesn't fail to load
21:10 sney: nope. oh well
21:10 sney: what kernel are you on?
21:10 Sid127: might be a deb specific issue?
21:10 Sid127: 6.13.7
21:12 Sid127: though I've been using it since gsp support was a downstream patch for kernel 6.6, so..
21:13 sney: yeah. the debian kernel is *very* stock, so if it's supported in mainline it'll be supported in debian (as long as any relevant configs are enabled)
21:14 Sid127: it's been stick since 6.7
21:14 Sid127: stock, even
21:16 Sid127: the only gsp related config is to have it on by default for devices that support it
21:17 snowycoder[d]: I messed up something in `ALd`/`ASt` encoding and I halted the GPU, the only solution was a system reboot😂
21:18 sney: ooooh nouveau is in the initrd but the gsp firmware isn't. that's how it isn't finding it
21:20 Sid127: heh
22:53 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1353864253454417950/rn_image_picker_lib_temp_a777ed43-c19b-49ad-a0b8-f7929876f0b0.jpg?ex=67e333d7&is=67e1e257&hm=c82395da7634aa8b8501bf2eef825d777f0eda76cecac98a3ac68ff23507ff40&
22:53 gfxstrand[d]: MOAR Kepler!
23:28 marysaka[d]: Is that a K2000 :aki_thonk:
23:28 mohamexiety[d]: https://tenor.com/view/gotta-catch-em-all-gif-5662253
23:29 mohamexiety[d]: faith with all the variants ^ :KEKW:
23:34 marysaka[d]: ngl those Quadros are so small and cute
23:47 redsheep[d]: Never thought I'd hear cute and quadro in the same sentence
23:47 mohamexiety[d]: nah plenty of cute quadros
23:48 marysaka[d]: redsheep[d]: cursed and cute are the same words in my brain don't mind me that happen a lot and will continue
23:49 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1353878399457820802/image.png?ex=67e34104&is=67e1ef84&hm=25e619d333e0b0b43b497275d92179b841995e591cf88fc4ff533a66c6bb32b0&
23:49 mohamexiety[d]: even the modern ones have some cute ones. you ever wanted a.. 75W 20GB RTX 4070?
23:51 redsheep[d]: Tbh I've never really felt the appeal towards small hardware, I'm not out of space
23:51 redsheep[d]: I'd rather have a big cooler and extract all the perf and keep things quiet
23:56 mohamexiety[d]: kinda same but:
23:56 mohamexiety[d]: - lack of physical space
23:56 mohamexiety[d]: - for GPUs, gigantic HW requiring you to swap consistently when having to work on different things