00:19airlied[d]: Yes you have to add modeset for now, trying to get NVIDIA to make it default
00:20airlied[d]: mohamexiety[d]: Nope, but it sounds like you should nail down what bos are doing what paths and where they all come from
00:21mohamexiety[d]: redsheep[d]: Nah not dsc. It got worse because that res/refresh rate combo requires fusing two display heads together
00:21mohamexiety[d]: On Ada and older
00:22mohamexiety[d]: Blackwell is normal in that regard for example. Though NV display handling in general leaves a lot to be desired on windows
00:23mohamexiety[d]: airlied[d]: Is there a good starting point to track all this down? Is it just nouveau_bo_init?
00:37airlied[d]: nouveau_bo_init is the root of all
03:05mohamexiety[d]: Understood, thanks
03:05mohamexiety[d]: Printk spam it is :LeoKek:
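(A minimal sketch of the printk tracing being proposed, assuming the nouveau_bo_init() parameter list in recent kernels; the helper name and log format are illustrative, not an actual patch:)
```c
/* drivers/gpu/drm/nouveau/nouveau_bo.c -- illustrative tracing helper,
 * called from the top of nouveau_bo_init(); `size`, `align` and
 * `domain` are that function's parameters in recent kernels. */
static void
nouveau_bo_trace_init(u64 size, int align, u32 domain)
{
	pr_info("nouveau: bo init size=%llu align=%d domain=0x%x\n",
		size, align, domain);
}
```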
03:08x512[m]: This bug affects NVK on Haiku :( https://github.com/rust-lang/rust/issues/91979
03:19orowith2os[d]: x512: I'm not sure if it'll get any more traction, but do post a comment under the issue.
03:19orowith2os[d]: Just so it's logged.
03:20orowith2os[d]: How you get there and all
03:20gfxstrand[d]: airlied[d]: We just need to get them to make NVK the default. 😝
03:23gfxstrand[d]: x512[m]: What TLS variables implement Drop?
03:24gfxstrand[d]: I guess the debug flags might count but I thought that was just a bunch of bools or something.
03:24x512[m]: Currently not at home PC, so can't check. Some cache if I remember correctly.
08:38airlied[d]: gfxstrand[d]: 34176 looks like it might be useful for our addr calcs as well
08:58asdqueerfromeu[d]: gfxstrand[d]: Ironically this is now the default option on Arch 🐸
09:04airlied[d]: if you install from packaging on most distros it's default, if you use nvidia run files it isn't
13:05asdqueerfromeu[d]: I seem to be getting this with Sid's OGK branch: `[175257.865816] NVRM: nvAssertOkFailedNoLog: Assertion failed: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057) returned from memGetByHandle(pRsClient, pShared->hMaxSubmittedMem, &pShared->pMaxSubmittedMem) @ sem_surf.c:874`
13:09esdrastarsis[d]: asdqueerfromeu[d]: On Turing?
13:10asdqueerfromeu[d]: esdrastarsis[d]: Yes (I expected vkcube to segfault but I got a failed vkGetDeviceQueue() assertion instead)
13:14esdrastarsis[d]: asdqueerfromeu[d]: I have the same issue, Sid added Ampere-specific code
13:23tiredchiku[d]: yeah, I have it too, need to fix
13:24tiredchiku[d]: asdqueerfromeu[d]: did you replace the classes in nvkmd_nvrm_pdev?
13:24esdrastarsis[d]: tiredchiku[d]: So, only vulkaninfo works atm?
13:24tiredchiku[d]: and in nvkmd_nvrm_ctx
13:25tiredchiku[d]: esdrastarsis[d]: yes
13:25tiredchiku[d]: I said that yesterday too
13:25tiredchiku[d]: I only really pushed it to gitlab so I could show Milos/Arthur stuff and ask questions
13:32x512[m]: Where is "Sid's OGK branch"?
13:34esdrastarsis[d]: x512[m]: https://gitlab.freedesktop.org/Sid127/mesa/-/tree/nvk-openrm
13:35x512[m]: Looks like it's based on my code.
13:36tiredchiku[d]: it is, yes
13:36tiredchiku[d]: I went through 3 iterations
13:36tiredchiku[d]: first one in nouveau/winsys, that I trashed
13:36tiredchiku[d]: second one in nvkmd, which I also trashed because I hadn't made much progress by then and your code was up
13:37x512[m]: > `[NV_ERR_OBJECT_NOT_FOUND] (0x00000057) returned from memGetByHandle(pRsClient, pShared->hMaxSubmittedMem, &pShared->pMaxSubmittedMem) @ sem_surf.c:874`
13:37tiredchiku[d]: I'm just a student doing this in my free time, progress is slow 😅
13:37x512[m]: Is it on Turing?
13:37x512[m]: Semaphore surfaces have slightly different caps on Ampere etc.
13:40tiredchiku[d]: wait, does Turing absolutely need max submitted value tracking?
13:41tiredchiku[d]: because when I set hMaxSubmittedMem to hMemoryPhys in NV_SEMAPHORE_SURFACE_ALLOC_PARAMETERS, it refused to create the semaphore surface
13:41tiredchiku[d]: I forget what the status code was
13:46x512[m]: The same memory handle for hSemaphoreMem and hMaxSubmittedMem worked for me.
13:46x512[m]: Nvidia 570.86.16.
13:47tiredchiku[d]: weird
13:47tiredchiku[d]: doing that doesn't work for me on ampere, 570.133.07
13:47tiredchiku[d]: maybe it's an ampere/turing diff, idk
13:48x512[m]: Yes, Ampere is different. Ampere supports 64 bit semaphore values in semsurf, but Turing doesn't.
13:48tiredchiku[d]: :BlobhajShock:
13:48tiredchiku[d]: got you
13:48x512[m]: hMaxSubmittedMem is not needed for Ampere+.
13:48x512[m]: And getting the semaphore value becomes easier.
13:49tiredchiku[d]: I see
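(A rough sketch of the Turing path under discussion, assuming the NV_SEMAPHORE_SURFACE_ALLOC_PARAMETERS layout from OGKM's cl00da.h; nvRmApiAlloc and the handle names are stand-ins, not the branch's actual code:)
```c
/* Turing semsurf values are 32-bit, so the RM keeps a 64-bit
 * "max submitted" shadow; backing it with the same memory object as
 * the semaphore itself reportedly works on 570.86.16. */
NV_SEMAPHORE_SURFACE_ALLOC_PARAMETERS params = { 0 };
params.hSemaphoreMem    = hMemoryPhys;  /* semaphore backing memory */
params.hMaxSubmittedMem = hMemoryPhys;  /* same handle; not needed on Ampere+ */

/* nvRmApiAlloc() is a hypothetical wrapper in the style of the other
 * nvRmApi* helpers. */
NvU32 status = nvRmApiAlloc(&rm, hDevice, hSemSurf,
                            NV_SEMAPHORE_SURFACE, &params);
```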
13:52tiredchiku[d]: dodo how'd you get that log again
13:52asdqueerfromeu[d]: tiredchiku[d]: Obviously yes (because Ampere classes wouldn't make sense)
13:55asdqueerfromeu[d]: Now I managed to get something impressive (or awful): https://pastebin.com/1fJvuuTW
13:58x512[m]: > 3D-Z KIND Violation. Coordinates: (0x0, 0x0)
13:58x512[m]: Wrong pte_kind is set.
13:59x512[m]: See https://github.com/X547/mesa/blob/4a52b0e8981b2c1fddd2a98a9d0fa9d4ed21158e/src/nouveau/vulkan/nvkmd/nvrm/nvkmd_nvrm_va.c#L61
13:59x512[m]: And this flag: https://github.com/X547/mesa/blob/4a52b0e8981b2c1fddd2a98a9d0fa9d4ed21158e/src/nouveau/vulkan/nvkmd/nvrm/nvkmd_nvrm_va.c#L118
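(The gist of those two spots, as a very rough sketch: the VA bind has to carry the PTE kind NVK picked for the BO rather than the default generic kind; all names below are illustrative:)
```c
/* A depth/stencil BO bound with the generic kind faults with the
 * "3D-Z KIND Violation" above, so propagate the image's kind. */
uint32_t pte_kind = va->pte_kind != 0
   ? va->pte_kind                     /* e.g. NV_MMU_PTE_KIND_Z16 for D16 */
   : NV_MMU_PTE_KIND_GENERIC_MEMORY;  /* Turing+ default kind */
```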
14:00tiredchiku[d]: :salute:
14:03gfxstrand[d]: airlied[d]: Yeah, maybe. I think we can also do a lot with the `no_signed_overflow` bit. I need to dig out from under the current pile and maybe I'll be able to look in a few weeks.
14:11tiredchiku[d]: hm
14:11tiredchiku[d]: the compiler is choosing NV_MMU_PTE_KIND_Z16
14:14asdqueerfromeu[d]: x512[m]: Both of those code parts are in Sid's branch
14:16tiredchiku[d]: beh, dinner hours
14:16tiredchiku[d]: will look at this after
14:25tiredchiku[d]: oh I just have to set the right attrs
14:25tiredchiku[d]: ez
14:27tiredchiku[d]: but first, dinner
14:40esdrastarsis[d]: asdqueerfromeu[d]: Finally, Xid errors with useful info :happy_gears:
15:45tiredchiku[d]: I have gotten to the most annoying point in debugging
15:46tiredchiku[d]: where my code takes down the entire GPU
15:46tiredchiku[d]: and requires a hard reset of the system
15:46tiredchiku[d]: not even a reboot via ssh helps (probably because of systemd stop jobs)
15:49asdqueerfromeu[d]: tiredchiku[d]: That's basically what happened to me above
15:51tiredchiku[d]: mhm
15:53tiredchiku[d]: x512[m]: I wanna know if _I'm_ running into this exact thing too
15:53tiredchiku[d]: tiredchiku[d]: I had another piece of code giving me the same status code that I just fixed, got that mixed up
15:58tiredchiku[d]: looks like I have a bunch of nvRmApiFree calls failing
16:03tiredchiku[d]: huh?
16:04tiredchiku[d]: ioctl() returns -1, but envyhooks shows status: 0 for nvRmApiFree
16:26tiredchiku[d]: I'm so confused
16:30tiredchiku[d]: perfectly fine
16:30tiredchiku[d]:
```
DEBUG: BEFORE IOCTL NV_ESC_RM_MAP_MEMORY(nv_ioctl_nvos33_parameters_with_fd { params: NVOS33_PARAMETERS { hClient: 3251634377, hDevice: 3404726273, hMemory: 3404726330, offset: 0, length: 16384, pLinearAddress: 0x0, status: 0, flags: 0 }, fd: 13 })
DEBUG: AFTER IOCTL NV_ESC_RM_MAP_MEMORY(nv_ioctl_nvos33_parameters_with_fd { params: NVOS33_PARAMETERS { hClient: 3251634377, hDevice: 3404726273, hMemory: 3404726330, offset: 0, length: 16384, pLinearAddress: 0x5f2f06000, status: 0, flags: 50364416 }, fd: 13 })
```
16:31tiredchiku[d]: broken???
16:31tiredchiku[d]:
```
DEBUG: BEFORE IOCTL NV_ESC_RM_UNMAP_MEMORY(NVOS34_PARAMETERS { hClient: 3251634377, hDevice: 3404726273, hMemory: 3404726330, pLinearAddress: 0xffffffffffffffff, status: 0, flags: 0 })
DEBUG: AFTER IOCTL NV_ESC_RM_UNMAP_MEMORY(NVOS34_PARAMETERS { hClient: 3251634377, hDevice: 3404726273, hMemory: 3404726330, pLinearAddress: 0xffffffffffffffff, status: 87, flags: 0 })
```
16:32tiredchiku[d]: I wonder if I shouldn't be setting stubLinearAddress to -1
16:32tiredchiku[d]:
```c
static void
nvkmd_nvrm_mem_unmap(struct nvkmd_mem *_mem,
                     enum nvkmd_mem_map_flags flags,
                     void *map)
{
   struct nvkmd_nvrm_mem *mem = nvkmd_nvrm_mem(_mem);
   struct nvkmd_nvrm_dev *dev = nvkmd_nvrm_dev(_mem->dev);
   struct NvRmApi rm;

   if (mem->isSystemMem) {
      nvkmd_nvrm_dev_api_ctl(dev, &rm);
   } else {
      nvkmd_nvrm_dev_api_dev(dev, &rm);
   }

   NvRmApiMapping mapping;
   mapping.stubLinearAddress = (void *)(uintptr_t)-1;
   mapping.address = map;
   mapping.size = mem->base.size_B;

   nvRmApiUnmapMemory(&rm, dev->hSubdevice, mem->hMemoryPhys, 0,
                      &mapping, dev->ctlFd);
}
```
16:39avhe[d]: why are you setting pLinearAddress to -1?
16:39avhe[d]: also when ioctl returns -1 the error is in errno
16:39tiredchiku[d]: yeah, am aware 😅
16:39tiredchiku[d]: you taught me that a couple days ago :D
16:40tiredchiku[d]: avhe[d]: this is straight code from the haiku tree
16:40tiredchiku[d]: color me surprised if it doesn't blow up there
16:40tiredchiku[d]: gotta set stubLinearAddress to mem->base->map and see what it does
16:46tiredchiku[d]: still doesn't like me
16:56avhe[d]: if i'm reading this right you'll be hitting /dev/nvidiaN with the ioctl in the case of device mem
16:56avhe[d]: but i'm pretty sure it must always be /dev/nvidiactl
16:57avhe[d]: yeah, NV_ESC_RM_UNMAP_MEMORY is NV_CTL_DEVICE_ONLY
17:05tiredchiku[d]: no, it just gets hClient from /dev/nvidiaX instead of the ctl device
17:06tiredchiku[d]: that's why I pass ctlFd to it separately at the end
17:08tiredchiku[d]: I figured I'd make it match the map function in appearance, because the map function needs info on both fds
17:08avhe[d]: ah, i was looking at your code on gitlab which doesn't do that <https://gitlab.freedesktop.org/Sid127/mesa/-/blob/37f2d51d0b119f92d424afd42bddb11b6825666b/src/nouveau/vulkan/nvkmd/nvrm/nvRmApi.c#L173>
17:08tiredchiku[d]: the ctl fd to send info to, and the device/ctl fd depending on where the memory in question is located to pass with nv_ioctl_nvos33_parameters_with_fd
17:08tiredchiku[d]: avhe[d]: ah, yeah, haven't pushed anything I did since yesterday 😅
17:09avhe[d]: tiredchiku[d]: so you're allocating two root objects?
17:09tiredchiku[d]: just one
17:09tiredchiku[d]: I do know it's a bit redundant atm 😅
17:09tiredchiku[d]: I still don't know why I get
17:09avhe[d]: right, tbh i don't even know if that does anything
17:09tiredchiku[d]: 0x57 of all things
17:10tiredchiku[d]: NV_ERR_OBJECT_NOT_FOUND
17:10tiredchiku[d]: :wat:
17:10avhe[d]: check that you aren't freeing handles before unmapping, i guess
17:11tiredchiku[d]: I'm not
17:11tiredchiku[d]: that's for sure
17:11tiredchiku[d]: handles only get freed on dev_destroy
17:11avhe[d]: otherwise i did compile a debug build of ogkm and traced errors a couple times when i was really stumped working on my own thing
17:11tiredchiku[d]: not on mem_unmap
17:12avhe[d]: not a very nice experience but it'll get you a definitive answer
17:12tiredchiku[d]: mm
17:12tiredchiku[d]: could do that
17:13tiredchiku[d]: though first I should go through openrm to see where the hell it's getting 0x57 from
17:13avhe[d]: also check dmesg logs at the highest verbosity, i imagine
17:14tiredchiku[d]: mhm
17:14tiredchiku[d]: highest is ':' right?
17:17avhe[d]: yeah
17:21tiredchiku[d]: `[ 251.028504] NVRM: rmapiUnmapFromCpuWithSecInfo: Nv04UnmapMemory: unmap failed; status: Requested object not found [NV_ERR_OBJECT_NOT_FOUND] (0x00000057)`
17:21tiredchiku[d]: ha!
17:24avhe[d]: that doesn't say much unfortunately, if it logged the missing handle you'd know where to look
17:25tiredchiku[d]: yeah
17:27tiredchiku[d]: it can't be hClient or hDevice
17:27tiredchiku[d]: because those are used by other ioctls that complete successfully after the first map fail
17:28tiredchiku[d]: so that leaves us with hMemory or pLinearAddress
17:32tiredchiku[d]: :ohh:
17:32tiredchiku[d]: I'm not smart in the evening
17:33tiredchiku[d]: the definition for NVOS34_PARAMETERS has a comment..
17:33tiredchiku[d]: `NvP64 pLinearAddress NV_ALIGN_BYTES(8); // ptr to virtual address of mapped memory`
17:33tiredchiku[d]: so uh
17:33tiredchiku[d]: mem->base.va.address, maybe
17:34orowith2os[d]: Quick q: I've been seeing it around, but don't think I know what it is. What's ogkm?
17:34tiredchiku[d]: open-gpu-kernel-modules
17:34orowith2os[d]: Ah
17:34avhe[d]: it should be the same thing that you get from NV_ESC_RM_MAP_MEMORY
17:34tiredchiku[d]: yeah
17:34tiredchiku[d]: however
17:34orowith2os[d]: I've only seen it referred to as nvidia-open, so it's throwing me off
17:34orowith2os[d]: :blobcatnotlikethis:
17:34tiredchiku[d]: I don't think I'm storing the pLinearAddress anywhere
17:35tiredchiku[d]: and I'm not entirely sure how to retrieve it
17:35tiredchiku[d]: orowith2os[d]: that's the arch package name
17:35tiredchiku[d]: open-gpu-kernel-modules is the github repository name
17:37tiredchiku[d]: I could've stored it in nvkmd_nvrm_mem, but
17:37tiredchiku[d]: that isn't passed around
17:37tiredchiku[d]: wait
17:46tiredchiku[d]: let's see if storing it in nvkmd_nvrm_mem works
17:46avhe[d]: actually your issue made me check my own code and i realized my device-unmapping was completely off
17:48tiredchiku[d]: 😅
17:48tiredchiku[d]: tiredchiku[d]: I have my doubts about this
17:49tiredchiku[d]: fuck me that was it
17:49tiredchiku[d]: :Sweat:
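(Pieced together, the fix looks roughly like this; the NVOS34_PARAMETERS field names match the debug dump above, while the `linearAddress` member and the ioctl plumbing are assumptions of this sketch:)
```c
/* Remember the CPU pointer NV_ESC_RM_MAP_MEMORY returned... */
mem->linearAddress = mapParams.params.pLinearAddress;

/* ...and hand it back on unmap instead of -1; the RM looks the
 * mapping up by this address, so -1 yields NV_ERR_OBJECT_NOT_FOUND. */
NVOS34_PARAMETERS p = { 0 };
p.hClient        = hClient;
p.hDevice        = hDevice;
p.hMemory        = mem->hMemoryPhys;
p.pLinearAddress = mem->linearAddress;
ioctl(ctlFd, _IOWR(NV_IOCTL_MAGIC, NV_ESC_RM_UNMAP_MEMORY,
                   NVOS34_PARAMETERS), &p);
```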
17:49tiredchiku[d]: why does this work :doomthink:
17:49tiredchiku[d]: I don't see nvkmd_nvrm_mem being passed around
17:50tiredchiku[d]: `struct nvkmd_mem *_mem` is what moves
17:50tiredchiku[d]: and then nvkmd_nvrm_mem is... cast from it?
17:50tiredchiku[d]: ` struct nvkmd_nvrm_mem *mem = nvkmd_nvrm_mem(_mem);`
17:52avhe[d]: nice
17:52avhe[d]: i thought NV_ERR_OBJECT_NOT_FOUND was only for handles, apparently not
17:52tiredchiku[d]: it doesn't make sense to me *how* it works, but
17:52tiredchiku[d]: I'm not gonna complain
17:58tiredchiku[d]: ammm
17:58tiredchiku[d]:
```
(gdb) bt
#0  0x00007942bd4b5582 in ?? () from /usr/lib/libc.so.6
#1  0x00007942bd4a92ac in ?? () from /usr/lib/libc.so.6
#2  0x00007942bd4f122e in clock_nanosleep () from /usr/lib/libc.so.6
#3  0x00007942bd4ff117 in nanosleep () from /usr/lib/libc.so.6
#4  0x00007942bbb3d42a in ?? () from /usr/lib/mangohud/libMangoHud.so
#5  0x00007942bba41e79 in ?? () from /usr/lib/mangohud/libMangoHud.so
#6  0x00005cb2712cecbe in VulkanExample::render() ()
#7  0x00005cb2712da8fb in VulkanExampleBase::renderLoop() ()
#8  0x00005cb2712c6e38 in main ()
```
17:59tiredchiku[d]: that doesn't look good
18:00tiredchiku[d]: wait I should probably disable mangohud lol
18:01tiredchiku[d]: https://pastebin.com/JqY3T8j9
18:04tiredchiku[d]: that's with MESA_VK_WSI_DEBUG=sw
18:09tiredchiku[d]: wait I hope this test doesn't use present_wait
18:10tiredchiku[d]: wait nvm it shouldn't matter, since that's broken on the userspace proprietary driver
18:11tiredchiku[d]: bah, the dmesg didn't save
18:12asdqueerfromeu[d]: tiredchiku[d]: Did you get the Xid 13 with that variable set?
18:12mhenning[d]: tiredchiku[d]: nvkmd_mem is a base class. nvkmd_nouveau_mem subclasses it, and I would guess nvkmd_nvrm_mem also subclasses it. The `struct nvkmd_nvrm_mem *mem = nvkmd_nvrm_mem(_mem);` is likely casting from the base to the subclass, which is reasonable if you're in one of the subclass functions
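(Concretely, that downcast is the usual Mesa container_of idiom, something like:)
```c
/* The subclass embeds the base as its first member... */
struct nvkmd_nvrm_mem {
   struct nvkmd_mem base;
   /* ...plus nvrm-specific state, e.g. the saved linear address. */
};

/* ...so the "cast" is just pointer arithmetic back to the container. */
static inline struct nvkmd_nvrm_mem *
nvkmd_nvrm_mem(struct nvkmd_mem *mem)
{
   return container_of(mem, struct nvkmd_nvrm_mem, base);
}
```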
18:13tiredchiku[d]: asdqueerfromeu[d]: I fixed that already
18:13tiredchiku[d]: I think
18:13tiredchiku[d]: asdqueerfromeu[d]: this one, right?
18:14asdqueerfromeu[d]: tiredchiku[d]: Yes
18:14tiredchiku[d]: yeah, fixed it already
18:14tiredchiku[d]: mhenning[d]: I see, I guess that makes sense
18:14tiredchiku[d]: I'll continue this tomorrow, having to ssh and get logs over phone is unfun and I wanna wind down before bed 😅
18:14tiredchiku[d]: might clean up the commit history now though
18:15tiredchiku[d]: since vulkaninfo runs (without any segfaults)
18:42gfxstrand[d]: mhenning[d]: And if this were rust, `nvkmd_mem_ops` would be a trait. 😆
18:44gfxstrand[d]: One of these days I'm going to RIIR NVK
18:46tiredchiku[d]: :blobnotlike:
18:47tiredchiku[d]: I just started getting familiar with the C codebase!
18:49asdqueerfromeu[d]: gfxstrand[d]: What's next after that? The Vulkan runtime?
19:08gfxstrand[d]: <a:shrug_anim:1096500513106841673>
19:13gfxstrand[d]: tiredchiku[d]: It probably wouldn't end up massively different. Just not written in C
19:14tiredchiku[d]: faith keeping me learning
19:14gfxstrand[d]: Like, I've already structured it more or less the way I would build a Rust project.
19:14gfxstrand[d]: It'll be okay. Just have Faith. 😉
19:20airlied[d]: gfxstrand[d]: once you have 0 MRs to review :-p
19:22gfxstrand[d]: Hah!
19:52tiredchiku[d]: ok
19:52tiredchiku[d]: took me an hour
19:53tiredchiku[d]: but I have dismantled the mega "WIP" commit
19:55x512[m]: avhe[d]: > why are you setting pLinearAddress to -1?
19:55x512[m]: Because this logic is not used on UNIX systems. mmap is used instead.
19:56x512[m]: It seems mapping userland memory from kernel modules is not welcome on Linux/UNIX. But it is the regular way to do things on Windows and Haiku.
20:01tiredchiku[d]: bah, gitlab is being stinky
20:02tiredchiku[d]: not showing co-authorship properly
20:05tiredchiku[d]: okay, switched authorship around, marked myself co-author on those commits instead :)
20:06x512[m]: Note that the open-gpu-kernel-modules Git repo (or a symlink to it) is supposed to be put in the nvrm directory of NVK for it to be buildable.
20:07tiredchiku[d]: I'm not importing the whole openrm tree as a submodule
20:09x512[m]: It may be hard to update the Nvidia version if open-gpu-kernel-modules files are copied and randomly spread around the NVK repo.
20:09gfxstrand[d]: That's what a python script is for
20:10gfxstrand[d]: Mesa doesn't use git submodules in general.
20:10gfxstrand[d]: We might be able to pull it in as a Meson wrap but IDK how well that'd work.
20:12tiredchiku[d]: x512[m]: they're not "randomly spread", they're all in a subfolder
20:12x512[m]: gfxstrand[d]: Any idea why commented out push buffer generation code do not work? https://github.com/X547/mesa/blob/4a52b0e8981b2c1fddd2a98a9d0fa9d4ed21158e/src/nouveau/vulkan/nvkmd/nvrm/nvkmd_nvrm_ctx.c#L323
20:12tiredchiku[d]: and I do intend to make a python script to grab the required headers later, once the required headers are a bit more final
20:13x512[m]: But the code above it works.
20:16airlied[d]: upstream will not be importing or submoduling ogkm 🙂
20:17airlied[d]: either something like gsp-parse to generate what is needed, or a simple script to pull just the required headers; dealing with the changing versions of ogkm might be fun, but I think most of the basic interfaces haven't changed too much
20:19x512[m]: > dealing with the changing versions of ogkm might be fun
20:20x512[m]: I plan to do a kernel driver version check like the original Nvidia userland drivers do.
20:20x512[m]: Some min/max Nvidia kernel driver version range.
20:21x512[m]: A user may want to build the NVK driver against a specific Nvidia driver version, pulling the open-gpu-kernel-modules headers of that version.
20:22airlied[d]: I mean more fun at building time
20:23airlied[d]: once things diverge, you have to consider how to deal with it, manually ifdeffing it or building an abstraction layer that is dynamically generated
20:24gfxstrand[d]: x512[m]: What's not working about it? The sem write isn't showing up? It's erroring somewhere? No interrupt?
20:25x512[m]: #if NVRM_VERSION == ... /* some code */
20:25x512[m]: gfxstrand[d]: The semaphore value is not updated and the FIFO is stuck.
20:27airlied[d]: okay so for headers where structs might change, how do you deal with it?
20:27airlied[d]: like you can go with "hope they won't change" as an answer up until they do 🙂
20:27x512[m]: Also, the generated pushbuf code causes a compile error because of switch cases with duplicate constant values. It is currently fixed by manually adjusting the generated code.
20:30x512[m]: I originally planned that the user provides the open-gpu-kernel-modules directory and kernel module version, and all Nvidia header differences get handled by version ifdefs.
20:31x512[m]: Something similar to how the Nvidia driver handles different Linux kernel versions, but in reverse.
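(I.e. the build would pin an OGKM version and gate divergent structs on it; the NVRM_VERSION define and its encoding here are hypothetical:)
```c
/* Hypothetical build-time define, e.g. 570086 for 570.86: */
#if NVRM_VERSION >= 570086
   /* newer sem-surf caps layout */
#else
   /* older layout */
#endif
```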
20:33gfxstrand[d]: x512[m]: Not sure. It's possible it's not getting executed. Normally that command pretty reliably writes memory. Though it looks like NVK uses SEMAPHORE, not SEM.
20:35gfxstrand[d]: Looks like SEM was added on Volta.
20:35gfxstrand[d]: We're also not using WFI in NVK right now so IDK about that.
20:36x512[m]: Actually, only the SDK headers are needed from the Nvidia kernel code: https://github.com/NVIDIA/open-gpu-kernel-modules/tree/main/src/common/sdk/nvidia/inc .
20:36x512[m]: They are much smaller and maybe can be copied completely.
20:37x512[m]: Or keep some list of used headers to copy.
20:37x512[m]: Also maybe nv-status.c to show errors.
20:40x512[m]: Copying the headers will complicate setting the Nvidia kernel version during the build.
20:42airlied[d]: yeah maybe we don't really care about it and just say build it for a specific version
20:44x512[m]: What will be a policy of updating Nvidia version? Latest release?
20:48gfxstrand[d]: Yeah
20:48gfxstrand[d]: I don't see another reasonable policy
20:48gfxstrand[d]: And we hope they only do breaking header changes at major releases
20:51airlied[d]: at least for the sdk api I'd hope that is mostly true 🙂
20:51airlied[d]: but they don't guarantee anything currently
20:52x512[m]: There already was an incompatible change in the sem-surf headers.
20:55x512[m]: > /usr/lib/mangohud/libMangoHud.so
20:55x512[m]: Looks suspicious.
20:55marysaka[d]: I would probably say that using LTS might be the best approach as it might be unlikely to break between minor versions?
20:57x512[m]: Using too old versions may be problematic because newer versions can include important bug fixes and new features (G-Sync support etc.).
21:03airlied[d]: LTS doesn't usually support new hardware
21:34x512[m]: gfxstrand[d]: gpu_info.sm is 75 on Nouveau, but NVRM reports 7.3, is it fine?
21:35x512[m]: Might it break if 73 is passed to NVK?
21:37gfxstrand[d]: What GPU are you using that it's 73?
21:40x512[m]: It says "NVIDIA T400 4GB TU117".
21:40gfxstrand[d]: Fun.
21:40gfxstrand[d]: So there might actually be some bits of NAK that are slightly broken for 73 but we should fix them.
21:41gfxstrand[d]: There's a bunch of 75 in NAK but that should all probably be 73
21:41mohamexiety[d]: I wonder if the low-end Turing Quadros are reported as 73 as some sort of private special case, since from what I heard they actually allow running DLSS and such on them despite not having HW accel for AI stuff
21:41gfxstrand[d]: 70 is Volta and 72 is Xavier which is the Volta Tegra.
21:42gfxstrand[d]: mohamexiety[d]: Probably.
21:42gfxstrand[d]: On those parts, the tensor units exist but they're so slow as to be unusable.
21:42mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354571253402435806/image.png?ex=67e5c649&is=67e474c9&hm=59ab2f98d1225e1ed780e82474c39bac64a15c76985e6a02d15908bb784b9b4c&
21:42mohamexiety[d]: because when you check the documentation there's actually no mention of 7.3. it's 7.0, 7.2, and 7.5
21:43gfxstrand[d]: x512[m]: I'm glad we're running this experiment. I didn't even know 7.3 exists. 😂
21:43mohamexiety[d]: yeah
21:44x512[m]: NVK is already running really well on Haiku. Maybe some games will even work.
21:45x512[m]: It seems many Vulkan programs do not care about broken semaphores/fences.
21:46mohamexiety[d]: fences only hold you back after all :KEKW:
21:47x512[m]: Queue submission is currently a blocking operation in my code.
21:53x512[m]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/c5e439fea4fe81c78d52b95419c30cabe44e48fd/src/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h#L280
21:53gfxstrand[d]: I hate computers: https://github.com/mesonbuild/meson/issues/14416
21:53x512[m]: 7.3 mentioned.
21:53gfxstrand[d]: I think 7.3 should 90% work as an SM70. There's not much that's different between 7.0 and 7.5 that isn't backwards-compatible.
22:11airlied[d]: the one thing I think Linus always got right was that make install should just install stuff, if it has to build stuff it should error out
22:13x512[m]: Are mp_per_tpc and litter_num_sm_per_tpc the same thing?
22:13airlied[d]: maybe 7.3 is code for I have hmma but don't use it because it's bullshit
22:26mhenning[d]: Yeah, the CUDA docs also only mention 7.0 and 7.5. I've never heard of 7.3 before https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#building-turing-compatible-apps-using-cuda-10-0
22:27mhenning[d]: but yeah, the docs also state: "As a consequence, any binary that runs on Volta will be able to run on Turing (forward compatibility), but a Turing binary will not be able to run on Volta."
22:27mhenning[d]: so we have a formal binary compatibility guarantee and the 7.5 checks that are wrong should only be a perf hit
22:28mhenning[d]: (although for all I know it could make things faster considering that we sometimes do silly things with uniform instructions right now)
22:42gfxstrand[d]: x512[m]: I've pushed some SM 7.3 fixes into https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34201
22:42gfxstrand[d]: It'll get landed once CI is back.
22:47x512[m]: gfxstrand[d]: This needs to be adjusted too: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/winsys/nouveau_device.c?ref_type=heads#L52
22:48gfxstrand[d]: Yeah, if we can figure out where the cutoff is.
22:49gfxstrand[d]: If we just call them 75 for now, it won't break anything
22:49gfxstrand[d]: Actually...
22:51gfxstrand[d]: Is there some easy way to get this info? I have the prop driver on my desktop now. I could throw some cards in it and figure it out quick enough, probably.
22:52x512[m]: NV0080_CTRL_GR_INFO_INDEX_SM_VERSION
22:53x512[m]: Can make some simple gather info program later.
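(Such a gather-info program would boil down to one RmControl on the device object; the struct and control names are from OGKM's ctrl0080gr.h, while nvRmApiControl and the NvP64 cast are stand-ins:)
```c
/* Ask the RM which SM version it reports (the 7.3-vs-7.5 question). */
NV0080_CTRL_GR_INFO info = { .index = NV0080_CTRL_GR_INFO_INDEX_SM_VERSION };
NV0080_CTRL_GR_GET_INFO_PARAMS params = {
   .grInfoListSize = 1,
   .grInfoList = (NvP64)(NvUPtr)&info,
};

/* nvRmApiControl() is a hypothetical wrapper around NV_ESC_RM_CONTROL. */
NvU32 status = nvRmApiControl(&rm, hDevice, NV0080_CTRL_CMD_GR_GET_INFO,
                              &params, sizeof(params));
/* On success, info.data holds one of the *_GR_INFO_SM_VERSION_* values. */
```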
22:57gfxstrand[d]: I found a python script
23:02gfxstrand[d]: TU106 -> SM 7.5
23:02gfxstrand[d]: TU116 -> SM 7.5
23:02gfxstrand[d]: Okay, IDK if I believe that
23:02gfxstrand[d]: This is coming from CUDA though, so maybe?
23:02gfxstrand[d]: Am I gonna have to buy yet another Turing card?
23:02gfxstrand[d]: 😩
23:03mohamexiety[d]: like I said it may be different for the quadros
23:03gfxstrand[d]: I don't have a 117
23:04mhenning[d]: We could read NV0080_CTRL_GR_INFO_INDEX_SM_VERSION directly on gsp and then expose it to userspace via getparam, if we wanted to
23:04mohamexiety[d]: the T400/T100 should be cheap at least :thonk:
23:04gfxstrand[d]: mhenning[d]: Yeah, that feels more reliable than `sm_for_chipset`
23:04mohamexiety[d]: alternatively GTX 1650 but you may end up getting unlucky and getting a 116 or 106 version
23:04gfxstrand[d]: I have a GTX 1660 which is TU116
23:04gfxstrand[d]: And a 1650 which is TU106
23:05mohamexiety[d]: mohamexiety[d]: yeah I think those are guaranteed to be 117
23:05esdrastarsis[d]: my gtx 1650 is 117, asdqueerfromeu[d] 's gpu is 117 too
23:06gfxstrand[d]: Yeah, I think most of the laptop 1650s are TU117
23:06gfxstrand[d]: The desktop 1650s are often TU106
23:06gfxstrand[d]: Because marketing names are awesome!
23:06mhenning[d]: gfxstrand[d]: yeah, although that does kind of just move the lookup table into drm_shim
23:06x512[m]: mohamexiety[d]: That is why I bought it, for experiments. I don't know whether there is any risk of damaging hardware with driver experiments.
23:07x512[m]: I will probably get some low end Ampere GPU soon.
23:07mohamexiety[d]: there shouldn't be a risk nowadays at least; I think we have got to a point where the HW is generally capable of protecting itself
23:07mohamexiety[d]: since all the spicy bits (fan control, power control, ...) are locked behind the FW
23:10x512[m]: gfxstrand[d]: Nouveau KMD reports dev_info.chipset = 0x167 for my GPU.
23:11x512[m]: This info is copied from Nouveau KMD: https://github.com/X547/mesa/blob/c27e88c5ce81827949e7b2e3f9ccb651c89a0fa8/src/nouveau/vulkan/nvkmd/nvrm/nvkmd_nvrm_pdev.c#L34
23:11esdrastarsis[d]: gfxstrand[d]: huh, my desktop 1650 is TU117, but it's a low profile card
23:12mohamexiety[d]: they originally were all 117; it's just that over time, as they accumulate a large amount of defective big dies, they start using them in the lower part of the product stack
23:12gfxstrand[d]: Okay, I've got a T400 on the way.
23:12gfxstrand[d]: I really should quit stress-buying GPUs. 😂
23:12gfxstrand[d]: There are just too many Turing variants that may or may not matter.
23:13mohamexiety[d]: yeah 😦
23:13x512[m]: Rich person.
23:14airlied[d]: I wonder does GSP actually answer that 0080 call
23:14airlied[d]: tracing through ogkm is a nightmare
23:16airlied[d]: looks like it comes from the pGrInfo->infoList somewhere
23:34mohamexiety[d]: airlied[d]: what sort of info do I want, though? I'm not really sure what the relevant bits are in nouveau_bo_alloc, for example, which seems to be the meat of nouveau_bo_new/nouveau_bo_init
23:35mohamexiety[d]: there doesn't seem to be anything that could help me trace things to where they come from/what they are used for
23:39esdrastarsis[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354600608094617673/Screenshot_2025-03-26_20-37-26.png?ex=67e5e1a0&is=67e49020&hm=7dbbb2c9e3e74f69e0d120d3e5a58550fcd6fadfd01ed2294543103acd9a5141&
23:39esdrastarsis[d]: finally
23:41airlied[d]: mohamexiety[d]: maybe at least dump the client creating them and then what size they are and what page size is picked
23:41mohamexiety[d]: airlied[d]: where's the client stored? 😮
23:41gfxstrand[d]: esdrastarsis[d]: Which branch is that?
23:42airlied[d]: cli
23:42airlied[d]: or also internal
23:42esdrastarsis[d]: gfxstrand[d]: nvk-openrm from Sid
23:42mohamexiety[d]: ..oh so that's what it stands for 🐸
23:42mohamexiety[d]: thanks!
23:43esdrastarsis[d]: just needs to switch ampere specific code to turing
23:43mohamexiety[d]: btw another question, sorry. there's this comment:
23:43mohamexiety[d]:
```c
/* Because we cannot currently allow VMM maps to fail
 * during buffer migration, we need to determine page
 * size for the buffer up-front, and pre-allocate its
 * page tables.
 *
 * Skip page sizes that can't support needed domains.
 */
```
23:43mohamexiety[d]: > pre-allocate its page tables
23:43mohamexiety[d]: where/how exactly does the page table get allocated? I am confused
23:44mohamexiety[d]: discoooord. welp sorry about that indentation
23:47airlied[d]: that is the GL path, I think it happens once ttm does its stuff and we hit nouveau_vma_map
23:48mohamexiety[d]: hm ok, I see
23:49mohamexiety[d]: what should we dump in cli? `cli->name`?
23:51airlied[d]: whatever will help you workout what is doing GL allocs and what is doing VK allocs
23:51airlied[d]: name is probably good enough
23:52mohamexiety[d]: yeah, I am just not sure where that info is stored. name looked like a good start but it wasn't clear
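(A plausible form of that dump, assuming it sits at the end of nouveau_bo_alloc() where `cli`, the size and the chosen page shift are all in scope; exact field paths may differ by kernel version:)
```c
/* Tag each allocation with the client that made it (GL vs VK paths),
 * its size, and the page shift nouveau picked for it. */
pr_info("nouveau: bo alloc cli=%s size=%llu page=%d\n",
        cli->name, *size, nvbo->page);
```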
23:57mohamexiety[d]: airlied[d]: also, the comment is duplicated for the vm path too. is that a mistake? if not, I guess it still gets allocated in vma_map?
23:58airlied[d]: yeah it's probably just cut-n-paste