05:40 gfxstrand[d]: Okay, that should make Rust not murder your RAM quite so bad:
05:40 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34136
08:22 tiredchiku[d]: hm
08:22 tiredchiku[d]: maybe I should move the nvrm headers into their own subfolder
08:22 tiredchiku[d]: instead of littering nouveau/headers
09:27 tiredchiku[d]: god damn it
09:27 tiredchiku[d]: so many headers to pull in
09:34 tiredchiku[d]: gah
09:35 tiredchiku[d]: at this point it might be easier to include everything
09:55 avhe[d]: you can include g_allclasses.h while defining SDK_ALL_CLASSES_INCLUDE_FULL_HEADER which will pull pretty much everything
09:56 avhe[d]: except the ctrl stuff but you shouldn't need that much
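[editor's note: a minimal sketch of the include pattern avhe describes; the include path is an assumption that depends on how the OGKM SDK headers are vendored into the tree]
```c
/* Define this before the include so g_allclasses.h pulls in the full class
 * headers rather than just the class IDs. Path is an assumption. */
#define SDK_ALL_CLASSES_INCLUDE_FULL_HEADER
#include "g_allclasses.h"
```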
10:14 tiredchiku[d]: the issue isn't the classes though :tired:
10:26 avhe[d]: what else is then?
10:27 avhe[d]: there's like 6 headers you should need aside from classes defs
10:38 tiredchiku[d]: :doomthink:
10:51 tiredchiku[d]: well
10:51 tiredchiku[d]: the code itself only requires 5 headers (apart from the classes)
10:52 tiredchiku[d]: but those headers in turn require other headers
10:52 tiredchiku[d]: and I've not grabbed the whole openrm tree as a submodule like you've done in your project 😅
11:02 tiredchiku[d]: ookay, getting closer
11:05 avhe[d]: tiredchiku[d]: ah yeah
11:06 avhe[d]: tbh i think it makes sense to do that if you're writing driver code for ogkm, or at least to have a script which pulls every header from the relevant include dirs
11:06 tiredchiku[d]: maybe later 😅
11:06 tiredchiku[d]: right now I'm quite excited
11:06 tiredchiku[d]: because the code _compiles_
11:07 tiredchiku[d]: and it _runs_
11:07 tiredchiku[d]: but also runs into errors :3
11:07 tiredchiku[d]: `[ 9924.244022] NVRM: RmIoctl: unknown NVRM ioctl command: 0x11`
11:07 tiredchiku[d]: :D
11:23 tiredchiku[d]: printf debugging here I come :LUL:
11:28 tiredchiku[d]: it's failing in mem_map
11:29 tiredchiku[d]: alloc_tiled_mem(0x1000, 0x1000, 0, 0, 0xa)
11:29 tiredchiku[d]: alloc_tiled_mem(0x80000, 0x10000, 0, 0, 0x9)
11:29 tiredchiku[d]: ```c
11:29 tiredchiku[d]: fprintf(stderr, "alloc_tiled_mem(%#" PRIx64 ", %#" PRIx64 ", %#" PRIx8 ", %#" PRIx16 ", %#x)\n",
11:29 tiredchiku[d]: size_B, align_B, pte_kind, tile_mode, flags);```
12:02 tiredchiku[d]: oh rad
12:02 tiredchiku[d]: seems to be a simple enough issue
12:02 tiredchiku[d]: it's failing to handle /dev/nvidia0 correctly
12:04 tiredchiku[d]: and that's only because nodeName is null here
12:04 tiredchiku[d]: ```c
12:04 tiredchiku[d]: static inline void
12:04 tiredchiku[d]: nvkmd_nvrm_dev_api_dev(struct nvkmd_nvrm_dev *dev, struct NvRmApi *rm)
12:04 tiredchiku[d]: {
12:04 tiredchiku[d]: rm->fd = dev->devFd;
12:04 tiredchiku[d]: rm->hClient = dev->hClient;
12:04 tiredchiku[d]: rm->nodeName = dev->devName;
12:04 tiredchiku[d]: }```
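[editor's note: a hypothetical sketch of the fix implied here, assuming devName is a fixed-size buffer filled from the device minor; nothing beyond the devName field is confirmed by the log, and the struct/function names below are illustrative]
```c
#include <stdio.h>

/* Hypothetical fix shape: rm->nodeName came out NULL because dev->devName
 * was never populated. The buffer size and where the minor number comes
 * from are assumptions, not taken from the log. */
struct dev_names { char devName[16]; };

static void fill_dev_name(struct dev_names *dev, unsigned minor)
{
    snprintf(dev->devName, sizeof(dev->devName), "/dev/nvidia%u", minor);
}

int main(void)
{
    struct dev_names dev;
    fill_dev_name(&dev, 0);
    printf("%s\n", dev.devName);  /* prints /dev/nvidia0 */
    return 0;
}
```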
12:34 tiredchiku[d]: I've been lied to :kittysurprise:
12:37 tiredchiku[d]: that was _one_ of the issues
12:37 tiredchiku[d]: the other is an EINVAL
13:01 djdeath3483[d]: gfxstrand[d]: tracking my last mesh/fs bug I think
13:01 djdeath3483[d]: gfxstrand[d]: I'm wondering why a float[16] is lowered to 16 slots
13:01 djdeath3483[d]: by nir_lower_io
13:02 djdeath3483[d]: probably because of type_size_vec4, but still sounds inefficient
13:03 djdeath3483[d]: I might need to lower per-primitive/per-vertex differently (which lower_io doesn't support yet)
13:20 tiredchiku[d]: notthatclippy[d]: does the /dev/nvidia0 fd not support mmap?
13:38 avhe[d]: tiredchiku[d]: depending on the type of memory you allocated you should either use /dev/nvidiaN or /dev/nvidiactl
13:38 tiredchiku[d]: yeah
13:38 tiredchiku[d]: using /dev/nvidiactl for GART and /dev/nvidiaN for device_local
13:38 tiredchiku[d]: both appear to be failing
13:38 tiredchiku[d]: ```c
13:38 tiredchiku[d]: NvU32 nvRmApiMapMemory(NvRmApi *api, NvU32 hDevice, NvU32 hMemory, NvU64 offset, NvU64 length, NvU32 flags, NvRmApiMapping *mapping)
13:38 tiredchiku[d]: {
13:38 tiredchiku[d]: mapping->address = NULL;
13:38 tiredchiku[d]: int memFd = open(api->nodeName, O_RDWR | O_CLOEXEC);
13:38 tiredchiku[d]: if (memFd < 0) {
13:38 tiredchiku[d]: return NV_ERR_GENERIC;
13:38 tiredchiku[d]: }
13:38 tiredchiku[d]: nv_ioctl_nvos33_parameters_with_fd p = {
13:38 tiredchiku[d]: .params = {
13:38 tiredchiku[d]: .hClient = api->hClient,
13:39 tiredchiku[d]: .hDevice = hDevice,
13:39 tiredchiku[d]: .hMemory = hMemory,
13:39 tiredchiku[d]: .offset = offset,
13:39 tiredchiku[d]: .length = length,
13:39 tiredchiku[d]: .pLinearAddress = 0,
13:39 tiredchiku[d]: .flags = flags
13:39 tiredchiku[d]: },
13:39 tiredchiku[d]: .fd = memFd
13:39 tiredchiku[d]: };
13:39 tiredchiku[d]: int ret = nvRmIoctl(api->fd, NV_ESC_RM_MAP_MEMORY, &p, sizeof(p));
13:39 tiredchiku[d]: if (ret < 0) {
13:39 tiredchiku[d]: p.params.status = NV_ERR_GENERIC;
13:39 tiredchiku[d]: goto done1;
13:39 tiredchiku[d]: }
13:39 tiredchiku[d]: if (p.params.status != NV_OK) {
13:39 tiredchiku[d]: goto done1;
13:39 tiredchiku[d]: }
13:39 tiredchiku[d]: mapping->stubLinearAddress = (void*)(uintptr_t)p.params.pLinearAddress;
13:39 tiredchiku[d]: mapping->address = (void*)mmap(0, length, PROT_READ|PROT_WRITE, MAP_SHARED, memFd, 0);
13:39 tiredchiku[d]: if (mapping->address == MAP_FAILED) {
13:39 tiredchiku[d]: fprintf(stderr, "[!] mmap failed 2: %s (errno: %d)\n", strerror(errno), errno);
13:39 tiredchiku[d]: p.params.status = NV_ERR_GENERIC;
13:39 tiredchiku[d]: goto done1;
13:39 tiredchiku[d]: }
13:39 tiredchiku[d]: mapping->size = length;
13:39 tiredchiku[d]: done1:
13:39 tiredchiku[d]: close(memFd);
13:39 tiredchiku[d]: return p.params.status;
13:39 tiredchiku[d]: }```
13:39 tiredchiku[d]: it hits that fprintf
13:39 tiredchiku[d]: (apologies IRC crowd)
13:40 avhe[d]: what's the status from NV_ESC_RM_MAP_MEMORY?
13:40 avhe[d]: oh should be 0 i guess
13:40 tiredchiku[d]: yup
13:41 tiredchiku[d]: if that was failing we wouldn't get this far 😅
13:41 tiredchiku[d]: ```c
13:41 tiredchiku[d]: $15 = {params = {hClient = 3251635017, hDevice = 3404726273, hMemory = 3404726274, offset = 0, length = 4096, pLinearAddress = 0x0, status = 0, flags = 0}, fd = 10}```
13:41 tiredchiku[d]: 4096 bytes is a bit :doomthink:
13:42 avhe[d]: it's probably coming from the flags=0 then?
13:42 tiredchiku[d]: hmm
13:42 avhe[d]: i always have DRF_DEF(OS33, _FLAGS, _CACHING_TYPE, _DEFAULT) | DRF_DEF(OS33, _FLAGS, _MAPPING, _DIRECT)
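[editor's note: avhe's flag suggestion spelled out as a sketch; DRF_DEF() and the NVOS33_FLAGS_* fields come from the OGKM SDK headers (nvmisc.h / nvos.h), include paths assumed]
```c
#include "nvmisc.h"  /* DRF_DEF(); path assumed */
#include "nvos.h"    /* NVOS33_FLAGS_*; path assumed */

/* Instead of flags = 0, request a direct mapping with default caching. */
NvU32 flags = DRF_DEF(OS33, _FLAGS, _CACHING_TYPE, _DEFAULT) |
              DRF_DEF(OS33, _FLAGS, _MAPPING, _DIRECT);
```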
13:42 tiredchiku[d]: oh yeah, nvRmIoctl() is just
13:42 tiredchiku[d]: ```c
13:42 tiredchiku[d]: static int nvRmIoctl(int fd, NvU32 cmd, void *pParams, NvU32 paramsSize)
13:42 tiredchiku[d]: {
13:42 tiredchiku[d]: int res;
13:42 tiredchiku[d]: do {
13:42 tiredchiku[d]: res = ioctl(fd, _IOC(IOC_INOUT, NV_IOCTL_MAGIC, cmd, paramsSize), pParams);
13:42 tiredchiku[d]: if (res < 0) {
13:42 tiredchiku[d]: res = errno;
13:42 tiredchiku[d]: }
13:42 tiredchiku[d]: } while ((res == EINTR || res == EAGAIN));
13:42 tiredchiku[d]: return res;
13:42 tiredchiku[d]: }```
13:44 avhe[d]: tiredchiku[d]: that's just the size of a page, it sounds correct
13:44 tiredchiku[d]: let me try that
13:44 tiredchiku[d]: the flags
14:00 tiredchiku[d]: hm nope, apparently that wasn't it
14:03 tiredchiku[d]: well, I'll get back to it after dinner
14:26 gfxstrand[d]: tiredchiku[d]: Yes, please
14:30 gfxstrand[d]: djdeath3483[d]: Yup. The GL/Vulkan varying rules are that arrays always have a stride of 16B. Is that the right thing for mesh? Probably not but nir_lower_io wasn't really built for mesh. Caio is probably the better one to ask for details.
14:34 esdrastarsis[d]: New screenshots: https://discuss.haiku-os.org/t/haiku-nvidia-porting-nvidia-gpu-driver/16520/20
14:35 tiredchiku[d]: gfxstrand[d]: did already :)
14:38 tiredchiku[d]: by my count I had to pull in 46 headers
14:38 tiredchiku[d]: :Sweat:
14:41 tiredchiku[d]: I need to figure out why I'm running into this
14:41 tiredchiku[d]: ```
14:41 tiredchiku[d]: [22183.549541] NVRM: RmIoctl: unknown NVRM ioctl command: 0x11
14:41 tiredchiku[d]: [22183.574600] NVRM: VM: invalid mmap
14:41 tiredchiku[d]: [22183.576086] NVRM: VM: invalid mmap```
14:41 tiredchiku[d]: strace time
14:45 avhe[d]: tiredchiku[d]: looking again i'm pretty sure _IOC(IOC_INOUT, ...) is incorrect
14:45 avhe[d]: IOC_INOUT is already shifted into place
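[editor's note: a small demonstration of the direction-bit bug avhe describes, assuming the BSD-style IOC_INOUT value; the magic 'F' and the cmd/size values are illustrative stand-ins]
```c
#include <stdio.h>
#include <linux/ioctl.h>

/* BSD-style direction flags are already shifted into the top two bits. */
#ifndef IOC_INOUT
#define IOC_INOUT 0xC0000000UL  /* (IOC_IN | IOC_OUT) */
#endif

int main(void)
{
    unsigned cmd = 0x4e, size = 40;  /* illustrative values only */
    /* wrong: IOC_INOUT gets shifted a second time by _IOC_DIRSHIFT */
    unsigned long bad  = _IOC(IOC_INOUT, 'F', cmd, size);
    /* right: _IOC_READ/_IOC_WRITE are the unshifted flags _IOC() expects */
    unsigned long good = _IOC(_IOC_READ | _IOC_WRITE, 'F', cmd, size);
    printf("bad=%#lx good=%#lx\n", bad, good);
    return 0;
}
```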
14:46 tiredchiku[d]: hm
14:47 tiredchiku[d]: :o
14:55 tiredchiku[d]: changed it to `_IOC_READ | _IOC_WRITE` as seen here: https://github.com/elFarto/nvidia-vaapi-driver/blob/c519e97ef7af581c109f49b6973269fb16d1bc54/src/direct/nv-driver.c#L57
14:55 tiredchiku[d]: still the same :why:
15:06 tiredchiku[d]: this is very weird
15:06 tiredchiku[d]: because I'm doing another ioctl elsewhere (though now with the function) and that works fine
15:11 tiredchiku[d]: :BlobhajShock:
15:11 tiredchiku[d]: I think I figured it out
15:12 tiredchiku[d]: I'm giving _IOC the params size as the last arg
15:12 tiredchiku[d]: or rather, IOWR now
15:12 tiredchiku[d]: but, in the header..
15:12 tiredchiku[d]: ```c
15:12 tiredchiku[d]: #define _IOWR(type,nr,argtype) _IOC(_IOC_READ|_IOC_WRITE,(type),(nr),(_IOC_TYPECHECK(argtype)))
15:12 tiredchiku[d]: #ifndef __KERNEL__
15:12 tiredchiku[d]: #define _IOC_TYPECHECK(t) (sizeof(t))
15:12 tiredchiku[d]: #endif```
15:12 tiredchiku[d]: tiredchiku[d]: which also corroborates this
15:13 tiredchiku[d]: let's see
15:14 tiredchiku[d]: no wait that's not it
15:14 tiredchiku[d]: because
15:14 tiredchiku[d]: ```c
15:14 tiredchiku[d]: #define _IOC(dir,type,nr,size) \
15:14 tiredchiku[d]: (((dir) << _IOC_DIRSHIFT) | \
15:14 tiredchiku[d]: ((type) << _IOC_TYPESHIFT) | \
15:14 tiredchiku[d]: ((nr) << _IOC_NRSHIFT) | \
15:14 tiredchiku[d]: ((size) << _IOC_SIZESHIFT))```
15:15 tiredchiku[d]: ..I'm looking in the wrong place, my mmap is failing, not the ioctl
15:22 tiredchiku[d]: :doomthink:
15:22 tiredchiku[d]: ```
15:22 tiredchiku[d]: (gdb) print p.params.pLinearAddress
15:22 tiredchiku[d]: $1 = (NvP64) 0x0```
15:23 tiredchiku[d]: ioctl isn't populating that with a valid mapped address?
15:25 gfxstrand[d]: tiredchiku[d]: I'd recommend using the #defines in drm.h.
15:26 gfxstrand[d]: You shouldn't be redefining _IOC
15:26 tiredchiku[d]: oh, I'm not redefining it
15:26 gfxstrand[d]: Okay, good
15:26 tiredchiku[d]: I went looking in the headers to see what's what
15:26 marysaka[d]: I wonder if you couldn't use envyhooks to trace your RM calls and compare with the blobs :aki_thonk:
15:27 tiredchiku[d]: :ha:
15:28 tiredchiku[d]: didn't even cross my mind
15:28 tiredchiku[d]: mind you, I'm just trying to run vulkaninfo/vkcube
15:31 gfxstrand[d]: That's not a bad idea
15:34 tiredchiku[d]: tiredchiku[d]: yup, I was right
15:36 tiredchiku[d]: actually
15:36 tiredchiku[d]: a lot more appears to be borky on my end
15:36 tiredchiku[d]: hClient, hDevice, hMemory are also null
15:36 tiredchiku[d]: :doomthink:
15:37 tiredchiku[d]: OH
15:37 tiredchiku[d]: :vanpalm:
15:38 tiredchiku[d]: I'm only opening the fd's with O_RDWR
15:38 tiredchiku[d]: and not O_CLOEXEC
15:41 gfxstrand[d]: Yeah, you want O_CLOEXEC, not that it'll make a difference for anything you're debugging at the moment.
15:42 tiredchiku[d]: tiredchiku[d]: there we go, none of this is null now
15:52 tiredchiku[d]: marysaka[d]: any idea what `UNKNOWN CMDID: 20801352` would be referring to?
15:55 marysaka[d]: not sure but it seems to not really matter
15:55 tiredchiku[d]: found it
15:55 marysaka[d]: It seems to pop at regular intervals etc
15:55 tiredchiku[d]: `#define NV2080_CTRL_CMD_FB_GET_SEMAPHORE_SURFACE_LAYOUT (0x20801352U)`
15:55 marysaka[d]: oh huh
15:56 marysaka[d]: I might have misread this then
15:57 marysaka[d]: but yeah you can modify envyhooks to log that if you want tiredchiku[d] around here https://gitlab.freedesktop.org/nouveau/envyhooks/-/blob/main/src/nvrm/utils.rs?ref_type=heads#L114
15:58 tiredchiku[d]: :salute:
16:07 avhe[d]: i'd also recommend checking ogkm logs
16:07 avhe[d]: use this to set the log matcher to ":" <https://gist.github.com/mtijanic/32b077542b7e56dc603074115fcaea75>
16:08 avhe[d]: then you run your application along with dmesg -W
16:08 avhe[d]: that will usually point you in the right direction
16:10 tiredchiku[d]: :salute:
16:10 avhe[d]: for tracing there is also this tool: <https://github.com/mtijanic/nvtrace>, which is pretty simple to set up
16:11 tiredchiku[d]: yeah, do have that, but struggled to parse the output a bit
16:11 avhe[d]: yeah i haven't used it myself, i just know it's out there and pretty powerful
16:12 tiredchiku[d]: avhe[d]: oh boy that's loud
16:12 tiredchiku[d]: five billion lines per second :nyoom:
16:13 avhe[d]: yeah don't use it on official software (or reduce the verbosity), it does way too many ioctls
16:33 tiredchiku[d]: !
16:33 tiredchiku[d]: the mmap fails if it goes through /dev/nvidiaN
16:33 tiredchiku[d]: but not through /dev/nvidiactl
16:36 tiredchiku[d]: stinky
16:39 tiredchiku[d]: but at least now it fails with `NV_ERR_INVALID_ARGUMENT`
16:46 djdeath3483[d]: gfxstrand[d]: Since per-vertex has to match with legacy pipeline, we'll have to go with that, but per-primitive ought to be different
16:46 djdeath3483[d]: shouldn't be hard to add a special case for it
16:47 djdeath3483[d]: thanks for confirming
17:34 tiredchiku[d]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/c5e439fea4fe81c78d52b95419c30cabe44e48fd/src/nvidia/arch/nvalloc/unix/src/escape.c#L509-L513
17:34 tiredchiku[d]: :wahoo:
17:34 tiredchiku[d]: found the source of my issue
17:39 tiredchiku[d]: weird that it only happens in one function though
17:42 tiredchiku[d]: and not all of them
17:49 tiredchiku[d]: oh
17:49 tiredchiku[d]: so both the ones that were going to /dev/nvidia0 return the same status
17:50 tiredchiku[d]: just that one of them does error handling, the other doesn't
18:05 djdeath3483[d]: gfxstrand[d]: Also a bit confused. Because for memory that would make sense since lots of clients can access it (various GPU stages, CPU)
18:06 djdeath3483[d]: But when it's an internal interface between shader stages, as long as you can manage things matching, who would know that you didn't respect the rules? 😉
18:09 gfxstrand[d]: Arguably that's the wrong pass for internal interfaces
18:16 djdeath3483[d]: it's full of logic for those though 🙂
19:07 tiredchiku[d]: notthatclippy[d]: question for you: I seem to be running into https://github.com/NVIDIA/open-gpu-kernel-modules/blob/c5e439fea4fe81c78d52b95419c30cabe44e48fd/kernel-open/nvidia/nv-mmap.c#L519-L527 when I try to do NV_ESC_RM_MAP_MEMORY.. any idea why that may be?
19:07 tiredchiku[d]: have been stuck on that for quite a while now
19:08 tiredchiku[d]: [Mon Mar 24 00:38:30 2025] NVRM: VM: invalid mmap
19:08 tiredchiku[d]: [Mon Mar 24 00:38:30 2025] NVRM: VM: invalid mmap
19:08 tiredchiku[d]: [Mon Mar 24 00:38:30 2025] vulkaninfo[1060682]: segfault at 5869f0000000 ip 00007338f32bf196 sp 00007fff25733bf0 error 4 in libc.so.6[b3196,7338f3230000+16f000] likely on CPU 4 (core 4, socket 0)
19:08 tiredchiku[d]: [Mon Mar 24 00:38:30 2025] Code: 79 13 00 a8 04 74 26 48 8b 0d 36 70 13 00 48 8d 14 8d 00 00 00 00 48 f7 da 48 85 c9 48 c7 c1 00 00 00 fc 48 0f 44 d1 48 21 ea <4c> 8b 3a 49 89 c0 49 83 e0 f8 4c 89 c2 48 f7 da 48 39 ea 0f 82 e1
19:08 tiredchiku[d]: if that helps
19:15 tiredchiku[d]: smol victories though
19:15 tiredchiku[d]: https://github.com/necrashter/minimal-vulkan-compute-shader
19:16 tiredchiku[d]: ```
19:16 tiredchiku[d]: [sidpr@makron bin]$ VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nouveau_icd.x86_64.json NVK_ON_NVRM=1 ./main
19:16 tiredchiku[d]: Device Name : NVIDIA GeForce RTX 3070 (NVK GA104)
19:16 tiredchiku[d]: Vulkan Version : 1.4.309
19:16 tiredchiku[d]: Compute Queue Family Index: 0
19:16 tiredchiku[d]: Segmentation fault (core dumped)```
19:21 tiredchiku[d]: I feel like once I figure out why I'm failing to map memory, it'll be a bit more helpful
19:24 airlied[d]: @averne you might want to take a look at https://github.com/airlied/gsp-parse
19:25 airlied[d]: It might not cover all the headers you need but it could be used to create a single header from ogkm
19:26 airlied[d]: I used it to produce a single header for nouveau as a demo and for nova it can produce a rust file
19:46 tiredchiku[d]: ..wait a minute
19:46 tiredchiku[d]: gfxstrand[d]: how bad an idea is it to share an execution context between two fds
19:48 gfxstrand[d]: Not sure what you mean
19:48 tiredchiku[d]: just a sec
19:49 tiredchiku[d]: if you look at the context creation code here: https://pastebin.com/P7VK1AnY
19:49 tiredchiku[d]: it creates the context for `nvkmd_nvrm_dev_api_ctl(dev, &rm);`
19:49 tiredchiku[d]: but I also need to create one for `nvkmd_nvrm_dev_api_dev(dev, &devRm);`
19:50 tiredchiku[d]: the first corresponds to /dev/nvidiactl, the latter to /dev/nvidiaX
19:50 gfxstrand[d]: That's fine
19:50 tiredchiku[d]: so I can reasonably share with both?
19:50 gfxstrand[d]: Sure
19:50 tiredchiku[d]: (this is assuming no one's running multiple nvidia cards on a system :P)
19:51 gfxstrand[d]: If you need to have two descriptors open, then do it.
19:51 tiredchiku[d]: I do need both, because some ioctls go only to either, not both
19:51 gfxstrand[d]: Yeah, we'll have to figure that out eventually. But for now just stick both in the dev.
19:51 tiredchiku[d]: nvidiactl doesn't seem to handle local memory, for example
19:51 tiredchiku[d]: that's done by nvidiaX
19:52 gfxstrand[d]: That makes some sense in a very non-DRM way
19:52 avhe[d]: airlied[d]: nice, though i'm working in c++ so in my case it's pretty easy to just submodule OGKM and include what i need
19:53 gfxstrand[d]: The only real issue is going to be if there's some reason we can't have nvidiactl open twice. Then we'd have to deduplicate somehow. But for now, just open both on the dev and call it a day.
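[editor's note: a sketch of "open both on the dev"; devFd/hClient/devName appear in the pasted accessor earlier in the log, while the struct name, ctlFd field, and buffer size here are assumptions]
```c
#include "nvtypes.h"  /* NvU32; path assumed */

/* One RM client shared by both nodes: /dev/nvidiactl for client alloc,
 * RM control and GART maps; /dev/nvidiaX for device-local (vidmem) maps. */
struct nvkmd_nvrm_dev_fds {
    int   ctlFd;        /* /dev/nvidiactl */
    int   devFd;        /* /dev/nvidiaX   */
    NvU32 hClient;      /* same client handle used through either fd */
    char  devName[16];  /* e.g. "/dev/nvidia0" */
};
```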
19:54 tiredchiku[d]: got you
19:55 tiredchiku[d]: I dunno if nv takes ioctls over the drm fd (which I think is /dev/dri/cardX)?
19:56 avhe[d]: tiredchiku[d]: if you are mapping from /dev/nvidia0, you might need to use NV_ESC_REGISTER_FD
19:56 gfxstrand[d]: That's probably just for KMS and maybe some prime FD stuff
19:56 avhe[d]: not sure about this one but worth a try
19:56 tiredchiku[d]: doing that already, yeah
19:56 avhe[d]: ah ¯\_(ツ)_/¯
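[editor's note: for reference, the NV_ESC_REGISTER_FD flow avhe mentions, roughly as nvidia-vaapi-driver does it; nvRmIoctl() is the wrapper pasted earlier in the log, open_and_register is a hypothetical helper name, header path assumed, error handling minimal]
```c
#include <fcntl.h>
#include <unistd.h>
#include "nv-ioctl.h"  /* nv_ioctl_register_fd_t, NV_ESC_REGISTER_FD; path assumed */

/* Associate a freshly opened per-device fd with the RM control fd before
 * using it for NV_ESC_RM_MAP_MEMORY + mmap. */
static int open_and_register(const char *node, int ctlFd)
{
    int memFd = open(node, O_RDWR | O_CLOEXEC);
    if (memFd < 0)
        return -1;

    nv_ioctl_register_fd_t params = { .ctl_fd = ctlFd };
    if (nvRmIoctl(memFd, NV_ESC_REGISTER_FD, &params, sizeof(params)) != 0) {
        close(memFd);
        return -1;
    }
    return memFd;
}
```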
19:57 airlied[d]: Yes the DRM fd is used for KMS only
19:59 tiredchiku[d]: gotcha
19:59 tiredchiku[d]: thanks
20:01 tiredchiku[d]: gonna take myself to bed cuz it's 0131 and I've been working on this about since I woke up (with breaks of course)
20:01 tiredchiku[d]: the autistic hyperfixation is real
20:27 eric_engestrom[d]: gfxstrand[d]: fdo-wide yeah, everything is working (although not everything is migrated, some of the old infra is still in use), but mesa needs some changes for s3 which we use throughout the ci, and I've fixed everything I could and right now I'm blocked on the lava & rpi proxy returning 403s, so... no mesa ci yet, which means please don't merge anything 😅
21:19 gfxstrand[d]: I tried a couple yesterday and they died on ci-fairy and S3 stuff.
21:35 pavlo_kozlenko[d]: https://nouveau.freedesktop.org/
21:35 pavlo_kozlenko[d]: 2025-03-22
21:44 snowycoder[d]: Hello, I need help encoding the `OpSync` instruction in Kepler.
21:44 snowycoder[d]: I've seen that `OpSync` is only encoded in sm50 while `OpBSync` is only encoded in sm70, though there does not seem to be any pass that rewrites them (how?).
21:44 snowycoder[d]: I tried to encode `OpSync` directly but nvdisasm doesn't seem to know about any `sync` op, just `bar.sync`.
21:44 snowycoder[d]: Old codegen has no `OP_SYNC` but has `OP_JOIN` that emits a strange `nop.s`
21:44 snowycoder[d]: I tried to use `__syncwarp()` after a divergent-if in CUDA but it is ultimately encoded as a `nop` (without `.s`) and reordered.
21:45 snowycoder[d]: So... what do I do now? I'm kinda lost.
21:48 mhenning[d]: snowycoder[d]: OpSync corresponds to OP_JOIN in the old compiler
21:49 gfxstrand[d]: Yes, SM70 uses a totally different reconvergence strategy.
21:49 mhenning[d]: Yeah, control flow works very differently on volta+ vs before that
21:50 gfxstrand[d]: We have a NIR pass that gets rid of structured control flow entirely, which is why you never get OpSync there.
21:50 gfxstrand[d]: That only gets called on Volta+
21:50 mhenning[d]: gfxstrand[d]: specifically, it's called nak_nir_lower_cf
21:52 snowycoder[d]: Thanks, I'll use `nop.s` and read more about nak_nir_lower_cf.
21:58 mhenning[d]: There's also some information about the pre-volta control flow model (which originated on the g80) here: https://envytools.readthedocs.io/en/latest/hw/graph/tesla/cuda/control.html
22:05 mhenning[d]: oh, also nvidia's volta whitepaper has a description of why they changed the control flow model, which could be interesting background information
23:30 matt_schwartz[d]: hi all. i got a blackwell card yesterday and decided to try nouveau + nvk on it, but I seem to be hitting a kernel issue with skeggsb9778[d]'s 570 gsp kernel branch + firmware where no physical vulkan device ever gets created for my 5090 so I'm stuck with only llvmpipe. my dmesg shows a couple kernel warnings for
23:30 matt_schwartz[d]: WARNING: CPU: 1 PID: 228 at drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/fifo.c:361 r535_engn_nonstall+0x2d/0x40 [nouveau]
23:30 matt_schwartz[d]: and then a lot of gsp failures. i also never see anything like `gsp: RM version 570.124.04` in my dmesg so it seems like maybe it's not loading the firmware? I grabbed a full readout with `drm.debug=0xe` here: https://gist.github.com/matte-schwartz/56d3188fafd5b21f1b2ea2424c7226ce.
23:30 matt_schwartz[d]: I thought it was my meme of an ultrawide display (7680x2160p), but I tried a different display that's 3440x1440p and that failed in the same way. there's no change when forcing a lower resolution like `video=1920x1080@60` either. on my 4090, my main 7680x2160p display connects at 3840x1080p by default without issues using nouveau.
23:39 gfxstrand[d]: You need a patch for it to print the firmware version. It doesn't do that by default.
23:42 matt_schwartz[d]: ah okay
23:44 snowycoder[d]: Bad news: Kepler does not seem to have shared-memory atomics; they compile to loops over locked memory accesses
23:45 gfxstrand[d]: Does it have compare exchange?
23:47 gfxstrand[d]: If so, we can just turn on the lowering.
23:50 mhenning[d]: gfxstrand[d]: I don't think so
23:51 gfxstrand[d]: Oh, well that would be sad.
23:51 snowycoder[d]: Does not seem so, it either has "atom" that is only used for global memory or "LDSLK/STSCUL" that locks shared memory
23:51 gfxstrand[d]: Oh, so there's a lock. That's annoying but workable, I suppose.
23:52 snowycoder[d]: Yes but we should add locking to every Ld/St in ir even for newer archs
23:52 gfxstrand[d]: We'll need to add an `_nv` NIR intrinsic or two and some lowering.
23:53 gfxstrand[d]: snowycoder[d]: Not sure what you mean by that
23:53 snowycoder[d]: It adds a flag that would only be used for kepler :/
23:54 gfxstrand[d]: We could also add new ops
23:55 gfxstrand[d]: Especially if the flag massively changes the behavior in some way.
23:56 snowycoder[d]: Mmmh, it kinda does, it returns a predicate
23:56 gfxstrand[d]: Like, do the locking things still take an address? Do they take data? What are the semantics?
23:58 mhenning[d]: It's worth noting that codegen tends to group a lot of things under subOps of the same Op, in places where NAK prefers to break them out into distinct Ops and doesn't have a concept of subOps
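[editor's note: an illustrative model of the lock-based lowering being discussed; ldslk()/stscul() are mock stand-ins for Kepler's LDSLK/STSCUL (names from the log), emulated here with a C11 flag so the retry-loop shape can actually run on a CPU — this is not what the hardware or NAK literally does]
```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Models the lock that LDSLK/STSCUL operate on. */
static atomic_flag lock_bit = ATOMIC_FLAG_INIT;

/* "Load shared and lock": the predicate says whether the lock was taken. */
static bool ldslk(uint32_t *addr, uint32_t *val)
{
    if (atomic_flag_test_and_set(&lock_bit))
        return false;  /* lock held elsewhere: predicate false, caller retries */
    *val = *addr;
    return true;
}

/* "Store shared and unlock": only valid after a successful ldslk(). */
static void stscul(uint32_t *addr, uint32_t val)
{
    *addr = val;
    atomic_flag_clear(&lock_bit);
}

/* An atomic add lowered to the lock/retry loop a compiler would emit. */
static uint32_t shared_atomic_add(uint32_t *addr, uint32_t data)
{
    uint32_t old;
    do { } while (!ldslk(addr, &old));
    stscul(addr, old + data);
    return old;
}

int main(void)
{
    uint32_t shared = 5;
    uint32_t old = shared_atomic_add(&shared, 3);
    printf("old=%u new=%u\n", old, shared);  /* old=5 new=8 */
    return 0;
}
```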