00:58 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354620490467115149/message.txt?ex=67e5f424&is=67e4a2a4&hm=e81006d5af4ede8039623c677ec01c56256a9f9db978bc53bfa5848b39152d49&
00:58 mohamexiety[d]: so this is a sample of dmesg with `vkcube`
01:00 mohamexiety[d]: and this is a sample from the smoke triangle test in the CTS:
01:00 mohamexiety[d]: [ 276.885679] client: deqp-vk[4346] | internal: 1 | bo_size: 73728 | ogl page_size: 4096
01:00 mohamexiety[d]: [ 276.887589] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.919159] client: deqp-vk[4346] | internal: 1 | bo_size: 73728 | ogl page_size: 4096
01:00 mohamexiety[d]: [ 276.920888] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.951605] client: deqp-vk[4346] | internal: 1 | bo_size: 73728 | ogl page_size: 4096
01:00 mohamexiety[d]: [ 276.953451] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.971443] client: deqp-vk[4346] | internal: 1 | bo_size: 73728 | ogl page_size: 4096
01:00 mohamexiety[d]: [ 276.973219] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.973503] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.973576] client: deqp-vk[4346] | internal: 0 | bo_size: 32768 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.973610] client: deqp-vk[4346] | internal: 0 | bo_size: 131072 | vm page_size: 65536
01:00 mohamexiety[d]: [ 276.973908] client: deqp-vk[4346] | internal: 0 | bo_size: 65536 | vm page_size: 65536
01:00 mohamexiety[d]: [ 276.974570] client: deqp-vk[4346] | internal: 0 | bo_size: 131072 | vm page_size: 65536
01:00 mohamexiety[d]: [ 276.975805] client: deqp-vk[4346] | internal: 0 | bo_size: 262144 | vm page_size: 65536
01:00 mohamexiety[d]: [ 276.978070] client: deqp-vk[4346] | internal: 0 | bo_size: 524288 | vm page_size: 65536
01:00 mohamexiety[d]: [ 276.982089] client: deqp-vk[4346] | internal: 1 | bo_size: 73728 | ogl page_size: 4096
01:00 mohamexiety[d]: [ 276.983806] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.986541] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.986554] client: deqp-vk[4346] | internal: 0 | bo_size: 65536 | vm page_size: 65536
01:00 mohamexiety[d]: [ 276.986648] client: deqp-vk[4346] | internal: 0 | bo_size: 16384 | vm page_size: 4096
01:00 mohamexiety[d]: [ 276.994197] client: systemd-logind[1274] | internal: 0 | bo_size: 1376256 | ogl page_size: 4096
01:00 mohamexiety[d]: [ 276.994845] client: systemd-logind[1274] | internal: 0 | bo_size: 4096 | ogl page_size: 4096
01:00 mohamexiety[d]: [ 277.117675] client: deqp-vk[4346] | internal: 0 | bo_size: 65536 | vm page_size: 65536
01:00 mohamexiety[d]: [ 277.117739] client: deqp-vk[4346] | internal: 0 | bo_size: 262144 | vm page_size: 65536
01:00 mohamexiety[d]: [ 277.117763] client: deqp-vk[4346] | internal: 0 | bo_size: 262144 | vm page_size: 65536
01:00 mohamexiety[d]: [ 277.119368] client: deqp-vk[4346] | internal: 0 | bo_size: 65536 | vm page_size: 65536
01:00 mohamexiety[d]: [ 277.119438] client: deqp-vk[4346] | internal: 0 | bo_size: 65536 | vm page_size: 65536
01:00 mohamexiety[d]: [ 277.119495] client: deqp-vk[4346] | internal: 0 | bo_size: 4096 | vm page_size: 4096
01:07 mohamexiety[d]: airlied[d]: skeggsb9778[d] so looking at this, it seems things actually mostly work fine. Vulkan seems OK and does use larger page sizes where applicable; the previous output was confusing due to noise from the DE and such. However, ogl/the non-VM path doesn't seem very keen on using larger page sizes
01:08 mohamexiety[d]: I am going to assume that removing the GART guard on the non vm path would fix this, but the issue is when I do that we get a record number of MMU faults per second and the GPU really, really, really doesn't like it
01:13 mohamexiety[d]: the question is where to proceed from here :thonk:
01:17 airlied[d]: ignore the non VM paths for now
01:17 mohamexiety[d]: nice
01:17 airlied[d]: get vulkan to work
01:17 mohamexiety[d]: it looks like it works. but short of games, not sure what else to verify
01:17 mohamexiety[d]: full CTS time?
01:19 mohamexiety[d]: I am kind of impressed honestly because I was dead certain the way the uvmm code does page sizing is.. not correct but things are pretty stable all things considered
01:19 mohamexiety[d]: I could also wire up compression but that's another variable that could mess things up
01:19 mohamexiety[d]: also cc gfxstrand[d] ^
01:20 airlied[d]: yeah I'd start seeing if compression can be made to work, or whether you can find anything that gets faster
01:25 mohamexiety[d]: yeah will do that tomorrow then
01:26 airlied[d]: karolherbst[d]: any idea what hfma2.mma is, the docs seem a bit sparse
01:30 redsheep[d]: And the sparsity makes it 2x faster 🥁
01:30 karolherbst[d]: no idea
01:30 karolherbst[d]: is nvidia using it?
01:32 karolherbst[d]: did you check the PTX docs?
01:37 airlied[d]: don't see it, but might be a way to hfma2 on the mma pipe
01:43 mohamexiety[d]: https://gitlab.freedesktop.org/mohamexiety/mesa/-/commits/nvk-variable-pages
01:43 mohamexiety[d]: https://gitlab.freedesktop.org/mohamexiety/nouveau/-/commits/vm-bind-experiments-5
01:43 mohamexiety[d]: gfxstrand[d] uploaded my stuff here btw if you want to take a look/test
01:44 mohamexiety[d]: going to test and/or try to add compression on top tomorrow
01:44 mohamexiety[d]: (kernel side is rebased on top of ben's tree, so needs the r570 firmware)
02:16 karolherbst[d]: airlied[d]: mhhhh
02:17 gfxstrand[d]: Okay, so I kinda know what's going wrong with my tegra. Or at least where it's going wrong. It ioremaps the device and then goes to read from the mapped range and that read hangs.
02:17 gfxstrand[d]: Why would that hang? I don't know
02:17 gfxstrand[d]: Something not configured right with the device or memories?
02:17 gfxstrand[d]: Maybe it's at the wrong address somehow?
02:18 gfxstrand[d]: It's currently assuming the GPU is at 0x57000000
02:19 karolherbst[d]: I don't really see what benefit that gives you to run it on the mma pipe
02:20 karolherbst[d]: it does seem fixed latency though, so maybe it's just some special mma version that's fast
02:36 karolherbst[d]: mhenning[d]: sure, but the mma uni has a massive latency penalty
02:36 karolherbst[d]: *unit
02:37 karolherbst[d]: it's like 5 cycles
02:37 karolherbst[d]: sometimes 6 or 7
02:39 karolherbst[d]: also..
02:39 karolherbst[d]: `HFMA2.MMA` has different flags compared to `HFMA2`
02:41 airlied[d]: gfxstrand[d]: most likely the gpu is clocked off
02:47 karolherbst[d]: well.. if the computation allows it, I'm sure you can balance a bit off the mma unit and get some gains, but not sure how widely applicable that would be
02:50 gfxstrand[d]: airlied[d]: Yeah, that's what I suspect. But why?
02:51 gfxstrand[d]: I'm staring at DTBs but I have no idea what I'm looking for
02:53 airlied[d]: it's gonna be a clock or a regulator 🙂
02:54 airlied[d]: but there ends most of my arm power knowledge
02:54 airlied[d]: karolherbst[d]: I think on some high end GPUs the mma unit might be higher throughput
02:55 airlied[d]: not sure
02:55 karolherbst[d]: could be
02:55 karolherbst[d]: still you pay the latency
02:55 karolherbst[d]: but yeah, I'm sure there are cases where it's beneficial
03:07 x512[m]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30033#note_2836894
03:07 x512[m]: Tegra needs yet another KMD backend?
03:08 gfxstrand[d]: Ideally, no. I don't see a whole lot of advantage to programming for host1x vs. nouveau.
03:09 gfxstrand[d]: At least with nvrm, it's a potential path to Windows and to NVK+CUDA setups.
03:10 gfxstrand[d]: Given that we can barely get anyone to hack on Tegra to begin with, I don't see what advantage there is to having another poorly maintained back-end.
03:11 tiredchiku[d]: wonder if NVK+CUDA means DLSS
03:11 tiredchiku[d]: :wolfFIRE:
03:13 gfxstrand[d]: Nah
03:13 gfxstrand[d]: For DLSS, we need to implement the direct shader dispatch thing in NVK. That's pretty separate.
03:13 gfxstrand[d]: I don't think the CUDA runtime is actually invoked for DLSS
03:13 tiredchiku[d]: it's not, yesh
03:16 gfxstrand[d]: gfxstrand[d]: My board is clearly cursed.
03:16 gfxstrand[d]: And I'm going to bed
03:16 gfxstrand[d]: I wonder if marysaka[d] and I have different revisions or something.
03:17 gfxstrand[d]: It works with the Tegra Linux distro so <a:shrug_anim:1096500513106841673>
03:20 gfxstrand[d]: I would say maybe there's some bios/firmware thing I can update but I would expect the nvidia flashing tool to do that and we're using the same version of everything AFAICT
03:22 tiredchiku[d]: I think I know why my code is taking the GPU down
03:24 airlied[d]: the fact that you can take the whole GPU down from userspace isn't ideal 🙂
03:26 tiredchiku[d]: indeed
03:28 tiredchiku[d]: anyway, I'll have to look into this in my evening
03:28 tiredchiku[d]: have to go take a test today
03:30 x512[m]: Really? Nvidia GPUs are quite good at catching illegal usage, and they stop only the misbehaving client, not the whole GPU.
03:30 x512[m]: AMD seems worse at that.
03:33 esdrastarsis[d]: tiredchiku[d]: good luck
03:34 karolherbst[d]: x512[m]: recovery isn't perfect
03:35 gfxstrand[d]: I figured out how to take the x server down when it's running the NVIDIA driver today. That was fun.
03:35 gfxstrand[d]: The rest of the system survived but boom, no session for you.
03:36 karolherbst[d]: heh
03:36 karolherbst[d]: I got the driver into an irrecoverable state a couple of times
03:36 gfxstrand[d]: Sandy Bridge has the best hang recovery.
03:37 x512[m]: Intel GPU?
03:37 karolherbst[d]: But I agree that on AMD it's a disaster
03:38 gfxstrand[d]: Yup
03:38 tiredchiku[d]: esdrastarsis[d]: thank you
03:38 x512[m]: Do recent Intel GPUs have preemption and userland submission like Nvidia?
03:39 gfxstrand[d]: Sandy Bridge has a hardware issue where semaphores are kinda busted and the GPU hangs every 30s or so continuously. Users never notice.
03:39 karolherbst[d]: on my desktop I need to disable the desktop before doing testing on AMD. No idea what it is, but a GPU reset enables automatic suspend...
03:40 gfxstrand[d]: Preemption: yes. Userland submit: in theory. But Sandy Bridge is ancient and predates all that stuff.
03:40 gfxstrand[d]: AMD has the worst hang recovery.
03:41 gfxstrand[d]: Some of it is hardware issues, I think, but I suspect software is also partially to blame.
03:41 karolherbst[d]: It's probably some gnome bug not coping with GPU resets properly but AMD also hides the fact, so Userspace is left with broken VRAM
03:41 gfxstrand[d]: My WDDM2 branch will reboot Windows if you start the CTS and walk away for a coffee.
03:42 karolherbst[d]: heh
03:42 tiredchiku[d]: trying to get you to boot back to linux instead
03:42 x512[m]: On Haiku I can load and unload the NVRM kernel module multiple times without a reboot while keeping the framebuffer functional, which is convenient.
03:42 karolherbst[d]: probably the best thing to do though
03:43 karolherbst[d]: like.. if VRAM content is something random, you can't ensure that you won't leak confidential data to clients
03:43 karolherbst[d]: so a reboot is probably the only sane choice left
03:44 x512[m]: I mean the on-screen GUI is still running when the NVRM driver is unloaded.
03:46 karolherbst[d]: well, sure
03:46 karolherbst[d]: no need for a driver when it's just raw host framebuffer ops
05:37 airlied[d]: marysaka[d]: started to work out that weird coop mat cross fails, seems like the a/b signed flags aren't directly mapped or there is a missing bit
08:05 marysaka[d]: gfxstrand[d]: ngl I wouldn't mind swapping one of my boards with yours because I'm very curious
08:05 marysaka[d]: It could be a different MC configuration or something to do with fuses but can't be so sure
08:07 marysaka[d]: airlied[d]: I did try to search for other signed bits without much success but might have missed something.... the blob doesn't seem to care about setting it on those tests from what I gathered
08:17 airlied[d]: I'm trying to work out some sort of truth table for the 3 input bits to the 2 bits and see if the tests pass, but it's quite whack-a-mole
08:21 airlied[d]: marysaka[d]: did you trace all those matmul_cross tests? they are the only ones where sign of A != sign of B
08:22 marysaka[d]: I did that manually by grabbing the shader in test result and using https://gitlab.freedesktop.org/nouveau/nv-shader-tools
08:23 marysaka[d]: (well my own variant of the tool that goes around the network etc)
08:25 airlied[d]: like I can get down to like 4-8 tests failing, but it's hard to find the last piece to close the gap; I suppose I should go trace the shaders to make sure there aren't any workarounds
08:28 marysaka[d]: airlied[d]: That's without exposing "signed on some inputs and not others" right?
08:28 marysaka[d]: If my memory serves me right, if you don't set the signed flag when only one input is signed, all the tests pass (and the NVIDIA blobs were doing that from what I could tell)
08:30 airlied[d]: the tests just check that A and B don't have the same sign (that is what the "cross" means); then C ends up being the same as A for the khr_a tests and as B for the khr_b tests, I think
08:31 marysaka[d]: right... I forgot 🙃
09:09 asdqueerfromeu[d]: esdrastarsis[d]: It even pushed the limits of NVK in its early days: https://old.reddit.com/r/linux_gaming/comments/12pkh5k/a_friend_of_mine_just_posted_this_on_his_discord :triangle_nvk:
09:10 asdqueerfromeu[d]: tiredchiku[d]: The former feels more important to me (because I'm running the code on my main system)
09:42 asdqueerfromeu[d]: Also is `flags = 0` needed for `nvRmSemSurfCreate()` on Ampere+? 🤔
09:52 x512[m]: asdqueerfromeu[d]: There are currently no flags defined for the sem-surf constructor.
09:52 x512[m]: So flags are always 0.
09:54 x512[m]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/c5e439fea4fe81c78d52b95419c30cabe44e48fd/src/nvidia/src/kernel/gpu/mem_mgr/sem_surf.c#L776
09:55 tiredchiku[d]: asdqueerfromeu[d]: I did remove that when fixing up the commit history yesterday
09:56 asdqueerfromeu[d]: tiredchiku[d]: I see you did another push (the initial commit rewrite still had the `flags = 0`)
09:57 x512[m]: Is Sid Pranjale present here?
09:57 tiredchiku[d]: that's me
09:57 x512[m]: Sid?
09:58 tiredchiku[d]: yes that's me
09:58 tiredchiku[d]: chiku is sid, sid is chiku
09:58 tiredchiku[d]: same person
09:59 avhe[d]: x512[m]: the nvgpu driver is also not stable, they modified crucial parts of the uapi between firmwares
09:59 avhe[d]: you should probably look into the tegra_drm stuff that was upstreamed some time ago: <https://github.com/torvalds/linux/blob/master/include/uapi/drm/tegra_drm.h>
10:00 avhe[d]: though it's probably not sufficient to drive the gpu
10:03 x512[m]: Missing fields in a designated initializer are automatically initialized to zero, so an explicit {.flags = 0} is not needed.
10:11 tiredchiku[d]: yeah I am aware
10:37 tiredchiku[d]: `_gmmuWalkCBFillEntries: [GPU0]: PA 0x1F0516000, Entries 0x20-0x2E = INVALID`
10:37 tiredchiku[d]: :blobcatnotlikethis:
11:04 x512[m]: Which is better to buy: an A400 or an A2000?
11:05 tiredchiku[d]: A2000
11:07 tiredchiku[d]: x512[m]: could you try running https://github.com/necrashter/minimal-vulkan-compute-shader and see if it outputs correctly?
11:09 x512[m]: At least SaschaWillems compute particles Vulkan demo worked fine for me.
11:09 tiredchiku[d]: which WSI option, wayland?
11:11 x512[m]: Haiku WSI implicit layer.
11:12 x512[m]: Internally it works by using GPU -> CPU memory copy.
11:14 x512[m]: Direct GPU display path is not supported yet.
11:14 snowycoder[d]: I think kepler `dsetp` hardware is bugged.
11:14 snowycoder[d]: `DSETP.NE.AND P0, PT, R4, R0, PT`
11:14 snowycoder[d]: with `R0 = 0xa126f3907ff02681, R4 = 0xd17acf3c85cba70b`
11:14 snowycoder[d]: returns `P0=0` even though they seem pretty different(?); neither of them is even NaN
11:15 x512[m]: MESA_VK_WSI_DEBUG=sw on Linux will do basically the same.
11:16 tiredchiku[d]: yeah but the vulkan samples have configure options to make them use different wsi
11:17 x512[m]: Both X11 and Wayland will work.
11:17 x512[m]: It does not really matter when using the software display path.
11:18 x512[m]: Hardware display path will not work because of missing dma-buf support.
11:18 x512[m]: But it can be implemented using nvidia-drm.ko.
11:18 tiredchiku[d]: I see
11:18 tiredchiku[d]: well, something's wrong then
11:20 x512[m]: I plan to add Linux support to my branch, but patience is needed.
11:22 tiredchiku[d]: `[Thu Mar 27 16:52:00 2025] NVRM: Xid (PCI:0000:01:00): 31, pid=3644, name=main, Ch 0000001e, intr 00000000. MMU Fault: ENGINE GRAPHICS GPC2 GPCCLIENT_T1_4 faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ`
11:44 tiredchiku[d]: oh god damn it
11:44 tiredchiku[d]: am dum :D
11:44 tiredchiku[d]: I have handling code for every uncompressed pte_kind except the generic one :LUL:
12:05 tiredchiku[d]: oh
12:06 tiredchiku[d]: I was setting the wrong flags on mmap
12:06 tiredchiku[d]: k
12:07 tiredchiku[d]: :PogDuck:
12:08 asdqueerfromeu[d]: x512[m]: Is there a reason for not freeing the `hClient` handle on device destruction?
12:09 x512[m]: Probably just a resource leak.
12:09 x512[m]: But it will be released anyway on ctl fd close.
12:10 x512[m]: It is still very WIP code with a lot of small problems. Freeing and error handling may be missing.
12:13 x512[m]: Freeing of subchannel handles was recently added.
12:13 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354790490234421308/QM0nM6O.png?ex=67e69277&is=67e540f7&hm=c66ddf3a3d5cc1f6d2378bf011e7270562fe530a1ab3c991b0df24924b4182ca&
12:14 x512[m]: It worked? Great.
12:15 tiredchiku[d]: yeah, me setting the wrong flag was what was breaking things 😅
12:16 asdqueerfromeu[d]: tiredchiku[d]: Does the GPU still fail and cause a freeze?
12:17 tiredchiku[d]: no
12:17 tiredchiku[d]: I have the cube slowly spinning while I'm texting
12:17 asdqueerfromeu[d]: tiredchiku[d]: So what was the problem here?
12:18 tiredchiku[d]: tiredchiku[d]: ^
12:18 tiredchiku[d]: wrong flag on mmap
12:18 tiredchiku[d]: caused the GPU to be unable to access the memory it needed to fill the command buffer
12:19 x512[m]: If it spins too slowly, do not forget to add the VK_MEMORY_PROPERTY_HOST_CACHED_BIT flag here: https://github.com/X547/mesa/blob/c27e88c5ce81827949e7b2e3f9ccb651c89a0fa8/src/vulkan/wsi/wsi_common.c#L1787.
12:19 tiredchiku[d]: right
12:19 tiredchiku[d]: I don't have that in my tree
12:23 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354793010046435419/SkeqYwW.png?ex=67e694d0&is=67e54350&hm=0ce01682fa48fb364e7de00ab42e952f3834cce4889816cb2e785e271844ed09&
12:23 tiredchiku[d]: vkcube goes brr
12:24 x512[m]: Does it work directly on KMS?
12:25 tiredchiku[d]: how would I check 😅
12:25 x512[m]: It will currently not work if dma-buf export is required.
12:25 x512[m]: These demos are more interesting: https://github.com/SaschaWillems/Vulkan
12:25 tiredchiku[d]: do have those built, yeah
12:31 pavlo_kozlenko[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354794950679724187/6d3d2339522c157a.png?ex=67e6969f&is=67e5451f&hm=a838dbaa30bb0d1a7cb95cca301f3b04156cfe979d5772ba5d5fb000b8aedee4&
12:31 pavlo_kozlenko[d]: tiredchiku[d]: that's what is needed for automatic reclocking; we need to determine, at the right level, when a graphics library (libGL.so for example) is in use, and switch the video card's state accordingly
12:32 tiredchiku[d]: ?
12:46 tiredchiku[d]: that's an official nvidia tool (also supported on below turing)
12:46 tiredchiku[d]: just their "device monitor" of sorts
12:51 x512[m]: It seems I know where it gets the info from. I will make a GUI tool for GPU monitoring on Haiku later.
12:51 tiredchiku[d]: it uses libnvidia-ml.so
12:51 x512[m]: It uses the NVRM UAPI.
12:51 tiredchiku[d]: nope
12:51 x512[m]: libnvidia-ml.so has no source code, so it is useless.
12:52 tiredchiku[d]: nvidia-smi uses NVML, which is a wrapper around NVRM's uAPI
12:52 tiredchiku[d]: x512[m]: it has public API bindings however
12:52 tiredchiku[d]: https://docs.nvidia.com/deploy/nvml-api/
12:52 x512[m]: I can't recompile it for Haiku.
12:52 tiredchiku[d]: fair
12:53 x512[m]: So I will use NVRM UAPI directly.
12:58 x512[m]: All Linux DEs suck, so I am on Haiku :)
13:14 karolherbst[d]: haikus DE is nextstep inspired or something along those lines, no?
13:15 gfxstrand[d]: snowycoder[d]: So is dsetp not bugged, then?
13:18 x512[m]: BeOS (Haiku is a reimplementation of BeOS) was made by former MacOS developers. It is inspired by classic MacOS and old UNIX DEs.
13:18 x512[m]: Also, BeOS/Haiku has its own GUI server and protocol, not X11/Wayland. But it is conceptually similar to X11.
13:20 x512[m]: Haiku still actively uses server-side 2D vector graphics. It is more modern and advanced compared to BeOS. I plan to accelerate it with Skia -> Vulkan -> NVK -> NVRM.
13:20 x512[m]: compared to X11
13:21 x512[m]: But it is probably off-topic here.
13:22 karolherbst[d]: meanwhile nvidia GPUs still have a dedicated 2D interface
13:23 karolherbst[d]: even supports polylines, though it doesn't support all the insanity which is vector graphics 😄
13:23 x512[m]: Haiku needs antialiased vector graphics, gradients etc.
13:24 karolherbst[d]: there is limited support for that as well
13:24 karolherbst[d]: though mhh not sure if gradients actually work
13:25 avhe[d]: karolherbst[d]: do you mean fermi_2d?
13:26 karolherbst[d]: yeah
13:26 karolherbst[d]: looks like it doesn't support gradients unless I missed something
13:26 avhe[d]: i always thought of it as a funny copy engine
13:26 karolherbst[d]: it's more than that
13:27 avhe[d]: looks like it
13:27 karolherbst[d]: though the main purpose was scan out
13:27 karolherbst[d]: and basic compositing
13:27 karolherbst[d]: it even supports basic blending
13:27 avhe[d]: on the switch they use it for copies, that's why i had this misconception
13:27 karolherbst[d]: polylines is I think the most crazy feature it supports 😄
13:28 karolherbst[d]: yeah.. the good thing is, it supports color formats
13:28 karolherbst[d]: bad thing is, not all of them
13:28 x512[m]: 8 bit palette too?
13:28 karolherbst[d]: no palette
13:28 karolherbst[d]: it's all single color operation
13:28 karolherbst[d]: though....
13:28 karolherbst[d]: mhhh
13:29 karolherbst[d]: there is an index color mode...
13:29 avhe[d]: karolherbst[d]: so it's actually not too far from the VIC on tegra
13:29 karolherbst[d]: maybe that's good enough for palette
13:29 karolherbst[d]: https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/twod/cl902d.h
13:30 karolherbst[d]: mhhhh
13:30 karolherbst[d]: yeah.. I think you can do a palette with index...
13:30 karolherbst[d]: just not sure how that works
13:31 avhe[d]: x512[m]: i have userland code for reading clocks from RUSD if you need a reference
13:31 avhe[d]: <https://github.com/averne/Envideo/blob/master/src/nvidia/device.cpp#L371-L376>
13:31 avhe[d]: <https://github.com/averne/Envideo/blob/master/src/nvidia/device.cpp#L304-L313>
13:31 avhe[d]: you can easily extend that to power draw etc by playing with flags
13:31 x512[m]: Why does NVK instantiate a FERMI_TWOD_A subchannel? Does NVK use legacy 2D graphics?
13:32 karolherbst[d]: it hasn't changed since fermi
13:32 karolherbst[d]: but yeah, some APIs use 2D
13:32 snowycoder[d]: gfxstrand[d]: Nope, I just don't know how to make IPA work.
13:32 snowycoder[d]: (But I created some neat dsetp tests)
13:32 karolherbst[d]: it's good for blitting
13:33 x512[m]: Some simple Vulkan blitting commands that do not use shaders?
13:33 karolherbst[d]: though not sure if nvk still uses it
13:35 karolherbst[d]: yeah.. I think all the 2d code got removed
13:35 tiredchiku[d]: banished into the dark realm
13:36 asdqueerfromeu[d]: karolherbst[d]: https://discord.com/channels/1033216351990456371/1034184951790305330/1180160032583729284
13:36 x512[m]: avhe[d]: I also got an answer about it from Nvidia developers: https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/157#discussioncomment-10381935
13:37 tiredchiku[d]: milos is also in this server/channel
13:37 tiredchiku[d]: @ notthatclippy
13:38 karolherbst[d]: I still like how I implemented fill on 2D lol
13:40 avhe[d]: x512[m]: ah i wasn't aware of that discussion... i just traced nvidia-smi and figured it out from there
13:40 avhe[d]: also if you set __NVML_USE_RUSD=0, it pokes at control cmds 0x20809004 and 0x20808545, which i wasn't able to find in the repo
13:41 avhe[d]: that's still a mystery to me
13:45 x512[m]: Probably forwarded to GSP.
13:45 avhe[d]: that was my guess too
13:45 x512[m]: Nvidia developers said that GSP monitoring commands cause lag and stutter because of internal locking.
13:46 avhe[d]: they say that but then nvml forces an rusd update at each clock query :)
13:47 avhe[d]: interesting thread though
13:47 avhe[d]: <https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/157#discussioncomment-12306234>
13:47 avhe[d]: this was always confusing to me too
13:49 gfxstrand[d]: snowycoder[d]: Right... The details of IPA change ever few generations.
13:50 gfxstrand[d]: Someone keeps going "I think I can make multisampling better!" and I'm not sure they ever do.
13:52 tiredchiku[d]: tiredchiku[d]: faith!
13:52 gfxstrand[d]: I saw!
13:52 gfxstrand[d]: Congrats! :transtada128x128:
13:52 tiredchiku[d]: <a:HypedAsFuckBoi:1295407078327844934>
13:52 tiredchiku[d]: thoughts on a draft MR after I clean up device enumeration?
13:52 gfxstrand[d]: Go for it
13:53 gfxstrand[d]: MR often. MR early. That's my motto
13:53 snowycoder[d]: gfxstrand[d]: I've got something broken even with ipa.const, I'm starting to think that the problem's outside the shaders (addresses or commands?).
13:53 snowycoder[d]: Well, I can't debug today 😦
13:53 gfxstrand[d]: It's way easier to review MRs than branches.
13:53 gfxstrand[d]: snowycoder[d]: That's plausible. There are SPH fields that go along with `.const` and it's possible something is wrong there.
13:54 gfxstrand[d]: And there might be MMIO state, too, potentially
13:56 tiredchiku[d]: gfxstrand[d]: okie
13:56 tiredchiku[d]: then I might make a not-draft instead, since sw rendered wsi works and the device enumerates
13:57 tiredchiku[d]: and then build it upstream
14:07 gfxstrand[d]: That's fine. Or it can live in an MR for a bit. It's unlikely to run into many merge conflicts since it's mostly in NVKMD. But as long as it's behind an environment variable and the code is reasonably clean, I'm okay with iterating in-tree. Especially if that helps you and x512[m] coordinate.
14:08 gfxstrand[d]: But be warned. As soon as it's in the tree, it'll make Phoronix and you'll have a bunch of people asking you to fix their issues with it.
14:08 gfxstrand[d]: Sometimes it's better to let things bake in an MR for a bit. 😅
14:08 mohamexiety[d]: yeah :KEKW:
14:09 gfxstrand[d]: But leaving something as a draft MR for a while is fine, too. MR doesn't mean "merge tomorrow"
14:14 x512[m]: tiredchiku[d]: This was fixed in another way https://gitlab.freedesktop.org/Sid127/mesa/-/commit/896f82b42ddd91ae26876a2a04a3ee5f0513914f
14:14 Jasper[m]: Anyone have the discord invite link on hand? Might be an easier read if something doesn't survive the D->IRC->Matrix translation
14:14 tiredchiku[d]: https://discord.gg/D232F7C3
14:14 x512[m]: No need to pass ctlFd as separate argument.
14:15 x512[m]: Discord banned me immediately after login for the first time. And their tech support is useless.
14:16 jja2000[d]: That's a bummer
14:17 jja2000[d]: It's not my preference either, but I don't have too much to say about fd.org hahaha
14:17 x512[m]: Open protocol and open servers are better.
14:17 Jasper[m]: jja2000[d]: (This is me btw)
14:18 jja2000[d]: (Seems like discord username is bridged as opposed to displayname)
14:21 karolherbst[d]: I should check if the bridge has an option for that actually
14:23 gfxstrand[d]: Looks like CI is back. :transtada128x128:
14:27 gfxstrand[d]: I guess I don't have an excuse for not reviewing patches anymore. 😂
14:29 x512[m]: What is left in NVK Video decode MR?
14:29 karolherbst[d]: mhhhh
14:35 gfxstrand[d]: x512[m]: Review. Finalizing the Rust stuff. There's another codec to implement (but that doesn't need to happen before landing).
14:43 esdrastarsis[d]: tiredchiku[d]: did you update your branch? I wanna test vkcube on Turing, no rush ofc
14:46 tiredchiku[d]: yes
15:10 tiredchiku[d]: x512[m]: you do, actually
15:11 tiredchiku[d]: the memFd has to be /dev/nvidiaX if you're mapping VRAM and /dev/nvidiactl if you're mapping system mem
15:11 tiredchiku[d]: in the parameters, I mean
15:11 tiredchiku[d]: but the api call always has to go to the ctlFd
15:12 tiredchiku[d]: mmap was failing for me on ampere without that change ¯\_(ツ)_/¯
15:26 avhe[d]: correct
15:27 avhe[d]: you can easily check that by tracing cuMemAlloc
15:49 x512[m]: tiredchiku[d]: It was fixed in my branch. Now the fd is almost always ctlFd.
15:49 x512[m]: And the file name is used to create the map fd.
15:50 tiredchiku[d]: dunno, I had many mmap fails
15:50 tiredchiku[d]: so I changed it till it worked and matched what the kernel expected
15:50 tiredchiku[d]: ¯\_(ツ)_/¯
16:14 gfxstrand[d]: mhenning[d]: I rebased your sched MR on my Volta latency fix. I'll review and land soon. I really am going to land stuff. 😂
16:18 gfxstrand[d]: mhenning[d]: If you wouldn't mind reviewing the rest of the fp64->fp16 MR, that'd be cool. I don't care as much about the final Intel patch but I'd like to land all the maxwell fixes before 25.1 branches.
16:19 gfxstrand[d]: And I'll either flip conformant to true before the branch point or flip it and Cc 25.1 a week after.
16:20 gfxstrand[d]: Probably best to do the later to avoid unsubstantiated phoronix articles. 😂
16:27 tiredchiku[d]: avhe[d]: question re: acquiring classlist... how do you tell which class is what :D
16:27 tiredchiku[d]: as far as I can tell the kernel just gives you a list of classes and you're left to figure things out
16:29 avhe[d]: either you query the entire list of classes for your chip using NV0080_CTRL_CMD_GPU_GET_CLASSLIST_V2 and rely on the low 8 bits to determine which class it is (eg. 0x97 for 3D)
16:29 avhe[d]: but that's not supposed to be stable as was discussed last week (was it?)
16:30 tiredchiku[d]: :notLikeCat:
16:30 avhe[d]: the more correct way, or at least one i've found is to use NV2080_CTRL_GPU_GET_ENGINE_CLASSLIST_PARAMS
16:31 tiredchiku[d]: oh, huh
16:31 avhe[d]: you give it a NV2080_ENGINE_TYPE, it will return a list of class ids that match <https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/common/sdk/nvidia/inc/class/cl2080_notification.h>
16:31 avhe[d]: see <https://github.com/averne/Envideo/blob/master/src/nvidia/device.cpp#L290-L302>
16:32 tiredchiku[d]: interesting
16:33 tiredchiku[d]: so I can give it NV2080_ENGINE_TYPE_GRAPHICS and it'll return what should be the value of cls_eng3d?
16:33 avhe[d]: actually looking at that code again, you probably don't need to do the ioctl twice if you're just gonna take the first id... just pass 1 from the beginning
16:34 tiredchiku[d]: mm
16:35 tiredchiku[d]: another question, bit of a long shot
16:35 avhe[d]: avhe[d]: yep that works
16:35 tiredchiku[d]: for `NV2080_CTRL_GR_GET_INFO_PARAMS`
16:35 tiredchiku[d]: if I pass ```NvU32 grInfoList[3] = {
16:35 tiredchiku[d]: NV2080_CTRL_GR_INFO_INDEX_SM_VERSION,
16:35 tiredchiku[d]: NV0080_CTRL_GR_INFO_INDEX_LITTER_NUM_SM_PER_TPC,
16:35 tiredchiku[d]: NV2080_CTRL_GR_INFO_INDEX_MAX_WARPS_PER_SM
16:35 tiredchiku[d]: };```
16:35 tiredchiku[d]: can I expect things to come back in the same order?
16:36 tiredchiku[d]: s/2080\/0080
16:38 avhe[d]: i haven't used that one but that looks like it, yeah
16:38 tiredchiku[d]: okay, as long as things come back in the same order..
16:40 tiredchiku[d]: I think the biggest question mark is retrieving `.chipset_name = "GA104",`
16:40 tiredchiku[d]: I don't think the kernel driver has it
16:41 avhe[d]: tiredchiku[d]: yep
16:41 avhe[d]: look in kernel_graphics.c:2729
16:41 avhe[d]: it just writes to the data member
16:41 mhenning[d]: tiredchiku[d]: I'm not sure chipset_name is used for much - it might just be in the name of the device
16:42 tiredchiku[d]: in that case I can leave it empty
16:42 tiredchiku[d]: meaning NVK on openrm will say
16:42 tiredchiku[d]: GPU MARKETING NAME (NVK)
16:42 tiredchiku[d]: :ha:
16:42 tiredchiku[d]: instead of NVK GA104 or whatever
16:43 mhenning[d]: yeah, that's probably fine
16:45 avhe[d]: you can get that from NV2080_CTRL_CMD_GPU_GET_NAME_STRING
16:45 avhe[d]: (the product name, not the chipset name)
16:45 tiredchiku[d]: did that already :D
17:25 gfxstrand[d]: tiredchiku[d]: I'm good with that
17:26 tiredchiku[d]: quick question, if I wanted to use nvkmd_pdev in nvkmd_nvrm_create_exec_ctx, how would I do that
17:27 tiredchiku[d]: the ctx function only gets nvkmd_dev
17:27 tiredchiku[d]: oh nvm nvkmd_pdev is in nvkmd_dev
17:27 tiredchiku[d]: should've looked before asking :LUL:
17:33 gfxstrand[d]: No worries
17:33 gfxstrand[d]: Yeah, it's all tied together
17:45 tiredchiku[d]: deviceName = NVIDIA GeForce RTX 3070 (NVK )
17:47 mhenning[d]: yeah, if you want to be picky you could remove the extra space when the chipset name is empty
17:48 tiredchiku[d]: `NV2080_CTRL_GR_INFO_INDEX_SM_VERSION` returns 0xc
17:48 tiredchiku[d]: probably not what we want
17:49 mhenning[d]: Not sure what 0xc corresponds to in this case
17:49 tiredchiku[d]: oh I just printed everything in hexadecimal
17:49 tiredchiku[d]: 12
17:50 mhenning[d]: Right, but what is 12?
17:51 tiredchiku[d]: same with NV2080_CTRL_GR_INFO_INDEX_MAX_WARPS_PER_SM, not what we want
17:51 tiredchiku[d]:
```
 * NV2080_CTRL_GR_INFO_INDEX_SM_VERSION
 *   This index is used to determine the SM version.
 *   A value of 0 indicates the GPU does not support this function.
```
17:51 tiredchiku[d]: I'm confused.
17:52 tiredchiku[d]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/c5e439fea4fe81c78d52b95419c30cabe44e48fd/src/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080gr.h#L278-L343
17:54 tiredchiku[d]: guess I'll just go back to using the nouveau winsys functions for these c:
17:58 tiredchiku[d]: oh
17:58 tiredchiku[d]: nvm
17:58 tiredchiku[d]: a param in the ioctl was wrong
17:59 tiredchiku[d]: I correctly get 84(decimal) now, for the sm
18:02 tiredchiku[d]: no wait nvm it was garbage data (breakpoint in the wrong place)
18:02 tiredchiku[d]: back to the old functions I suppose
18:42 tiredchiku[d]: am I doing something wrong here
18:43 tiredchiku[d]:
```c
NV2080_CTRL_FB_GET_INFO_PARAMS memParams = {
    .fbInfoListSize = 2,
};
NvU32 fbInfoList[2] = {
    NV2080_CTRL_FB_INFO_INDEX_HEAP_SIZE,
    NV2080_CTRL_FB_INFO_INDEX_BAR1_SIZE
};
memParams.fbInfoList = (NvP64)fbInfoList;
nvRmApiControl(&rm, pdev->hDevice, NV2080_CTRL_CMD_FB_GET_INFO, &memParams, sizeof(memParams));
```
18:44 tiredchiku[d]: struggling to access the sizes
18:44 tiredchiku[d]: https://github.com/NVIDIA/open-gpu-kernel-modules/blob/c5e439fea4fe81c78d52b95419c30cabe44e48fd/src/common/sdk/nvidia/inc/ctrl/ctrl2080/ctrl2080fb.h#L90C1-L95C46
18:49 tiredchiku[d]: I am, yes
18:50 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354890232503205999/message.txt?ex=67e6ef5c&is=67e59ddc&hm=5755223a8141104fc858a257d11b96e75d9e584cf5477fa8fd75e9feba679262&
18:50 mohamexiety[d]: ok I managed to kill it \o/ gfxstrand[d] airlied[d]
18:50 mohamexiety[d]: sparse CTS
18:50 mohamexiety[d]: more specifically: `'dEQP-VK.sparse_resources.buffer.transfer.rebind.buffer_size_2_16'`
18:52 mohamexiety[d]: not sure if I am reading the trace properly but it seems like it's dying in the lowest layer? i.e. `gf100_vmm_pgt_unmap`
18:52 mohamexiety[d]: so something isn't being passed correctly that low :thonk:
19:07 tiredchiku[d]: tiredchiku[d]: notthatclippy[d] send help :D
19:07 tiredchiku[d]: I'm struggling to get the data out
19:10 avhe[d]: tiredchiku[d]: that doesn't look correct
19:11 avhe[d]: you need to pass it a pointer to NV2080_CTRL_FB_INFO
19:11 tiredchiku[d]: :sigh:
19:11 avhe[d]: for each of these you set .index to whatever you want, and you read back the .data member
19:11 tiredchiku[d]: ..huh
19:12 avhe[d]: it works the same way as NV2080_CTRL_GR_GET_INFO_PARAMS
19:12 avhe[d]:
```
 * This buffer must be at least as big as fbInfoListSize multiplied
 * by the size of the NV2080_CTRL_FB_INFO structure.
```
19:12 tiredchiku[d]: ~~I didn't get that working either :D~~
19:12 avhe[d]: well that's probably why
19:12 tiredchiku[d]: 😅
19:17 tiredchiku[d]: nope, that's still zero
19:17 tiredchiku[d]: ..wait
19:17 tiredchiku[d]: I may be dumb
19:18 tiredchiku[d]: didn't pass fbInfoList to params 😅
19:18 mohamexiety[d]: mohamexiety[d]: Ok so interesting thing; this only happens as long as the nvk change is applied (the one that aligns the buffers to 64KiB or 2MiB if the size allows for it). Which means it’s due to the larger page stuff :thonk:
19:33 tiredchiku[d]: bah
19:33 tiredchiku[d]: I'm not able to wrap my head around this right now
19:37 esdrastarsis[d]: tiredchiku[d]: Which are the correct flags for mmap?
19:37 tiredchiku[d]: 0
19:37 tiredchiku[d]: why
19:38 tiredchiku[d]: there's a difference between how turing does semaphore surface and how ampere does it
19:38 tiredchiku[d]: for turing you might have to set this to hMemoryPhys too
19:38 tiredchiku[d]: https://gitlab.freedesktop.org/Sid127/mesa/-/blob/nvk-openrm/src/nouveau/vulkan/nvkmd/nvrm/nvRmSemSurf.c?ref_type=heads#L67
19:39 esdrastarsis[d]: I'm still having the alloc_tiled_mem error on Turing
19:39 tiredchiku[d]: that's not an error
19:39 tiredchiku[d]: it's debug output
19:39 esdrastarsis[d]: tiredchiku[d]: yeah, but vkcube still doesn't work
19:39 tiredchiku[d]: :P
19:39 tiredchiku[d]: did you pass MESA_VK_WSI_DEBUG=sw?
19:39 tiredchiku[d]: swapchain creation fails atm
19:40 tiredchiku[d]: hardware accelerated wsi doesn't work, that is
19:40 esdrastarsis[d]: okay, its working now
19:40 esdrastarsis[d]: ty
19:40 tiredchiku[d]: :ThumbsUp:
19:40 esdrastarsis[d]: wsi shenanigans
19:40 tiredchiku[d]: ` 8 files changed, 7301 insertions(+), 29 deletions(-)`
19:40 tiredchiku[d]: chonker commit
19:40 tiredchiku[d]: most of it is headers 😅
19:42 mhenning[d]: It might make sense to put the header import into a separate commit (sometime before review that is, not immediately important)
19:42 tiredchiku[d]: yup
19:43 tiredchiku[d]: already doing that 😅
19:43 tiredchiku[d]: just the "wip" commits that are squashed into chonkers
19:48 tiredchiku[d]: okay, this is for tomorrow me to figure out
19:48 esdrastarsis[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354904954782875904/Screenshot_2025-03-27_16-48-21.png?ex=67e6fd12&is=67e5ab92&hm=da8869f89693323565734b90d5d5d9a57628dc4da813ed2a708f011a6c3d3649&
19:48 esdrastarsis[d]: berf
19:49 tiredchiku[d]: yeah, software WSI really should be using CACHED memory
19:49 tiredchiku[d]: src/vulkan/wsi/wsi_common.c:1787
19:50 tiredchiku[d]:
```diff
- return wsi_select_memory_type(wsi, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
+ return wsi_select_memory_type(wsi, VK_MEMORY_PROPERTY_HOST_CACHED_BIT,
```
19:51 esdrastarsis[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354905786408767498/Screenshot_2025-03-27_16-51-34.png?ex=67e6fdd8&is=67e5ac58&hm=49c5a3d347764df0a84f4beb8303a6c1d7f6b100277d96a672bce21812395097&
19:51 esdrastarsis[d]: 🏎️
19:54 tiredchiku[d]: looking for both properties is probably the better option
19:54 pavlo_kozlenko[d]: esdrastarsis[d]: in the middle of the jackal?
19:58 esdrastarsis[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354907573970796654/Screenshot_2025-03-27_16-58-13.png?ex=67e6ff82&is=67e5ae02&hm=156f7f3c21e4d3ccc0f06d4d678b0c66bf072cb5a769dbbc7acade007f522cb4&
19:58 esdrastarsis[d]: tiredchiku[d]: znvk :happy_gears:
20:00 tiredchiku[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34247
20:00 tiredchiku[d]: :froge:
20:04 asdqueerfromeu[d]: esdrastarsis[d]: ~~Does DXVK work?~~
20:17 esdrastarsis[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354912292952608949/Screenshot_2025-03-27_17-17-25.png?ex=67e703e7&is=67e5b267&hm=7d1aa6e87af846ab3240ce410e3839303551e386c3200c626d40b6ec9b5b2894&
20:17 esdrastarsis[d]: asdqueerfromeu[d]: yeah...
20:17 esdrastarsis[d]: black screen, but I can hear the unigine heaven benchmark sound
20:24 x512[m]: tiredchiku[d]: See https://github.com/X547/mesa/commit/ce0d96ba8513356325d8113cb87094a43e9ea3d6#diff-f8242d3c736ff7ff7e088d4225c824e47330e17a9c937818e440ca91442a7291R162
20:25 x512[m]: That is the real fix. No need to add a ctlFd arg to nvRmApiMapMemory.
20:26 tiredchiku[d]: but then what do you do about the ioctls that have to be directed at the device fd only
20:26 x512[m]: It is unrelated to the GPU version. All ioctls should be done on ctlFd, not the actual device fd.
20:26 x512[m]: tiredchiku[d]: There are no such ioctls.
20:27 x512[m]: Or almost no.
20:28 x512[m]: I do not want my early mistakes to get upstreamed.
20:29 tiredchiku[d]: at that point having nvkmd_nvrm_dev_api_dev itself is redundant
20:29 x512[m]: I was under the wrong impression that some ioctl calls need to be done on the actual device fd.
20:29 tiredchiku[d]: they do though
20:29 tiredchiku[d]: they're rare but they exist
20:30 x512[m]: nvkmd_nvrm_dev_api_dev is useful only to set device file name.
20:30 x512[m]: I will refactor that later, so please don't rush with upstreaming.
20:37 tiredchiku[d]: x512[m]: incorporated that and pushed to my tree
20:46 ristovski[d]: Seems like NVAPI on Windows uses a bunch of undocumented ctrl2080 calls for all the fancy stuff like V/F curves/pstates. I should really get a Win10 VM with VFIO up...
20:55 x512[m]: ristovski[d]: Any info how to call NVRM on Windows?
20:55 jja2000[d]: esdrastarsis[d]: "Sounds like it's running well"
20:56 ristovski[d]: x512[m]: Sorry, no clue - I haven't actually looked at the Windows part at all yet
20:56 ristovski[d]: Are you interested in something specific?
20:57 x512[m]: I suppose it uses DeviceIoControl? What are the device name and cmd number?
21:04 ristovski[d]: It does indeed seem to use DeviceIoControl
21:07 x512[m]: nvRmIoctl in my code is supposed to provide platform abstraction over NVRM. So on Windows I suppose it should use DeviceIoControl.
21:09 x512[m]: nvRmApi.h should be usable for Haiku, Linux and Windows, so I do not want to allow OS-details leaking into its API such as fd parameters.
21:18 esdrastarsis[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1354927485963210942/Screenshot_2025-03-27_17-48-50.png?ex=67e7120d&is=67e5c08d&hm=a04f2000108625512f4345008b1de91cf8691c69bedee0493e6443eefcbd7859&
21:18 esdrastarsis[d]: shadPS4 works on nvk-openrm, the talos principle too (only on vulkan renderer)
21:24 tiredchiku[d]: insane
21:31 user0: how would i go about debugging the data that comes in and out of my graphics driver?
21:33 tiredchiku[d]: userspace or kernel
21:33 orowith2os[d]: Strongly depends on what you're looking for. Vulkan calls, OpenGL, shaders, etc
21:33 tiredchiku[d]: that too
21:49 user0: kernel, i want an understanding of how my kernel and graphics driver communicate with each other. Curious how the hardware takes in code and where the code ends up after or mid-processing? are there debuggers out there, or should i keep an eye out for specific calls in gdb, documentation?
21:57 pixelcluster[d]: if you're interested in kernel<->userspace interfaces you should probably read up on the uapi, if it's about kernel<->hardware communication it's probably best to read kernel code, if it's about driver<->hardware interface it's probably best to read usermode driver code
21:57 pixelcluster[d]: e.g. if you're interested in how shader code gets turned into gpu code look at the shader compiler parts
21:58 pixelcluster[d]: it's a lot to take in so I wouldn't expect a quick skim through the code nor starting to step through it with a debugger to answer all your questions right away
22:14 rinlovesyou[d]: hmm my login manager isn't showing up anymore
22:14 rinlovesyou[d]: on nvk anyways
22:23 user0: pixelcluster[d]: what's "usermode driver code"? if i'm curious about kernel<->hardware i can see looking at the kernel code. are you suggesting a specific place to look inside the graphics driver? or does it refer to your e.g. with shader compilers?
22:24 mhenning[d]: tiredchiku[d]: tbh I'm surprised anything works okayish without fence support