00:21 gfxstrand[d]: I finally got annoyed enough....
00:21 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34853
00:27 gfxstrand[d]: I blame Alyssa for getting me in the warning deleting mood
00:27 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34847
00:28 HdkR: Nice nice
09:13 karolherbst[d]: weird question, but why does nak get compiled twice?
10:29 karolherbst[d]: `%r2068 = hfma2 %r2011 0x4000.xx %r2059` crashes inside RA
10:29 karolherbst[d]: `self.try_coalesce(*dst_ssa, &copy.src)` is the part, but `try_coalesce` only works on scalars it seems
10:30 karolherbst[d]: well... rather, it crashes/asserts on the `%r2091 = copy 0x40004000`
10:30 karolherbst[d]: as that got turned into
10:30 karolherbst[d]: %r2091 = copy 0x40004000
10:30 karolherbst[d]: %r2068 = hfma2 %r2011 %r2091 %r2059
10:31 karolherbst[d]: `src_reg.comps()` is 2, and it's expected to be 1
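A minimal sketch of the failing pattern, with hypothetical names rather than the actual NAK code: copy-prop has split the packed-f16x2 immediate out into its own `copy`, and a coalesce helper that assumes scalar values asserts on the two-component source.

```rust
// Hypothetical reduction of the assert: `try_coalesce` assumes the copy
// source is a scalar, but the lowered `copy 0x40004000` produces a
// two-component (packed f16x2) value.
struct SsaValue {
    comps: u8,
}

fn try_coalesce(dst: &SsaValue) {
    // RA can only fold a copy into its destination for scalar values;
    // a comps() == 2 source trips this assertion.
    assert_eq!(dst.comps, 1, "try_coalesce expects a scalar value");
}

fn main() {
    let r2091 = SsaValue { comps: 2 }; // %r2091 = copy 0x40004000
    try_coalesce(&r2091); // panics, mirroring the RA assert
}
```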
10:53 snowycoder[d]: mangodev[d]: There are minecraft mods to run it directly on Vulkan
10:54 snowycoder[d]: karolherbst[d]: In my setup, the second compilation is for the tests (put an error or warning in there and you might see it reported in the second compilation but not in the first)
10:55 karolherbst[d]: snowycoder[d]: ohhh... right...
10:58 snowycoder[d]: gfxstrand[d]: should I merge the VILD in the initial MR? It still fails in some cases but it's more correct than the wrong ISBERD.
10:58 snowycoder[d]: (Should I remove them both and add VILD in a later patch?)
12:36 gfxstrand[d]: karolherbst[d]: One compilation is to build the unit tests and the other is to build the library that gets linked into NVK.
12:39 gfxstrand[d]: snowycoder[d]: 🤷🏻‍♀️ Whatever you prefer. I would keep it as a separate commit but I'd say it can go in now.
12:39 gfxstrand[d]: The first batch of stuff we land doesn't have to be perfect.
12:43 gfxstrand[d]: Probably better to get it in than to wait around for it to be perfect at this point.
12:44 karolherbst[d]: it's always impressive how much NAK_DEBUG=serial tanks performance
13:08 karolherbst[d]: mhhhh
13:08 karolherbst[d]: Does NAK do any kind of load/store vectorization?
13:12 gfxstrand[d]: No. We really need to improve that.
13:12 gfxstrand[d]: There are NIR passes we can call. I just haven't bothered
13:13 karolherbst[d]: I see
13:14 karolherbst[d]: kinda need a solution for this patch: https://gitlab.freedesktop.org/airlied/mesa/-/commit/df54aaaa80978a6c5464d846316c72ba9ba698ee (this alone seems to make a huge difference)
13:19 karolherbst[d]: guess I'll figure out vectorization then..
13:23 karolherbst[d]: mhh, nak does seem to call into `nir_opt_load_store_vectorize` though
13:24 karolherbst[d]: maybe it can't figure out that those things can be vectorized
13:32 gfxstrand[d]: I'm not sure then. You'd have to look into it deeper
13:41 karolherbst[d]: maybe needs some rebalance or cse...
14:40 snowycoder[d]: gfxstrand[d]: Ok, Kepler+ support MR is ready! https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34329
14:41 gfxstrand[d]: Sweet! I'll give it another read today
14:41 snowycoder[d]: Thanks!
14:48 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1369687236286873750/1e8d3bab-e924-43a7-bb36-fe655b4397b3.png?ex=681cc424&is=681b72a4&hm=ddcd840a5b0fc66c790a358aea18ad9615472d4c3d9c086c4fd5ede9f6685269&
14:48 gfxstrand[d]: Let's goooooooo!
14:51 Lynne: that's a weird-looking space heater
14:52 snowycoder[d]: Motherboard hex thingy: "AAAAAAAAAA"
14:53 tiredchiku[d]: it's a face
14:53 tiredchiku[d]: the radiator fans up top are the eyes, the GPU is the mouth
14:56 mohamexiety[d]: gfxstrand[d]: congrats! one thing, since it's not clear from here: if you don't have a support for the card, I'd strongly recommend one, as the thicc cards have issues with PCBs cracking and such over time due to sheer weight
14:57 gfxstrand[d]: I've got the machine lying on its side
14:57 mohamexiety[d]: yup that works too
14:59 tiredchiku[d]: BFG(pu)
15:01 mohamexiety[d]: https://www.nvidia.com/en-us/geforce/news/rtx-3090-out-september-24/ they actually did use that term in marketing in the past :HyperKek:
15:01 mohamexiety[d]: 5090 is so BF it makes the 3090 look tiny though. 2x in practically everything -- including compute
15:10 gfxstrand[d]: snowycoder[d]: Can you quickly pull the OpVild changes into a separate commit before the one which adds the SM20 back-end?
15:10 gfxstrand[d]: Or I can do it if you don't have the time
15:12 marysaka[d]: mohamexiety[d]: I should put mine back for my poor RDNA 4 GPU; I wasn't able to put it back when I tried :blobcatnotlikethis:
15:13 gfxstrand[d]: The card came with a little adjustable support doodad. I'll use that if I ever set the machine on edge again.
15:14 mohamexiety[d]: yeeep
15:14 snowycoder[d]: gfxstrand[d]: Yep, give me 5 mins
15:38 snowycoder[d]: gfxstrand[d]: Done!
15:39 gfxstrand[d]: Thanks!
15:44 gfxstrand[d]: What branch are y'all using for Blackwell?
15:44 mohamexiety[d]: Dave’s one is the most complete but mine has the qmd stuff. I could rebase on top of his in a bit though
15:44 mohamexiety[d]: But mine also doesn’t compile anyways given the missing qmd fields
15:46 gfxstrand[d]: I meant kernel branch
15:47 mhenning[d]: karolherbst[d]: Yes, nak uses nir_opt_load_store_vectorize like you noticed.
15:47 mohamexiety[d]: gfxstrand[d]: Oh one sec
15:48 mhenning[d]: Sometimes we miss vectorization because the alignment information isn't good enough. I've been working on a nir pass to derive better alignment where possible
15:48 mhenning[d]: Also, I'm not entirely sure we do the vectorization in the right order for things like ssbos
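A toy model of the alignment constraint described above (hypothetical helpers, not NIR's actual API). NIR tracks alignment as an (align_mul, align_offset) pair, and two adjacent 32-bit loads can only merge into one 64-bit load if the combined access is provably 8-byte aligned:

```rust
/// Largest power-of-two alignment provable for an address known to be
/// `align_offset` modulo `align_mul`. Hypothetical helper for illustration.
fn effective_align(align_mul: u32, align_offset: u32) -> u32 {
    if align_offset == 0 {
        align_mul
    } else {
        // Limited by the lowest set bit shared by offset and mul.
        1u32 << align_offset
            .trailing_zeros()
            .min(align_mul.trailing_zeros())
    }
}

/// Can a merged access of `byte_size` bytes be emitted?
fn can_merge_into(byte_size: u32, align_mul: u32, align_offset: u32) -> bool {
    // e.g. merging two 4-byte loads into one 8-byte load needs 8-byte
    // alignment; if only `align_mul = 4` is known, the merge is skipped.
    effective_align(align_mul, align_offset) >= byte_size
}
```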
15:48 mohamexiety[d]: gfxstrand[d]: https://lore.kernel.org/nouveau/20250429233929.24363-1-bskeggs@nvidia.com/T/ it's in the mailing list now
15:49 mhenning[d]: But yes, in general NAK is supposed to be vectorizing right now
15:49 mohamexiety[d]: https://gitlab.freedesktop.org/bskeggs/nouveau/-/commits/03.01-gb20x?ref_type=heads
15:49 mohamexiety[d]: https://gitlab.freedesktop.org/bskeggs/linux-firmware/-/commit/1c457ce8dc792dcb57246b8e097a05d2cc4bce90 with these being the branches in fdo gitlab
15:50 gfxstrand[d]: IDK if I feel better or worse that Windows doesn't know how to drive the display on this thing, either. 😅
15:50 mohamexiety[d]: wait what :SufferingInside:
15:51 mohamexiety[d]: on boot with a changed GPU, windows does yeet the gfx drivers, so maybe it's just not installed yet?
15:51 gfxstrand[d]: I flipped back to my 3060 and ran Windows update like 4 times
15:52 gfxstrand[d]: Maybe if I download a driver directly from NVIDIA?
15:52 mohamexiety[d]: could try that yeah
15:52 mohamexiety[d]: but note that every time you flip, windows just deletes the driver
15:52 mohamexiety[d]: it does that on HW change
15:52 gfxstrand[d]: That's no good then
15:53 mohamexiety[d]: 5070 was plug and play. it just booted into glorious 1024x768 using the basic microsoft adapter, then winupdate downloaded the drivers and it worked™️
15:54 gfxstrand[d]: I get a black screen
15:54 gfxstrand[d]: But maybe I need to let it sit longer
15:58 mohamexiety[d]: hmm. yeah try that then
16:01 LastRecluse: Hey, I'm trying to get nouveau working on an NV160 but I see in xf86-video-nouveau/src/nv_driver.c:NVHasKMS it's not supported? I see many supported features on the feature page, is there anything I can do to get it up and running?
16:04 mhenning[d]: xf86-video-nouveau is pretty much deprecated. use the modesetting driver instead
16:07 LastRecluse: I see, I'll give it a shot thanks
16:24 mangodev[d]: snowycoder[d]: i'm very aware, but i find it harder to pinpoint areas where the driver struggles because the vulkan pipeline is more limited
16:25 gfxstrand[d]: Anyone else hit this with Ben's branch?
16:25 gfxstrand[d]: ./usr/include/cxl/features.h:11:10: fatal error: uuid/uuid.h: No such file or directory
16:25 gfxstrand[d]: 11 | #include <uuid/uuid.h>
16:25 gfxstrand[d]: | ^~~~~~~~~~~~~
16:25 mangodev[d]: gfxstrand[d]: ~~glad i don't use branches~~
16:26 dwfreed: gfxstrand[d]: sounds like you just need your distro's libuuid dev package?
16:28 gfxstrand[d]: Or I can just disable CXL
16:31 mangodev[d]: although i do wanna try more branches in the future
16:31 mangodev[d]: though speaking of
16:31 mangodev[d]: gfxstrand[d] any progress on the review of the prepass scheduler? i remember mel asking you to review it when it was updated to latest changes in main
16:31 mangodev[d]: i just checked the branch and it's still updated as of 22 minutes ago (surprisingly)
16:32 gfxstrand[d]: No. I'm just now finally starting to get back on top of my life. I'm hoping to start burning down the backlog soon.
16:33 mhenning[d]: mangodev[d]: yeah, it needed a rebase - I was just going through branches and resolving conflicts
16:33 mangodev[d]: gfxstrand[d]: ah, perfectly fair, hope things are going well and you can get back on your feet :)
16:33 mangodev[d]: mhenning[d]: i would think so from the latest commits
16:34 mangodev[d]: i'm really excited to try it because of the performance gains it has
16:35 mangodev[d]: averages are so weird, there's no way all of those shaderdb runs averaged to a -3% cycle count 🫠
16:35 mangodev[d]: maybe because of the two outliers that were +1?
16:36 mhenning[d]: Yeah, I think my shaderdb might be weighted somewhat strangely
16:36 mangodev[d]: although honestly i'd take the trade of +1% worse performance on certain workloads for -30% to -40% on the rest, seems like a steal
16:37 mangodev[d]: and it's some big boosts to tests like anti-aliasing and shadow mapping, which should help in running actual programs
16:38 mangodev[d]: every area where the driver currently struggles at least a bit has major performance boosts :D
16:39 mangodev[d]: what does the prepass even do that the postpass can't?
16:40 LastRecluse: I'm trying to run applications through nouveau on a remote server through ssh forwarding. I want to use the GPU but when I "glxinfo" I see "OpenGL renderer string: llvmpipe (LLVM 19.1.7, 128 bits)" nouveau is loaded and I don't see any major issues from dmesg(just no pmu fw)
16:41 LastRecluse: I'm not really sure how to go forward from here, any advice would be helpful.
16:41 mhenning[d]: The main disadvantage of postpass is that the register allocator creates these false dependencies for things that just happen to use the same register. So it adds additional constraints
16:41 mhenning[d]: also after register allocation runs it's impossible to schedule in such a way that we reduce the register count
16:42 mhenning[d]: but anyway those are all proxy statistics in the MR descriptions and real improvements will likely be more modest
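A toy sketch of those false dependencies (illustrative types, not NAK's scheduler): once values live in physical registers, reusing a register creates a write-after-read ordering edge that simply does not exist between SSA values.

```rust
// Two logically independent loads that RA happened to assign to the same
// register gain an ordering edge: the second write to r0 must wait until
// the instruction reading the first value has executed.
#[derive(Clone, Copy, PartialEq)]
struct Reg(u8);

struct Instr {
    reads: Vec<Reg>,
    writes: Vec<Reg>,
}

/// True if `later` cannot be hoisted above `earlier` purely because of
/// register reuse (a write-after-read hazard).
fn false_war_edge(earlier: &Instr, later: &Instr) -> bool {
    later.writes.iter().any(|w| earlier.reads.contains(w))
}
```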
16:42 mangodev[d]: LastRecluse: bump because i have no clue, maybe one of the smart people here with experience in x11 remote server can help
16:43 mangodev[d]: mhenning[d]: is the post pass on the NIR, or on the nvidia machine code? I'd assume the machine code since i thought that's what `nir_optimize` did
16:46 mangodev[d]: or wait
16:46 mangodev[d]: is the prepass nvidia-specific optimizations on the nir, then the pass is compiling nir to nvidia machine code, then the postpass is optimizing the machine code?
16:46 mangodev[d]: that's my current assumptions from an outsider perspective
16:46 LastRecluse: Here's my dmesg, none of the errors stood out to me: https://pastebin.com/4LKM6Xd0
16:53 snowycoder[d]: mangodev[d]: It's not optimizing the machine code nor NIR code, but the NAK code.
16:53 snowycoder[d]: NAK (Nouveau Awesome Kompiler) is the nvidia-specific "backend" compiler for shaders, its IR is almost 1-to-1 with nvidia ops so it's much easier to do lower-level passes, register-allocation and scheduling.
16:53 snowycoder[d]: Codeflow is: SPIRV -> NIR -> NAK -> machine code
16:53 snowycoder[d]: That pass is one of the last stages of NAK, after register-allocation but before encoding in machine code
16:54 snowycoder[d]: (If I understand it correctly, please correct me if I'm wrong)
16:54 mangodev[d]: snowycoder[d]: so wait
16:54 mangodev[d]: there's 3 ir langs involved? 🫠
16:55 mangodev[d]: so is prepass in preparation of converting NIR to NAK? or a different step
16:56 mangodev[d]: I'd assume the nak conversion is "the pass," but if not, i'd assume it's from nak to machine code
16:56 snowycoder[d]: mangodev[d]: Yep, but this means that all NIR passes are shared between most mesa shader compilers, so it can do really magical optimizations
16:56 mangodev[d]: because i thought mesa already handles NIR optimization
16:57 mangodev[d]: snowycoder[d]: how is this done so fast?
16:57 mangodev[d]: is it done when a graphical context is created?
16:57 mangodev[d]: how does it know what shaders to compile?
16:59 mangodev[d]: isn't it the program's job to decide when to compile shaders?
17:04 snowycoder[d]: mangodev[d]: Well, it isn't.
17:04 snowycoder[d]: Programs generally decide when to compile shaders, and if it happens in the middle of a frame there's a lag spike.
17:04 snowycoder[d]: Newer games compile them during a loading screen; either that, or steam uses fossilize to pre-compile shaders before the game runs.
17:07 mangodev[d]: snowycoder[d]: are these compilers categorically JITs?
17:09 snowycoder[d]: mangodev[d]: Depends on your definition of JIT, I'd say yes in the general sense but if we talk about valve's fossilize they can also be viewed as AOT(?).
17:11 mangodev[d]: snowycoder[d]: speaking of
17:11 mangodev[d]: how does fossilize run (relatively) fast with mesa (since it's running 3 compilers at once (4 if you count dxvk/vkd3d12))?
17:13 snowycoder[d]: It's not only mesa, most compilers have multiple Intermediate Representations, and the number of passes doesn't really change much. Having multiple IRs is only a coding choice and shouldn't affect performance much (NIR -> NAK is just a linear pass)
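A sketch of what "just a linear pass" means here, using toy enums rather than anything from the real code: because NAK ops map nearly 1:1 onto NIR instructions, the extra IR costs one O(n) walk instead of another optimization pipeline.

```rust
// Toy IRs: each NIR instruction lowers to exactly one backend op.
enum NirInstr {
    FAdd,
    FMul,
}

enum NakOp {
    FAdd,
    FMul,
}

fn emit_nak_op(i: &NirInstr) -> NakOp {
    match i {
        NirInstr::FAdd => NakOp::FAdd,
        NirInstr::FMul => NakOp::FMul,
    }
}

// One walk over the instruction list -- linear in shader size.
fn lower(instrs: &[NirInstr]) -> Vec<NakOp> {
    instrs.iter().map(emit_nak_op).collect()
}
```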
17:34 gfxstrand[d]: mangodev[d]: Fossilize will spread out to as many threads as your CPU can handle. So 36 on my desktop.
17:37 mangodev[d]: snowycoder[d]: ah okay, makes sense
17:37 mangodev[d]: gfxstrand[d]: i'm *very* aware of that part 😭
17:37 mangodev[d]: although
17:37 mangodev[d]: why does radv fossilize so fast?
17:38 gfxstrand[d]: So fast compared to what?
17:38 mangodev[d]: from what i've seen, it's instant for them (for most games)
17:38 snowycoder[d]: It could be that, since radv is much more widely used, it's downloading cached shaders
17:39 mangodev[d]: gfxstrand[d]: anything else
17:39 mangodev[d]: (my only exposure to fossilize myself is Nvidia and Nouvidia, it's the only graphics card in my possession)
17:39 gfxstrand[d]: With RADV, you might be getting shaders they compiled in the cloud and cached. Especially if you're on a steam deck.
17:40 mangodev[d]: funny that NVK, a relatively early driver, already compiles a little faster than proprietary for me :cursedgears:
17:40 snowycoder[d]: mangodev[d]: But it shouldn't be way faster compared to nvidia :/
17:40 snowycoder[d]: Even nvidia uses cloud caching
17:40 gfxstrand[d]: Proprietary spins up LLVM.
17:41 mangodev[d]: gfxstrand[d]: fair, probably nowhere near as many Nvidia users as RADV users or even AMDVLK users
17:41 mangodev[d]: (on Linux)
17:41 mangodev[d]: gfxstrand[d]: oh
17:41 mangodev[d]: that'd explain it then
17:41 mangodev[d]: snowycoder[d]: it's not
17:41 mangodev[d]: it's just a touch faster
17:41 gfxstrand[d]: I've got plans to make NVK compile faster. I just need to get to dusting off old branches
17:42 gfxstrand[d]: My predication branch makes NVK compile about 2x as fast
17:42 notthatclippy[d]: fossilize tends to hit a particularly pathological case in the proprietary NV driver, so it scales poorly, and at a certain point having more threads makes it overall slower due to thundering herd.
17:42 mangodev[d]: mangodev[d]: not much of a difference because i haven't bothered building any intense games, my setup can barely even run we ❤️ katamari reroll as it is (yes, the PS2 game)
17:42 snowycoder[d]: gfxstrand[d]: Woah, how can you do that?
17:43 mangodev[d]: gfxstrand[d]: prediction in what way? like branch prediction?
17:43 notthatclippy[d]: So it's totally plausible that NVK can do it significantly faster than proprietary
17:44 gfxstrand[d]: It massively reduces the number of control-flow edges that RA sees, and we take a pretty solid hit in RA for lots of basic blocks. But with predication, most of your trivial cases like SSBO bounds checks stop being real control flow and are now just a predicated instruction, so it cuts the number of basic blocks massively.
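A toy model of the if-conversion being described, in plain Rust rather than the actual IR: the bounds check becomes a predicate feeding one guarded instruction, so RA never sees extra basic blocks or control-flow edges.

```rust
/// Branchy form: `if idx < len { store }` ends a basic block and adds
/// control-flow edges that register allocation must walk.
/// Predicated form: one compare into a predicate register (ISETP) and
/// one guarded store (`@p ST`), all within a single basic block.
fn bounds_checked_store(buf: &mut [u32], idx: usize, val: u32) {
    let p = idx < buf.len(); // ISETP: compare into predicate `p`
    if p {
        buf[idx] = val; // emitted as a single `@p ST`, no branch
    }
}
```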
17:44 mangodev[d]: notthatclippy[d]: and not using LLVM helps too
17:44 gfxstrand[d]: It's a better than 2x speedup in DA:TV shader compilation. The game is 3-5% faster with it, too.
17:45 notthatclippy[d]: mangodev[d]: maaybe? But the issue is never the time it takes to compile the thing, it's all the lock contention from starting so many things in parallel
17:46 notthatclippy[d]: If you serialize it all to a single core/thread and compare the stacks that way, you'll be comparing the actual _compile_ speed.
17:46 gfxstrand[d]: NVK isn't thrilled about large numbers of contexts, either.
17:46 notthatclippy[d]: But when I looked at this last, on a typical fossilize run over 90% of the time was spent in the context init, particularly due to lock contention issues
17:46 mangodev[d]: gfxstrand[d]: ooh nice
17:46 mangodev[d]: killing one bird and lightly pelting another with one stone
17:47 gfxstrand[d]: Something like that, yeah.
17:47 mangodev[d]: llvm has always been slow on my system
17:48 mangodev[d]: notthatclippy[d]: that's probably part of why
17:48 mangodev[d]: in llvm languages, 99% of my compile time is on `LLVM Emit Object…`
17:48 gfxstrand[d]: notthatclippy[d]: Yeah, the GSP lock is a pain. There's work going on to try and fix this but it'll be a while before anything's released.
17:48 notthatclippy[d]: Nah, the issue with proprietary is mostly the locks in the kernel driver, and to the lesser extent that GSP is single-core. Arguably GSP should not be involved in this at all, but it is today.
17:49 mangodev[d]: although iirc LLVM also has a jit mode that's different than the full compile mode
17:49 notthatclippy[d]: At least with Nova I expect GSP will not be hit on this path at all. And assuming a sane locking model in the kernel, it should be fully scalable that way
17:49 mangodev[d]: notthatclippy[d]: single core arm chip? never knew that was even a thing
17:49 notthatclippy[d]: risc-v, not arm
17:50 mangodev[d]: oh? interesting
17:50 mangodev[d]: i thought the GSP was arm as foreshadowing to their massive business endeavors with arm corp
17:50 mangodev[d]: never knew they even touched risc-v, let alone that one is potentially in my system already
17:50 gfxstrand[d]: notthatclippy[d]: With NVK, you can work around this today by using the DRM shim to create a fake device and never touch the kernel.
17:51 notthatclippy[d]: In theory, all you really need to know is what GPU you're compiling for, right?
17:51 gfxstrand[d]: But if you want to run against the actual kernel, yeah, GSP lock is the enemy.
17:51 gfxstrand[d]: notthatclippy[d]: Pretty much
17:52 gfxstrand[d]: gfxstrand[d]: That's actually why I have dual-3060s in my test rig. It's not because the GPUs aren't fast enough. It's to halve the GSP lock contention.
17:52 notthatclippy[d]: And on proprietary, running `nvidia-smi` to just query the GPU model ends up hitting the GSP 5-10 times, and taking a kernel lock another 10 or so.
17:52 notthatclippy[d]: Are you running with Ben's blackwell patches, Faith?
17:53 gfxstrand[d]: Yeah. Well, I will be soon
17:53 gfxstrand[d]: But I've been running with 570 for a while
17:54 notthatclippy[d]: IIRC they get rid of the 0000/0080/2080 allocs on GSP for every process. I _think_ those are like half the traffic for context creation (and teardown! which is often worse than creation)
17:54 mangodev[d]: gfxstrand[d]: i'm sorry for your loss
17:55 gfxstrand[d]: I haven't benchmarked it to be sure but I think 570 shaved about 5 min off a CTS run. It's also more stable.
17:56 mangodev[d]: gfxstrand[d]: compared to 565, or to NVK?
17:56 gfxstrand[d]: Compared to 565. It's all NVK
17:56 gfxstrand[d]: Just switching GSP versions
17:57 notthatclippy[d]: 565 or 535?
17:57 mangodev[d]: gfxstrand[d]: wait what
17:57 mangodev[d]: what versions are we talking, i thought we were talking about proprietary drivers for a sec
17:57 gfxstrand[d]: Uh... 535 maybe? Whatever is currently upstream
17:57 notthatclippy[d]: Yeah, 535. Ben has a tree that supports every version inbetween, but the latest tree is just 535 and 570.
17:57 gfxstrand[d]: mangodev[d]: GSP firmware. It's versioned the same as the proprietary driver
17:58 mangodev[d]: gfxstrand[d]: is this a hardware thing or a firmware thing?
17:58 mangodev[d]: gfxstrand[d]: ah okay, that makes sense
18:01 mangodev[d]: btw i found half the reason as to why the cursor stutters so much with discord voice calls
18:01 mangodev[d]: it's because the voice call menu has an autohide cursor
18:01 mangodev[d]: there's something that either Zink or NVK doesn't like about moving a cursor between monitors to a window that has an auto-hide cursor
18:02 mangodev[d]: it hitches more when the system is under contention
18:02 mangodev[d]: so it's definitely busy with something, i just don't know what
18:02 mangodev[d]: there's also some lag when switching windows, may be a kde issue though
18:03 mangodev[d]: never noticed it on proprietary, but that's also because the whole system was too stuttery to notice any micro stutter :|
18:04 mangodev[d]: Firefox has been crashing a lot less though, either a fix on firefox's part, on mesa's part, or on my part from toying with some configs
18:05 mangodev[d]: still crashes every now and then, but it's getting rarer each passing day
18:05 mangodev[d]: now i only get it once or twice a day instead of >30 times a day
18:06 mangodev[d]: i wonder if i could debug the discord soft crash, because it happens a lot and is far more debuggable than the Firefox crash
18:06 mangodev[d]: it even gives stack traces
18:07 mangodev[d]: it gives me so much information in logs when it crashes, from a stack trace to a crash dump to exact warnings
18:07 mangodev[d]: it's debugging heaven
18:08 mangodev[d]: i love how semi-smooth the desktop experience is so far on NVK, i mainly use it because of school and native Wayland applications
18:08 mangodev[d]: almost everything i use is in native Wayland, so the double perf boost helps a ton
18:09 mangodev[d]: although i think resizing xwayland windows segfaults the driver :/
18:09 mangodev[d]: either zink, nvk, or both not playing nice
18:10 mangodev[d]: fullscreening works fine, and resizing for a little bit works, but resizing a decent amount or for a decent amount of time is almost guaranteed to hard crash; i think something is done out of order and a pointer is lost
18:11 mangodev[d]: it seems sustained resizes are what make it crash and burn
18:16 mangodev[d]: but most things are really nice on nvk :)
18:16 mangodev[d]: the desktop is generally great to use, even if features are lacking in some areas
18:17 mangodev[d]: i hope blender goes all-in and makes a Cycles-X Vulkan backend
18:18 mangodev[d]: that way the entire program can run purely on vulkan alone, no proprietary (literal definition) tech stacks like CUDA, ROCm, or oneAPI required
18:19 gfxstrand[d]: Okay, both machines now on Fedora 42 and my kernel is building again.
18:20 mangodev[d]: games are getting better over time, NVK is getting to the point where it can start handling most 2d games, may start gaming again and testing the driver's bounds
18:21 mangodev[d]: is there a way to start renderdoc as a wrapper program or launch arg? i feel like that'd be a way easier way to use renderdoc on games that run through launchers like steam
18:21 gfxstrand[d]: Yeah, you can cobble together some environment variables. I don't remember how off the top of my head, though.
18:21 mangodev[d]: gfxstrand[d]: oooh interesting
18:22 gfxstrand[d]: It really needs a clever wrapper script like mangohud has
18:22 gfxstrand[d]: It's just a Vulkan layer at the end of the day
18:23 mangodev[d]: the only ways i'm aware of is
18:23 mangodev[d]: - built in support to the program itself
18:23 mangodev[d]: - launching the game through the exe path in renderdoc itself
18:23 mangodev[d]: - having a really really fast reaction time and somehow hooking renderdoc after the window is created, but before the graphics API is initialized
18:24 mangodev[d]: gfxstrand[d]: speaking of
18:24 mangodev[d]: how do you get mangohud to run on nvk? mine crashes looking for nvctrl
18:25 mangodev[d]: mine used to work, but mangohud or nvk must have updated and mangohud is insistent that i'm using the proprietary driver and thus crashes instantly
18:27 gfxstrand[d]: I've never had it not work. But also, I haven't updated in a minute so maybe something has changed?
18:27 airlied[d]: karolherbst[d]: There is a fix in my branch
18:28 karolherbst[d]: oh, I might have seen that one
18:28 karolherbst[d]: yeah, I should try it out
18:28 mangodev[d]: gfxstrand[d]: strange
18:28 mangodev[d]: and i haven't found any flags for telling mangohud to ignore the nvidia card
18:28 mangodev[d]: even goverlay crashes trying to open mangohud
18:29 mangodev[d]: maybe spoofing the device name would work?
18:30 mangodev[d]: if it sees `Noveedia© GeeTeeEx® 16600 Superb™` it shouldn't suspect a thing
18:31 snowycoder[d]: Yesss! Kepler linear image storage works!
18:37 gfxstrand[d]: Woohoo! Looks like nouveau booted my card
18:39 gfxstrand[d]: airlied[d]: Is the latest all in the MR?
18:41 airlied[d]: gfxstrand[d]: mesa is just my branch, I haven't made an MR
18:43 gfxstrand[d]: Branch name?
18:43 airlied[d]: nvk-wip-gb20x
18:54 mangodev[d]: looks like a good day for architecture support
19:48 mhenning[d]: gfxstrand[d]: Does proprietary actually hit LLVM for spirv compiles? I assumed they just went directly to an internal IR, but I don't actually know
19:49 pac85[d]: mangodev[d]: It has an implicit layer and you really only need to set an env var to enable it.
19:49 pac85[d]: Iirc you can just run `renderdoc-cmd env` to see which one it is
19:54 pac85[d]: Should be `ENABLE_VULKAN_RENDERDOC_CAPTURE=1 `
19:54 pac85[d]: Then you can connect with the GUI to the running instance and capture
19:55 mhenning[d]: gfxstrand[d]: I'm still not the biggest fan of that particular predication approach
19:56 gfxstrand[d]: mhenning[d]: Yeah, they're all LLVM these days
20:12 gfxstrand[d]: I'm tempted to squash this whole branch
20:13 mhenning[d]: the blackwell stuff?
20:13 gfxstrand[d]: yeah
20:14 mhenning[d]: might make sense to rebase first, since a few bits and pieces have landed already
20:15 gfxstrand[d]: I already did
20:15 mhenning[d]: but yeah might make sense
20:15 gfxstrand[d]: There are so many reworks of reworks of reworks of reworks
20:17 airlied[d]: I'd probably start by making it work 🙂
20:18 airlied[d]: I think figuring out how to upstream it was the step after it works
20:18 airlied[d]: it's also why I was trying to get a GH100 running, because it might be good to incrementally enable hopper then blackwell
20:30 airlied[d]: but yeah a squash to reset might also be fine since there's a lot of back-n-forth around the opcode changes
20:31 gfxstrand[d]: Yes but also we could be a bit more methodical about changes. 🙃
20:44 airlied[d]: only if we knew in advance how it worked 🙂
20:45 airlied[d]: but once QMDs are in, then the real work has to start 🙂
20:46 mohamexiety[d]: will push in a sec but I added in major/minor version and another guess for upper program address (90% certain this guess is wrong tho but eh)
20:47 snowycoder[d]: I have a Blackwell card too, you can also throw work my way
20:48 snowycoder[d]: But it's in my only pc so I'll wait for things to get a bit more stable
20:51 mohamexiety[d]: ok pushed: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34331/. if you apply it on top of dave's branch you only need the top 2 commits (and one of them will conflict I think; the stub compute header)
20:57 mohamexiety[d]: this leaves these:
20:57 mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_API_VISIBLE_CALL_LIMIT` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_API_VISIBLE_CALL_LIMIT_NO_CHECK` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SAMPLER_INDEX` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SAMPLER_INDEX_INDEPENDENTLY` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_BARRIER_COUNT` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SHADER_LOCAL_MEMORY_HIGH_SIZE` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find value `QMDV05_00_SHADER_LOCAL_MEMORY_LOW_SIZE` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find function, tuple struct or tuple variant `QMDV05_00_CONSTANT_BUFFER_SIZE_SHIFTED4` in module `clcdc0`
20:57 mohamexiety[d]: error[E0425]: cannot find function, tuple struct or tuple variant `QMDV05_00_CONSTANT_BUFFER_VALID` in module `clcdc0`
20:57 mohamexiety[d]: I am not sure what some of these mean tbh; what does shader_local_memory_high/low_size represent, for example? the api_visible_call_limit as well. the others are self-explanatory, but I am not entirely sure how to get some of them
21:02 airlied[d]: I think enabling an extra ubo might help find the valid bits
21:04 mohamexiety[d]: how so? 😮
21:04 mohamexiety[d]: also what does the const buffer hold anyways? part of what makes me unsure how to test for the size is I don't know what it holds
21:05 mohamexiety[d]: (I know it sounds stupid but the impression I got is it's for more than just constants :KEKW:)
21:05 mhenning[d]: constbuf 0 has a few things in it - root descriptors and push constants
21:05 mhenning[d]: other constbufs are mostly used for ubos
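A hypothetical sketch of the kind of per-draw root data constbuf 0 might carry; the field names here are invented and this is not NVK's actual layout:

```rust
// Invented layout for illustration only -- the real root descriptor
// table lives in NVK and looks different.
#[repr(C)]
struct RootDescriptorTable {
    /// Base GPU addresses of the bound descriptor sets.
    descriptor_set_addrs: [u64; 8],
    /// Raw push-constant bytes, copied in at bind time.
    push_constants: [u8; 128],
}
```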
21:06 karolherbst[d]: LDSM is such a disaster of an instruction...
21:07 mohamexiety[d]: mhenning[d]: I see, thanks!
21:09 karolherbst[d]: it's so wild.. it's kinda powerful, but also...
21:09 karolherbst[d]: just have to figure out how to properly map it to the layouts
21:11 karolherbst[d]: gfxstrand[d]: do we have any infrastructure to figure out what instructions are doing based on nak or something? Like where I can just create a stream of instructions and run it on the GPU
21:12 mhenning[d]: karolherbst[d]: look at hw_tests.rs
21:13 karolherbst[d]: and I also need shared memory, but if that's not supported I can also hack it in
21:13 karolherbst[d]: ahh cool, thanks
21:25 gfxstrand[d]: Test case 'dEQP-VK.api.smoke.triangle'..
21:25 gfxstrand[d]: Pass (Rendering succeeded)
21:25 gfxstrand[d]: Okay, rebase succeeded
21:35 gfxstrand[d]: How long do y'all think a CTS run will last?
21:36 tiredchiku[d]: at least 5
21:36 tiredchiku[d]: :ha:
21:36 airlied[d]: gfxstrand[d]: I think I nearly got one completed, but it takes days
21:36 airlied[d]: since there are so many device losts
21:36 gfxstrand[d]: Pass: 673, Fail: 2471, Crash: 350, Skip: 4005, Flake: 1, Duration: 1:24, Remaining: 8:54:05
21:37 airlied[d]: like there are a lot of compute shader tests
21:37 gfxstrand[d]: Yeah
21:37 gfxstrand[d]: Any progress on QMDs?
21:38 airlied[d]: what mohamexiety[d] said above, feel free to try and accelerate the RE of them, I'm pinging nvidia weekly, but nothing yet
21:38 airlied[d]: (like they know about it, but nothing published)
21:40 airlied[d]: I was mostly dealing with ./deqp-vk --deqp-case=dEQP-VK.glsl* area
21:41 airlied[d]: but gh100 is really pissing me off
21:59 gfxstrand[d]: Where do I find the latest QMD patches?
22:00 gfxstrand[d]: Or I can just keep playing with fragment shaders for a bit
22:01 mohamexiety[d]: mohamexiety[d]: ^
22:01 mohamexiety[d]: karolherbst[d]: do you remember what CTS you were using that had the mutable sparse fails?
22:02 karolherbst[d]: I've filed an issue
22:02 karolherbst[d]: details should be there
22:14 mohamexiety[d]: well that's funny then. it's not failing on my end :thonk:
22:15 mohamexiety[d]: I'll try CTS 1.4.2.1 tomorrow but that was released after the fails on your end
22:17 gfxstrand[d]: tip-of-tree fails
22:23 mohamexiety[d]: Yeah, this was 25.0.5; will try main and such tomorrow
22:24 mhenning[d]: I thought I fixed the mutable sparse fails in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34646
22:25 mhenning[d]: oh wait no nevermind different group of tests
22:41 gfxstrand[d]: gfxstrand[d]: I meant tip-of-tree CTS. But yes, also mesa
22:41 gfxstrand[d]: mohamexiety[d]: ^^
22:45 mohamexiety[d]: Oh yeah will try that too. Thanks!
22:46 gfxstrand[d]: Woop! Killed my GSP...