00:49mangodev[d]: mhenning[d]: so do changes like https://gitlab.freedesktop.org/mesa/mesa/-/commit/6b8a4e6bb73117e1141fe80e6d8fdfe5d2a39d33 improve the performance of the compiler, or is this future-proofing/just for testing for now?
00:53gfxstrand[d]: mangodev[d]: That improves the performance of the generated code.
00:53mangodev[d]: oooh alr
00:53mangodev[d]: was considering rebuilding my drivers to include that commit, but didn't know if it'd have any tangible changes
00:54mangodev[d]: good to know
00:54gfxstrand[d]: And it's running now. There's nothing just for testing about it.
00:54mangodev[d]: testing? who needs testing :cursedgears:
00:55mangodev[d]: also hiii ms. nvk :D
00:56mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357519276126240920/image.png?ex=67f07fd8&is=67ef2e58&hm=010293b331c6816fcc1c85d1d59a55ddfb9971a289df678df87087e1f61f717d&
00:56mangodev[d]: i love updating drivers >:)
00:56mangodev[d]: very solid way to do so with plenty of fallback
00:58mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357519724396937327/image.png?ex=67f08043&is=67ef2ec3&hm=5ba0c473723d2ed83313ec2443dd4bf4f33da4f5f4263d7a4b44eb191aa46ad2&
00:58mangodev[d]: curious about how much i can strip from this (to improve compile time)
00:59mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357519920811741476/image.png?ex=67f08072&is=67ef2ef2&hm=4f49971a21e6be143d65d99729e5ca158a258c7b0685eada89c35254267a849e&
00:59mangodev[d]: i *think* this is all nvk/nouveau needs?
01:00orowith2os[d]: Virgl, svga, softpipe, and LLVMpipe could probably go
01:00orowith2os[d]: I don't remember if there was anything needing them?
01:00mangodev[d]: i mean
01:00mangodev[d]: isn't it good to have llvmpipe as a fallback driver?
01:00orowith2os[d]: Yes, but if you're testing NVK, you don't want it to use LLVMpipe without you knowing
01:00mangodev[d]: and i forgot what softpipe does
01:01mangodev[d]: orowith2os[d]: i'm using nvk rn
01:01orowith2os[d]: The fewer drivers available, the less to build, and less to debug
01:01gfxstrand[d]: It's llvmpipe but worse
01:01mangodev[d]: gfxstrand[d]: oh yeah then i should yoink that then
01:01orowith2os[d]: orowith2os[d]: Your choice though. Doesn't make a big difference in the end.
01:01mangodev[d]: and wait
01:01mangodev[d]: i don't even think nvk supports virgl yet
01:01orowith2os[d]: Isn't svga an openvg driver...?
01:01mangodev[d]: or vmware svga
01:01mangodev[d]: OH WAIT
01:01mangodev[d]: isn't vmware-svga only needed on the client?
01:02mangodev[d]: or is it needed both sides
01:02orowith2os[d]: Oh, old svga. The new one is a VM driver.
01:02mangodev[d]: *oh*
01:03mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357520853910163596/image.png?ex=67f08151&is=67ef2fd1&hm=18066cc3478bccc7863eab09fedef912a3232fdc0ae064ef53283696e7943295&
01:03mangodev[d]: this should have full functionality (with llvmpipe as a backup in case nouveau/nvk is unstable)
01:03orowith2os[d]: Are you running a VM?
01:04mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357521049956385029/image.png?ex=67f0817f&is=67ef2fff&hm=cc915e5e39b4466ba385de953e7915632aedfe5c7a3cb43fe13723fcce5ba47c&
01:04mangodev[d]: and this is probably full functionality too
01:04gfxstrand[d]: Honestly, `-Dvulkan-drivers=swrast,intel,amd,nouveau -Dgallium-drivers=llvmpipe,iris,radeonsi,zink` should be all anyone needs these days.
01:04mangodev[d]: orowith2os[d]: i'm not in a vm, no
01:04mangodev[d]: this is on my host pc
01:04orowith2os[d]: Get rid of virtio and virgl
01:04mangodev[d]: done
01:04orowith2os[d]: Those are only useful in a vm
01:04mangodev[d]: orowith2os[d]: ohhhh okay okay makes sense
01:05mangodev[d]: mistook them as being required to accelerate a vm
01:05mangodev[d]: not the graphics driver for the client
01:05mangodev[d]: iirc vulkan swrast is lavapipe?
01:05gfxstrand[d]: mangodev[d]: Yes
01:06mangodev[d]: okay good, what i thought
01:06airlied[d]: mhenning[d]: nice with the texture bits, I've moved to having a lot more mmu faults than illegal instruction encodings
01:06mangodev[d]: lavapipe is surprisingly fast
01:06mangodev[d]: runs minecraft at a solid 20fps
01:06gfxstrand[d]: We should probably rename it now that the GL drivers use their real names.
01:06airlied[d]: I think we have some address field that lost 8 bits off the end
01:06orowith2os[d]: mangodev[d]: On Zink?
01:06mangodev[d]: orowith2os[d]: no, mod that uses vulkan directly
01:07orowith2os[d]: Ah
01:07mangodev[d]: works amazing with nvk
01:07orowith2os[d]: Haven't tried it in a while. I should play with it again
01:07orowith2os[d]: Does it work with Wayland?
01:07mangodev[d]: orowith2os[d]: native
01:07orowith2os[d]: (not Xwayland?)
01:07mangodev[d]: nope
01:07mangodev[d]: native wayland windowing
01:07mangodev[d]: although sadly glfw doesn't support xdg-cursor-scale yet
01:08orowith2os[d]: That's fine
01:08mangodev[d]: and window icon is default
01:09mangodev[d]: which either means glfw doesn't support toplevel-icon, or the mod just didn't make the necessary means to make it work on wayland
01:09airlied[d]: hmm looks like a real address actually
01:10mangodev[d]: airlied[d]: in what context
01:10mangodev[d]: previous convo, or looking in the github?
01:10orowith2os[d]: mangodev[d]: Probably GLFW needing to do the Wayland bits
01:10mangodev[d]: fair
01:10mangodev[d]: glfw in my experience has kind of bad wayland support
01:10mangodev[d]: for cursor warp, it's just a hardcoded error 😫
01:11mangodev[d]: doesn't even try to check
01:11orowith2os[d]: Wayland limitation
01:11mangodev[d]: orowith2os[d]: iirc some compositors support it
01:11mangodev[d]: like wlroots
01:11airlied[d]: mangodev[d]: sorry parallel conversation
01:11orowith2os[d]: It's undefined behavior
01:12orowith2os[d]: I was involved in GLFW and Wayland a while back. I even patched it into PrismLauncher's Flatpak, with libdecor and everything
01:12orowith2os[d]: :glorp:
01:12mangodev[d]: airlied[d]: that's fine, kinda thought so (given this has happened other days too)
01:13mangodev[d]: ~~race condition~~
01:13mangodev[d]: orowith2os[d]: ouch
01:13mangodev[d]: wait
01:13mangodev[d]: then how does xwayland warp the cursor?
01:15mangodev[d]: or can the x server modify the wayland cursor position
01:17mhenning[d]: airlied[d]: yeah, that's possible. I think pre-blackwell, there's a case where ldg can lose offset bits that instead encode a predicate output depending on the instruction form. Wouldn't surprise me if they changed how some of that works
01:19mangodev[d]: gfxstrand[d]: speaking of llvmpipe
01:19mangodev[d]: the arch maintainer for `lib32-mesa-git` made a little oopsie :|
01:19mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357525050269503731/image.png?ex=67f08539&is=67ef33b9&hm=725a138cfb292a3e9eb66b495fa87bd86bb4408673d81679713b5456f3a37862&
01:19mangodev[d]: spot what's missing
01:21mangodev[d]: they forgot to add nvk to the lib32 package
01:21mangodev[d]: that's probably why a lot of proton games ran so miserably slow for me :|
01:21mangodev[d]: because the next best driver was lavapipe :/
01:21mangodev[d]: i mean, impressive that my cpu could run no man's sky at 2fps
01:21mangodev[d]: on lowest settings
01:26mangodev[d]: also
01:26mangodev[d]: does anything use the mesa screenshot layer?
01:26gfxstrand[d]: Woof
01:26gfxstrand[d]: File a bug?
01:27mangodev[d]: gfxstrand[d]: well i don't think it'd be worth it rn because both `lib32-mesa-git` and `mesa-git` are frozen due to a mesa build bug
01:27mangodev[d]: i just don't run into the bug
01:27mangodev[d]: so i can build latest
01:27mangodev[d]: oh wait
01:28gfxstrand[d]: It's still worth informing them of the issue, especially if the 64-bit package is okay.
01:28mangodev[d]: issue finally closed?
01:28mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357527213716803734/image.png?ex=67f0873d&is=67ef35bd&hm=2eff15edcf2b54ff4382bb646d5229787cfbc31605cc1e1a6407dea7e509b656&
01:28mangodev[d]: wh-
01:29orowith2os[d]: mangodev[d]: The Wayland compositor gets to say
01:29orowith2os[d]: There's a dedicated Xwayland cursor grab protocol, for example
01:29orowith2os[d]: I think xwayland can only warp and modify other x11 windows otherwise
01:29orowith2os[d]: It might also use libei, which will go through a portal
01:30mangodev[d]: gfxstrand[d]: i'll do so when i make an account
01:32orowith2os[d]: orowith2os[d]: ~~cursor~~ keyboard
01:33orowith2os[d]: I'll move more Wayland talk to Discord channel #tech-talk, if you want more
01:34mangodev[d]: also, out of curiosity
01:34mangodev[d]: what is the `vram-report-limit` vulkan layer for?
01:38mangodev[d]: gfxstrand[d]: nevermind
01:38mangodev[d]: i think it's intended
01:39airlied[d]: uggh the mmu fault is an illegal kind on the z buffer
01:40mangodev[d]: mangodev[d]: never nevermind
01:40mangodev[d]: the script doesn't include rust
01:44airlied[d]: skeggsb9778[d]: any ideas on mmu pte changes that might affect pte kinds? I'm setting Z16 kind 1 on the buffer, but getting an mmu fault for illegal kind
01:46orowith2os[d]: snowycoder[d]: Kepler things: who's all focusing on Kepler? It was you and someone else, right?
01:47orowith2os[d]: :akipeek:
01:55airlied[d]: d32_sfloat seems to work, all the others give mmu faults on kind
01:58airlied[d]: which makes sense since that is GENERIC memory
01:59leftmostcat[d]: Net -33 lines and that's with me indulging my love of whitespace.
02:05skeggsb9778[d]: airlied[d]: yeah, i think _PITCH/_GENERIC/_GENERIC_COMPRESSIBLE/_DISABLE_PLC are your valid options now
02:06skeggsb9778[d]: i only took a quick look into it so far though
02:06airlied[d]: the openrm codebase still has defines for Z* ones, but I'll probably have to RE the prop driver to figure it out
02:09skeggsb9778[d]: have a look at memmgrChooseKindZ_TU102 vs _GB202
02:11airlied[d]: oh oops didn't spot that
02:21gfxstrand[d]: airlied[d]: Maybe they're all generic now?
02:21airlied[d]: seems like it
02:28airlied[d]: fixed in my branch now
02:50airlied[d]: might be a bit adventurous and throw a run-deqp at it!
03:10gfxstrand[d]: leftmostcat[d]: Read the first patch and I'm already liking it. 🥰
03:19leftmostcat[d]: There are a few ugly bits in there, but I tried not to do anything outrageous.
03:47redsheep[d]: orowith2os[d]: The only other person I have seen going into detail on kepler recently has been Faith
04:16mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1357569367340220539/image.png?ex=67f0ae7f&is=67ef5cff&hm=21badf8af627474826019ee8ecb65b6d27e3989dacfb38d734a14b7e8555f5af&
04:16mangodev[d]: having trouble trying to compile nouveau for lib32 😭
04:17mangodev[d]: seems to be trying to use a 64 bit avx512 build of blake3, for some reason?
04:17mangodev[d]: my cpu doesn't even *support* avx512, so i don't think it's the native target
05:40mangodev[d]: might roll back
05:40mangodev[d]: feels more stuttery than last ver
05:42airlied[d]: mhenning[d]: just fyi the texture instructions are encoded okay, but no texturing tests pass
05:52airlied[d]: did you look at the 0xf61 instruction?
06:20airlied[d]: mhenning[d]: btw what is the second column in the opclass dumps?
07:13moogai: Why is nvidia on linux so horrible still :(
07:13moogai: nouveau drivers work okay, but it crawls along - nvidia open source drivers are super broken
08:09mangodev[d]: moogai: to be expected, nvk is *far* newer than proprietary
08:09mangodev[d]: and nouveau for the longest time was working off practically zero documentation
08:26moogai: well the proprietary drivers are also crap I think, I guess I could try them again but I doubt they are substantially less broken, last I used them they were also bad
08:27moogai: It's just disappointing is all, just wish Nvidia sucked less
08:27tiredchiku[d]: what hardware
08:28tiredchiku[d]: though the proprietary drivers have always been performant
08:57snowycoder[d]: orowith2os[d]: Faith helps when she has time. If you want to help, you're very welcome, I'm really new with vulkan drivers
10:57x512[m]: moogai: Because of https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst and the lack of an officially documented Linux graphics driver platform API like Windows WDDM. There wasn't even an OpenGL ICD driver API on Linux until Nvidia invented GLVND.
12:01gfxstrand[d]: airlied[d]: They may have juggled the operand order on us again.
12:02gfxstrand[d]: snowycoder[d]: I'm mostly just chucking the odd patch over the wall. You're doing most of the work.
12:02gfxstrand[d]: I haven't actually plugged in a Kepler in months.
12:24gfxstrand[d]: moogai: Without knowing what hardware you have, that's an impossible question to answer.
13:06moogai: x512[m]: somehow AMD and intel manage, and nvidia has much more resources. And sure, Windows may have better platform APIs for drivers, but then you still have the rest of it, which is windows
13:07moogai: Card is: VGA compatible controller [0300]: NVIDIA Corporation GA107GLM [RTX A2000 8GB Laptop GPU] [10de:25ba] (rev a1)
13:08moogai: tiredchiku[d]: the proprietary drivers are fast enough, but it kind of messes up with suspend, and it keeps getting confused about screens when I add/remove them
13:09moogai: I'm sure it's fine for a server farm or desktop computer though
13:12orowith2os[d]: snowycoder[d]: Let me know if there's anything I can run - I have a 780 that should do the job
13:13moogai: The problem is that Nvidia think it's the dog when it's the tail.
13:13orowith2os[d]: orowith2os[d]: I was gonna run mesa-git on it at some point and see where it goes brrrrr, and if I could maybe do anything
13:30gfxstrand[d]: orowith2os[d]: I've got three Keplers in my box. I just haven't plugged them in because I've been working on other stuff.
13:31orowith2os[d]: I just want to do things, and have no idea where to start
13:31orowith2os[d]: :hammy:
13:32gfxstrand[d]: What do you want to work on?
13:32gfxstrand[d]: Or what general area do you want to help improve?
13:42orowith2os[d]: Kernel drivers would be fun, but I'd need to set up proxmox or something. I'm also not sure who else is hacking on those.
13:42orowith2os[d]: NVK seems like it's pretty well off already, maybe I could look at Vulkan extensions and just implement whatever looks simple and interesting to start off?
14:13gfxstrand[d]: There's lots of kernel work that needs to be done but I'm not sure where's best to get started.
14:13gfxstrand[d]: For NVK, helping out with Kepler is honestly a pretty good entrypoint if you've got the hardware.
14:14gfxstrand[d]: I'm less sure about features. Most of what's left is pretty big stuff.
14:14gfxstrand[d]: But maybe there's a little thing or two that's come out?
14:15gfxstrand[d]: I'm not aware of anything off hand, though.
14:16gfxstrand[d]: Compiler optimization is another area where we could always use help. Looking at shaders and finding instruction combinations where we could be emitting something more efficient and then figuring out how to optimize it in the compiler.
14:16gfxstrand[d]: But that can be a lot of red herrings, too.
15:51leftmostcat[d]: A couple of the outstanding clippy lints (and some of the ugliness in my MR) deal with `Box<Instr>`. I'm poking around a bit and it's not clear to me why `Instr` shows up boxed so many places. There are a couple places like https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/compiler/nak/calc_instr_deps.rs#L457 where the `Box` just seems like an unnecessary alloc/redirection. Is there any section of the code I should look at to understand the motivation better?
15:56mhenning[d]: leftmostcat[d]: The motivation is just that Instrs are large structs and we don't want to be copying them all the time. So, our core data structures all have Box<Instr> instead of Instr
15:56mhenning[d]: I've wondered about changing that but it's a very involved change
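A rough illustration of the trade-off mhenning describes, as a minimal sketch: `BigInstr` below is a made-up stand-in, not NAK's real `Instr`, and its field sizes are assumptions chosen only to make the struct large. Moving the struct by value copies all of its bytes, while moving a `Box` copies one pointer.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large instruction struct (NOT NAK's real `Instr`).
#[allow(dead_code)]
struct BigInstr {
    srcs: [u64; 16],
    dsts: [u64; 4],
    flags: u32,
}

fn main() {
    // Moving a `BigInstr` by value copies the whole struct.
    println!("BigInstr:      {} bytes", size_of::<BigInstr>());
    // Moving a `Box<BigInstr>` copies only a pointer; the payload stays on the heap.
    println!("Box<BigInstr>: {} bytes", size_of::<Box<BigInstr>>());
}
```

The cost on the other side is one heap allocation per instruction and a pointer chase on every access, which is what leftmostcat was asking about.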
15:58mhenning[d]: airlied[d]: Oh, I'm surprised that none pass - I thought they'd be mostly okay with some minor issues. I'll see what I can figure out from the cuda compiler
15:59mhenning[d]: airlied[d]: The second column is two other bits that are sometimes needed for instructions to encode properly
15:59leftmostcat[d]: I might see how much I can get away with changing signatures/usages to use refs instead of `Box` without changing the underlying data structures.
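What leftmostcat proposes here could look roughly like the sketch below, assuming a helper that only needs to read an instruction. `Instr`, its `latency` field, and the `latency` function are all hypothetical names for illustration. A parameter of type `&Instr` accepts both boxed and unboxed values, because `&Box<Instr>` coerces to `&Instr` through `Deref`, so the storage type can stay as it is.

```rust
// Hypothetical minimal instruction type, for illustration only.
struct Instr {
    latency: u32,
}

// Taking `&Instr` instead of `&Box<Instr>` keeps the signature independent
// of how the caller happens to store the instruction.
fn latency(instr: &Instr) -> u32 {
    instr.latency
}

fn main() {
    let boxed: Box<Instr> = Box::new(Instr { latency: 4 });
    let plain = Instr { latency: 6 };

    // `&boxed` is a `&Box<Instr>`; deref coercion turns it into `&Instr`.
    println!("boxed latency: {}", latency(&boxed));
    println!("plain latency: {}", latency(&plain));
}
```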
16:08mohamexiety[d]: mhenning[d]: btw are there any particular methods I should prioritize searching for or do I just go through the CTS tests that call into the things commented out?
16:08mohamexiety[d]: (for the missing method/class definitions)
16:29leftmostcat[d]: ...okay, turns out it's not too much work to ditch `Box` entirely, but this was a ridiculous thing for me to do, because I can't test the perf implications.
16:58mhenning[d]: mohamexiety[d]: I'm not really sure what to prioritize in terms of methods
17:23leftmostcat[d]: Pushed a branch anyhow, since I made the changes: https://gitlab.freedesktop.org/leftmostcat/mesa/-/commits/nak-no-box
17:23leftmostcat[d]: If someone wants to perf test it, that'd be swell, but I don't have strong feelings about it either way.
17:24orowith2os[d]: leftmostcat[d]: I wouldn't expect it to make too much difference, but I'm not sure. So long as there aren't too many frequent allocs.
17:24orowith2os[d]: Is nak really making enough allocs for it to matter?
17:24orowith2os[d]: You could jam some giant shaders into nak and see how it fares
17:25leftmostcat[d]: I'm not expecting it to _improve_ performance. Mostly just don't want to MR it if it has a negative impact.
17:26leftmostcat[d]: Is it possible to run `nak` independently? I don't have access to any relevant hardware.
17:27orowith2os[d]: I don't think it should be tied to hardware if you're testing compiler performance
17:27orowith2os[d]: You'll just need to build it as a separate binary
17:31mhenning[d]: leftmostcat[d]: Yes, you can run nvk under drm-shim and then run shader-db pipelines on it, which runs the compiler and can get compilation statistics
17:32mhenning[d]: you can measure compilation time that way
17:33leftmostcat[d]: Great, thank you.
17:34orowith2os[d]: Forgot drm-shim was a thing. Does it actually run the whole driver?
17:35mhenning[d]: I'm not sure if we'd consider a change that completely removes the boxes though. What I've considered doing in the past is to remove the Box around Instr and then box each of the structs in Op, so the type of Op wouldn't be behind a pointer
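The layout mhenning mentions (no `Box` around the instruction, but each per-op struct boxed inside the enum) can be sketched as follows. The types `OpFAdd`, `OpTex`, `OpInline` and `OpBoxed` are hypothetical and their fields are invented; this only shows the shape of the idea, not NAK's actual `Op`.

```rust
use std::mem::size_of;

// Hypothetical per-op payloads; field layouts are invented for illustration.
#[allow(dead_code)]
struct OpFAdd {
    srcs: [u64; 2],
    dst: u64,
}

#[allow(dead_code)]
struct OpTex {
    srcs: [u64; 12],
    dsts: [u64; 2],
    mask: u8,
}

// Payloads stored inline: the enum is at least as large as its biggest variant.
#[allow(dead_code)]
enum OpInline {
    FAdd(OpFAdd),
    Tex(OpTex),
}

// Payloads boxed per variant: the enum stays small, and the discriminant
// (which op this is) can be read without following a pointer.
#[allow(dead_code)]
enum OpBoxed {
    FAdd(Box<OpFAdd>),
    Tex(Box<OpTex>),
}

// Checking the op type only inspects the discriminant, not the heap payload.
fn is_tex(op: &OpBoxed) -> bool {
    matches!(op, OpBoxed::Tex(_))
}

fn main() {
    println!("inline enum: {} bytes", size_of::<OpInline>());
    println!("boxed enum:  {} bytes", size_of::<OpBoxed>());

    let op = OpBoxed::Tex(Box::new(OpTex { srcs: [0; 12], dsts: [0; 2], mask: 0xf }));
    println!("is_tex: {}", is_tex(&op));
}
```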
17:36mhenning[d]: orowith2os[d]: Yes, it runs the driver on top of a fake ioctl emulation layer. That layer does just enough to get the compiler to execute
17:37orowith2os[d]: Fun
18:09leftmostcat[d]: mhenning[d]: Due to performance considerations or other concerns?
18:11mhenning[d]: I think performance is the main concern, yes
18:23snowycoder[d]: mhenning[d]: Wait, you can use the cuda compiler for texture stuff?
18:24mhenning[d]: snowycoder[d]: Yep, textures are available in cuda
18:25mhenning[d]: eg. https://github.com/NVIDIA/cuda-samples/tree/master/Samples/3_CUDA_Features/bindlessTexture has an example
18:26mhenning[d]: see also the ptx docs: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#texture-instructions
18:28snowycoder[d]: orowith2os[d]: If you want to help on Kepler there's a list of things on the MR.
18:28snowycoder[d]: Also there's a bug on doom 2016 that should be not too hard to find with a bisection (it worked previously)
18:29snowycoder[d]: mhenning[d]: Thank you so much! I need this to debug Kepler texs.
18:29snowycoder[d]: I just assumed cuda = no graphics
18:31mhenning[d]: Yeah, cuda mostly doesn't have graphics features, but the texturing hardware can be pretty useful for loading data even in compute (textures used to have a different memory hierarchy than eg. LDG and therefore had pretty different performance characteristics)
18:48mohamexiety[d]: TIL 😮
19:00airlied[d]: mhenning[d]: but 91 and what other one?
19:02mhenning[d]: airlied[d]: 90 and 91
19:02HdkR: Turns out texture ops in cuda are as good as texture operations in GL/Vulkan compute :P
19:03mhenning[d]: You can see the bits set in the script https://gitlab.freedesktop.org/mhenning/re/-/blob/main/opclass/opclass.py?ref_type=heads
19:24airlied[d]: mohamexiety[d]: dEQP-VK.glsl.texture_functions.texture.sampler2d_fixed_fragment might be a place to start, but also anything with uldc usage, since I'm not sure I've gotten all of those right
19:29mohamexiety[d]: airlied[d]: Alright, on it
19:32airlied[d]: figuring out QMDs is probably the other major blocker, since we can't launch compute shaders right now
20:16leftmostcat[d]: With the caveat of not knowing how representative shader-db is as a sample, I ran the full shaders dir against nak + drm-shim both on main and on the no-box branch 100 times each. Total real time was 894.49 s and 876.86 s, respectively.
20:21redsheep[d]: The no-box branch is the faster one?
20:22leftmostcat[d]: Yes.
20:23leftmostcat[d]: I doubt it's a significant benefit in any sense, but at the least it doesn't seem to be a detriment (for that sample).
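To put the two totals leftmostcat reported in proportion, here is a small sketch of the arithmetic. The function name `relative_speedup` is made up; the inputs are the 894.49 s (main) and 876.86 s (no-box) totals over 100 runs quoted above. It says nothing about run-to-run variance, which the chat does not report.

```rust
// Fraction of the baseline time saved by the candidate (positive = faster).
fn relative_speedup(baseline_s: f64, candidate_s: f64) -> f64 {
    (baseline_s - candidate_s) / baseline_s
}

fn main() {
    let main_total_s = 894.49; // 100 runs on main, as reported in the chat
    let no_box_total_s = 876.86; // 100 runs on the no-box branch
    let runs = 100.0;

    println!(
        "per run: {:.2} s vs {:.2} s",
        main_total_s / runs,
        no_box_total_s / runs
    );
    println!(
        "relative change: {:.2}%",
        100.0 * relative_speedup(main_total_s, no_box_total_s)
    );
}
```

The difference comes out just under 2%, which fits the reading that the change is not a detriment for this sample.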
20:24mohamexiety[d]: airlied[d]: Yeah afaiu QMDs are usually uploaded in the command buffer so my hope is that what we have through envyhooks will be able to reveal a bit or at least give us hints
21:17mhenning[d]: leftmostcat[d]: Alright, might be worth considering then. Maybe post the patch and discussion can continue there
21:37orowith2os[d]: I can put sysprof to work and see if I can find anything else too
23:11mhenning[d]: karolherbst[d]: You said before that what we call the "yield flag" is actually part of the same field as the "cycle count". When yield=true, what do the low bits of that field mean?
23:13mhenning[d]: the blackwell disassembler calls it "?trans1" through "?trans11" when yield=true and "?wait1_end_group" through "?wait15_end_group" when yield=false
23:14mhenning[d]: and "?wait3_end_group" makes some sense to me but I have no idea what "?trans3" means here
23:46karolherbst[d]: it's also wait
23:46karolherbst[d]: no idea why it calls it trans?
23:47karolherbst[d]: but yeah.. the important part is the end_group and the reduced range of values
23:48karolherbst[d]: the point is more that it's an enum covering all those bits
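One way to picture the "single enum covering all those bits" reading, as a sketch under stated assumptions. The bit layout is hypothetical: it assumes a 5-bit field with the "yield" flag as bit 4 and the count in bits 0 to 3, which the chat does not confirm. Only the value ranges come from the disassembler names mhenning quotes (`?trans1` through `?trans11`, `?wait1_end_group` through `?wait15_end_group`), and the type and function names are invented.

```rust
// Hypothetical decoding of the combined scheduling field.
#[derive(Debug, PartialEq)]
enum Sched {
    // Disassembler prints "?transN" (yield bit set); per karolherbst this is also a wait.
    Wait(u8),
    // Disassembler prints "?waitN_end_group" (yield bit clear).
    WaitEndGroup(u8),
    // Anything outside the ranges mentioned in the chat.
    Unknown(u8),
}

// ASSUMPTION: bit 4 is the "yield" flag and bits 0..=3 hold the count.
fn decode(field: u8) -> Sched {
    let yield_bit = field & 0x10 != 0;
    let n = field & 0x0f;
    match (yield_bit, n) {
        (true, 1..=11) => Sched::Wait(n),
        (false, 1..=15) => Sched::WaitEndGroup(n),
        _ => Sched::Unknown(field),
    }
}

fn main() {
    for field in [0x03u8, 0x13, 0x1c] {
        println!("{:#04x} -> {:?}", field, decode(field));
    }
}
```

Treating the whole field as one enum, rather than a flag plus a counter, is what makes the reduced range in the yield=true case natural to express.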