00:00 karolherbst[d]: mhh looks like `nir_lower_bit_size` simply converts the value...
00:27 TranquilIty[m]: Random question, how feasible is getting competent (read: comparable to CUDA) compute working on the FOSS stack, if someone would work on it ?
00:27 TranquilIty[m]: With a custom API, let's say. So, no need to reimpl CUDA, but rather focus on stuff that could get written from scratch working with comparable performance.
00:27 TranquilIty[m]: Just to clarify, this is me asking if it's possibly something I could work on once I learn how stuff works, and not me asking someone else to work on this.
00:28 airlied[d]: probably depends on the workload you want to run
00:29 airlied[d]: vulkan/opencl can dispatch work, just how much effort needs to be put in to connect that to the app and optimise the compiler stuff
00:31 karolherbst[d]: CUDA isn't great because of the runtime, it's great because of everything around it
00:32 TranquilIty[m]: Hmm
00:32 karolherbst[d]: the biggest reason that CUDA is so successful is probably because nvidia shipped it on every GPU and students bored in uni were just doing fun things with CUDA
00:32 TranquilIty[m]: So just to make sure we got all of the ISA stuff figured out, and now it's just compiler optimization?
00:33 HdkR: Multiple companies have tried to take on CUDA. They've all failed multiple times because they usually don't quite understand why it's so popular.
00:33 karolherbst[d]: *cough* like AMD *cough*
00:33 HdkR: OneAPI will save us </s>
00:34 karolherbst[d]: the moment they shipped HIP only on the enterprise super expensive GPUs I knew it wouldn't beat CUDA
00:34 TranquilIty[m]: My goal here is not to beat CUDA in popularity, my goal here is to get something working that interested folks can use to get comparable perf to CUDA on the FOSS stack
00:34 karolherbst[d]: then just go with CL
00:34 TranquilIty[m]: Like, idk, Blender
00:34 karolherbst[d]: or Vulkan
00:34 orowith2os[d]: We have Rusticl and Vulkan compute for those though, don't we?
00:34 HdkR: You can get comparable perf on a FOSS OpenCL/Vulkan stack easily then
00:34 orowith2os[d]: Blender I know has Vulkan stuff in place, it's just not built with it right now, I don't think
00:34 HdkR: perf isn't what makes CUDA great, even if it is one component of it.
00:35 TranquilIty[m]: Can you ? From what I heard VK lacks the intrinsics needed (I only know VK, I have not done CUDA yet)
00:35 karolherbst[d]: yeah, that's just the boring part, spending tons of time on compiler work isn't the hard part
00:35 karolherbst[d]: it doesn't matter
00:35 karolherbst[d]: spirv is turing complete, you can do whatever you want
00:35 karolherbst[d]: rusticl is conformant on vulkan drivers
00:35 karolherbst[d]: so what's the point
00:36 HdkR: karolherbst[d]: Congrats again on that btw, I think I forgot to say.
00:36 karolherbst[d]: I let google beat me, because I was lazy 🙃
00:36 HdkR: :D
00:36 TranquilIty[m]: So, OpenCL on RustiCL will be just as fast for, say, Blender workloads as CUDA, plus minus like 10% ?
00:36 karolherbst[d]: I support gl_sharing tho
00:36 karolherbst[d]: so there is that
00:37 karolherbst[d]: TranquilIty[m]: something like that
00:37 TranquilIty[m]: Ah interesting hmm
00:37 karolherbst[d]: I think rusticl on zink on radv is like 5-7% slower than rusticl on radeonsi
00:37 HdkR: Don't need gl_sharing on ES/Vulkan only platforms 💦
00:37 karolherbst[d]: 😄
00:38 karolherbst[d]: yeah.. you won't but you need gl_sharing to run davinci resolve
00:38 karolherbst[d]: sooo...
00:38 karolherbst[d]: but it's all for the memes
00:38 karolherbst[d]: https://assets.chaos.social/media_attachments/files/114/043/534/892/107/052/original/6ad872ca60a02fcc.png
00:38 TranquilIty[m]: I am now really confused since I got two opposite pieces of info haha
00:38 TranquilIty[m]: Folks have been telling me that OpenCL & Vulkan compute is slow compared to CUDA on the same hardware
00:39 HdkR: haha
00:39 HdkR: TranquilIty[m]: Usually because someone at NVIDIA implemented some code path for CUDA only and never pushed it to the OpenCL or Vulkan implementation.
00:39 karolherbst[d]: TranquilIty[m]: I mean... it kinda depends
00:39 karolherbst[d]: CUDA has a lot of optimizations and hardware specific features
00:39 HdkR: But that's not the only case of course*
00:40 airlied[d]: it all depends on the workload and how much effort/polish you want to apply
00:40 karolherbst[d]: there is no reason you can't do the same with CL/VK
00:40 karolherbst[d]: just needs somebody to do it
00:40 TranquilIty[m]: karolherbst[d]: Like, in the compiler? Or runtime? Or
00:41 karolherbst[d]: depends on the feature
00:41 TranquilIty[m]: Ah hmm
00:41 HdkR: Mythbusters Polishing dorodango definitely applies.
00:42 TranquilIty[m]: Wouldn't that usually be in the form of intrinsics? Like, extending SPIR-V then NIR to make them be able to make use of the feature
00:43 HdkR: It can be. Like matrix multiply extensions make people's lives easier, but they aren't strictly necessary.
00:43 karolherbst[d]: at least it's just compiler work
00:43 HdkR: Until you start breaking out profile tools and digging in, it can be anything :D
00:44 karolherbst[d]: the thing is... none of this even matters if you don't have the tooling that's fun to use
00:44 HdkR: And that's still equating the ecosystem succeeding to being an optimization problem, which it strictly isn't
00:44 karolherbst[d]: we kinda need something like cuda-gdb just for mesa/vulkan
00:44 karolherbst[d]: or some GUI tool or whatever
00:44 TranquilIty[m]: Yea I mean, my goal here is to have something that can be used, it does not have to be as popular as CUDA
00:44 HdkR: karolherbst[d]: Make it for us. gimme.
00:45 karolherbst[d]: TranquilIty[m]: that already exists
00:46 karolherbst[d]: though maybe it's time for another API which is more fun to use
00:46 karolherbst[d]: dunno
00:46 HdkR: Vulkan 2.0
00:46 TranquilIty[m]: Hmm
00:46 HdkR: This time with less mobile constraints
00:46 orowith2os[d]: What's the problem with Vulkan on mobile?
00:47 orowith2os[d]: Aiui it's mostly just Android drivers not being conformant. Can't fix that.
00:47 HdkR: Mobile's the reason why the API is a nightmare.
00:47 HdkR: And a bit of older desktop parts, but they dead yo.
00:47 TranquilIty[m]: So just to make sure I understand it right, OpenCL is good enough, but it is still quite slower than CUDA due to the compiler + drivers lacking paths to optimize CUDA stuff ? I heard that there's issues with the higher level API not being expressive enough but idk
00:47 karolherbst[d]: thing is.. doesn't really matter much for compute
00:48 TranquilIty[m]: I also gotta look up what cuda gdb is haha
00:48 karolherbst[d]: well
00:48 karolherbst[d]: it's gdb
00:48 karolherbst[d]: but you can set breakpoints in GPU code
00:48 karolherbst[d]: and the program just stops on the GPU
00:48 karolherbst[d]: like with normal gdb
00:48 karolherbst[d]: and you can read out GPU registers and everything
00:48 HdkR: 🤌 It's so good.
00:49 karolherbst[d]: yeah, it's like.. don't even dream about competing with CUDA if you don't have a cuda-gdb alternative
00:49 TranquilIty[m]: Oh huh
00:49 TranquilIty[m]: Now I want that
00:49 karolherbst[d]: yeah, it's awesome
00:49 TranquilIty[m]: I wonder how it works
00:50 karolherbst[d]: they removed support for debugging CL applications with it tho 🙃
00:50 karolherbst[d]: TranquilIty[m]: well.. you just put the GPU into single stepping mode
00:50 TranquilIty[m]: Oh, welp :/
00:50 karolherbst[d]: it's not even hard
00:50 karolherbst[d]: there is a trap handler you can set up
00:50 TranquilIty[m]: Oh huh, TIL
00:50 karolherbst[d]: and it handles well.. trap signals
00:50 karolherbst[d]: and a breakpoint is one of those
00:51 HdkR: It's basically exactly how a CPU debugger does it.
00:51 karolherbst[d]: it's just GPU code handling the commands sent to it
00:51 TranquilIty[m]: Do we have docs or like, stuff that I could reference to try to get something hooked up x3
00:51 karolherbst[d]: need some communication channel, but that's just a ring-buffer or whatever you want to use
00:51 karolherbst[d]: we don't
00:51 TranquilIty[m]: Tho probably not a great introductory project ? Idk
00:51 TranquilIty[m]: Welp
00:51 karolherbst[d]: in principle we know how it works, but it needs more reverse engineering
00:52 orowith2os[d]: HdkR: I'll probably need some explanation on that, but now you've got me interested, and I might start up a termux session or smth...
00:53 TranquilIty[m]: karolherbst[d]: Ah, on the prop stack, with cuda gdb?
00:53 karolherbst[d]: the simpler thing to do is to have a trap handler which just dumps state, e.g. when you hit invalid memory or something
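(For a rough idea of the shape this takes, here is a minimal C sketch of the "trap handler that just dumps state" approach described above. Everything in it is hypothetical scaffolding: the real handler is GPU shader code set up by the driver, and all the names below are made up for illustration.)

```c
/* Conceptual sketch only: the "trap handler that just dumps state"
 * idea from the discussion above. In reality this is GPU shader code
 * installed by the driver; trap_dump, read_trap_*, read_gpr and
 * publish_dump are all hypothetical names. */
#include <stdint.h>

#define NUM_GPRS 256

struct trap_dump {
    uint32_t reason;         /* breakpoint, invalid memory access, ... */
    uint64_t pc;             /* program counter of the trapped warp */
    uint32_t gpr[NUM_GPRS];  /* general-purpose register contents */
};

/* Hypothetical hooks into the hardware/driver side. */
extern uint32_t read_trap_reason(void);
extern uint64_t read_trap_pc(void);
extern uint32_t read_gpr(int idx);
extern void publish_dump(struct trap_dump *d);

/* Entered whenever the hardware raises a trap signal; a breakpoint is
 * just one kind of trap, an invalid memory access is another. */
void trap_handler(struct trap_dump *slot)
{
    slot->reason = read_trap_reason();
    slot->pc     = read_trap_pc();

    for (int i = 0; i < NUM_GPRS; i++)
        slot->gpr[i] = read_gpr(i);

    /* Hand the dump to the host through a ring buffer (or whatever
     * communication channel you pick); a full debugger would instead
     * loop here, reading single-step/resume commands from that same
     * channel. */
    publish_dump(slot);
}
```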
00:53 HdkR: orowith2os[d]: What? The API being jank? Look no further than renderpasses my friend.
00:54 TranquilIty[m]: Hmm, do we have anything in Mesa using trap handlers that I could reference ?
00:54 karolherbst[d]: nope
00:54 orowith2os[d]: HdkR: Yeah, but, I don't have much experience other than seeing what Vulkan looks like from Rust, and a Vulkan tutorial
00:54 TranquilIty[m]: So I'd have to do reverse engineering of the comms that the prop driver does ?
00:55 orowith2os[d]: I wouldn't know what jank would be, for Vulkan
00:55 TranquilIty[m]: I like Vulkan
00:55 TranquilIty[m]: We ignore render passes
00:56 karolherbst[d]: vulkan became great with shader objects 😛
00:57 TranquilIty[m]: I really need to look properly into the more recent VK extensions
00:57 TranquilIty[m]: I could really use smth like cuda gdb rn on Intel for Vulkan as I debug a Vulkan app
00:57 karolherbst[d]: shader objects is the thing hated by every vendor besides nvidia
00:58 TranquilIty[m]: Oh?
00:58 karolherbst[d]: nvidia is the only vendor which has all those shader stages natively in hardware
00:58 karolherbst[d]: so for everybody else it's pain
00:59 TranquilIty[m]: Oooh lmao
01:00 karolherbst[d]: the mobile space usually only has vert and fp shaders in hardware
01:00 HdkR: Just tell those other vendors to get gud :P
01:00 karolherbst[d]: everything else.. well.. let's just say they have vertex shaders in hardware
01:01 HdkR: Hey, at least the new ones tend to have some sort of compute specific stage now, which might just be their tile stage rebranded :P
01:01 karolherbst[d]: oh right
01:01 orowith2os[d]: TranquilIty[m]: Not the Vulkan validation layers?
01:01 karolherbst[d]: mesh and task shaders, where everybody else was like "ah yeah, let's run it on the compute stage" and nvidia was like: ".... you know what? We have tes and vertex shaders and..."
01:01 TranquilIty[m]: orowith2os[d]: ? I am not sure I understand the question
01:02 HdkR: You take your Vertex A stage, and your Vertex B stage, and then it feeds in to your GS stage. Everyone wins.
01:02 karolherbst[d]: it makes perfect sense
01:02 orowith2os[d]: TranquilIty[m]: The validation layers make sure the API usage is correct. They can't help outside of that, but depending on what you're debugging, they might be helpful
01:02 karolherbst[d]: shared memory is a pain, but....
01:02 karolherbst[d]: just wing it and hope nobody actually uses it
01:03 HdkR: Shared memory is such a rats nest of UB that the developer likely stepped on a landmine anyway. Can blame it on them.
01:03 karolherbst[d]: it's funny, because in the 3D stage you got no shared mem on nvidia
01:04 karolherbst[d]: I think unused vertex attribute space is used to implement it or something?
01:04 HdkR: Random things are shared yea
01:05 HdkR: You don't really need anything more than butterfly shuffles anyway, use registers as your shared memory.
01:05 karolherbst[d]: well.. on mobile chips it's just system RAM 🙃
01:05 karolherbst[d]: HdkR: yeah pain...
01:05 airlied[d]: shader objs were nvidia's revenge for descriptor buffers
01:05 karolherbst[d]: I already have to deal with shuffles in my coop matrix layout conversion code
01:05 karolherbst[d]: it's great
01:05 HdkR: :D
01:05 karolherbst[d]: just do... uhm.. 8 shuffles in a row
01:05 karolherbst[d]: because reasons (tm)
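(Not the actual coop-matrix conversion code, just a minimal OpenCL C illustration of the "butterfly shuffles, registers as your shared memory" point above; the kernel is arbitrary and it assumes the cl_khr_subgroup_shuffle extension is available.)

```c
/* Minimal OpenCL C sketch of the "registers as your shared memory"
 * point: a butterfly (XOR) shuffle reduction that leaves every lane
 * holding the subgroup sum without touching local memory.
 * Assumes cl_khr_subgroup_shuffle is supported. */
#pragma OPENCL EXTENSION cl_khr_subgroup_shuffle : enable

__kernel void subgroup_sum(__global const float *in, __global float *out)
{
    float v = in[get_global_id(0)];

    /* Each step exchanges with the lane whose id differs in exactly one
     * bit, so after log2(subgroup_size) steps every lane has the sum. */
    for (uint mask = get_sub_group_size() / 2u; mask > 0u; mask /= 2u)
        v += sub_group_shuffle_xor(v, mask);

    out[get_global_id(0)] = v;
}
```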
01:06 TranquilIty[m]: orowith2os[d]: I am getting writes into the depth buffer that I would not expect, as they do not equal interpolated gl_Position .z/.w
01:06 TranquilIty[m]: The only thing I can think of rn is to run it with a software impl of Vulkan and attach GDB
01:07 HdkR: Early-Z and Fragment depth go brrr without barriers?
01:08 gfxstrand[d]: snowycoder[d]: FYI: I just merged the large `SSARef` MR which is going to conflict with the SM30 stuff we haven't landed yet. If you want me to help with the rebase, just say the word.
01:21 gfxstrand[d]: TranquilIty[m]: I think the answer that everyone is dancing around is that you CAN get a competent compute application running today with Vulkan compute or OpenCL. The problem is that the dev environment sucks and no one has a plan to fix that. But if you're dedicated to making something work and are okay with kinda shitty tools, you can totally build that app today.
01:22 karolherbst[d]: mesa-gdb when
01:22 gfxstrand[d]: Like, there was a presentation at this year's Vulkanised about porting a high-energy astrophysics package to Vulkan. It can totally be done. It's just a sucky experience.
01:23 orowith2os[d]: We have LLMs and upscalers and whatnot running on top of Vulkan, and Davinci on Rusticl (on Vulkan), so it's all definitely possible, and already out in the wild
01:24 gfxstrand[d]: (Said high-energy astrophysics package was also ported to Rust at the same time.)
01:24 orowith2os[d]: I know [Upscaler](https://flathub.org/apps/io.gitlab.theevilskeleton.Upscaler) uses some model that makes use of Vulkan
01:24 karolherbst[d]: the thing is...
01:25 karolherbst[d]: at some point you want to debug
01:25 karolherbst[d]: and you also want great perf
01:25 karolherbst[d]: and you get those two things by having great tooling
01:25 gfxstrand[d]: Oh, I'm not arguing with that
01:26 karolherbst[d]: anyway.. mesa-gdb when 😄
01:28 karolherbst[d]: could we hook into gdb like that without a fork?
01:28 karolherbst[d]: I honestly have no idea where to begin even
01:28 gfxstrand[d]: We'd have a lot better chance of getting into upstream GDB than NVIDIA
01:28 gfxstrand[d]: But IDK
01:28 karolherbst[d]: not sure doing it in upstream gdb will be viable, given that it would require us to have some weird ABI
01:28 karolherbst[d]: it requires shader code
01:28 TranquilIty[m]: <gfxstrand[d]> "Tranquil Ity: I think the answer..." <- Oh hmm
01:29 karolherbst[d]: so that's going to be lots of fun
01:29 karolherbst[d]: maybe could have a custom vulkan ext that exposes tons of weirdo internal stuff, but...
01:29 airlied[d]: there's been some discussions around dwarf extensions
01:29 karolherbst[d]: dwarf extensions is one thing
01:29 karolherbst[d]: it requires compiled GPU binaries
01:30 karolherbst[d]: and I know that the upstream answer to that is "yeah, packagers are just compiling 10000 versions of it and..."
01:30 karolherbst[d]: though maybe just have a binary blob in gdbs source code
01:31 karolherbst[d]: don't have to compile it all the time
01:31 karolherbst[d]: or well.. assembly
01:31 gfxstrand[d]: Yeah, we would need some sort of way of loading the debug info on-demand through the live shader compiler, not from the on-disk binary.
01:31 karolherbst[d]: ohh that's another issue
01:31 karolherbst[d]: what I mean is, doing the actual debugging also requires GPU shader code
01:31 karolherbst[d]: like the debugger interfaces with the shader trap handler being part of the GPU pipeline
01:32 karolherbst[d]: there are 3D/compute methods to set it all up
01:32 karolherbst[d]: and I think it also requires mmio whacking
01:32 karolherbst[d]: though that might work from userspace
01:32 gfxstrand[d]: Yeah. All the trap stuff would also have to go through the driver as well.
01:32 karolherbst[d]: yeah...
01:32 karolherbst[d]: and I don't think we want to have a super stable ABI there
01:33 gfxstrand[d]: Not sure. We'd want to build it first and then iterate for a while and maybe eventually stabilize something.
01:33 karolherbst[d]: could be some internal API between mesa and gdb to set it all up and add some reflection to figure out what registers the GPU has and stuff..
01:33 karolherbst[d]: yeah...
01:33 karolherbst[d]: so that was my initial thought
01:34 karolherbst[d]: can we just provide a gdb plugin built inside mesa or something
01:34 karolherbst[d]: or have a mesa-gdb wrapper that sets everything up
01:34 karolherbst[d]: not sure how nvidia does it
01:35 airlied[d]: the other option is some sort of gdb server
01:35 airlied[d]: you treat it like a remote target
01:39 gfxstrand[d]: That's not a bad plan.
01:45 gfxstrand[d]: Or at least not a bad general shape of a plan. There's a hell of a lot of details in "some sort of GDB server".
01:45 HdkR: gdbserver is the route yea, you can load all your objects from memory with the remote filesystem commands.
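(To make the remote-target idea slightly more concrete, a minimal C sketch of a host-side stub. The $payload#checksum framing and the '?'/'g'/'c' packets are standard GDB remote serial protocol; the GPU backend calls are invented placeholders, not anything that exists in mesa today.)

```c
/* Sketch of the "treat the GPU as a gdb remote target" idea: a stub
 * that frames GDB remote-serial-protocol packets ($payload#checksum)
 * and forwards a few requests to a hypothetical driver backend.
 * gpu_read_regs_hex() and gpu_resume() are made-up names. */
#include <stdio.h>
#include <stddef.h>

extern void gpu_read_regs_hex(char *buf, size_t len); /* hypothetical */
extern void gpu_resume(void);                         /* hypothetical */

static void send_packet(FILE *out, const char *payload)
{
    unsigned sum = 0;
    for (const char *p = payload; *p; p++)
        sum = (sum + (unsigned char)*p) & 0xff;
    fprintf(out, "$%s#%02x", payload, sum); /* standard RSP framing */
    fflush(out);
}

static void handle_packet(FILE *out, const char *payload)
{
    switch (payload[0]) {
    case '?':                        /* "why did the target stop?" */
        send_packet(out, "S05");     /* stopped with SIGTRAP */
        break;
    case 'g': {                      /* read general registers */
        char regs[4096];
        gpu_read_regs_hex(regs, sizeof(regs));
        send_packet(out, regs);
        break;
    }
    case 'c':                        /* continue execution */
        gpu_resume();
        break;                       /* stop reply comes on the next trap */
    default:
        send_packet(out, "");        /* empty reply = unsupported */
    }
}
```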
01:46 gfxstrand[d]: But hey, we wrote down a good enough description for a PPT slide. Design work done. Right?
01:49 airlied[d]: just stick it in the backlog and we can run sprints 🙂
01:50 airlied[d]: reminds me I've another few hours of agile training to imbibe
01:50 karolherbst[d]: there is agile training?
01:52 airlied[d]: there will be 🙂
01:53 karolherbst[d]: pleasant
01:53 gfxstrand[d]: The agile training will continue until productivity and morale improve.
01:54 karolherbst[d]: agile training AI edition
01:54 gfxstrand[d]: AIgile training?
01:54 airlied[d]: gfxstrand[d]: the prod/morale statement is entirely true 🙂
01:55 airlied[d]: it's like until the happiness numbers go up, we are going to mandate training on things that don't make you happy
01:56 gfxstrand[d]: And that's why the training never ends...
01:58 airlied[d]: not sure if the paint fumes from painters are making this better or worse
02:06 swee: does this channel offer help for the new NVK driver too?
02:09 mhenning[d]: yes, this channel covers nvk
02:10 swee: yea so recently I updated mesa and got NVK working, but I found Falkon segfaulting from time to time (especially interacting with a site)
02:10 swee: I was able to get these logs, stdout: https://p.swee.codes/swee/72513a50a886402db6c1d5deff6361df
02:11 swee: dmesg: https://p.swee.codes/swee/affa1b4392de4e2289836753409c86da
02:12 swee: I'm not very sure if it's nvk+zink or nouveau finally updated to a stable ver on alpine
02:14 mhenning[d]: What do you get from ` glxinfo | grep "OpenGL renderer string" `?
02:14 swee: OpenGL renderer string: NV118
02:14 swee: fyi: GPU is NVIDIA GeForce 940MX
02:16 gfxstrand[d]: It's nouveau GL
02:16 orowith2os[d]: What Mesa version?
02:16 swee: 25.0.5-r0
02:17 orowith2os[d]: Yeah, as Faith said - NGL, 25.1 did the swap to Zink iirc
02:17 gfxstrand[d]: Not on that card. Only for Turing+
02:17 orowith2os[d]: Blech
02:18 orowith2os[d]: :glorp:
02:18 mhenning[d]: From the stdout, I think Falkon might be hitting nvk directly (using vulkan, not zink)
02:18 gfxstrand[d]: I should hack on CPU descriptors and maybe we can flip to Zink for Kepler+ starting with 25.2.
02:19 gfxstrand[d]: mhenning[d]: Nah, NVK would use a higher address for the semaphore report.
02:20 gfxstrand[d]: We allocate top-down. Kernel allocates bottom-up.
02:21 swee: currently just using DRI_PRIME=0 for now, https://p.swee.codes/swee/a94d65c390a74fb2ae53819fc147a66b
02:22 mhenning[d]: gfxstrand[d]: The printouts look like nvk debug stuff to me. Unless something changed recently the GL driver doesn't print things out in that format
02:22 gfxstrand[d]: Yeah, I think karolherbst[d] hooked the printer up to the GL driver.
02:22 mhenning[d]: Ah, okay. Might be a gl bug then
02:22 gfxstrand[d]: Looks like something with render targets.
02:23 gfxstrand[d]: I haven't decoded the failed method datas to try and figure out what
02:23 gfxstrand[d]: But those are render target methods
02:24 mhenning[d]: based on what? the dmesg?
02:24 gfxstrand[d]: Yeah
02:24 gfxstrand[d]: 0808 is SET_COLOR_TARGET_WIDTH
02:25 mhenning[d]: ah, makes sense
02:25 airlied[d]: I hooked it up a while back, my in-brain hex to push converter wasn't working
02:25 swee: there seems to be a lot of this in the dmesg:
02:25 swee: nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 6013d4 [ PRIVRING ]
02:26 mhenning[d]: airlied[d]: right, I was just confused because the gl driver predates the headers and therefore that naming convention
02:27 mhenning[d]: swee: I have a feeling we won't figure this out right away. could you file a bug on the bug tracker: https://gitlab.freedesktop.org/mesa/mesa/-/issues
02:30 mhenning[d]: You're using the nouveau gl driver right now. If you want, you could also try switching to nvk+zink using environment variables and seeing if that works better (requires a very recent mesa)
02:30 swee: how do I do that?
02:32 mhenning[d]: oh, wait, 25.1 isn't actually released yet, is it?
02:34 mhenning[d]: you'd probably want a 25.1 release candidate and then to set `NOUVEAU_USE_ZINK=1`
02:35 mhenning[d]: I think that would get you to nvk+zink - you can check by seeing if zink is mentioned in ` glxinfo | grep "OpenGL renderer string" `
02:36 gfxstrand[d]: No, it just hit RC3. Probably released in another week or two. I think we usually do 4 RCs.
09:18 snowycoder[d]: gfxstrand[d]: Don't worry, I'll rebase it, I also need to fix some opcodes because they now assert, oops.
09:18 snowycoder[d]: I'm just a bit short on time with work, sorry
09:21 snowycoder[d]: gfxstrand[d]: You really think Kepler+ is ready?
13:13 gfxstrand[d]: snowycoder[d]: No worries
13:14 gfxstrand[d]: snowycoder[d]: Maxwell+ is mostly fine. It's just slow because everything is bindless but that's fixable.
13:14 gfxstrand[d]: Kepler needs to be finished, obviously.
13:48 snowycoder[d]: gfxstrand[d]: Bindless is that much slower than bound?
13:50 gfxstrand[d]: It does take a hit, yeah. Especially for UBOs
13:50 gfxstrand[d]: If it was just textures, it might be alright. UBOs are gonna murder us, though.
15:01 mangodev[d]: gfxstrand[d]: funny thing, i've found minecraft shaders to be an easy way to test the performance of different parts of a game
15:01 mangodev[d]: easily toggleable (and hand-writeable) components that can be separately toggled and modified
15:01 mangodev[d]: admittedly it's ogl and not vulkan, which would be more ideal because it's direct to the driver, but it still shows parts that run slower than expected
15:01 mangodev[d]: i'm replying to this because ubo's feel slow no matter what, even with binding
15:02 mangodev[d]: although anti-aliasing takes an alarmingly massive performance hit 🫠
15:02 mangodev[d]: even fxaa is a little hefty of a cost on frame rate, and that's normally supposed to be effectively free
15:03 mangodev[d]: shadows surprisingly don't cost a crap ton anymore, although they *could* cost less
15:04 mangodev[d]: didn't someone here say anti-aliasing is so slow because of a lack of dcc?
15:04 mohamexiety[d]: I'd imagine MSAA is yeah
15:05 mangodev[d]: I was gonna say
15:05 mangodev[d]: i don't see why dcc would help with fxaa, taa, smaa, etc
15:07 mangodev[d]: because all of those visibly struggle on nvk
15:07 mangodev[d]: taa doesn't cause a drop from 120 to 90 for most drivers afaik
15:08 mangodev[d]: (mangohud crashes on nvk so i can't gather ms frame times)
15:11 gfxstrand[d]: Newegg says my 5090 is out for delivery. 🥳
16:27 mhenning[d]: gfxstrand[d]: want to review a patch? this one's tiny I promise https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34591
16:37 gfxstrand[d]: Looks good
18:28 gfxstrand[d]: Ugh... lower_pack is madness.
18:29 gfxstrand[d]: mhenning[d]: Did you have patches to improve packing lowering at one point? I don't want to duplicate effort here but I found another pack opt loop. 😩
18:32 gfxstrand[d]: I'm not seeing anything
18:41 gfxstrand[d]: I'm tempted to just turn off nir_lower_packing
18:43 karolherbst[d]: probably the best option
19:20 gfxstrand[d]: This just in: NIR's data [un]packing is way too complicated
19:30 gfxstrand[d]: Damnit! My machine needs a reboot and my PiKVM isn't plugged in.
19:30 gfxstrand[d]: Let's see if a soft reboot works. If not, I may be going in to the office today after all.
19:33 gfxstrand[d]: Soft reboot worked. :silvy_sweat:
20:18 mhenning[d]: gfxstrand[d]: I don't think so
20:19 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34849
20:19 gfxstrand[d]: Haven't run any sort of shader-db on it but it fixes some bugs I'm seeing in some new CTS tests where they get caught in a NIR lowering/opt loop because of pack/unpack lowering.
20:20 gfxstrand[d]: It's all prmt to us
20:23 airlied[d]: I had to enable op.lower_pack_64_4x16 = true; in my coop work, but I can't remember why
20:30 karolherbst[d]: isn't that the fix I landed for iris and lp?
20:30 karolherbst[d]: but yeah.. `lower_pack_64_4x16` was the fix to those opt loops issues
20:30 karolherbst[d]: anyway, it's a disaster
20:34 gfxstrand[d]: That might be more backportable
20:34 gfxstrand[d]: But I think the above patch should backport okay
20:35 gfxstrand[d]: airlied[d]: Yeah, it's the 4x16 case that breaks. There's an opt_algebraic rule that conflicts with lower_pack
20:44 gfxstrand[d]: My primary concern is that if we disable `lower_pack`, we might lose out on some ability to optimize things in NIR. But IDK if that's actually true or not
20:45 gfxstrand[d]: Looks like RADV already pretty much dropped it in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6527 so we're probably okay to do the same.
20:46 airlied[d]: should probably at least shaderdb it
20:49 gfxstrand[d]: Ugh... My shader-db is on the other desktop. 😢
20:57 airlied[d]: the modern-day "I left my wallet in my other pants"
21:00 gfxstrand[d]: Pretty much
21:00 gfxstrand[d]: That means I either need to set it up all over again or wait a week
21:01 airlied[d]: or ask mhenning[d] 🙂
21:01 gfxstrand[d]: Or that
21:01 airlied[d]: I'm sure I have had access to a more complete shaderdb in the past, but I'm also sure I've totally forgotten any other details
21:03 mhenning[d]: I don't have the largest collections of shaders here either, but I can run it
21:14 gfxstrand[d]: airlied[d]: Okay, I added `op.lower_pack_64_4x16 = true` as a back-port before my patch which then deletes it.
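(For reference, that backport is essentially a one-line flag flip in the driver's NIR compiler options. A minimal sketch, assuming the usual `nir_shader_compiler_options` struct; the `lower_pack_64_4x16` field name is taken from the messages above, the wrapper function is made up.)

```c
/* Sketch of the backport-style fix discussed above: lower the
 * 64 = 4x16 pack/unpack case up front so it can't ping-pong with the
 * conflicting opt_algebraic rule. fill_nir_options() is a made-up
 * placeholder for wherever a driver initializes its options. */
#include "nir.h"

static void
fill_nir_options(nir_shader_compiler_options *op)
{
    op->lower_pack_64_4x16 = true;
}
```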
21:29 gfxstrand[d]: I'm very intrigued by these new sparse fails. But I'm gonna let mohamexiety[d] take a crack at them first
21:29 gfxstrand[d]: Must control my debugging itch...
21:33 mohamexiety[d]: yep, will do tomorrow!