IRC Logs of #nouveau on irc.freenode.net for 2025-03-12

00:38 airlied[d]: if I have " st.global.a64.strong.gpu.u16 [r8..10] r5 // delay=1 rd:0
00:38 airlied[d]: r0 = prmt r2 [0x32] rZ // delay=6
00:38 airlied[d]: st.global.a64.strong.gpu.u16 [r8..10+0x2] r0 // delay=2 rd:1
00:38 airlied[d]: " what is stopping me from getting a 32-bit store instead of 2 16bit ones?
00:39 mhenning[d]: alignment maybe
00:39 karolherbst[d]: address alignment
00:40 karolherbst[d]: there is this cool nir pass which can merge those loads taking alignment information into account
00:40 karolherbst[d]: I'm sure nak already uses it tho
00:40 mhenning[d]: The relevant optimization is nir_opt_load_store_vectorize
00:40 mhenning[d]: and yes we do call it
00:41 airlied[d]: yes but for some reason that also hasn't kicked in when I expect it, but maybe alignment is an issue
00:41 karolherbst[d]: I didn't mean the vectorize one
00:41 mhenning[d]: it may also be broken apart in nir_lower_mem_access_bit_sizes
00:41 karolherbst[d]: nir_lower_mem_access_bit_sizes can do both
00:41 mhenning[d]: no? I don't think it optimizes at all
00:41 airlied[d]: 32 %76 = pack_32_2x16 %66
00:41 airlied[d]: 32x4 %77 = deref_ptr_as_array &(*%45)[%75] (ssbo float16_t) // &((Storage *)%41)->result[0][%75]
00:41 airlied[d]: 32x4 %78 = deref_cast (float *)%77 (ssbo float) (ptr_stride=4, align_mul=0, align_offset=0)
00:41 airlied[d]: 32x4 %79 = deref_ptr_as_array &(*%78)[%74] (ssbo float) // &(*(float *)%77)[%74]
00:41 airlied[d]: @store_deref (%79, %76) (wrmask=x, access=none)
00:41 airlied[d]: is the hacky NIR
00:42 karolherbst[d]: uhh wait.. nir_lower_mem_access_bit_sizes can only increase the bit_size if the load wasn't split in the first place...
00:43 mhenning[d]: right nir_opt_load_store_vectorize combines them. nir_lower_mem_access_bit_sizes runs later and breaks them apart
00:43 karolherbst[d]: airlied[d]: yeah.. needs align_mul to be 4
00:43 mhenning[d]: yep. whatever's generating it needs to set the alignment info
00:44 karolherbst[d]: one interesting thing to do might be to do a vec8 16 bit load/store with nir_opt_load_store_vectorize and then convert that to vec4 32 bit with nir_lower_mem_access_bit_sizes
00:46 mhenning[d]: not sure what you're suggesting
00:46 airlied[d]: ah yes align_mul combines them! thanks!
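A minimal sketch of the alignment reasoning in the exchange above, written in Python for illustration only (this is not Mesa code). It assumes the NIR convention that an access carrying `(align_mul, align_offset)` satisfies `addr % align_mul == align_offset`, and it treats `align_mul = 0` as "no alignment known". The helper names are hypothetical.

```python
def guaranteed_aligned(align_mul: int, align_offset: int, access_bytes: int) -> bool:
    """Return True if every address with addr % align_mul == align_offset
    is a multiple of access_bytes.

    Assumption: align_mul == 0 means no alignment information is available.
    """
    if align_mul == 0:
        return False
    # addr = k * align_mul + align_offset for some integer k, so both terms
    # must be multiples of access_bytes for the guarantee to hold for all k.
    return align_mul % access_bytes == 0 and align_offset % access_bytes == 0


def can_merge_u16_pair(align_mul: int, align_offset: int) -> bool:
    """Two adjacent 16-bit stores (base, base + 2) can only become one
    32-bit store if the base address is known to be 4-byte aligned."""
    return guaranteed_aligned(align_mul, align_offset, 4)


if __name__ == "__main__":
    print(can_merge_u16_pair(0, 0))  # no alignment info: stays as 2x 16-bit
    print(can_merge_u16_pair(4, 0))  # align_mul = 4: a 32-bit store is safe
```

Under these assumptions, the `align_mul=0` cast in the NIR dump above gives the vectorizer nothing to work with, while `align_mul=4` makes the merged 32-bit store provably aligned.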
00:47 karolherbst[d]: mhenning[d]: you could vectorize 8 16 bit load/stores to a combined vec8@16, and then let nir_lower_mem_access_bit_sizes convert that to a vec4@32 operation
00:47 karolherbst[d]: as the latter is actually supported in hardware
00:48 karolherbst[d]: though that of course requires quite the optimistic alignment guarantee
00:48 mhenning[d]: yeah, we already support 128-bit loads, although i forget if it's represented that way or as a 2x64 at the nir level
00:49 karolherbst[d]: I'd assume it's 4x32 for nak
00:49 karolherbst[d]: because that's how the hardware reasons about it
00:49 karolherbst[d]: but dunno
00:50 karolherbst[d]: might not matter when converting from nir to nak, I haven't checked
00:50 mhenning[d]: looking at it, I think we use either 4x32 or 2x64 at the nir level
00:50 mhenning[d]: and yeah, it doesn't matter how the nir breaks it apart, only the total bit size
00:51 karolherbst[d]: ahh so in theory 8x16 would also work?
00:51 karolherbst[d]: though not sure you want to deal with vec8s 😄
00:51 karolherbst[d]: unless you really have to
00:51 mhenning[d]: i think so?
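To illustrate the point that only the total bit size matters, here is a small Python sketch (not NIR or NAK code) that reinterprets eight 16-bit values as four 32-bit words. It assumes little-endian memory layout, and the function name is hypothetical.

```python
import struct


def vec8x16_to_vec4x32(halves):
    """Reinterpret 8 x u16 as 4 x u32 without changing the 128 bits in memory.

    Assumption: little-endian layout, so element 2*i lands in the low half
    of word i and element 2*i + 1 in the high half.
    """
    if len(halves) != 8:
        raise ValueError("expected exactly 8 16-bit values")
    raw = struct.pack("<8H", *halves)       # 8 * 16 bits = 128 bits
    return list(struct.unpack("<4I", raw))  # the same 128 bits as 4 * 32


if __name__ == "__main__":
    words = vec8x16_to_vec4x32([1, 2, 3, 4, 5, 6, 7, 8])
    print([hex(w) for w in words])
```

The same 16 bytes back both views, which is why a vec8@16 access and a vec4@32 access can map to one 128-bit memory operation, provided the address is sufficiently aligned.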
00:51 karolherbst[d]: I might have to look at more nvk/nak stuff soonish anyway
00:56 zmike[d]: Our first bug report!
00:57 airlied[d]: having to think in lanes is messing with my poor brain
00:59 karolherbst[d]: at least it's always 32 lanes
00:59 karolherbst[d]: could be worse
00:59 karolherbst[d]: though I like how nvidia has an actual query in the ISA to read out that 32
01:00 karolherbst[d]: as if there were plans to ever change it
01:00 karolherbst[d]: it's there since forever
01:00 gfxstrand[d]: karolherbst[d]: Yeah, NAK can handle whatever as long as things have the right alignments.
01:01 karolherbst[d]: I wonder if nvidia will ever use a different subgroup size
01:02 gfxstrand[d]: Probably not
01:02 gfxstrand[d]: Not without redesigning everything
01:02 karolherbst[d]: yeah...
01:02 karolherbst[d]: and doing whatever AMD did isn't a promising outlook either
01:04 karolherbst[d]: it's still funny that they keep the sysval around
01:04 airlied[d]: now I've made it go into an infinite opt loop trying to store a 64-bit value
01:05 airlied[d]: nir_lower_pack, copy_prop and algebraic getting into some sort of "optimise it harder, I told you already" loop
01:06 karolherbst[d]: ohhhhhh
01:06 karolherbst[d]: is it the one I fixed for other drivers?
01:06 karolherbst[d]: wait a sec
01:06 karolherbst[d]: airlied[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33347
01:06 karolherbst[d]: do the same for nak I guess
01:07 karolherbst[d]: also.. somebody should clean up this pack/unpack lowering mess
01:07 airlied[d]: oh indeed looks like that
01:09 karolherbst[d]: there is a `skip_lower_packing_ops` where you need to disable lowering if you support the op
01:14 gfxstrand[d]: karolherbst[d]: Should be easy enough to wire up. It needs a little prmt but that's not too bad.
01:18 gfxstrand[d]: Or we can just lower to 2x16 and 2x32 late. That'll do the same thing
01:51 airlied[d]: using 32-bit load/stores got me one more TFLOP 😛
02:21 HdkR: It's FLOP hunting season?
02:21 HdkR: Make sure to feed those deep pipelines with no bubbles :P
02:29 airlied: I'm more of a scattershot shotgun than a scalpel :-P
02:35 mohamexiety[d]: one big boom, then big perf on multiple fronts? 😝
02:37 HdkR: I do like optimizing like that. Delete the code doing the real work, see what the "speed of light" is, and see how close it's possible to get to that :D
02:39 HdkR: Nobody needed those real results anyway
02:45 airlied: thankfully nvidia have already demonstrated the speed of light for me :-)
02:45 airlied: and gave a nice talk at vulkanised
02:51 HdkR: Cheeky. I guess I don't often know how far away from SOL I am
03:11 gfxstrand[d]: Why are you doing the cool shit and I'm stuck fixing Kopper bugs?
03:28 airlied[d]: I think typing in latency timings was my advance penance for doing cool stuff 🙂
03:29 airlied[d]: I don't think I have the disk space or ram to build Firefox :-p
03:30 gfxstrand[d]: 🤭
03:30 gfxstrand[d]: Both valid points, TBH.
03:33 gfxstrand[d]: I will get to reviewing the latency stuff. It might be too big for tomorrow but I'll hopefully be able to get back to my code review pile in the next week or so
03:33 gfxstrand[d]: Assuming people stop finding new Zink bugs. 🙃
03:42 gfxstrand[d]: I also have an entire Asahi UAPI to review (because apparently reviewing new DRM driver APIs is my job now 🤷🏻‍♀️) and a Rust Mali driver.
03:43 gfxstrand[d]: Oh, maybe tomorrow I'll fire up my switch and run the CTS on it. 🤔
03:44 gfxstrand[d]: That sounds like an entertaining diversion
03:48 gfxstrand[d]: I could also try to fix Maxwell A. I think I know how. I just need to type the code.
04:54 airlied[d]: hell yeah 11 TFLOPS 😛
04:54 airlied[d]: ldsm usage got that one
04:55 orowith2os[d]: Hmmm... Would a Nvidia laptop be a good test for NVK?
04:55 orowith2os[d]: I kinda want to use the Nvidia GPU as if it were on a desktop, i.e. no igpu...
04:55 tiredchiku[d]: look for laptops with a mux switch
04:55 orowith2os[d]: Ah, and then just disable the igpu?
04:56 orowith2os[d]: I guess then I need to find a laptop cheap and thin enough that I can carry it in my bag
04:56 tiredchiku[d]: no need to disable it even, since the mux switch makes it so that the nvidia gpu will render to the panel
04:56 tiredchiku[d]: or you can get any laptop and pair it with an external display
04:56 tiredchiku[d]: the hdmi port on laptops is usually wired to the dGPU
04:57 orowith2os[d]: The only external display I have is my TV
04:57 orowith2os[d]: And I'm rarely in my room outside of sleeping
04:57 orowith2os[d]: So I'll probably want one with a mux switch
04:57 tiredchiku[d]: mhm
04:57 tiredchiku[d]: or
04:57 tiredchiku[d]: you could get one that only has the dgpu
04:58 orowith2os[d]: True, not sure how common those are though
04:58 tiredchiku[d]: some amd laptop chips have no iGPU
04:59 tiredchiku[d]: friend has an Asus TUF A15 that's like that
05:00 orowith2os[d]: Hmm
05:01 orowith2os[d]: I'll do some more looking, but I might end up getting an old laptop from my dad if he has one. I know he likes to get new stuff.
05:01 gfxstrand[d]: Gonna file endless "my laptop won't wake up!" bug reports until airlied[d] fixes all the suspend bugs? 😅
05:03 tiredchiku[d]: tbf even desktop doesn't wake up sometimes
05:03 tiredchiku[d]: on both nouveau and nvprop
05:04 tiredchiku[d]: or rather, it wakes up, but the display never comes up
05:05 gfxstrand[d]: Yeah, I'm definitely having issues with X11 failing to bring the display back up. My desktops are set up to never actually suspend, though. I need to be able to SSH into them.
05:06 orowith2os[d]: Do the suspend bugs carry over to Wayland?
05:06 orowith2os[d]: I won't be touching X11 with a ten-foot pole (hell, I'll be compiling it out of my system, so...)
05:06 tiredchiku[d]: yeah, I've had issues on wl
05:07 orowith2os[d]: Hmm
05:07 orowith2os[d]: Okay
05:07 orowith2os[d]: I'd like to see what I can do to help out a bit there
05:07 orowith2os[d]: I would REALLY like to mess around with the kernel, though
05:08 orowith2os[d]: Sounds like userspace is pretty okay ;)
05:08 airlied[d]: gfxstrand[d]: is that on 570 as well?
05:08 airlied[d]: oh fun times, have a bunch of fp16 phis and they get put each into a 32-bit register, then there's a lot of prmt going on
05:36 tiredchiku[d]: bisecting xserver for the cursor thing
05:36 tiredchiku[d]: :froge:
05:37 tiredchiku[d]: redsheep[d]: you won't see the bug if you use software cursor, check your env vars
05:57 tiredchiku[d]: this makes no sense
06:07 tiredchiku[d]: ok nvm it makes sense
06:08 airlied[d]: okay got to 11.4 by hacking out a bunch of phi scalarizing, reduced gprs to 91 and max warps from 12->20
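The relationship between GPR count and resident warps mentioned above can be sketched roughly as follows, in Python. Every hardware constant here is an assumption rather than something stated in the log: a 65536-entry register file per SM, per-thread GPR allocation rounded up to a multiple of 8, warps allocated in groups of 4, and a hardware cap of 48 warps. The function name is hypothetical.

```python
def max_resident_warps(
    gprs_per_thread: int,
    regfile_size: int = 65536,        # assumed 32-bit registers per SM
    threads_per_warp: int = 32,
    gpr_alloc_granularity: int = 8,   # assumed per-thread rounding
    warp_alloc_granularity: int = 4,  # assumed warp allocation step
    hw_max_warps: int = 48,           # assumed architectural limit
) -> int:
    """Estimate how many warps fit in the register file at once."""
    if gprs_per_thread <= 0:
        raise ValueError("gprs_per_thread must be positive")
    # Round the per-thread register count up to the allocation granularity.
    gprs = -(-gprs_per_thread // gpr_alloc_granularity) * gpr_alloc_granularity
    regs_per_warp = gprs * threads_per_warp
    warps = regfile_size // regs_per_warp
    # Round down to the warp allocation granularity.
    warps -= warps % warp_alloc_granularity
    return min(warps, hw_max_warps)


if __name__ == "__main__":
    print(max_resident_warps(91))
```

With these assumed constants, 91 GPRs rounds up to 96, giving 3072 registers per warp and 20 resident warps, which is consistent with the figure quoted above. Real limits also depend on shared memory and other per-SM resources that this sketch ignores.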
06:25 tiredchiku[d]: ok no this makes no sense
06:29 tiredchiku[d]: my xserver bisect has gone past the last release (by commit date)
06:29 tiredchiku[d]: and the cursor is still squished
06:33 tiredchiku[d]: this is super stinky
06:36 redsheep[d]: tiredchiku[d]: I wouldn't expect my cursor plane to conditionally appear in the kernel debug sysfs if it was software, and I also stopped using that env var a few weeks ago, it's commented out and was only ever set in my profile
06:37 tiredchiku[d]: okay just making sure, I didn't know you'd stopped using it
06:37 tiredchiku[d]: tiredchiku[d]: the config options are identical to the distro pkg..
06:37 tiredchiku[d]: :Hmmm:
06:37 redsheep[d]: I am just never seeing less than a 256x256 cursor plane and nothing I have done has changed that, if your bisect isn't identifying an xorg patch that should have caused that then idk
06:38 tiredchiku[d]: time to build arch's pkgbuild locally
06:39 tiredchiku[d]: inb4 it's caused by my makepkg config
06:40 redsheep[d]: It was also happening on fedora for Faith though, right? It could be config but if so fedora is also doing it
06:49 tiredchiku[d]: so
06:49 tiredchiku[d]: it is not my makepkg config
06:49 tiredchiku[d]: :angr:
06:53 tiredchiku[d]: bah
06:53 tiredchiku[d]: I'll investigate this more in the evening
07:27 tiredchiku[d]: oh ffs
07:28 tiredchiku[d]: I was bisecting the wrong set of commits
07:38 tiredchiku[d]: okay
07:38 tiredchiku[d]: xorg tagging is a mess, there's a lot of backports from master onto 21.1.15 to make 21.1.16
07:38 tiredchiku[d]: and the commits aren't even in the same order
07:39 tiredchiku[d]: ~~I think it'll be easier to wait for the next xserver release and bisect between those, if the issue still happens on the next tag~~
11:21 jumpingnemo: this thing to trim in the access of format is doable in classy fashion, but i tried to avoid to talk about it (but you get this anyhow). However here the final term was base-reference-value-higherbound+distance for example, so 202 is for 120index base 256-70+16, since high bound for 120 is 70 and low 66. now to a compiled sequence one adds, 66-16-16=34, so hence you can use all the spectrum
11:21 jumpingnemo: below 34-1 , so let's try 66+33 is the sequence, where 66 we do not touch it's for some X index, now 34+16+16 was 70, hence -202 comes out as -137 as , you add the known 136, and add back the original value 66+33 with twice distances removed. This boils down to 33+66+16+16-202+136=65+33+66 is 32+132=146 and 66+132-146=34 so 34+1-34 results in 1 , so 34-1 is 33, value extracted. So such is
11:21 jumpingnemo: the method under hood, and this is for real now. Spectrum is easy to analyse under supporting formulas.
12:02 snowycoder[d]: If anyone wants to check the parser out, it's in a useful state.
12:02 snowycoder[d]: I've added some basic unit tests for most optimization passes (if you have any suggestions I'll add them!).
12:02 snowycoder[d]: If anyone wants to leave a comment: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33691
12:08 gfxstrand[d]: airlied[d]: Yes
12:24 gfxstrand[d]: snowycoder[d]: Thanks! I'll take a look soon.
14:00 phomes_[d]: Not sure if this is the same issue. I disconnect/reconnect the hdmi cable at least once a day to get my display to come back. I am using an actual TV as screen so it turns off completely
16:36 gfxstrand[d]: Let's see if putting QMDs in their own heap is enough for Maxwell A or if I have to DMA them.
16:39 gfxstrand[d]: Gotta rebuild the CTS first to get rid of the annoying assert. :blobcatnotlikethis:
16:43 tiredchiku[d]: ironically, with all the fixes
16:44 tiredchiku[d]: znvk x11 is better than znvk wayland for me, at least on plasma
16:44 mhenning[d]: better in what way?
16:45 tiredchiku[d]: it handles high refresh rate + dp audio on my single monitor setup better than kwin wayland
16:46 tiredchiku[d]: kwin wayland does this:
16:46 tiredchiku[d]: https://discord.com/channels/1033216351990456371/1034184951790305330/1280868506044203030
16:46 tiredchiku[d]: jittery display that eventually causes a mode reset on any framerate above 120
16:47 tiredchiku[d]: I have a 180hz monitor, which gets handicapped to 165 on the nouveau kmod
16:50 tiredchiku[d]: 1440p
16:50 orowith2os[d]: Sounds like a bug in nouveau, where the nouveau xf86 driver does something different
16:50 orowith2os[d]: Or are you using modesetting?
16:51 orowith2os[d]: Could you try on GNOME or a standalone WM?
16:52 tiredchiku[d]: modesetting
16:53 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1349424992911098048/e8ea8e8e-1070-4c35-a2a6-b944aba762c9.jpg?ex=67d30d75&is=67d1bbf5&hm=75cf7a9ed07fc0982dbf5fcbc9306252de93afed966c4ce3246d1329f8cce2ba&
16:53 tiredchiku[d]: orowith2os[d]: sure, what do you want me to try
16:53 orowith2os[d]: OOF
16:53 orowith2os[d]: Weston, Gnome Wayland, and Sway, in that order, pretty pls
16:53 tiredchiku[d]: 🎵 yo ho yo ho a tester's life for me 🎵
16:54 tiredchiku[d]: 🅱️eston appears to be fine
16:55 mhenning[d]: orowith2os[d]: There's no nouveau xf86 driver after pascal
16:55 orowith2os[d]: Oh, silly
16:55 tiredchiku[d]: gnome also appears to be fine
16:55 orowith2os[d]: Sounds like a weird kwin thing, then
16:55 tiredchiku[d]: yeah, might go bully Xaver about it
16:56 tiredchiku[d]: xfce-wayland (wlroots based) is also fine
16:57 tiredchiku[d]: labwc backend for that
16:57 tiredchiku[d]: :evil:
16:57 mhenning[d]: It's possible kwin is just choosing the modes differently. It's not totally clear to me if this is a kernel or a kwin bug
16:57 tiredchiku[d]: Xaver I'm coming for you
16:57 tiredchiku[d]: oh
16:57 tiredchiku[d]: I should probably check the default mode they're all selecting
16:58 tiredchiku[d]: since it doesn't happen on 120hz and below
16:58 tiredchiku[d]: back to the beginning!
16:58 tiredchiku[d]: aha
16:59 tiredchiku[d]: :B:eston is on 60hz
16:59 tiredchiku[d]: as is gnome by default
17:00 tiredchiku[d]: moment of truth
17:00 tiredchiku[d]: !
17:00 tiredchiku[d]: Gnome is not fine at 165hz
17:03 tiredchiku[d]: neither is labwc/xfce-wayland
17:04 tiredchiku[d]: how do I change 🅱️eston's mode..
17:06 tiredchiku[d]: bah I shouldn't have said anything
17:06 tiredchiku[d]: kwin x11 is also not fine anymore
17:08 tiredchiku[d]: sounds like a kernel bug then :froge:
17:09 karolherbst[d]: is that on HDMI or DP?
17:10 tiredchiku[d]: deepee
17:10 tiredchiku[d]: I don't use HDMI, since this monitor only supports 180hz on DP
17:25 orowith2os[d]: tiredchiku[d]: Finicky little guy
17:25 orowith2os[d]: :blobcatnotlikethis:
17:25 tiredchiku[d]: :kittynod:
17:26 tiredchiku[d]: back to using front panel I guess
17:27 orowith2os[d]: Is anybody working on the kernel driver right now outside of uAPI for NVK?
17:28 orowith2os[d]: That and GSP are all I've heard
18:28 gfxstrand[d]: There's some bug fixing that happens
18:36 gfxstrand[d]: But it's all pretty ad-hoc as there aren't many of us working on any part of nouveau full-time.
19:49 mhenning[d]: Do we have a way of dumping QMDs that the blob generates?
20:08 pavlo_kozlenko[d]: tiredchiku[d]: It's scary to imagine how much it weighs and how many branded programs from graphics environments you have installed
20:08 pavlo_kozlenko[d]: ☠️
20:08 tiredchiku[d]: only the kde apps
20:08 gfxstrand[d]: mhenning[d]: The blob usually uses inline QMDs so you can see them pretty easily but the parser isn't smart enough to parse them.
20:09 airlied[d]: I should know this, what's the best way to dump a shader asm from the nvidia vulkan driver?
20:10 mhenning[d]: airlied[d]: https://gitlab.freedesktop.org/nouveau/nv-shader-tools/
20:12 mhenning[d]: gfxstrand[d]: You're saying it'll probably show up as inline data in an envyhooks trace?
20:52 gfxstrand[d]: yes
20:53 gfxstrand[d]: Got Maxwell B to survive a run:
20:53 gfxstrand[d]: Pass: 1105778, Fail: 27, Crash: 25, Skip: 1707655, Timeout: 16, Duration: 3:44:22, Remaining: 0
20:56 orowith2os[d]: Only 27 failing? Wow.
21:00 gfxstrand[d]: Yeah. My QMD pool trick seems to have worked to fix the SKED faults
21:08 gfxstrand[d]: airlied[d]: Does sparse binding have a different alignment requirement on older hardware?
21:08 gfxstrand[d]: skeggsb9778[d]: ^^
21:13 airlied[d]: I don't really know what's happening pre-turing
21:13 airlied[d]: oh removing some membars gives me 11->15TFLOPS, time to figure out what those are for 😛
21:18 gfxstrand[d]: woohoo
21:21 jumpingnemo: bwidawsk once said how I am going to conquer the world, yeah good cryptography artist who follows the rules of logics can conquer a lot indeed. I slightly trolled you as to show bad examples, how things would not work out, then showed the really correct ways how cryptographically things solve very simply, i hope you did not get too upset, my idea was to give enough number system tweaks and
21:21 jumpingnemo: produce enough data so you definitely start to grasp as to how to manipulate the spectrum range, those terms i gave are interfused in a dependent way, and it is utterly cool and simple to manipulate the spectrum with supported arithmetic on those variables in numeral system, for an example i could write a pdf how those fields influence each other as to what happens to very field when on is
21:21 jumpingnemo: changed, and how it influences the outcome, but i do not actually know whether your bosses need such high throughput systems, but the world starts to go mad and out of order anyhow, we might as well have a backup plan, and i do have it, i am working with compiler based tweaks now soon. But in contrast to you calling me insane, i would not call anyone of you in the era stupid or insane, i
21:21 jumpingnemo: love life , and this UNIX might need new breathing since it grows out of size goes too bulky, but is just wonderful system, linux is totally fantastic. And i am going to contribute if you want this to happen, but can also split ways if this is desired instead. I am in so good form my own i achieve everything i want alone too.
21:22 jumpingnemo: mathematicians solve those tasks in very intuitive and fluent ways i saw.
21:41 mhenning[d]: karolherbst[d]: hey, want to approve some email I sent to the nouveau mailing list
21:41 mhenning[d]: it's not spam i swear
21:41 karolherbst[d]: your zcull patches?
21:41 mhenning[d]: yeah
21:41 karolherbst[d]: they are on the ML, no?
21:41 karolherbst[d]: uhm..
21:41 karolherbst[d]: maybe I got them directly
21:42 mhenning[d]: Oh looks like it's in the archive now, never mind me
21:43 karolherbst[d]: I can't login anyway 🙃
21:44 karolherbst[d]: maybe I stored the wrong password? weird
21:45 karolherbst[d]: maybe it got an even newer password now?
21:48 karolherbst[d]: nevermind, I can access the mod queue
21:48 karolherbst[d]: just not the actual admin interface
22:40 airlied[d]: mhenning[d]: I should use that with Faith's shader dump layer? or does it do extraction and I missed it?
22:45 mhenning[d]: airlied[d]: The nvdump tool assumes you have a spirv file and handles everything needed to get disassembly from that
22:47 airlied[d]: oh I do have a spirv file so that should be easy then
22:47 mhenning[d]: The shader dump layer is an earlier version of a similar idea - you don't need to use that
22:49 airlied[d]: dang it spirv-reflect doesn't support something in it
23:20 airlied[d]: hacked past those, only to crash inside the nvidia driver 😦
23:41 airlied[d]: uggh and dump layer seems to need an older nvidia driver 😦
23:43 airlied[d]: oh lzma
23:54 pavlo_kozlenko[d]: jumpingnemo: thank you