01:57rinlovesyou[d]: Guh i might've broken my nvidia drivers and nouveau isn't taking me to the login
01:58rinlovesyou[d]: I don't wanna have to chroot :KumiWaaa:
02:12rinlovesyou[d]: okay i've somehow managed to recover it, still don't understand why nouveau suddenly gets stuck on a blank terminal screen
02:19rinlovesyou[d]: Oh yeah it's completely freezing even a tty...
02:32mhenning[d]: What mesa version / graphics card do you have?
02:33rinlovesyou[d]: i'm building from the latest git on a 2070 super
02:34rinlovesyou[d]: also installed the latest stock linux kernel `6.15.6` just to make sure cachyos' kernel isn't doing anything funny
02:34mhenning[d]: are you building both zink and nvk?
02:34rinlovesyou[d]: yep
02:37rinlovesyou[d]: As soon as it gets past the splash screen it just freezes the tty
02:39rinlovesyou[d]: Alright i think i definitely see some errors after disabling the splash screen but it's a bit difficult since it flashes by so quick
02:40rinlovesyou[d]: Some gsp related stuff
02:42mhenning[d]: hmm. There was a report that had some similar symptoms, but I thought that should be fixed already at this point https://gitlab.freedesktop.org/mesa/mesa/-/issues/13317
02:43mhenning[d]: Knowing what the log says would also be useful
02:43rinlovesyou[d]: let me pull up the last boot with journalctl
02:45mhenning[d]: you could also try https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36014 which fixes a bunch of random stuff
02:45rinlovesyou[d]: Jul 14 04:38:29.973601 cachyos-sarah kernel: [drm] Initialized nouveau 1.4.0 for 0000:0b:00.0 on minor 0
02:45rinlovesyou[d]: Jul 14 04:38:29.973755 cachyos-sarah kernel: nouveau 0000:0b:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x00000065
02:45rinlovesyou[d]: Jul 14 04:38:29.973917 cachyos-sarah kernel: nouveau 0000:0b:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x00000065
02:45rinlovesyou[d]: Jul 14 04:38:29.974070 cachyos-sarah kernel: nouveau 0000:0b:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x00000065
02:45rinlovesyou[d]: Jul 14 04:38:29.974227 cachyos-sarah kernel: nouveau 0000:0b:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x00000065
02:45rinlovesyou[d]: Jul 14 04:38:29.974388 cachyos-sarah kernel: nouveau 0000:0b:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x00000065
02:45rinlovesyou[d]: Jul 14 04:38:29.974541 cachyos-sarah kernel: nouveau 0000:0b:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x00000065
02:45rinlovesyou[d]: Jul 14 04:38:29.974689 cachyos-sarah kernel: nouveau 0000:0b:00.0: gsp: cli:0xc1d00001 obj:0x00730000 ctrl cmd:0x00731341 failed: 0x00000065
02:45rinlovesyou[d]: so this is where it starts
02:46rinlovesyou[d]: that just keeps going for a long time, in the middle i also find
02:46rinlovesyou[d]: `Jul 14 04:38:29.998686 cachyos-sarah kernel: nouveau 0000:0b:00.0: drm: DDC responded, but no EDID for DP-1`
02:49airlied[d]: NV_ERR_TIMEOUT
02:49rinlovesyou[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1394148711302561892/previous.log?ex=6875c1aa&is=6874702a&hm=82edb4356430a920959bd4c962f25093de68e702c8fb66a2e41fe229afb73db1&
02:49rinlovesyou[d]: here's the full thing for completeness sake
02:52mhenning[d]: looks like possibly a kernel issue to me. Does downgrading the kernel help?
02:53rinlovesyou[d]: Well this was happening on 6.15.3 and i think i ran into this even before
02:56airlied[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1394150520599347210/message.txt?ex=6875c35a&is=687471da&hm=d592ff065045158aaa8da5caccc2ab7d1dca2c38ce835e84841db3c3f03cc7de&
02:56airlied[d]: that might fix it
02:58rinlovesyou[d]: Kernel patching, scary
02:59rinlovesyou[d]: Will give it a try
03:20skeggsb9778[d]: mhenning[d]: just added my R-b in case it helps!
03:23gfxstrand[d]: rinlovesyou[d]: You may not have the latest firmware. I don't think modern kernels are properly falling back to 535 when 570 is missing.
03:25gfxstrand[d]: I haven't filed a bug mostly because it's a pain to A/B test but I'm pretty sure I've seen that pretty recently. If you don't pick up the new firmware in your initram, for instance, you may not get a GPU.
03:26gfxstrand[d]: I had to fix my laptop last week because it got the linux-firmware update but didn't update the initram so I didn't have my discrete card. I didn't notice because I have an Intel which runs my desktop and I usually test on a different machine.
03:27gfxstrand[d]: But yeah, we should hunt down and fix that. Otherwise we're gonna get a lot of bug reports.
03:28skeggsb9778[d]: ```nv-dev /lib/firmware/nvidia/ga104/gsp # rm gsp-570.144.bin
03:28skeggsb9778[d]: nv-dev /lib/firmware/nvidia/ga104/gsp # dmesg | grep "RM version"
03:28skeggsb9778[d]: [ 331.622397] nouveau 0000:03:00.0: gsp: RM version: 535.113.01```
03:28skeggsb9778[d]: it should be falling back
03:28skeggsb9778[d]: (i modprobed from a different window)
03:29rinlovesyou[d]: gfxstrand[d]: i see, i did update my linux firmware package but i can see that only 535 is in `/lib/firmware/nvidia/tu104/gsp/`
03:31rinlovesyou[d]: oh oops there *was* an update
03:31gfxstrand[d]: Make sure you check your initram if you have one
03:35rinlovesyou[d]: yeah well it does *say* it's updating it
03:35rinlovesyou[d]: `Creating zstd-compressed initcpio image: '/boot/initramfs-linux.img'`
03:36rinlovesyou[d]: but it's still happening
03:44rinlovesyou[d]: from what i can see only 535 is actually in the initramfs
03:51skeggsb9778[d]: does /lib/firmware have 570?
04:03rinlovesyou[d]: Yes
04:03rinlovesyou[d]: rinlovesyou[d]: It's now in that folder
04:40rinlovesyou[d]: ah yeah no it's definitely there
04:40rinlovesyou[d]: i don't think the firmware was the problem
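For the firmware question, a small hypothetical helper can show which GSP firmware versions are installed for a given chip, e.g. `/lib/firmware/nvidia/tu104/gsp/`. It assumes the `gsp-<version>.bin` naming seen in the log above (`gsp-570.144.bin`), optionally compressed, since some distributions ship compressed firmware. It only looks at the installed tree, not at what ended up in the initramfs, and it does not model nouveau's own firmware selection or fallback.

```python
import re
import sys
from pathlib import Path

# Matches e.g. gsp-535.113.01.bin or gsp-570.144.bin, optionally with a
# .xz/.zst suffix for distributions that ship compressed firmware.
GSP_RE = re.compile(r"^gsp-(\d+(?:\.\d+)*)\.bin(?:\.(?:xz|zst))?$")

def parse_gsp_versions(names):
    """Extract GSP firmware versions from file names, newest first."""
    versions = {m.group(1) for m in map(GSP_RE.match, names) if m}
    # Compare numerically so that 570.144 sorts above 535.113.01
    return sorted(versions, key=lambda v: tuple(int(p) for p in v.split(".")), reverse=True)

def gsp_versions(fw_dir):
    """List GSP firmware versions in fw_dir (empty list if the directory is missing)."""
    path = Path(fw_dir)
    if not path.is_dir():
        return []
    return parse_gsp_versions(p.name for p in path.iterdir())

if __name__ == "__main__":
    fw_dir = sys.argv[1] if len(sys.argv) > 1 else "/lib/firmware/nvidia/tu104/gsp"
    print(gsp_versions(fw_dir))
```

Splitting the parsing from the directory walk keeps the version logic testable without touching `/lib/firmware`.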
05:32airlied[d]: gfxstrand[d]: what's stopping blackwell by default?
09:06karolherbst[d]: airlied[d]: anyway, will focus on getting the ldsm stuff cleaned up next
09:09karolherbst[d]: also the coop matrix MR could use another round of review and I think I've addressed everything from the last one
10:15karolherbst[d]: airlied[d]: gfxstrand[d] also.. not sure if anybody looked into it already, but is there a plan for how to support the GPR + UGPR + imm24 variants of the load instructions?
10:55karolherbst[d]: uhhh.. that ldsm const stuff depends on that one nir algebraic opt, but that causes an infinite loop, pain
10:56karolherbst[d]: or can cause it if you wiggle it enough
11:05airlied[d]: I think there is some stuff in my hacks to get ugpr + gpr + const
11:25karolherbst[d]: yeah... but I'd rather not diverge from how the load ops are currently implemented if there is a bigger plan to properly model it across all the ops
11:29karolherbst[d]: though maybe I should take a look at that before LDSM. I don't think it's necessarily needed, and I can just have a constant offset in ldsm like the other load ops for now
11:33karolherbst[d]: maybe I make `nak_get_io_addr_offset` smarter...
11:42karolherbst[d]: div 32 %397 = iadd %396, %306 (0x2800)
11:42karolherbst[d]: div 16x4 %398 = @cmat_load_shared_nv (%397) (num_matrices=2, matrix_layout=col_major, base=0)
11:42karolherbst[d]: perfection
11:44karolherbst[d]: I uhm.. made `nak_nir_opt_ld_shared` not needed anymore
11:46karolherbst[d]: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/bb1c922e2f6bf5800faf3fa8d1385c36492743c8
11:46karolherbst[d]: `(('ushr', ('imul', a, '#b'), '#c'), ('imul', a, ('ushr', b, c))),`
11:46karolherbst[d]: this _feels_ right, but I don't know if it's actually right...
11:46karolherbst[d]: probably something with value range
11:46karolherbst[d]: but...
11:46karolherbst[d]: I don't see how it can matter
11:48karolherbst[d]: maybe I should feed it random values and see where it breaks
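Feeding the pattern random values is easy to try outside of Mesa. The sketch below models 32-bit `imul` and `ushr` (including NIR's masking of the shift count to its low 5 bits) in plain Python and fuzzes the `ushr`/`imul` rewrite above. The helper names are made up for this sketch; it is not Mesa's own test infrastructure for `nir_opt_algebraic.py`.

```python
import random

MASK32 = 0xFFFFFFFF

def imul32(a, b):
    # 32-bit multiply with wraparound, like a 32-bit NIR imul
    return (a * b) & MASK32

def ushr32(a, c):
    # NIR's ushr only looks at the low 5 bits of a 32-bit shift count
    return (a & MASK32) >> (c & 31)

def lhs(a, b, c):
    # search pattern: ushr(imul(a, #b), #c)
    return ushr32(imul32(a, b), c)

def rhs(a, b, c):
    # proposed replacement: imul(a, ushr(#b, #c))
    return imul32(a, ushr32(b, c))

def find_counterexample(trials=100_000, seed=0):
    """Fuzz the rewrite; return the first (a, b, c) where it changes the result."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = rng.getrandbits(32), rng.getrandbits(32), rng.randrange(32)
        if lhs(a, b, c) != rhs(a, b, c):
            return a, b, c
    return None

print(find_counterexample())
# Hand-picked case: (3 * 1) >> 1 is 1, but 3 * (1 >> 1) is 0
print(lhs(3, 1, 1), rhs(3, 1, 1))  # prints "1 0"
```

The rewrite is exact when `b` has no bits set below bit `c` and `a * b` does not wrap in 32 bits (both sides then equal `a * (b >> c)`). The first condition can be checked on the constant; the second depends on the range of `a`, which is where value range information would come in.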
12:19gfxstrand[d]: airlied[d]: 30-day conformance review period. I'll backport the enable patch.
12:21gfxstrand[d]: karolherbst[d]: I think we're gonna need new NIR intrinsics. I already added new ones in my prediction branch which take predicates. We can expand those.
12:36karolherbst[d]: nice
12:36karolherbst[d]: ldsm can make use of it, and before I add weird lowering passes or hack it up, I'll probably just wait until you have proper offset handling in place that it can fit into nicely, or something
12:37karolherbst[d]: because pulling out a constant offset is also annoying if the expression gets more complicated
12:40karolherbst[d]: mhhh
12:40karolherbst[d]: does nak have the value range analysis thing set up?
12:40karolherbst[d]: the one for system values
13:50karolherbst[d]: I'm sad how little this one helps: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36113
13:50karolherbst[d]: `Totals from 5 (0.01% of 87622) affected shaders` 🥲
13:51karolherbst[d]: but it does help my coop matrix ldsm stuff 😄
14:37ermine1716[d]: Why is nova-core located outside of drivers/gpu/drm?
14:38tiredchiku[d]: because nova-core isn't the drm driver
14:38tiredchiku[d]: that's drm/nova
14:39tiredchiku[d]: iirc nova-core and drm/nova are like host1x and drm/tegra
14:39tiredchiku[d]: but I could be wrong
14:43karolherbst[d]: nova-core will be doing more than host1x
14:43karolherbst[d]: I think
14:43karolherbst[d]: like it's doing the entire device bring up
14:43karolherbst[d]: firmware loading etc...
14:43karolherbst[d]: enough that you can virtualize the GPU
14:44ermine1716[d]: So basically nova-core just provides access to the hardware in some way, while nova does the stuff one would expect from a GPU driver?
14:44karolherbst[d]: and then hosts would be able to bring up the nvidia GPU without having to bring up drm
14:44karolherbst[d]: and then only guests need drm
14:44ermine1716[d]: Oic now
14:46tiredchiku[d]: fancy
14:49karolherbst[d]: I should prolly look into that membar stuff, because that alone gives a 2.5x speed up 🙃
14:51karolherbst[d]: I'm sure it barely matters for any game, but who knows
15:08karolherbst[d]: https://gist.github.com/karolherbst/d868a3d2cec398e6dd73d7f0748b44f3
15:08karolherbst[d]: need a good opt pattern for this stuff...
15:08karolherbst[d]: like part of `%354` is const
15:08karolherbst[d]: but it's... difficult to do a proper opt because of the shifts
15:09karolherbst[d]: `(('ushr', ('iadd', a, '#b'), '#c'), ('iadd', ('ushr', a, c), ('ushr', b, c))),` helps, but I'm sure it's also wrong
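The `iadd` variant can be checked the same way. Here the problem is carries out of the low `c` bits, e.g. (1 + 1) >> 1 is 1 but (1 >> 1) + (1 >> 1) is 0, plus wraparound of `a + b`. The sketch below encodes one sufficient guard (no bits of `b` below bit `c`, and `a + b` cannot wrap given an upper bound `a_max` on `a`) and fuzzes it. The guard and the `a_max` parameter are assumptions of this sketch, standing in for what range analysis would have to prove; they are not taken from Mesa.

```python
import random

MASK32 = 0xFFFFFFFF

def iadd32(a, b):
    # 32-bit add with wraparound
    return (a + b) & MASK32

def ushr32(a, c):
    # NIR masks 32-bit shift counts to the low 5 bits
    return (a & MASK32) >> (c & 31)

def lhs(a, b, c):
    # search pattern: ushr(iadd(a, #b), #c)
    return ushr32(iadd32(a, b), c)

def rhs(a, b, c):
    # proposed replacement: iadd(ushr(a, c), ushr(#b, #c))
    return iadd32(ushr32(a, c), ushr32(b, c))

def rewrite_is_safe(a_max, b, c):
    # Sufficient (not necessary): b has no bits below bit c, so no carry
    # crosses the shift boundary, and a + b cannot wrap for any a <= a_max.
    c &= 31
    low_bits_clear = (b & ((1 << c) - 1)) == 0
    no_wrap = a_max + b <= MASK32
    return low_bits_clear and no_wrap

# Carry counterexample from the lead-in
assert lhs(1, 1, 1) == 1 and rhs(1, 1, 1) == 0

# Fuzz: whenever the guard holds, the rewrite must not change the result
rng = random.Random(1)
for _ in range(20_000):
    c = rng.randrange(32)
    b = rng.getrandbits(32) & ~((1 << c) - 1) & MASK32  # clear bits below c
    a_max = rng.randrange(MASK32 - b + 1)               # keeps a_max + b in range
    a = rng.randrange(a_max + 1)
    assert rewrite_is_safe(a_max, b, c)
    assert lhs(a, b, c) == rhs(a, b, c)
print("guarded rewrite held on all samples")
```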
15:22karolherbst[d]: maybe I shouldn't use shifts for the coop matrix stuff 🙃
15:22karolherbst[d]: or mark them as not overflowing..
15:25karolherbst[d]: or I just check if it's in bounds...
15:34karolherbst[d]: uhhh.. those are from the subgroup invoc ID lowering..
16:54ristovski[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1394361314557820969/image.png?ex=687687ab&is=6875362b&hm=d807fc3d345467b0dc3e292994fd32a7b452c0969d09549f921ba209c264ca20&
16:54ristovski[d]: is this just nvk having less driver overhead for this specific workload?
16:56mhenning[d]: Assuming that the benchmark is compute bound (if it's well-written it probably should be) I'd guess that we optimize the shader better, but it's hard to know without actually looking at the assembly.
16:59TheHypervisor[m]: <ristovski[d]> "https://cdn.discordapp.com/..." <- Possibly just an edge case between these two completely different drivers?
17:56karolherbst[d]: wait a second...
17:56karolherbst[d]: I have a great optimization we should add 😄
18:10magic_rb[d]: Did Phoronix kill RSS? I don't get posts anymore
18:21snowycoder[d]: karolherbst[d]: What 😮?
18:24karolherbst[d]: value range
18:24karolherbst[d]: but.. that's going to be annoying
18:25asuasuasu[d]: magic_rb[d]: works on my machine (tm)
18:28mhenning[d]: karolherbst[d]: you mean like nir_range_analysis?
18:28magic_rb[d]: asuasuasu[d]: Grr
18:28karolherbst[d]: mhenning[d]: yeah
18:28karolherbst[d]: need info on sysvals
18:28karolherbst[d]: like thread id
18:28karolherbst[d]: so optimizations around shifts can know when something would overflow or not
18:29karolherbst[d]: or wrap or whatever
18:29karolherbst[d]: mhenning[d]: like to extract the constant part of https://gist.github.com/karolherbst/d868a3d2cec398e6dd73d7f0748b44f3
18:29karolherbst[d]: and fold it into the IO offset
18:30karolherbst[d]: but without knowing what %320 is, an optimization could change the result
18:31karolherbst[d]: like instead of 0x3800 the offset could be 0x3820
18:31karolherbst[d]: and the first add disappears
18:31karolherbst[d]: which also allows for CSE to kick in more often in the shader I'm looking at
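To make the %320 point concrete with a deliberately simplified, hypothetical expression (the real one is in the gist and is not reproduced here): suppose the offset were `((x + 0x40) >> 1) + 0x3800`. Folding the constants gives `(x >> 1) + 0x3820`, which is the kind of 0x3800 to 0x3820 change described above, and the first add disappears. The fold is only valid if `x + 0x40` cannot wrap, so it needs a bound on `x`, such as one range analysis could derive for a thread ID. The bound of 1024 below is an assumed example.

```python
MASK32 = 0xFFFFFFFF

def before(x):
    # hypothetical shape: ((x + 0x40) >> 1) + 0x3800, with 32-bit wraparound
    return ((((x + 0x40) & MASK32) >> 1) + 0x3800) & MASK32

def after(x):
    # folded form: the constant part moves into the IO offset
    return ((x >> 1) + 0x3820) & MASK32

# Without range info the fold is wrong: x + 0x40 wraps for large x
assert before(0xFFFFFFFF) != after(0xFFFFFFFF)

# If x were known to be a thread ID below 1024 (assumed bound),
# an exhaustive check shows the fold is exact on that range
assert all(before(x) == after(x) for x in range(1024))
print("fold is exact for 0 <= x < 1024")
```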
18:54gfxstrand[d]: !36119 scares me...
18:56mhenning[d]: are you worried that it might regress other drivers?
18:57gfxstrand[d]: No. I'm pretty sure it'll only really affect nouvau
18:57gfxstrand[d]: I just don't like the whole "just fail and trust Zink to catch us" plan
18:57gfxstrand[d]: But it's working for X11, so... :shrug_anim:
18:57gfxstrand[d]: It's just the absolute worst way to fix that bug. 😂
18:58mhenning[d]: yeah, it's a bit of a hack but I'm struggling to think of any better options
18:59gfxstrand[d]: Yeah, I got nothing
18:59gfxstrand[d]: I just hate to fix a bug by sticking in another bug. 😂
18:59mhenning[d]: Yeah
19:01gfxstrand[d]: I'm waiting to hear from Francisco before I tell Eric to merge.
19:04mhenning[d]: That sounds fine as long as we don't cut it too close to the release on wednesday
19:12gfxstrand[d]: They've been pretty responsive so I expect to hear from them by tomorrow
19:27gfxstrand[d]: Maybe tomorrow I can work on something more fun. 😂
19:50karolherbst[d]: could review my super awesome opt that's not doing much at all, but it's still great 😛
20:01karolherbst[d]: ohhh I overlooked an iand... this makes things a bit easier
20:02karolherbst[d]: https://gist.githubusercontent.com/karolherbst/3b946b6371e60595de7a0f765498bc90/raw/5db9b1033fdcb172e8f97b901d09732362162d6e/gistfile1.txt
20:02karolherbst[d]: anyway..
20:02karolherbst[d]: that needs value range analysis to really do something about it
20:03karolherbst[d]: also the ushr+ishl+ushr part
20:12airlied[d]: karolherbst[d]: I added new intrinsics because I dislike throwing perfectly good information away in the frontend and trying to reinvent it in the backend later
20:15karolherbst[d]: what information?
20:15airlied: with the ugpr/gpr/offset combo, we often lost the offset into the gpr
20:15karolherbst[d]: the offset stuff or something else?
20:15karolherbst[d]: right...
20:15airlied[d]: even if offset was uniform
20:15karolherbst[d]: I mean I'm happy if we have a proper solution, I'd just avoid doing something special for ldsm until that happens
20:16karolherbst[d]: or already do something like the solution we want to go with
20:16karolherbst[d]: but the hard part isn't the final offset
20:16karolherbst[d]: it's the reshuffling of prior stuff which still needs opts
20:16airlied[d]: there are a few probably illegal algebraic opts in my hacks, and I think I had to drop lea
20:17karolherbst[d]: yeah
20:17karolherbst[d]: that's what I'm currently trying to figure out
20:17karolherbst[d]: and moving the constant to the end isn't the hard part
20:17karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36113
20:17karolherbst[d]: even helps other shaders
20:17airlied[d]: well stopping the convergent offset and divergent offset from getting mixed up was where I had the least fun
20:18karolherbst[d]: mhhh
20:18karolherbst[d]: right...
20:18karolherbst[d]: that's for global stuff
20:18karolherbst[d]: but for that I'd wait until Faith pushes that part
20:18karolherbst[d]: sounds like she already got an idea
20:19karolherbst[d]: or well some new intrinsic to add anyway
20:19karolherbst[d]: for predication
20:21karolherbst[d]: anyway.. so far I only focus on the actual constant offset
20:21karolherbst[d]: for ldsm that is
20:21karolherbst[d]: I think it's best to have three sources, GPR, UGPR, constant to model it.. Not sure I'm a fan of the Src approach for the offset? dunno
20:23karolherbst[d]: anyway no idea really.. I'm more busy figuring out nir opt passes that are actually legal and don't cause random issues
20:23karolherbst[d]: (for extracting the const offset properly and reliably)
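One way to picture the three-source model (a per-thread GPR, a uniform UGPR and a constant immediate) is the sketch below. The class and field names are hypothetical, not NAK's actual Rust types, and the signed 24-bit immediate range is an assumption based on the "imm24" naming earlier in the discussion. The `%396` / `0x2800` values are borrowed from the NIR snippet above purely as labels.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed signed 24-bit immediate range
IMM24_MIN = -(1 << 23)
IMM24_MAX = (1 << 23) - 1

@dataclass(frozen=True)
class LoadAddr:
    gpr: Optional[str]   # divergent per-thread part (here just an SSA name)
    ugpr: Optional[str]  # uniform part shared by the whole warp
    imm: int = 0         # constant offset encoded in the instruction

    def add_const(self, c):
        """Fold a constant into the immediate if it still fits, else return None."""
        new_imm = self.imm + c
        if IMM24_MIN <= new_imm <= IMM24_MAX:
            return LoadAddr(self.gpr, self.ugpr, new_imm)
        return None  # caller would have to emit a real add instead

addr = LoadAddr(gpr="%396", ugpr=None)
print(addr.add_const(0x2800))
```

Keeping the constant as its own field means the `iadd %396, 0x2800` from the dump above never has to be materialized, which is the point of extracting the constant offset reliably.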
22:29karolherbst[d]: ohh tomorrow is branch point 🙃
22:30karolherbst[d]: guess uhm... if we want to land coop matrix stuff...
22:32karolherbst[d]: well.. Wednesday
22:35gfxstrand[d]: I'll try to look tomorrow
22:35gfxstrand[d]: I'm trying to dig myself out of EGL
22:35gfxstrand[d]: But of course I've found more things to hack on. 🙃
22:37karolherbst[d]: does it have to land before the branchpoint?
22:38karolherbst[d]: but yeah.. I mean if coop matrix stuff _has_ to wait until 25.3 so be it, but I'd at least like to land the basic support and fix stuff later
22:38karolherbst[d]: passes the CTS anyway
22:41sonicadvance1[d]: Are users clamouring for coop matrix in nvk today?
22:46karolherbst[d]: I have no idea
22:50karolherbst[d]: getting a proper feeling for LDSM will be rough, but I feel like we can use it way more often than what Dave came up with...
22:50karolherbst[d]: just need to get my head around it properly
22:51karolherbst[d]: like we can use it also for int8 matrices
22:53karolherbst[d]: anyway...
22:53karolherbst[d]: that's for later this week
22:54sonicadvance1[d]: LDSM is pretty cool, I bet it's spicy to utilize fully 😄
22:59airlied[d]: Having a decent baseline for llama.cpp out of the box on all 3 vendors is useful