04:14fdobridge_: <redsheep> With 26615 now merged I decided to retest talos, but I don't see any more fps. More interestingly I can test zink now, turns out having my amd igpu enabled was an issue even though I was setting the icd path and nothing is plugged into those ports on the motherboard. Heaven is getting 139 fps zink vs 106 without, which is pretty solid.
04:56fdobridge_: <airlied> @gfxstrand care to see if https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/nouveau-gsp-panel-fix-rework helps if you remove disp=0?
04:56fdobridge_: <gfxstrand> Sure. I'll try tomorrow.
04:57airlied: Lyude: I rewrote one of your patches in that branch above, (and reworked a second)
04:57airlied: I think I got the allow ctrl messages to report errors and info right
04:58fdobridge_: <gfxstrand> Mind dropping that in a comment on the MR. That way it's in the permanent record.
04:58airlied: at least my panel functions now with those
05:37fdobridge_: <redsheep> I did some more sanity checks of my Talos numbers to comment, and as I am playing with it more I am puzzled by which settings have a huge impact
05:39fdobridge_: <redsheep> If the reason the fps is lower than the blob right now is mainly due to issues making it hard to saturate a wide gpu wouldn't settings that rely on lots of width have very little impact? MSAA still cuts my frames right in half
05:56fdobridge_: <gfxstrand> 🤷🏻♀️ I'm still not convinced we're turning on render compression properly.
05:59fdobridge_: <redsheep> Ah. That would explain quite a bit, particularly some kind of wild results in The Witness where I need to turn it down to 720p before it is smooth
06:00fdobridge_: <gfxstrand> There's probably a lot of problems, not just one big thing. I'm sure there's more low-hanging fruit but now begins the big bottleneck hunt.
06:02fdobridge_: <redsheep> It would be good to get things wired up so we can see the frequencies and utilization info, I feel pretty blind without it and I don't feel like wiring an oscilloscope to my card
06:03fdobridge_: <gfxstrand> Zcull, tile rendering, render compression, better texture handling, shader scheduling, async DMA, getting rid of stalls in general...
06:04fdobridge_: <gfxstrand> Yeah, IDK what we have available there. I'd add prop kernel driver support if it would get me perf counters (I doubt it). 😅
06:04fdobridge_: <gfxstrand> RenderDoc will timestamp everything which is at least something. You can look for expensive sections if nothing else.
06:06fdobridge_: <redsheep> Tile based rasterization isn't on yet? That one is pretty huge. Is it also possible there's something that needs to be done to make better use of the huge cache on Ada or should that be automatic?
06:11fdobridge_: <gfxstrand> Yeah, that's not on yet. It's not automatic and it's not something nouveau ever configured so we don't really know how it works. I doubt it's hard, just uncharted territory.
06:12fdobridge_: <gfxstrand> So far I've mostly been focused on correctness so I've been going for the paths we know and trust. But we're getting really close to done with all the major features and Vulkan versions so perf is going to be the main focus very soon.
06:14fdobridge_: <gfxstrand> I need to fix the dep tracker and get memory model working. Then, in January I'm going to rework all the pipeline stuff and get us all the shader/pipeline toys. And... That's about it for the core driver. At that point we'll be solid, even if we don't have quite everything.
06:18fdobridge_: <redsheep> That sounds great, thank you for all the hard work! Ping me if you ever want tests run on ad102, I'm not much of a developer but I love benchmarking and I would like to help tease out bottlenecks as that becomes the focus.
06:28fdobridge_: <airlied> @redsheep I expect ZCULL is a lot of it
06:29fdobridge_: <airlied> at least on other GPUs, Z compression stuff always makes a big difference
06:30fdobridge_: <airlied> I think on modern NVIDIA ZCULL might not even be that insane to configure
06:34fdobridge_: <enigma9o7> ```
06:34fdobridge_: <enigma9o7> Dec 14 22:28:04 VPCF115FM kernel: nouveau 0000:01:00.0: FSBroker15504[14833]: failed to idle channel 16 [FSBroker15504[14833]]
06:34fdobridge_: <enigma9o7> ```
06:48fdobridge_: <gfxstrand> Yeah, could be. I'd guess it's in the 5-15% category, depending on the app. Some may be higher if they do lots of overdraw. (edited)
06:49fdobridge_: <gfxstrand> Assuming it does roughly what Intel and AMD's HiZ does.
06:58fdobridge_: <!DodoNVK (she) 🇱🇹> How about vkd3d support?
07:00fdobridge_: <airlied> @gfxstrand .EXT_depth_range_unrestricted = info->cls_eng3d >= VOLTA_A might be worth dropping into your next CTS run, it seems to pass the tests here
07:01fdobridge_: <airlied> vkd3d support isn't a monolithic thing
07:07fdobridge_: <tom3026> im not much of a developer too, but im up for testing anything on this 3060 laptop gpu :p
07:16fdobridge_: <!DodoNVK (she) 🇱🇹> I mean implementing the advanced shader/sparse bits (minus RT) that vkd3d requires
07:20fdobridge_: <!DodoNVK (she) 🇱🇹> Hopefully we can get something like FH4 working by the end of next year at least
07:32fdobridge_: <riesi> Not sure if this is up to date, but it's probably helpful to see what is needed.
07:32fdobridge_: <riesi> https://github.com/HansKristian-Work/vkd3d-proton/blob/master/VP_D3D12_VKD3D_PROTON_profile.json
07:36fdobridge_: <!DodoNVK (she) 🇱🇹> I based my tracker issue on this profile I think
12:48f_: Hi all, does anyone know if GF106 supports 1440p @ 165 Hz? Sorry for the stupid question, but I wasn't able to find out just by searching
12:48f_: (Quadro 2000M)
12:51f_: (am still looking that up)
15:29karolherbst: f_: good question.. which connector? DP or HDMI?
15:30f_: Laptop has a DP connector.
15:30karolherbst: let's see...
15:30karolherbst: needs HBR2
15:30f_: I can go up to 3440x1440 but trying to use 165 Hz ends up outputting not quite 60.
15:30karolherbst: which is DP2
15:30karolherbst: ehh
15:30karolherbst: DP1.2
15:31karolherbst: f_: is this a HDR display?
15:31karolherbst: and what kernel are you on?
15:31karolherbst: I recently fixed something in this regard
15:31f_: (1) it can use HDR and (2) recently updated to 6.6.4
15:32karolherbst: should be fine with 6.6.4...
15:32karolherbst: 4k@60 is fine you say?
15:32f_: The monitor isn't quite 4k, it's 3440x1440@165
15:33karolherbst: ohh
15:33karolherbst: I see..
15:33f_: But 3440x1440@60 is fine, anything above results in something that's not even quite 60
15:33f_: (more 45ish..)
15:33karolherbst: I think fermi is stuck at DP1.1
15:33karolherbst: and 1440@165 needs more bandwidth than 3440@60
15:34karolherbst: f_: https://en.wikipedia.org/wiki/DisplayPort#Refresh_frequency_limits_for_common_resolutions
15:34karolherbst: does the HBR column match what you can use?
15:34karolherbst: (the next table might be better)
15:34f_: I can use 1920x1080@165 BTW, forgot to mention that.
15:34karolherbst: yeah...
15:34karolherbst: then you can only use DP1.1 it seems
15:35f_: Hm?
15:35karolherbst: and there is no bug on the nouveau side unless you can use those modes with the old nvidia driver supporting fermi
15:35f_: Seems like I can use 2560x1440@100
15:35karolherbst: yeah.. so that all sounds like very close to the "HBR" bandwidth limit
15:35karolherbst: so yeah.. you are running into hardware limitations here
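The bandwidth argument above can be checked with a little arithmetic. The sketch below is only a rough floor: it counts active pixels at an assumed 24 bits per pixel and ignores blanking, so real modes need somewhat more than these figures. The per-lane rates (2.7 Gbit/s for HBR, 5.4 Gbit/s for HBR2), the four lanes, and the 8b/10b coding overhead are the standard DisplayPort figures; the function names are made up for illustration.

```python
# Rough lower bound on DisplayPort bandwidth: active pixels only, blanking ignored.
LANES = 4
ENCODING_EFFICIENCY = 0.8  # 8b/10b line coding used by HBR and HBR2


def link_payload_gbps(per_lane_gbps, lanes=LANES):
    """Usable payload of a link in Gbit/s after 8b/10b coding."""
    return per_lane_gbps * lanes * ENCODING_EFFICIENCY


def mode_payload_gbps(width, height, refresh_hz, bits_per_pixel=24):
    """Active-pixel payload of a mode in Gbit/s (assumes 24 bpp by default)."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


HBR = link_payload_gbps(2.7)   # DP 1.1, about 8.64 Gbit/s
HBR2 = link_payload_gbps(5.4)  # DP 1.2, about 17.28 Gbit/s

if __name__ == "__main__":
    for w, h, hz in [(3440, 1440, 60), (2560, 1440, 165), (3440, 1440, 165)]:
        need = mode_payload_gbps(w, h, hz)
        print(f"{w}x{h}@{hz}: at least {need:.2f} Gbit/s, "
              f"fits HBR: {need <= HBR}, fits HBR2: {need <= HBR2}")
```

Under these assumptions 3440x1440@60 stays below the HBR payload while 2560x1440@165 does not, which matches the point that 1440p at 165 Hz needs more bandwidth than 3440x1440 at 60 Hz. Because blanking and the actual bit depth are not modelled, treat the numbers as a floor rather than a prediction of which modes a given link will accept.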
15:36f_: oh ok. Thanks anyway!
15:36karolherbst: np
15:36f_: I cannot really test nVidia's driver BTW, mostly because the version I need isn't packaged by arch anymore
15:36f_: (version 390)
15:37karolherbst: yeah...
15:37f_: And I don't really want to grab it from the AUR..
15:37karolherbst: there is DSC which might help, but I don't know for sure if that's wired up in nouveau, but that won't change all that much in the end
15:37karolherbst: but it's also a DP1.4+ feature I think
15:38f_: Still nice to see nouveau improving though, I had issues before with 3440x1440 but they're long gone now.
15:38karolherbst: f_: no HDMI port?
15:38f_: No, only DP and VGA.
15:38f_: And the monitor limits HDMI to 100 Hz, thanks samsung.
15:38karolherbst: yeah.. we fixed some of the bugs there
15:39karolherbst: f_: I suspect this might have fixed things: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/nouveau?h=v6.7-rc5&id=7f67aa097e875c87fba024e850cf405342300059
15:40f_: Speaking of bugs, in the laptop's screen I can see a distorted line appearing on the top-right of the screen every few seconds, and it sometimes takes the whole screen with it for a second.
15:40karolherbst: yeah...
15:40karolherbst: sooo
15:40karolherbst: we can't change clocks on fermi GPUs
15:40karolherbst: and those boot up with very low clocks
15:40karolherbst: and some of those issues are caused by the GPU being overloaded
15:40f_: Still high enough for daily tasks ;)
15:40karolherbst: yeah...
15:41karolherbst: it might be better with wayland or the modesetting DDX instead of nouveau, but that might even cause stuttering instead of distortions
15:41f_: I run Wayland. Switched away from X11 a while ago.
15:41karolherbst: I see
15:42karolherbst: I think some of the synchronization isn't all correct there
15:42karolherbst: anyway...
15:42f_: The line also appears on the tty
15:43f_: Anyway it's not much of a dealbreaker.
15:43f_: it's just a small line appearing at the top-right and doesn't disturb me much.
15:45f_: Thanks!
16:31fdobridge_: <gfxstrand> Yeah, that, too. I need to figure out descriptor buffer.
16:31fdobridge_: <gfxstrand> Sure. I'm about to do a new run on main now that some stuff has landed. I'll chuck that in.
16:32fdobridge_: <!DodoNVK (she) 🇱🇹> Luckily vkd3d(-proton) has a test suite so you can test if it works properly on NVK
16:34fdobridge_: <!DodoNVK (she) 🇱🇹> I got at least 5 fails when I last tested it (it does cause the previously mentioned CPU lockups fairly frequently too)
16:37fdobridge_: <!DodoNVK (she) 🇱🇹> And quite a bit of skips due to NVK/NAK not supporting certain features (like SM 6.6, ray-tracing or ROVs)
16:38fdobridge_: <!DodoNVK (she) 🇱🇹> ROVs will require ~~the Triang3l extension~~ EXT_fragment_shader_interlock 🔺
16:40HdkR: PSI woo \o/
16:53fdobridge_: <gfxstrand> Yeah, I'm sure PSI is implementable. We just need to dump the blob and do what it does.
16:59HdkR: The Switch emulator devs should have some documentation that they RE'd already since some random games used it
17:03fdobridge_: <gfxstrand> Yeah, it's probably just some magic CCtl or something
17:04fdobridge_: <gfxstrand> But it also smells like the kind of thing that is totally different on Volta+ from Maxwell so IDK if Switch will help.
17:06HdkR: The basic ideas of the implementation are very similar
17:06fdobridge_: <triang3l> It looks like some kind of a wait for an "ordering ticket" somewhat similar to pre-RDNA3 AMD, I think
17:07HdkR: Indeed
17:08fdobridge_: <triang3l> Not just one instruction unlike on Intel and RDNA 3, but probably should be just a couple of fixed sequences. Just need to order memory accesses relative to everything involved in them correctly
17:08HdkR: Yea, entering and leaving the mutex is like a few dozen instructions each. It's quite heavy
17:09fdobridge_: <gfxstrand> Fun...
17:12fdobridge_: <triang3l> And to order the instructions themselves
17:13fdobridge_: <triang3l> On AMD Vega–RDNA2, entering the critical section was done by waiting until the value of one special ALU operand reached a certain number
17:14fdobridge_: <triang3l> So elimination of the wait loop should have been prevented
17:14fdobridge_: <triang3l> https://github.com/Ryujinx/Ryujinx/pull/2768
17:24fdobridge_: <!DodoNVK (she) 🇱🇹> One of the people who approved this PR is also one of the NVK contributors 🕵️
17:29fdobridge_: <gfxstrand> ```
17:29fdobridge_: <gfxstrand> dEQP-VK.glsl.builtin_var.fragdepth.line_list_d32_sfloat_large_depth,Fail
17:29fdobridge_: <gfxstrand> dEQP-VK.glsl.builtin_var.fragdepth.point_list_d32_sfloat_large_depth,Fail
17:29fdobridge_: <gfxstrand> dEQP-VK.glsl.builtin_var.fragdepth.triangle_list_d32_sfloat_large_depth,Fail
17:29fdobridge_: <gfxstrand> ```
17:30fdobridge_: <gfxstrand> IDK what's wrong with any of them but we apparently have at least some work to do there. (edited)
18:01fdobridge_: <enigma9o7> https://cdn.discordapp.com/attachments/1034184951790305330/1185280576387809300/shot-2023-12-15_10-01-02.png?ex=658f098e&is=657c948e&hm=5cad5b42c4cdb957a6f7aeba5e823bd44885228b8507d33128e9ca5a968bcc8f&
18:02fdobridge_: <enigma9o7> Hello. I have an older GPU and when I use nouveau driver, it freezes using the desktop. I was previously using kernel 5.4 and would freeze up every day or two. I just built and installed new kernel 6.6 to see if problem goes away. It's now happening more often.
18:03fdobridge_: <enigma9o7> I am not too familiar with logging so asked about this a couple days ago and was suggested to get logs of what's happening, and I *think* what I just pasted is the kinda thing you want. It also happened last night. When it happens, the mouse still moves, but desktop is unresponsive, can't even switch to tty.
18:03fdobridge_: <enigma9o7> Pressing power button doesn't shut down, have to hold it in to force poweroff.
18:04fdobridge_: <enigma9o7> ```
18:04fdobridge_: <enigma9o7> Dec 15 09:53:29 VPCF115FM kernel: nouveau 0000:01:00.0: Xorg[1760]: nv50cal_space: -16
18:04fdobridge_: <enigma9o7> Dec 15 09:53:44 VPCF115FM kernel: nouveau 0000:01:00.0: FSBroker24197[23553]: failed to idle channel 18 [FSBroker24197[23553]]
18:04fdobridge_: <enigma9o7> Dec 15 09:53:44 VPCF115FM kernel: nouveau 0000:01:00.0: Xorg[1760]: nv50cal_space: -16```
18:05fdobridge_: <enigma9o7> ```bash
18:05fdobridge_: <enigma9o7> $ inxi -G
18:05fdobridge_: <enigma9o7> Graphics:
18:05fdobridge_: <enigma9o7> Device-1: NVIDIA GT216M [GeForce GT 330M] driver: nouveau v: kernel
18:05fdobridge_: <enigma9o7> Device-2: Suyin Sony Visual Communication Camera type: USB
18:05fdobridge_: <enigma9o7> driver: uvcvideo
18:05fdobridge_: <enigma9o7> Display: x11 server: X.Org v: 1.20.8 driver: X: loaded: modesetting
18:05fdobridge_: <enigma9o7> unloaded: fbdev,vesa dri: nouveau gpu: nouveau resolution: 1: 1920x1080~60Hz
18:05fdobridge_: <enigma9o7> 2: 1920x1080~60Hz
18:05fdobridge_: <enigma9o7> API: OpenGL v: 3.3 Mesa 20.0.8 renderer: NVA5```
18:05fdobridge_: <enigma9o7> (Note if I use nvidia-340 prop driver, no freezes ever)
18:13fdobridge_: <enigma9o7> Hello. I have an older GPU and when I use nouveau driver, it freezes my desktop while I'm doing typical stuff like web browsing, switching windows, document editing, chat clients, etc. I was previously using kernel 5.4 and would freeze up every day or two. I just built and installed new kernel 6.6 to see if problem goes away. It's now happening more often than it used to - so far haven't gotten 3 hours without freeze, it first happened
18:09fdobridge_: <enigma9o7> I am not too familiar with logging so asked about this a couple days ago and was suggested to get logs of what's happening, and I *think* what I just pasted is the kinda thing you want. When it happens, the mouse still moves, but desktop is unresponsive, can't even switch to tty with CTRL-ALT-F2 etc. (edited)
18:14fdobridge_: <enigma9o7> Pressing power button doesn't shut down, have to hold it in to force poweroff (although the log happens to show it did notice my button press and try to shut down, next time I'll give it a full five minutes or something before hard shutdown to see). (edited)
18:10fdobridge_: <enigma9o7> (Note if I use nvidia-340 prop driver, no freezes ever, but bad EGL support, tty's are invisible, and I'm stuck using old OS). (edited)
18:15fdobridge_: <enigma9o7> If there is a forum, or if I should post bugreport, or something better than using chatroom, please direct me right place, thanks.
18:26fdobridge_: <gfxstrand> And... memory model: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26716
18:46fdobridge_: <!DodoNVK (she) 🇱🇹> I wonder what's the NVK contributor scoreboard now :triangle_nvk:
18:48fdobridge_: <esdrastarsis> Now several games will work on DXVK without us needing to activate fake memory model support 🎉
18:48fdobridge_: <!DodoNVK (she) 🇱🇹> *On NAK-only
18:50fdobridge_: <tom3026> what requirements does NAK have?
18:51fdobridge_: <tom3026> still turing only as the MR 3 months ago seems to mention?
18:53fdobridge_: <gfxstrand> Yeah. Maxwell is in progress but a ways off yet.
18:54fdobridge_: <gfxstrand> I've not done much with it in a bit. I'm trying to leave it as a sandbox for newer devs right now.
18:55fdobridge_: <tom3026> okay 🙂
18:58fdobridge_: <karolherbst🐧🦀> any errors in dmesg? could be some missing float support somewhere
18:58fdobridge_: <karolherbst🐧🦀> uhm..
18:58fdobridge_: <karolherbst🐧🦀> d32_sfloat I mean
18:58fdobridge_: <gfxstrand> Not that I saw but also it was a full CTS run.
18:58fdobridge_: <karolherbst🐧🦀> I see
18:58fdobridge_: <gfxstrand> So there's some noise
18:59fdobridge_: <karolherbst🐧🦀> oh btw.. did you try `LDC.64` again with this? maybe the unaligned stuff came from the indirect not being ready in time 🥲
19:02fdobridge_: <gfxstrand> Yeah, LDC.64 is working fine now that I fixed my scoreboarding issues.
19:02fdobridge_: <karolherbst🐧🦀> btw, I'll double check the min latencies on volta/turing/ampere (as those are all different docs), so it's not causing hard to debug bugs 🙂
19:02fdobridge_: <gfxstrand> I reenabled it as part of the CBufs MR
19:02fdobridge_: <karolherbst🐧🦀> already found one difference on ampere
19:06fdobridge_: <karolherbst🐧🦀> it makes the code even simpler 🙂
19:07fdobridge_: <karolherbst🐧🦀> done
19:10fdobridge_: <karolherbst🐧🦀> though might not matter once `DEFER_BLOCKING` is used :ferrisUpsideDown:
19:24fdobridge_: <karolherbst🐧🦀> btw..if somebody wants to play with micro optimizations.. instead of emitting `MOV`s after each other, alternate them with `IMAD` preferably starting with `MOV` if it comes right after another alu instruction. However nvidia seems to prefer `IMAD` aggressively instead of `MOV`
19:24fdobridge_: <karolherbst🐧🦀> *over
19:30fdobridge_: <!DodoNVK (she) 🇱🇹> ACO developers definitely love them
19:31fdobridge_: <karolherbst🐧🦀> mhhh.. adding `LEA` might also be a big help: `lea(a, b, c) == (a << c) + b`, supports `.HI`/`.LO`/`.X` and stuff
19:31fdobridge_: <karolherbst🐧🦀> c is 5 bit immediate
19:32fdobridge_: <!DodoNVK (she) 🇱🇹> How common is `lea` on GPUs?
19:34fdobridge_: <karolherbst🐧🦀> `lea.hi(a, b, c, d) == ((((c << 32) + a) << d) >> 32) + b`
19:34fdobridge_: <karolherbst🐧🦀> useful in address calculation
19:35fdobridge_: <!DodoNVK (she) 🇱🇹> I think of x86 when I see `lea`
19:35fdobridge_: <karolherbst🐧🦀> the `((c << 32) + a)` part is essentially using a as low and c as high bits
19:36fdobridge_: <karolherbst🐧🦀> `c` can only be a reg or a 32 bit imm
19:36fdobridge_: <karolherbst🐧🦀> it has some uses
19:36fdobridge_: <karolherbst🐧🦀> we've added it as `ISCADD` in codegen, which is a special form of `LEA`
19:38fdobridge_: <karolherbst🐧🦀> `ISCADD == LEA.LO(a, b, RZ, d, PT)` (the last input is the carry consumed with `.X`)
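The `LEA` formulas quoted above can be modelled in a few lines. This is only a sketch of the semantics as described in the chat, not a verified model of the hardware: the function names are made up, the 32-bit wraparound is assumed, and treating the carry as a plain carry-in/carry-out of the final add is a guess at how `.X` behaves.

```python
MASK32 = 0xFFFFFFFF


def lea_lo(a, b, shift, carry_in=0):
    """(a << shift) + b in 32 bits; returns (result, carry_out of the add)."""
    assert 0 <= shift < 32  # the shift is described as a 5-bit immediate
    total = ((a << shift) & MASK32) + b + carry_in
    return total & MASK32, total >> 32


def lea_hi(a, b, c, shift, carry_in=0):
    """Upper 32 bits of ((c:a) << shift), plus b (and a carry for .X)."""
    assert 0 <= shift < 32
    wide = ((c << 32) | a) << shift  # a is the low half, c the high half
    return (((wide >> 32) & MASK32) + b + carry_in) & MASK32


def iscadd(a, b, shift):
    """ISCADD as the special case LEA.LO(a, b, RZ, shift, PT)."""
    result, _carry = lea_lo(a, b, shift)
    return result


if __name__ == "__main__":
    print(hex(iscadd(3, 0x1000, 2)))         # base + index * 4
    print(hex(lea_hi(0x80000000, 5, 0, 1)))  # bit shifted out of the low half, plus 5
```

With these definitions `iscadd(3, 0x1000, 2)` is the usual "base plus scaled index" address computation, which is why the op shows up in address calculation.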
19:40fdobridge_: <karolherbst🐧🦀> I don't think we have lea in nir, so that would be interesting 🙂
20:12fdobridge_: <mhenning> @gfxstrand When you have a chance, could you take a look at https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26473 ? I need to rebase and re-test that one, but it would be nice to have a first round of review
20:14fdobridge_: <gfxstrand> Nvidia doesn't use that so I'm a bit scared to.
20:16fdobridge_: <gfxstrand> Yeah, adding some NIR ops for that and an algebraic rule probably wouldn't be a bad idea.
20:17fdobridge_: <gfxstrand> Yeah, I'll try to take a look. Sorry. I've been focused on trying to sort out the memory model stuff and other 1.3 blockers. I'm trying to get around to more review this week.
20:52fdobridge_: <karolherbst🐧🦀> probably best to instead of `.has_lea` add a `lea_shift_bits` option and deal with that somehow.. not sure if we can check the amount of bits of an immediate in algebraic at this time, but on nvidia we can simply set `.lea_shift_bits = 5` and then it's all magic. Wondering if other hardware has that opt and if they have other restrictions
20:54fdobridge_: <karolherbst🐧🦀> looks like it's 5 bits since Fermi
20:54fdobridge_: <karolherbst🐧🦀> and prev gens don't have it anyway
20:55fdobridge_: <karolherbst🐧🦀> mhhhhhh
20:55fdobridge_: <karolherbst🐧🦀> potentially we could also just extract 5 bits _if_ the shift would drop from a 64 bit to 32 bit operation...
20:55fdobridge_: <karolherbst🐧🦀> but not sure if that gives us anything
20:56fdobridge_: <karolherbst🐧🦀> let's see if nvidia does it 😄
20:58fdobridge_: <karolherbst🐧🦀> the problem I have with this op is, that it kinda looks useless compared to `IMAD`...
20:59fdobridge_: <karolherbst🐧🦀> ohhh
20:59fdobridge_: <karolherbst🐧🦀> they use it on 64 bit values...
21:00fdobridge_: <karolherbst🐧🦀> mhhh
21:01fdobridge_: <karolherbst🐧🦀> yeah nevermind, it's really only useful for 64 bit lowering
21:02fdobridge_: <karolherbst🐧🦀> otherwise they just use `IMAD`
21:03fdobridge_: <gfxstrand> We can. Worst case you have to add a helper to `nir_search_helpers.h`.
21:04fdobridge_: <karolherbst🐧🦀> nvidia never seems to use it for 32 bit math
21:04fdobridge_: <gfxstrand> Which? LEA?
21:04fdobridge_: <karolherbst🐧🦀> yeah
21:04fdobridge_: <karolherbst🐧🦀> they just use `IMAD` instead
21:04fdobridge_: <karolherbst🐧🦀> but for pointers they heavily use it
21:07fdobridge_: <karolherbst🐧🦀> regardless of the shift they use 2 `LEA` if it's below 0x20 and `IADD3` + `IMAD` if the shift is above 0x1f
21:07fdobridge_: <karolherbst🐧🦀> *regardless of the shift it's 2 instructions
21:07fdobridge_: <karolherbst🐧🦀> ohh interesting
21:08fdobridge_: <karolherbst🐧🦀> with the latter they replaced an `IMAD.MOV` with `MOV`: https://gist.github.com/karolherbst/27c3540eb7a03804ed8e204b92e4fbcf
21:09fdobridge_: <karolherbst🐧🦀> so `LEA` is useful for 64 bit `IMAD` lowering with a constant POT multiplier
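To make the "two instructions for a 64-bit multiply-add with a power-of-two multiplier" observation concrete, here is a small sketch of the arithmetic. It assumes the split suggested in the discussion (a low step that produces a carry, then a high step that consumes it); the function name and the mapping of each step onto `LEA.LO` and `LEA.HI.X` are illustrative, not taken from any compiler source.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def imad64_pot(a, log2_mul, b):
    """a * (1 << log2_mul) + b on 64-bit values, done as two 32-bit steps."""
    assert 0 <= log2_mul < 32  # larger shifts reportedly use IADD3 + IMAD instead
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # Step 1 (LEA.LO-like): shifted low half plus b_lo, keeping the carry.
    low_sum = ((a_lo << log2_mul) & MASK32) + b_lo
    lo, carry = low_sum & MASK32, low_sum >> 32

    # Step 2 (LEA.HI.X-like): upper half of the shifted 64-bit value,
    # plus b_hi, plus the carry from step 1.
    shifted_hi = ((((a_hi << 32) | a_lo) << log2_mul) >> 32) & MASK32
    hi = (shifted_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo


if __name__ == "__main__":
    a, b, k = 0x0000000180000000, 0xFFFFFFFFFFFFFFFF, 1
    assert imad64_pot(a, k, b) == (a * (1 << k) + b) & MASK64
    print(hex(imad64_pot(a, k, b)))
```

The result agrees with a plain 64-bit `a * 2**k + b` modulo 2^64, which is the property a lowering pass would need to preserve.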
21:11fdobridge_: <karolherbst🐧🦀> anyway.. still useful enough 🙂
21:14fdobridge_: <karolherbst🐧🦀> the pain part is just that we don't have `imad` in nir...
21:15HdkR: All this discussion about lea reminded me to optimize some code. So thanks :D
21:16fdobridge_: <karolherbst🐧🦀> 😄
21:17fdobridge_: <karolherbst🐧🦀> anyway.. I think atm it's better as a nak specific opt pass
21:17fdobridge_: <karolherbst🐧🦀> could be done in nir via opt_algebraic, but then it needs to check if it has lea but also if imul64 and iadd64 or shl64 and iadd64 needs lowering or something funky...
21:20fdobridge_: <karolherbst🐧🦀> on a second thought.. it can't be nak lowering as int64 lowering happens before NAK 🥲
21:20fdobridge_: <karolherbst🐧🦀> okay.. adding `LEA` support sounds like a non-trivial amount of work
21:22Lyude: airlied: oh nice!
21:23Lyude: airlied: what did you end up changing with the rewrite?
21:24Lyude: oh you made it way less complicated nice
21:24Lyude: I will see if this works on my panel then!
21:26airlied: Lyude: yeah the keep msgs around stuff is simpler, that code design is quite hard to work through
21:26airlied: i didnt push the port of your other timeout patch i dont think
21:27Lyude: yeah - I wanted to try to have something like that at first but it was quite difficult to figure out if we could get away with not freeing things like that
21:28Lyude: i can probably port the rest :)
21:28fdobridge_: <gfxstrand> Not really. Add a NIR op, wire it up, and add a bit more to nak_nir_algebraic.py. Not really hard.
21:28Lyude: plus I want to make sure I add the stuff for the EDP panel info as well, since that should handle some quirks that other displays might need
21:29fdobridge_: <gfxstrand> But yeah, lots of moving parts. None of them are hard, there's just a lot of pieces.
21:30airlied: Lyude: https://paste.centos.org/view/9fcc3177
21:30airlied: was the second patch
21:31Lyude: gotcha, will add it to my branch
21:31Lyude: and thank you a ton for the help :)
21:36fdobridge_: <karolherbst🐧🦀> the problem is, that it's mostly useful for `imad64` lowering
21:36fdobridge_: <karolherbst🐧🦀> and nothing else
21:37fdobridge_: <karolherbst🐧🦀> and because we don't have that, we kinda have to either make int64_lowering handle that, or make sure opt_algebraic does by searching for a couple of patterns
21:40fdobridge_: <karolherbst🐧🦀> we could add an `imad` operation to nir and then use `lea` for `imad64` lowering inside `lower_int64` or we trick `lower_int64` to not touch certain `iadd+imul/ishl` patterns where `opt_algebraic` does
21:40fdobridge_: <karolherbst🐧🦀> something something
21:42fdobridge_: <gfxstrand> Or just run it before int64 lowering...
21:43fdobridge_: <karolherbst🐧🦀> yeah... I mean.. a quick solution here is probably easy with a nak nir pass, just doing it in core nir is more involved
21:43fdobridge_: <karolherbst🐧🦀> but.. might never have to do it
22:17fdobridge_: <marysaka> They do for mesh ISBE calculations sometimes
22:17fdobridge_: <karolherbst🐧🦀> but for 32 bit math?
22:17fdobridge_: <marysaka> yeah
22:17fdobridge_: <marysaka> I saw that happen with per primitive stuffs
22:17fdobridge_: <karolherbst🐧🦀> mhhh.. odd
22:18fdobridge_: <karolherbst🐧🦀> I only saw them using it on 64 bit operations here
22:21fdobridge_: <gfxstrand> Bah! The PTX docs don't have LEA
22:22fdobridge_: <karolherbst🐧🦀> what do you want to know?
22:26fdobridge_: <gfxstrand> More detailed semantics, mostly. Often the PTX docs are quite detailed.
22:26fdobridge_: <karolherbst🐧🦀> I see
22:26fdobridge_: <karolherbst🐧🦀> it's kinda like imad
22:26fdobridge_: <karolherbst🐧🦀> in regards to lo/hi and .x
22:27fdobridge_: <gfxstrand> "It's kinda like X with regard to Y and Z except it's different with regards to C" is kinda frustrating, honestly. I want the pseudocode. I want to know exactly what it does.
22:27fdobridge_: <gfxstrand> Otherwise, I'm in RE land.
22:28fdobridge_: <gfxstrand> RE land with a bit of guidance, yes, but still RE land.
22:28fdobridge_: <gfxstrand> I really need to re-up my unit test framework and implement everything in Rust with tests.
22:29fdobridge_: <karolherbst🐧🦀> yeah, fair
22:30Lyude: btw airlied, I assume not updating the function signatures for stuff like nvkm_gsp_rm_ctrl_push() from void* to void** was a mistake? :p
22:30Lyude: oh wait - that might make the C compiler complain less so maybe not
22:30fdobridge_: <gfxstrand> Of course, it would be nice if the PTX docs were correct.... 😂 (I'm looking at you `mufu.rcp64h`... 👿 )
22:30fdobridge_: <karolherbst🐧🦀> :d
22:30fdobridge_: <karolherbst🐧🦀> :d
22:30fdobridge_: <karolherbst🐧🦀> ..
22:30fdobridge_: <karolherbst🐧🦀> 😄
22:31fdobridge_: <karolherbst🐧🦀> yeah, but I kinda shared how LEA semantically worked, but yeah.. the details with the carry might be annoying to figure out, though there doesn't seem to be more to it than to pass it from the `.LO` one into the `.HI` one
22:32fdobridge_: <gfxstrand> Yeah
22:33fdobridge_: <gfxstrand> It's probably pretty straightforward.
22:33fdobridge_: <karolherbst🐧🦀> how long is imad64 lowering normally when b is a constant pot value?
22:37fdobridge_: <karolherbst🐧🦀> mhhhh
23:04Lyude: shame, seems it still doesn't light up the panel on my machine properly - guess I'll do a little bit more digging. not too much more though, I want to get started on rust stuff soon
23:42fdobridge_: <enigma9o7> https://cdn.discordapp.com/attachments/1034184951790305330/1185366352379269151/shot-2023-12-15_15-41-57.png?ex=658f5971&is=657ce471&hm=a4d7e67d3751511791c85409dc29ba94c3f21eeae19942be05af43a07295ee0a&
23:43fdobridge_: <enigma9o7> Same thing as last time. This time I did wait and it does eventually shut itself down (while screen doesn't change). But in any case.... (edited)
23:44fdobridge_: <enigma9o7> (Issue explained previously https://discord.com/channels/1033216351990456371/1034184951790305330/1185280803299655740 )
23:46fdobridge_: <enigma9o7> This is happening way more often than on the older kernel, and it seems to be a different error than I was getting with 5.4 (nouveau 0000:01:00.0 DRM: GPU lockup - switching to software fbcon). So unless other advice is given about what kind of logs would be useful, next time I freeze up I'm gunna boot back to that, since it only happens every day or two then, while with this new kernel it's happening every few hours. (edited)