00:19 fdobridge: <g​fxstrand> We should be putting it a lot more places. At the very least, anything variable-latency should yield.
00:19 fdobridge: <g​fxstrand> I think yield is almost free.
00:19 fdobridge: <g​fxstrand> Are you sure about that? If not then how the hell are you supposed to scoreboard them when you can't set wt flags?
00:19 fdobridge: <g​fxstrand> The HW gets REALLY picky about deps on those instructions.
00:19 fdobridge: <g​fxstrand> the disassembler, too
00:20 fdobridge: <g​fxstrand> I'm going to play with it more tomorrow.
00:24 fdobridge: <g​fxstrand> ```
00:24 fdobridge: <g​fxstrand> Test case 'dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops'..
00:24 fdobridge: <g​fxstrand> Pass (Pass)
00:24 fdobridge: <g​fxstrand> ```
00:47 fdobridge: <k​arolherbst🐧🦀> yeah, afaik they removed everything from the hardware and it's up to the compiler to get it right otherwise it's undefined behavior. The only thing the hardware does is wait/signal barriers
00:47 fdobridge: <k​arolherbst🐧🦀> though the ISA _does_ have requirements on some of those things
00:48 fdobridge: <k​arolherbst🐧🦀> but they are documented
00:48 fdobridge: <k​arolherbst🐧🦀> (except .yld)
01:12 fdobridge: <g​fxstrand> What do the docs say about BMOV, BSSY, and BSYNC?
01:20 fdobridge: <k​arolherbst🐧🦀> BMOV: barriers only when using gprs (and only valid for the gpr), no barriers for the others. BMOV (with register source and writing MACTIVE/MKILL) is an instruction which stops issuing new instructions in the warp until it's resolved (sources read and instruction executed). No scoreboarding required. This kind of `BMOV` also implies `.yld` (or also called drain)
01:22 fdobridge: <g​fxstrand> Okay, that seems consistent with my model at least.
01:23 fdobridge: <g​fxstrand> Though I may be messing it up somewhere.
01:24 fdobridge: <g​fxstrand> "barriers only when using gprs (and only valid for the gpr), no barriers for the others." <- How is that not barrier registers being internally scoreboarded?
01:24 fdobridge: <k​arolherbst🐧🦀> ehh wait.. drain is this "wait on everything" thing
01:25 fdobridge: <k​arolherbst🐧🦀> the read/write barriers on registers only make sense if there is no fixed latency
01:26 fdobridge: <k​arolherbst🐧🦀> you don't use them if the dep is fixed latency
01:27 fdobridge: <g​fxstrand> For most things you can use them, you just don't have to
01:27 fdobridge: <g​fxstrand> At least on Turing
01:27 fdobridge: <k​arolherbst🐧🦀> you shouldn't if you don't have to
01:27 fdobridge: <k​arolherbst🐧🦀> they add a min latency if you use them
01:28 fdobridge: <k​arolherbst🐧🦀> that's why you have the stall count so instructions wait the proper time
01:29 fdobridge: <k​arolherbst🐧🦀> (as the hardware doesn't do it)
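The rule described in the messages above can be sketched roughly as follows. This is only an illustration, not NAK's actual code: the `Instr`, `Dep`, and `schedule_dep` names are hypothetical, and the latency values are invented. A fixed-latency producer is covered by the stall count alone, while a variable-latency producer gets a scoreboard barrier that the consumer waits on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    name: str
    # Cycles until the result is ready, or None when the latency is variable.
    # The numbers used with this class are made up for illustration.
    fixed_latency: Optional[int]


@dataclass
class Dep:
    stall_cycles: int       # stall count encoded on the producer
    barrier: Optional[int]  # scoreboard barrier the consumer waits on, if any


def schedule_dep(producer: Instr, free_barrier: int) -> Dep:
    """Pick how a consumer should wait for `producer` (hypothetical helper)."""
    if producer.fixed_latency is not None:
        # Fixed latency: the hardware does not wait by itself, so the compiler
        # encodes the wait as a stall count and leaves the scoreboard alone,
        # since using a barrier would add a minimum latency.
        return Dep(stall_cycles=producer.fixed_latency, barrier=None)
    # Variable latency: no stall count can be correct, so signal a barrier.
    return Dep(stall_cycles=1, barrier=free_barrier)
```

Here `schedule_dep(Instr("FADD", 4), 0)` would return a pure stall of 4 cycles, while `schedule_dep(Instr("LDG", None), 2)` would return barrier 2 with the minimal stall.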
01:33 fdobridge: <k​arolherbst🐧🦀> @gfxstrand btw, there are instructions which are fixed and variable depending on the hardware, but the ISA allows you to treat them as both... for binary compatibility reasons
01:34 fdobridge: <k​arolherbst🐧🦀> (And they ignore the barriers set on some hardware)
01:34 fdobridge: <k​arolherbst🐧🦀> (this is usually the case for FP64)
01:36 fdobridge: <k​arolherbst🐧🦀> (also for FP16)
02:05 fdobridge: <g​fxstrand> Oh fun...
03:17 fdobridge: <k​arolherbst🐧🦀> yeah.. I think it's one of those "quadro vs geforce" moments
03:17 fdobridge: <k​arolherbst🐧🦀> and on geforce those are variable and on quadro they are fixed
03:18 fdobridge: <k​arolherbst🐧🦀> and because the SM version can't really tell, the emitted binary needs to run on both
04:10 fdobridge: <g​fxstrand> So does this mean it's variable-latency for the GPR part?
04:12 fdobridge: <g​fxstrand> I bet barrier regs have a higher latency like predicates do or something like that...
04:14 fdobridge: <k​arolherbst🐧🦀> no, but the dep might be
04:14 fdobridge: <k​arolherbst🐧🦀> so it needs to be able to wait
04:33 fdobridge: <g​fxstrand> I'm kinda wondering if I should have multiple BMov instructions in the IR...
05:54 fdobridge: <g​fxstrand> Well, looks like my interpretation of all this is mostly okay now. Doing one more CTS run and I'll merge tomorrow.
05:54 fdobridge: <g​fxstrand> That's the last big hill to climb in the compiler, I think.
05:54 fdobridge: <g​fxstrand> There's more optimizations to do, of course, but I think that's the last really hard compiler theory problem.
06:19 fdobridge: <g​fxstrand> The other issue I need to figure out is that you have to wait some number of cycles after a barrier before doing shared memory ops if you want coherency. IDK why. Annoyingly, you can't set a delay on the barrier, so you have to do it with a nop+delay. At least that's what it looks like reading the output of the blob compiler.
06:20 fdobridge: <g​fxstrand> There's a handful of memory model tests failing because of this.
11:58 fdobridge: <k​arolherbst🐧🦀> never seen that in the wild.. I can check if I find something about that
14:46 Ilgaz: I filed opensuse tw bug about display corruption on gnome-terminal-gtk3 and kde system monitor which doesn't happen on fedora live usb
14:46 Ilgaz: https://bugzilla.opensuse.org/show_bug.cgi?id=1217748
14:51 karolherbst: Ilgaz: does that happen on a fresh boot?
14:51 karolherbst: there is this bug where updating fonts can mess things up or something
14:51 RSpliet: those macbooks are still running... that's crazy! :D
14:52 karolherbst: but yeah.. could also be a mesa bug
14:52 karolherbst: what version of mesa are you running?
15:10 Ilgaz: karolherbst: it also happens on gnome terminal showing very crazy glitches especially when you move the window
15:11 karolherbst: I see
15:11 Ilgaz: Mesa 23.2.1-1699.364.pm.1
15:11 karolherbst: do you know if it's caused by an update to mesa? Can you try older versions easily?
15:12 karolherbst: mhhh
15:12 karolherbst: but there is also this `imem: OOM: 00100000 00001000 -2` error
15:13 Ilgaz: karolherbst: I can go back to 2023/11/23 via snapper. yes about imem, things really got out of hand when I launched kde system monitor. system slowed down
15:14 Ilgaz: I could only get dmesg and kill the app
15:20 Ilgaz: karolherbst: the glitch I see on kde system monitor looks exactly like this sddm bug I reported. They sent me here actually and I figured gnome things while trying to figure if it is a DE bug. https://bugzilla.opensuse.org/show_bug.cgi?id=1217486
15:23 Ilgaz: btw as I am a kde user I don't have gnome things on that snapshot. Do you mind if I try tomorrow as I have to leave soon?
15:30 Ilgaz: will update in about 14 hours bbl
15:30 karolherbst: sure
16:41 fdobridge: <k​arolherbst🐧🦀> the wait on `BAR` needs to be at least 6
16:42 fdobridge: <k​arolherbst🐧🦀> so nvidia _might_ use nop+delay if they want to wait less?
16:42 fdobridge: <k​arolherbst🐧🦀> if you mean a scoreboard barrier, then there is always `DEPBAR`
16:42 fdobridge: <k​arolherbst🐧🦀> (needs a wait of at least 4)
16:44 fdobridge: <k​arolherbst🐧🦀> (on ampere)
16:45 fdobridge: <k​arolherbst🐧🦀> on Turing `BAR` needs a wait of at least 5
16:45 fdobridge: <k​arolherbst🐧🦀> @gfxstrand I can give you more of those "minimal waits" on other instructions, there are a couple
16:46 fdobridge: <k​arolherbst🐧🦀> ehh `MEMBAR` is like `BAR`
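The minimum waits quoted above could be kept in a small lookup table, sketched below. The table layout and the `clamp_delay` helper are hypothetical. The chat does not make clear whether the `DEPBAR` figure applies only to Ampere, and "`MEMBAR` is like `BAR`" is read here as "same minimum", so both entries are assumptions.

```python
# Minimum waits (in cycles) as quoted in the chat, keyed by (arch, opcode).
MIN_WAIT = {
    ("ampere", "BAR"): 6,
    ("ampere", "MEMBAR"): 6,  # assumption: "MEMBAR is like BAR"
    ("ampere", "DEPBAR"): 4,  # assumption: the "(on ampere)" remark covers this
    ("turing", "BAR"): 5,
    ("turing", "MEMBAR"): 5,  # assumption: "MEMBAR is like BAR"
}


def clamp_delay(arch: str, op: str, requested: int) -> int:
    """Raise `requested` to the opcode's minimum wait, if one is known."""
    # Opcodes without an entry fall back to a minimum of one cycle.
    return max(requested, MIN_WAIT.get((arch, op), 1))
```

A scheduler would call something like `clamp_delay("turing", "BAR", 2)` after computing the delay it would otherwise use, so that a too-short wait is never emitted.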
16:51 fdobridge: <g​fxstrand> Right. I think the problem is that setting a delay on BAR itself doesn't work properly. So they insert a NOP and use that for the delay.
16:59 fdobridge: <g​fxstrand> Unfortunately, that means the dependency pass needs the ability to insert instructions. 😭
16:59 fdobridge: <g​fxstrand> I can get it sorted. It's just annoying.
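A dependency pass that can insert instructions might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the real pass: `Instr` and `insert_bar_nops` are hypothetical names, and the number of cycles to wait is left as a parameter because the chat only says "some number of cycles".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    op: str
    delay: int = 1  # stall count encoded on the instruction


def insert_bar_nops(instrs: List[Instr], bar_delay: int) -> List[Instr]:
    """Return a new instruction list with a NOP carrying `bar_delay` after each BAR."""
    out: List[Instr] = []
    for ins in instrs:
        out.append(ins)
        if ins.op == "BAR":
            # The delay goes on a separate NOP instead of on the BAR itself,
            # mirroring what the blob compiler output appears to do.
            out.append(Instr("NOP", delay=bar_delay))
    return out
```

Building a new list avoids mutating the sequence while iterating over it, which is the part that makes instruction insertion awkward inside a pass that previously only annotated existing instructions.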
17:13 fdobridge: <k​arolherbst🐧🦀> annoying. I don't directly see why that's needed, might have to take a look at the code to see if something strange is going on
17:15 fdobridge: <g​fxstrand> I'm not sure either. All I know is what I've seen from dumping memory model test shaders on the blob
19:30 fdobridge: <S​id> would it be worth enabling GSP by default on ampere as well?
19:31 fdobridge: <S​id> ada's gsp symlinks back to ampere's
19:35 fdobridge: <S​id> the proprietary driver package only ships 2 gsp bins as well
19:55 fdobridge: <a​irlied> Upstream changing the default is hard, because there is no way to ensure new fw on user systems, we probably need to provide a config option and have distros turn it on once they ship new fw
19:57 cmiller: Hello, I am running GNU Guix with Linux Libre and Xorg with nouveau drivers on a GM204 (GeForce 970) which reports in the Xorg log file the following "(EE) NOUVEAU(0): Error creating GPU channel: -19" and therefore "(EE) NOUVEAU(0): Error initialising acceleration. Falling back to NoAccel"
19:58 cmiller: I also noticed the following in dmesg "[ 0.000719] gran_size: 64K chunk_size: 64K num_reg: 10 lose cover RAM: 62M" which appears to be related to the GPU, since if I remove it, those messages are gone. I also have a 770 which produces the same output (but did not test 3D acceleration with it)
20:04 fdobridge: <!​DodoNVK (she) 🇱🇹> cmiller: So you don't have linux-firmware installed, right?
20:05 cmiller: fdobridge: Yes
20:31 fdobridge: <m​henning> I believe the GeForce 970 requires proprietary firmware to function.
20:32 fdobridge: <m​henning> The 770 should have open firmware available
20:32 karolherbst: cmiller: don't use linux-libre :)
20:32 karolherbst: but yeah, 3D accel should work on the 770 without blobs
20:43 DodoGTA: karolherbst: Does GM1xx require firmware for 3D acceleration?
20:45 cmiller: Ah thanks. How would I see this? The only reason I tried is because I looked at the feature matrix and assumed it works on libre.
20:46 karolherbst: DodoGTA: no
20:47 karolherbst: cmiller: I don't think we point it out explicitly there
20:48 DodoGTA: karolherbst: So GM2xx is special then? I remember something about high-secure firmware
20:50 cmiller: karolherbst: Would be nice for beginners like me. Wasted a bit too much time to make it work. I will try again with the 770.
20:51 karolherbst: cmiller: well.. for beginners it's better to not use linux-libre, because linux-libre isn't supported by us at all
20:53 cmiller: karolherbst: I did not now that. But why should I use nouveau then? Isn't the point being libre? If it requires firmware anyways I don't see a point.
20:54 cmiller: s/now/know
21:02 karolherbst: well.. the kernel module and userspace are still open source
21:02 karolherbst: anyway, we can't replace the firmware as they are cryptographically signed
21:08 cmiller: Ah okay. Nvidia is really hardcore about this it seems.
21:34 sneil: I'm trying out the GSP firmware on a TU106 (manjaro linux 6.7.0rc3 test kernel) and I'm getting: https://pastebin.com/wGAxaBiG
21:37 xiphmont: Hello all! So, I have a 'fix a longstanding crash' patch. I don't want to create an Issue as I have a fix for review. However, I'm not a maintainer (and don't really want to be one). As a result... I kind of fall through the documentation cracks. I'm not even sure which of several repos/branches to base the patch against for submission. Any quick pointers?
21:38 xiphmont: Also, nouveau list mail all reference GitHub, all the dics tell me to use GitLab. And all the actual patch indexes aren't at either, so I assume an internal freedesktop repo?
21:38 xiphmont: er, s/dics/docs sorry
21:44 soreau: xiphmont: is it a mesa or kernel patch?
21:46 xiphmont: kernel
21:46 xiphmont: nouveau_gem.c and nouveau_dma.h Fairly small.
21:48 xiphmont: To be fair, it raises a few questions--- I'm not opposed to an Issue so much as I don't want it to get lost amongst the voluminous "AUIGH NOTHING WORKS" issues.
21:49 xiphmont: also, the usual 'submit the following 40 complete logs/dumps' is not relevant :-)
21:57 soreau: I'd submit it here first https://gitlab.freedesktop.org/drm/nouveau
21:59 soreau: this way you can get feedback on it and if it's not the right place, someone will say (or if not, you can at least have a link to the MR to ping about it later)
21:59 karolherbst: airlied: might want to take a look at sneil's error
22:00 xiphmont: OK! So nouveau-next branch on gitlab, or is nouveau-fixes a better choice? [I did read the docs, they just left me uncertain given that none of the patches I see discussed are there except as part of merges]
22:01 soreau: I'd think nouveau-next since it's the default..
22:02 karolherbst: nouveau kernel patches are still sent to the nouveau ML, but we also kinda want to accept patches on gitlab
22:18 fdobridge: <m​henning> xiphmont: I don't think nouveau-next is correct - that branch is a bit stale. I don't personally understand the kernel branches, but my guess might be to use drm-next from https://cgit.freedesktop.org/drm/drm
22:19 fdobridge: <m​henning> but even the latest 6.7 rc is fresher than nouveau-next
22:20 xiphmont: everything on the public gitlab repo looks stale (with the exception of the merges)
22:21 xiphmont: Anyway, having a URL probably is the important thing.
22:21 xiphmont: So... drm-next is best guess right now.
22:23 xiphmont: I'll put it there and ping the mailing list after.
22:23 xiphmont: I think the head on all of them is ~ the same code anyway.