00:19fdobridge: <gfxstrand> We should be putting it a lot more places. At the very least, anything variable-latency should yield.
00:19fdobridge: <gfxstrand> I think yield is almost free.
00:19fdobridge: <gfxstrand> Are you sure about that? If not then how the hell are you supposed to scoreboard them when you can't set wt flags?
00:19fdobridge: <gfxstrand> The HW gets REALLY picky about deps on those instructions.
00:19fdobridge: <gfxstrand> the disassembler, too
00:20fdobridge: <gfxstrand> I'm going to play with it more tomorrow.
00:24fdobridge: <gfxstrand> ```
00:24fdobridge: <gfxstrand> Test case 'dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops'..
00:24fdobridge: <gfxstrand> Pass (Pass)
00:24fdobridge: <gfxstrand> ```
00:47fdobridge: <karolherbst🐧🦀> yeah, afaik they removed everything from the hardware and it's up to the compiler to get it right otherwise it's undefined behavior. The only thing the hardware does is wait/signal barriers
00:47fdobridge: <karolherbst🐧🦀> though the ISA _does_ have requirements on some of those things
00:48fdobridge: <karolherbst🐧🦀> but they are documented
00:48fdobridge: <karolherbst🐧🦀> (except .yld)
01:12fdobridge: <gfxstrand> What do the docs say about BMOV, BSSY, and BSYNC?
01:20fdobridge: <karolherbst🐧🦀> BMOV: barriers only when using gprs (and only valid for the gpr), no barriers for the others. BMOV (with register source and writing MACTIVE/MKILL) is an instruction which stops issuing new instructions in the warp until it's resolved (sources read and instruction executed). No scoreboarding required. This kind of `BMOV` also implies `.yld` (also called drain)
01:22fdobridge: <gfxstrand> Okay, that seems consistent with my model at least.
01:23fdobridge: <gfxstrand> Though I may be messing it up somewhere.
01:24fdobridge: <gfxstrand> "barriers only when using gprs (and only valid for the gpr), no barriers for the others." <- How is that not barrier registers being internally scoreboarded?
01:24fdobridge: <karolherbst🐧🦀> ehh wait.. drain is this "wait on everything" thing
01:25fdobridge: <karolherbst🐧🦀> the read/write barriers on registers only make sense if there is no fixed latency
01:26fdobridge: <karolherbst🐧🦀> you don't use them if the dep is fixed latency
01:27fdobridge: <gfxstrand> For most things you can use them, you just don't have to
01:27fdobridge: <gfxstrand> At least on Turing
01:27fdobridge: <karolherbst🐧🦀> you shouldn't if you don't have to
01:27fdobridge: <karolherbst🐧🦀> they add a min latency if you use them
01:28fdobridge: <karolherbst🐧🦀> that's why you have the stall count so instructions wait the proper time
01:29fdobridge: <karolherbst🐧🦀> (as the hardware doesn't do it)
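The split described above (a stall count for fixed-latency dependencies, a scoreboard barrier only when the latency is variable) can be sketched roughly as follows. This is a minimal illustration, not NAK's or any real compiler's code: `Dep`, `Sched`, `schedule_dep`, the op names, and the latency numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dep:
    """A producer the next instruction depends on (hypothetical model)."""
    producer: str
    # Latency in cycles if it is known at compile time, else None (variable).
    fixed_latency: Optional[int]


@dataclass
class Sched:
    """What the compiler encodes to honor the dependency (hypothetical model)."""
    stall_cycles: int = 0
    wait_barrier: Optional[int] = None


def schedule_dep(dep: Dep, cycles_since_issue: int, free_barrier: int) -> Sched:
    """Pick a stall count for fixed latency, a scoreboard barrier otherwise."""
    if dep.fixed_latency is not None:
        # Fixed latency: the hardware does not check anything, so the compiler
        # must stall for whatever part of the latency has not elapsed yet.
        remaining = max(dep.fixed_latency - cycles_since_issue, 0)
        return Sched(stall_cycles=remaining)
    # Variable latency: the only safe option is to wait on a barrier.
    return Sched(wait_barrier=free_barrier)


# Made-up latencies, purely for illustration.
print(schedule_dep(Dep("fixed_op", 4), cycles_since_issue=1, free_barrier=0))
print(schedule_dep(Dep("variable_op", None), cycles_since_issue=0, free_barrier=2))
```

Leaving barriers off the fixed-latency path matches the point made above that using them adds a minimum latency.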
01:33fdobridge: <karolherbst🐧🦀> @gfxstrand btw, there are instructions which are fixed and variable depending on the hardware, but the ISA allows you to treat them as both... for binary compatibility reasons
01:34fdobridge: <karolherbst🐧🦀> (And they ignore the barriers set on some hardware)
01:34fdobridge: <karolherbst🐧🦀> (this is usually the case for FP64)
01:36fdobridge: <karolherbst🐧🦀> (also for FP16)
02:05fdobridge: <gfxstrand> Oh fun...
03:17fdobridge: <karolherbst🐧🦀> yeah.. I think it's one of those "quadro vs geforce" moments
03:17fdobridge: <karolherbst🐧🦀> and on geforce those are variable and on quadro they are fixed
03:18fdobridge: <karolherbst🐧🦀> and because the SM version can't really tell, the emitted binary needs to run on both
04:10fdobridge: <gfxstrand> So does this mean it's variable-latency for the GPR part?
04:12fdobridge: <gfxstrand> I bet barrier regs have a higher latency like predicates do or something like that...
04:14fdobridge: <karolherbst🐧🦀> no, but the dep might be
04:14fdobridge: <karolherbst🐧🦀> so it needs to be able to wait
04:33fdobridge: <gfxstrand> I'm kinda wondering if I should have multiple BMov instructions in the IR...
05:54fdobridge: <gfxstrand> Well, looks like my interpretation of all this is mostly okay now. Doing one more CTS run and I'll merge tomorrow.
05:54fdobridge: <gfxstrand> That's the last big hill to climb in the compiler, I think.
05:54fdobridge: <gfxstrand> There's more optimizations to do, of course, but I think that's the last really hard compiler theory problem.
06:19fdobridge: <gfxstrand> The other issue I need to figure out is that you have to wait some number of cycles after a barrier before doing shared memory ops if you want coherency. IDK why. Annoyingly, you can't set a delay on the barrier, so you have to do it with a nop+delay. At least that's what it looks like reading the output of the blob compiler.
06:20fdobridge: <gfxstrand> There's a handful of memory model tests failing because of this.
11:58fdobridge: <karolherbst🐧🦀> never seen that in the wild.. I can check if I find something about that
14:46Ilgaz: I filed opensuse tw bug about display corruption on gnome-terminal-gtk3 and kde system monitor which doesn't happen on fedora live usb
14:46Ilgaz: https://bugzilla.opensuse.org/show_bug.cgi?id=1217748
14:51karolherbst: Ilgaz: does that happen on a fresh boot?
14:51karolherbst: there is this bug where updating fonts can mess things up or something
14:51RSpliet: those macbooks are still running... that's crazy! :D
14:52karolherbst: but yeah.. could also be a mesa bug
14:52karolherbst: what version of mesa are you running?
15:10Ilgaz: karolherbst: it also happens on gnome terminal showing very crazy glitches especially when you move the window
15:11karolherbst: I see
15:11Ilgaz: Mesa 23.2.1-1699.364.pm.1
15:11karolherbst: do you know if it's caused by an update to mesa? Can you try older versions easily?
15:12karolherbst: mhhh
15:12karolherbst: but there is also this `imem: OOM: 00100000 00001000 -2` error
15:13Ilgaz: karolherbst: I can go back to 2023/11/23 via snapper. yes about imem, things really got out of hand when I launched kde system monitor. system slowed down
15:14Ilgaz: I could only get dmesg and kill the app
15:20Ilgaz: karolherbst: the glitch I see on kde system monitor looks exactly like this sddm bug I reported. They sent me here actually and I figured gnome things while trying to figure if it is a DE bug. https://bugzilla.opensuse.org/show_bug.cgi?id=1217486
15:23Ilgaz: btw as I am a kde user I don't have gnome things on that snapshot. Do you mind if I try tomorrow as I have to leave soon?
15:30Ilgaz: will update in about 14 hours bbl
15:30karolherbst: sure
16:41fdobridge: <karolherbst🐧🦀> the wait on `BAR` needs to be at least 6
16:42fdobridge: <karolherbst🐧🦀> so nvidia _might_ use nop+delay if they want to wait less?
16:42fdobridge: <karolherbst🐧🦀> if you mean a scoreboard barrier, then there is always `DEPBAR`
16:42fdobridge: <karolherbst🐧🦀> (needs a wait of at least 4)
16:44fdobridge: <karolherbst🐧🦀> (on ampere)
16:45fdobridge: <karolherbst🐧🦀> on Turing `BAR` needs a wait of at least 5
16:45fdobridge: <karolherbst🐧🦀> @gfxstrand I can give you more of those "minimal waits" on other instructions, there are a couple
16:46fdobridge: <karolherbst🐧🦀> ehh `MEMBAR` is like `BAR`
16:51fdobridge: <gfxstrand> Right. I think the problem is that setting a delay on BAR itself doesn't work properly. So they insert a NOP and use that for the delay.
16:59fdobridge: <gfxstrand> Unfortunately, that means the dependency pass needs the ability to insert instructions. 😭
16:59fdobridge: <gfxstrand> I can get it sorted. It's just annoying.
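A pass that inserts instructions for this could look roughly like the sketch below. It assumes the behavior inferred above from the blob compiler's output (the delay goes on a `NOP` following `BAR` rather than on `BAR` itself). `Instr`, `pad_barriers`, and the delay value passed in are hypothetical, and the right cycle count is not established in this discussion.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Instr:
    """A toy instruction: opcode plus the delay encoded on it (hypothetical)."""
    op: str
    delay: int = 1


def pad_barriers(instrs: List[Instr], nop_delay: int) -> List[Instr]:
    """Return a new list with a delayed NOP inserted after every BAR."""
    out: List[Instr] = []
    for ins in instrs:
        out.append(ins)
        if ins.op == "BAR":
            # The delay is carried by the NOP, not by BAR itself.
            out.append(Instr("NOP", delay=nop_delay))
    return out


# 5 is only borrowed from the Turing figure mentioned above, as an example.
shader = [Instr("STS"), Instr("BAR"), Instr("LDS")]
for ins in pad_barriers(shader, nop_delay=5):
    print(ins.op, ins.delay)
```

Building a new list instead of editing in place keeps the loop simple, at the cost of the pass no longer being a pure annotation step.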
17:13fdobridge: <karolherbst🐧🦀> annoying. I don't really see directly on why that's needed, might have to take a look at the code to see if something strange is going on
17:15fdobridge: <gfxstrand> I'm not sure either. All I know is what I've seen from dumping memory model test shaders on the blob
19:30fdobridge: <Sid> would it be worth enabling GSP by default on ampere as well?
19:31fdobridge: <Sid> ada's gsp symlinks back to ampere's
19:35fdobridge: <Sid> the proprietary driver package only ships 2 gsp bins as well
19:55fdobridge: <airlied> Upstream changing the default is hard, because there is no way to ensure new fw on user systems, we probably need to provide a config option and have distros turn it on once they ship new fw
19:57cmiller: Hello, I am running GNU Guix with Linux Libre and Xorg with nouveau drivers on a GM204 (GeForce 970) which reports in the Xorg log file the following "(EE) NOUVEAU(0): Error creating GPU channel: -19" and therefore "(EE) NOUVEAU(0): Error initialising acceleration. Falling back to NoAccel"
19:58cmiller: I also noticed the following in dmesg "[ 0.000719] gran_size: 64K chunk_size: 64K num_reg: 10 lose cover RAM: 62M" which appears to be related to the GPU, since if I remove it, those messages are gone. I also have a 770 which produces the same output (but did not test 3D acceleration with it)
20:04fdobridge: <!DodoNVK (she) 🇱🇹> cmiller: So you don't have linux-firmware installed, right?
20:05cmiller: fdobridge: Yes
20:31fdobridge: <mhenning> I believe the GeForce 970 requires proprietary firmware to function.
20:32fdobridge: <mhenning> The 770 should have open firmware available
20:32karolherbst: cmiller: don't use linux-libre :)
20:32karolherbst: but yeah, 3D accel should work on the 770 without blobs
20:43DodoGTA: karolherbst: Does GM1xx require firmware for 3D acceleration?
20:45cmiller: Ah thanks. How would I see this? The only reason I tried is because I looked at the feature matrix and assumed it works on libre.
20:46karolherbst: DodoGTA: no
20:47karolherbst: cmiller: I don't think we point it out explicitly there
20:48DodoGTA: karolherbst: So GM2xx is special then? I remember something about high-secure firmware
20:50cmiller: karolherbst: Would be nice for beginners like me. Wasted a bit too much time to make it work. I will try again with the 770.
20:51karolherbst: cmiller: well.. for beginners it's better to not use linux-libre, because linux-libre isn't supported by us at all
20:53cmiller: karolherbst: I did not now that. But why should I use nouveau then? Isn't the point being libre? If it requires firmware anyways I don't see a point.
20:54cmiller: s/now/know
21:02karolherbst: well.. the kernel module and userspace are still open source
21:02karolherbst: anyway, we can't replace the firmware as they are cryptographically signed
21:08cmiller: Ah okay. Nvidia is really hardcore to this it seems.
21:34sneil: I'm trying out the GSP firmware on a TU106 (manjaro linux 6.7.0rc3 test kernel) and I'm getting: https://pastebin.com/wGAxaBiG
21:37xiphmont: Hello all! So, I have a 'fix a longstanding crash' patch. I don't want to create an Issue as I have a fix for review. However, I'm not a maintainer (and don't really want to be one). As a result... I kind of fall through the documentation cracks. I'm not even sure which of several repos/branches to base the patch against for submission. Any quick pointers?
21:38xiphmont: Also, nouveau list mail all reference GitHub, all the dics tell me to use GitLab. And all the actual patch indexes aren't at either, so I assume an internal freedesktop repo?
21:38xiphmont: er, s/dics/docs sorry
21:44soreau: xiphmont: is it a mesa or kernel patch?
21:46xiphmont: kernel
21:46xiphmont: nouveau_gem.c and nouveau_dma.h Fairly small.
21:48xiphmont: To be fair, it raises a few questions--- I'm not opposed to an Issue so much as I don't want it to get lost amongst the voluminous "AUIGH NOTHING WORKS" issues.
21:49xiphmont: also, the usual 'submit the following 40 complete logs/dumps' is not relevant :-)
21:57soreau: I'd submit it here first https://gitlab.freedesktop.org/drm/nouveau
21:59soreau: this way you can get feedback on it and if it's not the right place, someone will say (or if not, you can at least have a link to the MR to ping about it later)
21:59karolherbst: airlied: might want to take a look at sneils error
22:00xiphmont: OK! So nouveau-next branch on gitlab, or is nouveau-fixes a better choice? [I did read the docs, they just left me uncertain given that none of the patches I see discussed are there except as part of merges]
22:01soreau: I'd think nouveau-next since it's the default..
22:02karolherbst: nouveau kernel patches are still sent to the nouveau ML, but we also kinda want to accept patches on gitlab
22:18fdobridge: <mhenning> xiphmont: I don't think nouveau-next is correct - that branch is a bit stale. I don't personally understand the kernel branches, but my guess might be to use drm-next from https://cgit.freedesktop.org/drm/drm
22:19fdobridge: <mhenning> but even the latest 6.7 rc is fresher than nouveau-next
22:20xiphmont: everything on the public gitlab repo looks stale (with the exception of the merges)
22:21xiphmont: Anyway, having a URL probably is the important thing.
22:21xiphmont: So... drm-next is best guess right now.
22:23xiphmont: I'll put it there and ping the mailing list after.
22:23xiphmont: I think the head on all of them is ~ the same code anyway.