01:38 fdobridge_: <g​fxstrand> @airlied the kernel I've been running with my Ampere just fine doesn't like my new Ada. Should I pull misc? misc-next?
01:46 fdobridge_: <T​om^> if you are using Arch and the AUR package, it overrides it in the PKGBUILD at line 44 https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=vulkan-nouveau-git#n44
02:01 fdobridge_: <a​irlied> Doesn't load or crashes later?
02:01 fdobridge_: <a​irlied> Can't think of anything ada specific we landed recently
02:05 fdobridge_: <k​arolherbst🐧🦀> ~~maybe it needs updated firmware~~
02:37 fdobridge_: <a​irlied> Could be, would have to get the model number
02:39 fdobridge_: <!​DodoNVK (she) 🇱🇹> I'm definitely faking it for newer DXVK versions
03:25 fdobridge_: <g​fxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184697747014684792/dmesg?ex=658ceac1&is=657a75c1&hm=ab2074dd866f48686e4a7024b98cb3f1b896fefa6a16504e3a9c1cb0b95a631b&
03:31 fdobridge_: <T​om^> just curious, was that Ampere a laptop with a dGPU, and did you get Zink running on it? wondering if I have just built something wrong or if there is some bug I'm hitting in either Zink or nouveau
03:32 fdobridge_: <g​fxstrand> Looks like I was running a pretty old branch. I just pulled drm-misc-fixes
03:32 fdobridge_: <g​fxstrand> Alienware x14 with an RTX 4060
03:32 fdobridge_: <g​fxstrand> I've not got my kernel working yet
03:33 fdobridge_: <T​om^> yeah, I meant the Ampere, not Ada 🙂
03:51 fdobridge_: <a​irlied> @gfxstrand does the BIOS have an option for Optimus? Otherwise I expect you will need some of Lyude's work to get the panel to work
04:03 fdobridge_: <g​fxstrand> The Intel GPU drives the panel and that works just fine.
04:04 fdobridge_: <a​irlied> Weird, it's doing some display stuff in that dmesg
04:10 fdobridge_: <g​fxstrand> Here's another
04:11 fdobridge_: <g​fxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184709105307353179/dmesg?ex=658cf555&is=657a8055&hm=3f5bf01d808298807aaf05a6683470cfd05ad247848c5cde7dae78cb4c4314c0&
04:11 fdobridge_: <g​fxstrand> IDK what's going on. All I know is that NVK can't open the device
05:23 fdobridge_: <g​fxstrand> Lyude: ^^
05:53 fdobridge_: <a​irlied> @gfxstrand booting with config=NvGspRm=1,disp=0 might work
06:19 fdobridge_: <g​fxstrand> No dice
06:19 fdobridge_: <g​fxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184741412919574588/dmesg?ex=658d136c&is=657a9e6c&hm=85cf9313b1b0821e6647a037dc18b52da818a3a50c718d68a08efc5644c956c8&
06:52 fdobridge_: <a​irlied> Uggh that would have worked if the path wasn't buggy 🙂
16:34 fdobridge_: <g​fxstrand> Just got around to looking at the trace and uh... mm/slub.c? That sounds scary.
16:34 fdobridge_: <g​fxstrand> It's also in the display code so it sounds like a problem for Lyude
16:36 gfxstrand: Oh, Lyude probably can't see that because it's a Discord download
16:38 gfxstrand: https://people.freedesktop.org/~gfxstrand/dmesg
16:44 fdobridge_: <g​fxstrand> It's a double-free. This is why drivers should be written in Rust. 🌶️ :ferris:
16:45 f_: Rust is rusty.
17:15 fdobridge_: <k​arolherbst🐧🦀> @gfxstrand nah, the bot posts links to discord's cdn
17:32 fdobridge_: <g​fxstrand> So... rewriting my dependency pass fixed those issues as well as a bunch of memory model fails. Maybe we can turn on MM now?
18:39 fdobridge_: <!​DodoNVK (she) 🇱🇹> DXVK v2 works OK with the previous memory model support (but it's nice to enable it properly)
18:52 fdobridge_: <k​arolherbst🐧🦀> what was the problem in the shader then? unknown or indeed something funky with the movs?
18:52 fdobridge_: <g​fxstrand> Yeah, something funky with the moves
18:52 fdobridge_: <g​fxstrand> I think
18:52 fdobridge_: <k​arolherbst🐧🦀> mhh
18:52 fdobridge_: <k​arolherbst🐧🦀> at least it's working now
18:53 fdobridge_: <g​fxstrand> There were a number of issues with the previous pass
18:53 fdobridge_: <g​fxstrand> It was generating too many barriers and also not enough. 🤡
18:54 fdobridge_: <k​arolherbst🐧🦀> classic
18:54 fdobridge_: <g​fxstrand> And then I reworked delay calculations to be barrier-aware and track the barriers just like registers with 2 cycle delays on them.
18:54 fdobridge_: <k​arolherbst🐧🦀> flashback to this commit: https://gitlab.freedesktop.org/mesa/mesa/-/commit/e4f675dc4288
18:54 fdobridge_: <k​arolherbst🐧🦀> ...
18:55 fdobridge_: <g​fxstrand> I should probably put a Fixes tag on that patch...
18:55 fdobridge_: <k​arolherbst🐧🦀> mhhh though you only really have to ensure a minimum delay of two
18:56 fdobridge_: <k​arolherbst🐧🦀> unless it's what you meant
18:57 fdobridge_: <g​fxstrand> Yeah but you have to track them to get that
18:59 fdobridge_: <k​arolherbst🐧🦀> yeah...
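A minimal sketch of the idea from this exchange: model each scoreboard barrier like a register whose write takes 2 cycles, and stall any instruction that waits on it until that latency has elapsed. This is illustrative C, not NAK's actual Rust code; the struct and function names and the count of 6 barrier slots are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 6 scoreboard barrier slots; the real count depends on the ISA. */
#define NUM_BARRIERS 6
/* The 2-cycle delay on barriers mentioned in the discussion. */
#define BARRIER_DELAY 2

/* Hypothetical tracker: treats each barrier like a register with a fixed latency. */
struct barrier_tracker {
   int64_t ready_cycle[NUM_BARRIERS]; /* first cycle the barrier may be waited on */
};

static void tracker_init(struct barrier_tracker *t)
{
   for (int b = 0; b < NUM_BARRIERS; b++)
      t->ready_cycle[b] = 0;
}

/* Record that the instruction issued at `cycle` sets barrier `bar`. */
static void tracker_set(struct barrier_tracker *t, int bar, int64_t cycle)
{
   assert(bar >= 0 && bar < NUM_BARRIERS);
   t->ready_cycle[bar] = cycle + BARRIER_DELAY;
}

/* Extra stall cycles needed before an instruction issued at `cycle`
 * can wait on every barrier in `wait_mask`. */
static int64_t tracker_stall(const struct barrier_tracker *t,
                             uint32_t wait_mask, int64_t cycle)
{
   int64_t stall = 0;
   for (int b = 0; b < NUM_BARRIERS; b++) {
      if (!(wait_mask & (1u << b)))
         continue;
      int64_t need = t->ready_cycle[b] - cycle;
      if (need > stall)
         stall = need;
   }
   return stall;
}
```

With this model, "a minimum delay of two" falls out automatically, but only because every barrier write is tracked.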
18:59 Lyude: gfxstrand: i can take a look today
19:31 fdobridge_: <p​homes_> 26676 is for nvk but does not have the label
20:03 fdobridge_: <a​irlied> @gfxstrand this might fix the disp=0 https://paste.centos.org/view/raw/60101a7c
20:08 airlied: Lyude: the patch for the freeing of rpc msgs is wrong, esp for msg writes, since we have to queue some messages up before the hw starts
21:51 fdobridge_: <g​fxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184976039575814346/image.png?ex=658dedef&is=657b78ef&hm=9abe7466fc91c953e568db5797727f466e9e1d3b1d944dd9bec623c99b0b7981&
21:53 HdkR: A-pose at 90hz!
21:58 fdobridge_: <a​irlied> @gfxstrand win!
22:01 fdobridge_: <g​fxstrand> It is locking up, though. 🫤
22:03 fdobridge_: <g​fxstrand> Well, heaven locks it up
22:03 fdobridge_: <g​fxstrand> When I set it to extreme, anyway
22:05 fdobridge_: <a​irlied> locking up dead OS? or locking up GPU faults?
22:06 fdobridge_: <a​irlied> don't think I've cranked extreme
22:06 fdobridge_: <!​DodoNVK (she) 🇱🇹> Is it a CPU lockup/hang?
22:09 fdobridge_: <g​fxstrand> Dead OS. IDK root cause.
22:10 fdobridge_: <g​fxstrand> Well, okay this time it's a dead GPU because I see other stuff still animating on the screen
22:10 fdobridge_: <!​DodoNVK (she) 🇱🇹> How well does NAK work on Ampere?
22:11 fdobridge_: <a​irlied> okay just cranked ultra/extreme on tu117
22:11 fdobridge_: <g​fxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184980977743839353/message.txt?ex=658df288&is=657b7d88&hm=68cd9b00bf9844c2d87e7f5f439d3d872a7d8387e842742be88a72221cab9c33&
22:11 fdobridge_: <g​fxstrand> Ampere is great. Ada seems good too
22:12 fdobridge_: <g​fxstrand> I've been using Ampere for my CTSing lately and my new laptop is Ada
22:13 fdobridge_: <!​DodoNVK (she) 🇱🇹> YES :cursedgears: (you reproduced my issue)
22:13 fdobridge_: <g​fxstrand> I wouldn't at all be surprised if some of this is PRIME interactions between i915 and nouveau
22:13 fdobridge_: <!​DodoNVK (she) 🇱🇹> I still get the lockups with amdgpu
22:14 fdobridge_: <a​irlied> yeah that does smell a bit PRIMEy
22:14 fdobridge_: <g​fxstrand> I also saw one where nouveau took down i915 (unfortunately no logs that time)
22:15 fdobridge_: <g​fxstrand> Oh, and there's a workqueue error further up that I didn't paste
22:15 fdobridge_: <g​fxstrand> Here's the full thing:
22:16 fdobridge_: <g​fxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184982174877884526/dmesg?ex=658df3a6&is=657b7ea6&hm=aa4e501c2e46c45c012f16c735d1930dfd6f6431ec1c534c94bfa2e8f5a3fa6d&
22:18 fdobridge_: <a​irlied> I think there might be a case where we spin polling the hw under a spinlock that we probably shouldn't not sure yet
22:23 HdkR: Can I interest you in some userspace spinlocks that behave more like futexes? AMD and Intel have the monitorx and umonitor sets of instructions that might be handy :)
22:30 fdobridge_: <!​DodoNVK (she) 🇱🇹> I hope netborg doesn't suddenly appear and rewrite the nouveau KMD to be lock-free /s
22:34 fdobridge_: <r​hed0x> I don't think @ airlied understands that reference...
22:34 fdobridge_: <r​hed0x> so idk why you post it here
22:35 airlied: Lyude: so places where nvkm_gsp_rpc_wr takes false for wait, you can't free the rpc
22:40 fdobridge_: <g​fxstrand> I really should figure out 64-bit shared atomics....
22:41 fdobridge_: <k​arolherbst🐧🦀> you are in luck as there isn't much to figure out :ferrisUpsideDown:
22:42 fdobridge_: <g​fxstrand> I'm pretty sure I know what I need to do. I need to use generic atomics and map shared memory somewhere.
22:42 fdobridge_: <k​arolherbst🐧🦀> shared 64 bit only has `.CAS` and `.CAST`
22:42 fdobridge_: <g​fxstrand> That sucks
22:42 fdobridge_: <k​arolherbst🐧🦀> and `.EXCH` I think
22:42 fdobridge_: <k​arolherbst🐧🦀> `.CAST` is a shared only variant anyway
22:43 fdobridge_: <k​arolherbst🐧🦀> (compare and store)
22:43 fdobridge_: <g​fxstrand> So I either do a CAST loop or generic
22:43 fdobridge_: <g​fxstrand> I guess a CAST loop is probably okay.
22:43 fdobridge_: <k​arolherbst🐧🦀> what do you mean with generic?
22:44 fdobridge_: <g​fxstrand> I mean `ATOM` instead of `ATOMS`
22:44 fdobridge_: <k​arolherbst🐧🦀> can't do
22:44 fdobridge_: <g​fxstrand> oh?
22:44 fdobridge_: <g​fxstrand> Does that have the same restriction?
22:44 fdobridge_: <k​arolherbst🐧🦀> yes
22:44 fdobridge_: <g​fxstrand> Bugger
22:44 fdobridge_: <g​fxstrand> Okay
22:44 fdobridge_: <g​fxstrand> I can do a CAST loop in NIR
22:45 fdobridge_: <k​arolherbst🐧🦀> don't we already have lowering for it anyway?
22:45 fdobridge_: <g​fxstrand> Maybe?
22:45 fdobridge_: <g​fxstrand> We've got a variety of things. I'm not sure what all at this point
22:45 fdobridge_: <k​arolherbst🐧🦀> maybe some driver private thing... intel won't as they don't even have a 64 bit `.CAS` anyway
22:46 fdobridge_: <g​fxstrand> Intel doesn't implement shared 64-bit atomics.
22:46 fdobridge_: <g​fxstrand> Intel is why the extension has a bit
22:46 fdobridge_: <k​arolherbst🐧🦀> yeah.. but intel CL does and I've seen the code :ferrisUpsideDown:
22:46 fdobridge_: <g​fxstrand> The CL driver literally takes a lock.
22:46 fdobridge_: <k​arolherbst🐧🦀> yeah
22:46 fdobridge_: <g​fxstrand> It sucks
22:46 fdobridge_: <k​arolherbst🐧🦀> cursed
22:46 fdobridge_: <k​arolherbst🐧🦀> I wonder if other drivers have a 64 bit `CAS` but nothing else
22:48 fdobridge_: <k​arolherbst🐧🦀> I mean.. the lowering code is the same for any address space as long as you have `CAS`...
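Since the lowering only needs CAS, it is the same retry loop regardless of address space. A CPU-side sketch using C11 `<stdatomic.h>`, assuming a small hypothetical set of ops; on the GPU the same loop would be built in NIR around a 64-bit `CAS`/`CAST`, so treat this as a model of the logic rather than driver code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical op list; a real pass would cover more (imin/imax, etc.). */
enum rmw_op { RMW_ADD, RMW_AND, RMW_OR, RMW_XOR, RMW_UMIN, RMW_UMAX };

static uint64_t apply_op(enum rmw_op op, uint64_t a, uint64_t b)
{
   switch (op) {
   case RMW_ADD:  return a + b;
   case RMW_AND:  return a & b;
   case RMW_OR:   return a | b;
   case RMW_XOR:  return a ^ b;
   case RMW_UMIN: return a < b ? a : b;
   case RMW_UMAX: return a > b ? a : b;
   }
   assert(!"unknown op");
   return 0;
}

/* Emulate a 64-bit atomic read-modify-write using only compare-and-swap.
 * Returns the old value, like a native atomic would. */
static uint64_t atomic_rmw_via_cas(_Atomic uint64_t *addr, enum rmw_op op,
                                   uint64_t val)
{
   uint64_t old = atomic_load(addr);
   /* On failure, compare_exchange reloads `old` with the current value,
    * so the loop retries with fresh data until nobody raced us. */
   while (!atomic_compare_exchange_weak(addr, &old, apply_op(op, old, val)))
      ;
   return old;
}
```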
22:48 fdobridge_: <g​fxstrand> I mean it kinda makes sense
22:48 fdobridge_: <g​fxstrand> 64-bit arithmetic is annoying so kick that back to the shader
22:48 fdobridge_: <k​arolherbst🐧🦀> yeah
22:48 fdobridge_: <k​arolherbst🐧🦀> and who needs it anyway
22:49 fdobridge_: <g​fxstrand> I mean doing it in the memory controller is faster but if it's not that common and you can lower to a single subgroup invocation anyway...
22:49 fdobridge_: <k​arolherbst🐧🦀> yeah
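Lowering to a single subgroup invocation usually means: reduce the operands across the subgroup, have one elected invocation issue one atomic, and give each invocation `base + exclusive_prefix` as its result. A CPU model of that for atomic add, assuming a 32-wide subgroup stored as plain arrays (all names hypothetical); it only works for associative, commutative ops:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBGROUP_SIZE 32 /* assumption: 32-wide subgroups */

/* Model of an atomic add issued by every active invocation of a subgroup.
 * Instead of one atomic per invocation, only one atomic is issued. */
static void subgroup_atomic_add(_Atomic uint64_t *addr,
                                const uint64_t val[SUBGROUP_SIZE],
                                const bool active[SUBGROUP_SIZE],
                                uint64_t result[SUBGROUP_SIZE])
{
   uint64_t prefix[SUBGROUP_SIZE]; /* exclusive prefix sum over active lanes */
   uint64_t total = 0;
   for (int i = 0; i < SUBGROUP_SIZE; i++) {
      prefix[i] = total;
      if (active[i])
         total += val[i];
   }

   /* One "elected" invocation performs the single memory atomic. */
   uint64_t base = atomic_fetch_add(addr, total);

   /* Each active invocation gets the value it would have observed if the
    * adds had executed one after another in lane order. */
   for (int i = 0; i < SUBGROUP_SIZE; i++) {
      if (active[i])
         result[i] = base + prefix[i];
   }
}
```

If the atomic's result is unused, the prefix sum can be skipped entirely and only the reduction remains.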
22:49 fdobridge_: <d​adschoorse> amd even has fp64 shared atomics
22:50 fdobridge_: <k​arolherbst🐧🦀> cursed
22:50 fdobridge_: <g​fxstrand> Well, I can implement fp64 atomics on nvidia. 😛
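The fp64 case falls out of the same CAS loop by bitcasting: load the 64-bit word, reinterpret it as a double, add, and CAS the new bits back. A hedged C sketch; the function name is hypothetical, the add uses the host's default rounding, and no FTZ behavior is modeled:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* The 64-bit word is stored as raw bits; the CAS compares bits, not
 * floating-point values, so NaN and -0.0 are handled correctly. */
static double fp64_atomic_add_via_cas(_Atomic uint64_t *addr, double val)
{
   assert(addr != NULL);
   uint64_t old_bits = atomic_load(addr);
   for (;;) {
      double old_val, new_val;
      uint64_t new_bits;
      memcpy(&old_val, &old_bits, sizeof(old_val)); /* bitcast u64 -> f64 */
      new_val = old_val + val;
      memcpy(&new_bits, &new_val, sizeof(new_bits)); /* bitcast f64 -> u64 */
      if (atomic_compare_exchange_weak(addr, &old_bits, new_bits))
         return old_val;
      /* CAS failed: old_bits now holds the current value; retry. */
   }
}
```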
22:51 fdobridge_: <k​arolherbst🐧🦀> we really need a `nir_lower_atomics` thing
22:51 fdobridge_: <!​DodoNVK (she) 🇱🇹> Definitely not on TeraScale though 🪳
22:51 fdobridge_: <g​fxstrand> We have a handful of things but not one unified thing. IDK how much sense a unified thing makes, though.
22:52 fdobridge_: <k​arolherbst🐧🦀> yeah, though `lower atomic on cas` is kinda the same everywhere, no?
22:53 fdobridge_: <g​fxstrand> Sure but that doesn't mean everyone needs it
22:53 fdobridge_: <g​fxstrand> So sure, the pass can go in common code. But there's a reason it doesn't exist yet.
22:54 fdobridge_: <k​arolherbst🐧🦀> one of them is "I was too lazy to port the codegen lowering to nir"
22:54 fdobridge_: <g​fxstrand> I think I landed atomic lowering on scratch some time
22:54 fdobridge_: <g​fxstrand> (thanks, OpenCL...)
22:54 fdobridge_: <k​arolherbst🐧🦀> that really exists?
22:56 fdobridge_: <k​arolherbst🐧🦀> mhh.. through generic memory probably...
22:57 fdobridge_: <k​arolherbst🐧🦀> but isn't the lowering to just remove the atomic and do the op?
22:58 fdobridge_: <g​fxstrand> Yeah, that's what we do.
22:58 fdobridge_: <g​fxstrand> So we lower generic to an if-ladder and then lower the scratch branch to load/alu/store
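A rough model of the two-step lowering described here: an if-ladder classifies the generic address, and the scratch branch becomes a plain load/ALU/store because scratch is private to the invocation. The window layout, names, and the use of the GCC/Clang `__atomic_fetch_add` builtin for the non-scratch branches are all assumptions for illustration; real hardware would emit different instructions for shared and global memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the shared and scratch windows inside the
 * generic address space; real hardware/driver layouts differ. */
struct generic_windows {
   uintptr_t shared_base, shared_size;
   uintptr_t scratch_base, scratch_size;
};

enum addr_space { SPACE_GLOBAL, SPACE_SHARED, SPACE_SCRATCH };

static enum addr_space classify(uintptr_t addr, const struct generic_windows *w)
{
   /* Unsigned wraparound turns each range check into a single compare. */
   if (addr - w->shared_base < w->shared_size)
      return SPACE_SHARED;
   if (addr - w->scratch_base < w->scratch_size)
      return SPACE_SCRATCH;
   return SPACE_GLOBAL;
}

/* The if-ladder: dispatch a generic atomic add on the address space. */
static uint32_t generic_atomic_add(uint32_t *ptr, uint32_t val,
                                   const struct generic_windows *w)
{
   assert(ptr != NULL);
   if (classify((uintptr_t)ptr, w) == SPACE_SCRATCH) {
      /* Scratch is private to one invocation, so nothing can race with
       * us: the atomic lowers to a plain load/ALU/store. */
      uint32_t old = *ptr;
      *ptr = old + val;
      return old;
   }
   /* Shared and global keep a real atomic (on a GPU these would be
    * different instructions; a CPU model uses one builtin for both). */
   return __atomic_fetch_add(ptr, val, __ATOMIC_SEQ_CST);
}
```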
22:59 fdobridge_: <k​arolherbst🐧🦀> I wonder how global atomics on mapped memory regions would work...
23:00 HdkR: PCIe atomics?
23:00 fdobridge_: <k​arolherbst🐧🦀> shared/local maps I mean
23:01 fdobridge_: <k​arolherbst🐧🦀> @gfxstrand btw, nvidia has an `ATOM.SPIN` thing which selects one thread to do the thing
23:01 fdobridge_: <k​arolherbst🐧🦀> only works with `.CAST`
23:02 fdobridge_: <k​arolherbst🐧🦀> but mhh..
23:02 fdobridge_: <k​arolherbst🐧🦀> it's kinda weird, because it returns 0 for non selected threads
23:04 fdobridge_: <k​arolherbst🐧🦀> it kinda takes memory banks into account
23:04 fdobridge_: <k​arolherbst🐧🦀> and is apparently faster than `.CAS`
23:04 fdobridge_: <k​arolherbst🐧🦀> well.. if you do the spin lock thing
23:05 fdobridge_: <k​arolherbst🐧🦀> wait what...
23:05 fdobridge_: <k​arolherbst🐧🦀> `ATOMS` has a stride modifier on the source, so you can fold in an `IMUL` with 4, 8 or 16
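A sketch of the peephole this suggests: when the address feeding `ATOMS` is `index * imm` with `imm` of 4, 8 or 16, encode the multiply as the stride modifier and drop the `IMUL`. The enum values and function name are hypothetical; the actual encoding of the modifier is not given in the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of the ATOMS source stride modifier. */
enum atoms_stride { ATOMS_STRIDE_1 = 1, ATOMS_STRIDE_4 = 4,
                    ATOMS_STRIDE_8 = 8, ATOMS_STRIDE_16 = 16 };

/* If the shared-memory address is `index * imm`, try to drop the IMUL and
 * express the multiply through the ATOMS stride modifier instead.
 * Leaves *out untouched when folding is not possible. */
static bool try_fold_imul_stride(uint32_t imm, enum atoms_stride *out)
{
   assert(out != NULL);
   switch (imm) {
   case 4:  *out = ATOMS_STRIDE_4;  return true;
   case 8:  *out = ATOMS_STRIDE_8;  return true;
   case 16: *out = ATOMS_STRIDE_16; return true;
   default: return false; /* keep the IMUL */
   }
}
```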
23:08 airlied: Lyude: though the lifetimes in this code are all over the place
23:11 fdobridge_: <g​fxstrand> I also really need to add a bit of code to NAK which uses `RED` instead of `ATOM` when the destination is unused.
23:12 fdobridge_: <k​arolherbst🐧🦀> mhhh
23:13 fdobridge_: <k​arolherbst🐧🦀> I actually wonder if it works as expected on the local/shared memory window...
23:13 fdobridge_: <k​arolherbst🐧🦀> (same for `ATOM`)
23:14 fdobridge_: <k​arolherbst🐧🦀> docs indicate that `ATOM` doesn't, but `RED` doesn't list any restrictions in this regard
23:15 fdobridge_: <g​fxstrand> Oh, well that would truly be cursed
23:16 fdobridge_: <k​arolherbst🐧🦀> yeah...
23:16 fdobridge_: <g​fxstrand> It makes sense, though.
23:16 fdobridge_: <g​fxstrand> If you're going to optimize one case, that's the one to optimize.
23:16 fdobridge_: <k​arolherbst🐧🦀> I am mostly wondering if 64 bit shared atomics would work on `RED`
23:17 fdobridge_: <g​fxstrand> Yeah
23:17 fdobridge_: <k​arolherbst🐧🦀> `RED` doesn't have `EXCH`, `CAS` or `CAST` tho
23:18 fdobridge_: <g​fxstrand> Well, yeah. It's a set-and-forget and those pretty much require you to look at the result.
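The `RED` selection rule from this part of the discussion, as a hypothetical C helper: use `RED` only when the result is unused and the op has a `RED` form. Whether `RED` behaves correctly on the shared/local memory windows is still an open question above, so this sketch deliberately ignores address space.

```c
#include <stdbool.h>

/* Hypothetical opcode list for the atomic ops discussed above. */
enum atom_op { OP_ADD, OP_MIN, OP_MAX, OP_AND, OP_OR, OP_XOR,
               OP_EXCH, OP_CAS, OP_CAST };

enum atom_insn { INSN_ATOM, INSN_RED };

/* Fire-and-forget RED is usable only when nobody reads the result and
 * the op has a RED form (RED has no EXCH, CAS or CAST). */
static enum atom_insn select_atomic_insn(enum atom_op op, bool result_used)
{
   bool red_supports_op = op != OP_EXCH && op != OP_CAS && op != OP_CAST;
   if (!result_used && red_supports_op)
      return INSN_RED;
   return INSN_ATOM;
}
```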
23:18 fdobridge_: <k​arolherbst🐧🦀> btw.. float atomics are all `.FTZ.RN`
23:18 fdobridge_: <g​fxstrand> I mean, you could choose not to but that would be weird
23:19 fdobridge_: <k​arolherbst🐧🦀> something to keep in mind
23:19 fdobridge_: <k​arolherbst🐧🦀> with the new UAPI we can actually use the memory windows...
23:19 fdobridge_: <g​fxstrand> Yeah, it's good to know. So far I think float atomics are pretty sloppy and they don't have float controls.
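To make the `.FTZ.RN` behavior concrete, a small C model of an fp32 add that flushes subnormal inputs and outputs to zero and rounds to nearest even. It assumes the host FPU is in its default round-to-nearest mode and evaluates `float` arithmetic in single precision (as on x86-64 SSE); the function names are hypothetical.

```c
#include <float.h>

/* Flush a subnormal to a zero of the same sign (FTZ). NaN, infinities,
 * zeros and normal values pass through unchanged. */
static float flush_to_zero(float x)
{
   if (x != 0.0f && x > -FLT_MIN && x < FLT_MIN)
      return x < 0.0f ? -0.0f : 0.0f;
   return x;
}

/* Model of an fp32 atomic add's arithmetic with .FTZ.RN semantics:
 * inputs and result are flushed; rounding is the host's default
 * round-to-nearest-even. */
static float ftz_rn_add(float a, float b)
{
   return flush_to_zero(flush_to_zero(a) + flush_to_zero(b));
}
```

The practical consequence is that a shader relying on float controls to preserve denormals cannot get that from these atomics.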
23:40 fdobridge_: <g​fxstrand> Coming back to this... Wait on BAR. What happens if you don't wait? Do you get unconverged threads? Also, how does a wait work? It's a variable-latency instruction.
23:42 fdobridge_: <k​arolherbst🐧🦀> not sure, I only know that the wait needs to be at least 6 😄
23:42 fdobridge_: <g​fxstrand> *sigh*
23:42 fdobridge_: <k​arolherbst🐧🦀> there are some instructions with restrictions like that
23:43 fdobridge_: <k​arolherbst🐧🦀> mhh
23:43 fdobridge_: <g​fxstrand> But wait until what? The next non-nop? Anything?
23:43 fdobridge_: <k​arolherbst🐧🦀> I think the scoreboard docs have something on that
23:43 fdobridge_: <g​fxstrand> *sigh*
23:43 fdobridge_: <k​arolherbst🐧🦀> there is `.DEFER_BLOCKING`
23:44 fdobridge_: <k​arolherbst🐧🦀> if you use `BAR` without `.DEFER_BLOCKING` you need to use that minimum wait
23:44 fdobridge_: <g​fxstrand> Hrm...
23:44 fdobridge_: <g​fxstrand> defer blocking sounds interesting...
23:44 fdobridge_: <g​fxstrand> Also sounds a little insane
23:44 fdobridge_: <k​arolherbst🐧🦀> I have no idea what it does 🙂
23:44 fdobridge_: <k​arolherbst🐧🦀> ohh wait
23:44 fdobridge_: <g​fxstrand> Barrier but... uhhhh... later, maybe?
23:44 fdobridge_: <k​arolherbst🐧🦀> there it is
23:45 fdobridge_: <k​arolherbst🐧🦀> synchronization is deferred until the next sync event
23:45 fdobridge_: <k​arolherbst🐧🦀> so yeah..
23:45 fdobridge_: <k​arolherbst🐧🦀> so you can issue a couple of instructions more until the warp synchronizes or something
23:45 fdobridge_: <k​arolherbst🐧🦀> it's all pretty vague
23:46 fdobridge_: <g​fxstrand> I'm going to assume that I need at least 6 cycles between BAR and the next thing that observably touches memory
23:46 fdobridge_: <g​fxstrand> Like BAR then FADD should be fine, right?
23:46 fdobridge_: <g​fxstrand> One would think....
23:46 fdobridge_: <k​arolherbst🐧🦀> no, it's a restriction on the instruction itself
23:46 fdobridge_: <g​fxstrand> So 6 cycles before ANYTHING?
23:46 fdobridge_: <g​fxstrand> Okay, I can do that.
23:46 fdobridge_: <k​arolherbst🐧🦀> yeah
23:46 fdobridge_: <g​fxstrand> Does MemBar have anything like that?
23:46 fdobridge_: <k​arolherbst🐧🦀> unless you use `.DEFER_BLOCKING`, then this restriction doesn't exist anymore
23:47 fdobridge_: <g​fxstrand> Yeah but who knows what that does
23:47 fdobridge_: <k​arolherbst🐧🦀> (it's 5 on ampere btw)
23:48 fdobridge_: <g​fxstrand> kk
23:48 fdobridge_: <k​arolherbst🐧🦀> `CCTL.C` needs 8, `CCTL.I` needs 11, `MEMBAR` needs 5, `DEPBAR` needs 4
23:48 fdobridge_: <k​arolherbst🐧🦀> (on turing)
23:49 fdobridge_: <k​arolherbst🐧🦀> I think that's all the relevant restrictions you might run into
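The minimum delays quoted above, collected into a hypothetical scheduler helper. The Turing numbers come straight from this exchange; for Ampere only the `BAR` value of 5 is given (reading the "5 on ampere" remark as referring to `BAR`), so the sketch returns -1 for the unknown cases rather than guessing. Per the discussion, `.DEFER_BLOCKING` is treated as lifting the `BAR` restriction.

```c
#include <assert.h>
#include <stdbool.h>

enum sm_gen { GEN_TURING, GEN_AMPERE };
enum sched_op { SOP_BAR, SOP_MEMBAR, SOP_CCTL_C, SOP_CCTL_I, SOP_DEPBAR,
                SOP_OTHER };

/* Minimum delay (cycles) the instruction itself must carry, per the
 * numbers quoted above. Returns -1 where no number was given. */
static int min_delay(enum sched_op op, enum sm_gen gen, bool defer_blocking)
{
   switch (op) {
   case SOP_BAR:
      if (defer_blocking)
         return 0; /* .DEFER_BLOCKING lifts the restriction */
      return gen == GEN_TURING ? 6 : 5;
   case SOP_MEMBAR: return gen == GEN_TURING ? 5 : -1;
   case SOP_CCTL_C: return gen == GEN_TURING ? 8 : -1;
   case SOP_CCTL_I: return gen == GEN_TURING ? 11 : -1;
   case SOP_DEPBAR: return gen == GEN_TURING ? 4 : -1;
   case SOP_OTHER:  return 0;
   }
   return -1;
}

/* Clamp a scheduler-computed delay up to the hardware minimum. */
static int apply_min_delay(enum sched_op op, enum sm_gen gen,
                           bool defer_blocking, int delay)
{
   int min = min_delay(op, gen, defer_blocking);
   assert(min >= 0 && "no known minimum for this op/generation");
   return delay < min ? min : delay;
}
```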
23:49 fdobridge_: <g​fxstrand> kk