01:38fdobridge_: <gfxstrand> @airlied the kernel I've been running with my Ampere just fine doesn't like my new Ada. Should I pull misc? misc-next?
01:46fdobridge_: <Tom^> if you are using arch and the aur package it overrides it in the pkgbuild at line 44 https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=vulkan-nouveau-git#n44
02:01fdobridge_: <airlied> Doesn't load or crashes later?
02:01fdobridge_: <airlied> Can't think of anything ada specific we landed recently
02:05fdobridge_: <karolherbst🐧🦀> ~~maybe it needs updated firmware~~
02:37fdobridge_: <airlied> Could be, would have to get the model number
02:39fdobridge_: <!DodoNVK (she) 🇱🇹> I'm definitely faking it for newer DXVK versions
03:25fdobridge_: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184697747014684792/dmesg?ex=658ceac1&is=657a75c1&hm=ab2074dd866f48686e4a7024b98cb3f1b896fefa6a16504e3a9c1cb0b95a631b&
03:31fdobridge_: <Tom^> just curious, was that ampere a laptop with a dgpu and did you get zink running on it? wondering if I have just built something wrong or there is some bug I'm hitting in either zink or nouveau
03:32fdobridge_: <gfxstrand> Looks like I was running a pretty old branch. I just pulled drm-misc-fixes (edited)
03:32fdobridge_: <gfxstrand> Alienware x14 with an RTX 4060
03:32fdobridge_: <gfxstrand> I've not got my kernel working yet
03:33fdobridge_: <Tom^> yeah i meant the ampere not ada 🙂
03:51fdobridge_: <airlied> @gfxstrand does the bios have an option for Optimus? Otherwise I expect you will need some of Lyude's work to get the panel to work
04:03fdobridge_: <gfxstrand> The Intel GPU drives the panel and that works just fine.
04:04fdobridge_: <airlied> Weird, it's doing some display stuff in that dmesg
04:10fdobridge_: <gfxstrand> Here's another (edited)
04:11fdobridge_: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184709105307353179/dmesg?ex=658cf555&is=657a8055&hm=3f5bf01d808298807aaf05a6683470cfd05ad247848c5cde7dae78cb4c4314c0&
04:11fdobridge_: <gfxstrand> IDK what's going on. All I know is that NVK can't open the device
05:23fdobridge_: <gfxstrand> Lyude: ^^
05:53fdobridge_: <airlied> @gfxstrand booting with config=NvGspRm=1,disp=0 might work
06:19fdobridge_: <gfxstrand> No dice
06:19fdobridge_: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184741412919574588/dmesg?ex=658d136c&is=657a9e6c&hm=85cf9313b1b0821e6647a037dc18b52da818a3a50c718d68a08efc5644c956c8&
06:52fdobridge_: <airlied> Uggh that would have worked if the path wasn't buggy 🙂
16:34fdobridge_: <gfxstrand> Just got around to looking at the trace and uh... mm/slub.c? That sounds scary.
16:34fdobridge_: <gfxstrand> It's also in the display code so it sounds like a problem for Lyude
16:36gfxstrand: Oh, Lyude probably can't see that because it's a Discord download
16:38gfxstrand: https://people.freedesktop.org/~gfxstrand/dmesg
16:44fdobridge_: <gfxstrand> It's a double-free. This is why drivers should be written in Rust. 🌶️ :ferris:
16:45f_: Rust is rusty.
17:15fdobridge_: <karolherbst🐧🦀> @gfxstrand nah, the bot posts links to discord's cdn
17:32fdobridge_: <gfxstrand> So... rewriting my dependency pass fixed those issues as well as a bunch of memory model fails. Maybe we can turn on MM now?
18:39fdobridge_: <!DodoNVK (she) 🇱🇹> DXVK v2 works OK with the previous memory model support (but it's nice to enable it properly)
18:52fdobridge_: <karolherbst🐧🦀> what was the problem in the shader then? unknown or indeed something funky with the movs?
18:52fdobridge_: <gfxstrand> Yeah, something funky with the moves
18:52fdobridge_: <gfxstrand> I think
18:52fdobridge_: <karolherbst🐧🦀> mhh
18:52fdobridge_: <karolherbst🐧🦀> at least it's working now
18:53fdobridge_: <gfxstrand> There were a number of issues with the previous pass
18:53fdobridge_: <gfxstrand> It was generating too many barriers and also not enough. 🤡
18:54fdobridge_: <karolherbst🐧🦀> classic
18:54fdobridge_: <gfxstrand> And then I reworked delay calculations to be barrier-aware and track the barriers just like registers with 2 cycle delays on them.
18:54fdobridge_: <karolherbst🐧🦀> flashback to this commit: https://gitlab.freedesktop.org/mesa/mesa/-/commit/e4f675dc4288
18:54fdobridge_: <karolherbst🐧🦀> ...
18:55fdobridge_: <gfxstrand> I should probably put a Fixes tag on that patch...
18:55fdobridge_: <karolherbst🐧🦀> mhhh though you only really have to ensure a minimum delay of two
18:56fdobridge_: <karolherbst🐧🦀> unless it's what you meant
18:57fdobridge_: <gfxstrand> Yeah but you have to track them to get that
18:59fdobridge_: <karolherbst🐧🦀> yeah...
18:59Lyude: gfxstrand: i can take a look today
19:31fdobridge_: <phomes_> 26676 is for nvk but does not have the label
20:03fdobridge_: <airlied> @gfxstrand this might fix the disp=0 https://paste.centos.org/view/raw/60101a7c
20:08airlied: Lyude: the patch for the freeing of rpc msgs is wrong, esp for msg writes, since we have to queue some messages up before the hw starts
21:51fdobridge_: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184976039575814346/image.png?ex=658dedef&is=657b78ef&hm=9abe7466fc91c953e568db5797727f466e9e1d3b1d944dd9bec623c99b0b7981&
21:53HdkR: A-pose at 90hz!
21:58fdobridge_: <airlied> @gfxstrand win!
22:01fdobridge_: <gfxstrand> It is locking up, though. 🫤
22:03fdobridge_: <gfxstrand> Well, heaven locks it up
22:03fdobridge_: <gfxstrand> When I set it to extreme anyway
22:05fdobridge_: <airlied> locking up dead OS? or locking up GPU faults?
22:06fdobridge_: <airlied> don't think I've cranked extreme
22:06fdobridge_: <!DodoNVK (she) 🇱🇹> Is it a CPU lockup/hang?
22:09fdobridge_: <gfxstrand> Dead OS. IDK root cause.
22:10fdobridge_: <gfxstrand> Well, okay this time it's a dead GPU because I see other stuff still animating on the screen
22:10fdobridge_: <!DodoNVK (she) 🇱🇹> How well does NAK work on Ampere?
22:11fdobridge_: <airlied> okay just cranked ultra/extreme on tu117
22:11fdobridge_: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184980977743839353/message.txt?ex=658df288&is=657b7d88&hm=68cd9b00bf9844c2d87e7f5f439d3d872a7d8387e842742be88a72221cab9c33&
22:11fdobridge_: <gfxstrand> Ampere is great. Ada seems good too
22:12fdobridge_: <gfxstrand> I've been using Ampere for my CTSing lately and my new laptop is Ada
22:13fdobridge_: <!DodoNVK (she) 🇱🇹> YES :cursedgears: (you reproduced my issue)
22:13fdobridge_: <gfxstrand> I wouldn't at all be surprised if some of this is PRIME interactions between i915 and nouveau
22:13fdobridge_: <!DodoNVK (she) 🇱🇹> I still get the lockups with amdgpu
22:14fdobridge_: <airlied> yeah that does smell a bit PRIMEy
22:14fdobridge_: <gfxstrand> I also saw one where nouveau took down i915 (unfortunately no logs that time)
22:15fdobridge_: <gfxstrand> Oh, and there's a workqueue error further up that I didn't paste
22:15fdobridge_: <gfxstrand> Here's the full thing:
22:16fdobridge_: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1184982174877884526/dmesg?ex=658df3a6&is=657b7ea6&hm=aa4e501c2e46c45c012f16c735d1930dfd6f6431ec1c534c94bfa2e8f5a3fa6d&
22:18fdobridge_: <airlied> I think there might be a case where we spin polling the hw under a spinlock that we probably shouldn't, not sure yet
22:23HdkR: Can I interest you in some userspace spinlocks that behave more like futexes? Intel and AMD have a monitorx and umonitor set of instructions that might be handy :)
22:30fdobridge_: <!DodoNVK (she) 🇱🇹> I hope netborg doesn't suddenly appear and rewrite the nouveau KMD to be lock-free /s
22:34fdobridge_: <rhed0x> I don't think @airlied understands that reference...
22:34fdobridge_: <rhed0x> so idk why you post it here
22:35airlied: Lyude: so places where nvkm_gsp_rpc_wr takes false for wait, you can't free the rpc
22:40fdobridge_: <gfxstrand> I really should figure out 64-bit shared atomics.... (edited)
22:41fdobridge_: <karolherbst🐧🦀> you are in luck as there isn't much to figure out :ferrisUpsideDown:
22:42fdobridge_: <gfxstrand> I'm pretty sure I know what I need to do. I need to use generic atomics and map shared memory somewhere.
22:42fdobridge_: <karolherbst🐧🦀> shared 64 bit only has `.CAS` and `.CAST`
22:42fdobridge_: <gfxstrand> That sucks
22:42fdobridge_: <karolherbst🐧🦀> and `.EXCH` I think
22:42fdobridge_: <karolherbst🐧🦀> `.CAST` is a shared only variant anyway
22:43fdobridge_: <karolherbst🐧🦀> compare and store
22:43fdobridge_: <gfxstrand> So I either do a CAST loop or generic
22:43fdobridge_: <gfxstrand> I guess a CAST loop is probably okay.
22:43fdobridge_: <karolherbst🐧🦀> what do you mean with generic?
22:44fdobridge_: <gfxstrand> I mean `ATOM` instead of `ATOMS`
22:44fdobridge_: <karolherbst🐧🦀> can't do
22:44fdobridge_: <gfxstrand> oh?
22:44fdobridge_: <gfxstrand> Does that have the same restriction?
22:44fdobridge_: <karolherbst🐧🦀> yes
22:44fdobridge_: <gfxstrand> Bugger
22:44fdobridge_: <gfxstrand> Okay
22:44fdobridge_: <gfxstrand> I can do a CAST loop in NIR
22:45fdobridge_: <karolherbst🐧🦀> don't we already have lowering for it anyway?
22:45fdobridge_: <gfxstrand> Maybe?
22:45fdobridge_: <gfxstrand> We've got a variety of things. I'm not sure what all at this point
22:45fdobridge_: <karolherbst🐧🦀> maybe some driver private thing... intel won't as they don't even have a 64 bit `.CAS` anyway
22:46fdobridge_: <gfxstrand> Intel doesn't implement shared 64-bit atomics.
22:46fdobridge_: <gfxstrand> Intel is why the extension has a bit
22:46fdobridge_: <karolherbst🐧🦀> yeah.. but intel CL does and I've seen the code :ferrisUpsideDown:
22:46fdobridge_: <gfxstrand> The CL driver literally takes a lock.
22:46fdobridge_: <karolherbst🐧🦀> yeah
22:46fdobridge_: <gfxstrand> It sucks
22:46fdobridge_: <karolherbst🐧🦀> cursed
22:46fdobridge_: <karolherbst🐧🦀> I wonder if other drivers have a 64 bit `CAS` but nothing else
22:48fdobridge_: <karolherbst🐧🦀> I mean.. the lowering code is the same for any address space as long as you have `CAS`...
22:48fdobridge_: <gfxstrand> I mean it kinda makes sense
22:48fdobridge_: <gfxstrand> 64-bit arithmetic is annoying so kick that back to the shader
22:48fdobridge_: <karolherbst🐧🦀> yeah
22:48fdobridge_: <karolherbst🐧🦀> and who needs it anyway
22:49fdobridge_: <gfxstrand> I mean doing it in the memory controller is faster but if it's not that common and you can lower to a single subgroup invocation anyway...
22:49fdobridge_: <karolherbst🐧🦀> yeah
22:49fdobridge_: <dadschoorse> amd even has fp64 shared atomics
22:50fdobridge_: <karolherbst🐧🦀> cursed
22:50fdobridge_: <gfxstrand> Well, I can implement fp64 atomics on nvidia. 😛
22:51fdobridge_: <karolherbst🐧🦀> we really need a `nir_lower_atomics` thing
22:51fdobridge_: <!DodoNVK (she) 🇱🇹> Definitely not on TeraScale though 🪳
22:51fdobridge_: <gfxstrand> We have a handful of things but not one unified thing. IDK how much sense a unified thing makes, though.
22:52fdobridge_: <karolherbst🐧🦀> yeah, though `lower atomic on cas` is kinda the same everywhere, no?
22:53fdobridge_: <gfxstrand> Sure but that doesn't mean everyone needs it
22:53fdobridge_: <gfxstrand> So sure, the pass can go in common code. But there's a reason it doesn't exist yet.
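The "lower atomic on CAS" idea discussed above can be sketched on the CPU side. This is only an illustration in plain Rust with `std` atomics, not NIR or NAK code; the helper name `atomic_rmw_via_cas` is hypothetical and the memory orderings are an assumption chosen for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Emulates an arbitrary 64-bit atomic read-modify-write using only
/// compare-and-swap. Returns the value that was stored before the update.
/// (Hypothetical helper name; not taken from Mesa.)
fn atomic_rmw_via_cas(cell: &AtomicU64, op: impl Fn(u64) -> u64) -> u64 {
    // Start from a plain load; the CAS below validates it.
    let mut old = cell.load(Ordering::Relaxed);
    loop {
        let new = op(old);
        // Store `new` only if memory still holds `old`.
        match cell.compare_exchange_weak(old, new, Ordering::AcqRel, Ordering::Relaxed) {
            // Success: `prev` is the value we replaced.
            Ok(prev) => return prev,
            // Someone else changed the value (or the weak CAS failed
            // spuriously): retry with what is actually in memory.
            Err(actual) => old = actual,
        }
    }
}
```

For example, `atomic_rmw_via_cas(&cell, |v| v.wrapping_add(3))` behaves like a fetch-add of 3. The loop body is the same whichever operation is plugged in, which is the point made in the chat about the lowering being identical for any address space that has a CAS.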
22:54fdobridge_: <karolherbst🐧🦀> one of them is "I was too lazy to port the codegen lowering to nir"
22:54fdobridge_: <gfxstrand> I think I landed atomic lowering on scratch some time
22:54fdobridge_: <gfxstrand> (thanks, OpenCL...)
22:54fdobridge_: <karolherbst🐧🦀> that really exists?
22:56fdobridge_: <karolherbst🐧🦀> mhh.. through generic memory probably...
22:57fdobridge_: <karolherbst🐧🦀> but isn't the lowering to just remove the atomic and do the op?
22:58fdobridge_: <gfxstrand> Yeah, that's what we do.
22:58fdobridge_: <gfxstrand> So we lower generic to an if-ladder and then lower the scratch branch to load/alu/store
22:59fdobridge_: <karolherbst🐧🦀> I wonder how global atomics on mapped memory regions would work...
23:00HdkR: PCIe atomics?
23:00fdobridge_: <karolherbst🐧🦀> shared/local maps I mean
23:01fdobridge_: <karolherbst🐧🦀> @gfxstrand btw, nvidia has an `ATOM.SPIN` thing which selects one thread to do the thing
23:01fdobridge_: <karolherbst🐧🦀> only works with `.CAST`
23:02fdobridge_: <karolherbst🐧🦀> but mhh..
23:02fdobridge_: <karolherbst🐧🦀> it's kinda weird, because it returns 0 for non selected threads
23:04fdobridge_: <karolherbst🐧🦀> it kinda takes memory banks into account
23:04fdobridge_: <karolherbst🐧🦀> and is apparently faster than `.CAS`
23:04fdobridge_: <karolherbst🐧🦀> well.. if you do the spin lock thing
23:05fdobridge_: <karolherbst🐧🦀> wait what...
23:05fdobridge_: <karolherbst🐧🦀> `ATOMS` has a stride modifier on the source, so you can fold in an `IMUL` with 4, 8 or 16
23:08airlied: Lyude: though the lifetimes in this code are all over the place
23:11fdobridge_: <gfxstrand> I also really need to add a bit of code to NAK which uses `RED` instead of `ATOM` when the destination is unused.
23:12fdobridge_: <karolherbst🐧🦀> mhhh
23:13fdobridge_: <karolherbst🐧🦀> I actually wonder if it works as expected on the local/shared memory window...
23:13fdobridge_: <karolherbst🐧🦀> (same for `ATOM`)
23:14fdobridge_: <karolherbst🐧🦀> docs indicate that `ATOM` doesn't, but `RED` doesn't list any restrictions in this regard
23:15fdobridge_: <gfxstrand> Oh, well that would truly be cursed
23:16fdobridge_: <karolherbst🐧🦀> yeah...
23:16fdobridge_: <gfxstrand> It makes sense, though.
23:16fdobridge_: <gfxstrand> If you're going to optimize one case, that's the one to optimize.
23:16fdobridge_: <karolherbst🐧🦀> I am mostly wondering if 64 bit shared atomics would work on `RED`
23:17fdobridge_: <gfxstrand> Yeah
23:17fdobridge_: <karolherbst🐧🦀> `RED` doesn't have `EXCH`, `CAS` or `CAST` tho
23:18fdobridge_: <gfxstrand> Well, yeah. It's a set-and-forget and those pretty much require you to look at the result.
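The `RED`-versus-`ATOM` choice described above could be expressed as a small predicate. This is a sketch only: the `AtomOp` enum and `can_use_red` are hypothetical names, not NAK's actual types, and the only facts taken from the chat are that `RED` fits when the destination is unused and that it has no `EXCH`, `CAS` or `CAST`.

```rust
/// Hypothetical subset of atomic operations, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AtomOp {
    Add,
    Exch,
    Cas,
    Cast,
}

/// Returns true if a result-less reduction (`RED`) could replace the
/// atomic: the destination must be unused, and the operation must not be
/// one that only makes sense when the old value is read back.
fn can_use_red(op: AtomOp, dst_used: bool) -> bool {
    if dst_used {
        return false;
    }
    !matches!(op, AtomOp::Exch | AtomOp::Cas | AtomOp::Cast)
}
```

With this, an add whose result is never read would qualify, while any exchange or compare-and-swap would stay on the atomic path regardless.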
23:18fdobridge_: <karolherbst🐧🦀> btw.. float atomics are all `.FTZ.RN`
23:18fdobridge_: <gfxstrand> I mean, you could choose not to but that would be weird
23:19fdobridge_: <karolherbst🐧🦀> something to keep in mind
23:19fdobridge_: <karolherbst🐧🦀> with the new UAPI we can actually use the memory windows...
23:19fdobridge_: <gfxstrand> Yeah, it's good to know. So far I think float atomics are pretty sloppy and they don't have float controls.
23:40fdobridge_: <gfxstrand> Coming back to this... Wait on BAR. What happens if you don't wait? Do you get unconverged threads? Also, how does a wait work? It's a variable-latency instruction.
23:42fdobridge_: <karolherbst🐧🦀> not sure, I only know that the wait needs to be at least 6 😄
23:42fdobridge_: <gfxstrand> *sigh*
23:42fdobridge_: <karolherbst🐧🦀> there are some instructions with restrictions like that
23:43fdobridge_: <karolherbst🐧🦀> mhh
23:43fdobridge_: <gfxstrand> But wait until what? The next non-nop? Anything?
23:43fdobridge_: <karolherbst🐧🦀> I think the scoreboard docs have something on that
23:43fdobridge_: <gfxstrand> *sigh*
23:43fdobridge_: <karolherbst🐧🦀> there is `.DEFER_BLOCKING`
23:44fdobridge_: <karolherbst🐧🦀> if you use `BAR` without `.DEFER_BLOCKING` you need to use that minimum wait
23:44fdobridge_: <gfxstrand> Hrm...
23:44fdobridge_: <gfxstrand> defer blocking sounds interesting...
23:44fdobridge_: <gfxstrand> Also sounds a little insane
23:44fdobridge_: <karolherbst🐧🦀> I have no idea what it does 🙂
23:44fdobridge_: <karolherbst🐧🦀> ohh wait
23:44fdobridge_: <gfxstrand> Barrier but... uhhhh... later, maybe?
23:44fdobridge_: <karolherbst🐧🦀> there it is
23:45fdobridge_: <karolherbst🐧🦀> synchronization is deferred until the next synch event
23:45fdobridge_: <karolherbst🐧🦀> so yeah..
23:45fdobridge_: <karolherbst🐧🦀> so you can issue a couple of instructions more until the warp synchronizes or something
23:45fdobridge_: <karolherbst🐧🦀> it's all pretty vague
23:46fdobridge_: <gfxstrand> I'm going to assume that I need at least 6 cycles between BAR and the next thing that observably touches memory
23:46fdobridge_: <gfxstrand> Like BAR then FADD should be fine, right?
23:46fdobridge_: <gfxstrand> One would think....
23:46fdobridge_: <karolherbst🐧🦀> no, it's a restriction on the instruction itself
23:46fdobridge_: <gfxstrand> So 6 cycles before ANYTHING?
23:46fdobridge_: <gfxstrand> Okay, I can do that.
23:46fdobridge_: <karolherbst🐧🦀> yeah
23:46fdobridge_: <gfxstrand> Does MemBar have anything like that?
23:46fdobridge_: <karolherbst🐧🦀> unless you use `.DEFER_BLOCKING` then this restriction doesn't exist anymore
23:47fdobridge_: <gfxstrand> Yeah but who knows what that does
23:47fdobridge_: <karolherbst🐧🦀> (it's 5 on ampere btw)
23:48fdobridge_: <gfxstrand> kk
23:48fdobridge_: <karolherbst🐧🦀> `CCTL.C` needs 8, `CCTL.I` needs 11, `MEMBAR` needs 5, `DEPBAR` needs 4
23:48fdobridge_: <karolherbst🐧🦀> (on turing)
23:49fdobridge_: <karolherbst🐧🦀> I think that's all the relevant restrictions you might run into
23:49fdobridge_: <gfxstrand> kk
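The minimum waits quoted above could be kept in a small table and applied when scheduling. This is a sketch with hypothetical names (`Arch`, `Op`, `min_delay`, `apply_min_delay`), not NAK's real code. The numbers are the ones given in the chat; treating the "at least 6" for `BAR` (without `.DEFER_BLOCKING`) as the Turing value is an assumption, since the chat only says it is 5 on Ampere.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Arch {
    Turing,
    Ampere,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Bar, // BAR without .DEFER_BLOCKING
    CctlC,
    CctlI,
    MemBar,
    DepBar,
    Other,
}

/// Minimum wait in cycles for `op` on `arch`, or None where the chat
/// gives no figure.
fn min_delay(arch: Arch, op: Op) -> Option<u8> {
    match (arch, op) {
        (Arch::Turing, Op::Bar) => Some(6), // assumed to be the Turing value
        (Arch::Ampere, Op::Bar) => Some(5),
        (Arch::Turing, Op::CctlC) => Some(8),
        (Arch::Turing, Op::CctlI) => Some(11),
        (Arch::Turing, Op::MemBar) => Some(5),
        (Arch::Turing, Op::DepBar) => Some(4),
        _ => None,
    }
}

/// Raises an already-computed delay to the known minimum, if there is one.
fn apply_min_delay(arch: Arch, op: Op, scheduled: u8) -> u8 {
    match min_delay(arch, op) {
        Some(min) => scheduled.max(min),
        None => scheduled,
    }
}
```

Returning `None` for unknown combinations keeps the gaps visible rather than guessing at Ampere values the chat does not state.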