01:59 fdobridge: <i​shitatsuyuki> Was looking into implementing split barriers (with layout transitions) on RADV but immediately hit a roadblock. The problem is like this:
01:59 fdobridge: <i​shitatsuyuki> 1. We need to wait for the pre-transition execution barrier.
01:59 fdobridge: <i​shitatsuyuki> 2. We then perform the transition and wait for it to complete (+fuse with post-transition execution barrier).
01:59 fdobridge: <i​shitatsuyuki> Here 1. and 2. are both blocking operations, and you can only defer one blocking operation at once (i.e. Promise.then() style stuff is not possible in hardware). As far as I've examined nouveau stuff the command processor seems to work in a similar way wrt semaphores. Any idea if nouveau will also hit this problem, or NV hardware has some magic to handle this?
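[Editor's sketch] The ordering problem described above can be illustrated with a toy model. This is not RADV or nouveau code: the command names (`launch`, `wait`, `chain`), the work names, and all durations are made up for illustration. It assumes an in-order command processor where `launch` starts asynchronous work without stalling and `wait` stalls the processor. The `chain` command is the hypothetical "Promise.then()" primitive that the message says the hardware lacks.

```python
def simulate(stream):
    """Toy in-order command processor; returns the finish time of each piece of work."""
    now = 0       # time at which the processor reaches the next command
    finish = {}   # work name -> completion time
    for op, name, *args in stream:
        if op == "launch":      # start async work now, do not stall
            finish[name] = now + args[0]
        elif op == "wait":      # stall the processor until the named work is done
            now = max(now, finish[name])
        elif op == "chain":     # hypothetical: start once another job finishes, no stall
            duration, after = args
            finish[name] = max(now, finish[after]) + duration
        else:
            raise ValueError(f"unknown command: {op}")
    return finish


# Unrelated work recorded between the two halves of the split barrier.
# It contains a wait of its own, so the processor spends real time here.
mid = [("launch", "mid1", 8), ("wait", "mid1"), ("launch", "mid2", 8)]

# Option A: block at the first half of the barrier (wait 1 happens early).
eager = [("launch", "producer", 5), ("wait", "producer"),
         ("launch", "transition", 4), *mid,
         ("wait", "transition"), ("launch", "consumer", 1)]

# Option B: defer both waits to the second half of the barrier.
deferred = [("launch", "producer", 5), *mid,
            ("wait", "producer"), ("launch", "transition", 4),
            ("wait", "transition"), ("launch", "consumer", 1)]

# Option C: what a chained, non-blocking transition would allow (hypothetical).
chained = [("launch", "producer", 5), ("chain", "transition", 4, "producer"), *mid,
           ("wait", "transition"), ("launch", "consumer", 1)]

for label, stream in (("eager", eager), ("deferred", deferred), ("chained", chained)):
    result = simulate(stream)
    print(label, "consumer done at", result["consumer"], "mid2 done at", result["mid2"])
```

In this toy model, blocking early delays the unrelated work, deferring both waits delays the consumer because the transition cannot overlap with the unrelated work, and only the hypothetical chained form avoids both costs.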
20:24 cobbler: Hey, anyone got any idea why load on my GPU completely freezes my system?
20:35 apteryx: cobbler: did you look at https://nouveau.freedesktop.org/HangDiagnosis.html ?
20:46 apteryx: also perhaps browsing the existing known issues at https://gitlab.freedesktop.org/mesa/mesa/-/issues?label_name%5B%5D=nouveau
20:50 cobbler: no sysrq keys work at all after the hang, and i couldn't find any issues similar to mine on the gitlab
20:51 cobbler: http://0x0.st/HNJT.txt
20:51 cobbler: dmesg output
21:02 fdobridge: <g​fxstrand> How's TK1 support looking kernel-side?
21:12 fdobridge: <k​arolherbst🐧🦀> mhhh... okayish?
21:32 fdobridge: <m​arysaka> You have a TK1 around? :nya_peek:
21:33 fdobridge: <m​arysaka> If you need any testing/debugging on TX1 I have quite a lot of them ^^'
21:39 cobbler: so, can anyone help me with this? by the way, it's an eGPU
21:47 karolherbst: cobbler: eGPU? mhh I can imagine something being odd with the controller or some workaround we didn't add yet
21:47 karolherbst: cobbler: is this running coreboot?
21:47 cobbler: libreboot yeah
21:48 karolherbst: kinda looks like the GPU is crashing
21:48 cobbler: weird, it works completely fine with proprietary nvidia drivers
21:49 karolherbst: yeah.. but some PCIe controllers are funky and nvidia has tons of workarounds
21:49 cobbler: hm
21:50 karolherbst: could also be d3cold/d3hot doing weird things
21:50 karolherbst: cobbler: mind booting with `nouveau.runpm=0`?
21:50 cobbler: sure, gimme a sec
21:50 karolherbst: I doubt it changes things, but I also don't want this to be the issue and we didn't try it
21:51 karolherbst: but it shouldn't change things, because it's already crashing at driver init
21:51 karolherbst: but who knows.. that "timer: stalled at ffffffffffffffff" basically means the GPU is not accessible anymore
21:52 karolherbst: this line is also kinda weird: "nouveau 0000:04:00.0: bus: MMIO read of 00000000 FAULT at 3e6684 [ PRIVRING ]"
21:52 fdobridge: <g​fxstrand> Have you tried NVK out on them at all? In theory it might work but IDK.
21:54 fdobridge: <m​arysaka> I guess I know what to do about those boards again now 😄
21:54 karolherbst: cobbler: it's also kinda funky that the GPU loads twice
21:55 karolherbst: anyway.. GPU hot unplug is totally not supported and drivers are kinda crashing on that atm
21:56 karolherbst: cobbler: but anyway.. we don't have the bandwidth to support eGPUs at the moment, because that does require some bigger driver changes
22:02 cobbler: welp
22:03 karolherbst: cobbler: seems like you didn't get my last messages
22:03 karolherbst: anyway.. eGPUs are not supported as we lack the bandwidth to actually fix all those issues
22:03 cobbler: i'm joined on matrix so i saw them
22:03 cobbler: hence the "welp"
22:03 karolherbst: ahh
22:04 karolherbst: it might also be that the controller is a bit funky and reconnects the device once on boot or whatever
22:04 karolherbst: it's already weird enough that nouveau loads the device twice
22:04 cobbler: this is what i get for buying a random gpu off of ebay huh
22:04 cobbler: i tried to get another dmesg but alas the file is empty
22:04 karolherbst: ehh. the GPU might be fine, just eGPU cases and thunderbolt is.. funky
22:04 cobbler: oh you thought i was using thunderbolt? no no no, this is expresscard
22:05 karolherbst: oooh
22:05 karolherbst: oof
22:05 cobbler: yep :|
22:05 fdobridge: <M​ohamexiety> that's a really old eGPU...
22:05 karolherbst: well.. technically it's the same thing
22:06 karolherbst: mhhh
22:06 karolherbst: something weird with interrupts is going on I think
22:06 cobbler: by the way, on that last boot, there was no strange reconnection
22:06 karolherbst: also "[ 193.713217] DMAR: [DMA Read NO_PASID] Request device [04:00.0] fault addr 0x0 [fault reason 0x06] PTE Read access is not set" is probably not helping
22:07 karolherbst: yeah.. so the second load in the log looks more reasonable
22:07 karolherbst: just that the firmware loading process fails as we don't really get an answer back or something
22:07 karolherbst: also command submission totally doesn't seem to work
22:11 cobbler: here's dmesg from another boot if this helps at all: http://0x0.st/HNJd.txt
22:12 karolherbst: cobbler: do you know if that GPU works properly in a normal desktop?
22:13 cobbler: there is a slim chance i'm able to check
22:13 cobbler: only desktop computer i have is mega old
22:13 karolherbst: well.. as long as it has a PCIe slot it should be good enough
22:13 cobbler: i'll go open it up
22:14 karolherbst: I put 20 years old GPUs in my 2 year old motherboard, so the reverse should also work, no? :D
22:30 fdobridge: <B​yLaws> Yeah I think it's the same on NV
22:30 fdobridge: <B​yLaws> At least maxwell
22:36 fdobridge: <B​yLaws> You can signal a sema async and then acquire it which half gets you there but not really (edited)
23:23 cobbler: couldn't get the desktop to boot :/
23:38 karolherbst: cobbler: mhhh.. sad
23:39 karolherbst: what's the problem?
23:39 cobbler: no idea