08:22 colo: are the fans on modern Radeon GPU boards usually PWM? (there's a 4-pin connector in use on this Sapphire RX 6600 I have where fans need replacement)
10:27 kode54: um
10:27 kode54: Venemo: if I were to try bisecting that issue
10:28 kode54: should I try from head down to the 24.2 branchpoint?
10:28 Venemo: kode54: yes, I think so.
10:28 kode54: good, thanks
10:28 Venemo: assuming that it used to work at the branch point
10:35 soreau: kode54: the app crashes in the bad case? possible candidate for `git bisect run`
10:36 kode54: I still have to install the radv driver for it
10:36 kode54: I already have a working bisect build setup for mesa
10:36 kode54: how does `git bisect run` work, though?
10:37 soreau: basically, you feed it a script and the return code tells it good, bad or skip
10:37 kode54: aha
10:37 soreau: so it can automate the process
10:37 kode54: the manual steps here aren't terribly time consuming, but I'll think about it for the future
10:38 kode54: the time consuming part is the builds
10:38 kode54: though I am using ccache
10:38 soreau: yea, it's not really worth writing a reliable test script unless the bisect has a ton of steps
10:38 soreau: but sometimes, when you can, it works very nicely
10:39 kode54: bisect said roughly 11 steps
10:39 soreau: kode54: the run script would build it
10:39 kode54: I build it with mesa-tkg-git
10:40 kode54: which builds it into a package with the package config I use to otherwise run a git build
10:40 kode54: I just need to edit the commit hash into a config file
10:40 kode54: I do this with two separate git trees
10:40 soreau: right, you would put all the required commands to repro each step, in the script, including build/install/run
10:40 kode54: mesa-tkg-git has a bare repo, I cloned the bare repo to a work dir and set its remote address to match the origin
10:40 kode54: yup
10:41 kode54: the repro is only: install radv, run ghb, does it run or does it crash
10:41 kode54: first step: *crash*
10:41 soreau: so in this case, I guess you'd background the app and then sleep for it to crash and then try killing it, and maybe kill return code can tell you if the app existed or not
10:43 soreau: but when done right, you're sippin' coffee while everyone else has to babysit the run ;)
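The approach soreau outlines above (build, install, launch the app, wait to see whether it crashes, report via exit code) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the script anyone in the channel used: `./build-and-install.sh` is a hypothetical placeholder for whatever builds and installs the driver, the helper names are invented for illustration, and treating "the app exited on its own within the wait window" as a crash is an assumption that only suits apps that normally stay open. The exit codes are the ones `git bisect run` documents: 0 for good, 125 for skip, and 1 (any of 1-127 except 125) for bad.

```python
#!/usr/bin/env python3
"""Sketch of a helper for `git bisect run`; all commands are placeholders."""
import subprocess
import sys

# Exit codes interpreted by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(build_ok: bool, app_survived: bool) -> int:
    """Map one bisect step's outcome to an exit code."""
    if not build_ok:
        return SKIP  # untestable commit, same effect as `git bisect skip`
    return GOOD if app_survived else BAD


def survives(cmd, seconds: float) -> bool:
    """Start cmd; return True if it is still running after `seconds`."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # Still alive after the window: stop it and count the step as working.
        proc.terminate()
        proc.wait()
        return True
    # Assumption: an app that exits by itself this early has crashed.
    return False


def bisect_step(build_cmd, app_cmd, seconds: float = 10.0) -> int:
    """Build, then launch the app and report good/bad/skip."""
    if subprocess.run(build_cmd).returncode != 0:
        return SKIP
    return classify(True, survives(app_cmd, seconds))


# Hypothetical invocation; uncomment and adapt before use:
# sys.exit(bisect_step(["./build-and-install.sh"], ["ghb"]))
```

Saved as, say, `bisect_step.py` (a hypothetical name) and made executable, it would be driven with `git bisect start <bad> <good>` followed by `git bisect run ./bisect_step.py`. Returning 125 on a failed build covers the broken-commit case without manual intervention.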
10:47 kode54: god dammit, rust build failure
10:47 kode54: how do I skip a step?
10:48 soreau: git bisect skip
10:49 kode54: thanks
10:52 kode54: hell with it, I'm disabling rusticl
10:52 kode54: I'm not even testing it
10:53 kode54: it really is too bad that so many broken commits made it into the history at this particular spot
10:57 kode54: nice build time
10:57 kode54: Executed in 268.70 secs fish external
10:57 kode54: usr time 46.52 mins 318.00 micros 46.52 mins
10:57 kode54: sys time 4.89 mins 462.00 micros 4.89 mins
10:58 kode54: heeeyyyy, a working build
10:58 kode54: git bisect good it is
10:58 kode54: well, I better verify it can encode
11:00 kode54: encodes
12:13 kode54: found the "bad" commit
12:13 kode54: this is so stupid
12:13 kode54: https://gitlab.freedesktop.org/mesa/mesa/-/commit/7db16e7cdd71d7cafaeca644325bda5ca81be072
12:13 kode54: it doesn't like radv that has enabled vulkan video encode/decode
12:40 Venemo: soreau: how would git bisect run work with something that hangs the gpu and requires you to log in again?
12:50 soreau: Venemo: no reboot? possibly scriptable! xD
12:52 soreau: if the gpu reset was successful, then you could run the bisect from a tty and launch the compositor then check if it crashed after the test?
12:56 DemiMarie: Will per-queue reset require user submission?
13:10 Venemo: soreau: possibly...
13:11 Venemo: DemiMarie: assuming you are referring to user queues: no, per-queue reset doesn't require it, but the two work together better for sure
13:13 DemiMarie: Venemo: Better in what ways?
13:15 Venemo: when each context has its own user queue, it becomes possible to reset just 1 user queue without affecting other contexts. so when a game hangs your gpu, it can reset the user queue that belongs to the game's context, and it will go away without bringing down the rest of your system
13:16 DemiMarie: Why do per-context queues require user queue?
13:25 MrCooper: without user queues, there's one queue per HW engine, many of which are used by ~every HW accelerated app, so a queue reset affects all of them
13:27 DemiMarie: MrCooper: why do per-context queues require submission to happen from userspace?
13:30 Venemo: DemiMarie: there is no such thing as per-context queue. do you mean a user queue?
13:32 Venemo: if you mean why do user queues need submissions to happen from userspace, Alex explained it some time ago in this chat. due to memory management issues, the userspace process needs to allocate the ring buffer.
13:32 DemiMarie: Which memory management issues?
13:32 Venemo: well, you could scroll up the chat
13:33 DemiMarie: Is this chat archived somewhere?
13:33 MrCooper: I want to say it was in an e-mail thread, not here
13:33 DemiMarie: I can't see old chat messages, sorry
13:37 Venemo: https://people.freedesktop.org/~cbrill/dri-log/?channel=radeon&highlight_names=&date=2024-08-27
13:38 Venemo: the reply to your question is: "The UMD has to allocate the memory for the queue and it's metadata, since it has to be in the user's GPU address space"
13:38 Venemo: on a different note, a benefit of doing the submit from userspace is that there is supposed to be less overhead
16:35 Venemo: DemiMarie: did that answer your question? are you curious about anything else?
16:39 DemiMarie: Venemo: It did, thanks!
16:42 Venemo: are you looking to contribute or just asking out of curiosity?
18:21 DemiMarie: Curiosity and research. I'm working on GPU acceleration for Qubes OS, and while a guest being able to hang the GPU isn't a security vulnerability in and of itself, it isn't great for user experience. Also, it's hard to tell if a GPU hang is because of something that might be exploitable, such as memory corruption in the firmware.
23:32 Venemo: DemiMarie: I see. By any chance did we meet at XDC? I remember there was someone there who was really enthusiastic about Qubes OS