IRC Logs of #radeon on irc.freenode.net for 2024-08-02

08:00 MrCooper: Venemo: that radeonsi killing the process just because one GL context is unusable is bad should be common sense not needing any explanation
08:05 emersion: the tone of this discussion isn't constructive
08:18 kode54: MrCooper: why do you keep arguing this, please drop it
08:19 kode54: the app cannot know its context is unusable
08:19 kode54: it will never know
08:19 kode54: it will just keep trying to render to it
08:19 kode54: is an unresponsive app that just has a garbage window better than a terminated app?
08:20 MrCooper: "please drop it" followed by rehashing old arguments...
08:21 kode54: fine
08:21 kode54: I'll stop arguing sensibly, because you'll never see it as sensible
08:21 emersion: that's enough
09:57 Venemo: well, I guess it's good to know that I lack common sense because I don't understand why this idea is "common sense not needing any explanation"
10:01 kode54: hasn't this already been explained to death?
10:02 kode54: the renderer and all handles and pointers become invalid. the only way for an app to know they're invalid is to be notified. the only API for notifying an app is the robustness interface. or simply crashing the app.
10:02 kode54: or you could just let the app keep poking those handles and pointers and see what happens
10:04 kode54: maybe the other way the app could know is that all of its graphics pointers suddenly SIGSEGV
10:05 kode54: but either way, the app needs to be changed to know how to handle this
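[Editor's note: a minimal sketch of the "robustness interface" kode54 refers to, as seen from the app side. In OpenGL the real query is glGetGraphicsResetStatus (GL_KHR_robustness / OpenGL 4.5), on a context created with reset notification enabled; in Vulkan the equivalent signal is VK_ERROR_DEVICE_LOST. Everything else below is an assumption made for illustration: struct app_renderer, app_check_context, the recreate callback, and the fake_* stand-ins that let the sketch run without a GPU are hypothetical names, not part of Mesa or of any app discussed here.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Enum values as defined by GL_KHR_robustness / OpenGL 4.5. */
#define GL_NO_ERROR               0
#define GL_GUILTY_CONTEXT_RESET   0x8253
#define GL_INNOCENT_CONTEXT_RESET 0x8254
#define GL_UNKNOWN_CONTEXT_RESET  0x8255

typedef unsigned int GLenum;

/* Hypothetical app-side wrapper around one GL context. */
struct app_renderer {
    GLenum (*get_reset_status)(void); /* in a real app: glGetGraphicsResetStatus */
    bool (*recreate)(void *user);     /* hypothetical: new context + re-upload of all resources */
    void *user;
    unsigned resets_seen;
};

enum frame_result { FRAME_OK, FRAME_RECOVERED, FRAME_FAILED };

/* Called once per frame: ask the driver whether the context was lost,
 * and if so try to rebuild it instead of rendering to dead handles. */
static enum frame_result app_check_context(struct app_renderer *r)
{
    assert(r != NULL && r->get_reset_status != NULL);

    GLenum status = r->get_reset_status();
    if (status == GL_NO_ERROR)
        return FRAME_OK;

    /* Guilty, innocent or unknown: the old context is unusable either way. */
    r->resets_seen++;
    if (r->recreate != NULL && r->recreate(r->user))
        return FRAME_RECOVERED;
    return FRAME_FAILED;
}

/* Stand-ins for the driver so the sketch runs without a GPU (assumptions). */
static GLenum fake_status = GL_NO_ERROR;

static GLenum fake_get_status(void)
{
    return fake_status;
}

static bool fake_recreate(void *user)
{
    fake_status = GL_NO_ERROR; /* a fresh context reports no reset */
    ++*(int *)user;            /* count how often the app had to rebuild */
    return true;
}
```

[An app that never makes this query has no way to learn its context is gone, which is the point kode54 makes above; an app that does make it still has to re-create every resource itself in its recreate step.]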
10:05 soreau: kode54: is there not a way to send an event through a wayland protocol or so?
10:05 kode54: unless maybe you suggest some sort of restore mechanism that re-allocates every resource the app originally used, at the same exact addresses and handles, and restores all the data that was lost due to the VRAM becoming invalid
10:06 kode54: there probably is a way to do that
10:06 kode54: but it still requires the apps to implement that event
10:06 soreau: well they can or choose to suffer the consequences
10:06 soreau: if toolkits do it, that's like some of the apps
10:07 kode54: true
10:07 kode54: toolkits can already implement the current API
10:07 kode54: whatever the mechanism is, the apps and/or whatever they use to poke at hardware has to implement it
10:07 soreau: perhaps with an officiating protocol, people will be more apt to use it
10:08 kode54: I mean, I suppose a protocol is needed anyway, to make it more generic
10:08 kode54: since this robustness api is specific to mesa
10:08 kode54: who knows what nvidia does
10:09 emersion: there is nothing specific to mesa in the robustness APIs in Vulkan/GL
10:09 kode54: got it
10:09 kode54: so I don't know what the hangup about implementing it is
10:09 kode54: other than xwayland, which is a mess
10:10 kode54: xwayland would still require every client app to implement something as well, or else they individually crash
10:11 kode54: btw
10:11 kode54: emersion: swaybg doesn't seem to survive a GPU reset here, but I don't know if that's the compositor's fault
10:11 kode54: or at least, it loses its connections to the outputs
10:11 emersion: swaybg doesn't use Vulkan/GL
10:11 kode54: yeah, so it's something else
10:12 kode54: I'll try WAYLAND_DEBUGing it later
10:12 emersion: swaybg submits shm buffers, the compositor uploads them to the GPU then releases the shm buffer
10:12 kode54: maybe tomorrow night
10:12 emersion: upon GPU reset, the swaybg buffers are lost
10:12 kode54: oh, the compositor is losing its background buffers
10:13 emersion: https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/180
10:13 kode54: got it, so minor consequence of reset, not sure what the best fix would be
10:13 emersion: ^ this protocol is required to fix that case
10:14 kode54: got it
10:36 MrCooper: kode54: the process may use other GL contexts (even using radeonsi) which still work fine, and it may have functionality which isn't related to GL at all, in which case killing it may result in data loss
10:37 kode54: what other GL contexts?
10:37 kode54: software renderer?
10:38 MrCooper: other GPUs
10:38 kode54: so the 5 people who have two GPUs
10:39 kode54: either the display GPU resets, or the rendering one does
10:39 kode54: either case, the app's rendering is hosed
10:39 MrCooper: there can be more than two GPUs in a system
10:39 kode54: I'll accept the integrated and dedicated option
10:40 kode54: 99% of people have exactly one x16 slot
10:40 kode54: so one of the contexts is ripped away
10:40 kode54: the app won't know
10:41 kode54: it will start poking at memory addresses that are suddenly inaccessible, causing a SIGSEGV
10:45 MrCooper: that would be a driver issue, it has to be able to handle this safely even with robustness until the context is destroyed
10:46 pixelcluster: MrCooper: "killing it may result in data loss" <- the data loss already happened, that's why we're in that situation in the first place
10:46 MrCooper: one kind of data loss doesn't justify another one
10:49 pixelcluster: in general yes, although in this specific instance I disagree (and it's the same thing we've been arguing around the entire time: yes this is a tradeoff. yes nop-ing/allowing contexts to live can have advantages, such as having the data loss be slightly less catastrophic than it might otherwise be. but it also has drawbacks which I consider to be bad enough that the tradeoff shouldn't be done)
10:50 pixelcluster: I don't fundamentally disagree with your views, what you and I disagree on is which side of the tradeoff is more important
10:51 soreau: no way to make it an option? :P
10:51 soreau: default to the way things are currently(?) with opt-in to do extra stuff
10:51 kode54: just need to adjust the kernel memory maps so that the entire GPU context is blackholed to memory that does nothing
10:52 kode54: the app can sit there with a garbage window until restarted
10:53 kode54: hope you can see your way to the save option with no visual interface
10:54 soreau: kode54: would be nice if mesa could fall back to swrast in that case
10:54 kode54: and how does mesa get the swrast into the app?
10:54 MrCooper: in a scenario where it's acceptable (say a Steam Deck), there could be some kind of watchdog which kills such apps
10:54 kode54: should mesa be shadowing every GPU allocation and just silently run swrast in the background the whole time?
10:55 kode54: and keep a table of handles that map to either GPU or swrast
10:55 soreau: idk but with latest mesa, it feels like wayfire blur is running entirely on cpu
10:56 soreau: radeontop doesn't fluctuate much but wayfire goes to 100% cpu with modest blur settings
10:56 kode54: there's the true solution
10:56 kode54: a virtual GPU that can silently map to any real GPU or swrast
10:57 soreau: a shim!
10:57 kode54: all memory handles are in system memory
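[Editor's note: a toy sketch of the "table of handles that map to either GPU or swrast" idea floated above, only to make the indirection concrete. All names here (vhandle_table, vhandle_alloc, vhandle_fail_over, MAX_HANDLES) are hypothetical and nothing like this is claimed to exist in Mesa. The sketch covers only the bookkeeping; it deliberately leaves out the hard part raised in the discussion, namely keeping a shadow copy of every allocation's contents so there is something to restore after the reset.]

```c
#include <assert.h>
#include <stddef.h>

enum backend { BACKEND_GPU, BACKEND_SWRAST };

#define MAX_HANDLES 16

/* One slot per app-visible handle; the app only ever sees the slot index. */
struct vhandle_entry {
    int in_use;
    enum backend backend;
    unsigned backend_id; /* id private to whichever backend owns it now */
};

struct vhandle_table {
    struct vhandle_entry entries[MAX_HANDLES];
    unsigned next_backend_id;
};

/* Allocate a stable handle backed by the given backend; -1 if the table is full. */
static int vhandle_alloc(struct vhandle_table *t, enum backend b)
{
    assert(t != NULL);
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (!t->entries[i].in_use) {
            t->entries[i].in_use = 1;
            t->entries[i].backend = b;
            t->entries[i].backend_id = t->next_backend_id++;
            return i;
        }
    }
    return -1;
}

/* After a GPU reset: re-point every GPU-backed handle at swrast.
 * The app-visible handle (the index) does not change.
 * Returns how many handles were moved. */
static int vhandle_fail_over(struct vhandle_table *t)
{
    assert(t != NULL);
    int moved = 0;
    for (int i = 0; i < MAX_HANDLES; i++) {
        struct vhandle_entry *e = &t->entries[i];
        if (e->in_use && e->backend == BACKEND_GPU) {
            e->backend = BACKEND_SWRAST;
            e->backend_id = t->next_backend_id++;
            moved++;
        }
    }
    return moved;
}
```

[The table itself is cheap; the cost kode54 points at is that every resource behind it would need a CPU-side copy at all times for the fail-over to restore anything.]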
10:57 soreau: **pointer->plus_plus;
10:57 kode54: !bail
12:33 Venemo: kode54: sorry for asking, maybe it was discussed before but the idea is new to me.
13:23 vedranm: orbea: has your bug just made the front page of Phoronix? https://www.phoronix.com/news/AMD-HDMI-Audio-Fix-Linux-6.11
13:24 vedranm: is inspired by Takashi Iwai's guessing of where the issue lies and what the fix could be
13:25 orbea: vedranm: yes :P
13:37 vedranm: orbea: congrats
13:39 orbea: Takashi did most of the hard work, i just tested, still nice it's fixed
14:13 MrCooper: pixelcluster: though even with a game, doesn't just killing it risk losing progress which might otherwise be possible to save?
14:46 Venemo: how do you save something in a graphical application when its graphical ui is frozen? you can't click save in your game when the gpu hung because its ui is frozen
14:55 MrCooper: keyboard shortcut?
14:56 MrCooper: of course it won't always be possible anyway, at least it's more than 0% chance though
14:56 Venemo: sounds like a long shot, but of course if you have the time to test it, I'd be curious if you can find any game in which the shortcuts work after a hang
15:00 MrCooper: why wouldn't they work? The premise is that the app has no idea there's any problem with the GPU
15:01 Venemo: I haven't seen any but of course that doesn't mean there isn't any. Which is why I say, I'd be curious to see which ones, if any, work like that
15:02 MrCooper: sounds like a different kind of hang
15:03 Venemo: I personally haven't yet seen any graphical app which could do anything at all when its graphics was frozen, at least. Not saying it isn't possible, but would be interesting to find one
15:04 MrCooper: sounds like the GPU reset failed
15:05 MrCooper: or some driver bug related to it
15:06 Venemo: well, I don't remember any game with shortcuts off-hand to try with, but would be interesting to see, if someone has the time
15:06 MrCooper: if such a hang was inevitable, robust apps couldn't recover either