06:38sally: Hello, I am facing an issue with mesa 24.3.4-3 on RPi5, which firefox has some flakes or flickers when open some websites like x.com or paddle.com, I so a bug and I have added a post there, but I am not sure if those have the same issue or not https://gitlab.freedesktop.org/mesa/mesa/-/commit/68db5481f4ec080de462aaf5b2c0b49db28e9ae9
06:56soreau: in the meantime, maybe try `gfx.canvas.accelerated = false` in firefox about:config
08:33fufexan: also limitation of what? drm? kernel? drivers? hardware?
10:02stsquad: digetx: did you manage to replicate the VK_KHR_Display failure?
10:29lucaceresoli: mripard: hi, I'm back on the DRM hotplug work, dropped all the panel_bridge patches and went back to the drawing board to give the drm_bridge.destroy callback a fresh restart
10:30lucaceresoli: mripard: however I'm still trying to make the idea work, and there is something I don't grasp
10:31lucaceresoli: mripard: back in https://lore.kernel.org/dri-devel/ksxomce6vddld7vikzyjd55babho63vj6ej5vrsiwfp2tid6yu@xfpagqpata4v/ you suggested a release sequence (C'321 + B'21)
10:32lucaceresoli: mripard: for the panel_bridge the removal actions are, as I can understand from the code: drm_bridge_remove + dem_bridge_put -> am I missing something here?
10:34mripard: the removal actions of what component ? panel_bridge itself or a panel_bridge user?
10:34lucaceresoli: mripard: the remocal actions of a panel_bridge itself
10:36lucaceresoli: mripard: (for hot-unplug, we need to handle the case where the panel is removed but the panel_bridge user stays)
10:50lucaceresoli: mripard: another question is whether you prefer me to drop or keep the debugfs improvement patches from my next iteration
10:50mripard: lucaceresoli: then yeah, it looks like the correct sequence
10:51mripard: I'm not sure why you need to deal with panel_bridge from the get-go though, but I might miss something
10:51mripard: for debugfs, I'd send it as a separate series
10:52lucaceresoli: mripard: ah, so that means: drm_bridge_remove is B'3, drm_bridge_put is B'1, and so there is nothing left for the .destroy callback?
10:53lucaceresoli: mripard: for debugfs I might postpone then, because the two series will likely conflict and I don't have a real testing setup without the hotplug series
10:57lucaceresoli: mripard: the reason I struggle^W have to deal with panel_bridge is mainly devm_drm_of_get_bridge(). It might create a new panel_bridge or not. If it does, someone will have to call drm_bridge_remove for that panel_bridge, but the devm_drm_of_get_bridge() caller cannot know.
10:59mripard: but it's not to the caller to call drm_bridge_remove anyway
11:01lucaceresoli: mripard: who should do that then? If the panel is not going away, just the user is done with the bridge pointer?
11:01mripard: and devm_drm_of_get_bridge() calls devm_drm_panel_bridge_add(), which calls devm_drm_panel_bridge_add_typed(), which registers a devm action that will call drm_bridge_remove()
11:01mripard: so it's already dealt with?
11:02lucaceresoli: mripard: but the devm action will not trigger if the user is not being removed
11:03lucaceresoli: mripard: use case: the user driver is always present, the panel may disappear
11:03lucaceresoli: devm_drm_panel_bridge_add_typed() adds a devm action on the user struct device
11:05mripard: so the problem here is no longer "the lifetime of drm_bridge is broken" but "devm_drm_of_get_bridge() is fundamentally broken and should be replaced "
11:05mripard: you need both to deal with hotplug, but they are orthogonal
11:05lucaceresoli: mripard: I need both _what_?
11:06mripard: you need to fix both problems
11:06mripard: but since they are orthogonal, they don't need to be in the same series
11:07lucaceresoli: mripard: meaning I need "bridge refcounting" + "replace devm_drm_of_get_bridge"?
11:10lucaceresoli: mripard: about fixing devm_drm_of_get_bridge, I agree it is broken, but I don't see any obvious solution.
11:11lucaceresoli: (sorry in case the last message is duplicated, the matrix bridge does not seem to work very well)
11:15lucaceresoli: mripard: as I see it: some user needs a next_bridge pointer: if it already exists, the get it nd need to put it; if there is a panel they need to create it via *_panel_bridge_add() and later call *_panel_bridge_remove()
11:15lucaceresoli: mripard: that's what devm_drm_of_get_bridge() does, and drivers not calling it are doing the same manually, e.g. samsung-dsim.c
11:16mripard: lucaceresoli: the solution to devm_drm_of_get_bridge() is to come up with a new, safe, function, deprecate devm_drm_of_get_bridge() and call it a loss
11:16mripard: there's no other way really
11:17lucaceresoli: mripard: OK. Now what I'm missing is what the new functions should do. My brain failed to find a solution except the panels_always_create_panel_bridge one.
11:21fufexan: my bad, my client did an oopsie earlier. what I wanted to ask was: are we allowed to tear the cursor plane while the rest is vsyncd? windows and mac both manage to make the cursor not wait an additional vblank. and also is this a drm/kernel/drivers/hardware limitation?
11:22mripard: lucaceresoli: A) it should be drmm and take a reference to the bridge returned. B) when the final reference to panel_bridge is dropped, we call drm_bridge_remove
11:23mripard: lucaceresoli: that way you avoid the need for the caller to know if it has to call drm_bridge_remove or not, it'll just have to put the reference just like any other bridge
11:24lucaceresoli: mripard: but then this would violate the idea that first of all in C'3 we unpublish the bridge, i.e. drm_bridge_remove()
11:24mripard: B3 you mean?
11:25lucaceresoli: mripard: yeah, sorry: s/C'3/B'3/
11:25lucaceresoli: mripard: so that would mean unpublishing the bridge at the last put, possibly long after the actual B'
11:26mripard: yeah, maybe
11:28mripard: or maybe your solution to add a bridge to every panel is a good one too, but then we need to start the discussion about how to properly handle the panel lifetimes too, which is another can of worms
11:29mripard: I'm not sure it's something we can discuss over IRC, really. Once we have the allocation / reference API mostly figured out, send a proposal for this that works, and we'll go from there
11:29lucaceresoli: mripard: what about a conterpart to devm_drm_of_get_bridge() [not sure about the name, maybe devm_drm_of_unget_bridge()?] that does: `if (is_panel_bridge()) {drm_anel_bridge_remove()} else {drm_bridge_put()}`
11:31lucaceresoli: mripard: and then the devm_drm_of_get_bridge() kerneldoc will say "to dispose of this bridge poitner before removal, call devm_drm_of_unget_bridge()"
11:32lucaceresoli: mripard: that's basically what I have implemented in this (horrible) patch for samsung-dsim (https://mensuel.framapad.org/p/samsung-dsim-out_bridge-disposal-acub?lang=en), but moved to a function in panel.c
11:43lucaceresoli: mripard: ah, no, such a function would not be a totally correct solution. It can know whether the bridge is a panel bridge, not if it was created by the caller
11:44MrCooper: fufexan: it's an atomic KMS API limitation
11:46MrCooper: many drivers are handling the cursor asynchronously with the legacy cursor ioctl though
11:47lucaceresoli: mripard: however it might work if it used a _nowarn version of devm_release_action(): IOW instead of `if (is_panel_bridge())` it would do `if (there is a specific devm action *added by this specific device*)`
12:12mripard: lucaceresoli: devm_drm_of_get_bridge() just cannot work, so any solution that involves it is not the right path
12:13lucaceresoli: mripard: so what should a driver looking for "a pointer to the next bridge" do?
12:14lucaceresoli: mripard: especially in the case the next bridge is actually a panel, and the panel bridge for it is not yet eisting?
12:14lucaceresoli: mripard: open code just like in samsung_dsim_host_attach?
12:26mripard: "the solution to devm_drm_of_get_bridge() is to come up with a new, safe, function, deprecate devm_drm_of_get_bridge() and call it a loss"
12:27sima: more refcounting maybe too, but it's been a while I smacked my head against this particular wall
12:30lucaceresoli: mripard: sure, I've been trying to imagine the code for that new, safe function, but I'm always circling around the same ideas; any dirty pseudocode proposal would be very welcome and help towards a pragmatic solution
12:31lucaceresoli: mripard: do you have in mind a function "just returning a drm_bridge*"?
12:32lucaceresoli: mripard: creating a panel_bridge on the fly if a panel is found, just like devm_drm_of_get_bridge()?
12:33mripard: lucaceresoli: we already discussed this today
12:34mripard: but also, I stand by what I also said earlier, I still think it's too early to discuss it at this point
12:34mripard: and I'm sorry, but I won't have time to do that function myself
12:40lucaceresoli: mripard: I'll be glad to implement it if I have an idea of how it may be done. But other than creating a panel_bridge on the fly, I don't have ideas of my own.
12:47mripard: then do that?
12:47mripard: (and I did suggest an alternative at 12:28)
12:54fufexan: MrCooper: can you mix legacy and atomic KMS calls though?
12:55zamundaaa[m]: fufexan: you can, but I would strongly recommend against it
12:56zamundaaa[m]: If you use anything from atomic that legacy can't do, you quickly run into problems
13:07fufexan: zamundaaa[m]: well then how do you recommend to solve this problem? we want the cursor updates to be done independently, just like windows and mac can do
13:10zamundaaa[m]: fufexan: the only way to fully fix it is to change the atomic API
13:11zamundaaa[m]: You can do a lot in userspace too though, by committing only right before vblank
13:32fufexan: zamundaaa[m]: is there even a reliable way to do it though, other than guesswork, timing threads, and praying?
13:34zamundaaa[m]: No, it's the same as any other time critical task
13:39fufexan: alright, thank you!
14:38MrCooper: zamundaaa[m]: since drmModeMoveCursor only affects the cursor plane position, not really seeing the big danger
14:38zamundaaa[m]: Cursor plane position, and enabling/disabling the cursor plane are dangerous
14:39zamundaaa[m]: Hardware rotation on AMD doesn't like the cursor plane being enabled for example
14:39MrCooper: it can move the cursor even outside of vblank, resulting in ~1 refresh cycle lower average latency compared to the best case with atomic API
14:40MrCooper: zamundaaa[m]: my mutter MR uses only drmModeMoveCursor, everything else still goes through the normal atomic commit machinery
14:41MrCooper: even cursor moves fall back to the latter if drmModeMoveCursor fails
14:41llyyr: MrCooper: I was mixing legacy and atomic API for about a year because of the amdgpu cursor plane updates issue, and it worked fine until recently when they added the overlay cursor mode. since then, mixing them up completely breaks direct scanout.
14:42MrCooper: breaks direct scanout how exactly?
14:42llyyr: garbled output
14:42llyyr: green artifacts all over your screen if the cursor is visible
14:43llyyr: Issue I was working around (fixed a while ago): https://gitlab.freedesktop.org/drm/amd/-/issues/2186 and the commit that breaks mixing up legacy/atomic api https://github.com/torvalds/linux/commit/1b04dcca4fb10dd3834893a60de74edd99f2bfaf
14:44MrCooper: which legacy API were you using exactly? Haven't hit any such issues FWIW
14:44MrCooper: sounds like driver bugs anyway
14:45llyyr: MrCooper: I posted the patch I was using for myself in the tracker https://gitlab.freedesktop.org/drm/amd/-/issues/2186#note_1982978
14:45llyyr: I guess it's a YMMV thing
14:46MrCooper: that uses more than just drmModeMoveCursor
14:47MrCooper: that indeed has more potential for issues
14:47llyyr: Yeah true, I could try just using drmModeMoveCursor and see if that has issues as well or not
14:50MrCooper: anyway, the current atomic API is just unusable for low latency cursor movement
14:50llyyr: probably better to improve it on the kernel side in any case than mixing up atomic/legacy
14:52MrCooper: not so obvious to me — either it just emulates drmModeMoveCursor, in which case what's the difference?, or it might open up a can of worms in the form of lots of new corner cases
14:53llyyr: could you link your mutter MR?
14:53MrCooper: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/4249
14:57llyyr: IMO even if it just emulates drmModeMoveCursor underneath it's probably still better purely for ease of use downstream, but I'm not familiar enough to say if there's any technical reasons for/against it.
15:00MrCooper: FWIW, my mutter MR calls drmModeMoveCursor only from the KMS thread, i.e. it can't run concurrently with drmModeAtomicCommit
15:15digetx: stsquad: it doesn't work with your rootfs, while works with mine arch-based one
20:15fufexan: hey, is it possible to query the scanline feedback for a crtc via drm? as in get where the display currently is in its scanning out of a buffer