00:17 Company: people just need to learn that there's 3 different names for each gpu: kernel driver, GL driver, Vulkan driver
00:17 Company: and those names are creatively chosen, not because they make sense
00:22 Company: but I can confirm from Gnome that people who aren't aware of this get very confused by those names
00:22 airlied: in some cases they made sense when they were chosen, but the meanings have shifted
00:23 alyssa: for us, kernel=gl=asahi, vk=honeykrisp which.. could be worse
00:23 Company: also, Gnome lost its creativity because we named everything gnome-thing
00:24 feaneron: you can't say that with a straight face
00:24 Company: which worked for a few years but now that we want to replace older apps with newer ones, we don't have names available anymore
00:24 feaneron:named his app Boatswain, and all types start with Bs
00:25 feaneron: BsWindow, BsStreamDeck, it's all bs
00:26 Company: i'm still salty that it's called gnome-text-editor because typing that in a terminal takes way too long
00:27 Company: and autocomplete doesn't work either, because everything starts with "gnome-"
00:28 zf: KDE had the right idea :-)
00:45 alyssa: Company: rip gedit
00:46 alyssa: wait is gedit not gnome-text-editor
00:52 feaneron: it is not
00:52 mattst88: nope, separate thing
00:52 mattst88: gnome-text-editor is the new thing
00:54 alyssa: joy
01:01 Company: maintainer issues caused a fork
01:01 Company: so gedit still exists
01:06 Company: bunch of projects had some reckoning when the gtk4 transition happened, both because gtk4 nudged very hard towards design changes, away from menu + toolbar and towards "touch" design
01:06 Company: and because backends that weren't reasonably clean and operating under an X model suddenly had to deal with a toolkit that didn't bend over backwards to make that work on Wayland
01:07 Company: so you can no longer do updates via XCopyArea()
01:08 pac85: I thought gnome-text-editor was done from scratch, it looks very different than gedit (at least how I remember it)
01:09 Company: not sure - but 90% of it is GtkSourceView and that remained a thing
01:10 Company: it's either gedit and deleting all the plugin stuff or redone from scratch around the GtkSourceView port to GTK4
01:12 Company: https://gitlab.gnome.org/GNOME/gnome-text-editor/-/commit/bdf472712b995ec737a4913ac2f57cf89bf7bc4a
01:13 Company: it's a case of "what do I do now that there's a lockdown?"
01:24 alyssa: i mean go back far enough and krita is a gimp fork right? ;P
01:33 DemiMarie: alyssa: why is the userspace code not named Asahi too?
01:34 DemiMarie: Company: Yup, GTK4 very very much pushes one towards a certain design style, which makes porting some applications almost impossible. I have no idea how I would port Horizon EDA.
02:21 Company: DemiMarie: there's 2 answers to that. One of them is to find some modern UI designers to take on that topic, and change the parts that don't fit well anymore
02:21 DemiMarie: Company: fit well with *what*?
02:21 Company: DemiMarie: and the other option is to do an alternative/companion to libadwaita that focuses on that older style of application design
02:22 DemiMarie: Is the existing UI objectively bad in some way, even for applications that do not need touch support?
02:23 Company: none of that design is about touch support, that's just how people call it
02:23 DemiMarie: Is there scientific evidence that the newer design is objectively (as opposed to subjectively) superior?
02:23 Company: I have no idea
02:24 Company: though I'd be interested how anyone would quantify "better" for design
02:24 DemiMarie: “How long does X take?” studies would be the most obvious one I can think of, but there are people (not me) who actually do work on this stuff.
02:24 Company: but people like developing apps this way, so that's what's happening
02:25 DemiMarie: Is GTK no longer intended to be used without libadwaita or another platform library?
02:26 Company: I always compare it to the web for the answer
02:26 Company: can you make a webpage without some framework? Sure. Should you? Probably not.
02:27 Company: GTK is trying to push the widgets that imply some sort of UI design out of the platform
02:28 Company: and focus on the core building blocks
02:29 Company: but that leaves you without the base widgets that make up the UI - sidebars, headerbars, toolbars, statusbars, etc (why are those all bars?)
02:30 Company: and you also have no design language, i.e. no consistent spacing, no good color/contrast choices, all the theming stuff is missing
02:31 Company: and that's basically what a framework/platform lib gives you
02:34 Company: if someone made a library with a toolbar widget, a menu and some MDI docking widget, so that you could implement Gimp's and Inkscape's UI with it, then you could port those and the whole Cinnamon/Mate apps to it and you could probably find a bunch more
02:34 Company: gedit!
02:34 Company: but you need to find someone who wants to create that library, and there's been a distinct lack of interest for years
02:35 DemiMarie: At least some are just leaving GTK instead.
02:39 Company: that's also an option - depends on how much UI you have and how well other stuff fits
02:41 Company: all I know is that the Gnome community is not gonna make it happen
03:00 Lynne: is there some database of the throughput of GPUs in terms of 32/16/8-bit (non-matrix) integer ops?
08:39 MrCooper: RAOF: my recommendation is to just signal the release point when the atomic commit completion event arrives, not materialize any fence before
08:40 MrCooper: zamundaaa[m]: even a client which prefers signalled release points might reasonably say "the release point has a fence, I can re-use this buffer, don't need to allocate another one"
09:03 emersion: sounds like a broken client
09:07 MrCooper: why? That's how I'd implement it for dynamic number of buffers
09:08 MrCooper: if there being a fence doesn't imply the buffer can be re-used, what does?
09:38 emersion: MrCooper: it's easy to wait for the timeline point to be signalled
09:38 emersion: as opposed to materialized
09:39 MrCooper: right, then there's no point materializing the release point from OUT_FENCE_PTR though?
09:41 MrCooper: materializing the release point with a compositor GPU work fence and the client re-using that buffer does make sense though
09:43 emersion: MrCooper: hm, what's the difference between a GPU work fence and a KMS fence? why does it make sense to use one but not the other?
09:43 emersion: i suppose because one will happen "sooner" than the other?
09:43 MrCooper: GPU fence is guaranteed to signal ASAP, OUT_FENCE_PTR might miss a display refresh cycle
09:44 MrCooper: and can't signal before the next cycle in the first place
09:44 emersion: right
09:44 lynxeye:seems to be equally confused as emersion and RAOF
09:44 emersion: i agree
09:44 emersion: now i'm kinda wondering why OUT_FENCE_PTR exists in the first place
09:44 lynxeye: What's the point of the out fence if you can use it to wait for the GPU (scanout) to be done with the buffer?
09:45 MrCooper: emersion: as a trap? ;)
09:45 emersion: :D
09:45 emersion: it would be useful for writeback, maybe
09:45 MrCooper: it could be used instead of completion events in some cases
09:46 MrCooper: e.g. when turning off a CRTC?
09:47 lynxeye: Wasn't the point of the fence that you could use it to pass back to whoever is waiting for the buffer to be free again instead of waiting for the atomic commit completion event and signaling that back to waiters?
09:50 MrCooper: lynxeye: if it was, it was ill-conceived
09:57 sima: MrCooper, emersion lynxeye so for some tiling gpus it actually makes sense to start rendering while the flip is scheduled but hasn't happened yet
09:58 sima: when you're extremely limited on memory you can kinda get 3 buffers but only allocate 2 by max pipelining everything
09:58 sima: and with tilers you can do the tiling preprocessing while the buffer isn't needed yet
09:58 sima: android apparently does that for some platforms, which is why that out fence exists
09:59 sima: there's also an entire can of worms on the kernel side around that out fence breaking the kernel-internal rules to avoid dma_fence deadlocks
09:59 lynxeye: MrCooper: agreed. I mean the point of in and out fence FDs was to allow explicit fencing. But the out fence is useless for that, as scanout is potentially still using the FB after the out fence has signaled if no flip away from this FB is scheduled
09:59 sima: and I haven't found out a good way to fix it yet
09:59 sima: so, it's a bit a mess
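The pipelining sima describes can be sketched with a toy timeline (all numbers are invented for illustration; this is not the real KMS API):

```python
import math

# Toy timeline (milliseconds) for the tiler case sima describes: with an
# out-fence, the whole next batch can be queued and the GPU can start the
# vertex/binning pass while the flip is still pending, instead of stalling
# until the atomic commit completion event arrives.
VBLANK_PERIOD = 16.0

def next_vblank(t):
    """Time of the first vblank at or after t."""
    return math.ceil(t / VBLANK_PERIOD) * VBLANK_PERIOD

flip_submit = 5.0                         # flip to buffer B queued mid-frame
flip_complete = next_vblank(flip_submit)  # buffer A free at the next vblank

# Without the out-fence: userspace waits for the completion event before
# queueing any work that writes buffer A again.
start_without_fence = flip_complete

# With OUT_FENCE_PTR: the batch is queued at submit time; binning can start
# immediately, and only the tile-write pass must wait on the fence.
binning_start_with_fence = flip_submit
tile_write_start = flip_complete
print(binning_start_with_fence, start_without_fence)
```

Under these assumptions the binning pass gains roughly a vblank's worth of head start, which is the whole point of deep pipelining on a machine that can only afford two buffers.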
10:00 sima: lynxeye, that sounds like compositor bug to me
10:01 lynxeye: sima: Huh? Unless your tile buffer covers the whole FB (which I guess is pretty unlikely) you can't start rendering even on a tiler until the buffer isn't scheduled for scanout anymore.
10:02 sima: lynxeye, yeah hence the fence, because that allows you to queue up the entire batch to the gpu already and the gpu to start with vertex shader and putting the vertices into the right buckets
10:03 sima: hwcomposer v1 went even further and made that out fence a full future fence iirc
10:04 sima: and just assuming that both surface flinger and the app would get around to scheduling the flip
10:04 lynxeye: sima: Okay, so OUT_FENCE_PTR is really just a footgun for everyone not reading the docs carefully in the general (non-android) case.
10:05 sima: yeah I think unless your use-case is "extremely memory limited machine where you realistically can only ever allocate 2 buffers and are willing to trade latency for throughput by pipelining everything as deeply as possible" you do not want it
10:05 sima: maybe we should add that to the property docs
10:06 sima: if someone volunteers to type this patch I could upfront r-b it?
10:17 lynxeye: sima: I guess I can add some words of caution there. I still don't see how signaling that fence without a follow-up flip queued does improve pipelining. I do see how making this fence wait for queued flips before signaling would add lots of state tracking to the kernel, as the fence is on an atomic commit level, not the much simpler FB level.
10:20 sima: lynxeye, hm maybe we're talking past each another still
10:20 sima: so you have two buffers, A currently being scanned out, B finished rendering, flip queued up
10:21 sima: now in the old days you'd need to stall the opengl stack until A becomes available
10:21 sima: with the out-fence userspace/app can start to queue up rendering and push it to the kernel
10:21 sima: and the gpu even start with vertex processing
10:22 sima: even while the flip is still queued up
10:22 sima: and if you miss the next vblank then sure there's going to be a stall, but the assumption is that you simply do not have memory for buffer 3, so there's no way to avoid it
10:23 sima: also with a more modern app/driver stack like vk you could do this yourself mostly
10:24 sima: but this was designed back when command parsing and relocations in the kernel was all the rave, and gl drivers where a lot dumber
10:24 sima: so queuing up in userspace means you still had to do that kernel work when the buffer was finally available
10:24 lynxeye: sima: Yea, I guess it makes sense for the case where you expect rendering and flips being busy. It's just that it doesn't make a lot of sense to signal that fence when there is no B flip queued up, which is what causes the footgun trap.
10:25 sima: lynxeye, yeah that's just compositor being broken
10:26 sima: but that's the same for gpu compositing, if you just hand back a random out-fence for a buffer that has no relationship with when the buffer isn't in use anymore, then yeah that's a bug
10:27 sima: like gpu compositing might also need to later on recomposite, and if you've already signalled the out_fence for the only current buffer you have from the app, you're broken
10:27 sima: exact same thing applies to kms, except with kms direct plane scanout the recompositing is guaranteed
10:28 lynxeye: sima: It means the compositor can't implement a fenced release of the buffer by using the KMS out fence. With GPU rendering you can do that: if you know the client has a new buffer lined up for the next time you render, you can use the GPU render fence to pass back to the client for fenced release. If you do the same with the KMS out fence you might miss the vblank for scheduling the next flip and the fence signals too early, so
10:28 lynxeye: compositor/client can't use that for fenced release.
10:28 sima: oh I forgot: on some android they actually used the manual scanout buffer to hold the frame, so they could signal the buffer much earlier
10:28 sima: but that's again future fence semantics
10:29 sima: lynxeye, you only get the out_fence when you've submitted the flip to the kernel
10:29 sima: at which point if you miss the vblank your entire screen eats the miss and there's nothing you can do
10:29 sima: if you try to hand the fence back earlier it's a future fence, with all the perils that entails
10:30 MrCooper: lynxeye: the fence doesn't signal before the atomic commit completes
10:30 sima: and if you drop a frame because it's not ready but still hand out the out_fence for that flip back to the app, you're just broken
10:31 lynxeye: sima: You get the out fence when you submit the flip _to_ this buffer and it will signal when the buffer has been scanned out once. What you want for fenced release is a fence that signals when a next flip _away_ from that buffer is scheduled and scanout is done.
10:31 sima: lynxeye, yeah that's just busted
10:31 MrCooper: which is indeed what RAOF's plan was presumably, it's still problematic for the reasons I describe
10:32 MrCooper: *described
10:32 sima: and unless you're ok with trading latency for more throughput with deeper pipelining, you shouldn't even do that 2nd approach, like MrCooper explained
10:32 sima: but the first one you've described is just plain broken
10:33 sima: and it would also be broken on the gpu compositor path, if your compositor ever needs that frame again for a recomposition
10:34 sima: unless you go with the X11 school of "surely background color is an acceptable fallback to a damage event"
10:37 lynxeye: sima: Again, it works for the render composition path if you know the client has lined up a new buffer for the next composition cycle, as the next render composition will use the new buffer for sure. So you can pass the render fence to the client for fenced release (which is what weston does today). With the KMS fence that's not possible, even if you have a flip to another buffer scheduled it might still miss the vblank and you end up
10:37 lynxeye: with scanout reusing the buffer after the out fence has signaled.
10:38 sima: lynxeye, how?
10:38 sima: like buffer A is currently being scanned out, B is queued up with a kms flip
10:38 sima: you tell the app that it can render into A as soon as the out_fence for B has signalled
10:38 sima: the kernel misses a vblank
10:39 MrCooper: lynxeye: again, the fence doesn't signal before the commit completes
10:39 sima: the out_fence is delayed appropriately, it will not signal on the next vblank, but the next vblank after the flip actually happened
10:39 MrCooper: so what you describe can't happen
10:39 sima: so how does the app manage to render into A while it's still being scanned out?
10:39 sima: the docs should be really clear on this, if they're not we need to fix them
10:40 MrCooper: lynxeye: the commit missing a refresh cycle is a problem in itself
10:40 sima: but that's not a "latency/throughput tradeoff" but a "it's just broken" thing
10:42 lynxeye: sima: Now I'm confused again. Why would the app want to wait until the out fence for B signals if it wants to render into A? That's a full scanout cycle of latency. Surely it would want to start rendering into A as soon as the display engine has flipped away from A?
10:43 sima: lynxeye, that's what will happen
10:43 sima: where do you see an additional vblank happening?
10:43 sima: the out_fence for B signals the moment the hw stops scanning out A and starts scanning out B
10:44 sima: it does _not_ signal when the hw has finished scanning out B for the first time
10:44 lynxeye: argh, seems I read the docs wrong _again_
10:44 sima: it's an out_fence for the flip itself, not for buffer B
10:44 sima: same applies to gpu compositing
10:45 sima: you don't need to wait with the out_fence for buffer A until you've finished rendering with B
10:45 sima: all you need is wait until you've committed to rendering with B and don't need A anymore
10:45 sima: so the out_fence for A would be whatever the end-of-batch fence for the last render job that used A is
10:45 sima: not the end-of-batch fence of the first render job that uses B
10:46 sima: otherwise you have a notch too much latency in your signalling
10:46 sima: exact same story with kms
10:49 lynxeye: right, so it's totally fine to plug the out fence from commit with buffer B into the fenced release for buffer A.
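The semantics the thread converges on can be restated in a small model (illustrative, invented numbers): the out-fence of the commit that flips to B signals when scanout switches away from A, not after B's first full scanout, so it doubles as the release fence for A.

```python
import math

VBLANK = 16  # refresh period, arbitrary units

def flip_time(commit_time):
    """A queued atomic commit takes effect at the next vblank boundary."""
    return math.ceil(commit_time / VBLANK) * VBLANK

commit_b = 20                       # commit flipping from buffer A to B
out_fence_b = flip_time(commit_b)   # signals when scanout switches A -> B

# The fence is for the *flip*, not for buffer B: waiting for B's first
# full scanout would add a whole extra refresh of latency.
first_full_scanout_of_b = out_fence_b + VBLANK

# out_fence_b is exactly when A becomes safe to render into again.
release_of_a = out_fence_b
print(out_fence_b, release_of_a)
```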
10:54 MrCooper: it works correctly, so it's "fine" if you don't care about the issues I described
11:04 lynxeye: MrCooper: But isn't this a policy decision you would want the client to make? If it doesn't want to allocate more buffers for any reason, it can pick a buffer with an unsignaled release fence for the next rendering, potentially waiting for a missed vblank or whatever. If the client cares about avoiding that, it should pick a buffer with the fence already signaled or potentially allocate a new one. How is this different from a GPU
11:04 lynxeye: render fence received from the compositor?
11:05 MrCooper: how can the client know if it's a KMS fence or a GPU fence?
11:05 MrCooper: (warning, rhetorical question :)
11:06 MrCooper: if it can't, it can't make that choice
11:07 lynxeye: MrCooper: Why does the client care? If the buffer release fence is unsignaled it may stay in that state for quite a while, regardless if it's a KMS or GPU render fence. If you want to avoid blocking at any cost, you must use a buffer with a signaled fence.
11:08 MrCooper: as explained before above: GPU fences are guaranteed to signal ASAP, KMS ones aren't
11:09 MrCooper: look, feel free not to trust me on this, you can't say I didn't warn you though :)
11:12 lynxeye: MrCooper: I still don't get the difference. The flip for the KMS fence is queued, so it's ASAP as-in whatever the next reachable vblank is. The job signaling the compositor GPU render fence might be delayed in the same way by another job hogging the GPU queue.
11:13 lynxeye: If the client want to avoid blocking on buffer availability it must choose a buffer where the release fence is already signaled.
11:13 MrCooper: to put it differently, it's very unlikely that any future GPU work by the client could start before a GPU fence from the compositor signals anyway, this isn't true for a KMS fence though
11:14 MrCooper: a job which blocks the compositor GPU work also blocks future client GPU work
11:15 lynxeye: MrCooper: Agreed, if you talk about a single GPU with one execution queue. In a hybrid setup you might run the composition on a different GPU than the client.
11:15 MrCooper: true
11:16 MrCooper: (having a déjà vu right now :)
11:18 MrCooper: that just makes GPU fences more problematic though, not KMS ones less so
11:18 lynxeye: right
11:19 lynxeye: I guess what I'm saying is that the client should always expect the fences to be problematic ;)
11:20 MrCooper: indeed, so clients should probably only re-use a buffer with unsignalled release point if they can't allocate another one
11:25 lynxeye: yep, randomly picking a buffer just because you received a fenced release for it is a recipe for hurt, maybe just more pronounced right now if the release fence happens to be a KMS one.
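The client policy MrCooper and lynxeye land on could look roughly like this; the `Buffer` class and the pool limit are invented for illustration:

```python
class Buffer:
    def __init__(self, name, release_signalled):
        self.name = name
        self.release_signalled = release_signalled

def pick_buffer(pool, max_buffers):
    """Pick a buffer for the client's next frame."""
    # 1. Prefer a buffer whose release point has already signalled: this
    #    never blocks, regardless of whether a GPU or KMS fence backs it.
    for buf in pool:
        if buf.release_signalled:
            return buf
    # 2. Otherwise allocate a new buffer if the pool allows it.
    if len(pool) < max_buffers:
        buf = Buffer("buf%d" % len(pool), True)
        pool.append(buf)
        return buf
    # 3. Last resort: reuse a buffer with an unsignalled release point and
    #    accept a possible stall (up to a refresh cycle for a KMS fence).
    return pool[0]

pool = [Buffer("buf0", False), Buffer("buf1", False)]
print(pick_buffer(pool, max_buffers=3).name)  # allocates a third buffer
```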
11:30 sima: MrCooper, there's also multi ctx scheduling in pretty much all hw
11:31 sima: or most at least
11:31 sima: plus if someone hangs on a ringbuffer gpu you look at a multi-second timeout
11:31 sima: so not sure why the gpu fence is less problematic than the kms one
11:32 sima: my take is this is down to how much memory you can waste, and where you are on the latency/perf tradeoff
11:32 sima: which is really tricky
11:32 MrCooper: right (amdgpu being a notable exception, it's getting there though :), still unlikely that future client work can preempt already-flushed compositor work though
11:32 sima: yeah you generally don't win against the compositor
11:33 sima: but the latency/throughput still applies
11:33 sima: like maybe app wants to queue up the next frame as soon as possible, because it's cpu intensive to do that
11:33 sima: or it wants lowest latency and does everything super late close to next vblank
11:33 sima: or whatever the app feels like
11:34 sima: so I'm with lynxeye that I'm not seeing why returning a buffer with an unsignalled out_fence is harmful
11:34 sima: no matter which one
11:34 sima: if you don't care about memory, just allocate more winsys if you'd block otherwise
11:34 sima: if you don't, pick the right choice according to your latency/throughput goals
11:35 sima: ofc if the app is dumb and just blindly starts rendering the moment it gets a frame back
11:35 sima: you get to keep all the pieces
11:35 MrCooper: point is that assuming the client uses the same GPU as the compositor, it can use a buffer with unsignalled release GPU fence without penalty, whereas this isn't true with a KMS fence
11:35 MrCooper: *my point
11:35 sima: unless the goal was intentionally to not ever waste memory and prioritize throughput
11:35 sima: MrCooper, but does that case exist?
11:36 sima: like with a reasonable compositor you don't start the next frame before the previous one finished
11:36 sima: so if the compositor then picks a new buffer for that rendering, the old buffer is already not in use
11:37 sima: because the out-fence for the buffer is when it was last used for rendering, _not_ the out-fence for the first rendering of the next buffer
11:37 MrCooper: not sure what you're asking, it's like the majority of GL and possibly Vulkan apps
11:37 sima: I'm asking whether the compositor in the gpu path actually ever hands back a buffer with a non-signalled outfence
11:37 sima: unless it's kinda busted
11:38 MrCooper: mutter does
11:38 sima: or the compositor already decided to toss latency overboard, at which point the app trying is pointless
11:38 sima: how does that happen?
11:39 MrCooper: it may set the GPU fence of its last compositing work on the release point as soon as the client has attached another buffer, which may be before that work has finished
11:40 lynxeye: same for weston iirc
11:40 sima: but that's the "we tossed latency already" case
11:41 sima: if the compositor doesn't toss latency, it waits with picking which buffer when it composites the next frame
11:41 sima: at which point the previous has finished
11:41 MrCooper: not sure what you mean by "toss latency"
11:41 sima: unless your app renders so fast that the new app frame finishes faster than the gpu compositing of the desktop
11:41 MrCooper: if anything this helps latency, doesn't hurt it
11:42 sima: if the compositor already commits to using B while A is still being used it might miss a frame because B isn't done yet
11:42 MrCooper: not what I'm saying
11:42 sima: so I'm assuming we have a compositor which does a late decision about which frame, shortly before the point of no return for the next vblank
11:42 MrCooper: commits to using B while GPU work using A is still in flight
11:43 sima: then quickly queues up gpu work and issues the kms flip
11:43 MrCooper: gamescope maybe
11:43 sima: MrCooper, at that point, is B finished rendering or fence still pending?
11:44 MrCooper: finished (in mutter and most other compositors)
11:45 MrCooper: actually mutter also does something like what you describe (though the mechanics work a bit differently)
11:46 sima: so if B is finished, but A isn't yet, the app is rendering much faster than your compositor
11:46 sima: you're not going to have any problem at all
11:46 MrCooper: unless you care about benchmark numbers ;)
11:46 sima: but if the compositor commits to B before it's finished, it's not prioritizing latency
11:47 daniels: fwiw OUT_FENCE_PTR was indeed written to support the case where people wanted to queue up deeper pipelines of work without necessarily caring about immediate latency or hitches
11:47 sima: MrCooper, the app allocates more frames to keep the benchmark people happy and the compositor hopefully does mbox semantics for flips?
11:47 daniels: if you are gunning for the absolute minimal possible latency and getting some kind of new content on every refresh no matter what, then that is not the hammer for you
11:48 sima: unless you're super constrained and want to limit to just 2 buffers, at which point you'll block until the previous one is available no matter what
11:48 sima: and would much prefer you can block on a dma_fence since that block point is later
11:48 MrCooper: sima: the point is that if the client re-uses a buffer with an unsignaled KMS fence, its frame rate will be capped to the display refresh rate
11:48 sima: MrCooper, isnt' that the point?
11:49 sima: if you want free-wheeling mbox winsys flips, you need to make sure those happen
11:49 MrCooper: not if the client wants to go as fast as possible, which it can with unsignalled GPU fences
11:49 sima: and sure usually the kms flip fence takes a bit longer than the gpu flip fence, but it would still not be mbox
11:50 sima: MrCooper, who says your compositor is not super dense and queued up that gpu fence behind a kms flip out_fence?
11:50 sima: if you want free-wheel, you need to do that
11:50 MrCooper: k, this is getting too hypothetical
11:50 daniels: sima: it's not just about being super-constrained, but by the time you've allocated a buffer with all the disruption that ensues (mmu etc), you're probably too late anyway
11:52 sima: I'm still not sure what's the use-case beyond benchmark numbers
11:52 sima: like if you have something like gamescope you still want to free-wheel and absolutely ignore every buffer with unsignalled out-fence
11:53 sima: daniels, yeah but aside from startup that shouldn't happen during runtime
11:53 daniels: yeah. if you want to build something like kmscube, then use OUT_FENCE_PTR and it'll be useful to you. if you want to build something like gamescope, build gamescope instead and don't use OUT_FENCE_PTR because it's not useful to you.
11:53 daniels: I don't think there's anything controversial there
11:53 lynxeye: I guess the takeaway is simple: if you don't want to block unexpectedly, don't use random buffers with unsignaled release fences, in which case you don't care whether it's a render or KMS fence and also don't care about compositor policy regarding latency vs. throughput.
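The frame-rate cap MrCooper warns about can be quantified with a toy simulation (arbitrary numbers, not a real client):

```python
import math

REFRESH = 16.0   # ms per vblank (~60 Hz)
RENDER = 2.0     # ms to render one frame
WINDOW = 160.0   # simulated interval: ten refresh cycles

def frames(wait_for_vblank):
    """Count frames a double-buffered client produces in WINDOW ms."""
    t, n = 0.0, 0
    while t + RENDER <= WINDOW:
        t += RENDER                # render the frame
        n += 1
        if wait_for_vblank:        # KMS release fence: reuse waits for a flip
            t = math.ceil(t / REFRESH) * REFRESH
    return n

kms_frames = frames(True)    # capped at one frame per refresh
gpu_frames = frames(False)   # GPU fence already signalled: free-running
print(kms_frames, gpu_frames)
```

In this toy setup the KMS-fenced client manages 10 frames against 80 for the free-running one: pinned to the refresh rate, which is the "orders of magnitude lower benchmark numbers" scenario.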
11:53 MrCooper: I never claimed anything else; if you're fine with your compositor potentially producing orders of magnitude lower benchmark numbers than others, go for it! ;)
11:54 daniels:shrugs
11:56 sima: daniels, there at least was more than just kmscube that wanted out_fence for deeper queues
11:56 sima: but it's really only "I can allocate 2 buffers but not 3 because simply not enough memory"
12:08 daniels: or, you do have 3 or 4 buffers, but for whatever reason (ease of design, deep hardware pipelines, slow hardware, whatever), you queue work up long in advance
14:35 rgallaispou: Hi guys, regarding this patch https://lore.kernel.org/dri-devel/20241115-drm-connector-mode-valid-const-v1-3-b1b523156f71@linaro.org/ is it better to wait for someone to merge the whole series, or can I apply it on -next independently?
15:15 alyssa: karolherbst: How do we feel about enqueue_kernel in Mesa? *sweat*
15:18 alyssa:doesn't love the syntax but
15:37 karolherbst: alyssa: I haven't thought about how to implement that one at all
15:37 karolherbst: but it's required for CL C 2.0 support
15:46 alyssa: my objection to it is that the argument passing model feels weird
15:46 alyssa: I want to just call `kernels` with arguments
15:46 alyssa: instead of this weird closure trampoline thing which will surely not translate to good code without backflips
15:50 alyssa: like I can't enqueue the same kernel from both host and device without compiling it twice, effectively
16:15 alyssa: yeah.. studying the spir-v more, this is definitely implementable but it would mean extra variants
16:15 alyssa: maybe that's ok though?
16:16 alyssa: Just feels really silly
17:38 robclark: sima, mlankhorst: any idea what this lockdep splat is about:
17:38 robclark: https://www.irccloud.com/pastebin/ZpISiGMV/
17:41 sima: robclark, huh, never seen one of those
17:42 sima: bit dinner time already, but let me try at least
17:42 robclark:neither
17:44 sima: hm held_lock->references is an uint, at least here
17:44 sima: so overflowing that with lots of gem bo seems unlikely
17:47 robclark: tbf this is sparse/vm_bind type thing, so same bo could appear many times (but I think drm_exec should just be skipping the dups)
17:48 robclark: still, I don't think it would be 2^^32
17:50 sima: robclark, we should skip the already locked ww_mutex for the EDEADLK case, those should never get to lockdep
17:51 robclark: right
17:52 sima: well for the initial trylock case, but the lock_acquired is only for the success case
17:52 sima: robclark, feels like quicksand and I'm scared
17:52 sima: can you repro this reasonably well?
17:53 robclark: so far just saw it once when running dEQP-VK.sparse_resources.buffer.\*
17:53 sima: hm
17:54 robclark: I can see if it is repeatable, but so far sample size of 1
17:54 sima: I guess first try to repro reliably or faster because I have no idea what's up here
17:54 sima: and then maybe we can try to trace ww_mutex_lock for dma-buf and see what's up
17:55 robclark: k.. just wanted to see if anyone else was familiar with that before I spent more time on it vs debugging $my_bugz
17:55 sima: nah this sounds like lockdep internals gone very wrong potentially
17:56 sima: but it's current->held_locks and if you somehow manage to corrupt that I'd expect the entire kernel to crash&burn much earlier
17:58 robclark: well, I _am_ playing with objs, locks, and mapping .. so can't rule out corrupting things, but yeah, I'd expect more of a fireball if things went wrong there
17:58 sima: robclark, before you waste time trying to repro
17:59 sima: ww_mutex_lock on the same lock in a loop, until you've gotten -EDEADLK UINT_MAX times?
17:59 sima: because I'm not entirely sure that's handled correctly, and it's about the only guess I have about what could go wrong
17:59 sima: because I'd guess you do not actually have UINT_MAX distinct ww_mutex in your machine
18:01 robclark: hmm, we could be re-using the same obj (and lock) many times..
18:02 sima: yeah
18:02 DemiMarie: MrCooper: why would a program ever want to render more than once per frame? That is just wasting the user’s GPU and electricity.
18:02 sima: and then maybe walk those a few times
18:02 sima: and then maybe an accounting bug in lockdep so that you exhaust much quicker than 2^32 attempts
18:02 sima: a stretch at best, but the only one I've come up with
18:02 robclark: seems plausible
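sima's guess can be illustrated with a toy model of lockdep's per-held-lock reference counter (entirely hypothetical; the real field is `held_lock->references` in the kernel, a 32-bit unsigned int):

```python
U32_MASK = 0xFFFFFFFF  # held_lock->references is a 32-bit unsigned counter

class HeldLock:
    """Toy stand-in for lockdep's held_lock bookkeeping."""
    def __init__(self):
        self.references = 0
    def acquire(self):
        # Re-acquiring an already-held lock class bumps the counter.
        self.references = (self.references + 1) & U32_MASK
    def release(self):
        self.references = (self.references - 1) & U32_MASK

# If an accounting bug increments without a matching decrement (e.g. on
# the ww_mutex -EDEADLK path), the counter can wrap long before 2^32
# distinct locks ever exist on the machine.
hl = HeldLock()
hl.references = U32_MASK  # counter saturated by a hypothetical leak
hl.acquire()              # one more acquisition wraps it to zero
print(hl.references)
```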
18:07 sima: robclark, and allocate the lock from a gem_bo or so, because lockdep handles allocated locks differently from static ones
18:07 sima: just to make sure you're not chasing the wrong phantom here
18:08 robclark: yeah, will hack that up in ~5 or so.. just looking at a different bug first
18:12 sima: I'll get myself stuffed with raclette meanwhile, it's ready now
18:12 robclark: enjoy
18:24 mattst88: mmmm, raclette