IRC Logs of #dri-devel on irc.freenode.net for 2023-03-28

02:19 Venemo: gfxstrand: considering that we rarely need to prove workgroup-uniformity, the alternate idea is to just do a parent-walk starting from the ssa def and determine it that way. which would be probable much easier.
02:25 diwrrr: Hi. Does anyone know if it's possible to use VRR in Xorg with PRIME render offloaded applications? I have a Intel iGPU and an RX580, and it's working but only on Wayland, for render offloaded apps. Just wanna know if it's even supported by the modesetting driver, can't find much info anywhere.
04:53 i509vcb: In the context of a Vulkan driver that supports VkPhysicalDeviceDrmPropertiesEXT, would the driver be expected to report a primary node or would it only technically be required to report a render node? Technically there could be a card node that is advertised but it wouldn't do much at all then.
05:05 mareko: the joke that ChatGPT could replace a CEO is a reality now, in China
06:38 emersion: i509vcb: software rendering
07:02 Lynne: i509vcb: the render node, no point in exposing the display node, since you couldn't run graphics/compute ops on it
07:02 Lynne: which are mandatory
07:05 emersion: Lynne: swrast can
07:27 MrCooper: hays: make sure CONFIG_FRAMEBUFFER_CONSOLE is enabled in the kernel configuration
07:44 mslusarz: Venemo, cmarcelo: there are some crazy internal people who force enable nv mesh shader on anv (even though I told them once they should stop doing that), so we have to understand why they do that, but other than that I don't see any problem with removing NV_mesh_shader
07:47 mslusarz: we have to be careful to not remove all codepaths which are, in theory, nv extension-only, because some of them will still be needed for ext on anv (at least for dg2) :/
07:48 pivi: hello everybody
07:48 pivi: is it fine to ask general question on DRM development here? I am getting crazy on how to fix a few drivers and I would need some guidance
07:54 MrCooper: pivi: sure
08:04 pivi: I have the following situation DRM display chain: TIDSS (parallel RGB output) -> DSI Bridge (tc358768) -> LVDS bridge (ti-sn65dsi83) -> LVDS display. Between the TIDSS and the TC358768 I have a MEDIA_BUS_FMT_RGB666_1X18, however the TIDSS receive the LVDS mode e.g. MEDIA_BUS_FMT_RGB888_1X7X4_SPWG. If I manually force the mode on the TIDSS it's all good, otherwise, nothing really works
08:04 danvet: uh
08:05 pivi: I tried implementing atomic_get_input_bus_fmts on the tc358768, but without much luck so far
08:05 danvet: so the short answer is that the atomic framework isn't quite ready for this yet
08:05 danvet: or at least I need to check where we are
08:06 pivi: so there is no way on the framework to have a bridge having a differnt media fmt between input/output ?
08:06 danvet: pivi, well I need to first read up on where we are, we have some pieces but maybe not all yet
08:07 danvet: might also be better for bbrezillon
08:12 danvet: pivi, nah from a quick look bbrezillon all added it in 2020 and it should work with that hook
08:13 danvet: unfortunately there's no overall howto doc with maybe a nice dot graph that shows how it does work
08:13 danvet: pivi, so where are you struggling?
08:14 pivi: I defined atomic_get_input_bus_fmts in tc358768 and the TIDSS just keep do not get it
08:15 pivi: not sure if there is more that needs to be implemented, the tc358768 does not really implement the whole atomic callbacks
08:16 danvet: you need them all
08:16 danvet: it's not documented I guess, there's just a code comment
08:17 pivi: I would need to move to the atomic API also the TIDSS?
08:17 pivi: anyway, is in general appreciated to update DRM drivers to the atomic API ?
08:18 danvet: hm tidss is already atomic?
08:18 danvet: what you need is that each bridge is supporting the atomic state stuff
08:18 danvet: because that's where the input/output bus fmt is stored in
08:19 danvet: i.e. the bridge_funcs->atomic_duplicate/destroy_state hooks
08:20 pivi: ok, cool. so sn65dsi83 seems fine, tc358768 I can update myself. the TIDSS I am not sure on the current state neither if I need to update it (it's not just a bridge)
08:20 danvet: the kerneldoc in there explains that they're mandatory if you implement any of the other atomic_ hooks
08:20 danvet: pivi, from a quick look tidss is atomic already
08:21 danvet: pivi, if you feel like, might be good to do a patch to add the "To implement this hook implementing the @atomic_duplicate_state and @atomic_destroy_state hooks must be implemented too" boilerplate at the bottom of each atomic_ hook kerneldoc to help people realize this?
08:22 pivi: ok, so if this is the case I am confused. because the tc358768 I already did before asking here, but I was not able to get the results I was expecting. (to be clear, while I am familiar with kernel development, I am not with the DRM subsystem, so who knows ;-)
08:23 pivi: danvet: I can do it, as soon as I get something "working" :-)
08:23 bbrezillon: pivi: do you have a public branch I can look at?
08:23 danvet: pivi, thx!
08:24 pivi: bbrezillon: not yet, I have a mess I am ashamed of :-). In addition to that I need some TI downstream patch on the TIDSS to test on my HW (TI AM62).
08:24 bbrezillon: it's supposed to be a 2 steps thing => first select the bus formats for all links, store them in the atomic state, and then, when the atomic state is applied, program the bridge/display controller accordingly
08:24 pivi: bbrezillon: I can prepare something easily, would you prefer this or just a RFC on the mailing list ?
08:25 pivi: bbrezillon: do I need to have all the bridged to implement atomic_get_input_bus_fmts ?
08:25 pivi: bbrezillon: s/bridged/bridges/
08:25 danvet: I guess if you really want to go overboard, a bus format negotiation DOC: section with dot graph that explains how it all flows and connects would be perfect
08:25 danvet: pivi, yup
08:25 danvet: otherwise the next bridge gets MEDIA_BUS_FMT_FIXED
08:26 danvet: maybe also something we should add to the docs somewhere ...
08:26 pivi: danvet: hehe, I feel a little bit bad about me documenting stuff I can barely understand at the moment ;-)
08:26 danvet: plus a link to drm_atomic_helper_bridge_propagate_bus_fmt as the no-op/pass through helper
08:26 danvet: pivi, don't be, you're the perfect guinea pig to discover the gaps
08:26 bbrezillon: for those where you have a choice to do, yes. If the bridge supports just one input or output format, the core can pass the FMT_FIXED for you.
08:27 danvet: like we try to document stuff, be the devs working on something generally are blind to what is important to note when you have no idea
08:27 danvet: bbrezillon, I guess making drm_atomic_helper_bridge_propagate_bus_fmt is too much risk of breaking stuff?
08:27 javierm: pivi: IME that's the best moment to post doc patches since a) it's meant exactly for people like you and b) getting review is the best way to make sure that you understand the concepts
08:27 danvet: ^^ this ^^
08:27 pivi: anyway, I do no understand what's going on. I have implemented atomic_get_input_bus_fmts() on the tc358768, yet the tidss get neither MEDIA_BUS_FMT_FIXED nor the only one option I provided from the tc358768 (RGB666)
08:28 danvet: yeah it just skips these callbacks if you don't have the other atomic hooks implemented
08:28 pivi: no, I am sure atomic_get_input_bus_fmts()
08:28 danvet: pivi, you should get a WARN_ON splat though
08:28 pivi: danvet: I had all the callback implemented
08:28 danvet: select_bus_fmt_recursive() <- the one in here
08:28 danvet: hm
08:29 pivi: + .atomic_duplicate_state = drm_atomic_helper_bridge_duplicate_state,
08:29 pivi: + .atomic_destroy_state = drm_atomic_helper_bridge_destroy_state,
08:29 pivi: + .atomic_reset = drm_atomic_helper_bridge_reset,
08:29 pivi: + .atomic_get_input_bus_fmts = tc358768_atomic_get_input_bus_fmts,
08:29 danvet: hm
08:29 pivi: anyway, I agree with bbrezillon , I should really have code to share, otherwise it's just too difficult
08:29 danvet: note that tidss needs to dig out the bus fmt from the input fmt for the first brige's state
08:29 danvet: it's not in the crtc state tidss has access to
08:29 danvet: I think at least
08:29 pivi: danvet: probably this is what is missing
08:30 danvet: bbrezillon, we don't have a helper for this for the crtc?
08:30 danvet: this = getting the output fmt for the crtc so it matches input_fmt of the first bridge
08:30 danvet: maybe with a default of looking at display info fmt if there's no bridge chain
08:32 pivi: danvet, bbrezillon: I gonna prepare some branch that I can share here to make this a little bit more concrete.
08:35 danvet: bbrezillon, or do the crtc drivers just open code the bridge state lookup dance?
08:35 bbrezillon: danvet: nope, we don't have that, but the first bridge in the chain is usually part of the display controller driver. Besides, there's no concept of output bus format at the CRTC level
08:35 danvet: bbrezillon, yeah, but for bridges they have both input&output in their state
08:35 bbrezillon: I'd need to look at it, it's been a long time
08:35 danvet: so just figured a nice helper might be useful
08:36 danvet: but if all the drivers just put the fifo stuff into the crtc/planes the the first bridge is the first real output then I guess they don't need that
08:36 danvet: if the crtc->bridge bus fmt is an internal impl detail
08:36 bbrezillon: yeah, not sure get_input_fmts() is used on the first bridge, I'd have to check
08:36 danvet: but maybe for some drivers (tidss?) it's different
08:36 danvet: drm_crtc_helper_get_output_fmt(crtc, state) or so might be good
08:37 danvet: state so we get the right one both in check and commit functions
08:37 danvet: pivi, ^^ you might want to consider adding that
08:40 danvet: bbrezillon, mxsfb_crtc_atomic_enable and mxsfb_crtc_mode_set_nofb essentially open-code (and in a convoluted way) the helper I have in mind
08:41 danvet: hm sounds like we even want to provision for a default to handle that case
08:41 danvet: but essentially the algorithm to compute the crtc output bus flags:
08:42 danvet: 1. look at first bridge's state input_bus_cfg
08:42 danvet: 2. look at first bridge->timing->input_bus_flags old style approach
08:42 danvet: 3. look at connector->display_info.bus_format[0]
08:42 danvet: 4. some default
08:43 danvet: pivi, if you also need this in tidss then I think would be good to extract that from mxsfb (the code is a mess) and reuse
08:43 danvet: 4. is some _driver_ default
08:48 HdkR: Did Xe ever fix its struct packing in the uapi with the RFC? I didn't pay attention to it after the first posting.
08:49 danvet: mlankhorst, ^^
08:49 danvet: mbrost/thellstrom not here, so this is on you :-)
08:50 HdkR: https://gist.github.com/Sonicadvance1/66a34f175d467b266629f7a5423ba2cb Was the naive diff change I had at the time
08:57 bbrezillon: danvet: I think step 3 only makes sense if there's just one bridge in the chain, but I also don't want to change that :D
08:58 danvet: bbrezillon, step 3 is for no bridge
08:58 danvet: but most drivers just avoid that by having a panel bridge so I guess in practice it's really just legacy fallback
09:01 bbrezillon: uh, you're right
09:02 danvet: so I think the value in this would just be in kinda officiating it all (plus it's what mxfsb has implemented right now)
09:02 danvet: with maybe a very strong hint in the kerneldoc that bridges really should implement the atomic state stuff
09:04 danvet: plus display_info is the fallback already in the bridge helpers too, so this would be consistent
12:08 zamundaaa[m]: When allocating memory with VK_EXT_image_drm_format_modifier, is it possible to decide whether or not the resulting buffer should be scanout capable? If so, how?
12:15 dj-death: we should probably assume that by default
12:19 emersion: zamundaaa[m]: it is not possible
12:19 emersion: per the spec
12:20 emersion: in practice: mesa always allocates scanout capable, nvidia proprietary does not
12:20 emersion: (given a scanout-capable modifier)
12:21 emersion: zamundaaa[m]: if you want guaranteed scanout-capable, you need GBM and GM_BO_USE_SCANOUT
12:21 zamundaaa[m]: well that's annoying then
12:21 emersion: why?
12:22 zamundaaa[m]: The background behind this question is that multi gpu in KWin right now isn't fast on all PCs. That is, on some devices it needs to fall back to CPU copy, which on some devices is very very slow and inefficient
12:23 emersion: why do you want to allocate with vulkan?
12:23 zamundaaa[m]: Using EGL to import a dmabuf across GPUs proved to be buggy and doesn't exactly offer a ton of control either
12:23 emersion: EGL or Vulkan should work the same
12:23 emersion: IOW, one shouldn't be more buggy than the other
12:24 zamundaaa[m]: I can't decide if a buffer is accessible from the CPU with EGL
12:24 zamundaaa[m]: With Vulkan I could allocate buffers where I'm guaranteed that copying it around and rendering both work well
12:24 lumag: aknautiy_, I'm sorry to ping you again, is there any chance you can help with the review of the second half of https://patchwork.freedesktop.org/series/114473/ ?
12:25 emersion: afaik, these are somewhat exclusive
12:26 lumag: aknautiy_, jani sugested that you might help with it (while he doesn't have time)
12:27 zamundaaa[m]: emersion: yeah I know that performance wise it's never gonna be the best for both at the same time
12:28 zamundaaa[m]: but with EGL I'm facing a situation where using modifiers doesn't work because of a RadeonSI bug, and if I force linear then performance is absolutely horrible once I need to blit stuff from the framebuffer (to OpenGL-allocated textures). So I assume Mesa allocates the buffer in system memory, because I can't explain it any other way.
12:29 lumag: narmstrong, robertfoss: while the rest of the series is not finished with the reviews, Jani wrote that he is fine with merging patches 1-5 through drm-misc. Would that be possible?
12:30 zmike: ccr: I think you could review this https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22164
12:30 narmstrong: lumag: sure, do you need me to apply them ? I can ask Jani on the thread if it;s ok
12:30 lumag: narmstrong, robertfoss: for the reference: https://lore.kernel.org/dri-devel/874jqvczy1.fsf@intel.com/
12:30 narmstrong: lumag: yep I saw his reply, I thought an intel guy would apply those
12:31 lumag: jani, could you please comment ^^
12:32 ccr: zmike, aye aye
12:35 pepp: zamundaaa[m]: which radeonsi bug? #8431?
12:38 zamundaaa[m]: yeah
13:28 lumag: narmstrong, ok, Intel guys will take care of those series.
13:47 danvet: gfxstrand, robclark syncobj ack or discussion still ongoing?
14:43 danvet: fyi moved drm-fixes to -rc4 if anyone wants to roll forward
14:43 danvet: plan to do the same with -next once the syncobj pull is in
14:43 pivi: danvet, bbrezillon : I was able to make some (_small_) progress on this morning discussion. Hacking around the tidss code I was able to confirm that indeed my changes on the tc358768 input bus format are correct.
14:44 pivi: what is still confusing to me is how this tidss driver should be changed. in your example this morning you somehow mixed up bus_flags and bus_format, and I have no idea if this is wanted and I just do not understand it or something else
14:45 pivi: (for the tc358768 bridge I'll prepare and send a patch in the next couple of days)
14:45 pivi: danvet: I mean this
14:45 pivi: 1. look at first bridge's state input_bus_cfg
14:45 pivi: 2. look at first bridge->timing->input_bus_flags old style approach
14:45 pivi: 3. look at connector->display_info.bus_format[0]
14:45 pivi: 4. some driver default
14:46 danvet: I just looked at what mxfsb is doing, maybe that doesn't make much sense
14:46 pivi: this fe141cedc433 ("drm/imx: pd: Use bus format/flags provided by the bridge when available") from bbrezillon looks also interesting
14:47 pivi: as an example to look at
14:47 danvet: hm yeah that looks a bit like another copy of the same theme
14:49 danvet: pivi, and yeah that helper would need to fill out a drm_bus_cfg with both format and flags I guess
14:49 danvet: I think mxfsb only computed flags for whatever reasons
14:51 pivi: the current API, atomic_get_input_bus_fmts, is only taking care of bus format. is supported to have the equivalent for the bus flags or not yet? On my specific case I do not need it, the whole bridge chain is able to do anything and I do not need to restrict/alter it in any way
14:51 gfxstrand: danvet: I think I'll RB today if robclark responded since I last looked
14:52 danvet: pivi, it supports filling out drm_bus_cfg I thought, which has both?
14:53 robclark: ok, I guess I'm sending another revision for a couple more typo's
14:53 bbrezillon: for tidss to take part in the bus format negotiation, the encoder bit of the tidss driver should implement the drm_bridge interface
14:53 pivi: from what I can tell it has a `unsigned int *num_input_fmts` output parameter, and that's it.
14:54 danvet: robclark, I'm fine if you just respin the pr with the ack and fixes
14:54 danvet: robclark, if you do maybe also include the igt/userspace links for completeness, since the pr mail is kinda like series cover letter if we do topic branch
14:54 pivi: bbrezillon: I did hackaround something in tidss_encoder_atomic_check() at the moment.
14:54 robclark: ok, will do.. need to do a couple other things first
14:55 pivi: it is already going through the bridges, I just take the bus fmt out of the first one and use it. in the immediate it was mainly to prove that my changes on the other bridge were correct
14:56 bbrezillon: pivi: https://elixir.bootlin.com/linux/v6.3-rc4/source/drivers/gpu/drm/drm_bridge.c#L1139
14:56 bbrezillon: pivi: well, that can work, but that means you don't get to select the input format chosen by the first bridge in the chain
14:57 bbrezillon: meaning you can end up with something that's not supported by the tidss driver at the time the atomic state is checked
14:57 pivi: bbrezillon: this fe141cedc433 ("drm/imx: pd: Use bus format/flags provided by the bridge when available") is doing exactly what you wrote here, implementing the drm_bridge interface, right?
14:58 bbrezillon: I don't remember, let me check
14:58 jani: lumag: narmstrong: looks like the patches didn't pass our CI (didn't apply). would you mind rebasing and resending patches 1-5 again as a new thread, Cc: intel-gfx, and we can apply with the CI results
14:59 bbrezillon: pivi: yep, I think it does
15:01 mslusarz: Venemo, cmarcelo: I got a confirmation that we are free to remove support for NV_mesh_shader from ANV
15:01 gfxstrand: danvet, robclark: The problem is that I'm still not 100% seeing how deadlines fit into Vulkan. Like, robclark's patch works and does a thing but is it the right thing? IDK.
15:02 danvet: gfxstrand, you mean for the entire end-to-end tuning problem from app to hw?
15:03 pivi: bbrezillon: thanks. I'll focus on getting the tc358768 changes ready first, and after dig on the proper way to improve the tidss driver.
15:03 danvet: I have vague terrible memories to an endless framerate vulkan extension shed
15:03 robclark: I think there is some room for vk extension to give app control
15:03 robclark: but that doesn't exist today
15:04 robclark: in the mean time we can do what is right 90% of the time and driconf our way around the rest if needed
15:04 Venemo: mslusarz: great, feel free to open a MR to do so. after both radv and anv are merged we can then remove it from nir and spirv
15:04 lumag: jani, I tried, they apply on top of drm-tip. Is there any other branch that I should use?
15:06 jani: lumag: huh. let me hit retest and see what happens
15:07 danvet: robclark, yeah that's my take too, we've had deadlines boosting in some form in i915 since forever, it's clearly useful
15:07 danvet: but also the larger bikeshed is epic so we just need something which is 80% or so but at least across drivers to move one step forward
15:07 danvet: I think at least
15:07 danvet: worst case we'll have have a DEADLINE2 ioctl or something :-)
15:08 robclark: yeah, my thought as well
15:08 robclark: I don't think we need new ioctl for this.. since the debate is in userspace
15:08 robclark: maybe for tursulin's deadline scheduler, that is a different flag
15:09 danvet: gfxstrand, so the ack I'm looking for is not "this is definitely what we'll use in a vk extension" but more "this doesn't look totally wrong and looks good enough to at least move the discussion"
15:19 dvereb: Good morning. Apologies if I'm in the wrong place. I've encountered a bug using fox-toolkit and their mailing list indicated it appears to be in the mesa library. I'm here to double check where to post it. I've narrowed down one specific line that fixes my issue (a simple boolean flag). It was fixed in 22.3.4, but has since been changed back by 23.0.0. Do I subscribe & post it to the mesa-users mailing list, should I bring
15:19 dvereb: it up here, or perhaps something else? Thank you.
15:23 robclark: dvereb: https://gitlab.freedesktop.org/mesa/mesa/
15:23 dvereb: Perfect. Thank you, again!
15:23 robclark: np
15:36 lumag: jani, I don't see test results in the CI. Should I resend the series?
15:38 alyssa: eric_engestrom: please boot out the marge queue when landing farm downages
15:38 jani: lumag: it'll take a while after clicking retest
15:38 lumag: jani, ack, I see
15:38 lumag: excuse me then
15:38 jani: lumag: it gets run on actual hardware so there's some queue
15:38 eric_engestrom: alyssa: ack, I just saw that it wasn't done, misunderstanding on who was doing that sorry
15:39 eric_engestrom: alyssa: in the mean time, power is back :)
15:39 lumag: I see. It's that there is no 'test in progress' state (unlike github/gitlab).
15:39 lumag: So I was worried a bit
15:40 jani: lumag: https://intel-gfx-ci.01.org/queue/index.html
15:40 lumag: thanks!
15:41 alyssa: eric_engestrom: fair enough :)
15:56 gfxstrand: danvet: Well, that seems to be a bit of the problem.
15:56 gfxstrand: danvet: I think the biggest confusion is over what a deadline means and whether or not not having a deadline is a secret third thing that we actually want.
15:56 gfxstrand: Or should not having a deadline be considered a legacy behavior?
15:57 daniels: life sure would be easier if deadlines were legacy and deprecated, yeah
15:57 danvet: yeah it's not really a deadline but more a "maybe uncomfortableness will ensue past this time" line
15:57 bnieuwenhuizen: random Q I have with the entire thing is how does this affect the direction of going UMF?
15:58 danvet: bnieuwenhuizen, you don't :-)
15:58 danvet: the entire umf conundrum is that you get lower latency at the price of less fairness because the scheduler has no idea wtf is going on anymore
15:58 gfxstrand: bnieuwenhuizen: Yup. That's another question.
15:59 gfxstrand: Do we want to add more usages of dma_fence as a communication channel.
15:59 gfxstrand: We tried that with error propagation. It did not go well.
15:59 danvet: in theory intel hw/fw can schedule away on umf waits
15:59 danvet: in practice it kills both throughput _and_ latency and so isn't enabled
15:59 bnieuwenhuizen: side-note, did anyone do anything interesting with the value of the deadline? I thought the series just made Intel boost if there is a deadline?
15:59 gfxstrand: I assume HW/SW can schedule away on UMF waits. The problem is that there's no communication channel with UMF.
16:00 danvet: bnieuwenhuizen, iirc it roughly controls sched order in the same prio class
16:00 gfxstrand: bnieuwenhuizen: Rob's been doing some power management stuff with freedreno based on it. There was an XDC talk, I think.
16:00 danvet: but the main one is ramping clocks
16:00 danvet: bnieuwenhuizen, imo what exactly we do can be changed since it's best effort tuning problem
16:01 danvet: and those are a lot more lax wrt regressions
16:01 bnieuwenhuizen: yeah I guess for scheduling that could work. For clocks I'm still like "what should the clock be if we don't know the workload?"
16:01 danvet: bnieuwenhuizen, best I've seen is try to remember the old one and reset that one
16:01 danvet: per ctx or so
16:01 danvet: that's good for spikey workloads
16:02 danvet: you could also co to max and then ramp down, but with todays power constrained chips that often ends badly
16:02 danvet: *go to max
16:03 danvet: remembering the old freq needed for that ctx is the hard part :-)
16:04 danvet: the new cpu most recent virtual deadline patches that floated around and lwn covered look really interesting for this
16:04 danvet: but on the gpu we don't have the infra to get accurate enough timings
16:04 danvet: plus no useful preempt either
16:04 danvet: useful as in = doesn't take substantial part of an entire frame, you might as well just wait for the job to finish
16:06 danvet: gfxstrand, bnieuwenhuizen robclark do we need to rename it to something like DISCOMFORTLINE to make the concept clear and untangle from actual deadline scheduling?
16:08 robclark: s/DEADLINE/PONY/
16:09 robclark: danvet: anyways, at least the name does change uabi
16:10 danvet: well uapi but yeah it's be annoying
16:10 robclark: I tried to emphasis "hint" everywhere.. that was the best solution to communicate what it does that I came up with
16:19 danvet: robclark, yeah it really should be clear enough ...
16:19 alyssa: By the way -- I'm experimenting with doing intra-driver review downstream https://gitlab.freedesktop.org/asahi/mesa/-/merge_requests
16:19 alyssa: and then once in a while will send up a big batch of reviewed commits to Marge upstream
16:19 alyssa: to reduce mesa ci roundtrips
16:20 alyssa: anything that touches non-vendor code is still going directly to mesa/mesa for review, this is just for purely src/asahi + src/gallium/drivers/asahi changes
16:20 danvet: this starts to feel a lot like the kernel pr model with local integration trees
16:20 alyssa: TBD whether this is a materially better or worse workflow
16:20 danvet: (minus the boutique tree nonsense maybe)
16:20 alyssa: danvet: it's not ideal, no.
16:21 danvet: alyssa, oh no judgement meant, just observation and maybe a place for where to steal ideas
16:21 alyssa: but there's no good option here so I'm trying a bad option in the hopes it's useful.
16:21 danvet: like "don't make the subtrees too small" for some value of "too small"
16:21 alyssa: ah, yeah
16:22 alyssa: I'm doing this at the asahi project level with the idea that it is mostly a closed set of developers and reviewers going around there
16:23 danvet: yeah it makes some sense, eventually no monotree scales well enough
16:23 alyssa: yeah
16:24 HdkR: "Week 48 mega-sync"
16:24 alyssa: even if our CI were perfect, the serialization of the merge queue is a fundamental limit
16:24 danvet: *monorepo
16:24 HdkR: I like it, how soon until drm entirely lives in gitlab for PRs? :)
16:24 alyssa: spamming the marge queue with a zillion alyssa MRs will only slow down everyone else's merges and frustrate alyssa
16:24 danvet: monotree with multiple repos is imo better than outright full splitting
16:25 alyssa: AFAIU, the major pain point with this is that it discourages drive-by reviewa
16:25 danvet: HdkR, I'm pretty sure there's a universe where it already happened :-(
16:25 danvet: alyssa, yeah it encourages silos, hence the "don't be too small"
16:25 alyssa: and then by the time people see the upstream batched MR it's too late for making changes
16:26 bnieuwenhuizen: also starts to introduce rebase conflicts, where whoever is upstreaming is now likely responsible for the rebase pain
16:26 alyssa: which is why I'm hopeful that by bringing this up here, anybody who cares about reviewing asahi changes can do so
16:26 bnieuwenhuizen: though I bet that isn't the worst yet
16:26 danvet: bnieuwenhuizen, which is why linux does real merges for these integration trees
16:26 alyssa: bnieuwenhuizen: I'm betting on the rebase conflicts being less painful than the merge queue. This is an experiment. If it doesn't work we'll go back to the old way.
16:27 danvet: least because a rebase also invalidates all the subtree testing you've done
16:27 danvet: it's some tradeoff, but generally if you rebase right before merging there's excellent chances git bisect falls over on it
16:28 danvet: the tradeoff ofc is that if you don't rebase, then there's a chance you might be stuck on an unlucky regression in your baseline
16:29 alyssa: danvet: re size of tree, the level I would see would be per-hardware vendor
16:29 alyssa: s/tree/repo
16:29 alyssa: e.g. both radeonsi and radv would be a repo together
16:29 alyssa: (even though it's different software teams backing them)
16:30 bnieuwenhuizen: other side is whether it makes sense to have marge-bot batch MRs together for CI to make the single repo scale better
16:30 alyssa: that seems harder to do nicely
16:30 bnieuwenhuizen: but that comes with more human overhead in case the batch fails
16:30 alyssa: I mean
16:30 alyssa: I'm all good for people making CI scale better
16:31 alyssa: This is one approach Lina and I are going to try. If it works, great. If it doesn't, we go back, great.
16:31 alyssa: I
16:31 danvet: alyssa, ime 10 people is the lower boundary
16:31 danvet: plus/minus
16:31 alyssa: I'm mostly making noise here not to endorse the approach but to let people know that this is where the Asahi commits are going if they're looking for them.
16:31 danvet: below that you don't have a team and the overhead of the separate tree in terms of paperwork doesn't pay for itself
16:32 danvet: at least long term, for an experiment smaller might be better
16:33 danvet: paperwork = rebasing, doing integration mr, handling the inevitable fallout for everyone, rotating that responsibility
16:33 alyssa: yeah, we'll see how bad that gets
16:34 alyssa: it doesn't seem scary now but we'll see
16:34 gfxstrand: robclark, danvet: I don't think the name is the problem.
16:35 gfxstrand: The problem to me is what things with no deadline mean in this context. Is it the same as 0? The same as UINT64_MAX? Some secret 3rd thing?
16:36 gfxstrand: If it's a secret 3rd thing, is that the thing we actually want for vkCmdWaitForFences()? Or do we want deadline=0? Sure, deadline=0 solves a particular bug because the Intel wait-boosting is funky but is that what we really want?
16:37 gfxstrand: The reason why I care is because, if deadline=Some(0) isn't what we want but we really want deadline=None (Switching to Rust because it makes more sense), then Mesa as a userspace driver isn't a good idea.
16:39 HdkR: alyssa: But how do you feel about shorter pre-merge CI and longer post-merge CI and regressions get shamed until resolved? :)
16:41 alyssa: HdkR: this was shot down very hard.
16:41 HdkR: oop
16:41 tursulin: gfxstrand: Yeah, and we don't really need deadline=now to solve the clvk wait-boosting problem. Just *some* way to know someone is waiting. Hence I was concerned about the proliferation of "now" for random things.
16:43 tursulin: what does no deadline hint mean is probably not interesting, just means no one cares
16:44 tursulin: I do not understand what is the issue with marking waits as deadline=U64_MAX
16:45 gfxstrand: Same problem as you were raising with deadline=0. Someone may suddenly de-prioritize all Vulkan workloads because they say they don't care how fast they run.
16:45 tursulin: the only extra part is that it needs extra handling in the kernel and both msm and i915 parts can AFAICT work just the same as proposed
16:45 gfxstrand: Whatever default we pick will have implications in the future.
16:45 gfxstrand: At least with deadline=0 we're indicating that someone is actively waiting on it right now and, as far as we can tell, has work to do whenever we get finished.
16:45 tursulin: gfxstrand: you worry about deadline=UINT_MAX getting de-prioritized relative to deadline=no-deadline?
16:46 gfxstrand: Sure
16:46 gfxstrand: Depends on the behavior of no-deadline.
16:46 gfxstrand: That's why I'm harping on the no-deadline case and nailing down exactly what that means relative to the other two.
16:46 tursulin: right
16:48 tursulin: it is probably worth trying to nail that down rather than to give up and immediately talk about adding more deadline flags. It is not likely policy discussion will be easier with more flags..
16:49 tursulin: I mean more deadline flags if this becomes a problem.
16:49 gfxstrand: One could make an argument that None should be equivalent to Some(u64::MAX) and that setting a deadline does `deadline = min(deadline, new)`. That seems like reasonable behavior. If that's the behavior we want, I want to make sure we're clear on it.
16:50 tursulin: From the scheduling point of view that is IMO sane and reasonable. I think only problem arises with the waitboost hack.
16:50 gfxstrand: What Vulkan does is a separate policy decision, IMO.
16:50 gfxstrand: It's not really a waitboost hack. It's us providing the most accurate information we know.
16:51 gfxstrand: And the most accurate information we know is that the client is waiting on us. What will they do when we're done? How important is that work? Will they actually schedule more GPU work right away? We have no way of answering those questions. All we know is that they're sitting there waiting.
16:52 tursulin: yep, hence "now" is questionable and I wouldn't call it accurate information - we have information someone is waiting sure, but not what the deadline is
16:52 gfxstrand: But U64_MAX isn't accurate, either. We know that someone's waiting and they probably care when we get done.
16:53 tursulin: that is also true :)
16:53 tursulin: well that they are waiting, whether they care when they get done who knows
16:53 gfxstrand: I said "probably" :)
16:54 bnieuwenhuizen: some of it might also be a wait before just freeing the memory, which could totally be U64_MAX unless you have memory pressure
16:54 gfxstrand: The problem we have right now with i915 is that there is a heuristic and it's basically "Things using the old uAPIs win"
16:54 tursulin: gfxstrand: it's not even that clear cut
16:54 tursulin: we know it harms some workloads pretty badly
16:55 gfxstrand: bnieuwenhuizen: Yeah, we could add a VK_SEMAPHORE_WAIT_LOW_PRIORITY flag to communicate that information.
16:55 tursulin: what people really want is ability to control waitboost per context
16:55 bnieuwenhuizen: gfxstrand: pls don't make Vulkan more complicated :P
16:55 gfxstrand: lol
16:56 danvet: gfxstrand, so in terms of rust FencePrio = None | SomeWaiter | Deadline uint64
16:57 danvet: which is I think what you're arguing for?
16:57 danvet: atm we just have None | Some uint64
16:58 danvet: (yes it's haskell not rust syntax)
16:59 gfxstrand: danvet: ¯\_(ツ)_/¯
16:59 tursulin: if we document that no deadline and U64_MAX should be handled/scheduled equally, and we add a wart (documented or not?) of "intercepting" U64_MAX in drivers to apply waitboost also in that case, does that work?
16:59 gfxstrand: danvet: I guess, over-all, I'm just not sure why Mesa should be the driver for this.
16:59 danvet: tursulin, atm they're not
16:59 danvet: at least not in the patches because of boosting
17:00 tursulin: yes I know it's not like that in the series
17:00 danvet: gfxstrand, aside from atomic flips the kernel has no clue what's going on
17:01 danvet: tursulin, I guess you could scale the waitboost somehow with the deadline, dunno
17:01 danvet: so that UINT_MAX = no boosting
17:01 danvet: that would ditch the special case of "no deadline ever applied" being different from UINT_MAX
17:02 danvet: like you get full boost if the deadline is less than 20ms and nothing if it's more than 10s or whatever you feel like
17:03 danvet: so I'd go more towards "drivers must make sure there's no difference between MAX and no deadline"
17:03 tursulin: that makes sense but "is not the waitboost you are looking for" :)
17:03 danvet: yeah it's a different thing
17:03 gfxstrand: danvet: If None and Some(u64::MAX) are different, then wouldn't we want None for all the Mesa things? In that case, why do we have meas patches?
17:03 danvet: vk ext to specify the deadline, just kick the can down the road
17:03 tursulin: series is piggy backing i915 style waitboost on top of deadline hints
17:03 gfxstrand: This feels a lot like "mesa should do a thing so we have an excuse to land uAPI"
17:03 gfxstrand: I don't like those.....
17:04 danvet: gfxstrand, I think because that's not what i915 did
17:04 danvet: gfxstrand, oh I thought robclark has chromium patches for this stuff
17:04 danvet: if chromium only uses sync_file imo drop the syncobj patch and move on
17:04 tursulin: syncobj is critical for clvk
17:05 danvet: still could split it out if that helps to keep things moving?
17:05 tursulin: it's what brings the i915 style waitboosting to it
17:05 gfxstrand: Wait-boosting is critical for clvk
17:05 gfxstrand: If we want wait-boosting to happen via deadline=Some(0) then sure
17:05 gfxstrand: But then why are we arguing that vkWaitForFences should be None/Some(u64::MAX)?
17:05 tursulin: but the deadline is IMO "fake" - what is needed is a side channel to say "someone is waiting"
17:06 danvet: well that's why I argued for enum FencePrio = { None, SomeWaiter, Deadline{uint64)}
17:06 danvet: and then I guess internally the kernel can just map that to Deadline(0) until we have more clue
17:06 danvet: since on the compositor/kms side we do have an actual deadline
17:07 tursulin: hm okay, that sounds plausible to me
17:08 tursulin: I did send https://patchwork.freedesktop.org/series/113846/ after all..
17:08 tursulin: which allows channeling "someone is waiting" data
17:09 tursulin: and avoids ABI conundrums
17:09 danvet: we could also just add the wait flag to syncobj for now if that's all mesa needs
17:10 danvet: bit funny uapi with deadline on sync_file and waiters-present on syncobj but oh well :-)
17:11 mareko: in NIR, it seems that it's possible to have a non-divergent SSA that is in a conditional block executed based on a divergent condition
17:12 robclark: danvet: yeah, syncobj is the critical thing right now.. the deadline part is meant to also encompass what i915 does on missed vblank deadlines
17:13 danvet: robclark, so since the cover letter didn't have it, what is the userspace for sync file then?
17:13 robclark: as far as scheduling.. I'm on the fence about whether it should piggy-back on the same mechanism/flag.. the current thing is meant to be feedback for gpu freq mgmt
17:13 danvet: and is there another one for syncobj?
17:13 gfxstrand: mareko: Yes. As long as all active channels agree, it's considered convergent.
17:13 robclark: right now the sync_file userspace is igt
17:13 danvet: uh
17:13 robclark: but also response to compositor folks
17:13 danvet: well that's not enough for sure for merging
17:13 danvet: tsk, tsk
17:14 gfxstrand: mareko: The moment that gets fed through a phi converging different values from divergent control-flow, the result of the phi is divergent.
17:14 robclark: pointing out that atomic helper vblank isn't enough if userspace is making composition decision
17:14 danvet: robclark, I guess I was naively assuming chromium compositor was using that
17:14 robclark: we aren't _yet_
17:14 robclark: but the need is pretty clear
17:14 danvet: yeah some mr/pr/changelist/whatever needs to be ready
17:15 danvet: yeah but we don't conjectured uabi in drm
17:15 robclark: if you have userspace deciding whether to recomposite with previous frame or new one
17:15 danvet: (ok sometimes mistakes happen but they really shouldn't)
17:15 danvet: *merge conjecture uabi
17:15 danvet: yay another typo, I give up
17:16 gfxstrand: mareko: The choice as to whether or not you can use an SGPR and how that works with possible conflicts across divergent branches is a problem left for the back-end RA.
17:16 Lynne: dj-death: nice, I'll test your descriptor buffer patchset with my code on an a750 tonight
17:18 robclark: danvet: I do also have a use for the sync_file uapi but because it is from a vm guest it's going to take some virtgpu plumbing to get to point where I can use it
17:19 danvet: I guess if the plumbing is really this hard we could merge behind modoption like with the atomic ioctl
17:19 danvet: but maybe kms atomic ioctl boosting is good enough for that?
17:20 danvet: (essentially what I replied on-list too)
17:21 robclark: for android we have a similar issue to what gnome-shell/etc have, we have SF waiting for fence in userspace and deciding shortly before vblank which version of a surface to use
17:21 robclark: I'll catch up on list in a bit. doing three different things at once atm
17:22 mareko: gfxstrand: it's for new linking opts... if I have a store writing a non-divergent value, the stored value can be divergent if the store is conditional
17:25 gfxstrand: mareko: Yes, that tracks. Stores and phis behave similarly in that way.
17:26 gfxstrand: So a variable value is convergent if all its stores store convergent values in convergent control-flow.
17:26 gfxstrand: You have to look at both
17:26 dj-death: Lynne: thanks, I haven't tested much with vkd3d-proton
17:26 dj-death: Lynne: I need to do that
17:27 zamundaaa[m]: <danvet> "but maybe kms atomic ioctl..." <- At least in the case of KWin, boosting once the atomic commit is submitted would be too late, as I'm currently working towards only ever submitting buffers that are ready to KMS, to allow for cursor updates etc to happen later in the refresh cycle
17:27 danvet: zamundaaa[m], I know
17:27 danvet: I'm trying to get this thing unstuck somehow :-)
17:27 danvet: atm everything dies in "this isn't good enough for my use-case"
17:28 danvet: or in "we didn't fully plumb this through, it's just an igt"
17:28 cwabbott: gfxstrand: I'm finally looking at the vulkan common state thing, and I think vk_subpass_info isn't enough for us - we need to fill out the full thing
17:30 cwabbott: we need depth/stencil format, color formats, etc. to precalculate a rendering-bandwidth-per-pixel estimate
17:31 mareko: gfxstrand: I don't think it's tracked with load_input/store_output
17:32 cwabbott: I don't think it's useful to have a vk_subpass_info that has stuff that vk_renderpass_state doesn't
17:35 gfxstrand: mareko: IDK what you mean by "tracked". We don't have a divergence bit on variables today.
17:37 i509vcb: emersion Lynne: I guess I forgot to mention this was a hypothetical kmsro device
17:37 emersion: i509vcb: vulkan doesn't do kmsro
17:39 gfxstrand: cwabbott: Sure. Feel free to add more stuff.
17:39 i509vcb: unless I have the model wrong in my head, if the underlying device is kmsro expose the primary node makes no sense?
17:39 i509vcb: Though I'd think a second card for display hardware would belong to a device group then?
17:40 gfxstrand: cwabbott: Hrm... I'm trying to remember how I intended this to work now.
17:40 emersion: ideally for kmsro and EGLDevice, a single device would be exposed with both primary and render nodes
17:40 emersion: that's not the case atn
17:40 emersion: atm*
17:40 emersion: and considered a mesa bug
17:40 i509vcb: I did mention what vulkan would be expected to expose there, not EGL
17:41 gfxstrand: cwabbott: I think just expand vk_subpass_info as needed and make sure that data gets propgated to `vk_render_pass_state` as needed.
17:42 emersion: i509vcb: for vulkan i'd expect to see only one device with a render node, and nothing for the display-only device
19:01 Piraty: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8198#note_1843659 is back
19:01 Piraty: Newbyte: yes
20:12 Venemo: cmarcelo: have you resolved Daniel's concerns on MR 17922 ?
20:12 Venemo: cmarcelo: +26% branches is a pretty big deal
20:25 cmarcelo: Venemo: I've got the impression was the loop/continue issue (fixed now) that was causing the problem. But I'll ping him to double check.
20:38 Venemo: cmarcelo: I'm gonna run our fossils now
20:42 Venemo: cmarcelo: it still regresses branches in affected shaders, see https://pastebin.com/raw/GjFKySxL
21:08 alyssa: gfxstrand: i think NVK should be renamed NAK
21:08 alyssa: meaning Not A Kompiler
21:08 gfxstrand: alyssa: :P
21:08 alyssa: much more descirptive
21:09 karolherbst: too late
21:09 karolherbst: we already got t shirts
21:09 karolherbst: (and I have a sticker on my laptop)
21:09 gfxstrand: I still need to figure out how to get airlied his T-shirt
21:09 karolherbst: it would be really invonvenient for me to replace it
21:11 zmike: I have two stickers so it definitely can't be done
21:11 alyssa: you can just claim the stickers were for NVK all along
21:12 alyssa: Nvidia Very Kool kompiler
21:12 alyssa: not to be confused with codegen, also known as NVK, meaning Not Very Kool
21:13 gfxstrand: :P
21:21 gfxstrand: daniels: Someone's telling me they're unable to fork Mesa. Some "you've reached your project max" message?
21:22 daniels: gfxstrand: they need to click on the issue link in the banner telling them that they can’t fork until they do
21:42 gawin: as CI is disabled anywhere outside of main repo, only way to make a run is to create MR to main repo?
21:46 daniels: gawin: try now, you should be able to run in your repo
21:58 gawin: seems moving forward, thx
21:59 cwabbott: gfxstrand: the thing is, I'd have to expand vk_subpass_info until it's basically the same as vk_renderpass_state, at which point what's the reason for a separate type?
21:59 cwabbott: I'd rather just delete vk_subpass_info