07:28sima: mlankhorst, mripard1, tzimmermann, agd5f, jani, rodrigovivi, dolphin, robclark, whoever else cares: I just rolled drm-fixes to -rc1
07:29sima: agd5f, oh and -fixes pr please a bit earlier than Fri my evening :-P
07:29sima: or ping me here that it's still coming, I almost missed it
08:19jani: sima: roger. rolling drm-intel-next-fixes and drm-intel-fixes as well
08:20javierm: tzimmermann, sima: do you think the approach makes sense to handle this issue? https://lists.freedesktop.org/archives/dri-devel/2023-November/430455.html
08:20javierm: tzimmermann, sima: I haven't had a confirmation from Andrew yet but I'm confident that I understood what's going on in his platform
08:21tzimmermann: javierm, from a quick look, that patch makes no sense
08:22tzimmermann: i didn't yet read the whole thing, but simpledrm shouldn't remove other drivers
08:22jani: sima: hmm. did you have to solve a conflict in drm-misc-next? I'm hitting one
08:22sima: jani, yeah but it's pushed now
08:23javierm: tzimmermann: Andrew's patch makes no sense, agree. I was asking about my suggested patch
08:23sima: oh actually no, I have one in the fixup branch now too :-/
08:23tzimmermann: ah, ok
08:23jani: sima: why doesn't dim cover it for me? https://paste.debian.net/hidden/b7fa8518/
08:24sima: jani, because dim rebuild-tip failed on the topic/core-for-CI branch conflict
08:24sima: I pushed my drm-misc-next fixup out for you since I have to do some interviewing right now ...
08:24sima: uh wait, raw git push failed
08:25jani: naughty naughty
08:25sima: jani, ok should work now
08:26sima: and I guess 61d9b3364a6ba276a972ae88f6cad5a8a31c3243 should be dropped?
08:27tzimmermann: javierm, we do have such a workaround as in your patch in our trees at suse. it happens on the rpi as well
08:28javierm: tzimmermann: yeah, I guess that happens on any DT platform that does an EFI boot and the firmware/bootloader adds an EFI GOP handle for the Linux EFI stub to look at
08:29sima: javierm, only question I have is whether sysfb could load before this OF setup? but I guess at worst we have a bit more driver handover in that case ...
08:30sima: maybe another one: should simple-framebuffer really be preferred over efi?
08:32javierm: sima: 1) it can't happen before, because the OF initcall is arch_initcall_sync() while the sysfb initcall is subsys_initcall() (and we are talking with tzimmermann to move it later to device_initcall_sync())
08:32sima: ah if that assumption is already baked in, then I guess it's good
08:32javierm: sima: 2) AFAIU the DT is the real source of truth on a DT platform, only the EFI boot services are used, not the EFI runtime services on these platforms
08:33sima: ah that might be good to put down as justification and maybe check with robher or someone who knows ...
08:33javierm: sima: the only reason they use EFI to boot is to make the boot path more standard AFAIK. Am I correct here tzimmermann ?
08:33javierm: sima: Ok
08:34javierm: sima: I'll mention these two things in the commit message as well and post as an RFC then, Cc'ing robher and other stakeholders
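
The ordering argument javierm makes above (OF setup at arch_initcall_sync(), sysfb at subsys_initcall(), possibly moving to device_initcall_sync()) can be checked with a toy model. This is only a sketch in plain Python, not kernel code: the level list mirrors the order of the initcall levels in include/linux/init.h, and the labels "of_populate" and "sysfb" are illustrative names, not real symbols.

```python
# Toy model of Linux initcall ordering (not kernel code).
# A "_sync" variant runs right after the plain calls of the same level.
INITCALL_LEVELS = [
    "pure", "core", "core_sync", "postcore", "postcore_sync",
    "arch", "arch_sync", "subsys", "subsys_sync", "fs", "fs_sync",
    "rootfs", "device", "device_sync", "late", "late_sync",
]


def runs_before(first, second):
    """True if every initcall at level `first` runs before level `second`."""
    return INITCALL_LEVELS.index(first) < INITCALL_LEVELS.index(second)


def boot_order(initcalls):
    """Sort (label, level) pairs by level; ties keep their input order."""
    ordered = sorted(initcalls, key=lambda call: INITCALL_LEVELS.index(call[1]))
    return [label for label, _level in ordered]


# Hypothetical labels for the two initcalls discussed in the log.
calls = [("sysfb", "subsys"), ("of_populate", "arch_sync")]
print(boot_order(calls))
```

With these levels the OF setup always precedes sysfb, and moving sysfb to device_initcall_sync() would only widen the gap.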
08:36jani: sima: I'm taking care of core-for-CI
08:36sima: jani, thx
08:42jani: sima: taking a while to build it after rebasing...
08:43sima: jani, yeah I don't think we have a huge queue of people who want to push their patches on Monday morning :-)
08:53jani: :)
09:01emersion: enunes: did you have the chance to finish up your patches to drop wl_drm auth from v3d? or not yet?
09:01emersion: just making sure i haven't missed anything
09:04enunes: emersion: not yet, I should go back to look into that today, hopefully what it needs is to figure out how to pass the info from the dmabuf feedback common layer
09:04enunes: itoral ^
09:04emersion: ah, let me know if you have questions regarding this
09:13MrCooper: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26170 kind of impressive how many bugs they managed to squeeze into two lines
09:15emersion: to their credit, i don't think i ever grokked the divisor/remainder thing
09:15emersion: and how generic it is for timestamps vs seqnos
10:33itoral: enunes: I guess we would still have the same issue with allocating display buffers we currently have though, right?
10:40enunes: itoral: I think it would work as it would receive the display device to allocate buffers when the scanout flag is set, and the gpu device when buffers are going to be composited anyway
10:41enunes: I think we concluded that authentication is not actually needed to allocate dumb buffers in general so we could keep the dumb buffer allocation as the fallback for now
10:42itoral: oh cool, if that is true then yeah, that would fix the issue
10:42itoral: looking forward to seeing that, let me know if you need assistance :)
10:47itoral: about allocating dumb buffers... I just checked and drm_mode_create_dumb_ioctl doesn't seem to require DRM_AUTH... was that changed though? I am pretty sure we needed that before
10:47emersion: yeah i'm pretty confused by this as well
10:47emersion: i tried to dig the git history but didn't find anything
10:48itoral: uh... weird :-/
10:51emersion: oh, found it
10:51emersion: had to go through quite a few refactoring commits
10:52emersion: fb30edf5e4b4 ("drm: make buffer management work without DRM_MASTER")
10:52emersion: the motivation for allowing CREATE_DUMB is… questionable
10:55itoral: ah, good find, so I was not crazy after all :)
10:56emersion: sima: hm so mripard is suggesting auto-creating a DMA heap by default in DRM core, and adding a func pointer to let drivers override this behavior
10:56emersion: (correct me if i'm wrong mripard)
10:57emersion: but i'm arguing we should really enable this on a per-driver basis, because each hw is different, and heaps should not be named by their purpose, but by their nature
10:58emersion: do you have an opinion on this?
10:58emersion: like, i don't think always creating a heap which behaves like dumb buffers is a proper solution
10:59emersion: (and then there's the "should we have more function pointers, which make DRM core more of a midlayer" discussion, but let's leave that for later)
11:01mripard: note that I don't really care about the automatic part, but I do care about fixing all the affected drivers at once
11:01mripard: if that's with a helper + a coccinelle script, that's fine by me
11:03emersion: (i'm arguing that doing all at once is too much work, difficult to test, and not required)
11:06emersion: (we have 26 kmsro drivers in Mesa)
11:08MrCooper: emersion: it's not clear to me why the CRTC getting pulled into an atomic commit guarantees that a flip is programmed for a plane
11:08emersion: MrCooper: you can see it that way: a FB_ID no-op update is no different from another prop no-op update
11:09emersion: the driver doesn't get to know which prop has changed, only a list of affected planes/CRTCs/etc
11:09emersion: sorry
11:09emersion: doesn't get to know which prop was included in the atomic commit
11:10emersion: the testing zamundaaa has done also confirms this
11:11sima: not sure about context (some doc patch I can't find?) but it's just a crtc vblank event that the driver has to push through properly
11:11sima: or the helpers get really unhappy
11:11pq: sima, triggering a VRR new scanout cycle
11:11pq: in hardware
11:11sima: yeah just somehow getting the crtc state into the commit should be enough for that
11:12pq: cool
11:12sima: since that's all that's needed to require the crtc event machinery to go through a frame
11:12emersion: yup
11:13sima: hm ... might be that current drivers kinda slack and end up delaying the flip in the VRR case
11:13sima: but I think for most hw that would need active consideration in the code
11:13MrCooper: yeah, not seeing the connection between the event machinery and triggering scanout with VRR
11:14MrCooper: anyway, it sounds like a flip is programmed for the plane if any state was set for it, which would cover it
11:14sima: so kinda instead of immediate vrr flip you get the one that you'd get if you just do a crtc sequence ioctl wait
11:14sima: which is maybe not what we want, because that atomic commit does hold up everything else :-/
11:15sima: unlike the crtc sequence ioctl wait
11:16sima: pq, emersion I guess you want this for frame pacing without hitting one of the implied full damage cases if you'd add a plane too?
11:16emersion: the context is async atomic flip
11:17emersion: but tbh the VRR case is same for both legacy and atomic, so i'm not very worried
11:17emersion: IOW: enabling async on atomic can't introduce a new VRR-related bug that legacy didn't have
11:19sima: yeah I was just checking how we handle target timestamp and noticed that we don't
11:19sima: I thought we had patches so you could try and precisely schedule VRR frames
11:19MrCooper: must have been a nice dream :)
11:20sima: we don't even have the target sequence like legacy flip ...
11:22MrCooper: right, that might actually be nice to avoid flips getting needlessly delayed
11:22MrCooper: e.g. with very long vblank
11:23sima: I think documenting that any crtc flip (even if empty) should vrr flip asap would be good, and then perhaps fixing drivers if any are not getting this corner right
11:23sima: scheduled timestamp would be one on top I think (for no-jitter media playback or something like that)
11:24emersion: sima, what do you mean by "empty flip"?
11:25sima: just the crtc, no plane
11:25sima: or too obscure corner case to care?
11:26emersion: nah, i think it's good to document that
11:26emersion: it can be surprising for user-space, if we don't know about it
11:26emersion: the exact rules for a CRTC to page-flip, that is
11:26MrCooper: seems somewhat surprising that there would be a flip with no plane state in the commit
11:27emersion: you could flip just to switch VRR on/off
11:27emersion: without any other change
11:27emersion: i'm sure there are more cases like this
11:27MrCooper: then you can re-set the primary plane's FB_ID to the same value
11:28emersion: we're talking about documenting current behavior here
11:28emersion: you can not do it, but also, why not?
11:29MrCooper: because a flip is conceptually a plane operation, not a CRTC one
11:29emersion: for user-space that tries to only include what's actually changed in atomic commits, it would be a pain to handle
11:29emersion: inconsistent at least
11:30emersion: hm, i don't think a flip is a plane op
11:30emersion: the final on-screen image is affected by more than just planes
11:30emersion: GAMMA_LUT, for instance
11:30MrCooper: so changing the FB_ID of an overlay plane isn't a flip?
11:31emersion: i think we should definitely allow changing the CRTC GAMMA_LUT, without any planes, and that should trigger a page-flip
11:32pq: Flip is to do with FB_ID, while a (screen, CRTC?) update is more, IMO. Everyone just talks about "flip" when they mean "update".
11:32emersion: right, "flip" might not be the right word, but it's used in the PAGEFLIP_EVENT flag
11:33pq: good point
11:33emersion: you "flip" the CRTC from the previous state to the next state...
11:33pq: traditionally an update without FB_ID didn't make much sense
11:35pq: Didn't "page flip" originally mean flipping individual pages to other pages from what the hardware was reading as an FB?
11:36pq: since then we've re-purposed "page flip" to mean "buffer flip", and seems it's now generalizing into a "state flip"
11:36MrCooper: maybe, would have to be last millennium though
11:37MrCooper: one reason why I'm being pedantic is that at least with AMD GPUs, I suspect triggering scanout with VRR requires actually programming a plane address register
11:38MrCooper: i.e. the driver would need to program plane HW state for a CRTC-only commit
11:55MrCooper: pq: I started working on GPU drivers at the end of last millennium, and since then I haven't seen the term "page flipping" refer to anything other than programming a different buffer address
11:56MrCooper: my mental image has been that of flipping a page in a book, as opposed to drawing something new onto the same page
11:56pq: I've probably read some historical remarks.
11:57cwabbott: hakzsam: is there any reason radv doesn't support descriptorBufferCaptureReplay?
11:58hakzsam: cwabbott: yes, because no tool supports it AFAIK, see https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19952
12:50tzimmermann: jfalempe, i've recently tested the bmc connector for ast. but it doesn't work very well
12:58jfalempe: tzimmermann, what's the problem ? I was only able to test it remotely.
12:59tzimmermann: quite a bit on my systems. i think i should have known beforehand. sorry.
12:59tzimmermann: first, the console is limited to 1024x768
13:00tzimmermann: because it hardcodes the resolution for mirrored outputs
13:00tzimmermann: (the bmc connector mirrors the output of vga on my systems)
13:01jfalempe: do you have a monitor connected on vga ?
13:01tzimmermann: yes
13:01tzimmermann: the code is here: https://elixir.bootlin.com/linux/v6.6/source/drivers/gpu/drm/drm_client_modeset.c#L310
13:02jfalempe: was it different without the bmc connector ? I think it will also use the same resolution as vga.
13:02tzimmermann: it's a shortcut in the client code.
13:02tzimmermann: the drm client calls it 'cloned mode'. it sets 1024x768 by default
13:03tzimmermann: but that's just a minor annoyance
13:03jfalempe: you mean your vga screen can do better than 1024x768, and due to the "cloning" mode, it is now restricted ?
13:03tzimmermann: yes
13:03tzimmermann: but the real problem is in gnome, which fails to handle this setup entirely
13:04tzimmermann: it starts, but cannot set any resolution afterwards
13:04tzimmermann: IDK much about gnome, but it looks like gnome tries to use both outputs independently
13:05tzimmermann: and that fails
13:05jfalempe: yes, it should see 2 screens. is there a way to advertise cloned monitor ?
13:06tzimmermann: for example, the gnome settings offer a 'extend' mode, where the output of both connectors form a larger desktop
13:06tzimmermann: but all we can do is 'mirrored', of course. both outputs always show the same
13:07tzimmermann: AFAIU, gnome should detect this automatically.
13:07airlied: isn't there only one crtc on those?
13:07airlied: I don't think the hw can do anything except mirrored
13:07jfalempe: I thought that as both outputs use the same crtc, it should behave this way.
13:07tzimmermann: right.
13:08tzimmermann: there are two encoders. both have the same bitmask for possible crtcs (i.e., 0x01)
13:08tzimmermann: so gnome should understand that both encoders hang on the same crtc
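
The check tzimmermann describes (both encoders advertise possible_crtcs 0x01, so they hang on the same CRTC) can be sketched as a small bipartite matching over the bitmasks. This is a minimal illustration in plain Python, not how any compositor or the kernel actually does it; the function names are made up for this example.

```python
def crtcs_from_mask(mask):
    """Return the CRTC indices set in a possible_crtcs bitmask."""
    return {i for i in range(32) if mask & (1 << i)}


def can_drive_independently(masks):
    """True if every encoder can be given a CRTC of its own.

    `masks` holds one possible_crtcs bitmask per encoder. Uses simple
    augmenting-path matching (Kuhn's algorithm).
    """
    owner = {}  # CRTC index -> encoder index currently using it

    def assign(enc, seen):
        for crtc in sorted(crtcs_from_mask(masks[enc])):
            if crtc in seen:
                continue
            seen.add(crtc)
            # Take a free CRTC, or try to move its current owner elsewhere.
            if crtc not in owner or assign(owner[crtc], seen):
                owner[crtc] = enc
                return True
        return False

    return all(assign(enc, set()) for enc in range(len(masks)))


# The ast case from the log: two encoders, both limited to CRTC 0.
print(can_drive_independently([0x01, 0x01]))
```

When this returns False, userspace has to either clone several connectors onto one CRTC or leave some connector dark.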
13:08airlied: and it should pick 1024x768 since there isn't a hotplug signal for the bmc I don't think
13:08airlied: if the bmc had a hpd line it might be possible to do more
13:09emersion: my understanding was that cloned CRTCs were a thing of the past, not worth supporting
13:09tzimmermann: airlied, i'm not aware of such a feature
13:09airlied: no they are very much a thing of the present on servers
13:09tzimmermann: emersion, it's back! :)
13:09airlied: tzimmermann: yes I've never heard of it either
13:10airlied: so limiting things to 1024x768 and mirrored is as good as we can do I think
13:10tzimmermann: i'm ok with the 1024x768. the gnome issue is much worse
13:10tzimmermann: i did not expect it to fail at that
13:11emersion: what driver is used on these servers?
13:11tzimmermann: because much old HW only has mirrored
13:11tzimmermann: emersion, ast
13:11emersion: please upload a drmdb dump if there isn't one already
13:11tzimmermann: we've added a connector that represents the BMC
13:11emersion: it's important for us compositor devs
13:11tzimmermann: emersion, ok
13:12tzimmermann: emersion, you mean: https://drmdb.emersion.fr/ right?
13:12emersion: yea
13:12tzimmermann: what happens when i upload something there?
13:13emersion: there is one snapshot, but no cloned CRTCs there: https://drmdb.emersion.fr/snapshots/eebf2e42d1d9
13:14emersion: it's made visible on the website, and then we can understand better what features are available where
13:14tzimmermann: emersion, we've added that second connector with linux 6.6
13:14emersion: right, that one was captured on 5.4
13:15tzimmermann: it represents the bmc's output.
13:15tzimmermann: it fixes the bmc output on systems without connected vga cable
13:16tzimmermann: we could do without that connector, but looks like the natural solution
13:17emersion: hm
13:17emersion: yea, it makes sense to me
13:17emersion: the VGA monitor may have a completely different EDID and stuff
13:19tzimmermann: emersion, IIRC the problem was that as soon as one unplugs the vga cable, the driver thinks that nothing is being displayed and has no idea what's going on. so it reports minimal capabilities. with the bmc connector we can always have an active output with high resolutions, etc
13:20tzimmermann: jfalempe, i'm going to upload that data to simon's webpage. maybe that can be fixed in userspace
13:21emersion: please do note that implementing cloned CRTCs in userspace is far from being trivial
13:21emersion: can require quite a bit of restructuring and refactoring
13:22tzimmermann: emersion, may i ask why?
13:22emersion: because it breaks the "one connector == one swapchain" concept
13:23emersion: a lot of userspace has a single object/struct for each connector+CRTC pair
13:23tzimmermann: i see
13:24zamundaaa[m]: There's also not a lot of use cases with consumer hardware where you need display cloning with the hw, so supporting them isn't a priority
13:25emersion: on top of this, it's not a given that user-space will do "the right thing" by default
13:25tzimmermann: we could handle this case in the driver. it just doesn't seem right either.
13:25emersion: weston is the only compositor i know that supports cloned CRTCs, and i think it will only try that if told in the config file
13:26tzimmermann: hearing that, i guess i better start working on a kernel solution quickly :)
13:27emersion: i do agree that the user-space solution would be cleaner in theory
13:27emersion: sadly that's not going to work with today's user-space
13:28emersion: if we want to support both, we could have a DRM client cap "i support cloned displays"
13:28pq: tzimmermann, FWIW, Weston implements this "shared CRTC" clone mode, if you happen to write your weston.ini that way.
13:28tzimmermann: my other approach is to model this as a drm bridge. it would not affect the user space
13:29tzimmermann: so the tx-chip itself is a bridge and the bmc is a bridge, and the connector is a bridge connector, like this: enc -> tx -> bmc -> connector
13:30tzimmermann: and the bmc bridge would return/detect something if the earlier bridges didn't
13:30pq: is most userspace unable to understand that some connectors might not have any free CRTCs left? So it would light up one connector at least.
13:33emersion: yeah, i'd expect current userspace to light up a single CRTC
13:51jfalempe: tzimmermann, thanks, (I had to go to a meeting, I'm back)
13:52LaserEyess: 08:14 < emersion> right, that one was captured on 5.4
13:52LaserEyess: that one is mine, I'm 90% sure
13:52LaserEyess: I can make another one
13:52emersion: ahah
13:53LaserEyess: done
13:54LaserEyess: https://drmdb.emersion.fr/snapshots/ce14d86544bf
13:54emersion: ty!
13:55LaserEyess: also used as a bmc driver, I was mostly curious what drm_info would show and how it would stack up against other drivers, never thought it would ever be relevant
14:40sima: emersion, the problem with a "I support cloned modes" userspace flag is that cloning is the old way
14:40sima: like xrandr can do it
14:40sima: tzimmermann, pq ^^
14:40emersion: hm, right…
14:40emersion: if we wanted, we could have the cap default to 1
14:41emersion: and let user-space degrade it to 0
14:41sima: and there really isn't much reasonable we can do in the kernel, because "cloned but lower res" or "which one do you want at which res" is very much a policy/config thing
14:41emersion: yeah, agreed it's not super great
14:41sima: like what we could do is maybe a really silly dbus service which does the clone setup for the dumb-as-brick compositor ...
14:42sima: like hand it a leased drmfd with only the lower-res connector
14:42sima: and when it enables that we enable the other output with the same reduced mode on the same crtc
14:43emersion: all compositors are dumb-as-brick atm :)
14:43sima: still horrendous, but avoids the "pass an Xorg.conf on the kernel cmdline"
14:44emersion: would probably work with hacks
14:44sima: yeah just about never turn off the crtc with atomic, because it'll fail due to the phantom connector
14:44emersion: (hacks because it's not easy to figure out when a leased CRTC becomes enabled)
14:44emersion: (and probably more issues)
14:44sima: with legacy crtc it works, because for reasons that clears all connectors implicitly
14:44emersion: ah yeah
14:44emersion: fun
14:45sima: emersion, inglorious polling, lots of it
14:45emersion: :D
14:46sima: tzimmermann, gnome falling over because it can't light up the 2nd connector is a more general bug though
14:46sima: there's lots of cases where you have more connected outputs than crtc possible
14:50MrCooper: yeah, a mutter issue should be filed about that if there isn't one yet
15:42tzimmermann: emersion, https://drmdb.emersion.fr/snapshots/4c406f948ee9
15:42emersion: ty!
15:43tzimmermann: for the bug here, it would also be ok if gnome would simply ignore certain connectors
15:43tzimmermann: the bmc is a virtual connector
15:44emersion: that's what i'd expect current compositors to do -- the fact that GNOME just freaks out is definitely something that should be fixed as said earlier
15:46tzimmermann: or could we export a preferred connector for each crtc?
15:47airlied: tzimmermann: can you try removing any monitors.xml you might have and see if gnome copes
15:50tzimmermann: airlied, there's no monitors.xml (in ~/.config)
16:08tzimmermann: emersion, https://gitlab.gnome.org/GNOME/mutter/-/issues/3157
16:27MrCooper: thanks tzimmermann
16:41jenatali: Oof. lower_io_to_temporaries is completely broken for the case where one var is loaded in multiple different ways
17:05gfxstrand: dschuermann, mareko: Does the v_mul_legacy_f32 instruction always flush denorms? Does it have a flag for flushing denorms?
17:05gfxstrand: On NVIDIA, MUL.FMZ always flushes denorms so I'm wondering if we can just make that the required behavior of the NIR op and optimize accordingly.
17:10gfxstrand: jenatali: Uh, what?!?
17:11jenatali: gfxstrand: There's a GLSL 4.40 test that does input - interpolateAtCentroid(input). That generates NIR which does load_deref, load_interp_deref, and then compares the results. When lowering I/O to temps, both of those loads immediately store to the same temp var, so the interp result overwrites the base load
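
The overwrite jenatali describes can be pictured with a toy model. This is plain Python, not NIR, and the order in which the copies and reads are emitted is an assumption made purely for illustration; the only point is that two differently loaded values of one input end up in a single shared temporary.

```python
def expected(plain, at_centroid):
    """What the shader means: input - interpolateAtCentroid(input)."""
    return plain - at_centroid


def lowered_to_shared_temp(plain, at_centroid):
    """Toy model of the broken lowering: both loads feed the same temp."""
    temps = {}
    temps["input"] = plain        # copy standing in for load_deref
    temps["input"] = at_centroid  # copy standing in for load_interp_deref
    # Both later reads now see whichever value was stored last.
    lhs = temps["input"]
    rhs = temps["input"]
    return lhs - rhs


print(expected(1.0, 0.75), lowered_to_shared_temp(1.0, 0.75))
```

In this model the subtraction collapses to zero whenever the two loads differ, which is the kind of wrong result the test would catch.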
17:12gfxstrand: Oh, yeah, you should really only use it for outputs, not inputs
17:12gfxstrand: IDK why you'd use it for inputs
17:12jenatali: Yeah we were just setting the nir flag and the GL frontend used that to apply it to inputs and outputs
17:13jenatali: I don't really know why you'd use it for inputs either
17:13emersion: sima: do you have thoughts on the dma heaps thing? or no time for this? or should i ping you some other time?
17:14jenatali: Surprisingly, just removing the nir flag caused almost 0 regressions and fixed a test, so... guess I'm just gonna do that
17:14gfxstrand: :)
17:17jenatali: And that should clean up my last failures from GL4.4, hooray
17:17gfxstrand: \o/
17:33jenatali: Oh and 4.5 is really easy, I think...
18:08sima: emersion, is there a link with more details on what mripard wants to do?
18:08sima: but at first glance that's going to fail, or at least not solve any problem and probably just cause confusion
18:10emersion: sima, https://lore.kernel.org/dri-devel/tmsf75w3iskpvx2dxgzpk4vn7g6jpfdgdq2qv3nl5i4ocawzz4@ihcwmnq5gval/T/#u
18:10emersion: that's an interesting looking Message-Id :o
18:13sima: emersion, I'm not seeing where mripard suggests some default stuff in that thread?
18:13emersion: part of it is "we should make it generic"
18:13sima: ah got confused by lore's thread display, I didn't see all the replies
18:14emersion: and "Yeah, I agree here, it just seems easier to provide a global hook"
18:14emersion: yeah, i "expanded flat", maybe wasn't the right thing to do
18:15sima: so the problem is, for a lot of cases we flat out don't yet have existing dma-buf heap implementations
18:15sima: like page allocator (compatible for gem shmem helpers) and cma heap exist iirc
18:15sima: but virtio needs virtio buffers, vmwgfx needs vmwgfx buffers
18:16sima: all the discrete need vram buffers
18:16sima: and there's no dma-buf heap for these even
18:16sima: so if the goal is to roll this out for everyone right away it means you get to look at around 100 drivers and write a handful of dma-buf heaps
18:16sima: that's ... not going to happen
18:17sima: so driver-by-driver like we do with every other feature, but broad enough consensus that it'll actually work for all of them, including the funny ones
18:17emersion: sima, i believe mripard wants to expose a heap for all display drivers involved in a split/render SoC
18:18emersion: to stop (ab)using dumb buffers in Mesa
18:19sima: yeah, but don't try to get there with a default
18:19sima: cause it's just wrong in a bunch of cases and so will just teach generic userspace that it cannot rely on that info
18:20emersion: okay, sounds like we agree here
18:20sima: I guess what you could try is include it as part of gem helpers
18:20emersion: yeah
18:20emersion: sima, do you have opinions for the naming of the heaps?
18:21sima: so that when you use shmem helpers you automatically get a sysfs link to the system heap
18:21sima: and cma helpers gives you the sysfs link to cma heap
18:21sima: and everyone else is too special and gets nothing
18:21sima: or maybe more motivation to switch to helpers :-)
18:21emersion: i think something like "vc4_cma" would make more sense than "kms_cma" or "kms_scanout"
18:21sima: why do you want to name the heaps?
18:21emersion: heaps need to have a name
18:22emersion: they show up in /dev/dma_heap/
18:22sima: yeah but we have 2 default ones
18:22sima: those should be enough ...
18:22emersion: hm, i'm confused
18:22emersion: the point of this RFC patch is to add a VC4 CMA heap
18:22sima: why
18:23emersion: to… allocate scanout capable memory
18:23sima: yeah but you just need contig memory
18:23sima: the cma heap in drivers/dma-buf/heaps/cma_heap.c should be enough?
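
For readers unfamiliar with the heaps showing up in /dev/dma_heap/, here is a rough sketch of allocating a buffer from an existing heap from userspace. It assumes the struct layout and ioctl from <linux/dma-heap.h> and the asm-generic ioctl encoding (some architectures encode ioctl numbers differently). The heap name is a parameter because the CMA heap's name depends on the system; nothing here is specific to vc4.

```python
import ctypes
import fcntl
import os


class DmaHeapAllocationData(ctypes.Structure):
    # Mirrors struct dma_heap_allocation_data from <linux/dma-heap.h>.
    _fields_ = [
        ("len", ctypes.c_uint64),         # requested size in bytes
        ("fd", ctypes.c_uint32),          # filled in by the kernel
        ("fd_flags", ctypes.c_uint32),    # flags for the returned dma-buf fd
        ("heap_flags", ctypes.c_uint64),  # heap flags, 0 here
    ]


def _iowr(type_char, nr, size):
    # asm-generic _IOWR() encoding (assumption: x86/arm-style ioctl numbers).
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


DMA_HEAP_IOCTL_ALLOC = _iowr("H", 0x0, ctypes.sizeof(DmaHeapAllocationData))


def alloc_from_heap(heap_name, size):
    """Allocate `size` bytes from /dev/dma_heap/<heap_name>; return a dma-buf fd."""
    heap_fd = os.open(f"/dev/dma_heap/{heap_name}", os.O_RDONLY | os.O_CLOEXEC)
    try:
        req = DmaHeapAllocationData(len=size, fd_flags=os.O_RDWR | os.O_CLOEXEC)
        fcntl.ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, req)  # kernel writes req.fd
        return req.fd
    finally:
        os.close(heap_fd)
```

A caller would use something like `alloc_from_heap("system", 4096)`, given a kernel with dma-buf heaps enabled and permission to open the heap node.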
18:23emersion: hm
18:24sima: if drivers create random heaps then funny stuff is going to happen
18:24sima: plus angry missives from dma-api maintainers I'm sure
18:24emersion: i agree that centralized heaps created outside of DRM drivers are better
18:24sima: like for vmwgfx I think a vmwgfx heap in the vmwgfx drivers is probably ok
18:25emersion: but i didn't think that regular system CMA memory would do just fine?
18:25sima: but already for virtio there's common infra so you can share virt buffers across display and camera and other things
18:25sima: so probably want a virt_heap in the shared repo
18:25sima: ttm I frankly haven't thought about too much yet
18:25emersion: i need to read up more vc4 code i think
18:25sima: emersion, so I think depending upon dt you can have more than one cma area
18:25sima: and some of them are per-device
18:25sima: and those aren't exposed yet in the cma-heap
18:26sima: so if that's the case, we'd need to expose these driver heaps somehow
18:26emersion: the CMA areas have different properties right?
18:26sima: I think you can still have multiple devices using the same cma heap
18:27sima: tbh I'm not sure how much this is the case in practice, but we definitely discussed the need for stuff where you have 2 cma heaps
18:27sima: for some "my memory controller is fucked" reasons
18:27emersion: like write-combined and stuff?
18:28emersion: cacheable etc
18:28emersion: USWC
18:28emersion: note how unfamiliar with this stuff i am :P
18:29mareko: jenatali: doing lower_io_to_temporaries for inputs may be required for variable indexing of inputs
18:29sima: hm from a few quick git grep it looks like only special reserved memory regions for per-device have these private heaps
18:29jenatali: mareko: yeah fair enough. DXIL doesn't need that though
18:30mareko: jenatali: I do know that there are other problems with lower_io_to_temporaries, for example, if variables aren't sorted by location, the pass breaks
18:30jenatali: Ooh that's fun...
18:31jenatali: All the more reason to rely on it less
18:31sima: emersion, ok I got confused, I have no idea when/how this is actually a device-private cma and when not in practice
18:33sima: but for a given struct device we can check whether it's using the default cma or not, so could still implement this generically
18:33sima: I think at least
18:40mareko: gfxstrand: denorm flushing is controlled by a global enable bit for all FP32 instructions except min/max, radeonsi always flushes FP32 denorms
19:34emersion: sima, what is the difference between dma_alloc_wc() (used by vc4 for dumb buffers) and cma_alloc() (used by the CMA DMA-heap)?
19:35sima: about 200 layers of lasagna
19:35emersion: yum
19:35sima: so the dma_alloc are for devices, and are meant to fully abstract away details like where & how memory is allocated
19:35sima: with the very neat idea that for the driver it doesn't matter whether it's contig memory
19:36sima: or a pile of pages that gets remapped by an iommu to just look contig
19:36emersion: hm, i see
19:36sima: the reality tends to be seriously more disappointing in many cases, but that's aside
19:36sima: the cma_alloc is the actual contig alloc function
19:36sima: which the dma-api uses to implement the abstract magic
19:36emersion: does cma_alloc give out WC memory?
19:37sima: and which the dma-buf heaps uses to implement the same magic (because it'll call into dma_map apis to get the rest of the way to the same situation as directly calling dma_alloc)
19:37sima: uh
19:37sima: that's I think where you need the right heap and all that
19:37sima: and arch specific nonsense
19:39emersion: you mean we may need to add a cma_wc heap?
19:39sima: emersion, given that both heaps seem to do explicit cache management I think you get wb
19:39sima: and lots of flushing ...
19:40sima: emersion, given how much of userspace consistently uses begin/end_cpu_access dma-buf ioctl, probably yes
19:40sima: or you just forget cpu access for these
19:40emersion: user-space is missing these?
19:40sima: I think most doesn't bother and gets seriously surprised when they get a wb dma-buf and the resulting cache dirt :-)
19:40emersion: i mean this is new uAPI kind-of so we don't need to be held back by old user-space
19:42sima: quick grep says agx and iris gallium drivers and vulkan wsi helpers even bother to use DMA_BUF_SYNC
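
The begin/end_cpu_access bracketing sima refers to is the DMA_BUF_IOCTL_SYNC ioctl from <linux/dma-buf.h>. A minimal sketch of wrapping CPU access to a cached (wb) dma-buf follows. It assumes the asm-generic ioctl encoding, and the helper name cpu_access is made up for this example.

```python
import contextlib
import fcntl
import struct

# Flag values from <linux/dma-buf.h>.
DMA_BUF_SYNC_READ = 1 << 0
DMA_BUF_SYNC_WRITE = 2 << 0
DMA_BUF_SYNC_RW = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE
DMA_BUF_SYNC_START = 0 << 2
DMA_BUF_SYNC_END = 1 << 2

# _IOW('b', 0, struct dma_buf_sync), where the struct is a single __u64.
# Assumption: asm-generic ioctl encoding (x86/arm-style).
DMA_BUF_IOCTL_SYNC = (1 << 30) | (8 << 16) | (ord("b") << 8) | 0


@contextlib.contextmanager
def cpu_access(dmabuf_fd, flags=DMA_BUF_SYNC_RW):
    """Bracket CPU reads/writes of a mapped dma-buf with sync ioctls."""
    start = struct.pack("=Q", DMA_BUF_SYNC_START | flags)
    end = struct.pack("=Q", DMA_BUF_SYNC_END | flags)
    fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, start)
    try:
        yield
    finally:
        fcntl.ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, end)
```

Usage would look like `with cpu_access(fd, DMA_BUF_SYNC_READ): data = mapping[:]`, where `fd` and `mapping` are a dma-buf fd and its mmap obtained elsewhere.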
19:42sima: emersion, ah I guess then you could try to fix the mesa drivers as they go boom
19:42emersion: yeah
19:42sima: shouldn't be needed for more than glReadPixels and similar things
19:43emersion: yup
19:43sima: might even be faster for that since you get full caching
19:43emersion: so tl;dr is that the existing CMA DMA-heap should give us WC and we don't need another one?
19:44emersion: hm i guess i'll just try and see with that existing heap
21:47javierm: emersion: don't know about the CMA dma-buf heap, but the system dma-buf heap is cached. There were some patches to add a WC variant: https://patchwork.kernel.org/project/linux-media/patch/20201029001624.17513-8-john.stultz@linaro.org/
21:48javierm: so a cma_wc makes sense if the current one is wb
22:51sivileri_: Hello folks, my name is Sil. I've been working on video features for frontend/va and the gallium d3d12 driver
22:51sivileri_: We're considering adding a new video frontend, a video encoder component that implements the Windows MFT (media foundation transform) interface on top of the gallium video pipe interface.
22:52sivileri_: As part of that, we'd be looking to extend the pipe interface and other video frontends like VA with some new features, e.g. VUI coding, byte-based slice encoding, etc
22:52sivileri_: Anybody have any opinions or thoughts on that? No timeframe for when we'd be looking to upstream, but is that something that upstream folks would like to see?
22:53emersion: hi, are you working at Microsoft with the d3d12 folks? if not, you should probably get in touch with them
22:54jenatali: Yes, Sil's on our team
22:56emersion: is MFT somewhat similar to vaapi?
22:56emersion: have you identified the kind of gallium changes that would be necessary?
22:58sivileri_: Yeah, MFT is like VA (maybe a bit higher level like not exposing reference frames, B frame reordering, etc to the app above)
22:59sivileri_: So far, I identified only extensions necessary to the existing p_video_* files
22:59emersion: missing features?
22:59sivileri_: exposing new H264/HEVC syntax like some missing VUI params, some missing SPS params
23:00sivileri_: exposing slice coding specifying max bytes per slice instead of macroblocks, etc
23:00jkqxz: What is the motivation for making an internal MFT interface rather than making a VAAPI (or Vulkan) MFT? MF is so high level that there isn't really much there.
23:00sivileri_: (which btw would also be supportable by VA frontend)
23:01jkqxz: The header stuff is all just packed headers in VA where you can do whatever you want.
23:02sivileri_: The headers today are coded by the drivers (e.g radeonsi, d3d12) from the pipe structures, and the MF frontend would just consume those instead of re-coding/repacking the headers
23:03sivileri_: The MFT/CodecAPI can sit on top of the pipe_video* interface directly without having to also map MFT/CodecAPI -> Vulkan/VAAPI -> pipe. And for VAAPI for example we'd need to pull in the libva runtime too
23:05sivileri_: MF would control things like the GOP pattern, B frame submission reordering (display order vs encode order)
23:05sivileri_: (MF frontend)
23:06sivileri_: Can also manage things like intra-refresh tracking from -len, period args onto the intra-refresh pipe interface
23:06jkqxz: Urgh. I hadn't realised that Mesa VA packed headers were implemented by parsing the supplied header and then trying to regenerate the things it knows about it on the other side. That's horrifying.
23:09jkqxz: Yeah, standalone MF frontend makes more sense to avoid that brokenness then.
23:11Lynne: also, vulkan doesn't support slice decoding yet
23:12Lynne: while vaapi doesn't really support frame-level decoding (if that matters, the API supports it, but programs like ffmpeg do not, no idea about drivers, but guessing not)