01:16jenatali: karolherbst: should I take your comments as an ack? Guess I need to bug Faith unless there's someone else to review that pass
01:18karolherbst: jenatali: maybe? I'm not that familiar with the pass, all I fixed was some nir validation fail :')
01:20jenatali: Fair enough. I'll ping her tomorrow I guess
08:18tzimmermann: javierm, hi javier. what's your opinion on that lengthy discussion at https://lore.kernel.org/dri-devel/CAAhV-H4hkx0BJ-Y6rNCNgDw_yr4S7uDcTV_249EK-4AbqU5BDg@mail.gmail.com/T/#t ?
08:20tzimmermann: it seems like the aperture code has a bug, so that simpledrm binds after i915
08:26javierm: tzimmermann: I just noticed your email and answered. I need to read that lengthy discussion after having some coffee
08:26tzimmermann: ok :)
08:27tzimmermann: please ping me once you're done. i have a question
08:29javierm: sure
09:41javierm: tzimmermann: Ok, went through that thread now
09:41javierm: tzimmermann: The problem seems to be that when having both simpledrm and a native DRM driver built-in, simpledrm is probed after the native DRM driver removed the conflicting devices?
09:41tzimmermann: yes
09:41tzimmermann: that is my understanding
09:42tzimmermann: but it should not happen, as i outlined in my reply to the patch
09:42javierm: tzimmermann: yes, the aperture infra should call sysfb_disable() from aperture_remove_conflicting_devices()
09:44tzimmermann: and AFAIU the i915 device is the primary vga device
09:44tzimmermann: otherwise it wouldn't be handled by simpledrm at all
09:45tzimmermann: i don't understand what's happening
09:45tzimmermann: javierm, is it possible to probe devices in parallel?
09:46tzimmermann: such that i915 would probe the native device, while simpledrm probes the platform device?
09:48javierm: tzimmermann: I think there's a way to make the probe be async
09:48tzimmermann: javierm, ok i keep that in mind
09:50javierm: tzimmermann: https://elixir.bootlin.com/linux/latest/source/include/linux/device/driver.h#L23
09:51tzimmermann: hyperv_drm seems to use that, but nothing else within drm
09:51tzimmermann: javierm, there's a question/idea i have: the platform device has currectly no conenction to the pci device. maybe we should make the platform device a subdevice of the native one?
09:52tzimmermann: the native device is available from vga_default_device()
09:53tzimmermann: that would allow to model such dependencies
09:53javierm: tzimmermann: yeah, the only connection is the aperture. What I don't understand here is why that is not enough
09:53javierm: do you think is a race then?
09:53tzimmermann: neither do i
09:53tzimmermann: no idea
09:53tzimmermann: i hope they can further debug it
09:54javierm: I think your question is the crux, is sysfb_disable() being called at all?
09:55javierm: that would explain why simplerm is probed at least
09:55javierm: *simpledrm
09:55tzimmermann: and it should unless the i915 device is not the vga default
09:55tzimmermann: javierm, do you know if there's a clear order for probing a parent and its subdevices?
09:55javierm: tzimmermann: I think there is not, unless device links are used
09:55tzimmermann: device links?
09:56javierm: https://www.kernel.org/doc/html/latest/driver-api/device_link.html
09:58tzimmermann: i see. that looks like something different
10:01javierm: so AFAIU that's the only way to give some ordering to the device <-> driver match / driver probe
10:01javierm: otherwise is brute force and that's why we have -EPROBE_DEFERRAL
10:03tzimmermann: javierm, i think it solves a different problem
10:12javierm: tzimmermann: so what Huacai's patch did is to call sysfb_init() in the subsys init call to register the "simple-framebuffer" device as early as possible
10:13javierm: tzimmermann: then the i915 pci device is registered by ACPI, that matches the i915 driver and is probed
10:13tzimmermann: ok?
10:14javierm: then i915 calls drm_aperture_remove_conflicting_pci_framebuffers() and since is the primary device, it should call calls sysfb_disable()
10:14tzimmermann: exactly
10:15javierm: so either that does not happen (because is not the primary?) or the simpledrm driver is matched in the meantime
10:16javierm: in the window between i915 is probed and the simple-framebuffer device is removed
10:18javierm: tzimmermann: could that be possible? Only if using that .probe_type = PROBE_PREFER_ASYNCHRONOUS but neither simpledrm nor i915 set it?
10:18tzimmermann: idk
10:18javierm: yeah, me neither
10:19tzimmermann: javierm, there's a comment about sysfb running after PCI at https://elixir.bootlin.com/linux/v6.5/source/drivers/firmware/sysfb.c#L130
10:19tzimmermann: can we move the sysfb code to a later point?
10:21tzimmermann: something like: (1) probe pci bus and devices, (2) sysfb, (3) probe pci drivers in modules ?
10:21javierm: tzimmermann: it used to be in device_initcall() before Huacai's patch changed to subsys_initcall() and that seemed reasonable to me
10:21javierm: tzimmermann: but yes, I think you are correct and we could make it sure that happens after all the real devices have been registered by ACPI, OF or whatever
10:22tzimmermann: that would put i915 before sysfb; unless it's a module
10:22javierm: probably device_initcall_sync() or even late_initcall()
10:23javierm: tzimmermann: yes, and sysfb_init() would then be a no-op because will check if is disabled already
10:24tzimmermann: exactly. the sysfb code would return immediately
10:24javierm: tzimmermann: can you mention in the thread and propose a patch? And also update the comment that should not only happen after PCI but after all the devices have been registered
10:25tzimmermann: well, it's just an idea for now. i'd like them to first debug this further
10:25javierm: that makes sense, it will delay a little bit when the simple framebuffer is present but seems the correct thing to do
10:25javierm: tzimmermann: Ok
10:26javierm: tzimmermann: regardless, I think that you are correct that should happen after device_initcall()
10:26tzimmermann: but more generally speaking, we should have a clear idea about the dependencies here
10:26javierm: tzimmermann: ideally all drivers should request their memory regions and this handled at the device core
10:28karolherbst: gfxstrand, jenatali: any opinions on this MR? It adds a couple of workarounds for Intel's DPCPP which arne't too horrible.. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25701
11:20MrCooper: Mesa advertises EGL configs which claim RGB10 is renderable with GLES, but is_format_color_renderable says it's not; which is correct?
11:51javierm: pq, emersion: any thoughts on https://bugzilla.kernel.org/show_bug.cgi?id=218115? The commit mentioned in the report is 01f05940a9a7 ("drm/virtio: Enable fb damage clips property for the primary plane")
11:52javierm: which just makes the FB_DAMAGE_CLIPS property to be exposed by the virtio-gpu driver
11:52emersion: javierm: my first guess would be a bug in the driver's implementation of FB_DAMAGE_CLIPS?
11:53emersion: since i haven't received bug reports for any other driver
11:55javierm: emersion: the implementation is very basic, it just does a drm_atomic_helper_damage_merged() to get a merged damaged rectangle
11:55javierm: but also I didn't notice any flickering with mutter
11:56emersion: does mutter support this prop?
11:56javierm: emersion: it does, yes
11:56emersion: is your mutter version high enough?
11:56sima: javierm, aside, but virtio_gpu_crtc_atomic_flush looks very funny
11:57javierm: emersion: I tested to whatever was latest HEAD at the time the patch was posted
11:57emersion: the impl in wlroots is very basic as well, basically just forwarding damage
11:57sima: it sets output->needs_modeset = true, which virtio_gpu_primary_plane_update looks at
11:57sima: but the helpers call atomic_flush _after_ plane_update
11:57sima: should probably be in atomic_begin, or I'm just extremely dense rn :-)
11:59sima: also not sure whether the early return if there's no damage is entirely correct ...
12:00emersion: i haven't found a way to submit an empty damage region via the uAPI
12:00sima: flip without any damage
12:00emersion: so i just do full damage in that edge case
12:00sima: implies full damage
12:00emersion: empty damage = nothing changed
12:00emersion: full damage = everything changed
12:01sima: not how the internals work, see drm_atomic_helper_damage_iter_next() and iter->full_damage in that code
12:01sima: but yeah aside from the atomic_flush confusion I don't see anything obviously wrong in the virtio damage handling
12:03sima: ah I think I see how the atomic_flush works now, the check does catch the initial modeset for different reasons
12:03sima: and from then on the disable sets the output->needs_modeset for the subsequent enable, where the issue was per 3954ff10e06e4
12:03sima: so it does work
12:04javierm: sima: flip without any damage implies full damage? That's not what all the drivers that use this drm_atomic_helper_damage_merged() helper behave
12:04sima: just rather confusion way to implement this :-)
12:04sima: javierm, you get a full damage rect if there's none from that helper
12:05sima: emersion, I guess we have a bit a perf issue of needless uploads if you just move a plane around :-/
12:05emersion: :sadface:
12:05sima: so we might need a special "I actually mean no damage, trust me" knob somewhere
12:06sima: but not sure how to squeeze that into the current uapi without breaking things anywhere
12:06javierm: sima: yes, but returns valid == false
12:06emersion: is the no damage case this: FB_DAMAGE_CLIPS=0 and same FB_ID as before?
12:06sima: javierm, did you check or just guessing?
12:06javierm: sima: looked at https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_damage_helper.c#L309
12:06emersion: we could add a case for a zero rect
12:06javierm: valid = true is set in the loop
12:07emersion: no super great uAPI, but oh well
12:07emersion: we can't do empty blobs can we?
12:07sima: emersion, I don't think the code treats that different than not setting any damage blob at all :-/
12:07sima: javierm, yeah, but damage_iter gives you one rect for the full damage case
12:07javierm: sima: oh
12:07sima: see the special case handling around iter->full_damage
12:07sima: so it /should/ work
12:07sima: obviously reality begs to differ :-/
12:08javierm: sima: that helper makes the wrong assumption then
12:08sima: javierm, why?
12:09javierm: sima: because then the valid true or false makes no sense, it will always be true
12:09sima: javierm, it's false when the fb is off
12:09javierm: so we could just remove it and make it return void
12:09javierm: sima: right
12:09javierm: sima: nvm then, sorry for the confusion
12:09sima: which you already check for earlier in the virtio code
12:10sima: it's just trying to bullet-proof drivers because too many people test as far as "it boots" and never try to even turn the crtc off ...
12:10javierm: sima: not my implementation btw :) I just enabled the damage clipping since the update plane callback already had the logic
12:10sima: yeah tbh I have no idea why it doesn't work ...
12:11emersion: there is a distinction between "frame damage" and "buffer damage", fwiw
12:11emersion: the uAPI is the "frame damage", ie. since last frame
12:11emersion: it's different from the updated regions of the specific buffer since it was last submitted
12:12sima: uh the uapi is in plane src coordinates and buffer damage
12:13emersion: really?
12:13sima: yeah
12:13sima: well
12:13sima: you need both, so if you use the same buffer twice on 2 planes you need to give it the same clip rects
12:13emersion: https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#damage-tracking-properties
12:13sima: well maybe viewport clipped
12:13emersion: "list of damage rectangles on a plane in framebuffer coordinates"
12:13sima: uh
12:13emersion: "area of plane framebuffer that has changed since last plane update"
12:14emersion: sounds like it's since last frame and in buffer coords
12:14emersion: which is what i would expect
12:15sima: maybe we have a confusion about the differences in semantics
12:16sima: emersion, frame damage in framebuffer cordinates sounds right after a bit of pondering
12:17sima: hm
12:17sima: ok I think this might actually be the bug
12:18sima: like if you page-flip, then userspace sends the damage against the previous frame
12:18sima: which works for manual upload cases
12:18sima: but virtio needs the damage against the previous use of that buffer for uploading that buffer correctly
12:19sima: which then fails
12:19sima: but it does work if you a) do only FB_DAMAGE_CLIPS while keeping the same FB_ID or b) only change FB_ID without any damage rects
12:19javierm: sima: which is the reason why damage clipping with DRM_MODE_PAGE_FLIP_ASYNC doesn't work when using the legacy KMS API right?
12:20sima: javierm, legacy flip is always implied full damage, so that one should work
12:20javierm: sima: I remember we discussed a similar issue before for https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2979
12:20sima: it's atomic flip with damage which fails
12:21sima: javierm, yeah I think we need a new drm_atomic_helper_buffer_damage_merged which handles this case
12:21sima: probably with a defense "upload everything" approach sadly
12:21sima: and then roll that out to all the drivers that do damage handling with buffer uploads
12:21sima: which is a lot of them :-/
12:22sima: oh we have some drivers which do buffer damage with the iter too, so we'd also need a buffer_damage version of that
12:24javierm: sima: yeah, drm_atomic_helper_damage_merged() isn't that smart. jfalempe found that for some drivers with very slow vram copies it could cause perf issues if the damaged rect was almost a full damage
12:24sima: actually no, I think all the damage_iter users are correct as-is
12:24sima: javierm, yeah if you upload to vram then do the damage_iter and it will be correct, since that needs frame damage
12:25sima: but virtio and similar virt drivers need buffer damage
12:25sima: javierm, from a few quick greps I think only virtio and hyperv need to be fixed
12:26pq: sounds good to me - is anyone still confused about what frame damage and buffer damage are?
12:26sima: hm ast cursor maybe also needs buffer_damage
12:26sima: pq, I think I clued up on it
12:26pq: I could point to the EGL ext spec that explains it with ascii art
12:29javierm: pq: which one is that? I would be interested in the ascii art :)
12:29pq: https://registry.khronos.org/EGL/extensions/KHR/EGL_KHR_swap_buffers_with_damage.txt
12:29sima: yeah if we have a good one, putting it into docs as a link would be good
12:30javierm: pq: that's amazing indeed
12:30javierm: we need more ascii art in docs
12:30pq: the same pics seems to be in https://registry.khronos.org/EGL/extensions/KHR/EGL_KHR_partial_update.txt as well
12:31emersion: https://emersion.fr/blog/2019/intro-to-damage-tracking/
12:31javierm: sima: are you going to write that drm_atomic_helper_buffer_damage_merged() helper and change virtgpu and hyperv to use it?
12:31emersion: ^ another resource with pretty pictures
12:31javierm: emersion: that's super. Thanks!
12:31javierm: *superb
12:32emersion: np :)
12:32sima: javierm, probably not ...?
12:33sima: I should get some lunch, got sidetracked ...
12:33pq: much nicer pics indeed!
12:33javierm: sima :) fair, I can give it a shot but unsure if I fully understand what needs to do that helper
12:33sima: javierm, I'm also not sure anymore whether hyperv needs it
12:33sima: but ast cursor probably does need it
12:34jani: agd5f_: hwentlan__: we've got Lyude's review and mripard's ack on merging https://patchwork.freedesktop.org/patch/msgid/20231030155843.2251023-3-imre.deak@intel.com via drm-intel. we've asked for your acks a week ago, as there's also a small amdgpu diff. any objections? we'd like to move on with this
12:34javierm: sima: if FB_ID changed then do a full update ?
12:34sima: ok gm12u320 just freaks me out
12:34sima: has an async worker
12:35sima: refcounts the fb
12:35sima: but the fb_map is in the plane state
12:35sima: probably a use-after-free race :-(
12:35javierm: sima: FB_ID changed and no damage rects I mean
12:36sima: repaper looks like it should use shadow plane helpers, because it just gets it exactly wrong like everyone did before shadow plane helpers got fixed
12:36sima: javierm, FB_ID changed means full damage irrespective of whether you have clips or not
12:37emersion: sima, does it?
12:37emersion: because all compositors will change FB_ID each page-flip with partial damage
12:37emersion: double buffering etc
12:37sima: https://paste.debian.net/hidden/5fbbd523/ <- javierm extremely untested
12:38sima: emersion, yeah drivers that need buffer damage will suffer with this, but it's not worse than before damage clips and we need to fix them first
12:38sima: the overwhelming number of drivers with damage actually want frame damage, so still a decent win
12:39sima: plus the drivers with extremely slow upload _all_ want frame damage
12:39sima: so no regression in the cases where it really matters big time
12:39sima: javierm, so I think virtio and ast cursor needs buffer_damage and repaper is just busted
12:39emersion: ok, just saying that this is effectively the same as not having FB_DAMAGE_CLIPS at all, in 99.99% cases
12:39sima: everyone else looks fine to me
12:40sima: emersion, yeah but I think it only impacts virtio and ast-cursor for the drivers we currently have in upstream
12:40sima: so meh
12:40emersion: i am not aware of any user-space performing front-buffer rendering *and* FB_DAMAGE_CLIPS
12:40sima: plus in theory we could accumulate frame damage and still optimize this
12:41emersion: i'd suggest just not exposing FB_DAMAGE_CLIPS at all instead
12:41sima: emersion, everyone doing DIRTYFB ioctl
12:41emersion: hm
12:41sima: which is a pile of legacy stuff
12:41emersion: i see
12:41emersion: i was considering atomic only
12:41sima: like boot splash, -modesetting, ...
12:41sima: yeah for atomic I guess everyone does flips
12:42javierm: sima: maybe https://paste.centos.org/view/raw/de5b0f7d then ?
12:42sima: emersion, also I think if it hurts we can fix this with minimal history tracking of damage and fb
12:42sima: javierm, yeah, but only for the buffer_damage version of iter_init
12:42sima: if you do this for everyone emersion will cry
12:42emersion: :P
12:42emersion: sima, makes sense to me
12:43pq: wait, using FB_DAMAGE_CLIPS with atomic flips is useless, the drivers see full frame damage anyway?
12:44sima: pq, no
12:44sima: it does work, but it's buggy for drivers who need buffer damage right now
12:44pq: sima, then I totally misunderstood your: "FB_ID changed means full damage irrespective of whether you have clips or not"
12:44sima: which is a very small part of all the drivers that support damage (really just one if you ignore cursor ast)
12:44sima: and the quickest fixes for that one driver is to just go full damage
12:45sima: pq, yeah, but _only_ for drivers that need buffer damage (which is virtio only I think)
12:45emersion: pq, that's just an easy unoptimized thing to do in case the driver wants buffer damage
12:45pq: aah
12:45emersion: yeah i got scared as well lol
12:46sima: javierm, oh correction, vmwgfx uses the iter directly and also needs buffer damage I think
12:46sima: so 2 drivers impacted
12:47javierm: sima: Ok
12:47sima: javierm, and I misread ast-cursor, it should be fine (since the buffer in vram is per-plane, not per-fb)
12:48javierm: I'm going to have lunch now but I will try to reproduce the virtgpu issue on a VM with weston and then give it a try to your suggestion
12:48sima: yeah same
12:48sima: and a walk, it's nice&sunny outside
12:48javierm: sima: enjoy!
12:48emersion: :D
12:51sima: javierm, I guess summarized: if the upload target is per plane or per crtc, then you need frame damage and the current code is ok
12:51sima: if the upload target is per-buffer (like with virtio), then you need buffer damage and we have a bug with the current code
12:51sima: but it's sometimes a bit tricky to tell from reading driver code
13:03pq: that summary sounds good to me
13:04pq: if it's per-buffer, you also need "buffer age" or similar damage accumulation algorithm
13:05pq: soon one might be reinventing Pixman regions
13:06pq: a single bounding rectangle can become very pessimistic fast
13:07pq: but also it's not good to carry myriads of rects in a region, sometimes reducing the number of rects at the cost of enlarging damage is beneficial - that's a problem I haven't seen a solution for yet
13:08pq: *region struct
13:15javierm: sima: awesome. Thanks again
13:26emersion: pq, Pixman region is actually not so great
13:26emersion: mstoeckl had ideas for a better data structure
13:33pq: cool
13:34pq: with union and intersection operations both?
13:35emersion: we were mainly interested in union
13:37pq: I think Weston intersects damage with opaque region etc. to find out the areas to paint with/without blending.
13:38pq: so there might be use for intersection as well, beyond intersecting with a simple rect
13:39pq: emersion, is still axis-aligned pieces, or do you go with arbitrary geometry?
13:41emersion: https://en.wikipedia.org/wiki/Sweep_line_algorithm
13:41emersion: https://gitlab.freedesktop.org/wlroots/wlroots/-/merge_requests/4351#note_2111789
13:58Company: pq: in GTK I have something like 16 rects hardcoded and when the rect gets larger I just union the rects
13:59pq: how do you choose which rects to combine?
14:00emersion: *all* the rects always combined
14:00Company: I think I just combine all of them
14:00pq: ok
14:00emersion: which may not be great if two opposite edges are updated
14:00Company: yeah, but how often does that happen?
14:01Company: that you end up with >16 damage regions on opposite sides
14:01Company: to be fair, GTK's use case is somewhat different, because we diff the render node tree
14:02pq: I was once looking at the damage inside Xwayland with old-school X11 apps...
14:02Company: and when I abort, it's usually half-way through diffing a tree, and then I can bail out and use the bounds of the container node
14:03pq: or maybe it was a proprietary app... anyway, tiny damage rects all over the screen in the hundreds
14:04daniels: pq: I think you're talking about the app which ran fullscreen -> Xwayland -> Weston, shipping using pixman-renderer
14:05Company: pq: https://streamable.com/hw2fob to get an idea how it works in GTK
14:05pq: daniels, yes, I am
14:06daniels: that was pretty pathological in a bunch of ways ...
14:08pq: emersion, I totally agree that trying upstream new stuff into pixman is not a... how to say it nicely... plan I'd choose. :-)
14:08emersion: yeah :o
14:08Company: do you have requirements for float regions?
14:09emersion: pq, if you ever need, feel free to ping me for reviews there
14:09Company: because all our stuff is floats, but we round to ints so we can use pixman's regions
14:10emersion: i don't need floats no
14:10emersion: for damage it's not very useful
14:10emersion: easy enough to floor/ceil
14:11pq: damage can always approximated outwards, opaque regions can always be approximated inwards, and neither in the opposite direction.
14:12pq: float regions are probably not too useful until everyone has a display with pixels smaller than they can see :-)
14:12pq: for compositor purposes
14:14Company: yeah, it's only useful for transforms
14:14Company: and if you have too many of them
14:15Company: but I guess even with fractional scaling it's only the final one - and if the boxes are 1px larger than required, so be it
14:15Company: for opaque regions it's somewhat relevant though
14:16Company: because you want maximized windows to cover their full area
14:16Company: and not round wrong and end up with the bottommost line not covered
14:17pq: Opaque regions can be shrunk freely without causing wrong behaviour. You're right about window regions though.
14:18Company: I meant opaque regions covering the whole window
14:18Company: so you can do direct scanout
14:18pq: right
14:19Company: and with 125% or 175% you have odd prime numbers screwing up everyone's math
14:51xq: heya o/ I'm working on some pretty low level linux program right now, using DRM via ioctls, and i'm searching for a place to drop some questions on that topic. Is this the right place? If so: There's the DRM_PLANE_TYPE_PRIMARY define, and i can query properties, but how do i know which property is the correct one to use?
15:06pq: xq, this is the place. Have you seen https://dri.freedesktop.org/docs/drm/gpu/drm-kms.html#kms-properties etc. yet?
15:08pq: xq, properties are identified by their string names. You look for a specific string name, and you'll find the uint64_t that is used to refer to that property on that specific KMS object.
15:09xq: pq: no, i haven't seen that. googling for the docs is kinda impossible i guess :D
15:09xq: thanks, this is enough info for me to continue <3
15:10pq: xq, maybe consider using libdrm. Things are still very low level even with it.
15:11xq: yeah, i know. but libdrm enforces me to link C :D
15:12pq: Ok. I don't think you'll find many (any?) examples that didn't use it, though.
15:12xq: yeah, i'm just reading libdrm sources for reference right now
15:15xq: so i do a DRM_IOCTL_MODE_GETPROPERTY with drm_mode_get_property, check for the name and if it matches, i found the correct property id, which i then can use always(?) to match to the property without having to query again?
15:17pq: xq, maybe check https://gitlab.freedesktop.org/emersion/drm_info for an example how to read everything for starters?
15:20pq: xq, the property id is (can be) specific to the object id you queried it from, so don't assume the same property id works for all objects that have the same named property.
15:20xq: ok
15:20xq: because i figured that the property often has several objects assigned
15:21pq: an object has several properties, and many objects can have the same property (but different value since it's a different object)
15:23xq: yeah, that's what i figured
15:46MrCooper: pq: hundreds of rects, cute ;) the game in https://gitlab.gnome.org/GNOME/mutter/-/issues/3034 has a SHAPE region with thousands
15:48xq: pq: drivers/gpu/drm/drm_mode_config.c:232 i think i found what i was searching for. so i can basically only rely on the string "type" in userspace for planes
15:52pq: MrCooper, eh heh. Well, I'm happy to not need to deal with SHAPE as one cannot simplify that region much, if the launcher looks like what I can guess it might look like.
15:53pq: xq, correct, that's what I was trying to say. :-)
15:53xq: thanks for the help. now i can finally remove the hardcoded plane ids :D
15:55MrCooper: pq: yeah, I'm afraid the only hope there is that Wine is unnecessarily using SHAPE for translucency
15:55pq: xq, the same name string usage also applies to enum property values, although many older enums may have numbers defined in the headers too. You can always rely on the strings.
15:56emersion: sway doesn't do the SHAPE stuff and it's definitely causing issues
15:56emersion: black regions on some game launchers
15:56pq: I think black regions are an improvement there.
15:57MrCooper: emersion: in this case the main issue seems to be Xwayland itself submitting GPU commands for thousands of rects, some of which are just 1x1
15:59MrCooper: I guess in principle it would be possible to make this work more efficiently in xserver, for someone who enjoys diving into a huge pile of baroque C code
16:05Company: GTK3 had some code to turn bitmaps into regions
16:06Company: because some X API needed a region but we only had a bitmap or so
16:06Company: you could do the same and turn the region into bitmap
16:07MrCooper: I was thinking an alpha mask, but yeah
16:08Company: yeah, just a bunch of memsetting
16:09Company: I'm reminded of roaring bitmaps
16:09Company: https://roaringbitmap.org/
16:11Company: they tile the full range and then every tile can be represented either by an actual bitmap, an EVERYTHING/NOTHING flag, run-length encoding and I think those are all the possibilities
16:11Company: I might be forgetting one
16:11Company: so you could have a region impl that dynamically switches between the options depending on number of rects
16:14emersion: sima: so i got my very first dma heap working for vc4. i'm wondering a bit about permissions though
16:15emersion: if we are to use it from Mesa, we need to be able to open it?
16:15emersion: right now it's only root-readable
16:15emersion: is it ok to make it world-accessible by default?
16:16emersion: or is it not, and we need to come up with something else to pass the heap FD around?
16:18emersion: (funny note, seems like allocating a 4k scanout buffer fails with ENOMEM on rpi4 -- 1080p works fine)
16:19daniels: cma too small?
16:20Company: (noticed yesterday that rpi4 is having issues with ENOMEM if I want to use GL and Vulkan at the same time, but didn't investigate that yet)
16:22emersion: yea probably cma too small
16:23emersion: daniels, btw if you have thoughts wrt the above, lemme know
16:24daniels: hmm yeah, about permissions?
16:25daniels: I'd be tempted to give them all the same perms as rendernodes, on the grounds that you can already allocate infinite memory through those ...
16:25emersion: yea, but not scanout memory
16:25emersion: but yeah
16:26emersion: abusing scanout mem is not so different from abusing regular mem
16:27xq: pq: thanks for the help, i got it running \o/
17:52enunes: emersion: I've been looking into rpi4 stuff as well, in particular what to do with that wl_drm usage in the v3dv wayland wsi. are you looking into something similar at the moment?
17:53enunes: in particular it doesn't seem like we can delete it without breaking direct scanout of clients as of now. I tried to delete it and, well, it broke
17:54enunes: I seem to recall that dma heap was a proposed solution for that
17:56enunes: from what I understand so far and a chat with leandrohrb5 , the dmabuf feedback implementation in wayland wsi doesn't magically solve it by itself either
17:59daniels: enunes: the dmabuf-feedback stuff won't _magically_ fix it, but it does tell you where the compositor's scanout device is, so you can use that to open it and allocate from vc4
18:02enunes: daniels: so that would mean to just replace the wl_drm driver-specific use with another dmabuf feedback implementation right? and other drivers might still need to do the same?
18:04enunes: I'm trying to figure out if there's something we can do in the common layer to avoid that, or special cases like v3dv vc4 might still need their own wayland dmabuf feedback code
18:07emersion: enunes: yes, this is what i'm trying to fix
18:08emersion: i think we could pass a scanout dma-buf heap to the common code and let it do the right thing
18:09emersion: but yeah, the current behavior of always picking scanout memory isn't very nice
18:09daniels: well, all the code to collect all the dmabuf feedback stuff is there in the common code - it looks like all you have to do is take the target_device dev_t from dmabuf-feedback (which we already have in wsi_common_wayland) and pull that into v3dv which currently expects an fd
18:10daniels: bcm isn't special here, it's not the only platform where hitting scanout requires you to allocate from the display device
18:10emersion: daniels, can't allocate scanout if we're not DRM master
18:11emersion: dumb buffers are only for DRM masters
18:12daniels: ... master? that would imply that only the compositor can allocate, so it doesn't really matter what you do in Wayland WSI
18:12emersion: hm, sorry
18:12daniels: auth'ed primary node?
18:12emersion: not master, authenticated primary node
18:12emersion: yeah
18:12daniels: right, I thought that got relaxed a few years back so auth was no longer required
18:13emersion: and that's not the future
18:13emersion: the motivation for not relaxing that is that random clients really shouldn't use dumb buffers
18:13emersion: hence dma heaps
18:13daniels: DRM_IOCTL_DEF(DRM_IOCTL_MODE_DESTROY_DUMB, drm_mode_destroy_dumb_ioctl, 0),
18:13daniels: ^ no AUTH flag there
18:13daniels: (imagine that reads create rather than destroy)
18:14daniels: but yeah, I agree wrt heaps being the actual future
18:14emersion: hm.....
18:14emersion: why the heck does v3d tries to auth then...
18:14enunes: it does allocate a dumb buffer currently
18:15emersion: yeah but it doesn't need the auth stuff
18:15emersion: well, primary nodes may not be accessible at all
18:16emersion: headless, or systemd-less (systemd will facl the primary node when physically logged in)
18:20emersion: hm, it doesn't seem like dumb buffers ever required DRM master…
18:20emersion: (or maybe my git-log fu isn't good enough?)
18:21enunes: so one thing I'm trying to wrap my head around is, if the dmabuf feedback offers formats known to be good for scanout and a request for allocation, couldn't the common code do the allocation through that and let the compositor handle it?
18:22enunes: or does this hit a loop if this would go back to the mesa wsi code?
18:23daniels: enunes: right, it can and does
18:23emersion: but this calls back into the driver for the actual allocation
18:23daniels: yep
18:23daniels: the one thing I can see that's missing is allowing the driver to get the device node for the device the compositor is telling it should allocate for
18:24emersion: that's not a big deal for v3d, because the display driver will always be vc4
18:25emersion: might be for others
18:29enunes: but then it should be able to allocate even in the other device anyway as long as it's linear etc?
18:30emersion: except when running (1) on a compositor without wl_drm (2) headless (3) systemd-less
18:30enunes: I mean, allocate in the gpu device without the trouble to get to the display device
18:30emersion: (because v3d uses dumb buffers and wl_drm auth)
18:31emersion: so, no scanout?
18:31enunes: if it's a format that can be scanned out and with a modifier understood by the display device / no modifiers, should be scanout-able?
18:35daniels: that usually stumbles on either a) the display controller having a stronger pitch alignment constraint which we currently have no way to communicate, or b) the display controller requiring contiguous alloc because it doesn't have S/G IOMMU
18:35daniels: I think vc4 fails both
18:40enunes: ok fair, so if we had ways to communicate all these constraints, I suppose it would not make a difference in this case which device we used to allocate then
18:42enunes: so I guess it is worth it to try to plug the target device coming from dmabuf feedback into the v3dv wsi code, as well as deleting the authentication, and see if we can remove the wl_drm code this way?
18:44daniels: I think that's a great start, yeah
18:44daniels: and given that it's so hardware-specific, you could just hardcode a particular alignment if you're allocating for vc4
19:13i509vcb: Are EGLSync and EGLSyncKHR the same object type? I heard apparently in EGL 1.5 EGLSyncKHR became EGLSync but I see no evidence of such a thing in the EGL specification?
19:13i509vcb: (EGL_ANDROID_native_fence_sync appears to work on plain EGLSync from what I've seen)
19:25emersion: there are aliases yes
19:26emersion: you should see in the Khronos XML
20:42sima: emersion, probably need an allocations service so that a single app doesn't exhaust the heap?
20:42sima: allowing everyone is a bit much yolo perhaps
20:42emersion: hm
20:43emersion: the scope of this work would increase a whole lot
20:43emersion: plus it's already free-for-all atm
20:43emersion: and also an issue for regular memory as said above
20:43emersion: regular GPU memory that is
20:47emersion: if we do want an alloc service, logind/seatd seem like a natural place… or maybe the compositor directly
20:54dj-death: karolherbst: not sure if you've played much with reading structures from global memory in rusticl
20:54dj-death: karolherbst: but looks like like this in OpenCL : VkDrawIndirectCommand param = *(global VkDrawIndirectCommand *)indirect;
20:55dj-death: karolherbst: that will result in a load to temporary variable in NIR
20:55dj-death: karolherbst: and lower_io is lowering all of that to scratch load/store
20:55dj-death: karolherbst: which is ridiculous if you use all those values immediately
20:56dj-death: karolherbst: maybe you have some special passes to avoid that kind of stuff?
21:06dj-death: ah just don't lower_io temp variables
21:06dj-death: and use nir_lower_vars_to_ssa instead
21:17karolherbst: yeah, lower to ssa first
21:22dj-death: somehow not doing anything :(
21:22karolherbst: mhhh... weird...
21:22karolherbst: maybe throw some load propagation at it as well?
21:22karolherbst: but indirects are also kinda... uh... annoying
21:23karolherbst: kinda depends on how the shader looks like before io lowering
21:54jenatali: dj-death: Sounds like you're missing opt_memcpy?
21:58dj-death: jenatali: no, it's there
21:59dj-death: looks like I'm getting into those nir_deref_instr_has_complex_use() cases
21:59jenatali: :(
22:03dj-death: yeah it's casting
22:04dj-death: deref_ptr_as_array too
22:29jenatali: dj-death: Last time I looked, that ptr_as_array and casting was inserted by memcpy lowering
22:29jenatali: Which, you should be able to get a field-by-field deref copy through memcpy optimization first
22:37dj-death: jenatali: thanks, will have another look
22:48DavidHeidelberg: eric_engestrom: as u wrote the priority gitlab-runner wrapper, do you think it would be doable something like "if any other runner won't pick job in 30 seconds, use this runner" wrapper?
22:48DavidHeidelberg:moves to #freedesktop w/ this discussion