04:00 airlied: daniels, MrCooper, anholt : https://gitlab.freedesktop.org/airlied/mesa/-/jobs/2819036
04:00 airlied: trying to rebuild the gl test container, is that just a debian and try against tomorrow problem?
04:01 daniels: yeah ... that's testing for you
04:03 airlied: cool I'll just try again later
06:51 danvet: airlied, kinda neat how drm_fb_helper is the fbdev interface with less suck
06:51 danvet: could also do a defio in there with less suck, there's a fb_mmap hook
07:00 airlied: danvet: yeah the avoid kconfig mess is handy also
07:15 MrCooper: airlied daniels: that doesn't look like a transient issue to me (the testing process should not allow those anyway), looks like we need to get GCC from testing as well
07:18 MrCooper: dinosomething: the scrambled picture with DRI_PRIME=1 is probably due to different pitch alignment requirements between the GPUs
07:21 MrCooper: glisse robclark krh imirkin: FWIW, I recently set up LSP/ccls/company/flycheck in emacs, together they provide a pretty nice IDE experience
07:32 pq: What is the life time of DRM blob IDs and the data they point to? If I read back a blob ID I did not create myself from KMS, when can it become invalid? The moment I drop DRM master?
07:32 danvet: pq, anytime
07:33 danvet: it's kinda awkward
07:33 pq: danvet, so, how is the blind save & restore we talked about supposed to work with blob properties?
07:33 danvet: hm
07:33 danvet: I guess you'd need to save the blob property
07:33 danvet: the trouble is just connector probing
07:33 pq: do I need to read back the data and save that, create a new blob when I want to restore it?
07:33 danvet: e.g. between the time you read the edid blob property the kernel could cycle it
07:34 danvet: and delete and create a new one
07:34 danvet: with the same id
07:34 danvet: so reading blob properties is quite fundamentally racy :-/
07:34 danvet: pq, I think so
07:34 pq: ouch, well, at least EDID is read-only, read/write blobs are more fun
07:34 danvet: pq, edid is writeable by kernel
07:35 pq: yes, but userspace certainly won't try to restore it
07:35 danvet: so not even holding master blocks that
07:35 danvet: hm ... thinking about this, if you're master, you can race-free read the prop
07:35 danvet: since no one can change it
07:35 danvet: which means it'll stick around since in use
07:35 pq: the rule I'm after is: It is not enough to store blob IDs for restoring, you must make a copy of the actual data and create a new blob when restoring.
07:35 danvet: but as soon as you drop master it could be changed
07:36 danvet: +1
07:36 pq: right'o
07:36 danvet: also, you must hold master rights when you do that saving
07:36 danvet: or it'll race
07:36 emersion: but only for blobs you've created right?
07:36 pq: a good point
07:37 emersion: oh, ignore me
07:37 emersion: it is *not* enough
07:37 danvet: pq, btw the race fix for the connector state is that sequence number we've talked about a lot of times, but never yet happened
07:37 shashank_s: danvet: when kernel sees a change in EDID blob, it will also send corresponding hotplug events to the drm-master, wouldn't that be a hint to compositor that something has changed?
07:38 danvet: shashank_s, yeah but how does compositor know it got a consistent snapshot of everything?
07:38 danvet: and not a mix of an old/new monitor
07:38 danvet: (not a real problem in practice, generally)
07:38 shashank_s: if it can track all the events, i guess it should not be a problem
07:39 shashank_s: until its a HDMI analyzer with fake hotplug stuff
07:39 pq: danvet, since reading the blobl data is a separate ioctl, how would connector serial help? What stops the data from changing between reading the serial and blob ID vs. reading the blob data?
07:39 emersion: read serial, read blob, check serial still the same?
07:39 vsyrjala: i forget when we even update the edid prop/etc. at hotplug time, or getconnector?
07:39 pq: emersion, ah
07:40 danvet: pq, retry loop if the serial changed
07:40 pq: right
07:40 danvet: before the compositor takes any actions
07:40 vsyrjala: could just retry loop until no hotplug uevents coming?
07:40 emersion: vsyrjala: sounds racy
07:41 vsyrjala: why?
07:41 danvet: vsyrjala, netlink isn't coherent enough
07:41 shashank_s: pq: we had done this long back for I915, where we were caching the old EDID, re-probing the new edid at resume, compare just the first few bytes, and if found a difference, use the latest one.
07:41 pq: danvet, a different topic: let's say I create a blob. How long id the ID valid?
07:41 emersion: shashank_s: this isn't yet implemented in i915, last time i checked
07:41 danvet: well also udev and stuff
07:42 danvet: pq, until you destroy it or close the drmfd you created it with
07:42 danvet: the fd holds a ref that only that fd can release
07:42 pq: danvet, roger. I wonder if Weston gets that wrong...
07:42 danvet: I'm just realizing how much stuff around uapi we haven't documented :-(
07:42 shashank_s: emersion: Ah, Now when I think again, this was an Android private stuff for VLV I guess, which we could not make mainstream
07:43 shashank_s: let me check if I have a link for the patch somewhere
07:43 emersion: shashank_s: there's a pending patch from… someone, i think
07:43 emersion: (for i915)
07:44 shashank_s: The state machine in kernel was: cache EDID when suspending, suspend, read EDID again on resume, compare few bytes until model number and other some stuff, if changed, send event to compositor, else ignore and live happily
07:45 shashank_s: this was not the HOTPLUG event, but a new MONITOR_CHANGE event
07:45 emersion: https://patchwork.freedesktop.org/series/62816/
07:45 shashank_s: which used to force HWC to drop caching and probe connector again
07:50 shashank_s: emersion: This is a pretty new series, mine was from 2014 I guess, could not find it though
07:51 emersion: ahah, "1 year old", "pretty new" :)
07:51 shashank_s: compare to something from 2014 and Valleyview :)
07:59 pq: ahhah, I think Weston/DRM will leak FB_DAMAGE_CLIPS blobs every update
08:00 emersion: oops
08:02 pq: danvet, when you say "fd holds reference", you mean the file description, not the fd?
08:02 danvet: pq, btw they're refcounted, so you can destroy immediately after setting them
08:02 danvet: fd = file descriptor
08:03 pq: file descriptor != file description
08:03 danvet: well no I have no clue
08:03 danvet: what's the difference?
08:03 emersion: one fd can point to multiple fds right?
08:03 pq: think about dup(), it gives you a new fd pointing to the same file description
08:04 danvet: file description then
08:04 pq: ok, I thought do, but wanted to check :-)
08:04 pq: is there another term for file description? filp maybe?
08:05 pq: danvet, when you say "can destroy after setting", the "setting" means drmAtomicCommit(), not simply drmModeAtomicAddProperty?
08:06 danvet: pq, yup
08:06 pq: thanks
08:12 pq: danvet, sooo... if userspace goes wildly creating and leaking DRM blobs, what happens?
08:12 pq: I mean, what does the OOM look like in that case? Does something get killed?
08:13 pq: or is there some separate limit on much blobs one DRM client can create?
08:20 pq: ok, I think Weston gets the mode blob lifetime wrong. It doesn't throw away the blob IDs when VT-switching away which closes the DRM fd.
08:21 pq: or does it close it...
08:23 pq: aah, it does *not* close it, so no problem
08:29 pq: TIL several things, and it's only lunch time :-)
08:42 daniels: pq: ouch, that's a really good point about FB_DAMAGE_CLIPS :\
08:42 pq: dinosomething, "direct rendering" is not "direct display". Direct rendering means the app sends drawinf commands to a GPU without the commands going through a dispaly server. Direct rendering says nothing about display though, displaying stuff still usually goes through the display server, whether the display server copies or not.
08:46 pq: daniels, a Weston issue filed, as you maybe noticed.
09:00 danvet: pq, I think you can limit it with the kernel memory cgroup controller
09:00 danvet: aside from that, no limits
09:00 danvet: it's just allocating pinned kernel memory until OOM
09:01 emersion: eh, just found we were leaking MODE_ID blobs in gamescope
09:02 pq: emersion, doesn't sound too bad, unless you create a new blob every frame? :-)
09:02 emersion: yes, of course, one on every commit!
09:02 pq: like weston does for FB_DAMAGE_CLIPS
09:02 pq: ouch
09:03 pq: danvet, pinned? So it cannot go swap? Sounds... quite terrible.
09:03 pq: danvet, is it at least accounted to the process, so OMM killer has a chance to kill the right thing?
09:03 pq: *OOM
09:04 pq: or is just a mysterious black hole where memory goes to die?
09:06 daniels: luckily this only affects vmware I guess
09:10 danvet: pq, ofc not
09:10 danvet: :-/
09:11 danvet: it's the same for gpu memory in general
09:11 danvet: although on the big gpu drivers that's at least swapable
09:11 danvet: daniels, lots of dumb scanout displays have manual upload and dirty
09:11 danvet: noralf's refactoring made it almost trivial to set this all up
09:12 danvet: hm, maybe the atomic property isn't exposed automatically, that would be a bit a bug
09:16 emersion: which atomic prop?
09:16 pq: danvet, sooo... are we looking at a kernel regression if some driver exposes FB_DAMAGE_CLIPS and Weston DoS's the system because of its own bug?
09:22 daniels: I would say that's very definitely our bug
09:24 pq: daniels, we can try saying that but "I upgraded my kernel and now my system crashed" makes it a kernel regression.
09:25 daniels: well, in a sense yes, but then every buggy user of any buggy interface means that we have to burn the interface and never use it again
09:25 pq: and I'm afraid that's exactly how it's going to work out if anyone complains to the kernel, and it will force the driver to not expose DB_DAMAGE_CLIPS and add a DRM client cap "I am not broken" before it can be re-introduced.
09:26 pq: daniels, well, that's how the kernel works.
09:26 daniels: not quite. we can't say it's an interface we've relied on, because it's not an interface we've ever used correctly
09:27 daniels: it's not like semantics have changed from under us
09:27 pq: it doesn't matter
09:27 pq: "I upgraded my kernel and now my system crashes" always means it's a kernel bug and needs to be fixed in the kernel. This is what I have been taught, and I very much disagree, but I don't make the rules.
09:28 pq: all we can do is fix it quickly in Weston and hope that no-one files a kernel bug like that
09:36 emersion: idea: add "please do not report kernel bugs to the kernel" to weston's contributing.md
09:37 emersion: :P
09:44 Zeising: Is it possible to get a new release of libdrm?
09:51 bnieuwenhuizen: pq: one of the reasons I'm more and more in favor of *both* sides sending what version of the API they are made for to the other side. So that we don't have to reinvent a new bit all the time
10:01 emersion: but that means a client can't "skip" some features
10:01 emersion: maybe that's fine
10:17 pq: bnieuwenhuizen, if we take this FB_DAMAGE_CLIPS issues, how would the mutual version numbers help there?
10:18 pq: emersion, you have the exact same problem with gamescope it sounds
10:21 emersion: yes, except i've already pushed the fix
10:21 emersion: and gamescope isn't widely used yet
10:22 danvet: bnieuwenhuizen, this wont solve bugs in one side of the uapi
10:22 danvet: e.g. the totally broken atomic use modesetting in 1.20 had (or still has)
10:22 danvet: that's an explicit "pls enable atomic for me, I can do it" flag
10:22 danvet: except userspace had serious illusions about what it can do
10:32 bnieuwenhuizen: danvet: sure, but at least you can avoid regressing it
10:34 daniels: narmstrong: hey, how is your LAVA lab looking?
10:34 pq: bnieuwenhuizen, do you mean that every time any driver adds support for an existing KMS property, some version needs to be bumped everuwhere to get it used?
10:34 bnieuwenhuizen: oh hmm, nvm, the more apt thing here is the app telling the app name + version, not the API name and servsion
10:34 bnieuwenhuizen: and version*
10:35 bnieuwenhuizen: pq: ^ should anwer that probably
10:35 pq: bnieuwenhuizen, so people would need a kernel upgrade every time a new program should start using a new property it didn't use before, and do this per-driver?
10:36 pq: tbh, that sounds a worse nightmare than we live in now
10:37 pq: maintaining application name and version lists separately for every property in every driver
10:37 bnieuwenhuizen: pq: why would they, it is so you can detect broken programs in the kernel, not a whitelist?
10:38 pq: oh you meant a blacklist
10:38 bnieuwenhuizen: yeah, whitelist is indeed pretty insane :P
10:39 pq: client caps are the whitelist we use now
10:54 danvet: bnieuwenhuizen, we have the exe name already
10:54 danvet: we can use that to blacklist
10:55 pq: good luck blaklisting a python app
10:55 danvet: and then when the app/compositor/whatever is fixed, we can bump the client cap revision
10:55 danvet: pq, might need to walk the open file list or whatever :-)
10:55 pq: ugk
10:55 danvet: I mean I hope it doesn't happen too often
10:56 danvet: essentially kernel only needs to care when a kernel change makes the userspace breakage worse in a specific way
10:56 danvet: e.g. -modesetting couldn't drive dual-output at all
10:56 danvet: but when we changed fast-boot to leave the crtc that the bios wanted even when it was not crtc 0
10:57 danvet: then it couldn't even light up at boot-up with a single panel (since 2 crtcs were involved and -modesetting got that wrong with atomic)
10:57 danvet: so I'm hopefully this wont happen too often
10:57 pq: danvet, how's this Weston vs. FB_DAMAGE_CLIPS issue, where does it fall?
10:57 danvet: kernel doesn't have to care if userspace leaks like mad
10:57 danvet: or corruption happens, or crashes, or black screen
10:58 danvet: only if a kernel change made this worse somehow in a way that users notice and file a regression bug report
10:58 pq: well, in this case Weston does not leak at all, until the driver exposes FB_DAMAGE_CLIPS, then it starts leaking like mad.
11:03 danvet: pq, hm yeah I guess if that happens and a user reports it
11:03 danvet: we'd need to take FB_DAMAGE_CLIPS away from weston
11:03 danvet: until weston upgrades the atomic client cap rev
11:03 pq: alright
11:03 danvet: but if you apply the fix to stable releases
11:04 danvet: then generally that's quick enough for these reports to not show up
11:04 pq: mmhm
11:04 danvet: the problem with -modesetting was that it was fixed in the master branch for a while
11:04 danvet: but xserver doesn't see releases, so all the distros were stuck on broken versions
11:04 pq: I guess this is the first big reason to start doing stable releases
11:05 pq: stable patch fix++ releases that is
12:10 danvet: karolherbst, does mmiotracer also catch kernel mmio?
12:10 danvet: if yes should probably disable it at runtime if the lockdown stuff is enabled
12:15 karolherbst: danvet: mainly
12:15 karolherbst: we use it to reverse engineer nvidias kernel driver :p
12:15 karolherbst: danvet: I think it already gets disabled
12:15 danvet: karolherbst, yeah I think then dmitry has a point if that's not yet taken care of
12:15 karolherbst: I remember checking it and decided there is nothing to do
12:15 danvet: karolherbst, lockdown patches are super new
12:15 karolherbst: yeah. I know
12:15 danvet: well, decade old actually, but super new in upstream
12:16 karolherbst: I could have checked older versions
12:16 karolherbst: but I think I was annoyed that it got disabled, but understood why
12:16 karolherbst: but yeah.. might be worth checking it again
12:20 narmstrong: daniels: better, we finished putting ups on all workers, it should be ready soonish
12:21 karolherbst: danvet: seems like the entire tracing infra is disabled on lockdown
12:22 danvet: karolherbst, yeah that gets the job done
12:23 karolherbst: maybe I verify it if I get to it at some point, but it seems like it's already taken care of
12:26 daniels: narmstrong: cool! :)
12:27 danvet: bnieuwenhuizen, btw amd modifiers moving now?
12:34 Zeising: eric_engestrom: ping
12:35 MrCooper: daniels: FWIW, AMD dGPUs can actually only scan out from physically contiguous memory, i.e. in practice only from VRAM, not from system memory
12:36 eric_engestrom: Zeising: hey! sorry, I just realized I saw your text when I was busy a few weeks ago and forgot about it :facepalm:
12:37 eric_engestrom: Zeising: I'm not in my computer right now, but I'll make a libdrm release when I get home tonight
12:42 bnieuwenhuizen: MrCooper: amdgpu tends to hide it pretty well by automatically migrating your buffers to VRAM even if you asked for GTT (only if allocated through amdgpu though, will reject the buffer if allocated by another driver)
12:43 MrCooper: right, and this is in the context of sharing with another driver :)
12:45 bnieuwenhuizen: danvet: we'll see, hope to have a new version with Marek's latest comments out tonight, but compression version is still pretty dicey
12:45 MrCooper: in theory scanning out from another device's VRAM might work with PCIe P2P, but it might be a bad idea anyway
12:47 Zeising: eric_engestrom: Sounds good to me, thanks!
12:47 Zeising: eric_engestrom: Hope everything is OK with you!
12:50 mareko: bnieuwenhuizen: what is dicey about compression?
12:51 mareko: bnieuwenhuizen: btw I'd like to remove family_id and chip_rev and use a 0-based chip enum, e.g. 0=vega10, 1=vega20, etc.
12:55 mareko: bnieuwenhuizen: for modifiers I mean
12:57 pq: looks like kernel patchwork didn't understand my v2 patch is a v2 and not a new patch
12:59 bnieuwenhuizen: mareko: is chip_class precise enough for DCC version?
13:01 bnieuwenhuizen: mareko: for non-DCC, going for family (gfx9/gfx10) + rb_count + pipe_count + interleave_bytes (which is 256 anyway ...) might be more general
13:04 bnieuwenhuizen: .. or we could make that chip_class as well
13:04 mareko: chip_class isn't enough
13:05 mareko: there is DCC const encoding
13:05 bnieuwenhuizen: mareko: oh I'm confused between chip_class and family
13:05 bnieuwenhuizen: is family enough?"
13:05 mareko: yes
13:05 mareko: but radeon_family is not public ABI
13:07 bnieuwenhuizen: mareko: probably worth making the new "family" equivalent part of one of the amdgpu query structs?
13:07 mareko: bnieuwenhuizen: I'm impartial on that, I think it's enough to be part of the modifier header
13:11 mareko: bnieuwenhuizen: I might have to ping closed source people about this
13:14 bnieuwenhuizen: mareko: wrt DCC is there anything besides gfx9 vs gfx9 and constant encode?
13:14 bnieuwenhuizen: gfx9 vs. gfx10?
13:18 mareko: bnieuwenhuizen: no; the gfx10 differences are covered by is_dcc_supported_by_CB and is_dcc_supported_by_L2
13:18 bnieuwenhuizen: mareko: I'm worried that if we using "family", that for new chips we need to know rb_count and pipe_count to determine equivalence (e.g. to see that vega10 and vega20 are compatible), and amdgpu only returns those for the current GPU
13:19 bnieuwenhuizen: since even for amdgpu these are now in firmware instead of source, I wonder if giving indications of rb_count/pipe_count for equivalence might result in publishing those in the source
13:20 bnieuwenhuizen: while if we encode pipe_count/rb_count in the modifier we don't need to know what they are for other GPUs
13:20 mareko: bnieuwenhuizen: it's up to you, I'm impartial on this too
13:21 mareko: bnieuwenhuizen: though new chips might add more complexity, so using the family to identify tiling and DCC would be simpler
13:23 bbrezillon: any drm_mm/mm expert here? I'm trying to understand why page_mapcount() returns 1 even after calling drm_vma_node_unmap() (I checked the mapcount value before the drm_vma_node_unmap() call and it was already 1). Note that the GEM is still mapped in userspace, but I thought calling drm_vma_node_unmap() would undo that.
13:23 bnieuwenhuizen: mareko: yeah, in the end I believe we need some version field to change things if HW changes, the tradeoff is mostly about how often we increment it vs. changing explicit params
13:24 mareko: bnieuwenhuizen: we wouldn't like to put the GPU config into the code though
13:28 Yuti: I have a question regarding DRM framework. How do we retrieve drm_atomic_state form drm_bridge structure?
13:28 Yuti: We want to get the bridge state from drm_bridge or drm_connector as other structure are not available
13:30 mareko: bnieuwenhuizen: I think the gb_addr_config params don't matter for non-X non-T swizzle modes if DCC is disabled, so those can be used for multi-GPU sharing and therefore the family doesn't matter
13:31 bnieuwenhuizen: correct
13:31 mareko: bnieuwenhuizen: another gfx9 and gfx10 difference is that swizzle modes changed, e.g. R rotated->render target and some other small changes to S and D
13:32 bnieuwenhuizen: oh, S and D change too? I'll need to check that, otherwise compat might be a bit more of a mess
13:33 mareko: bnieuwenhuizen: for compat, it would be easier if drivers and DAL selected 1 or 2 swizzle modes from other chips that they can support (instead of listing all non-X/T ones)
13:34 bnieuwenhuizen: mareko: wrt microtile, Vega10 IIRC needs D, while the APUs need S a lot of the time, so you'd need both
13:34 bnieuwenhuizen: for display*
13:35 mareko: bnieuwenhuizen: e.g. DAL would list all modes for the chip it's part of, and then 1 mode for other chips it can support (to decrease the number of exposed swizzle modes across all chips)
13:35 bnieuwenhuizen: in my previous prototype I limit macrotile block to 64K though not sure if limiting those really adds anything for non-X/T
13:36 bnieuwenhuizen: oh that makes sense since DAL tends to be more flexible wrt chip params
13:37 bnieuwenhuizen: then again, most chip-params don't matter for non-X/T, I think we only need the chip_class
13:37 mripard: Yuti: as I said on #armlinux, you usually will get it as parameters from the atomic callbacks
13:38 mripard: what issue are you facing so that it's not enough?
13:38 mareko: bnieuwenhuizen: I'm worried that trying to unify swizzle modes across all GPUs will cause more pain in the future
13:39 bnieuwenhuizen: mareko: I'm keeping the modifier "id" (for now is DRM_FORMAT_MOD_AMD_GFX9 / DRM_FORMAT_MOD_AMD_GFX9_DCC / DRM_FORMAT_MOD_AMD_GFX9_DCN)
13:40 bnieuwenhuizen: if a new generation diverges too much we can just add new types
13:41 mareko: bnieuwenhuizen: I don't know what that means
13:42 bnieuwenhuizen: mareko: say GFX11 is completely different from GFX10. We can then just add DRM_FORMAT_MOD_AMD_GFX11 / DRM_FORMAT_MOD_AMD_GFX11_DCC / DRM_FORMAT_MOD_AMD_GFX11_DCN with completely different parameters
13:42 Yuti: I have subclassed bridge state and want to access this bridge state in interrupt handler. As per my understanding, I can get this subclassed bridge state using drm_bridge_state. But in interrupt handler I have only drm_bridge and drm_connector structure. And drm_bridge doesnt have pointer to drm_bridge_state.
13:43 mareko: bnieuwenhuizen: ok
13:43 bnieuwenhuizen: we might have to tie some compatible cases together, but it would be less effort than the family thing where we have to do that for every chip
13:44 bnieuwenhuizen: and I'd rather just have the code abstraction to make those new modifiers easy
13:46 mareko: bnieuwenhuizen: how do you wanna handle DCC image stores on gfx9?
13:47 mareko: gfx10 can do them, but not gfx9
13:49 bnieuwenhuizen: mareko: each time you get ownsership of the buffer it may be compressed, it is up to the driver to decompress before doing image stores (and yes with implicit sync this probably means we have to do the decompress on the first storage use in every command buffer)
13:50 bnieuwenhuizen: the only thing I can see for changing things around is having a "compressed" bool in memory and then making the decompress a conditional draw, but you still have to emit the commandstream for the decompress
13:51 mareko: on gfx10, you have 2 options: auto-decompress on image_store or recompress on image_store
13:51 mareko: for Z/S, there is only auto-decompress on image_store
13:52 bnieuwenhuizen: I thought Z/S didn't really support DCC on gfx10?
13:52 mareko: bnieuwenhuizen: HTILE
13:52 bnieuwenhuizen: ah, not relevant for modifiers
13:53 bnieuwenhuizen: 2d,single-layer, single level, singe sample, renderable, 2d-images only
13:53 mareko: , rgba
13:53 bnieuwenhuizen: well, there is multiplane yuv too
13:54 mareko: which is rgba in the driver
13:55 bnieuwenhuizen: if we want to really permanently disable DCC with image stores on gfx9 we'll have to use something like a conditional register modification on CB register emitting using the bool but that seems hairy
16:07 himanshujha19964: pinchartl, regarding your're reply on https://lkml.org/lkml/2020/5/23/381, what do you mean by "top-level driver" ? Do you mean that the bridge driver(for eg `drivers/gpu/drm/bridge/ti-sn65dsi86.c`) should create encoder/connector ?
16:11 jenatali: jekstrand: Which addr_formats do you think I should be asserting on? For kernels there's 4 address modes that all need to be 1-component and proper bit size, but for shaders it's not clear to me what needs to be plain pointers and what can be special
16:12 pepp: daniels: thanks for the detailed anwers on 5096, that's really helpful
16:13 mareko: daniels: FYI, I'm not working on modifiers besides consulting at the moment, bnieuwenhuizen is doing the work
16:13 jekstrand: jenatali: For shaders, only temp_addr_format
16:13 jekstrand: For kernels, everything
16:13 jekstrand: Well, everything that kernels ever use
16:14 jekstrand: Which I think is just global, shared, and temp
16:14 jenatali: Right, global/temp/shared, and ubo if you don't set the ubo_as_global flag
16:14 jekstrand: Right. UBO too
16:20 daniels: pepp: np, thankyou for your patience! I'm really happy seeing the protected-surface EGL work, it's a great start
16:20 daniels: mareko: sure :) whatever gets it forward is good, but apologies for misattribution
16:29 pinchartl: himanshujha19964: by top-level driver, I mean the driver that implements the drm_driver structure, the one that registers a drm_device
16:29 pinchartl: the driver posted in that link is for the AXI4-to-DSI bridge
16:29 pinchartl: so it should be a drm_bridge driver
16:30 pinchartl: at the hardware level it could be connected to a DMA engine, or to a more complex source (I think there's a mixer IP core for instance if I'm not mistaken)
16:30 pinchartl: so we need a top-level driver that will look at the device tree to locate the IP cores for the bridges, and bind them together
16:40 mareko: bnieuwenhuizen: if we wanted to add modifiers for 1D/1DArray/2DArray/3D, MSAA, or Z/S in the future, would it be possible?
16:40 jekstrand: Possible? Sure. Anything is possible!
16:45 anholt_: depth/layer count would be an out of band parameter just like w/h/pitch, right?
16:45 anholt_: assuming that, seems perfectly reasonable.
16:50 bnieuwenhuizen: mareko: yeah
16:52 danvet: mareko, what's the use-case for sharing 3d images?
16:53 danvet: not against adding modifiers for them, it's just that thus far none of the specs anywhere (kms, egl, vk, ...) have that as an option
16:53 bnieuwenhuizen: danvet: vk can do it
16:53 danvet: 1d I guess we could cover with pitch set to 0
16:53 danvet: bnieuwenhuizen, oh
16:53 danvet: why?
16:53 himanshujha19964: pinchartl: ummm...would it be similar to `drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c` ? I misunderstood your comment as "don't create separate encoder/connector and expose them via `drm_bridge_funcs`
16:54 bnieuwenhuizen: danvet: vk can do do it on any image assuming the driver says "yes" in the query
16:54 danvet: bnieuwenhuizen, so you have a layer_pitch?
16:54 danvet: or how does that work ...
16:54 bnieuwenhuizen: danvet: why do you need layer_pitch?
16:54 danvet: if we add 3d modifiers should make sure we use at least the same lingo
16:55 danvet: well how do you go from one layer to the next
16:55 bnieuwenhuizen: you only need that if an app needs to override it
16:55 danvet: or was it slice
16:55 pinchartl: himanshujha19964: the Xilinx dsi encoder driver would be similar to that, yes
16:55 danvet: well with modifiers and explicit import/export we can supply all that
16:55 danvet: it's w/h/pitch for 2d buffers
16:57 bnieuwenhuizen: danvet: I think when you get into arrays/mipmaps, pitch/slice_pitch becomes a very messy concept
16:57 anholt_: danvet: a given modifier already implies a lot about internal layout details for a given chip, I don't see why you would need customizable layer strides. we need pitch for 2d because linear 2d is shared between chips, but even when we get to tiling, you sometimes end up moving the thing-like-pitch out of drm's pitch field and into stuff implied by the modifier.
16:58 danvet: anholt_, hm yeah I guess if the modifier has fully implied layer pitch it all works out
16:58 danvet: probably should then put that into the modifier spec in drm_fourcc.h
16:59 danvet: it's the stuff that might be overlooked when it changes between generations
16:59 danvet: mareko, ^^
16:59 bnieuwenhuizen: danvet: that why as part of my modifier work I've written tests for the modifiers I add
16:59 danvet: bnieuwenhuizen, excellent
16:59 danvet: we have a few very simply igt tests for our compressed formats too
17:00 bnieuwenhuizen: though it is kinda hard to test that the kernel agrees on stuff like implied pitch
17:00 danvet: but only solid color stuff, since anything beyond that makes the hw folks freak out
17:00 danvet: bnieuwenhuizen, crc based tests?
17:00 bnieuwenhuizen: since if I just read back the kms settings I get the original data and not how the kernel interprets it
17:00 danvet: usually what we do for testing kms stuff
17:00 danvet: render untiled, capture reference crc
17:00 danvet: then do all the fancy stuff
17:00 bnieuwenhuizen: hmm, I'm not familiar with crc based tests
17:00 danvet: repeat across all resolutions
17:01 danvet: pray there's not too many gaps
17:01 daniels: bnieuwenhuizen: you've got a lot of spare time, right? :)
17:01 danvet: amdgpu.ko unfortunately doesn't support it
17:01 danvet: but Lyude is enabling it for nouveau at least
17:01 danvet: bnieuwenhuizen, tldr is we have some debugfs files that let you capture a stream of crc from the display as it scans out stuff
17:02 bnieuwenhuizen: hmm, that needs HW support right?
17:02 danvet: usual test approach is render reference frame with sw into untiled buffer
17:02 danvet: then do all the fancy stuff with other fourcc, color matrix, gamma ramp and check it still matches
17:02 danvet: bnieuwenhuizen, yup
17:02 danvet: but if your hw doesn't have a crc, the hw engineers can't validate anything either
17:02 danvet: so generally it's there
17:03 danvet: it's also generally dirt cheap to implement in terms of gate count, so only the tiniest soc skip it sometimes
17:03 bnieuwenhuizen: hwentlan: do you know how ^ works for AMD?
17:04 bnieuwenhuizen: danvet: sounds like lower priority than just getting modifiers out, but definitely something to put on the todo list ...
17:04 hwentlan: our IGT tests don't yet have the capability to created tiled or compressed buffers
17:04 hwentlan: but kms_color tests work and test against CRC
17:05 hwentlan: the problem with tiling and compression is that this is done in HW and we currently need Mesa to create such buffers for us
17:05 hwentlan: i'd love to IGT test it though
17:05 bnieuwenhuizen: can IGT not rely on mesa or is it a lack of being able to force the tiling mode in mesa?
17:05 emersion: bnieuwenhuizen: there's also the fancy way of doing it: instead of having a checksum, use a writeback connector
17:06 hwentlan: our newer HW technically has writeback but it's not used in production... might be an option
17:07 danvet: bnieuwenhuizen, oh amdgpu actually has crc support
17:07 hwentlan: the other option is to use Mesa to create reference tiled/compressed frames and then use those in IGT, rather than have IGT create them
17:07 emersion: writeback allows to dump bad frames and to make color conversion tests/alpha tests less strict
17:07 himanshujha19964: pinchartl: so, would be it be that there would be a bridge driver for xilinx and another top-level driver creating encoding/connector part or it would expose encoder/connector funcs via `drm_bridge_funcs` while implementing the drm_bridge in the same driver(as done in synopsys) ?
17:08 danvet: bnieuwenhuizen, merged in 4.17 first version
17:08 danvet:totally behind the times
17:08 danvet: hwentlan, apologies :-)
17:08 bnieuwenhuizen: danvet: also the part that your HW folks probably freak out about is exactly what I want tested: making sure we don't accidently miss changes in the compression format between gens and mark them as compatible
17:08 danvet: bnieuwenhuizen, yeah
17:08 hwentlan: danvet: no worries, things are evolving constantly, but this has been there for a while :)
17:09 danvet: I was pondering whether we need some kms/gbm piglits for this
17:09 danvet: go through all the surface formats mesa has, render them, display them, grab crc
17:09 danvet: yell if mismatch
17:10 danvet: bnieuwenhuizen, the debugfs interface is cross-driver, so this might be pretty useful for a bunch of them
17:10 pinchartl: himanshujha19964: there should be two drivers, one for dsi-tx that implements a drm_bridge, and one top-level driver that isn't specific to dsi-tx and uses the bridge
17:10 danvet: bnieuwenhuizen, could also be used to test vk native display stuff
17:11 danvet: or direct display or whatever the vr display extension is called
17:11 pinchartl: the top-level driver could be the driver for the mixer IP core for instance, or a generic Xilinx driver that parses the pipeline description from DT and connects the bridges and other components together
17:14 hwentlan: danvet: i like the idea of CRC checking all surface formats
17:14 hwentlan: i thought igt guys didn't like the idea of pulling in mesa, though
17:14 danvet: hm I guess that doesn't check what bnieuwenhuizen wants to check, since most likely everything works since kms and vk are the same gpu
17:14 danvet: so no catching when the compression format changes in a subtle way
17:14 danvet: hwentlan, I think piglit would be better place for that
17:15 danvet: probably good enough if you just check primary plane and ignore everything else kms has to offer
17:15 danvet: so not much need for the big library
17:15 danvet: "big"
17:16 bnieuwenhuizen: well, formats + modifiers + planes still would sound interesting to make sure planes are set up correctly
17:16 hwentlan: good point, we've been really focussed on IGT in the display team since we deal with KMS but framebuffer formats have a lot of dependency with Mesa. need to look at piglit myself one of these days
17:16 himanshujha19964: pinchartl: so that dsi-tx drm bridge driver would be generic which could be used for in future for similar SoCs having such bridge and separating the SoC specific stuff in a separate top-level driver ?
17:16 danvet: bnieuwenhuizen, yeah ...
17:17 bnieuwenhuizen: like what modifiers does the cursor really support
17:17 mareko: danvet: I think bnieuwenhuizen is adding a mesa test for modifiers
17:17 danvet: hwentlan, I guess maybe we could also use the piglit to capture reference frames at certain sizes
17:17 danvet: and then have a pile of of them gzipped in igt
17:17 danvet: that would make sure they still work the same
17:17 mareko: danvet: but that's a compile-time test for addrlib vs modifiers consistency, so nevermind
17:17 danvet: bnieuwenhuizen, if it's just a cursor and not full blown plane, probably not much
17:18 hwentlan: danvet: that sounds like the simplest idea and probably wouldn't be hard to prototype right now
17:18 danvet: but yeah some asymetric hw has maybe some special YUV compression mode only on overlay planes
17:18 danvet: mareko, yeah the one we discuss is more end2end testing across kms/mesa
17:18 mareko: danvet: about pitch and layer_size, those are derived from width/height, not independent
17:19 danvet: hwentlan, I still think the piglit kms+gbm one would be useful
17:19 danvet: mareko, ok, then I guess we just need to put that into modifier spec, and everyone needs to check the pitch is as expected
17:19 hwentlan: just need to remember that displayable formats sometimes differ from non-displayable formats, i.e. the sets of displayable and non-displayable formats are intersecting sets but neither is fully contained in the other set
17:19 danvet: even if it's totally fake (iirc afbc does that with the pitch)
17:20 danvet: hwentlan, oh sure, it'll need to be a tiny compositor with all the usual code
17:20 hwentlan: makes sense
17:20 danvet: once vk direct display becomes modifier-aware, that code will exist in mesa I think
17:20 mareko: danvet: for 2D images, the pitch is independent on some chips and derived from width by the hw on others
17:21 bnieuwenhuizen: mareko: my plan: for non-compressed on GFX9: anything goes as long as it is aligned to tile size (or 256 bytes for linear), for GFX10 or compressed, do an exact check
17:22 bnieuwenhuizen: (and probably should figure out a max pitch ...)
17:23 mareko: bnieuwenhuizen:. Navi1x derives the pitch from width, so the user pitch is always ignored
17:24 bnieuwenhuizen: mareko: hence the check for exact match of the pitch
17:24 bnieuwenhuizen: under the assumption that it is better to reject than silently ignore/corrupt
17:25 pinchartl: himanshujha19964: correct. the same DSI-TX IP core would work with that driver, regardless of whether it's connected to a mixer IP core, directly to a DMA engine, or part of a more complex pipeline
17:25 pinchartl: that's the whole idea behind the drm_bridge abstraction, making it reusable in different designs
17:26 mareko: bnieuwenhuizen: sounds great
17:27 danvet: bnieuwenhuizen, definitely check, just in case these additional constraints change between chips supporting the same modifiers
17:27 danvet: kinda like max w/h and all that
17:28 danvet: most likely case for this is optimus laptops, where the apu might be different generation than the discrete one
17:28 mareko: danvet: is that still a thing?
17:29 danvet: mareko, optimus laptops?
17:29 mareko: danvet: yes
17:29 danvet: very
17:29 danvet: niche market ofc
17:29 danvet: no idea how much the apu+amd discrete is one, was just an example
17:43 mareko: optimus was probably an attempt of GPU vendors to get into the laptop market that was dominated by Intel CPUs. The market is very different now.
17:52 bnieuwenhuizen: mareko: I think it is still pretty common with gaming laptops?
17:53 jekstrand: I don't know that it's as common with AMD GPUs anymore but there are still a lot of Intel+Nvidia setups floating around.
17:54 Plagman: some of the brand new renoir notebooks have amd apu + discrete
17:54 jekstrand: A lot of it AFAICT is about power. Intel GPUs are pretty conservative on power and if you've got an Nvidia that's big enough for playing actual games, it's not.
17:54 Plagman: i'm not sure if what you call 'optimus' is somehow different than the general multi-gpu render offload case but i wouldn't call it niche, it's a bunch of notebooks out there
17:54 Plagman: and it doesn't seem like it's fading away
17:55 jekstrand: Not at all. If anything, I'd say it's been pretty well cemented.
17:57 jekstrand: And maybe one day we'll see Intel+Intel. :)
17:57 jekstrand: One day.....
18:25 himanshujha19964: pinchartl: Assuming `drm_of_find_panel_or_bridge` finds a panel. Is there any way to implement '.get_modes' feature in drm_bridge driver(eg. synopsys bridge driver) ? I don't see any such feature in `drm_bridge_funcs`.
18:27 danvet: himanshujha19964, there's a panel bridge helper which should give you a fully functional bridge for any panel
18:27 danvet: I think includes get_modes
18:28 danvet: well so this is all changing, with connectors being pulled out of individual bridges into the overall pipeline
18:34 jenatali: jekstrand: I'm almost done with my saga of emulating pointers for DXIL, I just need to deal with function_temp (CL "private") variables. kusma pointed me to nir_lower_vars_to_scratch, which is close to what I want, but not quite, since it doesn't calculate addresses for plain derefs, only for loads/stores, and only when indirectly accessed. I'm debating between: Modifying it/adding options to do what I want; writing my own version of it; or using
18:34 jenatali: lower_explicit_io on function_temp or scratch
18:35 jenatali: The last one feels hacky, but I also feel like doing anything else will just be reinventing the wheel of what it already does
18:35 jenatali: Thoughts?
18:37 jenatali: karolherbst probably has opinions here too
18:39 karolherbst: jenatali: you have indirectly accessed arrays in HLSL too, no?
18:39 jenatali: Yeah
18:39 jenatali: The problem is just turning derefs into offsets
18:40 karolherbst: well using explicit_io helps with that, no?
18:40 jenatali: Yeah, but explicit_io doesn't currently support function_temp or scratch var modes
18:40 karolherbst: ohh.. because you can also just use lower_io for that I think
18:40 jenatali: Oh, haven't looked at plain lower_io
18:41 karolherbst: I have a "NIR_PASS_V(nir, nir_lower_io, nir_var_all, type_size, (nir_lower_io_options)0)" inside nouveau even with clover
18:41 karolherbst: as clover already calls explicit_io
18:41 karolherbst: and up until now it didn't cause issues
18:41 karolherbst: function_temp is really just internal stuff
18:41 karolherbst: doesn't need ptr sizes or so
18:41 karolherbst: or.. well.. not really
18:41 himanshujha19964: danvet: I see `drm_panel_bridge_add` https://www.kernel.org/doc/html/latest/gpu/drm-kms-helpers.html#c.drm_panel_bridge_add, but where would I place that `.get_mode` in that bridge driver. Would it need to create a separate connector and place it in `drm_connector_helper_funcs` ?
18:41 karolherbst: I guess it does... but uff
18:41 jenatali: You can store a __private void* in a global struct though
18:41 karolherbst: yeah...
18:42 karolherbst: but then you are in undefined behaviour world anyway, no?
18:42 karolherbst: although.. maybe it makes sense for shared mem?
18:42 karolherbst: but yeah..
18:42 jenatali: The value doesn't have to be defined, but you could store and then load it
18:42 karolherbst: why would you
18:42 karolherbst: ahh, sure
18:42 jenatali: Reading it on the host obviously doesn't have any meaning
18:42 karolherbst: yeah.. maybe we need to teach explicit_io about it as well
18:42 karolherbst: wouldn't surprise me
18:43 karolherbst: function_temp is.. weird anyway
18:43 jenatali: Yep
18:43 karolherbst: best to never use it
18:43 karolherbst: nv hw doesn't support it anyway
18:43 karolherbst: :D
18:43 danvet: himanshujha19964, I'm confused ... have you tested this and it doesn't work?
18:43 danvet: it's all there already
18:44 danvet: you don't have to do anything
18:44 jenatali: Hah, good to know
18:44 karolherbst: jenatali: yeah.. in nouveau we spill to vram on indirectly accessed arrays...
18:44 karolherbst: nvidia does something similiar
18:44 danvet: himanshujha19964, if you read the implementation, it comes with get_modes and everything else already
18:44 karolherbst: no indirect register accesses
18:44 karolherbst: so...
18:44 jenatali: So it looks like the scratch intrinsics would do pretty much exactly what I want, so I think I'd teach lower_explicit_io about scratch mem mode
18:44 karolherbst: okay, cool :)
18:45 jenatali: The question is, should I manually convert mem modes to scratch beforehand, or should I have lower_explicit_io do it for me?
18:45 jenatali: function_temp -> scratch that is
18:45 karolherbst: mhhh, good question..
18:45 jenatali: Probably cleaner to do it manually, though having variables with the scratch mode (even temporarily) would probably also be weird
18:46 karolherbst: right now in nouveau I rely on variables, but maybe it makes sense to have somethign else as well
18:46 karolherbst: dunno
18:46 karolherbst: function variables might make the most sense for now?
18:46 karolherbst: dunno really
18:46 jenatali: Yeah, not sure
18:46 karolherbst: with spirv you also get those lifetime ops...
18:47 karolherbst: and right now we ignore it
18:47 jenatali: Heh, was just looking up what was done with those
18:47 jenatali: I don't think that's super interesting for us
18:48 karolherbst: well.. it kind of is for us
18:48 karolherbst: could limit the size of the space needed for spilling
18:48 himanshujha19964: danvet: I haven't tested this, nor do I have the hardware right now. I'm just trying to understand the bridge framework :)
18:48 jenatali: Hm, true. We could limit the size of the alloca we emit
18:49 jenatali: Let variables occupy the same space in that memory
18:49 karolherbst: jep
18:49 jenatali: That's a worthy optimization but a bit out of my wheelhouse :)
18:49 karolherbst: :)
18:49 himanshujha19964: danvet: could you please point me that implementation that covers get_modes ?
18:50 himanshujha19964: I see "Creates a drm_bridge and drm_connector that just calls the appropriate functions from drm_panel"
18:50 danvet: panel_bridge_get_modes
18:50 danvet: well that kerneldoc is somewhat outdated
18:50 danvet: there's now some higher level thing on top of bridges that creates the connector
18:50 danvet: pinchartl, sravn maybe something to update?
18:52 jenatali: Alright, I'm going to teach lower_explicit_io about scratch mem modes, and manually update the variables to scratch before running it
18:52 jenatali: karolherbst: Thanks for the discussion
18:53 himanshujha19964: danvet: yeah, couldn't find it latets docs. But yeah, it's out there there, thanks!
18:53 danvet: himanshujha19964, well it's open source, so can always read the code :-)
18:53 danvet: himanshujha19964, also patches to improve the docs very much appreciated
18:54 danvet: we don't have proper doc writers, it's all done by developers who created it, and they're often simply too close and stuff gets overlooked
18:54 pinchartl: himanshujha19964: if drm_of_find_panel_or_bridge() finds a panel, I recommend wrapping it in a brige with drm_panel_bridge_add(), so you always have a drm_bridge
18:54 jenatali: Oh, nvm, there is no variable mode for scratch. function_temp it is
18:54 danvet: pinchartl, I kinda wonder whether we should change that function to just always give you a bridge
18:55 pinchartl: in the future I plan to make bridge creation for the drm_panel automatic
18:55 danvet: and never directly a panel
18:55 pinchartl: danvet: you've read my mind :-)
18:55 danvet: pinchartl, awesome
18:55 pinchartl: I was thinking of first converting direct drm_panel users to drm_panel_bridge_add(), but maybe handling it in drm_of_find_panel_or_bridge() is better
18:56 pinchartl: I'll have a look, but now I'm busy with the conversion to DRM_BRIDGE_ATTACH_NO_CONNECTOR
19:08 himanshujha19964: danvet: Ok, so it was introduced recently in v5.7(2be68b5), and I was at v5.5 :-/
19:13 danvet: himanshujha19964, well this is graphics, always linux-next or you're on outdated code :-)
19:28 jekstrand: jenatali: None of those options sound entirely insane.
19:31 jekstrand: jenatali: lower_scratch is pretty simple. Duplicating it wouldn't be the end of the world either.
19:31 jekstrand: jenatali: That said.... Given that you want variable pointers, lower_explicit_io is likely what you'll need.
19:31 jenatali: Yeah, but lower_explicit_io is pretty complex, and I'd have to duplicate quite a bit of that into it :)
19:32 jekstrand: jenatali: Yeah, I kind-of think teaching nir_lower_io about scratch is the way to go
19:32 jenatali: Great
19:32 airlied: Zeising: I might do one today
19:32 jenatali: I've got that implemented already and it's not complex at all
19:33 jekstrand: cool
19:33 jekstrand: jenatali: As long as you run dead_vars and some other opts ahead of it, it should only end up lowering variables that actually need to be on the stack for the sake of pointers.
19:34 jenatali: Yeah, exactly
19:37 jekstrand: kusma: Have you thought about plugging clover into Zink?
19:37 jekstrand: kusma: With all this work going into Clover over D3D12, it might not be too bad.
19:37 agd5f_: mareko, pretty much the only way to get a dGPU in a notebook is hybrid graphics these days
19:37 agd5f_: pretty common
19:37 airlied: unless eric_engestrom gets to it first
19:37 agd5f_: still a fair amount of gfx8 as well
19:38 agd5f_: how do modifiers handle alignment requirements? E.g., some GPUs may require 64K for linear others maybe 256K, etc.
19:39 kusma: jekstrand: I have thought about it, but not done anything serious with it. I think I even mentioned it in the initial presentation.
19:39 kusma: jekstrand: But yeah, it would be an interesting thing to try out.
19:39 emersion: agd5f_: they don't handle alignment
19:40 kusma: It would be a nice exercise to generalize some of the DXIL passes, perhaps :)
19:40 emersion: alignment needs to be handled via other means, currently non-existing
19:41 agd5f_: so they are generally useless :p
19:41 emersion: modifiers?
19:41 emersion: they are about image layout only
19:41 emersion: but yeah they are not enough
19:42 emersion: i've started accumulating some notes to have a Better Solution, but i feel like i don't know enough to start suggesting something
19:42 anholt_: emersion: you're aware of the allocator work, right?
19:42 emersion: ofc
19:42 anholt_: ok.
19:43 emersion: would be nice to talk (again) about this next XDC
19:44 emersion: i should ask james to see if he has some plans
19:46 bnieuwenhuizen: agd5f_: I think one of the advantages of modifiers even with a single GPU are the negotiation (on AMD mainly about whether you can use DCC etc.)
19:48 emersion: yeah, it's not possible to use radv to import/export dmabufs right now
19:48 emersion: only modifiers can make this possible
19:58 agd5f_: emersion, why not?
19:59 emersion: agd5f_: is the question "why aren't modifiers encoding alignment requirements too?"?
20:00 agd5f_: why can't you use dma-buf without modifiers?
20:00 bnieuwenhuizen: agd5f_: mostly some stupid Vulkan API stuff ...
20:00 Lyude: is there any way to run depmod on a kernel you've just built, _without_ installing it somewhere first? it seems like plain old depmod doesn't even work for this
20:01 emersion: agd5f_: ah, because when importing a dma-buf, radv (or any other consumer) will need to guess the image layout
20:01 bnieuwenhuizen: where nobody wants to make a standard out of the old and underspecified thing because modifiers are around the corner
20:01 agd5f_: ok
20:01 emersion: agd5f_: that _can_ work in some cases, but if the producer has allocated with USE_SCANOUT, it won't
20:02 emersion: agd5f_: because USE_SCANOUT changes the image layout
20:02 emersion: i believe there were also other things, like DCC
20:25 mareko: emersion: alignment is part of the layout already, you can't have tiling without alignment
20:25 emersion: well, everybody assumes <sensible alignment> i guess?
20:27 emersion: what do you mean by "can't have tiling without alignment"?
20:27 emersion: i thought "image layout" meant "how pixel data is layed out in the buffer"
20:27 mareko: a buffer offset has to be aligned to the largest tile
20:27 emersion: buffer offsets aren't part of the modifiers, they are extra metadata
20:27 emersion: err
20:28 emersion: plane offsets aren't part of the modifiers, they are extra metadata
20:28 mareko: true, but there is also buffer alignment that was specified at allocation time; if a buffer with a wrong alignment is imported, the driver might have to reject the import
20:29 mareko: i.e. a modifier has to describe everything that is not implied by width,height,format
20:30 emersion: hmm, but modifiers typically don't describe this
20:30 emersion: they also don't describe requirements like contiguous memory etc
20:30 mareko: on AMD they will
20:31 emersion: that sounds like a bad idea, or so i've been told
20:31 emersion: daniels: ^
20:31 emersion: danvet not here unfortunately
20:32 anholt_:is in agreement with mareko that modifier should imply alignment for amd's tiling modifiers.
20:33 mareko: that said, we can impose any requirements on modifiers that we want, such as proper buffer alignment, and there is nothing you can say against it, because the alternative would mean that everything is broken
20:34 mareko: there is already a requirement on an exact pitch, i.e. you can't set your own pitch
20:34 airlied: eric_engestrom: releasing libdrm now
20:35 daniels: yes, tiling has to imply some minimum alignment, because otherwise the maths doesn't work, and that's fine ... but it's specifying alignment beyond that which doesn't work, because then your buffer is described by multiple unique modifiers
20:36 mareko: agd5f_: do we have 64K alignment for linear anywhere?
20:37 emersion: what i've understood is that having two modifiers that are the same except for alignment is a bad idea
20:39 bnieuwenhuizen: creation a couple of modifier with the same layout modulo alignment isn't the end of the world no?
20:40 mareko: it doesn't matter, apps will use the intersection of modifiers supported by all components they talk to, apps don't care about tiling or alignment
20:41 eric_engestrom: airlied: thanks! I got home a few minutes ago and I was going to do it now, but since I've never done a libdrm release before it's easier/quicker if you do it :)
20:42 airlied: eric_engestrom: it's been a long time since I've done one, had to drag back some old memories and then work out the updated process
20:42 airlied: well it's done now, so hopefully I got it right
20:44 agd5f_: mareko, IIRC, a lot of r6xx-NI class hw
20:44 eric_engestrom: airlied: awesome, thanks!
20:45 agd5f_: emersion, moreoever, the tiling alignments are dependent on the GPU memory config in a number of cases (e.g., number of pipes, memory channels, etc.)
20:49 eric_engestrom: Zeising: ^
20:51 mareko: agd5f_: yeah the current plan is to support modifiers on gfx9 and later
20:52 airlied: uggh building gl cts with asan is going to be a bad life choice I suspect
20:53 airlied: karolherbst: did you ever do that? ^
20:53 mareko: airlied: there are worse life choices :)
20:53 karolherbst: airlied: uhmmm... why would you wanna do that?
20:54 karolherbst: planning on debugging a gl cts bug or mesa?
21:00 mannerov: Anyone knows why hugetlb is not used more ? I figured out on my system HugePages_Total was 0, so I guess it must be default for a lot of people too, and thus apps are not using that much
21:01 mannerov: Experimenting with that and storing gallium nine textures in memfd, hugetbl is 10 times faster than normal mmap or allocation
21:01 airlied: karolherbst: cts crashes after a few days randomly, trying to work out if ith as bugs or mesa has :-P
21:01 karolherbst: airlied: ehhh..
21:01 karolherbst: or master
21:02 airlied: karolherbst: master
21:02 karolherbst: ahh
21:02 karolherbst: yeah.. well
21:02 karolherbst: have fun I guess?
21:02 karolherbst: actually.. I might try to do the same as I still have this unexplainable issue in nouveau
21:09 bnieuwenhuizen: airlied: after a few days? what are you doing to that poor thing?
21:13 karolherbst: my guess? llvmpipe :p
21:15 mareko: you can run glcts via piglit where it runs everything in parallel or even 1 test per process
21:15 airlied: bnieuwenhuizen: yeah full GL CTS runs for submission do about 90 passes
21:16 airlied: with llvmpipe that takes a long time
21:16 airlied:doesn't get crashes on a straight single pass run
21:16 karolherbst: airlied: can you actually use master for submissions?
21:16 karolherbst: imirkin was convinced it has to be
21:16 karolherbst: or well..
21:16 karolherbst: any released version
21:17 airlied: karolherbst: yeah I switched to master until I get it fixed
21:17 anholt_: airlied: parallel-deqp-runner might help, in that it would leave you a short caselist of the set of tests where a fail happened. (if you can reproduce there)
21:17 airlied: then I'll go back once I have a working baseline
21:17 karolherbst: ahhh
21:17 airlied: anholt_: nope can't reproduce on anything smaller than the full cts-runner submission
21:17 karolherbst:having the same issue with a different issue
21:17 bnieuwenhuizen: airlied: and here I'm happy that VK CTS allows you to shard 8x now :P
21:19 airlied:just ran deqp gles 31 on virgl on llvmpipe with a debug llvm build, > 1 day as well :-P
21:20 bnieuwenhuizen: airlied: debug llvm though ...
21:22 airlied: yeah I deserve that one :-P
21:23 HdkR: But would a dual 64core epyc system make it complete faster? :P
21:24 bnieuwenhuizen: HdkR: I'm guessing not as he probably runs it single threaded
21:25 bnieuwenhuizen: oh nvm, llvmpipe has internal multithreading, though I'm curious how expensive that part is in CTS
21:25 airlied: yeah single threaded :-p
21:25 airlied: yeah its mostly compilerand cta
21:25 HdkR: wow
21:25 airlied: cts
21:26 bnieuwenhuizen: In Vulkan it is mostly the same. Compiler and CTS reading back results from the GPU
21:36 imirkin: karolherbst: my understanding was that you had to run stuff on that "cts" branch
21:36 imirkin: or whatever it was called
21:36 karolherbst: I want to use a branch where we don't hit our stupid bug :D
21:37 imirkin: just make one where all the tests return success :)
21:37 hopetech: airlied: As I don't see you on the virglrenderer channel: do you mind if we drop the release script from virgl?
21:37 hopetech: See https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/398
21:45 airlied: hopetech: ugh irc keep losing it on me
21:45 airlied: hopetech: does gitlab produce stable tarballs? i.e. exact same tarball everytime you generate it
21:51 hopetech: airlied: good question. But Gert seems happy with it
21:56 airlied: hopetech: sone distros care about it
21:56 airlied: im not sure i do that much
22:02 hopetech: I will investigate tomorrow then.
22:02 hopetech: Keep you posted
22:10 antognolli: so... it seems that dri2_create_image() has an "use" parameter where __DRI_IMAGE_USE_SCANOUT can be passed, but dri2_create_image_with_modifiers() doesn't. Which means that PIPE_BIND_SCANOUT might not be set for scanout buffers when we have modifiers... is that right?
22:15 bnieuwenhuizen: airlied: yep
22:16 bnieuwenhuizen: er, antognolli ^
22:21 antognolli: bnieuwenhuizen: oh, so that's expected? Then how is one supposed to check if a buffer could be potentially used for scanout in a gallium driver?
22:26 emersion: antognolli, bnieuwenhuizen: see discussion in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197
22:26 antognolli: emersion: lol, that's what I was looking for, thanks :)
22:30 bnieuwenhuizen: emersion: yes, fully aware :P
22:31 Plagman: emersion: bnieuwenhuizen: discussion above makes sense, yeah my general understanding is we'd be stuck with hacks until 'modifiers land'
22:31 Plagman: question that remains unclear is how can we help get closer to that mystical situation where 'modifiers have landed'
22:47 mareko: airlied: I use release llvm with assertions enabled but no debug symbols, and I've been pretty happy with that
22:54 jekstrand: @jekstrand no my patch wants to be allowed to remove things with nir_var_mem_ubo so long as the layout is set to packed, this patch conflicts logically with my patch.
23:11 mattst88: airlied: FWIW, I found and fixed a ~14 year old bug in git that puts the wrong checksum in the tarball created by git archive
23:11 mattst88: but the patch was rejected because it would have changed the generated tarball (duh)
23:12 mattst88: their assumption was that software like cgit/github/gitlab used git archive and that distros wanted byte-for-byte identical tarballs
23:13 airlied: mattst88: yeah they try to keep it stable, I know github didn't for ages though
23:13 mattst88: ouch
23:16 airlied: https://github.com/keybase/client/issues/10800
23:16 gitbot: keybase issue 10800 in client "Github generated tarballs change randomly" [Linux, Closed]
23:16 airlied: timestamps end up in them
23:17 airlied: maybe gitlab does it right
23:23 mattst88: strange, wonder why I wasn't aware of that in Gentoo