00:02airlied: jekstrand: hehe yeah compute workloads are the future :-P
00:03jekstrand: airlied: Just implement rasterization in a compute shader. Done.
00:03jenatali: I mean, UE5's already there
00:05airlied: jekstrand: isn't that just llvmpipe in SPIR-V? :-P
00:05airlied: seems like a long weekend could knock it over :-P
00:06alyssa: jekstrand: Larrabee? is that you?
00:06imirkin: radeonsi already has compute-based T&L
00:06imirkin: rast is next!
00:06mareko: it has only compute-based primitive culling
00:07imirkin: mareko: how does that work btw? do you essentially TF it all out into a buffer, then compute, then feed it through a no-op vertex pipeline?
00:10mareko: imirkin: it launches async compute for the draw and generates a new index buffer without culled primitives, then the draw is executed as normal
00:10imirkin: mareko: right, but the input into the culling presumably has to be the output of the regular vertex pipeline?
00:11mareko: the input of the regular vertex pipeline is the input to the compute shader
00:11imirkin: so then you must execute the regular vertex pipeline in the compute shader, no? (to get the final positions)
00:11mareko: the compute shader is generated from the vertex shader to compute the position exactly like the VS would, so it has all VS UBOs, VBOs, samplers, etc.
00:12imirkin: right ok
00:12imirkin: and presumably none of this works if you have tess or gs?
00:12imirkin: cool, makes sense
00:14anholt_: like a binning shader for a single bin!
00:15airlied: oh man enabling gl4.0 on llvmpipe really upsets the CI piglit results
00:19krh: SPIR-V coming to eBPF soon!
00:20airlied: krh: WHLSL :-P
00:20airlied: or whatever the I can't believe it's not SPIR-V for the web is
00:23krh: ah, we can always use another shading language
00:26mareko: HLSL is the only true god!
01:32DrNick: itym DXIL
02:19thaytan: airlied, I sent that Rift S non-desktop quirk patch to dri-devel, but I can't tell if anyone merged it or not.
02:31airlied: thaytan: I might just merge it to my fixes now
02:34thaytan: airlied, ta!
03:30airlied: bnieuwenhuizen, agd5f_ : amd next just added a new mem sync uapi for amdvlk, is it useful for radv as well?
03:30airlied: mareko: or for radeon?
05:22mareko: airlied: does radv invalidate caches at the beginning of IBs?
05:22mareko: airlied: especially L2
05:26airlied: mareko: don't think we do
05:26airlied: but I'm a bit rusty in the details
05:28mareko: airlied: what is preamble CS?
05:28mareko: airlied: if you don't invalidate caches at the beginning of IBs, you should set MEM_SYNC; otherwise, ignore
05:29airlied: preamble_cs just inits the clear state and rings/scratch stuff
05:29airlied: oh then it does do an invalidate
06:14imirkin: would one expect gitlab-ci to match "src/gallium/drivers/freedreno/**/*" to "src/gallium/drivers/freedreno/freedreno_blitter.c" ?
06:14imirkin: (it doesn't seem to)
06:19imirkin: oh hm. looks like i need to trigger x86_cross_arm_test now too. maybe that was it.
07:12hanetzer: hey peeps. So, I recently acquired a zisworks monitor kit which has multiple modes. its 'best' mode is 4k120hz, which is fed over two displayport cables to the gpu and uses the EDID TILE property to glue them together. I can't quite seem to make this work properly.
07:13HdkR: hanetzer: With what GPU family?
07:13hanetzer: amdgpu, rx580
07:13hanetzer: xrandr --props shows the tile info.
07:14HdkR: Hm, I had my 8k60 tile display at least /semi/ working with my RX590
07:15hanetzer: display env?
07:15HdkR: i3wm and setting everything manually with xrandr
07:15hanetzer: I mean, 1080p240hz is pretty sweet in its own right.
07:16hanetzer: gdm/etc or raw startx?
07:16HdkR: eh, whatever Ubuntu ships with these days. Not sure if that is gdm or lightdm
07:16hanetzer: but something other than tty+startx, gotcha.
07:17hanetzer: care to share the sauce?
07:18HdkR: I just did xrandr --output <output> --mode <mode> for each of the tile modes and display connections
07:18hanetzer: ever touch setmonitor in this?
07:21airlied: setmonitor helps for xinerama followers
07:21hanetzer: ah. so that's xinerama stuff then?
07:21airlied: i3 might use xinerama hints
07:22hanetzer: HdkR: the two tiles, are they 'joined'?
07:22HdkR: Yea, one is tile 0, the other is tile 1
07:22hanetzer: but a full window spans both?
07:23hanetzer: care to share the output of xrandr -q, xrandr --listmonitors, and xrandr --prop ?
07:23HdkR: I don't have it hooked up anymore to my Linux machine :P
08:08MrCooper: imirkin: if the rules didn't match, the job wouldn't exist at all in the pipeline
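MrCooper's point — a `rules:changes` that doesn't match makes the job absent from the pipeline entirely, rather than present-but-skipped — can be illustrated with a hedged gitlab-ci fragment (the job name and script are hypothetical; the glob is the one imirkin quotes):

```yaml
# Hypothetical job: if no 'changes' glob matches the push, this job
# does not appear in the pipeline at all, so a missing job usually
# means the glob matched nothing for that commit range.
freedreno-test:
  stage: test
  rules:
    - changes:
        - src/gallium/drivers/freedreno/**/*
      when: on_success
  script:
    - ./run-freedreno-tests.sh
```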
08:08MrCooper: hanetzer: which Xorg driver are you using?
08:09MrCooper: which version?
08:10MrCooper: k, that should do its part of the TILE property handling
08:10hanetzer: there's a possibility that my cables aren't good enough, I believe. any way to check?
08:12HdkR: Make sure they are rated for the full 32.4gbit/s and then hope they didn't lie :P
08:12hanetzer: fair enough.
08:13MrCooper: airlied, danvet: FWIW, fglrx used to have the same KMD<->UMD version interlock as other proprietary blobs, but that's no longer the case with amdgpu; maybe this can serve as precedent for not accepting the former upstream
08:14MrCooper: (otherwise it might prompt some people at AMD to wonder why they bothered)
08:15MrCooper: (people outside of the open source graphics driver teams I mean)
08:16hanetzer: HdkR: don't suppose you have any recommended ones?
08:16haasn: I remember reading something to the extent of "opengl drivers need to maintain an extra copy of the frontbuffer to allow users to read back from the default framebuffer", specifically in the context of vulkan no longer including this ability to access images after they've been submitted to the swapchain
08:16HdkR: https://www.amazon.com/gp/product/B07J2PGGFX These ones are the ones that I have and they work with my 8k panel
08:17haasn: the specifics I remember reading were something along the lines of "you can just maintain your own frontbuffer VkImage and blit from it when presenting, that's what the opengl driver has to do internally anyway to permit this functionality"
08:17haasn: Is this true?
08:19MrCooper: OpenGL drivers only have to jump through hoops if the app actually uses GL_FRONT
08:19hanetzer: HdkR: sold.
08:20danvet: MrCooper, oh yeah we can't have that I think
08:20danvet: I mean it's already pain when you try to build containers
08:20danvet: I think the wsl2 solution is essentially that they inject the right userspace somehow
08:20danvet: and upgrade in lockstep
08:21danvet: in upstream we just can't do that, no way
08:24MrCooper: haasn: but it works slightly differently, grep for 'fake_front' in mesa/src/loader/loader_dri3_helper.c
08:25haasn: MrCooper: that's exactly the code I was looking for, thanks!
08:27wm4: so all that recent media buzz about dx/wsl, it's still unclear whether you can use the existing d3d implementations that are either in mesa or floating around somewhere in normal userspace programs
08:27wm4: is there a definitive answer?
08:30daniels: yes, there is
08:31daniels: /dev/dxg is a direct translation of the existing Windows DX kernel APIs, and accordingly libdxcore.so + libdirectx.so are implementations of the same existing Windows DirectX DLLs
08:31daniels: WSL will bind-mount in the proprietary .sos as well as the proprietary vendor drivers (command-stream assembler, shader compiler, etc)
08:32daniels: clients write to the same API so essentially 'just' need to be recompiled
08:32daniels: in that sense, the existing implementations (Mesa and others) will work, at least for compute and other headless things
08:32daniels: what's currently missing is any implementation of DXGI, since Windows's window system and Linux's are ... not quite the same
08:32headless:wakes up :-)
08:34daniels: https://lists.freedesktop.org/archives/dri-devel/2020-May/266691.html is the most authoritative answer you'll find on their plans, which is that they're currently running Weston with the RDP backend in userspace, but doing that for hardware-accelerated clients will involve figuring out how to bridge the underlying DX model for buffer sharing with how Linux window systems work
08:35wm4: I was asking more about native reimplementations of d3d that have done before, e.g. apparently https://github.com/mesa3d/mesa/tree/master/src/gallium/frontends/nine
08:35wm4: this is a great mystery to me
08:36daniels: oh, I see. no.
08:37daniels: well, I suppose you could use the Nine frontend (providing DX9 API to clients) with the D3D12 backend (providing acceleration through a DirectX12 implementation), but that would be pretty perverse tbh
08:37daniels: you could use the /dev/dxg kernel API to implement an open driver on, e.g. port Intel's ANV Vulkan driver to use /dev/dxg rather than DRM
08:37daniels: but I don't think the driver teams are quite that bored at the moment
08:38wm4: yeah, /dev/dxg on its own doesn't seem very attractive, since you'd have to implement both sides anyway (though on wsl you could possibly test with the closed half in each case)
08:49MrCooper: daniels: the Windows drivers would have to stop assuming KMD<->UMD version interlock for that to be feasible?
08:50MrCooper: or do you mean shipping ANV as part of the Intel Windows driver package?
08:50daniels: MrCooper: well yeah, you'd have to figure out an ANV -> D3D12KMT ABI
09:03hanetzer: only thing that really sucks about this monitor setup is I have to dismantle it to swap cables (the ports are internal)
10:06HdkR: https://imgur.com/vUnHnAP Huh. Ice Lake capping out at 400Mhz now rather than the typical 100Mhz. I wonder why it chooses different maxes
10:06HdkR: Does it hit a thermal cap /sometime/ when the laptop is running and then that becomes the new max speed when throttling happens?
10:07ickle: you need to also look at the pkg power
10:08daniels: and you need to look at what your vendor has enabled for adaptive PM
10:09HdkR: er, I should say this is an ongoing issue. The laptop isn't currently hitting package power limits or thermal limits. Just after some amount of uptime, it starts refusing to go above a clock speed
10:09HdkR: Ice Lake woes
10:10HdkR: I just started glxgears to see what clock speed the GPU was hitting since chrome was feeling sluggish
10:16HdkR: Although I haven't hit the GPU hang in a few days. Mostly because I've been restarting the laptop daily to get around this clock speed bug :/
12:33danvet: emersion, pq does either of you have drm-misc commit rights?
12:33danvet: or someone handy to do it for you
12:35pq: err, umm...
12:36pq: I'll have to ask
12:39pq: I saw daniels raise a hand :-)
12:40danvet: daniels, tag, you're it
12:40danvet: there's also a bunch of other Collaborans with commit rights I think
12:40daniels: you know emersion doesn't work for us right :P
12:40danvet: you don't hire everyone?
12:41daniels: but yeah, I would've loved to have had those docs, so of course I'll push it if emersion points me to an amended version :)
12:41daniels: danvet: unfortunately we have this stupid business model where our cost and revenue are in direct proportion to each other
12:42daniels: as opposed to the ones who are so successful at selling gcc cross-compiler binaries that they give away consultancy basically for free, or anyone with VC, or ...
12:43daniels: so my dream of being the world's puppet master is on hold whilst I answer questions like 'can we pay for it?'
12:43emersion: is this about the CRTC prop docs? :)
12:44emersion: ohh, missed danvet's reply
12:44emersion: ah, 5min ago, okay :)
12:46danvet: emersion, :-)
12:50pq: danvet, re: device unplug; I was trying to ask if UDL is already faking success all around? If it is, we can't decide to return EIO, because that would regress userspace.
12:50pq: at least userspace that worked with UDL and its unplug
12:55emersion: i'd've hoped "unplug" would be seen as a new feature, not something subject to user-space regression rules
12:56pq: emersion, except for devices where unplug already works (UDL?)
12:56danvet: pq, I think it's faking success
12:56danvet: as in eats the usb subsystem errors
12:56danvet: the other usb driver we have tiny/gm12u320 definitely eats the errors
12:56danvet: atomic modeset forces you to do that :-)
12:57pq: ok, so returning errors to userspace would be a regression with those, so there is a real precedent to faking success
13:10MrCooper: danvet: your "dpms guarantee" as written could be interpreted as "changing only ACTIVE in a commit can never fail", which isn't true when toggling it from 0 to 1
13:10MrCooper: if other state has changed since it was set to 0
13:12MrCooper: maybe that's what you meant, but I think it would need to be expressed more clearly
13:15danvet: MrCooper, yeah I'm not clear enough there ... can you reply with a suggestion perhaps?
13:17vsyrjala: i wonder how many drivers actually implement that part correctly
13:18vsyrjala: i guess maybe the really simple ones might
13:30danvet: vsyrjala, as long as drivers never consult crtc_state->active it should work
13:32vsyrjala: all i know is i915 doesn't do this right
13:33danvet: uh we should
13:33danvet: if we don't check wm for !active that's a bug
13:33danvet: I thought I checked that way back on first atomic conversion
13:33danvet: mlankhorst, ^^ did we break this meanwhile?
13:33vsyrjala: it's been broken since ~forever
13:34vsyrjala: it's one of those todo list items i never seem to get to
13:42mlankhorst: danvet: we suck at watermarks :)
13:42mlankhorst: dpms may fail on active change
13:43danvet:happy to outlaw that
13:43danvet: that's just suck :-)
13:43mlankhorst: yeah but our code will fall over otherwise
13:44danvet: MrCooper, so I think it's actually really just ACTIVE 0->1 that must succeed
13:44danvet: we always atomic_check, and we always pin buffers and all that
13:59CrazyByDefault: hey guys, we're unable to get X to start with mesa on 20.04, while the same install works for us on 16.04
14:00joey: probably not the right place to ask, but, when i boot linux without X, i can't get my console to display on my main monitor (Nvidia). It only displays on the intel integrated graphics monitor (multi-monitor setup). I can't seem to find where to change this. Is there any config file involved?
14:00CrazyByDefault: startx fails with Fatal server error: AddScreen/ScreenInit failed for driver 0
14:09MrCooper: danvet: my point is that ACTIVE 0->1 can legitimately fail, if other state has changed since it was set to 0
14:09MrCooper: CrazyByDefault: can you pastebin the Xorg log file?
14:09pq: it sounded like setting other state would have failed, if the 0->1 change wouldn't work
14:10Hura_Italian: One sec I'll paste bin it
14:10CrazyByDefault: MrCooper Hura is with me
14:11danvet: MrCooper, well if your driver works, it should have caught that in atomic_check
14:11danvet: i915 isn't quite correct and gets this a bit wrong
14:11danvet: with the legacy drivers it's an entire different mess
14:11danvet: and there DPMS off->on could indeed go boom if you changed the config
14:12danvet: but also, it was impossible to change the config without a forced dpms on
14:12MrCooper: danvet: how is that supposed to work when the primary plane gets disabled due to the FB assigned to it getting destroyed, while ACTIVE is 0?
14:12danvet: so kinda hard to pull off
14:12danvet: MrCooper, if your driver doesn't cope with disabled primary plane
14:12danvet: we shut down the entire crtc pipe
14:12danvet: i.e. MODE_ID = 0
14:13MrCooper: then the cursor plane breaks with amdgpu DC
14:13danvet: so yeah for that case ACTIVE will not work
14:13danvet: I mean destroying the drm_fb is kinda a config change that might wreck stuff
14:13Hura_Italian: Here is the Failed X Server on Ubuntu 20.04: https://pastebin.com/nseZcF71
14:13MrCooper: HW can't show the cursor without the primary plane
14:13danvet: don't do that :-)
14:13danvet: MrCooper, that's not breaking this
14:14MrCooper: it breaks the HW cursor with mutter
14:14danvet: we always had the semantics that if you destroy the currently use drm_fb for a primary plane, then the crtc gets shut off
14:14danvet: it's just that with atomic it's possible that this isn't required, depending upon hw
14:14danvet: but there's lots of drivers that still require it
14:14danvet: so the fb cleanup code actually has 2 paths, one just shuts off the plane, the other shuts off the entire crtc if the plane shutoff didn't work
14:15danvet: MrCooper, how?
14:15danvet: the "breaks hw cursor with mutter" part
14:16Hura_Italian: Here is the Working Xserver log on Ubuntu 16.04: https://pastebin.com/gtEdgLh4
14:16Hura_Italian: We did the same setup in both the cases
14:18MrCooper: danvet: it sets DPMS off, then it destroys the primary plane's FB, then it sets DPMS on again and does drmModeSetCursor with a non-0 FB => EINVAL, and from then on it doesn't use the HW cursor anymore
14:18Hura_Italian: We installed meson and ninja, and built both drm and mesa from repos using this code on both the systems: https://pastebin.com/LMsP5cdk
14:19danvet: MrCooper, well yeah that doesn't work well
14:19danvet: DPMS off doesn't release the resources, so the fb is still in use
14:19danvet: but this should be equally broken on legacy modeset
14:20danvet: if not I guess we need to tune the compat logic a bit more in the legacy cursor ioctls
14:21MrCooper: mutter doesn't use atomic API yet
14:22danvet: MrCooper, hm I read around a bit in core/helper code
14:22danvet: I'm not seeing where that EINVAL comes from
14:22MrCooper: Hura_Italian: sure those are the right log files? They both show an unsuccessful attempt at using the generic fbdev driver
14:23danvet: I thought we allow you to update planes and change how they're linked to crtcs while the crtc is off
14:23MrCooper: danvet: it's from amdgpu DC, because the cursor plane is on but the primary is off
14:23Hura_Italian: The one 16.04 was terminated by ctrl-c but it was working I could use glxinfo to check OpenGL version
14:24danvet: MrCooper, change amdgpu to never allow the primary plane to be disconnected then
14:24danvet: drivers really shouldn't allow that if all they can scan out is black and no other planes on top
14:25MrCooper: which brings us back to: how is that supposed to work when the primary plane gets disabled due to the FB assigned to it getting destroyed?
14:25danvet: MrCooper, this is not what happens when you disallow disabling the primary plane
14:26danvet: we kill the entire crtc, primary plane stays connected
14:26danvet: more or less
14:26danvet: it might get messy in details
14:26MrCooper: I can try that
14:27danvet: atomic_remove_fb() in drm_framebuffer.c is what drives this
14:27danvet: legacy compat gore, but it's supposed to make this kind of stuff work
14:27danvet: assuming the driver doesn't try to one-up with more clever tricks of its own
14:27MrCooper: you're saying the atomic should fail whenever the primary plane is disabled but the CRTC is enabled?
14:27MrCooper: atomic check
14:28danvet: yeah if your hw can't handle that really, you need to reject that
14:29danvet: drm_simple_kms_plane_atomic_check() respectively drm_atomic_helper_check_plane_state()
14:29danvet: since amdgpu isn't using drm_atomic_helper_check_plane_state() it's a safe bet it's broken and wrong :-)
14:31danvet: writing a correct atomic driver is really hard, this is why we need igt
14:32danvet: driving an atomic driver correctly is also pretty tricky, hence vkms
14:35MrCooper: danvet: BTW, you have noticed I'm no longer working for AMD, right? :)
14:35CrazyByDefault: MrCooper not to be a bother, but do let us know if those logs we shared up there make any sense to you! :)
14:37danvet: MrCooper, yeah you suddenly care about userspace other than the -amdgpu ddx :-P
14:38MrCooper: CrazyByDefault: looks like something changed in the kernel, and the Xorg fbdev driver can't cope
14:39CrazyByDefault: hot damn. one of the lines that did differ is - FBDEV(0): Depth 24, (==) framebuffer bpp 24 in 16.04
14:39CrazyByDefault: and FBDEV(0): Depth 24, (==) framebuffer bpp 32 in 20.04
14:40CrazyByDefault: https://www.x.org/wiki/FAQErrorMessages/#index3h2 says the error we're getting is caused by incompatible screen depth/res configs for the display adapter
14:41CrazyByDefault: so i'm guessing something is tempting fbdev to default to 32 bpp and kill the server if that fails, while 24bpp worked fine. we're unable to get it to try 24bpp on 20.04 as of yet, any ideas there?
14:41mareko: danvet: since you're the atomic expert... we have to run a compute shader in the kernel before every flip, is it something that would work with the atomic API and would it break userspace because compute would add latency to every flip?
14:42danvet: mareko, late reprojection for VR or something else?
14:43mareko: danvet: DCC compression postprocessing for compatibility with the display hw
14:45danvet: generally people try to make this stuff match between render engine and display
14:45danvet: at least for some of the layouts
14:45danvet: that's kinda what all the modifier stuff is for
14:45danvet: mareko, your gpu can't render in the right format at all?
14:46mareko: danvet: correct
14:47danvet: ok, so essentially you have a CCS format you can render/sample
14:47danvet: and a slight different one you can scan out, and you need to do a compute job resolve pass to get there?
14:47lynxeye: mareko: you can set up the shader in prepare fb and launch it as soon as the buffer fence signals, just need to make sure you replace the fence atomic is waiting on with your compute shader write fence. It breaks all the benefits of explicit fencing though, as you do stuff behind the scenes and userspace doesn't know about it...
14:48danvet: so the way this is supposed to work roughly is you create 2 drm_modifiers (you might need more, but this is for the basic idea)
14:48danvet: gl/vk advertises both, and makes the render/sample one the preferred one
14:48danvet: kms only advertises the one it can scan out
14:48danvet: then you enable modifiers on the entire stack
14:48danvet: and when you do direct flip you pass the modifier hint up to the app to tell it to use the kms-capable modifier
14:49danvet: which you can then use to schedule the compute job as part of the winsys flip magic/resolve/whatever else that needs to be done there
14:49danvet: so for full atomic glory for this step 1 is: get amdgpu modifier'ed
14:49danvet: doing a compute job from the kernel to paper over this late ... pls no :-)
14:50danvet: I thought I've seen some amd tiling modifiers fly by, so this shouldn't be too much horrors and tons of work
14:51mareko: danvet: the problem is all window systems would have to be changed to execute the compute job (X, Wayland, Android, ChromeOS, etc.)
14:52danvet: mareko, no, not with modifiers
14:52danvet: that's the problem modifiers are supposed to fix
14:52danvet: the one thing you can't easily do is the in-flight resolve
14:52danvet: maybe vk is good enough to express that
14:52danvet: jekstrand, ^^ can vk express that you can transition from one modifier to another with a resolve pass only?
14:53danvet: so generic userspace would need to do a copy when things change
14:53danvet: but then you reallocate everything in the right layout as specified by the modifiers
14:53danvet: and your winsys in mesa does the compute job
14:53danvet: compositor wont ever have to run it
14:54dj-death: danvet: you can copy into a different image with a different modifier
14:54danvet: but once the preferred modifiers have trickled down to all clients, it should be a stable setup with no surplus copying or resolve or anything like that going on
14:54dj-death: danvet: but in general vk images have one modifier and they stick to it
14:54dj-death: at least for anv
14:54danvet: mareko, you can also do this for fun like passing entirely render/sample-only CCS layouts from client to compositor for gl compositing
14:55danvet: dj-death, well more a spec question really
14:55danvet: would still need to get compositors to use it
14:55danvet: and since it's a transition state only, probably not worth the trouble
14:55danvet: if the first frame when going fullscreen does a surplus copy and then it's all good, no problem really
14:56dj-death: danvet: it's a single modifier : https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkGetImageDrmFormatModifierPropertiesEXT.html
14:56danvet: ah well
14:57mareko: danvet: what's the API function that calls Mesa to do the CCS transition?
14:57danvet: would need a pile more information to get apps to understand that with certain modifiers a resolve does de facto change the modifier
14:57danvet: mareko, I have no idea how this works in mesa, it's been like 10 years I've looked at mesa code in anger
14:57danvet: probably more like 15
14:57danvet: but intel has modifers and I think we do some resolves like this already
14:58dj-death: mareko: when the image is going to be used for something
14:58jekstrand: I'm completely missing what the questions are here
14:58dj-death: mareko: typical case is read the texture
14:58danvet: very vague memories that gallium had some resolve hook
14:58danvet: vk has it definitely as a fairly explicit thing
14:58jekstrand: Vulkan doesn't have a resolve hook as such
14:58danvet:made a fool of himself again
14:58danvet: jekstrand, isn't a "make this thing presentable" thing?
14:58jekstrand: What it has is a pipeline barrier where you transition to the PRESENT_SRC layout (for WSI extensiosn) or to VK_QUEUE_FAMILY_FOREIGN for modifiers.
14:59mareko: the gallium resolve hook isn't exposed to apps, it's only invoked by SwapBuffers internally
14:59jekstrand: The pipeline barrier takes the place of the resolve hook
14:59danvet: Kayden, probably knows how to make this work on gallium
14:59jekstrand: Again, I'm still missing what question we're trying to answer
14:59mareko: BTW I added the gallium resolve hook originally
15:00danvet: mareko, you might need it in more places for egl images with modifiers perhaps
15:00mareko: I wanted to add an EGL extension that would expose the resolve function, but then all window systems would have to be changed to use it
15:00danvet: jekstrand, it sounds like amdgpu needs modifiers to express the difference between "the CCS format we can render/sample" and "the CCS format we can scan out"
15:01danvet: mareko, so the scanout CCS format, can you still sample that?
15:01jekstrand: That seems like a perfectly reasonable use of modifiers
15:01danvet: or would you need an unresolve compute job for that again?
15:01jekstrand: mareko: To get from one to the other, is it an in-place transition or a blit?
15:02mareko: danvet: we can always sample and render, we just need the resolve pass to make it displayable
15:02jekstrand: mareko: Or do you want to always just use the scan-out version when scan-out is possible?
15:02danvet: mareko, ok should work with modifiers I think
15:02danvet: mareko, I guess it's something like "resolve clear color"?
15:02mareko: jekstrand: in place
15:02danvet: just as an idea
15:02MrCooper: mareko: the resolve hook should also be called from glFlush with front-buffer rendering
15:02jekstrand: mareko: Ok, then it's basically exactly the same as what we have to do for CCS
15:03danvet: MrCooper, frontbuffer rendering into CCS is somewhat ugly
15:03MrCooper: mareko: generally, that hook is precisely for cases like this; if it's not called in a case where it's needed, that should be fixed
15:03jekstrand: You would expose two modifiers: AMDGPU_FORMAT_MODIFIER_CCS_RENDER and AMDGPU_FORMAT_MODIFIER_CCS_DISPLAY or something like that
15:03danvet: -modesetting avoids rendering into multi-plane modifier formats
15:04jekstrand: If the image's modifier is _DISPLAY, you do the resolve in the resolve hook in gallium or in the pipeline barrier in Vulkan.
15:04daniels: mareko: I assume this is what your question on GitLab was about then :)
15:04mareko: danvet: there are 4 buffers per BO: color data, CCS meta buffer (only used by 3D), display CCS meta buffer (only used by display), non-display->display CCS translation table
15:05danvet: hm ok, but yeah should still map to modifiers without hiccups
15:06danvet: only thing is that for frontbuffer scanout to avoid upsetting -modesetting, make sure your display modifier has at least 2 planes
15:06danvet: 4 can work too, but I guess that's all internal layout which is fully fixed
15:07mareko: jekstrand: the resolve hook is only called for the window framebuffer in SwapBuffers, it's not called for any EGL images, which is what many window systems use (Android, ChromeOS, etc.)
15:08jekstrand: Yeah... EGL is kind-of busted in that regard
15:08jekstrand: There was a Google extension at one point
15:08jekstrand: But IIRC it was busted for some reason
15:08danvet: probably need to wire it up to glflush or so to make this work
15:08danvet: if we ever rendered into an eglimage
15:09mareko: I wish I could do it in the kernel before flips to avoid the window system madness
15:11mareko: since the Chrome browser uses EGL on ChromeOS, we would have to modify the browser code to execute the resolve operation.... how many more apps do we have to fix...
15:11danvet: why can't gallium keep track of eglimage's used as render target and call the resolve hook?
15:11jekstrand: Somehow this works for us in ChromeOS with modifiers
15:12jekstrand: krh likely knows how
15:12danvet: jekstrand, i965g doing the resolve for eglimage?
15:12danvet: maybe doesn't work with iris yet ...
15:12mareko: danvet: gallium doesn't know when a frame ends and which buffer is for scanout
15:12MrCooper: mareko: surely there must be some kind of flush between drawing to the EGL image and scanning it out
15:12MrCooper: that flush needs to call the resolve hook
15:12danvet: mareko, that's what modifiers are for
15:13danvet: all you have to do is render into the right layout
15:13danvet: which means for the displayable modifier, you shovel a compute resolve on top every time the eglimage might be read by something else
15:13danvet: which afaik (and I know little) is glflush time
15:13danvet: you _don't_ care whether it will be scanned out or not in radeonsi
15:14lynxeye: mareko: isn't the situation with EGL very easy to handle, because it's all hidden from the application? The compositor just tells you the preferred modifier behind the scenes, if it's compositing it'll get you the sampler optimal modifier, if it's going to a plane you get the scanout modifier
15:14lynxeye: Then when you see the scanout modifier, you just slap your compute shader on the winsys buffer on swapBuffers
15:15danvet: I think amdgpu is the last driver which doesn't have modifier support
15:15mareko: glFlush, not SwapBuffers (we already handle SwapBuffers)
15:15danvet: and we created these for this kind of stuff
15:16mareko: how does the compositor know which modifier is for sampling and which for scanout? and which modifier is more optimal for 1440p and 4K?
15:16MrCooper: mareko: the hook should already be called from glFlush in this case, isn't it?
15:17lynxeye: mareko: it asks both the EGL implementation (via the EGL dmabuf modifier extension) and the scanout side (via a DRM property blob)
15:17MrCooper: if not, glamor is broken as well
15:18lynxeye: MrCooper: it's not; that's one of the bugs that keeps glamor from working on Vivante GC2000, where we need the resolve step. Never got around to fixing it, as it's not a use-case I care about
15:19MrCooper: well that should be fixed then :)
15:19mareko: lynxeye: does the DRM property blob return the optimal modifier for a specific combination of width, height, and bpp?
15:20daniels: mareko: format/modifiers are always strictly paired, so that answers bpp
15:20lynxeye: mareko: No, it's just a list (with currently undefined ordering) of supported modifiers per format
15:21daniels: mareko: as for dimensions, when you want to create a buffer, you pass in a list (always a list) of modifiers in no particular preference order - KMS, EGL, Wayland, etc, all transit lists
15:21danvet: MrCooper, we might get out of glamor breakage with the multi-plane check for frontbuffer rendering
15:21danvet: if the glflush is indeed missing
15:21daniels: mareko: so the implementation (e.g. DRIImage::createImageWithModifiers) receives a format + list of modifiers + dimensions, and then the driver filters the list of acceptable modifiers based on its own optimality sort
15:21danvet: or maybe because i965g has it, and it's just missing from gallium
15:22daniels: mareko: the app/user doesn't try to pick a 'best' modifier, it just intersects the sets to come up with a list of _possible_ modifiers, and leaves it to the driver to pick the best one
15:22daniels: mareko: obviously once the BO is created, it only ever has one immutable modifier, just like format
15:23lynxeye: mareko: Right, as daniels says it's always a list. So as long as your EGL implementation can work out the best modifier for a given dimension, it gets to choose and you are done. The display though can't really specify a preference.
15:23danvet: it's supposed to be sorted
15:23danvet: the ugly part is if render and display disagree on what's best
15:23danvet: thus far no one came up with such kind of hw
15:24danvet: the only thing we have is "oh sorry, that doesn't work due to some random hw limit, just pick the next one, ok"
15:24daniels: danvet: it can't be sorted as soon as you intersect
15:24danvet: daniels, yeah that's what I mean
15:25danvet: so we just intersect and trust that "good for render/sample" is also good enough for everyone else
15:26mareko: our CCS is different for 1440p and 4K, so even if DAL exposed both modifiers, only one can be used depending on the image size, and Mesa will have to make that decision I guess
15:27lynxeye: daniels: you can still intersect by picking the first modifier that appears in both preference sets, if they are ordered with highest preference first
15:28daniels: lynxeye: but how do you define the relative priority between the two lists? and if you decimate the lists in the act of intersection, how do you decide how to interleave the lists?
15:28lynxeye: obviously you'll get different results depending on which way around you do the intersection, but I guess for most use-cases this would be good enough (TM)
15:28mareko: the last question is... should we expose all tiling modes as modifiers, or just 1 modifier for "best sample/render" and then multiple modifiers for the displayable options?
15:28daniels: mareko: yep - expose both modifiers, Mesa will allocate the only one which can work
15:28daniels: mareko: all tiling modes where you feasibly share across processes
15:29daniels: mareko: if you have some kind of elaborate cubemap-optimised tiling mode, you don't need to worry about encoding that as a modifier, because a compositor or scanout engine is never going to see that
15:29daniels: mareko: but anything where you would exchange between client <-> compositor, or GPU <-> display, or media <-> GPU/display, should be encoded as a modifier
15:29daniels: you can look at jajones's series in particular to see how NV has parameterised their modifiers, which might be useful
15:30danvet: I thought there's also a series from bnieuwenhuizen or someone on dri-devel already with a basic proposal
15:30danvet: but yeah the nv one is probably most interesting
15:30danvet: that's the result of 2+ years of bikeshedding with nvidia people at khronos
15:31danvet: they really wanted to make sure this stuff works before agreeing to add it to the linux vk standard stuff
15:32bnieuwenhuizen: mareko: danvet: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/commits/modifiers-planes/ (https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/commit/baf63d530de5e15d63337d480700d82029d6e415)
15:32mareko: thanks, that was a productive discussion
15:33bnieuwenhuizen: though I'm open to just throwing the plain tiling mode as a parameter in there instead of having this largish list
15:33bnieuwenhuizen: my main TODO thing is dealing with the fact nobody can/will set EXPLICIT_FLUSH on images with modifiers :(
15:34MrCooper: mareko: are two different modifiers really needed for those two kinds of CCS though? It sounds like the kind is implicit from the buffer dimensions
15:34danvet: mareko, yeah if it's hard limit and not just an optimization, you can encode size/bpp dependent stuff into the same modifier
15:35danvet: but if both formats work, but one is better for smaller buffers, and the other better for large buffers, then maybe 2 modifiers is better
15:35danvet: I guess main takeaway from the modifier discussion with nv is to not encode everything that's possible
15:35mareko: MrCooper: 4K changes the compression parameters slightly, so modifiers need to be aware of the difference (if we don't send that info through private BO metadata)
15:36danvet: but just the stuff you need for sharing
15:36danvet: mareko, and you can't derive those optimal parameters from the information you already have?
15:36danvet: which should be drm_fourcc, width, height, pitch
15:37danvet: it's a tradeoff ofc, since if the cut-off is different on different gpus you want to make this maybe explicit
15:37danvet: so that sharing between unlike gpus works
15:37danvet: all would advertise all the formats they can consume
15:38danvet: but would pick the one layout that's best for them to render into
15:38danvet: depending upon size or whatever
15:39mareko: bnieuwenhuizen: I'll make the explicit flush implicit by tracking whether an image with a scanout modifier has been rendered to, and if so, execute the resolve pass in flush
15:40bnieuwenhuizen: yeah, that is pretty much what I was coding up
15:42bnieuwenhuizen: okay, I'll send out my modifier stuff as WIP later today (gotta work away that wip patch) and then you're probably faster coding up the flush stuff
15:44daniels: mareko: excited to see it happening! will be cool to have protected-surface support as well
15:47mareko: bnieuwenhuizen: then in theory we don't need the flush_resource callback at all
15:49bnieuwenhuizen: correct, though I'm not sure if it is better to rely on flush_resource for EXPLICIT_FLUSH to avoid accidental flushes
15:49Kayden: sounds like an interesting change
16:02imirkin: wrong window.
16:34mareko: instead of guessing when to do the resolve pass, it would be much better to have acquire/release semantics for shared EGL images just like OpenCL
16:36mareko: EGL_EXT_image_flush_external was essentially proper acquire/release API for EGL
16:43bbrezillon: cmarcelo: you probably noticed, but I pushed a new version of the barrier stuff
16:47bnieuwenhuizen: mareko: for the acquire/release semantics I'm wondering if going forward it makes sense to build that on GL memory objects? Those have fence integration.
16:55MrCooper: danvet: doing what you suggested in atomic check looks promising so far with mutter, but triggers the WARN(ret, "atomic remove_fb failed with %i\n", ret) in drm_framebuffer_remove while running IGT (and seems to break some of the tests as well)
17:05alyssa: Before I reinvent the wheel, is there a shader_info for a fragment shader having side effects?
17:06alyssa: writes_memory || uses_discard, I guess
17:06alyssa: also writes_depth || stencil
17:07alyssa: Our hw can skip shader execution while still doing depth/stencil updates.
17:08alyssa: Which... ought to help performance when colourmask = 0x0. I'd hope.
17:11alyssa: (Actually, I think it's just those 4 + demote, easier than I thought :) )
17:16danvet: MrCooper, hm that shouldnt happen
17:16danvet: do you reject it with -EINVAL?
17:16danvet: it ignores all other errors
17:16danvet: also maybe check with some printk that you're getting into the disable_crtcs path in atomic_remove_fb
17:17danvet: wrt igt, it could be that we have some intel-ism in there, it happens :-(
17:17danvet: but the WARN_ON is a bug, so that needs to go away first
17:19vsyrjala: danvet: wrt. the earlier active 0->1 thing, encoder .compute_config() can also fail due to dpcd changes/etc.
17:20vsyrjala: i guess there is active_changed vs. mode_changed so we could perhaps skip all that stuff for pure active changes
17:20danvet: well but then people might get annoyed if dpms off/on doesn't fix their screen anymore
17:20danvet: maybe should be more just a strong recommendation, dunno
17:21vsyrjala: one might hope the link gets then marked as bad and forces a new modeset with full recompute
17:23vsyrjala: but it's perhaps a bit theoretical since it'd need the display to get swapped to another one, without userspace disabling it in the middle, while it's disconnected
17:24vsyrjala: even worse if it happens during resume since then we light up nothing since the whole commit fails
17:37mareko: bnieuwenhuizen: the major targets are X (EGL), Wayland (EGL), ChromeOS (EGL), Android (EGL), so we have the bad API everywhere
17:41mareko: bnieuwenhuizen: indeed GL external objects and GL back buffers can set EXPLICIT_FLUSH to get explicit flushing
17:43mareko: I think the main benefit from modifiers is that we can finally get the correct scanout/non-scanout flag
18:45mareko: bnieuwenhuizen: it would be better to just put swizzle_mode into the modifier instead of multiple tiling flags
18:47bnieuwenhuizen: mareko: so something like swizzle_mode as parameter and the set of enabled metadata planes (none, DCC, displayable DCC) as separate modifiers?
18:54mareko: bnieuwenhuizen: I think metadata should be part of the modifier, no separate buffers
18:55pmoreau: EdB: the computeinfo MR was merged, as well as the fix for C-linkage :-)
18:55EdB: nice :)
18:57bnieuwenhuizen: mareko: yeah, completely agree on the bo metadata not being used for images with modifiers
18:57bnieuwenhuizen: mareko: however, how many of them should be "fixed" and how many should we add as parameters? e.g. I was thinking about assuming scanout = 1 unless DCC is enabled
18:57bnieuwenhuizen: https://hastebin.com/haxijiwata.http ?
18:58imirkin: gitlab.fd.o appears to have gone *poof*
19:02mareko: bnieuwenhuizen: same as internal amdgpu tiling_flags, but no offset/pitch, no scanout flag (I think), and for DCN, we need: rb_aligned, pipe_aligned, max_uncompressed_block_size, max_compressed_block_size, independent_64B(and 128B)_blocks
19:02EdB: imirkin: seems to work o here
19:03imirkin: yep, it's back
19:03imirkin: was returning 503's for a while
19:04bnieuwenhuizen: mareko: also "no separate buffers", do you mean you want everything in a single DRM plane, or only talking about the bo metadata buffer? (I can do that, but having >1 plane in the format happens to be the heuristic X uses to not do frontbuffer rendering, so that is going to get messy there)
19:07mareko: bnieuwenhuizen: everything in a single DRM plane
19:09bnieuwenhuizen: That means we have to have code in the kernel for calculating DCC offset + pitch though. I was planning to anyway for verification but FYI
19:09mareko: bnieuwenhuizen: we can pass the offset and pitch via the private BO metadata
19:10bnieuwenhuizen: I don't believe we should be using the private BO metadata for planes with modifiers
19:11bnieuwenhuizen: I think the better solution with intent is to have multiple DRM planes and at import time combine that into one si_texture
19:11airlied: daniels, anholt : getting some issues rebuilding the x86_test-gl container, looks like networking errors
19:11airlied: can't git clone virglrenderer, failed to get some of the VK-GL-CTS bits
19:11airlied: just in case there is something ongoing
19:12bnieuwenhuizen: mareko: and the Vulkan modifier API is kinda incompatible with doing things via private BO metadata
19:12mareko: bnieuwenhuizen: they can be derived from the pitch, height, bpp, pipe_aligned, rb_aligned, but it's unnecessary to recompute them in every driver
19:15danvet: mareko, private bo metadata is really not how modifiers are supposed to work ...
19:15mareko: bnieuwenhuizen: for vulkan interop, shared MSAA and Z/S are required too, which modifiers don't cover
19:15danvet: it's meant to replace all that, not augment it
19:15danvet: so generally that means multi-plane most likely
19:15danvet: you can still make sure that all the planes are in the same dma-buf/bo
19:16danvet: multi-plane doesn't mean multi-buffer
19:16danvet: so essentially if you still look at bo private data on the kms side with modifiers, you're doing it wrong
19:17danvet: modifiers are supposed to entirely overwrite all that (together with the other data you get from addfb)
19:17bnieuwenhuizen: mareko: I meant the Vulkan modifier extension. I agree it ain't going to work for the full Vulkan interop
19:18danvet: so that means as bnieuwenhuizen said you need to recompute all the derived stuff in all the projects
19:19bnieuwenhuizen: yeah, we need either multiple planes (in which case I argue for forcing them to be in the same dma-buf), or fix the layout and rederive everywhere (and define a new modifier if we ever change the layout given parameters)
19:20daniels: imirkin, airlied: yes, I pushed a totally harmless change which wouldn't result in any downtime. had totally forgotten that I'd previously pushed but not deployed downtime-requiring changes, leaving them there to wait for the next downtime window :\
19:20bnieuwenhuizen: honestly I think some amount of rederivation is going to happen anyway if we want a secure API to avoid apps receiving poisoned offsets/pitches that cause reads outside of the buffer (into possibly other buffers, which might be an information leak)
19:21danvet: bnieuwenhuizen, mareko another reason for why you really want compressed buffers to be multi-plane is the -modesetting frontbuffer logic
19:21danvet: it skips any multi-plane formats for frontbuffer rendering
19:21danvet: since generally the result is terrible
19:21danvet: it's even worse with compression
19:21bnieuwenhuizen: yeah, I called that out above
19:21danvet: so pretty much for that alone ccs needs to be multi-plane
19:22Karyon: h7c0 . .cgzfbcvnzb n sec v ccfcfdsssssssssssssssstyrdxyc88888888888888888888888888888888888888888888i+98poiuytrewqqaf15T/
19:22danvet: we might have botched that with the afbc codes, not sure
19:22Karyon: .myhy[y 411dyr4\[r\5dylizr6 s
19:22tango_: Karyon: what
19:22bnieuwenhuizen: which is sad because I like the simplicity of just fixing the layout
19:22mareko: tango_: probably a dog or a cat
19:22danvet: bnieuwenhuizen, well you can still reject anything that has the wrong offsets and pitches for the aux plane
19:22tango_: mareko: or a credit cart swipe or a yubikey ;-)
19:23bnieuwenhuizen: danvet: the extra planes end up just gnarly in mesa though. I've had to implement dummy textures because it imports each plane separately ...
19:24mareko: bnieuwenhuizen: I estimate we'll need 5 planes in the future, and 6 planes if we need to store the clear color there too
19:25bnieuwenhuizen: alternative solution: put everything in a single plane, fix the layout and add a dummy plane :P
19:25mareko: bnieuwenhuizen: for more modifier bits?
19:26bnieuwenhuizen: no just to have >1 plane to make X not do frontbuffer rendering
19:26mareko: sounds good
19:27bnieuwenhuizen: what would we need 6 for? I only get to color+dcc+displayable dcc + clear color + reswizzle map (which we can technically keep in the driver and not share)
19:28mareko: we need the reswizzle map if the allocator is a different process or library (Android works like that)
19:29mareko: it allocates via gralloc and then imports everything in GL
19:29bnieuwenhuizen: well, we can derive it too :P
19:31danvet: bnieuwenhuizen, uh, dunno how iris/anv solved that ...
19:31mareko: bnieuwenhuizen: we already derive it in Mesa, so there is no work there
19:31danvet: iirc we have exactly all the same constraints, but maybe I'm wrong
19:31bnieuwenhuizen: danvet: you mean the more than 4 planes?
19:31danvet: bnieuwenhuizen, uh no that would be first
19:32danvet: I think addfb2 is atm limited to 4
19:32bnieuwenhuizen: oh you mean the import gnarlyness in mesa, nvm me
19:32danvet: wrt the legacy metadata stuff at least for addfb best imo is to capture the metadata once and convert to modifiers at addfb time
19:32danvet: then the entire kms driver only ever deals in modifiers for everything
19:32mareko: I can imagine 6 planes for future hw, but that's all I can say
19:33danvet: same might be useful in other places so there's no confusion
19:33danvet: since modifiers are supposed to overwrite any legacy bo data you have to make sure cross device sharing works correctly
19:34danvet: I think intel is atm at 3 for color data + compression bits + clear color
19:34danvet: for the stuff we want to share across devices/process/with kms
19:34bnieuwenhuizen: yeah, we're +2 because of compression bits that need a reswizzle between render and display
19:37mareko: bnieuwenhuizen: I'm slightly concerned about putting chip_external_rev into the modifier, because DCN actually doesn't care about it and supports any GPU from that generation and sometimes even previous generation, which means DAL will expose 64K modifiers or so
19:39bnieuwenhuizen: mareko: are there no DCC differences (except for the alignment thing due to rb_count == 1 for raven2)?
19:40bnieuwenhuizen: also, I was thinking of exposing only the own chip_rev as nobody is going to have >1 Raven GPU in any useful way
19:41bnieuwenhuizen: I guess that might be more messy with navi14
19:41mareko: bnieuwenhuizen: DCC has very few differences between gfx9 and gfx10
19:42bnieuwenhuizen: well there is the constant color thing right?
19:42mareko: DCN in Raven1 doesn't have constant encoding, gfx10 added independent 128B blocks, that's about it
19:43bnieuwenhuizen: okay, so unaligned, everything in GFX9 is compatible with GFX9 and everything in GFX10 is compatible with GFX10?
19:43bnieuwenhuizen: or anything RV is compatible with RV and NV compatible with NV
19:44mareko: bnieuwenhuizen: Raven2 and Renoir might be able to read DCC from Navi14
19:45bnieuwenhuizen: hmm, that gets complicated anyway, because Navi14 only supports rendering to _R_X with DCC right, and I don't think Raven2/Renoir can read that?
19:45danvet: yeah if you end up with tons of duped modifiers because the chip id is in there then maybe that shouldn't be in there
19:45bnieuwenhuizen: danvet: turns out that finding the right versioning scheme for the HW compression format is hard ...
19:46danvet: oh sure
19:46MrBIOS: imirkin you around?
19:46danvet: and if some oddball aliasing results, I don't think that's horrible
19:46danvet: just if you alias them all just because, maybe a bit much
19:46imirkin: MrBIOS: yes
19:46MrBIOS: imirkin I was reading some old IRC log history and noticed you were active in #haiku a while back :)
19:47imirkin: never ended up doing anything
19:47imirkin: i wanted to get them over to the DRI api
19:47bnieuwenhuizen: danvet: yeah, for my initial idea we're just dealing with APU's so no sense exposing any chip id except your own
19:47MrBIOS: that was my next question
19:48wm4: can I ask a naive question about the nature of DRM buffer modifiers? why are they needed?
19:49danvet: bnieuwenhuizen, well you might still get yourself into a corner ...
19:50airlied: wm4: so that different clients of the kernel can negotiate common memory layouts for optimal sharing
19:50danvet: and amd at least should know when they rev'ed the compressor
19:51cmarcelo: bbrezillon: will get to it between today and tomorrow. thanks!
19:51wm4: airlied: but isn't that already the pixel format (fourcc)? and if it's some weird vendor specific thing, what's the use of exposing a pixel format at all?
19:51airlied: wm4: no pixel formats don't cover tiling and/or compression
19:51danvet: wm4, yeah, plain linear buffers are extremely slow for gpus to render to
19:52danvet: and wastes bandwidth for most stuff that's on the screen without compression
19:52danvet: so if you have multiple gpus you need to figure out something better that all of them understand
19:52danvet: other example is in fullscreen mode you want to render something that can be display directly
19:52danvet: but in windowed mode you want something that's most efficient for rendering only, screw the display
19:53danvet: on most modern gpus these two formats aren't the same; the most efficient render format cannot be scanned out directly
19:53wm4: well I understand that GPUs represent bitmaps in a weird way, but I don't get why normal userspace software should know about it; can't it just be passed along in an opaque way within the kernel?
19:54mareko: bnieuwenhuizen: you may be right, maybe Renoir can't read Navi14 tiling even though they are both DCN2 I believe, alright let's not worry about cross-gen sharing
19:55mareko: bnieuwenhuizen: also Navi1x is the only family that can't do a user-specified pitch
19:55mareko: in the 3D engine
19:56mattst88: wm4: the point danvet made is important: the most efficient render format often cannot be scanned out directly, so it's important that there's some negotiation to give the efficient formats when we know that buffer won't be scanned out, for example
19:56wm4: hm ok
19:56wm4: but if it's all vendor specific (not sure if it is, but maybe), how can software make any proper decisions at all?
19:56bnieuwenhuizen: mareko: so for DCC versioning can we use the versioning of the underlying tile mode + constant_encode?
19:57wm4: unless the point of exposing these formats to userspace is for userspace driver parts or such
19:57mareko: bnieuwenhuizen: for completeness, gfx10 S,D,R with 4bpp is the same micro tiling as gfx9 S
19:57mareko: there is different macro tiling though
19:57bnieuwenhuizen: b = bits?
19:58mareko: 4 bytes per pixel
19:59mattst88: wm4: I'm not an expert on this, but I think it's not entirely vendor-specific. e.g., it should be possible for a dGPU to render to some format that the integrated GPU can scan out in a prime setup (but maybe that's wishful thinking, not sure)
20:01mareko: bnieuwenhuizen: we might need enum radeon_family in there just for inter-process sharing of tile modes
20:02wm4: if a userspace program is software rendering, should it attempt to avoid linear framebuffers, and use any of the funny tiling modes?
20:05danvet: wm4, normal apps don't care
20:05danvet: this is for low-level userspace like compositors
20:06danvet: or maybe media frameworks like gstreamer so they can find a common format between the gpu and the camera and the video codec on a SoC
20:06wm4: yeah, normal applications don't use DRM directly
20:06wm4: but in mpv we have support for video output via software rendering and DRM, for example
20:06danvet: it's not just drm, there's extensions for modifier support in vulkan, EGL, ...
20:06wm4: of course this uses a slow linear framebuffer
20:06danvet: libva too
20:06danvet: for software rendering it doesn't matter much because a) dead slow anyway b) gigantic cpu caches paper over the inefficiency
20:07wm4: apart from such "experiments", libva and vulkan explicitly expose these DRM modifiers, or they can't interoperate
20:07wm4: so far it's apparently just a matter of passing them through
20:07bnieuwenhuizen: wm4: my advice for CPU rendering/decode to be consumed by the GPU is to use the GPU driver for the upload instead of using dma-bufs yourself. You never know what uncacheable memory you're going to get otherwise
20:07danvet: yeah so mpv I guess is one of the media frameworks that need to understand this stuff
20:08danvet: wm4, libva is missing some interfaces, you can't yet list supported modifiers
20:08danvet: once that's there you need to grab the list of supported modifiers from both vk and libva and then find the common one to use
20:09danvet: but vk, egl, gbm, kms already have lists of supported modifiers (I think at least)
20:10bnieuwenhuizen: danvet: gbm doesn't have the list either I think?
20:12wm4: also I sort of wish I understood how dma-buf works at all (is it essentially just mapped physical memory from a PCIe memory mapped region? does such a thing even exist? etc.)
20:12danvet: bnieuwenhuizen, hm I guess you get it from egl or something?
20:13danvet: wm4, atm it's just system memory, mostly
20:13danvet: amdgpu just landed support for access directly across the pcie bus between devices
20:13danvet: but it also can be in vram on the chip if it's not shared
20:13danvet: and it can be special contiguous memory on some SoC
20:14bnieuwenhuizen: danvet: so on allocation the app can specify a list. But on import I'm not sure what people use besides the luck of often being very similar to EGL
20:14danvet: it's all a bit wobbly
20:14danvet: bnieuwenhuizen, yeah I guess the assumption is that anything egl produces gbm can eat
20:14bnieuwenhuizen: danvet: stops being funny in VK compositors though :)
20:14danvet: bnieuwenhuizen, well vk has its own list too I thought?
20:15bnieuwenhuizen: danvet: which might differ from EGL, while in practice EGL == GBM
20:15daniels: bnieuwenhuizen: for GBM you get the list from KMS
20:15danvet: and not sure what you need gbm for in a vk compositor
20:15danvet: daniels, I thought you need gbm for headless
20:15danvet: rendering I mean
20:15daniels: which tells you exactly what you can scan out from, then GBM intersects that list with what you can render to
20:16danvet: which might or might not have anything to do with your display kms driver
20:16daniels: danvet: then just query EGL
20:16danvet: oh so gbm intersects internally?
20:16danvet: no idea, as usual
20:16bnieuwenhuizen: danvet: on allocation everybody does
20:16bnieuwenhuizen: only one party gets to allocate though
20:16wm4: danvet: if a dmabuf is simply located on the GPU's vram and not shared, how exactly can you access the dmabuf's contents anyway?
20:17bnieuwenhuizen: wm4: VRAM can be visible from the CPU
20:18danvet: or we move it to system memory
20:18bnieuwenhuizen: depends a bit on PCI setup how much of it
20:18danvet: or there's peer to peer access from one device to the other
20:18daniels: yes, alloc intersects - conceptually on alloc you're telling it, 'here's what I can accept, choose between them'
20:18daniels: the intersect is mandatory because the driver obviously can't choose to render something it can't reason about
20:27emersion: > and not sure what you need gbm for in a vk compositor
20:28emersion: vk implementations (radv, anv) don't have the required extension to deal with modifiers, yet
20:28emersion: … this results in compositors using private mesa APIs to say "i want to scan-out this buffer"
20:37bnieuwenhuizen: I think the other part is even with modifiers Vulkan will not manage a drm bo handle for you, so if you use plain KMS you'll need something to manage the handle refcounting, presumably gbm?
20:38emersion: one can export to a DMA-BUF and then drmPrimeFDToHandle
20:38bnieuwenhuizen: (which is arguably overkill, but I've seen people recommend not doing it yourself)
20:38bnieuwenhuizen: emersion: right, but when do you close that handle?
20:39emersion: ah, no
20:39emersion: weird that i'm not hitting some kind of issue related to this
20:40bnieuwenhuizen: there is a close ioctl, but since multiple imports can return the same handle ...
20:40emersion: i guess importing twice the same DMA-BUF results in the same handle?
20:42emersion: why doesn't libdrm have drmCloseHandle?
22:41airlied: jekstrand: I think dx also has something about samplemask being valid without multisampling
22:41airlied: since I just fixed llvmpipe for GL behaviour, but I suspect it'll be different
22:41jekstrand: airlied: Could be
22:42jekstrand: airlied: I don't care much one way or the other. I just care about making sure things are properly noted in the spec and that decisions like this are made by the spec committee and not the CTS committee.
22:43airlied: jekstrand: indeed
22:43jenatali: FWIW: https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#17.17%20SampleMask
22:45jekstrand: jenatali: Thanks!
22:46jekstrand: airlied: In all likelihood, given that our hardware is designed for DX, we're going out of our way to ignore it on GL.
22:46jekstrand: I've not looked at it in detail though
22:49Kayden: st/mesa passes 0xFFFFFFFF in pipe->set_sample_mask() calls
22:50jekstrand: Kayden: This is about gl_SampleMask out in non-MSAA fragment shaders.
22:50jekstrand: In GL, it's ignored. In D3D and, apparently, Vulkan, it's not.
22:51Kayden: Yeah, GLSL says "If a shader does not statically assign a value to gl_SampleMask, the sample mask has no effect on the processing of a fragment."
22:51jekstrand: Vulkan says similar things. The real question is about single-sampled
22:51jekstrand: We're apparently failing some CTS tests because of this
22:52jekstrand: Kayden: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3016 if you feel like taking a look
22:55Kayden: hmm, I guess I'm not convinced that writing gl_SampleMask in a non-MSAA fragment shader has no effect in GL.
22:56Kayden: bits 31:1 are irrelevant in per-fragment mode, and it seems like bit 0 would still apply (possibly disabling the fragment if it's set to 0)
22:57Kayden: at least, it looks that way from my cursory reading of the GL 4.5 spec
22:57jekstrand: Sounds like we need a piglit test. :)
22:59Kayden: well, and we can run fragment shaders in per-sample mode or per-fragment, even with MSAA framebuffers..
23:03Kayden: left a comment with a bit of spec text
23:33bnieuwenhuizen: jekstrand: FWIW I believe radv and radeonsi are doing the same thing here (export samplemask even without MSAA)
23:34bnieuwenhuizen: (unless the HW silently ignores it ...)
23:38jekstrand: bnieuwenhuizen: May just be an ANV bug. I'm just trying to make sure.
23:39airlied: there are gles test against it
23:39airlied: or maybe deqp
23:42imirkin: i definitely remember tests being picky about gl_SampleMaskIn in per-sample mode
23:42imirkin: on nvidia, we just have the coverage, so have to & it with the sample id
23:49bnieuwenhuizen: jekstrand: if I wanted to add big internal shaders to a mesa driver, what would be the best way to do that? Still use the nir builder? Try to get some GLSL->SPIRV pipeline going? (Would it be reasonable to shell out to some glslang somehow?)
23:51airlied: bnieuwenhuizen: anything that isn't the last one :-P
23:52airlied: writing them in NIR might not be the worst (messy but not horrible), I suppose for libclc we ended up just writing spirv to a library file
23:52bnieuwenhuizen: airlied: SPIR-V to a library file seems like it'd be horribly messy wrt git though?
23:53airlied: yeah, we aren't storing the clc one in git, and it's an external provided binary
23:53bnieuwenhuizen: I wonder if it makes sense to have something like source+SPIRV checked in and have CI check the correspondence? but that seems hackish
23:53airlied: maybe glslang callout isn't the worst answer, depends on what the value is I suppose
23:54airlied: I wonder how bad a GLSL->SPIRV in mesa would be
23:54bnieuwenhuizen: airlied: just mulling over what it'd take to do raytracing acceleration structure creation
23:54airlied: I think idr looked at it once
23:54bnieuwenhuizen: which seems like it'd be large enough to want a proper source language
23:56airlied: just steal the nir -> spirv from zink :-P
23:56airlied: and do glsl->nir->spirv
23:57airlied: I suppose you could do glsl->nir internally
23:58airlied: without touching spirv
23:59karolherbst: bnieuwenhuizen: we already have the problem with the soft fp64 lib
23:59karolherbst: I think drivers are just responsible for compiling the glsl code to nir and use that
23:59karolherbst: or so?
23:59karolherbst: never looked on how that's actually used
23:59karolherbst: but we have this already
23:59bnieuwenhuizen: airlied: glsl->nir->serialize nir might actually be a reasonable idea