00:00 Lyude: airlied: ok! it should finally be sent :), the last step took a bit because I realized dim relies on mutt, and there's no way to control evolution over cli like you can with mutt so I had to work around that lol
00:03 airlied: Lyude: it should write a pullreq.txt to your homedir
00:03 airlied: I just paste that into my email
00:04 Lyude: airlied: ahhh, well I actually just figured out how to get evolution to start up from cli w/ mutt compatible arguments :)
00:04 airlied: oh Ive hacked mine
00:04 Lyude: TIL xdg-email is a thing
00:04 airlied: echo $DIM_MUA
00:04 airlied: /home/airlied/.local/bin/dim_pull
00:04 airlied: cat $4 > ~/pullreq.txt
00:04 airlied: is what is in that script
00:04 Lyude: hehe
00:45 Lyude: airlied: did the pull request work btw?
00:48 airlied: Lyude: yup thanks!
01:15 airlied: hwentlan, agd5f : can you make sure someone looks at Mario's MBP regression fix before 5.6
01:23 keithp: jekstrand: that took more time than I would have liked
01:26 jekstrand: keithp: :/
01:26 keithp: but, appears to be working. Now to let the code work with legacy modesetting too
01:29 keithp: jekstrand: bonus feature, no blocking for the first frame now
01:49 jekstrand: keithp: That sounds like a feature
01:50 keithp: jekstrand: yup
01:50 keithp: argh. now getting EINVAL when attempting to set DPMS state
01:50 keithp: sigh
01:52 jekstrand: i915 has a feature in exactly 1 file where, if you build your kernel with the right config option, it tells you where the EINVAL comes from.
01:52 jekstrand: I wish the kernel did that for all error codes in all files.
01:53 keithp: heh
01:53 jekstrand: At least for all EINVAL
01:58 keithp: looks like I'm not supposed to set the DPMS property in atomic
01:58 keithp: ok
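The kind of instrumentation jekstrand wishes were everywhere might look like the hypothetical macro below; this is a sketch, not the actual i915 mechanism (which is tied to a kernel config option), and `RETURN_EINVAL`/`set_dpms_atomic` are made-up names for illustration:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sketch: instead of a bare "return -EINVAL", record where
 * the error originated so userspace bugs like setting the legacy DPMS
 * property through atomic are easy to trace. */
static const char *last_einval_site;

#define RETURN_EINVAL() do {             \
        last_einval_site = __func__;     \
        return -EINVAL;                  \
    } while (0)

static int set_dpms_atomic(int use_legacy_dpms_prop)
{
    /* DPMS is a legacy property; atomic userspace should drive the
     * CRTC "ACTIVE" property instead. */
    if (use_legacy_dpms_prop)
        RETURN_EINVAL();
    return 0;
}
```

With something like this, every `-EINVAL` comes with a breadcrumb instead of a guessing game.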
02:08 jekstrand: daniels: Things that would make zwp_linux_explicit_synchronization_v1 massively less painful:
02:09 jekstrand: 1. Add a set_acquire_immediate request so we don't always need an in-fence
02:10 jekstrand: 2. Drop the absurd restriction that you can only have one zwp_linux_surface_synchronization_v1 per surface at a time.
02:11 jekstrand: The second one is pointless because the surface_synchronization only has requests so the only cost to the compositor to have more than one is that it now has to keep a list for tear-down rather than a pointer. Big deal.
02:12 jekstrand: It's also really painful because in Vulkan, you have to re-create the swapchain every time you resize and so now we have to reference count them (I've not implemented that yet) to ensure that we only have one. That's a pile of pain on the client side for the cost of a single extra pointer and a list walk on the server side. Seems like the wrong trade-off to me.
02:13 jekstrand: If we were just talking about new messages on the wl_surface, that would be another thing. But having a second object I have to manage AND guarantee we only ever have one of. That's just mean.
02:13 jekstrand: I might rev the spec just to solve the second issue.
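The client-side refcounting workaround described above could look roughly like this; the struct and helper names here are hypothetical and this is not Mesa's actual implementation, just a sketch of sharing the one allowed zwp_linux_surface_synchronization_v1 between swapchains:

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch: the protocol allows only one surface_synchronization object
 * per wl_surface, so multiple swapchains on the same surface must share
 * and reference count a single wrapper. */
struct surface_sync {
    int refcount;
    /* would wrap the real zwp_linux_surface_synchronization_v1 * */
};

static struct surface_sync *
surface_sync_get(struct surface_sync **slot)
{
    if (*slot == NULL) {
        *slot = calloc(1, sizeof(**slot));
        /* first user: issue get_synchronization on the wl_surface here */
    }
    (*slot)->refcount++;
    return *slot;
}

static void
surface_sync_put(struct surface_sync **slot)
{
    if (--(*slot)->refcount == 0) {
        /* last user: send the destroy request here */
        free(*slot);
        *slot = NULL;
    }
}
```

Each swapchain calls get on create and put on destroy, so resize-driven swapchain recreation never trips the one-per-surface restriction.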
02:51 keithp: jekstrand: yay! merge request submitted...
02:51 keithp: jekstrand: what was the plan after that?
05:11 jekstrand: keithp: sync_file
05:24 keithp: jekstrand: right
05:24 keithp: should be easier than with a window system :-)
05:26 jekstrand: If not for the stupid bits, the Wayland extension was pretty easy.
05:28 keithp: jekstrand: your long email post talks about some kernel additions to auto-generate sync file?
05:30 jekstrand: keithp: I want to completely rip implicit sync out of the i915 uapi
05:30 jekstrand: Or at least rip Vulkan's usage of it out
05:30 jekstrand: keithp: In order to do that, I need to somehow deal with the few implicit bits remaining on the system
05:31 jekstrand: keithp: That's what this is for: https://lists.freedesktop.org/archives/dri-devel/2020-March/258833.html
05:32 jekstrand: keithp: Even longer term, I'm hoping we can substantially rework the i915 UAPI to completely separate residency management, command submission, and synchronization.
05:32 krh: jekstrand: when you get the ocean boiling, let me know, I've got some noodles to cook
05:32 jekstrand: keithp: Short version: No more lists of buffers passed into execbuffer2; it becomes O(1)
05:33 anholt_: jekstrand: that intro message sure sounds great!
05:33 jekstrand: anholt_: ?
05:34 anholt_: your dma-buf api extension
05:34 keithp: jekstrand: I thought that was well underway though; with separate address spaces for each client, you don't need relocations anymore
05:34 anholt_: patch 3/3, not the intro I guess
05:35 jekstrand: keithp: We don't
05:35 anholt_: keithp: you don't need the relocations, but you need the buffer lists.
05:35 jekstrand: keithp: But we're still passing lists of buffers and those lists can get long.
05:36 keithp: for implicit sync? Or other?
05:36 jekstrand: keithp: For implicit sync and residency management
05:36 jekstrand: keithp: IMO, neither of which belongs in execbuf anymore
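A toy model of the split being proposed: residency is managed up front with bind/unbind calls, so the submit path takes no buffer list and becomes O(1). All names and the ioctl-free shape here are illustrative, not the real i915 uapi or the proposed VM_BIND interface:

```c
#include <assert.h>

/* Illustrative model: a per-context VM tracks which BO handles are
 * bound (resident); submission no longer walks a buffer list. */
#define MAX_RESIDENT 64

struct vm {
    unsigned resident[MAX_RESIDENT]; /* bound BO handles */
    int count;
};

static void vm_bind(struct vm *vm, unsigned handle)
{
    vm->resident[vm->count++] = handle; /* bounds checks elided */
}

static void vm_unbind(struct vm *vm, unsigned handle)
{
    for (int i = 0; i < vm->count; i++) {
        if (vm->resident[i] == handle) {
            /* swap-remove: order of the resident set doesn't matter */
            vm->resident[i] = vm->resident[--vm->count];
            return;
        }
    }
}

/* submit takes only a batch handle: no per-submit list to validate */
static int submit(struct vm *vm, unsigned batch)
{
    (void)vm;
    (void)batch;
    return 0; /* O(1): the kernel trusts the bound set */
}
```

The per-submit cost moves to bind/unbind time, which for most apps happens far less often than execbuf.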
05:36 krh: jekstrand: taking out cache management too?
05:36 jekstrand: krh: What cache management?
05:37 keithp: oh, because of eviction under memory pressure
05:37 krh: render cache flushing, sampler cache invalidation
05:37 keithp: how can you manage residency in physical memory from user space?
05:38 jekstrand: krh: Oh, we nuked domains a long time ago.
05:38 keithp: Or do you have working virtual memory for the GPU now?
05:38 krh: gpu caches that i915 have managed (by flushing and invalidating everything between batches)
05:38 jekstrand: keithp: We've had per-process page tables for a long time now. It's been working since Broadwell.
05:39 keithp: yes, but page faulting from the GPU?
05:39 jekstrand: keithp: Ha! No.
05:39 krh: you can swap in the null page
05:39 jekstrand: keithp: The kernel has to ensure everything for a process is accessible.
05:40 keithp: jekstrand: that either means 'everything', or 'we have to know which pages the ring might touch'
05:40 jekstrand: On integrated graphics, that means "not in swap"
05:40 jekstrand: keithp: Yup, it means everything
05:40 keithp: that's harsh
05:41 jekstrand: For integrated graphics, it means, make sure nothing is in swap. For discrete, you can have some stuff in VRAM and some stuff in system RAM and it's ok. Again, no swap.
05:41 airlied: does execbuffer fail with -EFAULT or something if everything isn't in place?
05:41 airlied: then userspace has to call another ioctl to make its stuff resident
05:41 jekstrand: keithp: But given that most apps are pretty good about keeping their memory usage down, you're probably touching every single GPU resource you own several times a second. If you're swapping, you're dead anyway.
05:41 keithp: that sounds like penalizing applications which put a lot of things in GPU space that aren't active
05:42 keithp: like, say, browsers
05:42 krh: like... compositors with a lot of windows
05:42 keithp: or, perhaps, X servers
05:42 jekstrand: Besides, soon we're going to have terabytes of non-volatile RAM and no one will care. :P
05:42 keithp: yeah
05:42 jekstrand: keithp: Possibly.
05:43 keithp: I'd say you should wait until you've got page faulting on the GPU before doing that
05:43 keithp: it's been promised for a long time; surely it will happen soon
05:43 airlied: pagefaulting still won't save you
05:43 jekstrand: Pagefaulting is going to save us from everything!
05:43 jekstrand: Or so I'm told
05:43 krh: I think most regular apps will work fine under this scheme
05:43 airlied: it'll just make it have more overheads, because it'll be page based :-P
05:43 keithp: airlied: sure, it means you can put GPU objects in swap and still run the ring without caring
05:44 krh: and the ones that don't (compositors, browsers) can be expected to do more hands-on management
05:44 jekstrand: Yup
05:44 airlied:wonders if we could do better than just following in WDDM's trail
05:44 krh: chrome already handles oom by killing tabs
05:44 airlied: like for instance the Linux nvidia driver based on their windows driver, never evicts things from VRAM
05:44 keithp: in the 'usual' case, you'll be all in-memory, and things will be fine. When you run that giant asciidoctor-pdf run to generate vkspec.pdf, you will be able to get memory for that while the screen is idle
05:44 krh: which frees up all resources for that tab and reloads when you switch back (swap to the cloud!)
05:45 airlied:gets the impression that one resource hog could pretty much block forward progress for everyone with WDDM
05:45 keithp: sounds like depending on reasonable user space behaviour for system performance
05:46 airlied: repeat execbuffer/page my stuff in a tight loop
05:46 jekstrand: keithp: A GL driver can always reduce its working set to only things that have been accessed recently.
05:46 keithp: (has anyone else here tried to build vkspec.pdf recently? got to 47G before I gave up)
05:46 krh: keithp: ouch
05:46 jekstrand: We wouldn't be getting rid of the ability to run with a subset. It's just more manually managed.
05:46 jekstrand: keithp: I never build the PDFs
05:46 keithp: jekstrand: you're thrashing the working set *just in case* someone else wants memory; that doesn't seem great
05:47 jekstrand: keithp: Right now, we're thrashing it every execbuf
05:47 keithp: I'd rather give something with a global view control over what's in physical memory
05:47 keithp: no, not when your working set fits in memory
05:48 airlied: working set as opposed to all your VMA usage
05:48 keithp: you're walking a long list of buffers checking to make sure they're still in memory
05:48 keithp: exactly
05:48 jekstrand: keithp: I think you may be misunderstanding the API. We've got patches out for a VM_BIND ioctl which binds ranges of BOs to the GPU address space. You can always unbind the stuff you don't need at the moment.
05:48 jekstrand: So it's not required to always be all
05:48 keithp: so unbound stuff can still be in GPU space?
05:48 jekstrand: I may not have made that clear
05:49 jekstrand: Sure, the kernel can leave it in VRAM if it wants
05:49 keithp: yes, that was not clear at all :-)
05:49 keithp: you just want apps to tell the kernel what objects are in their working set incrementally, instead of at each batch buffer
05:49 jekstrand: Sorry, for most apps, the working set will always be "all". Your favorite app isn't "most apps" :-P
05:50 keithp: firefox holds a *lot* of data in buffers
05:50 jekstrand: keithp: broad strokes, yes.
05:50 keithp: ok, that seems sensible
05:50 airlied:wonders for a well behaved app does that just mean we move the long lists to bind/unbind time
05:50 jekstrand: Yeah, and that's why GL is going to have to have fairly fine-grained residency management under this model.
05:50 keithp: could do exactly what the kernel does now, with less data transferred between user space and kernel :-)
05:50 jekstrand: Drop everything from the residency set that hasn't been used in 1s or so.
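The 1-second trim policy mentioned above could be sketched like this. Purely illustrative: a real driver would use a monotonic clock and the actual unbind ioctl, and `trim_residency` is a made-up name:

```c
#include <assert.h>

/* Sketch: walk the residency set and unbind anything not used within
 * the last second. Times are plain nanosecond counters here. */
#define AGE_LIMIT_NS 1000000000ull

struct resident_bo {
    unsigned long long last_use_ns;
    int bound;
};

static int trim_residency(struct resident_bo *bos, int n,
                          unsigned long long now_ns)
{
    int evicted = 0;
    for (int i = 0; i < n; i++) {
        if (bos[i].bound && now_ns - bos[i].last_use_ns > AGE_LIMIT_NS) {
            bos[i].bound = 0; /* would issue a VM_BIND-style unbind here */
            evicted++;
        }
    }
    return evicted;
}
```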
05:51 airlied:has vague memories of amdgpu having the bind/unbind step being just as horrible in the end
05:51 keithp: could just unmark everything not referenced from the current ring; add things before submit, pull things on retire
05:52 jekstrand: Part of the problem is that, thanks to things like VK_KHR_descriptor_indexing, we can't possibly know what's referenced from the current ring.
05:52 keithp: jekstrand: if the add/remove things were in-ring with commands, you'd have the same effect as the current code
05:53 jekstrand: Right
05:53 jekstrand: Vulkan today is passing the entire list of buffers on every execbuf because we don't actually know what all is used.
05:53 keithp: heck, make the ring shared between user and kernel and avoid syscalls for submit :-)
05:53 jekstrand: So fine-grained buffer lists at execbuffer2 time are 100% cost and 0% benefit
05:53 krh: yay
05:53 krh: keithp: been looking at io_uring?
05:54 jekstrand: keithp: That's the plan. :)
05:54 keithp: krh: of course
05:54 keithp: almost like the kernel is catching up to X here
05:54 keithp: we've had serialized 'syscalls' for over 30 years
05:54 krh: discovering async protocols... even down to being able to specify the fd for openat
05:55 krh: instead of openat returning the fd
05:55 keithp: when syscalls are expensive, we find ways to not use them, I guess
05:55 krh: just like how XIDs are managed on the client
05:56 keithp: yeah, syscall to get a range of available IDs
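The shared user/kernel submission ring idea, reduced to a toy single-producer/single-consumer queue; io_uring's SQ/CQ rings work on the same head/tail principle. This is a single-threaded sketch with no memory barriers, not a usable shared-memory implementation:

```c
#include <assert.h>

/* Toy SPSC ring: the producer queues commands in shared memory and the
 * consumer drains them, with no syscall per submission. */
#define RING_SIZE 8 /* must be a power of two */

struct ring {
    unsigned head, tail;
    int cmds[RING_SIZE];
};

static int ring_push(struct ring *r, int cmd)
{
    if (r->head - r->tail == RING_SIZE)
        return -1; /* full */
    r->cmds[r->head++ & (RING_SIZE - 1)] = cmd;
    return 0;
}

static int ring_pop(struct ring *r, int *cmd)
{
    if (r->head == r->tail)
        return -1; /* empty */
    *cmd = r->cmds[r->tail++ & (RING_SIZE - 1)];
    return 0;
}
```

A real shared ring needs acquire/release ordering on head and tail; the X protocol's request stream and io_uring both amortize the expensive crossing the same way.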
05:56 airlied: bring back the drm lock :-P
05:56 keithp: we had to add that to X when the server got robust enough to last more than a few hours :-)
05:57 krh: airlied: nah, I stuffed it and hang it on the wall
06:27 daniels: jekstrand: mm, not thought about swapchain taking over. the 'only one per surface' thing is really common when having two makes desired behaviour impossible to figure out
08:45 MrCooper: airlied: your memory is right, amdgpu had separate BO lists from the start, but it ended up being an overall loss at least for radeonsi, so it's now passing an explicit list to the CS ioctl every time
09:55 hakzsam: https://gitlab.freedesktop.org/hakzsam/mesa/-/jobs/1925515 --> no space left on device
09:57 daniels: hakzsam: yeah, one of the big packet runners was full. noticed that just before you said it and fixed it now, jobs are being retried
09:58 hakzsam: thanks
09:58 daniels: it seems like we finally have a real solution to the disk issues, so fingers crossed this won't happen at all beyond the weekend
09:58 hakzsam: great
09:59 daniels: i know it's frustrating, sorry - also a horrendous yak-shave full of blind alleys :\
10:00 hakzsam: no worries, I'm just reporting, I'm not frustrated at all :)
10:23 MrCooper: bnieuwenhuizen: FWIW, amdgpu can crank out towards 10K flips/s, so submitting a flip shouldn't take more than ~100us
10:29 MrCooper: actually, that covers at least one kernel->Xorg->client->Xorg->kernel round-trip
10:30 MrCooper: the actual flip submission to the kernel is likely just a fraction
11:14 pq: jekstrand, what stops you from having the wl_surface and the zwp_linux_surface_synchronization_v1 objects in the same struct?
12:50 bl4ckb0ne: could opengles3 run on lima hardware?
12:54 kisak: once it's figured out, yes? (Midgard+)
12:55 bl4ckb0ne: so there's no hardware restrictions, it's just not started?
12:56 kisak: https://en.wikipedia.org/wiki/Mali_(GPU) has where the proprietary driver ended up.
12:59 linkmauve: bl4ckb0ne, lima is only for Utgard, so no.
12:59 linkmauve: You’re looking at panfrost hardware.
13:00 kisak: linkmauve: oh sorry, my bad. I mentally folded them together as the same driver family
13:09 agd5f: airlied, already picked it up. I'll include it next week in my fixes
13:46 bl4ckb0ne: linkmauve: so there's no gles3 because of hardware limitations?
13:47 linkmauve: Yes.
13:47 linkmauve: Mali-4xx is basically designed around GLES2.
13:57 bl4ckb0ne: whats in gles3 that prevents an implementation? compute shaders?
13:58 linkmauve: No, those were made core in 3.1.
13:58 linkmauve: Have a look at the list of features from GLES3, most if not all of those will be missing in Mali-4xx GPUs.
13:59 linkmauve: Which features do you want especially?
16:19 bl4ckb0ne: linkmauve: do you have the link to the gles3 features?
16:19 bl4ckb0ne: i was just curious
16:19 bl4ckb0ne: maybe i would need instanced drawing but im already working on the extension for gles2
16:21 imirkin_: bl4ckb0ne: gles3 has all the GL3 texturing requirements, as well as integers. that's usually death for gles2 hw.
16:23 bl4ckb0ne: whats the texturing requirement?
16:24 imirkin_: like 100 diff things, but ... stuff like
16:24 imirkin_: - integer textures
16:24 imirkin_: - bias, explicit lod
16:24 imirkin_: - explicit derivatives
16:24 imirkin_: - texel fetch
16:25 imirkin_: that's all i got off the top of my head
16:25 imirkin_: oh, texture arrays.
16:25 imirkin_: and seamless cubemap sampling
16:25 imirkin_: like i said, "lots of stuff" :)
16:27 bl4ckb0ne: could it be ported to gles2 with extensions? or its hardware specific?
16:28 imirkin_: sure, you could make an extension that's like GL_EXT_gles3_texturing
16:28 imirkin_: but only gles3 hw can do it
16:54 vsyrjala: mripard: mlankhorst_: is drm-misc-next done for 5.7?
17:05 imirkin_: bl4ckb0ne: actually i dunno - i might have to take that back. i expect there is hardware out there which can do most of that, but not some stupid bit of gles3. like intel gen4/5, which doesn't do MSAA. but they're DX10 parts and have all the rest of it.
17:05 imirkin_: bl4ckb0ne: however when adding new extensions ... think about who's going to use them
17:05 imirkin_: what software developer will care about this, etc
18:14 bl4ckb0ne: it was just a question (regarding the extensions), I have an MR for what I need already
18:15 imirkin_: the larger point is that you can go around adding extensions which enable various functionality, but if no software uses those extensions, that's all for naught
18:20 bl4ckb0ne: agreed
18:35 mlankhorst_: vsyrjala: not sure, not doing 5.7 :)
20:19 sravn: pinchartl: I bit the bullet and did the drm_encoder_init() to drm_encoder_init_funcs() rename. The big downside is that get_maintainers may pick me up far too often if this goes in :-)
20:22 pinchartl: sravn: :-)
20:42 danvet: Lyude, btw fixes you can just stuff into drm-misc-fixes ...
20:42 Lyude: danvet: yeah, but airlied asked me for that topic branch
20:42 danvet: ah ok, just wondered
20:49 krh: jekstrand: what's your take on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885/diffs?commit_id=36c53d2249f70ff1b289c73f3ce58ffb406c624e
20:50 krh: jekstrand: I'm wondering whether it should be rolled into one of the other bool lowering passes
20:50 krh: or if we're fine adding another
20:54 jekstrand: krh: Initial reaction is to agree with you
20:54 jekstrand: krh: What is it doing that's different from the int32 pass?
20:55 airlied: danvet: yeah linus was a bit rc size worried, so i wanted an option to keep mst fix separate
20:56 danvet: airlied, makes sense
21:00 krh: jekstrand: it's picking different bool sizes based on the srcs to the alu op
21:00 krh: but I can't explain why we wouldn't just always pick 16 bit bools in freedreno
21:02 jekstrand: krh: In isl, I chose to always do 16-bit bools
21:02 jekstrand: Well, sort-of
21:02 jekstrand: 16-bit is the default
21:03 jekstrand: But comparisons generate the same size bool as the thing being compared
21:03 jekstrand: Thanks to our regioning hardware, though, integer downcasts are free
21:03 krh: is isl?
21:04 krh: in isl?
21:05 jekstrand: IBC
21:05 jekstrand: sorry
21:05 krh: I suppose this pass does something similar - we run this after lowering precision and so most bools will be 16 bit
21:05 krh: but if we have full float comparisons this pass will lower to 32 bit bools
21:06 krh: jekstrand: do you use an ibc specific bool lowering pass?
21:07 jekstrand: Yeah, I did it all inside IBC
22:14 pinchartl: what would be the right name for a 10-bit greyscale 4CC ? DRM_FORMAT_R10 ?
22:14 pinchartl: there's DRM_FORMAT_R8 which, even if documented as "8 bpp Red", I have been told can be used for greyscale
22:19 danvet: pinchartl, R is kinda the new way of calling stuff in gl/vk, instead of I for intensity/greyscale
22:20 danvet: pinchartl, so yeah R10
22:27 vsyrjala: R? for grayscale doesn't seem really right to me unless you're talking about a shader and thus can swizzle it into whatever you want
22:28 vsyrjala: unfortunately we don't have display shaders
22:28 imirkin_: unfortunately??
22:29 vsyrjala: would be cool no?
22:29 imirkin_: zero downsides, i'm sure.
22:29 vsyrjala: no more "why didn't the hw designers make this just a little bit more generic?"
22:29 pinchartl: danvet: thanks
22:30 pinchartl: vsyrjala: so what would you recommend ?
22:31 vsyrjala: maybe a separate format? although i guess we could always decide R? is grayscale when it comes to display and document it as such
22:32 vsyrjala: and then cry when someone wants a retro friendly G8 format
22:32 vsyrjala: duh. clearly not retro enough. R2 seems better
22:33 vsyrjala: by which i meant G2 of course
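For reference, DRM fourccs are built with the fourcc_code() macro from drm_fourcc.h, and a DRM_FORMAT_R10 following the existing R8 pattern would look like the sketch below. The R10 define here is what such a definition would look like, not necessarily the upstreamed constant:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors fourcc_code() from include/uapi/drm/drm_fourcc.h: four ASCII
 * characters packed little-endian into a u32. */
#define fourcc_code(a, b, c, d)                              \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) |                  \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* existing format, documented as "8 bpp Red" (usable as greyscale) */
#define DRM_FORMAT_R8  fourcc_code('R', '8', ' ', ' ')
/* sketch of a matching 10-bit single-channel format */
#define DRM_FORMAT_R10 fourcc_code('R', '1', '0', ' ')
```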
22:43 danvet: vsyrjala, CTM is kinda a display shader
22:43 danvet: very simple display shader
22:43 danvet: like veeeerrrryyyyyyyyy simple
22:43 imirkin_: like NV_texture_shader
22:43 danvet: anyway we started with the R* stuff, better to stick to it
22:43 imirkin_: it's the beginning of the end
22:44 danvet: imirkin_, luckily displays seem to have stuck to the beginning for quite some time now
22:44 danvet: there is still hope
22:44 danvet: (let's ignore vc4)
22:44 imirkin_: lol
22:45 anarsoul: danvet: why?
22:48 danvet: anarsoul, vc4 is a fully programmable display pipeline, more or less
22:48 danvet: it's just a piece of firmware running on the display controller that does all the blending
22:48 anarsoul: I see
23:50 mattst88: how do I build deqp with GLES 1 support?
23:50 mattst88: -DDEQP_SUPPORT_GLES1=ON -DDEQP_GLES1_LIBRARIES=/usr/lib64/libGLESv1_CM.so is still leaving me with this in my cmake output:
23:50 mattst88: -- DEQP_SUPPORT_GLES1 = OFF
23:51 mattst88: craftyguy: ^
23:52 mattst88: I don't see anything special looking in https://gitlab.freedesktop.org/Mesa_CI/mesa_jenkins/-/blob/master/deqp/build.py
23:54 mattst88: looks like you have to build with -DDEQP_TARGET=x11_egl
23:54 mattst88: wonder what it defaults to, if not that