00:05 jenatali: Ugh why are all Windows GL apps terrible?
00:06 dwfreed: the operative word there is "Windows"
00:07 jenatali: I have an app that requires a GL pixel format present with PFD_SWAP_COPY (i.e. double-buffered but swapping is guaranteed to be a copy from back to front)
00:08 jenatali: Of course I don't think it actually cares about that, but if there isn't one, then it just issues drawing without a context bound
00:10 jenatali: Now the question... do I implement this "for real" and hope that nobody ever accidentally triggers it? Or do I use driconf to lie and say that's the swap method?
00:17 zmike: I'd driconf for now and implement for real if you hit it again
00:17 zmike: unless it's easy
00:17 zmike: but wsi is never easy
00:25 stefan11111: Hi. I noticed that EGL_DRIVER_NAME_EXT is not part of the epoxy egl headers. Creating the define manually works. Is this intended?
00:25 stefan11111: https://github.com/anholt/libepoxy/issues/318
00:26 anholt: sounds like epoxy hasn't been updated for that extension yet?
00:26 anholt: have you tried updating the xml?
00:40 karolherbst: jenatali: probably everybody copying from the same tutorial that was written in 1995
00:57 stefan11111: anholt: Done, and it seems to work. Hopefully I did it right: https://github.com/anholt/libepoxy/pull/320
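For reference, the manual workaround stefan11111 mentions can be sketched as a guarded define. The value 0x335E is the one the Khronos EGL_EXT_device_persistent_id extension is believed to assign to EGL_DRIVER_NAME_EXT; it is not stated in the log, so treat it as an assumption and check it against the registry's eglext.h. The guard makes the fallback a no-op once epoxy's generated headers provide the token.

```c
/* Fallback for EGL headers that predate EGL_EXT_device_persistent_id.
 * The value is an assumption here; verify it against the Khronos eglext.h. */
#ifndef EGL_DRIVER_NAME_EXT
#define EGL_DRIVER_NAME_EXT 0x335E
#endif
```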
07:14 eric_engestrom: PSA: mesa 26.0 branchpoint is likely in ~8h
07:15 eric_engestrom: don't wait until the last second to assign to marge, I'd like to avoid a 12h merge queue 😅
08:19 Company: MrCooper: have you ever considered the usefulness of GL_EXT_fragment_shading_rate for glamor (or Mutter even)?
08:20 Company: it looks useful for solid rounded rectangles which GTK has a lot of but not sure about X or compositors
08:21 MrCooper: I hadn't, interesting idea though
08:21 MrCooper: then again, it might not make any difference for simple shaders
08:22 Company: Mesa doesn't support it on GL at all yet, and not on lavapipe
08:22 MrCooper: since the bottleneck is elsewhere anyway
08:22 Company: fill rate is the bottleneck in GTK once we go to 4k screens
08:22 MrCooper: right, and this won't affect that, will it?
08:23 Company: I haven't written enough code to know
08:23 MrCooper: I mean, the bottleneck is probably memory bandwidth
08:23 Company: it depends how GPUs implement it I suppose
08:24 glehmann: If the shader doesn't need interpolation, radeonsi will use 2x2 vrs behind your back already
08:24 Company: if it reduces memory transfers, you win - if it just avoids the shaders, not so much
08:25 Company: if I turn it on unconditionally in GTK, I get ~40% faster rendering (on a 4k screen)
08:25 dj-death: on intel it just reduces fragment shader computation
08:26 Company: but it looks like https://i.imgur.com/24qGBp7.png
08:26 Company: so that's definitely an upper bound
09:06 Company: MrCooper: on my Radeon 6500 the upper bound for the speedup is 2% (so not the 40% from my TGL laptop)
09:07 Company: but this thing can only do 2x2 blocks per fragment shader, the Intel can do 4x4
09:08 MrCooper: looks consistent with what the others wrote above
09:30 austriancoder: gfxstrand: if you have some spare minutes, would be great if you could look at this single commit from an already reviewed MR. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38929/diffs?commit_id=635f22e8f252c49578d9e3281149be4f66781a7d
10:54 MrCooper: fun fact: a Mesa branch CI pipeline currently has exactly 500 jobs
11:00 eric_engestrom: "fun" but maybe not for everyone 😅
12:04 mareko: glehmann: well the goal is to remove tex instr handling from ACO and other places in favor of intrinsics
12:05 glehmann: what's the benefit?
12:05 mareko: code duplication removal
12:06 glehmann: are you talking about just txf or all texops?
12:06 mareko: txf and the other redundant ones
12:07 mareko: formatted buffer loads are implemented 6 times in AMD drivers (load_buffer_amd, tex, images) x (ACO, nir_to_llvm)
12:09 glehmann: I guess that would be fine, I just want to keep the sampling ops as they are right now
12:10 glehmann: although I guess it's a bit of a regression for vega, because images need to handle 3d descriptors unlike tex
12:10 mareko: what are 3d descriptors?
12:14 glehmann: it's about the descriptor not matching what the shader expects because there is no way to get a 2D slice view of a 3D image for all tilings on vega
12:15 glehmann: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd/compiler/instruction_selection/aco_select_nir_intrinsics.cpp#L1788 this code
12:17 mareko: that could be skipped for ACCESS_IMAGE_IS_TEX
13:14 karolherbst: Venemo: any reason that nir_opt_offsets only operates on 32 bit addresses? I'd like to use it for global memory operations where we support 24 bit constant offsets
13:15 karolherbst: So I'd like to support 64 bit constants that are still within the 32 bit limit drivers can advertise
13:16 karolherbst: could skip processing it in case uub has to be used, but the plain constant extraction should also work with 64 bits, no?
13:18 Venemo: karolherbst: sounds like a nice plan. you can extend it to support more stuff. it's just that when I wrote the pass the 32 bits were enough to cover all uses that I wanted to cover (and other people extended it a lot since then). that being said, AFAIR the main limitation is that nir_unsigned_upper_bound itself only supports 32 bits
13:19 karolherbst: right... I can see that becoming an issue if you want to extract it from an iadd(base, u2u64(iadd(offset, const_offset))) thing
13:20 karolherbst: but then again.. we can do the uub in 32 bits
13:20 karolherbst: and the 64 bit calc should be safe, because the load/store op should have infinite precision
13:20 karolherbst: or well.. it's UB anyway if the address overflows
13:20 Venemo: I think if you know that your HW supports 64 bit addresses and uses a 64 bit adder for offsets, then you may ignore unsigned wraps
13:21 Venemo: the issue is when that is not so clear cut
13:21 karolherbst: the 32 bit offset calc might still overflow, but I can use uub for it anyway. But yeah in hw it's all 64 bits and doesn't matter
13:21 Venemo: can add a flag to tell the pass that it just shouldn't care about wraps, and then it shouldn't be an issue
13:21 karolherbst: hw can even add a 64 bit uniform address with a 32 bit non uniform offset with a 24 bit constant
13:21 karolherbst: and it's all done with 64 bit precision
13:22 Venemo: for radv it was an issue because even though the hw supports 64 bit addresses, it adds the offsets using 32-bit arithmetic
13:22 karolherbst: so I need to check for overflows _if_ the offset gets calculated in 32 bit with a variable + a constant
13:22 karolherbst: ahh...
13:22 karolherbst: but yeah.. if it's all 64 bit I can ignore uub
13:22 Venemo: (I mean before it adds the 32 bit offset to the 64 bit address, it may calculate the offset itself in just 32 bits, AFAIR)
13:23 karolherbst: I think
13:23 karolherbst: yeah...
13:24 karolherbst: I should just do it in the callback I guess...
13:24 karolherbst: need to still come up with a great plan to do it all, but my initial plan was to add a case for nir_intrinsic_store_global_nv (and co) and just call into try_fold_load_store twice for both sources
13:25 karolherbst: the non uniform address can be either 32 or 64 bit, so that complicates things a little
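The wrap concern in the exchange above can be illustrated with plain integer arithmetic. This is a minimal sketch, not Mesa code: the function names are hypothetical, `addr_original` models iadd(base, u2u64(iadd(offset, const_offset))) with the inner add done in 32 bits, and `addr_folded` models hardware that adds the extracted constant with 64-bit precision. `fold_is_safe` stands in for the kind of check an unsigned upper bound on the variable offset would allow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address as written in the shader: the inner add is 32-bit and may wrap
 * before the result is zero-extended to 64 bits. */
static uint64_t addr_original(uint64_t base, uint32_t offset, uint32_t const_offset)
{
    uint32_t sum32 = offset + const_offset; /* wraps modulo 2^32 */
    return base + (uint64_t)sum32;
}

/* Address after the constant is folded into the memory instruction and
 * every add is performed with 64-bit precision. */
static uint64_t addr_folded(uint64_t base, uint32_t offset, uint32_t const_offset)
{
    return base + (uint64_t)offset + (uint64_t)const_offset;
}

/* Folding preserves the original result only if the 32-bit add could not
 * have wrapped, given an upper bound on the variable offset. */
static bool fold_is_safe(uint32_t offset_upper_bound, uint32_t const_offset)
{
    return (uint64_t)offset_upper_bound + const_offset <= UINT32_MAX;
}
```

With offset 0xFFFFFFF0 and constant 0x20 the two functions disagree, because the 32-bit sum wraps to 0x10. With any offset bounded by, say, 0x1000 they agree, and `fold_is_safe` reports that.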
13:48 alyssa: glehmann: we could fix some of this..
14:22 mlankhorst: jani, agd5f: Ack to push the first 4 patches in this series through drm-misc-fixes? https://patchwork.freedesktop.org/series/159261/
14:24 agd5f: mlankhorst, fine with me unless hwentlan__ has any concerns.
14:26 mlankhorst: sima: same for you since vkms is affected too?
14:36 jani: mlankhorst: ack, already replied: https://lore.kernel.org/r/513db214e2adcad6a70cea2461b7bfc26c2884db@intel.com
16:47 hakzsam: dcbaker: so which ones cause a problem?
17:08 zmike: is there a nir pass which splits gl_PerVertex struct output into separate variables?
17:20 glehmann: nir_split_struct_vars?
17:23 zmike: oh neat
17:23 zmike: thanks
17:27 dcbaker: hakzsam: "radv: fix capturing performance counters with SPM" is using functions that don't exist in 25.3 currently. the `radv_spm_trace_enabled()` is pretty straightforward to add (and the patch it's in might be a candidate for stable anyway?) the other two are ac_cmdbuf_ functions and I'm not sure on those
17:27 dcbaker: "ac/sdma: fix stencil only copies on GFX9" creates a very large diff, and I'm not sure what I'm getting makes sense
17:28 dcbaker: and "radv/sqtt: delay VMID reservation at capture time" has some weird interactions with 25.3.x because I had to make some changes to the patch it fixes to get it to apply, and I'm a bit lost on what to do
17:29 hakzsam: dcbaker: okay, I will try to provide a MR
17:30 dcbaker: hakzsam: thank you, I appreciate all of your help with stable!
17:47 cwabbott: can someone with a device running panfrost reproduce one of the crashes in https://gitlab.freedesktop.org/cwabbott0/mesa/-/jobs/91513208 and give me the output with `NIR_DEBUG=print`?
17:52 zmike: glehmann: hm this doesn't actually seem to work for gl_PerVertex though because it doesn't preserve location info
17:53 hakzsam: dcbaker: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39437
17:54 zmike: I guess I have to fix it
17:54 dcbaker: hakzsam: thank you!
17:54 hakzsam: np
18:04 alyssa: cwabbott: https://docs.mesa3d.org/drivers/panfrost/drm-shim.html
18:06 alyssa: possibly an internal afbc-p shader though
18:11 cwabbott: nah, given the test name it's probably a normal shader
18:14 zmike: actually no, it's nir_split_per_member_structs that I wanted
18:22 glehmann: nir has too many passes 🙃
18:22 zmike: yes
19:22 mareko: not enough, need moar
19:35 mareko: NIR is partially taking on the responsibility of what a machine IR would do
23:05 zmike: cmarcelo: maybe you're who to ask about this: why would vtn generate NIR like load_deref -> store_deref for vs outputs if the spirv only does stores?