00:05jenatali: Ugh why are all Windows GL apps terrible?
00:06dwfreed: the operative word there is "Windows"
00:07jenatali: I have an app that requires a GL pixel format present with PFD_SWAP_COPY (i.e. double-buffered but swapping is guaranteed to be a copy from back to front)
00:08jenatali: Of course I don't think it actually cares about that, but if there isn't one, then it just issues drawing without a context bound
00:10jenatali: Now the question... do I implement this "for real" and hope that nobody ever accidentally triggers it? Or do I use driconf to lie and say that's the swap method?
00:17zmike: I'd driconf for now and implement for real if you hit it again
00:17zmike: unless it's easy
00:17zmike: but wsi is never easy
00:25stefan11111: Hi. I noticed that EGL_DRIVER_NAME_EXT is not part of the epoxy egl headers. Creating the define manually works. Is this intended?
00:25stefan11111: https://github.com/anholt/libepoxy/issues/318
00:26anholt: sounds like epoxy hasn't been updated for that extension yet?
00:26anholt: have you tried updating the xml?
00:40karolherbst: jenatali: probably everybody copying from the same tutorial that was written in 1995
00:57stefan11111: anholt: Done, and it seems to work. Hopefully I did it right: https://github.com/anholt/libepoxy/pull/320
07:14eric_engestrom: PSA: mesa 26.0 branchpoint is likely in ~8h
07:15eric_engestrom: don't wait until the last second to assign to marge, I'd like to avoid a 12h merge queue 😅
08:19Company: MrCooper: have you ever considered the usefulness of GL_EXT_fragment_shading_rate for glamor (or Mutter even)?
08:20Company: it looks useful for solid rounded rectangles which GTK has a lot of but not sure about X or compositors
08:21MrCooper: I hadn't, interesting idea though
08:21MrCooper: then again, it might not make any difference for simple shaders
08:22Company: Mesa doesn't support it on GL at all yet, and not on lavapipe
08:22MrCooper: since the bottleneck is elsewhere anyway
08:22Company: fill rate is the bottleneck in GTK once we go to 4k screens
08:22MrCooper: right, and this won't affect that, will it?
08:23Company: I haven't written enough code to know
08:23MrCooper: I mean, the bottleneck is probably memory bandwidth
08:23Company: it depends how GPUs implement it I suppose
08:24glehmann: If the shader doesn't need interpolation, radeonsi will use 2x2 vrs behind your back already
08:24Company: if it reduces memory transfers, you win - if it just avoids the shaders, not so much
08:25Company: if I turn it on unconditionally in GTK, I get ~40% faster rendering (on a 4k screen)
08:25dj-death: on intel it just reduces fragment shader computation
08:25Company: but it looks like https://i.imgur.com/24qGBp7.png
08:26Company: so that's definitely an upper bound
09:06Company: MrCooper: on my Radeon 6500 the upper bound for the speedup is 2% (so not the 40% from my TGL laptop)
09:07Company: but this thing can only do 2x2 blocks per fragment shader, the Intel can do 4x4
09:08MrCooper: looks consistent with what the others wrote above
09:30austriancoder: gfxstrand: if you have some spare minutes, would be great if you could look at this single commit from an already reviewed MR. https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38929/diffs?commit_id=635f22e8f252c49578d9e3281149be4f66781a7d
10:54MrCooper: fun fact: a Mesa branch CI pipeline currently has exactly 500 jobs
11:00eric_engestrom: "fun" but maybe not for everyone 😅
12:04mareko: glehmann: well the goal is to remove tex instr handling from ACO and other places in favor of intrinsics
12:05glehmann: what's the benefit?
12:05mareko: code duplication removal
12:06glehmann: are you talking about just txf or all texops?
12:06mareko: txf and the other redundant ones
12:07mareko: formatted buffer loads are implemented 6 times in AMD drivers (load_buffer_amd, tex, images) x (ACO, nir_to_llvm)
12:09glehmann: I guess that would be fine, I just want to keep the sampling ops as they are right now
12:10glehmann: although I guess it's a bit of a regression for vega, because images need to handle 3d descriptors unlike tex
12:10mareko: what are 3d descriptors?
12:14glehmann: it's about the descriptor not matching what the shader expects because there is no way to get a 2D slice view of a 3D image for all tilings on vega
12:15glehmann: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/amd/compiler/instruction_selection/aco_select_nir_intrinsics.cpp#L1788 this code
12:17mareko: that could be skipped for ACCESS_IMAGE_IS_TEX
13:14karolherbst: Venemo: any reason that nir_opt_offsets only operates on 32 bit addresses? I'd like to use it for global memory operations where we support 24 bit constant offsets
13:15karolherbst: So I'd like to support 64 bit constants that are still within the 32 bit limit drivers can advertise
13:16karolherbst: could skip processing it in case uub has to be used, but the plain constant extraction should also work with 64 bits, no?
13:18Venemo: karolherbst: sounds like a nice plan. you can extend it to support more stuff. it's just that when I wrote the pass the 32 bits were enough to cover all uses that I wanted to cover (and other people extended it a lot since then). that being said, AFAIR the main limitation is that nir_unsigned_upper_bound itself only supports 32 bits
13:19karolherbst: right... I can see that becoming an issue if you want to extract it from an iadd(base, u2u64(iadd(offset, const_offset))) thing
13:20karolherbst: but then again.. we can do the uub in 32 bits
13:20karolherbst: and the 64 bit calc should be safe, because the load/store op should have infinite precision
13:20karolherbst: or well.. it's UB anyway if the address overflows
13:20Venemo: I think if you know that your HW supports 64 bit addresses and uses a 64 bit adder for offsets, then you may ignore unsigned wraps
13:21Venemo: the issue is when that is not so clear cut
13:21karolherbst: the 32 bit offset calc might still overflow, but I can use uub for it anyway. But yeah in hw it's all 64 bits and doesn't matter
13:21Venemo: can add a flag to tell the pass that it just shouldn't care about wraps, and then it shouldn't be an issue
13:21karolherbst: hw can even add a 64 bit uniform address with a 32 bit non uniform offset with a 24 bit constant
13:21karolherbst: and it's all done with 64 bit precision
13:22Venemo: for radv it was an issue because even though the hw supports 64 bit addresses, it adds the offsets using 32-bit arithmetic
13:22karolherbst: so I need to check for overflows _if_ the offset gets calculated in 32 bit with a variable + a constant
13:22karolherbst: ahh...
13:22karolherbst: but yeah.. if it's all 64 bit I can ignore uub
13:22Venemo: (I mean before it adds the 32 bit offset to the 64 bit address, it may calculate the offset itself in just 32 bits, AFAIR)
13:23karolherbst: I think
13:23karolherbst: yeah...
13:24karolherbst: I should just do it in the callback I guess...
13:24karolherbst: need to still come up with a great plan to do it all, but my initial plan was to add a case for nir_intrinsic_store_global_nv (and co) and just call into try_fold_load_store twice for both sources
13:25karolherbst: the non uniform address can be either 32 or 64 bit, so that complicates things a little
13:48alyssa: glehmann: we could fix some of this..
14:22mlankhorst: jani, agd5f: Ack to push the first 4 patches in this series through drm-misc-fixes? https://patchwork.freedesktop.org/series/159261/
14:24agd5f: mlankhorst, fine with me unless hwentlan__ has any concerns.
14:26mlankhorst: sima: same for you since vkms is affected too?
14:36jani: mlankhorst: ack, already replied: https://lore.kernel.org/r/513db214e2adcad6a70cea2461b7bfc26c2884db@intel.com
16:47hakzsam: dcbaker: so which ones cause a problem?
17:08zmike: is there a nir pass which splits gl_PerVertex struct output into separate variables?
17:20glehmann: nir_split_struct_vars?
17:23zmike: oh neat
17:23zmike: thanks
17:27dcbaker: hakzsam: "radv: fix capturing performance counters with SPM" is using functions that don't exist in 25.3 currently. the `radv_spm_trace_enabled()` is pretty straightforward to add (and the patch it's in might be a candidate for stable anyway?) the other two are ac_cmdbuf_ functions and I'm not sure on those
17:27dcbaker: "ac/sdma: fix stencil only copies on GFX9" creates a very large diff, and I'm not sure what I'm getting makes sense
17:28dcbaker: and "radv/sqtt: delay VMID reservation at capture time" has some weird interactions with 25.3.x because I had to make some changes to the patch it fixes to get it to apply, and I'm a bit lost on what to do
17:29hakzsam: dcbaker: okay, I will try to provide a MR
17:30dcbaker: hakzsam: thank you, I appreciate all of your help with stable!
17:47cwabbott: can someone with a device running panfrost reproduce one of the crashes in https://gitlab.freedesktop.org/cwabbott0/mesa/-/jobs/91513208 and give me the output with `NIR_DEBUG=print`?
17:52zmike: glehmann: hm this doesn't actually seem to work for gl_PerVertex though because it doesn't preserve location info
17:53hakzsam: dcbaker: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39437
17:54zmike: I guess I have to fix it
17:54dcbaker: hakzsam: thank you!
17:54hakzsam: np
18:04alyssa: cwabbott: https://docs.mesa3d.org/drivers/panfrost/drm-shim.html
18:06alyssa: possibly an internal afbc-p shader though
18:11cwabbott: nah, given the test name it's probably a normal shader
18:14zmike: actually no, it's nir_split_per_member_structs that I wanted
18:22glehmann: nir has too many passes 🙃
18:22zmike: yes
19:22mareko: not enough, need moar
19:35mareko: NIR is partially taking on the responsibility of what a machine IR would do
23:05zmike: cmarcelo: maybe you're who to ask about this: why would vtn generate NIR like load_deref -> store_deref for vs outputs if the spirv only does stores?