IRC Logs of #dri-devel on irc.freenode.net for 2024-10-22

02:28 Lynne: "Function parameter Decoration not handled: SpvDecorationCoherent"
02:29 Lynne: does mesa not implement certain parts of the SPIRV spec?
02:31 Lynne: either glslang or mesa miscompiles a bda buffer, and I'm gathering evidence so I can exonerate myself in the future, just in case
02:37 Sachiel: from the spirv docs for that: Coherent is not allowed if the declared memory model is Vulkan.
02:42 Lynne: I'm using volatile
02:43 Lynne: oh no, does glslang miscompile, AND confuse volatile with coherent in some situations?
02:45 Sachiel: don't know, I'd expect it to respect the rules of the target-env you ask for it
03:33 DavidHeidelberg: Company: force GTK apps dev to use PinePhone 1 :D Problem solved :trollface:
03:35 Company: I think you guys first need to get the driver to actually rerender the screen fast enough
03:35 Company: because last someone checked, cairo was very competitive on that phone
03:36 Company: something like 50% faster for GTK4, with 6fps vs 4fps
07:15 mahkoh: What is the purpose of https://gitlab.freedesktop.org/mesa/mesa/-/blob/28655b26f5cd62c7e7c581e3eabe3244ce7d9769/src/egl/drivers/dri2/platform_wayland.c?page=2#L1717-1725
07:16 mahkoh: In certain places mesa will just loop on dispatching wayland messages if this field is set. Nowadays I'd expect this sync to be mostly unrelated to buffer release message.
07:17 mahkoh: I saw mesa emitting a bunch of these syncs in a row (without any other wayland messages) in an application I was testing.
07:22 mahkoh: (I'm suspecting that it was mesa. I didn't see any other involved code using wl_display_sync.)
07:23 mahkoh: https://gist.github.com/mahkoh/299fdc305d0a2b1d5d4e96ede92bf9e4
07:36 daniels: mahkoh: well it's right after a wl_surface_commit and other protocol requests in the snippet you posted, so you'd at least see those rather than just back-to-back syncs
07:36 daniels: it's there for two reasons: partly because older versions of some compositors used to not flush after sending wl_buffer.release events, so you'd need something else like frame or sync to get the release event out of its queue
07:36 mahkoh: True, I guess it's not mesa after all.
07:37 daniels: and secondly, because you want _some_ kind of limiting on how fast you issue commits, rather than just sending so many messages you fill up the sendq
07:37 daniels: (you'll only hit that path if you have eglSwapInterval(0) i.e. completely free-running client not using fifo/ct or frame)
07:37 mahkoh: But isn't this already throttled by the swapchain length? If you're committing faster than the compositor can handle, you'll eventually run out of buffers.
07:49 daniels: yeah, true
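
The back-pressure argument mahkoh makes above can be sketched with a toy model. This is not Mesa's code: the names `toy_swapchain`, `toy_acquire`, `toy_release` and the length of 4 are all hypothetical, chosen only to show that a free-running client stalls once every buffer is held by the compositor.

```c
#include <stdbool.h>

/* Hypothetical swapchain length; not taken from Mesa. */
#define TOY_SWAPCHAIN_LEN 4

struct toy_swapchain {
    /* true while the (imaginary) compositor still holds the buffer */
    bool busy[TOY_SWAPCHAIN_LEN];
};

/* Returns the index of a free buffer and marks it busy,
 * or -1 if every buffer is still held by the compositor. */
static int toy_acquire(struct toy_swapchain *sc)
{
    for (int i = 0; i < TOY_SWAPCHAIN_LEN; i++) {
        if (!sc->busy[i]) {
            sc->busy[i] = true;
            return i;
        }
    }
    return -1;
}

/* Models the compositor sending wl_buffer.release for one buffer. */
static void toy_release(struct toy_swapchain *sc, int idx)
{
    sc->busy[idx] = false;
}

/* Commits as fast as possible while the compositor never releases
 * anything; returns how many commits succeed before the client stalls. */
static int toy_commits_until_stall(void)
{
    struct toy_swapchain sc = { { false } };
    int commits = 0;

    while (toy_acquire(&sc) >= 0)
        commits++;
    return commits;
}
```

In this model the client can never get more than `TOY_SWAPCHAIN_LEN` commits ahead, and it can only continue after a `toy_release`, which is the throttling effect being discussed.
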
08:37 hakzsam: not sure what's happening in CI here https://gitlab.freedesktop.org/mesa/mesa/-/pipelines/1294903
08:41 daniels: hakzsam: hmm, I wonder if one of the shared runners is sick - python-test just died halfway through without reporting failure, then looked like it sort of spontaneously restarted
12:09 daniels: DavidHeidelberg: so what's the plan with the fedora-release job? is there something blocking us from disabling LTO in pre-merge?
15:13 ernstp: agd5f: is that a good list of patches for 6.11.5 stable...? only: always apply the powersave optimization & Only force workload setup on init
16:19 DavidHeidelberg: daniels: generally nothing, https://gitlab.freedesktop.org/dh/mesa/-/jobs/65303723 I think this could be solvable by a fedora images bump from 37 -> 39, but if we just add a workaround it's likely fine
16:20 daniels: sure, if that works then cool, or we could just disable r600 on Fedora, or add -Wno-error=array-bounds - I don't really mind which
16:22 DavidHeidelberg: I would be inclined to the latter, as this is a GCC bug (with LTO it's not printed, and I checked the code and it seems to be correct)
16:22 DavidHeidelberg: let me drop MR
16:23 daniels: thank you!
16:35 DavidHeidelberg: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31790 like, share, follow!
16:41 DavidHeidelberg: buy it, use it, break it, fix it, merge it.
17:32 agd5f: ernstp, the patches that have stable cc'ed on them in the -fixes PRs
19:24 legiauxmanau: So the moment you add a next power to low end bits, it is going to be uniquely encoded , you pass indexes as inputs to the second operand. So the encoder gives 443-72-256=115 to the 6th power in pseudo code. so the bottom encoder get's also unique even though we treated it differently dma stride for indexes is some X number, it would not need the fifo mode cause they are contiguous. so
19:24 legiauxmanau: the operand two has passthrough indexes it get's no modifications such as last element+1+powerbase i.e 32 or 64. The dma itself should look like this: write address of source whenever 115 has 1bit in, then 120 has 1 bit in. or 0 when 125 has zero bit in. now howto replace the 1's with 115 and 1 with 120 and 0 with 0, it is possible only when something overflows or corrupts , but howto
19:24 legiauxmanau: make it overflow at 120, so it needs to access some register that increments the counter that says it tranferred nothing, but resumes from 120. in other words source access register needs to be at 120 and bytes transferred none, so source is not incremented, so that can be realized if you have source incr counter whatever the dest counter is, such as 120 got 0, source stride is now 0 and
19:24 legiauxmanau: it repeats the source location with say address 120, so if another one never comes it graduates with whatever it had, so , but if dest counter strides to base+125, next time, the source is also 125 now that is some way of doing it. But there are others.
21:36 DavidHeidelberg: MrCooper: Heya, I'm a little bit concerned. It seems a lot of people are interested in Mesa performance, though probably none of them know how LTO/IPO works.
21:37 DavidHeidelberg: I was hoping to do the experiment of enabling LTO on the nightly pipeline, but it seems like no one is interested in helping out. I'm getting convinced that maybe CPU perf doesn't matter that much to most people.
22:52 Company: DavidHeidelberg: my guess would be that Mesa doesn't benefit too much from LTO because it's (a) only a passthrough layer from app to kernel, so there's not much to gain from inlining/reordering things and (b) due to its heavy vfuncification in the relevant places not likely to gain a lot from LTO
22:55 Company: I played a bit with LTO in GTK because I had hoped that it inlines all the getters we have, but it didn't help much - the big difference for GTK was around -O2 and maybe a bit from -O3 which some distros tend to forget because they do buildtype=plain and then screw up the flags
22:55 DemiMarie: Company: how much would there be to gain from dropping the vfuncs and going to switch statements instead?
22:55 Company: that was an issue in the early days of meson in particular, no idea if that's still a problem
22:56 DemiMarie: For those who want long-running Vulkan compute and don’t want to wait for preempt fences integration with WSI, could there be a VK extension that, when enabled, enables long-running compute and breaks WSI, unless one has new drivers or the compositor offers drm_syncobj?
22:57 Company: DemiMarie: no idea, but I'd expect the gain to be when you can avoid vfuncs - or prove to the compiler that there's only one possible function ever so it can optimize out the vfunc
22:57 Company: but no idea if any compiler can do that
22:57 DemiMarie: Company: the idea is to change the source code so that it uses enums and switches instead of vfuncs
22:57 heat: compilers can devirtualize
22:57 DemiMarie: because CPUs hate indirect branches
22:58 DemiMarie: heat: can they turn vfuncs into switch/case when there are multiple possibilities?
22:58 heat: probably
22:58 heat: assuming it's a performance win of any kind
22:58 Company: you can play with godbolt if you're curious
22:58 heat: i don't know however if you can use the C struct stuff_ops pattern and still expect the compiler/LTO to do its thing... honestly never tried that
22:58 Company: but compilers do unexpected things all the time
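
The two dispatch styles being compared above (a `struct stuff_ops`-style table of function pointers versus an enum with a `switch`) can be sketched as follows. All names here (`shape`, `shape_ops`, `area_vfunc`, `area_switch`) are made up for illustration and do not come from Mesa or GTK. Whether a given compiler turns the first form into the second is exactly the open question in the discussion, so the sketch only shows the source-level difference.

```c
struct shape;

/* Style 1: the "ops struct" pattern. Each call goes through a
 * function pointer, i.e. an indirect branch at runtime. */
struct shape_ops {
    int (*area)(const struct shape *s);
};

/* Style 2: a tag that a switch statement can dispatch on. */
enum shape_kind { SHAPE_RECT, SHAPE_SQUARE };

struct shape {
    const struct shape_ops *ops; /* used only by the vfunc path */
    enum shape_kind kind;        /* used only by the switch path */
    int w, h;
};

static int rect_area(const struct shape *s)   { return s->w * s->h; }
static int square_area(const struct shape *s) { return s->w * s->w; }

static const struct shape_ops rect_ops   = { .area = rect_area };
static const struct shape_ops square_ops = { .area = square_area };

/* Indirect call: the target is only known through s->ops. */
static int area_vfunc(const struct shape *s)
{
    return s->ops->area(s);
}

/* Direct calls: every possible target is visible in the source,
 * so the compiler is free to inline each case. */
static int area_switch(const struct shape *s)
{
    switch (s->kind) {
    case SHAPE_RECT:   return rect_area(s);
    case SHAPE_SQUARE: return square_area(s);
    }
    return 0; /* not reached for valid kinds */
}
```

Both functions return the same result for a correctly initialised `struct shape`. Comparing the generated assembly for the two on godbolt, as suggested above, is one way to see what a particular compiler and optimisation level actually does.
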
22:58 DemiMarie: Regarding the VK ext idea: the other purpose would be for hardware that can't preempt (like Asahi) and therefore needs everything pinned longterm
22:59 DemiMarie: this ensures that everything is properly allocated and accounted for
23:00 DemiMarie: If your compositor supports syncobj or if the driver has preempt hooked up then things work, otherwise WSI fails
23:00 Company: it's also a thing that inlining often doesn't help much, because the slowness is not solved by doing less work but by having more relevant memory in L1
23:00 DemiMarie: How much of that overhead is NIR?
23:01 Company: at least that's the thing for most GTK code
23:02 Company: avoiding a pointer chase is worth more than reordering/inlining the instructions
23:02 heat: quite honestly depends on how the code is written
23:03 heat: heavy C++ code tends to benefit quite a lot, C code (particularly carefully written code like the linux kernel) usually doesn't get too much from LTO
23:03 heat: LTO inlining doesn't matter too much if you already static inline __always_inline'd everything into headers
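
A minimal sketch of the header pattern heat describes, assuming GCC or Clang. The macro `TOY_ALWAYS_INLINE` and the helpers `toy_clamp` and `toy_saturate_u8` are hypothetical stand-ins; the kernel's own `__always_inline` is defined in its headers and is not reproduced here.

```c
/* Stand-in for the kernel's __always_inline (assumption: GCC/Clang
 * attribute syntax; other compilers fall back to plain inline). */
#if defined(__GNUC__)
#define TOY_ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define TOY_ALWAYS_INLINE inline
#endif

/* Imagine this living in a header: every .c file that includes it
 * sees the full body, so the call can be inlined without LTO. */
static TOY_ALWAYS_INLINE int toy_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* A caller in some .c file; the clamp is expanded in place here. */
static int toy_saturate_u8(int v)
{
    return toy_clamp(v, 0, 255);
}
```

Because the body is already visible in every translation unit, cross-module inlining at link time has little left to add for helpers written this way, which is the point being made about carefully written C code.
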