00:11idr: ngcortes, mareko: Yes... I am getting assertion failures in shader-db like:
00:11idr: error: (nir_src_is_const(*offset_src) && nir_src_as_uint(*offset_src) == 0) || offset_src->ssa->parent_instr->type == nir_instr_type_phi (../../SOURCE/master/src/compiler/nir/nir_validate.c:892)
00:11idr: That's in shaders/skia/1069.shader_test, so it should be relatively easy to reproduce on any system.
00:20daniels: idr: hrm, is that part of the regular shader-db tree?
00:21idr: daniels: Yes.
00:22idr: https://gitlab.freedesktop.org/mesa/shader-db/-/blob/master/shaders/skia/1069.shader_test
00:23idr: I can provide details on how to run that with intel_stub_gpu on any computer if that would help.
00:23Kayden: seems like it's something in the common blend equation advanced lowering going awry
00:26daniels: idr: even better if you could provide a MR which would make .gitlab-ci/run-shader-db.sh hit that, because we do run shader-db/shaders against intel_stub_gpu on pre-merge, so this should not have happened ...
00:26idr: daniels: Oh... Okay. I'll add that to my todo list for this week.
00:26idr: Thanks for the suggestion.
00:27daniels: idr: np, please feel free to ping me if you're stuck trying to reconcile 'it works on my machine' vs. 'CI??' issues
00:28idr: Roger that.
00:32Kayden: have a fix
00:33daniels: Kayden: please don't forget the pre-merge aspect!
00:54Kayden: idr, daniels, mareko, ngcortes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38231 probably fixes it
00:55Kayden: daniels: I upgraded the CI shader-db to run on HSW, BDW, SKL, and MTL instead of just SKL. That should cover crocus, the broadwell regression, and more-recent HW. Hopefully that's not too much though? 🤔 nouveau seems to run on 7 chipsets.
00:56Kayden: (crocus also regressed from this)
00:57idr: I'll check it in the morning. I'm leaving shortly.
00:57Kayden: 👍
01:03Kayden: if there were any VK failures that would not fix them though
01:05daniels: Kayden: wow, rapid - thanks! the crocus gens are one thing, but given that you're running for mtl, is there any benefit to running for cml/adl/etc, or is it all kind of same same?
01:05daniels: (also, we should probably at this point figure out how to parallelise shader-db and/or put it into deqp-runner)
01:28Mangix: anyone know about VCN 2.0 not exposing encoders?
01:30Mangix: vainfo is not exposing VAEntrypointEncSlice
02:10Kayden: daniels: I think skl/kbl/cml/cfl/adl/whl/aml are all basically the same. if I were to add anything else, I might add one of lnl/bmg
02:10Kayden:forgot cml was a -thing-
02:10Kayden: (on second thought let's not include camelake, 'tis a silly place. oh right, -comet- lake)
02:16ccr:dances merrily, singing something about Camelot
02:16ccr: "it's just a model."
08:30MrCooper: silurian_invader: if some flip completion events incorrectly have the same sequence number as the previous one, that might throw off user space
08:48jani: tzimmermann: mlankhorst: mripard: is alex's ack enough to merge https://lore.kernel.org/r/cover.1761681968.git.jani.nikula@intel.com or do we need proper r-b?
08:50tzimmermann: jani, your cover letter is 'brief' :)
08:51tzimmermann: go ahead with merging AFAIC
08:52tzimmermann: jani, in case you missed it: there was another report about drm_print fallout in hypervdrm
09:32jani: tzimmermann: just making it quicker to read for you ;)
09:32jani: and thanks for the heads up about hypervdrm. I guess another kconfig option to hunt down and enable
09:33jani: I thought my x86/arm/arm64 configs were pretty comprehensive wrt DRM
09:57dj-death: does a NIR pattern like this: begin_interlock; barrier(memory_modes=0); make the barrier essentially useless?
09:59pendingchaos: if the barrier has execution_scope=invocation and memory_modes=0, then I think it's a no-op
10:04dj-death: @begin_invocation_interlock
10:04dj-death: @barrier (execution_scope=NONE, memory_scope=DEVICE, mem_semantics=ACQ, mem_modes=0)
10:07dj-death: it's a bit awkward because there is invocation interlock and the barrier but also the ssbo accessed in the critical section is marked coherent
10:07dj-death: so it all sounds like some stuff could be dropped
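The condition pendingchaos describes can be sketched as a small predicate. This is only a model of the statement in the chat, not NIR's actual API: the `Scope` enum, the `Barrier` class, and `barrier_is_noop` are hypothetical names, and treating `execution_scope=NONE` the same as `invocation` is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Scope(IntEnum):
    # Hypothetical ordering that only mirrors the scope names used in the chat.
    NONE = 0
    INVOCATION = 1
    SUBGROUP = 2
    WORKGROUP = 3
    DEVICE = 4


@dataclass(frozen=True)
class Barrier:
    execution_scope: Scope
    memory_scope: Scope
    memory_modes: int  # bitmask of affected variable modes; 0 means none


def barrier_is_noop(b: Barrier) -> bool:
    """True when the barrier neither synchronizes execution across
    invocations nor orders any memory accesses."""
    # Assumption: NONE is at least as weak as INVOCATION.
    no_execution_sync = b.execution_scope <= Scope.INVOCATION
    # With no memory modes selected, memory_scope has nothing to apply to.
    no_memory_ordering = b.memory_modes == 0
    return no_execution_sync and no_memory_ordering


# The barrier dj-death pasted: execution_scope=NONE, memory_scope=DEVICE, mem_modes=0
print(barrier_is_noop(Barrier(Scope.NONE, Scope.DEVICE, memory_modes=0)))
```

Under these assumptions the pasted barrier is classified as a no-op, which matches pendingchaos's reading. Whether the interlock and the coherent SSBO qualifier make anything else redundant is a separate question.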
15:11silurian_invader: MrCooper: yeah, but the race described in the comment on drm_crtc_arm_vblank_event would make it so the event/sequence happens one frame later than it should
15:11silurian_invader: but there's no possibility of colliding sequences because we can't start a second atomic update until the previous update has been committed
15:12silurian_invader: and shouldn't drivers all be doing something like what meson does?
15:14silurian_invader: e.g. atomic_update only recalculates registers and doesn't touch them and the actual programming happens in the vblank interrupt
15:15silurian_invader: otherwise wouldn't the updates not actually be atomic (without a "go" bit)?
15:18pq: I believe there is also hardware that latches registers on vblank. Latching is "fast", writing them maybe not quite?
15:18sima: silurian_invader, you can also do vblank evasion like intel does
15:19silurian_invader: pq: yeah, that's what I mean by a "go" bit
15:19silurian_invader: (to use the arm_vblank_event terminology)
15:20pq: maybe there is a "go" bit, maybe not, maybe the "go" bit itself is immediate rather than waiting for vblank. I dunno.
15:20silurian_invader: sima: but doesn't that mean you effectively can't use the drm helpers?
15:21sima: well not the plane update parts
15:21sima: that's why they're modular
15:22silurian_invader: ok, so you'd sub out drm_atomic_helper_commit_tail with a version that calls your own version of drm_atomic_helper_commit_planes that does everything during vblank?
15:24silurian_invader: or maybe just wait for the vblank interrupt in atomic_begin?
15:54austriancoder: I just did my first `dim push-branch drm-misc-next` - I hope I did not break anything
16:02phasta: dim catches / prevents most possible breakages. That's kind of why it's there
16:08alyssa: not sure if pixelcluster is on IRC around now but I'm planning to finish off & land sparse live sets
16:10alyssa: needs some work on Intel
16:36alyssa: compile-time win on Intel is looking good tho
16:43alyssa: ..crash :(
16:54alyssa: It might help if I built anv (:
16:57pac85: This function here https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/compiler/nir/nir_lower_tex.c?ref_type=heads#L160 is lowering texture offset without accounting for texture mip maps? I wonder if there is a reason it doesn't matter. I feel like lowering texture offset is impossible in general (due to aniso)
16:58alyssa: pac85: yes that lowering is broken and so are any drivers using it
16:58alyssa: for anything other than txf
16:58alyssa: or maybe tg4
16:58alyssa: I think intel's ok because it only uses it for tg4
16:59pac85: alyssa: ah, so in my uber shaders stuff (yes I'm bringing it back to life) I need to run it unconditionally to be able to then handle clamping coords dynamically and that breaks a ton of tests. I'm not sure what to do
17:00pac85: I'm sure I could write a pile of nir to pass CTS since it probably doesn't care about aniso specifics but it would technically still be incorrect due to invariance
17:01alyssa: I don't know what you're doing but it sounds like it can't work, no
17:06pac85: very simply, I have a version of `saturate_src` that is controlled dynamically through push constants. `saturate_src` needs offset to be lowered and normally both of those are only called if needed. This means that offsets are only broken when GL_CLAMP or whatever is used and I guess CTS doesn't notice that. However in my dynamic version all the lowering runs unconditionally and that makes the brokenness surface.
17:08pac85: uh I have a cursed idea, I can have my own version of lower_offset that can also be controlled dynamically so I can disable it when not needed too and replicate the exact same kind of broken that normally passes CTS...
17:19alyssa: I highly suggest, not doing that
17:20alyssa: VK_EXT_gl_clamp for everybody /o\
17:20pac85: lol
17:20alyssa: i've considered it
17:20pac85: I didn't miss dealing with impossible to correctly emulate gl stuff
17:21pac85: (or maybe I did since sometimes it is kinda fun but not this time lol)
17:52HdkR: While I'm looking at host_image_copy on Turnip, are there any other platforms I should look at that have a smaller tile format that needs swizzling? PVR, Mali, Asahi, Broadcom...Vivante?
17:56HdkR: I'm guessing the traditional x86 focused drivers are likely to already have CPU optimized paths, so they aren't too spicy.
17:58mareko: optimized? not at all
17:58HdkR: Well, Radeon at least hasn't caused Silksong heartburn like Turnip did a few days ago :D
17:59cwabbott: radeon probably doesn't return optimal for cpp=1 like we do
18:00cwabbott: also radeon has the async compute and copy queues which apps generally use instead of HIC
18:00cwabbott: and it's not even exposed by default due to being too slow
18:01alyssa: the asahi routines are exceedingly poorly optimized
18:01HdkR: Ah right, I didn't try to enable HIC on radeon
18:02alyssa: because the GL driver blits with the GPU most of the time and only really uses the s/w routines for compressed texture upload (and is fast enough to not be a bottleneck there)
18:02alyssa: and by the time I got around to HIC in HK I ran out of f's to give (:
18:02alyssa: would be straightforwardish to NEON accelerate things etc but, effort.
18:03HdkR: Alright, I'll throw Asahi on the list to check
18:03alyssa: check what?
18:04HdkR: atomic loadstores to uncached memory causing things that use HIC to fall off a cliff in performance.
18:04alyssa: oh, jeez, uh yeah that might hurt on both asahi & panfrost
18:05jannau: is the stray VK_EXT_host_image_copy comment in hk_get_device_properties() https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/asahi/vulkan/hk_physical_device.c#L906 there on purpose?
18:05HdkR: It caused Turnip to hit a code path that was 500x slower when TSO-emulation was enabled, making SilkSong's intro video run at something like 0.5 FPS
18:07HdkR: I'll slap Panvk on the list as well
18:08alyssa: jannau: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/asahi/vulkan/hk_physical_device.c#L1048 should be in there as designated init, I copypasted from turnip too hard I guess
18:08alyssa: not that it matters
18:08jannau: access to uncached memory itself is already very slow as https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37655 showed
18:09HdkR: Yea, multiply that last number by 500x :P
18:10HdkR: Although Asahi platforms have hardware TSO, so as long as the user has the kernel patch to enable that, it would be fine.
18:11HdkR: Regular slow versus unusably slow.
18:14jannau: I haven't noticed anything while starting Silksong on asahi in TSO mode but I think I didn't pay attention to the intro video
18:14HdkR: Yea, if it is as bad as turnip then it becomes slow once the FMV starts
18:19HdkR: But I guess it would be fine, because hardware TSO converts atomic stores into str :D
21:54alyssa: is there any prior art for a fossildb stat for "how long this pipeline took to compile?" and then fossil-report showing which apps are sped up/slowed?
21:55alyssa: the .csv has PSO wall duration already, so just need python for it?
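A minimal sketch of the kind of Python alyssa is asking about: sum the per-pipeline wall duration per app for two runs and report the relative change. The column names `app` and `pso_wall_duration` are assumptions for illustration, not the real CSV header, and this is not part of fossil-report.

```python
import csv
import io
from collections import defaultdict


def total_duration_per_app(csv_file, app_col="app", duration_col="pso_wall_duration"):
    """Sum the compile duration of every pipeline row, grouped by app.

    The column names are hypothetical; adjust them to the actual CSV header.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row[app_col]] += float(row[duration_col])
    return dict(totals)


def compare(before, after):
    """Return {app: percent change} for apps present in both runs.

    Negative values mean the app compiles faster in the 'after' run.
    """
    return {
        app: (after[app] - before[app]) / before[app] * 100.0
        for app in before
        if app in after and before[app] > 0
    }


# Tiny in-memory example instead of real files.
before = total_duration_per_app(io.StringIO("app,pso_wall_duration\na,10\na,10\nb,5\n"))
after = total_duration_per_app(io.StringIO("app,pso_wall_duration\na,4\na,6\nb,5\n"))
for app, pct in sorted(compare(before, after).items()):
    print(f"{app}: {pct:+.1f}%")
```

Wall-clock durations are noisy, so a real report would likely want several runs per side and a threshold before calling something a speedup or slowdown.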
21:58jannau: Mary: have you recently run vulkan cts on asahi? I get "NIR validation failed after nir_lower_vars_to_ssa in ../src/asahi/compiler/agx_compile.c:3702" for transform_feedback and clipping.user_defined tests with 67a6fc01607f
22:22Lynne: are storage image loads expected to be faster than buffer loads nowadays on modern hardware?
22:23HdkR: Have you looked at sebbbi's `perftest` to see the difference?
22:38Mary: jannau: haven't tested since at least the start of last week...
22:45Lynne: HdkR: d3d11? I haven't touched windows in over 15 years.
22:49HdkR: Lynne: It has different names sure, but it's the same thing under the hood.
22:51Company: (apart from descriptor handling)
23:36jannau: Mary: your commit fb4010e64123 ("asahi: Update CI expectations") looks better (tests are still running). so possibly a regression. I'll bisect