00:11 idr: ngcortes, mareko: Yes... I am getting assertion failures in shader-db like:
00:11 idr: error: (nir_src_is_const(*offset_src) && nir_src_as_uint(*offset_src) == 0) || offset_src->ssa->parent_instr->type == nir_instr_type_phi (../../SOURCE/master/src/compiler/nir/nir_validate.c:892)
00:11 idr: That's in shaders/skia/1069.shader_test, so it should be relatively easy to reproduce on any system.
00:20 daniels: idr: hrm, is that part of the regular shader-db tree?
00:21 idr: daniels: Yes.
00:22 idr: https://gitlab.freedesktop.org/mesa/shader-db/-/blob/master/shaders/skia/1069.shader_test
00:23 idr: I can provide details on how to run that with intel_stub_gpu on any computers if that would help.
00:23 Kayden: seems like it's something in the common blend equation advanced lowering going awry
00:26 daniels: idr: even better if you could provide an MR which would make .gitlab-ci/run-shader-db.sh hit that, because we do run shader-db/shaders against intel_stub_gpu on pre-merge, so this should not have happened ...
00:26 idr: daniels: Oh... Okay. I'll add that to my todo list for this week.
00:26 idr: Thanks for the suggestion.
00:27 daniels: idr: np, please feel free to ping me if you're stuck trying to reconcile 'it works on my machine' vs. 'CI??' issues
00:28 idr: Roger that.
00:32 Kayden: have a fix
00:33 daniels: Kayden: please don't forget the pre-merge aspect!
00:54 Kayden: idr, daniels, mareko, ngcortes: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38231 probably fixes it
00:55 Kayden: daniels: I upgraded the CI shader-db to run on HSW, BDW, SKL, and MTL instead of just SKL. That should cover crocus, the broadwell regression, and more-recent HW. Hopefully that's not too much though? 🤔 nouveau seems to run on 7 chipsets.
00:56 Kayden: (crocus also regressed from this)
00:57 idr: I'll check it in the morning. I'm leaving shortly.
00:57 Kayden: 👍
01:03 Kayden: if there were any VK failures that would not fix them though
01:05 daniels: Kayden: wow, rapid - thanks! the crocus gens are one thing, but given that you're running for mtl, is there any benefit to running for cml/adl/etc, or is it all kind of same same?
01:05 daniels: (also, we should probably at this point figure out how to parallelise shader-db and/or put it into deqp-runner)
01:28 Mangix: anyone know about VCN 2.0 not exposing encoders?
01:30 Mangix: vainfo is not exposing VAEntrypointEncSlice
02:10 Kayden: daniels: I think skl/kbl/cml/cfl/adl/whl/aml are all basically the same. if I were to add anything else, I might add one of lnl/bmg
02:10 * Kayden forgot cml was a -thing-
02:10 Kayden: (on second thought let's not include camelake, 'tis a silly place. oh right, -comet- lake)
02:16 * ccr dances merrily, singing something about Camelot
02:16 ccr: "it's just a model."
08:30 MrCooper: silurian_invader: if some flip completion events incorrectly have the same sequence number as the previous one, that might throw off user space
08:48 jani: tzimmermann: mlankhorst: mripard: is alex's ack enough to merge https://lore.kernel.org/r/cover.1761681968.git.jani.nikula@intel.com or do we need proper r-b?
08:50 tzimmermann: jani, your cover letter is 'brief' :)
08:51 tzimmermann: go ahead with merging AFAIC
08:52 tzimmermann: jani, in case you missed it: there was another report about drm_print fallout in hypervdrm
09:32 jani: tzimmermann: just making it quicker to read for you ;)
09:32 jani: and thanks for the heads up about hypervdrm. I guess another kconfig option to hunt down and enable
09:33 jani: I thought my x86/arm/arm64 configs were pretty comprehensive wrt DRM
09:57 dj-death: does a NIR pattern like `begin_interlock; barrier(memory_modes=0);` make the barrier essentially useless?
09:59 pendingchaos: if the barrier has execution_scope=invocation and memory_modes=0, then I think it's a no-op
10:04 dj-death: @begin_invocation_interlock
10:04 dj-death: @barrier (execution_scope=NONE, memory_scope=DEVICE, mem_semantics=ACQ, mem_modes=0)
10:07 dj-death: it's a bit awkward because there is invocation interlock and the barrier but also the ssbo accessed in the critical section is marked coherent
10:07 dj-death: so it all sounds like some stuff could be dropped
15:11 silurian_invader: MrCooper: yeah, but the race described in the comment on drm_crtc_arm_vblank_event would make it so the event/sequence happens one frame later than it should
15:11 silurian_invader: but there's no possibility of colliding sequences because we can't start a second atomic update until the previous update has been committed
15:12 silurian_invader: and shouldn't drivers all be doing something like what meson does?
15:14 silurian_invader: e.g. atomic_update only recalculates registers and doesn't touch them and the actual programming happens in the vblank interrupt
15:15 silurian_invader: otherwise wouldn't the updates not actually be atomic (without a "go" bit)?
15:18 pq: I believe there is also hardware that latches registers on vblank. Latching is "fast", writing them maybe not quite?
15:18 sima: silurian_invader, you can also do vblank evasion like intel does
15:19 silurian_invader: pq: yeah, that's what I mean by a "go" bit
15:19 silurian_invader: (to use the arm_vblank_event terminology)
15:20 pq: maybe there is a "go" bit, maybe not, maybe the "go" bit itself is immediate rather than waiting for vblank. I dunno.
15:20 silurian_invader: sima: but doesn't that mean you effectively can't use the drm helpers?
15:21 sima: well not the plane update parts
15:21 sima: that's why they're modular
15:22 silurian_invader: ok, so you'd sub out drm_atomic_helper_commit_tail with a version that calls your own version of drm_atomic_helper_commit_planes that does everything during vblank?
15:24 silurian_invader: or maybe just wait for the vblank interrupt in atomic_begin?
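A toy model of the meson-style split silurian_invader describes: atomic_update only computes register values, and all of them are written together from the vblank interrupt, so scanout never sees a half-applied commit. This is a minimal sketch, not DRM code; the class and method names (ToyCrtc, atomic_update, vblank_irq) are hypothetical stand-ins, and the rule that a second commit cannot be staged before the first is applied mirrors the constraint mentioned above.

```python
class ToyCrtc:
    """Hypothetical model: atomic_update stages registers, the vblank IRQ applies them."""

    def __init__(self):
        self.hw_regs = {}       # register values the scanout hardware currently uses
        self.staged = None      # values computed by atomic_update, not yet written
        self.vblank_seq = 0     # bumped on every vblank interrupt
        self.flip_events = []   # vblank sequence numbers reported to user space

    def atomic_update(self, new_regs):
        # Only compute the new state here; the hardware is not touched.
        if self.staged is not None:
            raise RuntimeError("previous commit has not been applied yet")
        self.staged = dict(new_regs)

    def vblank_irq(self):
        self.vblank_seq += 1
        if self.staged is not None:
            # Write every staged register in one go inside the interrupt,
            # then report completion with this vblank's sequence number.
            self.hw_regs.update(self.staged)
            self.staged = None
            self.flip_events.append(self.vblank_seq)
```

Usage: after `crtc.atomic_update({"fb": 1})` the hardware state is unchanged until `crtc.vblank_irq()` runs. The model ignores pq's caveat: in real hardware the register writes must finish before active scanout resumes, which is what latching hardware or Intel-style vblank evasion addresses.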
15:54 austriancoder: I just did my first `dim push-branch drm-misc-next` - I hope I did not break anything
16:02 phasta: dim catches / prevents most possible breakages. That's kind of why it's there
16:08 alyssa: not sure if pixelcluster is on IRC around now but I'm planning to finish off & land sparse live sets
16:10 alyssa: needs some work on Intel
16:36 alyssa: compile-time win on Intel is looking good tho
16:43 alyssa: ..crash :(
16:54 alyssa: It might help if I built anv (:
16:57 pac85: This function here https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/compiler/nir/nir_lower_tex.c?ref_type=heads#L160 is lowering texture offset without accounting for texture mip maps? I wonder if there is a reason it doesn't matter. I feel like lowering texture offset is impossible in general (due to aniso)
16:58 alyssa: pac85: yes that lowering is broken and so are any drivers using it
16:58 alyssa: for anything other than txf
16:58 alyssa: or maybe tg4
16:58 alyssa: I think intel's ok because it only uses it for tg4
16:59 pac85: alyssa: ah, so in my uber shaders stuff (yes I'm bringing it back to life) I need to run it unconditionally to be able to then handle clamping coords dynamically and that breaks a ton of tests. I'm not sure what to do
17:00 pac85: I'm sure I could write a pile of nir to pass CTS since it probably doesn't care about aniso specifics but it would technically still be incorrect due to invariance
17:01 alyssa: I don't know what you're doing but it sounds like it can't work, no
17:06 pac85: very simply, I have a version of `saturate_src` that is controlled dynamically through push constants. `saturate_src` needs offset to be lowered and normally both of those are only called if needed. This means that offsets are only broken when GL_CLAMP or whatever is used, and I guess CTS doesn't notice that. However in my dynamic version all the lowering runs unconditionally and that makes the brokenness surface.
17:08 pac85: uh I have a cursed idea, I can have my own version of lower_offset that can also be controlled dynamically so I can disable it when not needed too and replicate the exact same kind of broken that normally passes CTS...
17:19 alyssa: I highly suggest, not doing that
17:20 alyssa: VK_EXT_gl_clamp for everybody /o\
17:20 pac85: lol
17:20 alyssa: i've considered it
17:20 pac85: I didn't miss dealing with impossible to correctly emulate gl stuff
17:21 pac85: (or maybe I did since sometimes it is kinda fun but not this time lol)
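A rough arithmetic illustration of the problem pac85 points at, assuming the lowering turns the integer texel offset into a normalized-coordinate shift by dividing it by the level-0 texture size (the helper names below are hypothetical). At mip level `lod` a texel is 2^lod times wider in normalized space, so a shift computed from level 0 is too small whenever a coarser level is sampled. With implicit LOD or anisotropic filtering the hardware picks the level, so the shader cannot compute the right shift; txf (explicit integer coordinates) and tg4 (which gathers from the base level) avoid this.

```python
def mip_extent(base_extent, lod):
    """Size of mip level `lod` along one axis, never smaller than 1."""
    return max(1, base_extent >> lod)


def offset_in_normalized_coords(texel_offset, base_extent, lod):
    """Shift that a texel offset should apply to a normalized coordinate at `lod`."""
    return texel_offset / mip_extent(base_extent, lod)


def naive_lowered_offset(texel_offset, base_extent):
    """Shift produced by scaling with the level-0 size, whatever level is sampled."""
    return texel_offset / mip_extent(base_extent, 0)
```

For a 256-texel-wide texture, a one-texel offset sampled at level 2 should move the coordinate by 1/64, but the level-0 scaling moves it by only 1/256; the two agree only at level 0.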
17:52 HdkR: While I'm looking at host_image_copy on Turnip, are there any other platforms with a smaller tile format that needs swizzling? PVR, Mali, Asahi, Broadcom...Vivante?
17:56 HdkR: I'm guessing the traditional x86 focused drivers are likely to already have CPU optimized paths, so they aren't too spicy.
17:58 mareko: optimized? not at all
17:58 HdkR: Well, Radeon at least hasn't caused Silksong heartburn like Turnip did a few days ago :D
17:59 cwabbott: radeon probably doesn't return optimal for cpp=1 like we do
18:00 cwabbott: also radeon has the async compute and copy queues which apps generally use instead of HIC
18:00 cwabbott: and it's not even exposed by default due to being too slow
18:01 alyssa: the asahi routines are exceedingly poorly optimized
18:01 HdkR: Ah right, I didn't try to enable HIC on radeon
18:02 alyssa: because the GL driver blits with the GPU most of the time and only really uses the s/w routines for compressed texture upload (and is fast enough to not be a bottleneck there)
18:02 alyssa: and by the time I got around to HIC in HK I ran out of f's to give (:
18:02 alyssa: would be straightforwardish to NEON accelerate things etc but, effort.
18:03 HdkR: Alright, I'll throw Asahi on the list to check
18:03 alyssa: check what?
18:04 HdkR: atomic loadstores to uncached memory causing things that use HIC to fall off a cliff in performance.
18:04 alyssa: oh, jeez, uh yeah that might hurt on both asahi & panfrost
18:05 jannau: is the stray VK_EXT_host_image_copy comment in hk_get_device_properties() https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/asahi/vulkan/hk_physical_device.c#L906 there on purpose?
18:05 HdkR: It caused Turnip to hit a code path that was 500x slower when TSO-emulation was enabled, making SilkSong's intro video run at something like 0.5 FPS
18:07 HdkR: I'll slap Panvk on the list as well
18:08 alyssa: jannau: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/asahi/vulkan/hk_physical_device.c#L1048 should be in there as designated init, I copypasted from turnip too hard I guess
18:08 alyssa: not that it matters
18:08 jannau: access to uncached memory itself is already very slow as https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37655 showed
18:09 HdkR: Yea, multiply that last number by 500x :P
18:10 HdkR: Although Asahi platforms have hardware TSO, so as long as the user has the kernel patch to enable that, it would be fine.
18:11 HdkR: Regular slow versus unusably slow.
18:14 jannau: I haven't noticed anything while starting Silksong on asahi in TSO mode but I think I didn't pay attention to the intro video
18:14 HdkR: Yea, if it is as bad as turnip then it becomes slow once the FMV starts
18:19 HdkR: But I guess it would be fine, because hardware TSO converts atomic stores into str :D
21:54 alyssa: is there any prior art for a fossildb stat for "how long this pipeline took to compile?" and then fossil-report showing which apps are sped up/slowed?
21:55 alyssa: the .csv has PSO wall duration already, so just need python for it?
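A minimal sketch of the "just need python" part of alyssa's question: sum per-pipeline compile wall time per app for a before and an after run, then report the percentage change for apps present in both. This does not use fossil-report itself; the column names `app` and `pso_wall_duration` are hypothetical placeholders for whatever headers the real CSV uses.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; adjust to the real CSV headers.
APP_COL = "app"
DURATION_COL = "pso_wall_duration"


def total_duration_per_app(csv_text):
    """Sum per-pipeline compile wall time for each app in one run."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[APP_COL]] += float(row[DURATION_COL])
    return dict(totals)


def compare(before_text, after_text):
    """Return {app: (before, after, percent_change)} for apps in both runs."""
    before = total_duration_per_app(before_text)
    after = total_duration_per_app(after_text)
    report = {}
    for app in sorted(before.keys() & after.keys()):
        b, a = before[app], after[app]
        pct = (a - b) / b * 100.0 if b else float("nan")
        report[app] = (b, a, pct)
    return report
```

Negative percentages mean the app's pipelines compiled faster. Wall-clock compile times are noisy, so repeated runs (or a median across them) would be needed before reading much into small changes.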
21:58 jannau: Mary: have you recently run vulkan cts on asahi? I get "NIR validation failed after nir_lower_vars_to_ssa in ../src/asahi/compiler/agx_compile.c:3702" for transform_feedback and clipping.user_defined tests with 67a6fc01607f
22:22 Lynne: are storage image loads expected to be faster than buffer loads nowadays on modern hardware?
22:23 HdkR: Have you looked at sebbbi's `perftest` to see the difference?
22:38 Mary: jannau: haven't tested since at least the start of last week...
22:45 Lynne: HdkR: d3d11? I haven't touched windows in over 15 years.
22:49 HdkR: Lynne: It has different names sure, but it's the same thing under the hood.
22:51 Company: (apart from descriptor handling)
23:36 jannau: Mary: your commit fb4010e64123 ("asahi: Update CI expectations") looks better (tests are still running). so possibly a regression. I'll bisect