IRC Logs of #dri-devel on irc.freenode.net for 2025-04-15

08:01 pq: melissawen, zamundaaa[m], re: missing kernel logs; what about the thing that the DRM flight-recorder idea became? The problem with that is integration, because people explicitly did not want a compositor to have access to that log.
09:11 dwt: Which thing is that? That sounds handy
09:12 pq: I'm not sure what it's called nowadays, would need to dig. I think it manifests through the ftrace framework? If it landed yet.
09:24 emersion: yeah, if it's privileged then it's a lot less useful...
09:24 emersion: a hack with dmesg is enough in that case
10:28 Lynne: zzoon: in case you're back, could you take a look at the hevc samples? also, https://files.lynne.ee/av1_intel_broken2.ivf and https://files.lynne.ee/av1_intel_broken.ivf
10:29 Lynne: broken2 causes a device lost on intel
10:29 Lynne: dj-death: also ping about the descriptor buffer issue
10:33 glehmann: has anyone ever written a reassociation NIR pass? something that can eliminate duplicate additions like this: a + (b + c), a + (b + d) -> (a + b) + c, (a + b) + d
10:34 glehmann: alyssa: ^ iirc you were working on something in that area for preambles
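
A minimal sketch of the rewrite glehmann describes, on a toy expression tree rather than real NIR. The tuple encoding and the helper names (`flatten_add`, `build_sum`, `reassociate_pair`) are hypothetical, and it assumes integer adds, where reassociation is exact:

```python
from collections import Counter

def flatten_add(expr):
    """Flatten a nested add tree into a multiset of leaf operands.

    An expression is a str (leaf) or a tuple ("add", lhs, rhs).
    """
    if isinstance(expr, str):
        return Counter([expr])
    op, lhs, rhs = expr
    assert op == "add"
    return flatten_add(lhs) + flatten_add(rhs)

def build_sum(operands):
    """Rebuild a left-leaning add tree from a non-empty operand list."""
    tree = operands[0]
    for operand in operands[1:]:
        tree = ("add", tree, operand)
    return tree

def reassociate_pair(x, y):
    """Rewrite two sums so their common operands form one shared subtree."""
    fx, fy = flatten_add(x), flatten_add(y)
    both = fx & fy                      # operands present in both sums
    common = sorted(both.elements())
    if len(common) < 2:
        return x, y                     # no add to share
    shared = build_sum(common)
    new_x = build_sum([shared] + sorted((fx - both).elements()))
    new_y = build_sum([shared] + sorted((fy - both).elements()))
    return new_x, new_y

x = ("add", "a", ("add", "b", "c"))     # a + (b + c)
y = ("add", "a", ("add", "b", "d"))     # a + (b + d)
new_x, new_y = reassociate_pair(x, y)
print(new_x)
print(new_y)
```

After the rewrite both sums contain the same `a + b` subtree, so a later CSE step could compute it once.
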
10:38 dj-death: Lynne: still no time to look at it atm
10:38 Lynne: sure, whenever you find time
10:39 dj-death: I hope this week
11:33 zzoon: Lynne: ok.. will look into them. is that av1 or hevc?
11:34 Lynne: both samples I linked are av1
11:34 zzoon: ok
11:34 Lynne: the hevc sample is https://files.lynne.ee/testsamples/hevc_scaling_list4.mkv if you need a link again
11:35 zzoon: ah right
12:37 pq: in GLSL 1.00 ES, if I have a uniform struct variable; if one field of the struct is active, does it imply that all fields of the struct are active even if not used?
12:44 bbrezillon: tzimmermann: looks like 266ab86ac1f5 ("drm/panthor: Test for imported buffers with drm_gem_is_imported()") is regressing panthor
12:44 bbrezillon: not sure why yet
12:45 tzimmermann: bbrezillon, see the discussion on dri-devel
12:45 tzimmermann: the test in drm_gem_is_imported is slightly incorrect
12:45 bbrezillon: ah, so that's a known issue
12:46 tzimmermann: see "drm/gem: Internally test import_attach for imported objects". additional feedback is welcome
12:52 bbrezillon: I'll have a look. Thanks for the pointer
12:59 alyssa: glehmann: I started trying, it's challenging though
12:59 alyssa: glehmann: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21578#note_2859237
13:06 sima: tzimmermann, I guess I'll reply there too since I'm partially responsible for that mess :-/
13:09 glehmann: alyssa: okay, that looks fundamentally different from what I want I think. basically, I want to reassociate the adds in shaders like this to get vectorized stores in the end: https://gitlab.freedesktop.org/freedesktop/snippets/-/snippets/7836
13:09 zmike: eric_engestrom: I am eagerly awaiting the branchpoint tomorrow
13:10 alyssa: glehmann: yeah, reassoc is just a big rats nest of heuristics that aren't well documented anywhere
13:10 alyssa: gcc's pass basically just says "we copied llvm"
13:10 alyssa: ..
13:10 alyssa: `f2e4m3fn` wtf?
13:11 glehmann: float 2 8bit float format, I'm not married to the name
13:12 glehmann: the issue is that there are at least 4 competing 8bit float formats, so I wanted to be precise
13:13 alyssa: i still have no idea what that means
13:14 alyssa: oh. exponent-4 mantissa-3. ok.
13:14 glehmann: yes and fn for finite
13:15 alyssa: agx supports immediate float srcs with exp-3 mant-4
13:15 pendingchaos: I don't think the load/store vectorizer requires addition reassociation
13:15 alyssa: competing indeed
13:16 pendingchaos: it looks through all the additions and obtains a constant offset and a sorted list of ssa defs and their multipliers
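
A toy model of the canonical form pendingchaos describes: walk the add chain, fold constants into one offset, and collect each SSA def with its multiplier in sorted order. This is a sketch under assumptions, not the actual Mesa vectorizer code; the tuple encoding and the names `canonical_offset` and `adjacent` are hypothetical:

```python
def canonical_offset(expr):
    """Return (constant, sorted [(ssa_def, multiplier), ...]) for an address.

    expr is an int (constant), a str (SSA def name),
    ("add", lhs, rhs), or ("mul", sub_expr, int_constant).
    """
    const = 0
    terms = {}

    def walk(e, scale):
        nonlocal const
        if isinstance(e, int):
            const += e * scale
        elif isinstance(e, str):
            terms[e] = terms.get(e, 0) + scale
        elif e[0] == "add":
            walk(e[1], scale)
            walk(e[2], scale)
        elif e[0] == "mul":
            walk(e[1], scale * e[2])
        else:
            raise ValueError(f"unsupported op: {e[0]}")

    walk(expr, 1)
    return const, sorted((name, mul) for name, mul in terms.items() if mul != 0)

def adjacent(addr_lo, addr_hi, size):
    """True if addr_hi starts exactly `size` bytes after addr_lo."""
    const_lo, terms_lo = canonical_offset(addr_lo)
    const_hi, terms_hi = canonical_offset(addr_hi)
    return terms_lo == terms_hi and const_hi - const_lo == size

# Two differently associated addresses: a + (b + 0) and (a + 1) + b.
addr0 = ("add", "a", ("add", "b", 0))
addr1 = ("add", ("add", "a", 1), "b")
print(canonical_offset(addr0), canonical_offset(addr1))
print(adjacent(addr0, addr1, 1))
```

Because the comparison happens on the flattened form, the way the adds are nested does not matter in this model.
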
13:17 glehmann: pendingchaos: do you have an idea why it doesn't work here then?
13:17 pendingchaos: no
13:17 pendingchaos: I don't expect all 3 to be vectorized (hw has no 3-byte stores), but I think the first two are supposed to be
13:20 glehmann: these were 8 8bit stores, and for some reason we end up with one 32bit, 2 8bit and one 16bit store
13:20 glehmann: instead of a single store
13:27 alyssa: glehmann: possibly we're getting stuck in a local optimum?
13:27 alyssa: wait no that shouldn't apply
13:27 alyssa: the middle 2 stores should indeed be vectorizing :/
13:28 alyssa: oh. i do see the reassociation issue now. ooooof.
13:28 alyssa: spicy.
13:33 alyssa: but indeed it looks like we should handle this ..
14:11 dj-death: Lynne: what do I need to compile rc_vk_test?
14:12 dj-death: Lynne: looks like there is a ffmpeg dependency
14:12 dj-death: Lynne: but meson is not requiring it
14:20 dj-death: Lynne: got it to build with master of ffmpeg
14:20 dj-death: Lynne: but now it's crashing in vulkan.c
14:20 dj-death: Lynne: 2914 s->extensions = ff_vk_extensions_to_mask(s->hwctx->enabled_dev_extensions,
14:20 dj-death: s->hwctx = 0xb
14:26 eric_engestrom: zmike: haha, any particular reason?
14:26 zmike: so much code delete
14:27 eric_engestrom: right, indeed
14:27 eric_engestrom: 🪓
14:40 eric_engestrom: zmike: you can already post the MRs now if you want, so that they get reviewed and are ready to merge the second 25.1 is branched :P
14:40 zmike: clover deletion has been up for literal years already
14:40 daniels: usually people are asking for more rather than less time before the branchpoint
14:41 zmike: we live in troubled times
14:44 eric_engestrom: iirc the clover mr has a couple of missing changes to be good to go (deleting pipe-loader is the one I remember off the top of my head)
14:45 zmike: I thought that was a followup
14:50 daniels: yeah
15:00 eric_engestrom: I don't like leaving dead code around, but sure
15:00 zmike: karolherbst is planning to delete a ton of other gallium stuff too
15:00 zmike: so it won't be left for very long
15:01 eric_engestrom: ack
15:11 llyyr: delete more code
15:21 alyssa: eric_engestrom: clover is the dead code ;)
15:31 jenatali: glehmann: If it helps, D3D is adopting that format as F8_E4M3 (per https://github.com/microsoft/hlsl-specs/blob/main/proposals/0029-cooperative-vector.md)
15:41 glehmann: jenatali: what will d3d require for larger f32 -> e4m3 conversions? max finite value or NaN?
15:41 jenatali: > float to float conversion is implementation dependent and preserves the value as accurately as possible
15:42 jenatali: > /// XXX TODO: Error handling for illegal conversions.
15:42 jenatali: :)
15:45 glehmann: also E4M3 and E5M2 are really not precise enough, there are two formats in the wild for each of them
15:46 jenatali: Oh? News to me but I'm not really in this space (besides helping with the WARP impl of that spec)
15:46 jenatali: Got a link or something I can look at?
15:46 glehmann: one E4M3 format has 0x80 as NaN, the other has 0xff/0x7f. The first format also uses a different bias for the exponent
15:47 glehmann: https://asawicki.info/articles/fp8_tables.php has all of the ones I know
15:47 jenatali: Thanks!
15:48 glehmann: rdna4 supports all of them, but the kernel driver decides which ones and in practice it's FLOAT8E4M3FN and FLOAT8E5M2. CDNA3 only has FLOAT8E4M3FNUZ and FLOAT8E5M2FNUZ
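
A short sketch of how the two E4M3 variants mentioned above decode the same byte differently. The NaN encodings follow glehmann's description; the exponent biases (7 for FN, 8 for FNUZ) are an assumption taken from the commonly published definitions of these formats, and `decode_e4m3` is a hypothetical helper name:

```python
import math

def decode_e4m3(byte, fnuz=False):
    """Decode one 8-bit value as FLOAT8E4M3FN or (fnuz=True) FLOAT8E4M3FNUZ."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if fnuz:
        if byte == 0x80:                # the single NaN; there is no negative zero
            return math.nan
        bias = 8                        # assumed bias for the FNUZ variant
    else:
        if exp == 0xF and man == 0x7:   # 0x7f and 0xff are NaN
            return math.nan
        bias = 7                        # assumed bias for the FN variant
    if exp == 0:                        # zero and subnormals
        return sign * (man / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - bias)

for b in (0x38, 0x7E, 0x7F, 0x80):
    print(hex(b), decode_e4m3(b), decode_e4m3(b, fnuz=True))
```

Neither variant has an infinity encoding, which is why the question of saturating to the max finite value versus producing NaN on overflow comes up.
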
15:50 dj-death: people know that nir_lower_io() is generating incorrect NIR ?
15:50 dj-death: like it's adding load_output in vertex shaders, but then the divergence analysis thinks it's not allowed
15:53 pendingchaos: dj-death: divergence analysis can fail for valid nir if it's not supported by the pass
15:53 pendingchaos: VS output loads can be removed by using nir_lower_io_to_temporaries(, , true, false) sometime before nir_lower_io
16:00 dj-death: pendingchaos: I guess my problem appears when I run with NIR_DEBUG
16:00 dj-death: pendingchaos: but thanks, I'll try to call that pass after lower_io to make sure it cleans things up
16:01 eric_engestrom: alyssa: I guess it's run-time dead code vs compile-time dead code (:
16:01 pendingchaos: NIR_DEBUG=extended_validation? I think that option broke a bit after divergence was made metadata
16:02 pendingchaos: you can remove nir_metadata_divergence from the nir_metadata_require() in nir_metadata_require_all()
16:05 glehmann: jenatali: will d3d not support coop matrix, and instead only coop vector? I know at some point coop matrix was planned for sm6.8 but cancelled/delayed
16:06 jenatali: Yeah matrix was planned for 6.8. Folks are trying to get it back on the roadmap, unclear if that'll be 6.9, 6.10, or 7.0 at this point though
16:07 glehmann: speaking of 7.0, do you know yet if it's structured or unstructured spirv?
16:07 jenatali: I sincerely hope it's structured
16:08 jenatali: I've asked that question myself and I don't think I've gotten a definitive answer but I think people are trending towards structured
16:09 glehmann: I hope so too :D
16:09 jenatali: glehmann: FYI: https://github.com/microsoft/hlsl-specs/issues/490
16:09 jenatali: Thanks for pointing that out :)
16:47 dj-death: pendingchaos: not helping unfortunately
16:47 jenatali: dj-death: He said before lower_io, not after
16:48 dj-death: jenatali: yeah I tried that after rereading ;)
16:48 dj-death: jenatali: now I'm left with copy_deref
16:48 jenatali: nir_lower_copies?
16:48 dj-death: which lower_locals_to_regs_block complains about
16:48 jenatali: Er, nir_lower_var_copies?
16:50 dj-death: nope
16:50 dj-death: because that adds back the load_output
16:50 dj-death: which again the divergence pass will complain about
16:54 jenatali: It shouldn't be copying from the output, only to it
16:55 dj-death: guessing I need to call it wayyyy before
16:57 dj-death: yeah that works
16:57 dj-death: really early
17:30 FireBurn: Hey is there something up with the ssh side of gitlab?
17:30 pendingchaos: use ssh.gitlab.freedesktop.org
17:31 pendingchaos: or update ~/.ssh/config: https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/2076
17:32 FireBurn: Ta
18:38 Lynne: dj-death: does that happen with git master of the test program?
18:38 Lynne: also, which ffmpeg version do you have
18:38 Lynne: there's definitely a dependency on ffmpeg in the meson code, so I'm confused
18:42 dj-death: Lynne: I took master of ffmpeg
18:42 dj-death: Lynne: and I think master of the test program too
18:43 Lynne: I did do a few updates to it since I linked it last week
18:43 dj-death: Lynne: apparently the ffmpeg from the system wasn't enough
18:43 dj-death: I'll try again tomorrow
21:53 jenatali: Is there a nir pass that can prune a loop that has no side effects?
21:55 pendingchaos: nir_opt_dead_cf
21:56 jenatali: That's not pruning it for me
21:56 jenatali: It's a complicated loop that culminates in conditionally breaking, but at no point does the loop ever have any observable side effects besides spending time
21:56 jenatali: No storage buffer/image writes or any other kind of data leaving the loop, so it's effectively dead
21:58 anholt_: what's keeping dce from cleaning up the ops in the loop?
22:00 jenatali: At the "end" of a long chain of instructions is a comparison which branches to a conditional break
22:00 pendingchaos: might be used as part of the break condition, I think nir_opt_dead_cf should handle that though
22:02 pendingchaos: it looks like non-reorderable ssbo/shared/global/output loads prevent the loop from being dce'd?
22:02 anholt_: oh, right.
22:04 pendingchaos: from: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938
22:19 jenatali: Ah... I see
22:21 jenatali: This is my horrible hacky TCS lowering where I have to split a TCS into two functions, one that handles all patch outputs and a different one that handles control points. I've got a TCS with no control point outputs so I would expect to end up with an empty function, but instead it's got a massive loop in it
22:35 jenatali: Yeah ok forcing load_output to undef apparently fixes all my problems. Still a horrible hacky solution but I don't have many other options