08:01pq: melissawen, zamundaaa[m], re: missing kernel logs; what about the thing that the DRM flight-recorder idea became? The problem with that is integration, because people explicitly did not want a compositor to have access to that log.
09:11dwt: Which thing is that? That sounds handy
09:12pq: I'm not sure what it's called nowadays, would need to dig. I think it manifests through the ftrace framework? If it landed yet.
09:24emersion: yeah, if it's privileged then it's a lot less useful...
09:24emersion: a hack with dmesg is enough in that case
10:28Lynne: zzoon: in case you're back, could you take a look at the hevc samples? also, https://files.lynne.ee/av1_intel_broken2.ivf and https://files.lynne.ee/av1_intel_broken.ivf
10:29Lynne: broken2 causes a device lost on intel
10:29Lynne: dj-death: also ping about the descriptor buffer issue
10:33glehmann: has anyone ever written a reassociation NIR pass? something that can eliminate duplicate additions like this: a + (b + c), a + (b + d) -> (a + b) + c, (a + b) + d
10:34glehmann: alyssa: ^ iirc you were working on something in that area for preambles
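The rewrite glehmann describes can be sketched in a few lines. This is a toy model, not a NIR pass: expressions are nested `("add", lhs, rhs)` tuples, leaves are plain strings, and the helper names (`leaves`, `rebuild`, `add_nodes`) are hypothetical. It assumes integer adds, where reassociation is exact, and it orders operands by name, which is a far cruder heuristic than the ranking a production pass would need.

```python
def leaves(expr):
    """Flatten a nested ("add", lhs, rhs) tree into its leaf operands."""
    if isinstance(expr, tuple) and expr[0] == "add":
        return leaves(expr[1]) + leaves(expr[2])
    return [expr]


def rebuild(expr):
    """Re-associate as a left-leaning chain over the sorted leaves."""
    ops = sorted(leaves(expr))
    out = ops[0]
    for op in ops[1:]:
        out = ("add", out, op)
    return out


def add_nodes(exprs):
    """Collect the distinct add nodes, i.e. what survives after CSE."""
    seen = set()

    def walk(e):
        if isinstance(e, tuple) and e[0] == "add":
            seen.add(e)
            walk(e[1])
            walk(e[2])

    for e in exprs:
        walk(e)
    return seen


# a + (b + c) and a + (b + d): no add is shared, 4 distinct adds in total.
before = [("add", "a", ("add", "b", "c")), ("add", "a", ("add", "b", "d"))]
# (a + b) + c and (a + b) + d: a + b is shared, 3 distinct adds in total.
after = [rebuild(e) for e in before]
```

Sorting the leaves is only one possible canonical order; it happens to expose `a + b` here, but picking an order that maximises sharing in general is the hard part.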
10:38dj-death: Lynne: still no time to look at it atm
10:38Lynne: sure, whenever you find time
10:39dj-death: I hope this week
11:33zzoon: Lynne: ok.. will look into them. is that av1 or hevc?
11:34Lynne: both samples I linked are av1
11:34zzoon: ok
11:34Lynne: the hevc sample is https://files.lynne.ee/testsamples/hevc_scaling_list4.mkv if you need a link again
11:35zzoon: ah right
12:37pq: in GLSL 1.00 ES, if I have a uniform struct variable; if one field of the struct is active, does it imply that all fields of the struct are active even if not used?
12:44bbrezillon: tzimmermann: looks like 266ab86ac1f5 ("drm/panthor: Test for imported buffers with drm_gem_is_imported()") is regressing panthor
12:44bbrezillon: not sure why yet
12:45tzimmermann: bbrezillon, see the discussion on dri-devel
12:45tzimmermann: the test in drm_gem_is_imported is slightly incorrect
12:45bbrezillon: ah, so that's a known issue
12:46tzimmermann: see "drm/gem: Internally test import_attach for imported objects". additional feedback is welcome
12:52bbrezillon: I'll have a look. Thanks for the pointer
12:59alyssa: glehmann: I started trying, it's challenging though
12:59alyssa: glehmann: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21578#note_2859237
13:06sima: tzimmermann, I guess I'll reply there too since I'm partially responsible for that mess :-/
13:09glehmann: alyssa: okay, that looks fundamentally different from what I want I think. basically, I want to reassociate the adds in shaders like this to get vectorized stores in the end: https://gitlab.freedesktop.org/freedesktop/snippets/-/snippets/7836
13:09zmike: eric_engestrom: I am eagerly awaiting the branchpoint tomorrow
13:10alyssa: glehmann: yeah, reassoc is just a big rats nest of heuristics that aren't well documented anywhere
13:10alyssa: gcc's pass basically just says "we copied llvm"
13:10alyssa: ..
13:10alyssa: `f2e4m3fn` wtf?
13:11glehmann: float 2 8bit float format, I'm not married to the name
13:12glehmann: the issue is that there are at least 4 competing 8bit float formats, so I wanted to be precise
13:13alyssa: i still have no idea what that means
13:14alyssa: oh. exponent-4 mantissa-3. ok.
13:14glehmann: yes and fn for finite
13:15alyssa: agx supports immediate float srcs with exp-3 mant-4
13:15pendingchaos: I don't think the load/store vectorizer requires addition reassociation
13:15alyssa: competing indeed
13:16pendingchaos: it looks through all the additions and obtains a constant offset and a sorted list of ssa defs and their multipliers
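A rough illustration of that kind of decomposition, as a sketch rather than the actual vectorizer code: address expressions are modelled as tuples (`("const", n)`, `("ssa", name)`, `("add", x, y)`, `("mul", x, constant)`), and `decompose` / `address_key` are hypothetical names. Because the result is a constant plus a sorted term list, the way the additions were associated in the input does not matter.

```python
from collections import Counter


def decompose(expr, scale=1):
    """Return (constant_offset, Counter{ssa_name: multiplier})."""
    kind = expr[0]
    if kind == "const":
        return expr[1] * scale, Counter()
    if kind == "ssa":
        return 0, Counter({expr[1]: scale})
    if kind == "add":
        c0, t0 = decompose(expr[1], scale)
        c1, t1 = decompose(expr[2], scale)
        t0.update(t1)  # Counter.update sums the multipliers per name
        return c0 + c1, t0
    if kind == "mul":  # only multiplication by a constant is modelled
        return decompose(expr[1], scale * expr[2])
    raise ValueError(f"unsupported node: {kind}")


def address_key(expr):
    """Constant offset plus a sorted (name, multiplier) tuple."""
    const, terms = decompose(expr)
    return const, tuple(sorted(terms.items()))


a, b = ("ssa", "a"), ("ssa", "b")
# a + (b + 4) versus (a + 5) + b: different association, same variable terms.
addr0 = address_key(("add", a, ("add", b, ("const", 4))))
addr1 = address_key(("add", ("add", a, ("const", 5)), b))
# (i + 1) * 4 becomes constant 4 with i scaled by 4.
addr2 = address_key(("mul", ("add", ("ssa", "i"), ("const", 1)), 4))
```

Two accesses whose term tuples match and whose constants differ by the access size are then candidates for merging.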
13:17glehmann: pendingchaos: do you have an idea why it doesn't work here then?
13:17pendingchaos: no
13:17pendingchaos: I don't expect all 3 to be vectorized (hw has no 3-byte stores), but I think the first two are supposed to be
13:20glehmann: these were 8 8bit stores, and for some reason we end up with one 32bit, 2 8bit and one 16bit store
13:20glehmann: instead of a single store
13:27alyssa: glehmann: possibly we're getting stuck in a local optimum?
13:27alyssa: wait no that shouldn't apply
13:27alyssa: the middle 2 stores should indeed be vectorizing :/
13:28alyssa: oh. i do see the reassociation issue now. ooooof.
13:28alyssa: spicy.
13:33alyssa: but indeed it looks like we should handle this ..
14:11dj-death: Lynne: what do I need to compile rc_vk_test?
14:12dj-death: Lynne: looks like there is a ffmpeg dependency
14:12dj-death: Lynne: but meson is not requiring it
14:20dj-death: Lynne: got it to build with master of ffmpeg
14:20dj-death: Lynne: but now it's crashing in vulkan.c
14:20dj-death: Lynne: 2914 s->extensions = ff_vk_extensions_to_mask(s->hwctx->enabled_dev_extensions,
14:20dj-death: s->hwctx = 0xb
14:26eric_engestrom: zmike: haha, any particular reason?
14:26zmike: so much code delete
14:27eric_engestrom: right, indeed
14:27eric_engestrom: 🪓
14:40eric_engestrom: zmike: you can already post the MRs now if you want, so that they get reviewed and are ready to merge the second 25.1 is branched :P
14:40zmike: clover deletion has been up for literal years already
14:40daniels: usually people are asking for more rather than less time before the branchpoint
14:41zmike: we live in troubled times
14:44eric_engestrom: iirc the clover mr has a couple of missing changes to be good to go (deleting pipe-loader is the one I remember off the top of my head)
14:45zmike: I thought that was a followup
14:50daniels: yeah
15:00eric_engestrom: I don't like leaving dead code around, but sure
15:00zmike: karolherbst is planning to delete a ton of other gallium stuff too
15:00zmike: so it won't be left for very long
15:01eric_engestrom: ack
15:11llyyr: delete more code
15:21alyssa: eric_engestrom: clover is the dead code ;)
15:31jenatali: glehmann: If it helps, D3D is adopting that format as F8_E4M3 (per https://github.com/microsoft/hlsl-specs/blob/main/proposals/0029-cooperative-vector.md)
15:41glehmann: jenatali: what will d3d require for larger f32 -> e4m3 conversions? max finite value or NaN?
15:41jenatali: > float to float conversion is implementation dependent and preserves the value as accurately as possible
15:42jenatali: > /// XXX TODO: Error handling for illegal conversions.
15:42jenatali: :)
15:45glehmann: also E4M3 and E5M2 are really not precise enough, there are two formats in the wild for each of them
15:46jenatali: Oh? News to me but I'm not really in this space (besides helping with the WARP impl of that spec)
15:46jenatali: Got a link or something I can look at?
15:46glehmann: one E4M3 format has 0x80 as NaN, the other has 0xff/0x7f. The first format also uses a different bias for the exponent
15:47glehmann: https://asawicki.info/articles/fp8_tables.php has all of the ones I know
15:47jenatali: Thanks!
15:48glehmann: rdna4 supports all of them, but the kernel driver decides which ones and in practice it's FLOAT8E4M3FN and FLOAT8E5M2. CDNA3 only has FLOAT8E4M3FNUZ and FLOAT8E5M2FNUZ
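To make the difference between the two E4M3 variants concrete, here is a small decoder sketch. The NaN encodings follow the description above (0x7f/0xff for the FN variant, 0x80 for the FNUZ variant); the exponent biases of 7 and 8 are an assumption taken from the commonly published definitions of these formats, and `decode_e4m3` is a hypothetical helper, not an API from any driver or spec.

```python
import math


def decode_e4m3(byte, fnuz=False):
    """Decode one 8-bit E4M3 value (1 sign, 4 exponent, 3 mantissa bits)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if fnuz:
        # FNUZ: the would-be negative zero encoding is the only NaN.
        if byte == 0x80:
            return math.nan
        bias = 8  # assumed bias for the FNUZ variant
    else:
        # FN: all-ones exponent and mantissa is NaN, for either sign.
        if (byte & 0x7F) == 0x7F:
            return math.nan
        bias = 7  # assumed bias for the FN variant
    if exp == 0:
        # Subnormal: no implicit leading one.
        return sign * (man / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - bias)
```

Under these assumptions the same byte decodes differently in each variant: 0x38 is 1.0 as FN but 0.5 as FNUZ, and the largest finite values are 448 (0x7e, FN) and 240 (0x7f, FNUZ).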
15:50dj-death: people know that nir_lower_io() is generating incorrect NIR ?
15:50dj-death: like it's adding load_output in vertex shaders, but then the divergence analysis thinks it's not allowed
15:53pendingchaos: dj-death: divergence analysis can fail for valid nir if it's not supported by the pass
15:53pendingchaos: VS output loads can be removed by using nir_lower_io_to_temporaries(, , true, false) sometime before nir_lower_io
16:00dj-death: pendingchaos: I guess my problem appears when I run with NIR_DEBUG
16:00dj-death: pendingchaos: but thanks, I'll try to call that pass after lower_io to make sure it cleans things up
16:01eric_engestrom: alyssa: I guess it's run-time dead code vs compile-time dead code (:
16:01pendingchaos: NIR_DEBUG=extended_validation? I think that option broke a bit after divergence was made metadata
16:02pendingchaos: you can remove nir_metadata_divergence from the nir_metadata_require() in nir_metadata_require_all()
16:05glehmann: jenatali: will d3d not support coop matrix, and instead only coop vector? I know at some point coop matrix was planned for sm6.8 but cancelled/delayed
16:06jenatali: Yeah matrix was planned for 6.8. Folks are trying to get it back on the roadmap, unclear if that'll be 6.9, 6.10, or 7.0 at this point though
16:07glehmann: speaking of 7.0, do you know yet if it's structured or unstructured spirv?
16:07jenatali: I sincerely hope it's structured
16:08jenatali: I've asked that question myself and I don't think I've gotten a definitive answer but I think people are trending towards structured
16:09glehmann: I hope so too :D
16:09jenatali: glehmann: FYI: https://github.com/microsoft/hlsl-specs/issues/490
16:09jenatali: Thanks for pointing that out :)
16:47dj-death: pendingchaos: not helping unfortunately
16:47jenatali: dj-death: He said before lower_io, not after
16:48dj-death: jenatali: yeah I tried that after rereading ;)
16:48dj-death: jenatali: now I'm left with copy_deref
16:48jenatali: nir_lower_copies?
16:48dj-death: which lower_locals_to_regs_block complains about
16:48jenatali: Er, nir_lower_var_copies?
16:50dj-death: nope
16:50dj-death: because that adds back the load_output
16:50dj-death: which again the divergence pass will complain about
16:54jenatali: It shouldn't be copying from the output, only to it
16:55dj-death: guessing I need to call it wayyyy before
16:57dj-death: yeah that works
16:57dj-death: really early
17:30FireBurn: Hey is there something up with the ssh side of gitlab?
17:30pendingchaos: use ssh.gitlab.freedesktop.org
17:31pendingchaos: or update ~/.ssh/config: https://gitlab.freedesktop.org/freedesktop/freedesktop/-/issues/2076
17:32FireBurn: Ta
18:38Lynne: dj-death: does that happen with git master of the test program?
18:38Lynne: also, which ffmpeg version do you have
18:38Lynne: there's definitely a dependency on ffmpeg in the meson code, so I'm confused
18:42dj-death: Lynne: I took master of ffmpeg
18:42dj-death: Lynne: and I think master of the test program too
18:43Lynne: I did do a few updates to it since I linked it last week
18:43dj-death: Lynne: apparently the ffmpeg from the system wasn't enough
18:43dj-death: I'll try again tomorrow
21:53jenatali: Is there a nir pass that can prune a loop that has no side effects?
21:55pendingchaos: nir_opt_dead_cf
21:56jenatali: That's not pruning it for me
21:56jenatali: It's a complicated loop that culminates in conditionally breaking, but at no point does the loop ever have any observable side effects besides spending time
21:56jenatali: No storage buffer/image writes or any other kind of data leaving the loop, so it's effectively dead
21:58anholt_: what's keeping dce from cleaning up the ops in the loop?
22:00jenatali: At the "end" of a long chain of instructions is a comparison which branches to a conditional break
22:00pendingchaos: might be used as part of the break condition, I think nir_opt_dead_cf should handle that though
22:02pendingchaos: it looks like non-reorderable ssbo/shared/global/output loads prevent the loop from being dce'd?
22:02anholt_: oh, right.
22:04pendingchaos: from: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9938
22:19jenatali: Ah... I see
22:21jenatali: This is my horrible hacky TCS lowering where I have to split a TCS into two functions, one that handles all patch outputs and a different one that handles control points. I've got a TCS with no control point outputs so I would expect to end up with an empty function, but instead it's got a massive loop in it
22:35jenatali: Yeah ok forcing load_output to undef apparently fixes all my problems. Still a horrible hacky solution but I don't have many other options