IRC Logs of #dri-devel on irc.freenode.net for 2023-08-11

03:35 gfxstrand: anholt, daniels: What's up with these Intel trace jobs? It's consistently taking 15-20 minutes between lava printing "waiting for job XXXXX to start" and "job XXXXX started". For the GLK jobs, I'm seeing total job times of upwards of 40-45 minutes for a single job. It's destroying CI throughput.
03:37 gfxstrand: (Not always. Sometimes they complete faster than that. But I saw one take 43m)
03:38 anholt: gfxstrand: unfortunately, lava starts the job on gitlab before a machine is ready. so your job is just stuck behind other jobs in the lava farm. you'd be queued up either way, but lava queuing inside the gitlab job means that it can time out and fail.
03:40 anholt: so, maybe they're oversubscribed and someone needs to crank down how much testing we do on them
03:40 gfxstrand: Okay, that makes sense. Feels a bit weird but okay.
03:40 anholt: or maybe something else has gone wrong. file an issue for it for someone to look into it (not me, I'm out on medical)
03:41 gfxstrand: kk
05:38 daniels: the reason they optimistically start in advance is to reduce latency
05:39 daniels: there are two reasons that happens anyway. one is that a bunch of machines took a dive and we haven’t fixed them yet because it’s still super early in Cambridge. another is that someone is hammering on CI to test their personal branches and eating all the capacity
06:39 daniels: gfxstrand: also, it makes life vastly easier if you attach links to jobs
07:07 daniels: emersion: btw, can I help with the modifier doc at all?
07:09 emersion: I'll have more time in the next few days
07:09 emersion: I completely forgot what the comments about it were
07:17 daniels: emersion: no prob!
07:37 linkmauve: Hi, when I rmmod amdgpu I get this stack trace in dmesg, it isn’t an issue for me but maybe you’d be interested in it? https://linkmauve.fr/files/journald.log
07:37 linkmauve: This is only the beginning of the file, at the end my whole CPU crashes and I have yet to figure out why (even though I can reproduce using either amdgpu or i915).
09:19 cwabbott: gfxstrand: for NV12, we do the swizzle by composing it with the user swizzle in the texture descriptor (because we can't represent it with the swap, and border colors aren't a thing for YUV formats)
09:35 cwabbott: gfxstrand: afaict we don't use the info in that table at all, because qcom has native formats for all of the ycbcr formats and we don't use any of the lowering code apart from nir_convert_ycbcr_to_rgb()
09:36 cwabbott: the HW format does return things in a different order, but we handle that ourselves with the texture swizzle
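Composing the format's channel reorder with the user swizzle, as cwabbott describes, means a single descriptor swizzle can do both jobs. The sketch below shows the composition rule only. The string encoding (`r`/`g`/`b`/`a` pick a channel, `0`/`1` are constants) and the example hardware lane order are hypothetical, not turnip's actual descriptor layout.

```python
# Swizzle strings: 'r', 'g', 'b', 'a' select a channel; '0' and '1' are constants.
CHANNEL = {"r": 0, "g": 1, "b": 2, "a": 3}

def compose_swizzle(fmt_swizzle: str, user_swizzle: str) -> str:
    """Fold fmt_swizzle (raw HW lanes -> logical RGBA) and user_swizzle
    (logical RGBA -> final result) into one swizzle over the raw HW lanes:
    final[i] = raw[fmt_swizzle[user_swizzle[i]]]."""
    out = []
    for sel in user_swizzle:
        if sel in "01":
            out.append(sel)                         # constants pass straight through
        else:
            out.append(fmt_swizzle[CHANNEL[sel]])   # look through the format swizzle
    return "".join(out)

# Hypothetical HW that returns (Y, Cb, Cr) in lanes (r, g, b), while Vulkan
# expects R = Cr, G = Y, B = Cb and alpha = 1 for a 2-plane 4:2:0 format.
NV12_FMT = "brg1"
```

For example, `compose_swizzle(NV12_FMT, "bgr0")` gives `"grb0"`, which the driver could program directly into the descriptor.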
09:36 cwabbott: so, qcom is already correct and there's nothing you have to do there
09:36 cwabbott: *turnip is already correct
09:51 MTCoster: From `docs/vulkan/base-objs.rst`:
09:51 MTCoster: > We also provide an implementation of vkEnumerateInstanceExtensionProperties() which can be used similarly
09:51 MTCoster: Is there a reason this isn't exposed as vk_common_*?
09:53 MTCoster: Ignore me, I'm being dumb
10:27 kj: Is there a mechanism to mask/disable specific Vulkan device extensions at runtime? So even if the driver supports the extensions, it doesn't get advertised
10:27 kj: Maybe an env variable or a Vulkan layer
10:29 kj: I don't suppose MESA_EXTENSION_OVERRIDE works with Vulkan. Or does it?
10:30 pixelcluster: kj: I guess you could use https://vulkan.lunarg.com/doc/view/1.3.204.1/linux/profiles_layer.html with an appropriate json config
10:35 pixelcluster: it's a bit tedious/only for temporary debugging, the process would be using vulkaninfo --json to export a profile .json, editing that json manually to remove the extension(s) and then passing it to the profiles layer
10:38 kj: Thanks. Will give it a go. Looks like it might be what I'm looking for, the "Exclude Device Extensions" setting
10:49 dolphin: airlied, sima: Final drm-intel-gt-next PR sent, just 4 patches and a backmerge of drm-next appeared since previous week.
15:54 gfxstrand: cwabbott: Okay, that explains why turnip is unaffected.
15:54 gfxstrand: cwabbott: I thought I remembered you having more magic hardware than most
15:54 cwabbott: yup, it's p magic
15:55 cwabbott: the only thing not done in hw is the colorspace transform
16:48 gfxstrand: cwabbott: What do you mean by that? What exactly does the hardware magically do?
16:48 gfxstrand: cwabbott: Is it just that the hardware magically handles multi-plane in a single descriptor and single texture fetch?
16:48 cwabbott: yup
16:48 gfxstrand: Okay, that's what Mali has, too.
16:48 gfxstrand: So we may yet want to generalize some
16:49 cwabbott: I think it's mostly already generalized?
16:49 gfxstrand: cwabbott: How does it handle chroma offsets?
16:49 cwabbott: that's in the descriptor too iirc
16:49 gfxstrand: Okay
16:49 gfxstrand: Makes sense
16:49 gfxstrand: It would have to be, I guess. That or in the sampler.
16:50 alyssa: AGX has that and also magic formats that do the CSC too
16:50 alyssa: I don't know if we want to use the latter
16:50 alyssa: (in any API)
16:50 cwabbott: yeah, there's CHROMA_MIDPOINT_X and CHROMA_MIDPOINT_Y fields
16:50 cwabbott: I assume that's what you mean
16:51 daniels: from an app point of view, having hardware-defined magic inscrutable CSC is no worse than having magic nir_lower_ycbcr-defined magic inscrutable CSC
16:51 daniels: if they care about the details, they open-code it
16:55 alyssa: daniels: yeah just a question of which is less work for the drivers :~)
16:56 gfxstrand: alyssa: Eventually, it should just be a matter of a flag or two we hand off to the NIR pass
16:57 alyssa: sure
16:57 alyssa: the spicy part is that Metal includes non-CSC multiplane formats, but does /not/ include any CSC formats
16:57 alyssa: public documented Metal, I mean
16:57 alyssa: the CSC formats are private Apple APIs on macOS for... some reason
16:58 alyssa: which makes r/e considerably more annoying
17:00 gfxstrand: I really doubt the CSC really costs much.
17:01 alyssa: same
17:01 gfxstrand: So I'd be inclined to treat it like qcom/mali and just use the multi-plane and lower the CSC in NIR.
17:01 gfxstrand: And probably IMG, too
17:01 alyssa: yeah, that's where i'm at
17:01 alyssa: if you're trying to save power, cut the gpu out of the loop, stop worrying about a few FMAs.
17:20 simon-perretta-img: Is there a way to use nir(_opt)_algebraic to build up a vec in the replace expression, with the order of the elements being specified by (const) arguments in the search expression?
17:20 simon-perretta-img: So for example if we have: testop(value.xy) and testop_split(base, value, elem), and want to fold ('testop_split', ('testop_split', 0, 'val0', 0), 'val1', 1) into ('testop', ('vec2', 'val0', 'val1'))
17:20 simon-perretta-img: If that sort of op only takes a vec2 then sure, it'd be simple/cheap enough to emit an additional match expression for when the inner testop_split has "val1, 1" and the outer one "val0, 0", but for larger vecs that seems like it might be excessive
17:21 simon-perretta-img: Hence, is there currently a way to match ('testop_split', ('testop_split', 0, 'val0', 'idx0'), 'val1', 'idx1') and replace it with something like ('testop', ('vec2', 'idx0': 'val0', 'idx1': 'val1')) ?
17:25 alyssa: simon-perretta-img: what problem are you trying to solve?
17:27 alyssa: https://xyproblem.info/
17:31 simon-perretta-img: I've got hardware instructions for e.g. pack_unorm_4x8(value.rgba), but also pack_unorm_1x8_split(base, value, elem) - so with base = B and unorm(val) = V, the output with elem = 1 will be 0xBBBBVVBB
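A small bit-level model makes the relationship between the two instructions concrete. It assumes `pack_unorm_1x8_split(base, value, elem)` replaces byte `elem` of the 32-bit `base` with the unorm8 of `value`, which matches the `0xBBBBVVBB` example for `elem = 1`; the function names are hypothetical stand-ins for the hardware ops. The unorm conversion follows GLSL `packUnorm4x8` (component 0 in the least significant byte).

```python
import math

def unorm8(x: float) -> int:
    """round(clamp(x, 0, 1) * 255), as in GLSL packUnorm4x8."""
    return int(math.floor(min(max(x, 0.0), 1.0) * 255.0 + 0.5))

def pack_unorm_4x8(v) -> int:
    """Pack four floats; component 0 lands in the least significant byte."""
    out = 0
    for i, c in enumerate(v):
        out |= unorm8(c) << (8 * i)
    return out

def pack_unorm_1x8_split(base: int, value: float, elem: int) -> int:
    """Hypothetical model: overwrite byte `elem` of `base`, keep the rest."""
    shift = 8 * elem
    return (base & ~(0xFF << shift) & 0xFFFFFFFF) | (unorm8(value) << shift)

def fold_splits(values, lanes) -> int:
    """Apply one split per (value, lane) pair, starting from base 0."""
    acc = 0
    for value, lane in zip(values, lanes):
        acc = pack_unorm_1x8_split(acc, value, lane)
    return acc
```

When the four lanes are each written exactly once starting from base 0, `fold_splits` equals `pack_unorm_4x8` regardless of the order the lanes are written in, which is what makes the fold legal.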
17:32 simon-perretta-img: I can translate both variants into backend instructions, but wanted to see if I could add an algebraic case to fold the 4x 1x8 cases into a single 4x8 case
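Since nir_algebraic rules are ordinary Python tuples, one way to get the index-driven placement without any new pattern syntax is to generate one rule per lane order. This is a sketch under assumptions: `pack_unorm_1x8_split` is a hypothetical backend opcode, the innermost base must be the literal 0, and real rules may also need bit-size or type annotations.

```python
from itertools import permutations

def split_chain(split_op, names, lanes):
    """Build split_op(...split_op(0, names[0], lanes[0])..., names[-1], lanes[-1])."""
    expr = 0  # innermost base is the literal constant 0
    for name, lane in zip(names, lanes):
        expr = (split_op, expr, name, lane)
    return expr

def gen_fold_rules(n=4, split_op="pack_unorm_1x8_split", full_op="pack_unorm_4x8"):
    """One (search, replace) pair per order in which the n lanes can be written."""
    names = [f"val{k}" for k in range(n)]
    rules = []
    for lanes in permutations(range(n)):
        search = split_chain(split_op, names, lanes)
        # Place each value in the vector slot named by its constant lane index.
        slots = [None] * n
        for name, lane in zip(names, lanes):
            slots[lane] = name
        replace = (full_op, (f"vec{n}",) + tuple(slots))
        rules.append((search, replace))
    return rules

optimizations = gen_fold_rules()
```

In a real script the generated list would presumably be appended to the optimizations handed to `nir_algebraic.AlgebraicPass`. The catch is the n! growth: 24 rules for four lanes is tolerable, but the approach does not scale to wide vectors, which is the concern raised above.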
17:33 robclark: daniels: app can't really open code it once modifiers enter the picture.. nv12+ubwc doesn't work with the "lets pretend it is R8+R8G8" thing
17:35 alyssa: simon-perretta-img: What's generating pack_unorm_1x8_split?
17:36 simon-perretta-img: alyssa: Scalarised fragment shader store_outputs
17:38 simon-perretta-img: I've just been keeping them as vector store_outputs for now in order to use pack_unorm_4x8, but a backend/hardware requirement is that its input needs to be a contiguous block of 4 regs, so I'm exploring/experimenting using the 1x8 variant
17:41 simon-perretta-img: So currently for folding them I've written a C pass that walks back the base of each 1x8 chain, etc. but was also hoping there might be a way to do the same with nir_algebraic
17:41 alyssa: Don't scalarize store_output then
17:41 alyssa: Problem solved
17:44 alyssa: "its input needs to be a contiguous block of 4 regs" this really isn't a big deal
17:44 simon-perretta-img: True, but if I've got a gl_FragColor = vec4(0.0, 0.0, 0.0, 1.0) or similar, that means reserving 4 contiguous temps to use for that vec rather than being able to use 0 temps with the split variants
17:44 alyssa: uh?
17:45 alyssa: pack_unorm_4x8(vec4(0.0, 0.0, 0.0, 1.0)) = 0xff000000
17:45 simon-perretta-img: ...it would be constant folded
17:45 alyssa: it's an ALU op, it constant folds
17:45 alyssa: yes
17:45 alyssa: :)
17:45 simon-perretta-img: Missing the forest for the trees :D
17:45 alyssa: Yes
17:45 alyssa: simon-perretta-img: also, word of advice: you have infinite ALU
17:45 alyssa: if you're spending time to "improve performance by reducing ALU", chances are you're wasting your time
17:46 alyssa: some of us like to do that to relax and/or procrastinate on real work
17:46 alyssa: but in general ... I don't think I can remember a single ALU saving optimization I've ever done that's actually moved the FPS needle
17:46 alyssa: sure, it saves power, but .. meh
17:47 alyssa: These are problems to worry about when you have a conformant VK1.3 driver that's running DX12 games at full perf and your biggest problem is some extra moves
17:48 alyssa: I once added some "important" opt_algebraic rules that reduced the cycle count of a massive shader in glmark2 by 22% on mali-g57
17:48 alyssa: do you know what happened to fps for that glmark2 scene?
17:48 alyssa: nothing. nothing happened to it. zero change on mali-g57.
17:49 simon-perretta-img: For sure, this was more of a "pack_unorm_4x8 and pack_unorm_1x8 both already work, but it's Friday afternoon and I'm curious if I can write something short to fold the latter" kinda thought haha
17:49 alyssa: because, despite that shader having 1000 instructions of ALU ... even a dinky little Mali can do lots of ALU, that scene was totally bound by memory bandwidth and driver overhead due to constantly glGenerateMipmap'ing
17:50 alyssa: simon-perretta-img: Also, it's a lot easier to scalarize than to vectorize, as you're now discovering
17:50 alyssa: so would make more sense to keep store_output vectorized and always do pack_unorm_4x8 and just have targeted heuristics to break it down to 1x8 when that's actually better
17:50 alyssa: although I suspect that's "almost never"
17:52 simon-perretta-img: Yeah, makes sense
17:55 simon-perretta-img: Although, part of the reason for investigating the scalar route for fs outputs (and when unpacking for vs inputs) is for funky formats that don't have HW instructions to pack/unpack them in their entirety
17:56 simon-perretta-img: Which sure, might be more of a "deal with it when it comes to it" issue when nir_format_convert.h isn't enough
17:56 simon-perretta-img: But anyway, that's a bit of a different topic haha
17:57 simon-perretta-img: Thanks for the pointers!
17:57 * alyssa hugs her formatted store
17:59 daniels: gfxstrand: the 3 FMAs certainly vanish into line noise compared to dual tex load
18:01 daniels: robclark: mmm, yeah … I wonder if we want an EGLImage ext for ‘raw values only pls’, so it can still do the nice sampling (the co-issue helps on Mali too), but let the user be in control of CSC
18:01 alyssa: yuv considered a mistake
18:02 alyssa: *harmfuk
18:02 alyssa: **harmful
18:03 robclark: yeah, an ext that gave you the unconverted yuv could work
18:04 robclark: alyssa: 1.5 bytes per pixel is a lot less than 4 bytes per pixel
18:04 robclark: yuv is probably one of the more sane things about video :-P
18:04 robclark: (or less insane?)
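The 1.5 vs 4 bytes-per-pixel figure is straightforward arithmetic for NV12: a full-resolution 8-bit Y plane plus one interleaved CbCr plane subsampled 2x2. The sketch ignores row padding, tiling and compression such as UBWC, so real allocations will be somewhat larger.

```python
def nv12_bytes(width: int, height: int) -> int:
    """Y plane at 1 byte/pixel plus interleaved CbCr at half width and half height."""
    y_plane = width * height
    cbcr_plane = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return y_plane + cbcr_plane

def rgba8_bytes(width: int, height: int) -> int:
    """4 bytes per pixel."""
    return 4 * width * height
```

For a 1920x1080 frame that is 3,110,400 bytes for NV12 against 8,294,400 bytes for RGBA8, roughly 2.7x less data to move per frame.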
18:06 alyssa: robclark: video considered harmful
18:06 alyssa: read a book?
18:06 alyssa: :p
18:06 robclark: heh, _that_ I would agree with :-P
18:11 alyssa: got it in 2!
19:13 alyssa: when did my baby driver get so big? O_O
19:15 HdkR: alyssa: It grew up :D
19:19 zmike: when you played the right music ?
19:24 alyssa: HdkR: o_O
19:26 HdkR: The grow up and glow up?