IRC Logs of #dri-devel on irc.freenode.net for 2025-07-16

03:40 olivial: just unassigning marge from an MR is sufficient to cancel, right?
03:44 olivial: ah, cancelling the CI pipeline worked
06:57 phasta: tursulin: drm_sched unit tests are now also being executed by RH's quality assurance on CKI. Good work. https://datawarehouse.cki-project.org/kcidb/tests/redhat:koji-134819423-ppc64le-kernel_upt_6
08:57 mwalle: hi, how is devm_drm_bridge_alloc() supposed to work if the bridge is part of an encoder struct which is in turn allocated (and initialized) by drmm_simple_encoder_alloc()?
08:59 mwalle: lucaceresoli: see drivers/gpu/drm/tidss/tidss_encoder.c
09:38 sima: rodrigovivi, imre said that your rerere commit 5dd2d660323d78890f92809be3413a77f8e41f07 has apparently a wrong interim conflict resolution for "drm/dp: Change AUX DPCD probe address from LANE0_1_STATUS to TRAINING_PATTERN_SET" in -fixes vs -next, and imre's in 7f2bb7f564c4c is the right one
09:38 sima: can you pls try to sort this out with imre?
09:39 sima: airlied, ^^ also heads-up so we make sure we don't accidentally land this, or send a bogus example conflict resolution to linus in the main merge window pr
09:39 sima: imre, did you see a mail from sfr about the conflict in linux-next fly by on dri-devel?
09:40 sima: you should get cc'ed if you've authored/committed one of the involved commits
09:43 lucaceresoli: mwalle: this topic has been discussed between jani and mripard w.r.t. panels for devm_drm_panel_alloc(), but for bridges it's the same
09:43 lucaceresoli: mwalle: https://lore.kernel.org/all/20250606-pompous-mellow-guan-1d9ea4@houat/
09:45 lucaceresoli: mwalle: TL;DR: the bridge will have to be allocated dynamically (yes, that's a bit of annoyance for drivers which currently embed it, but not quite avoidable)
09:45 imre: sima, rodrigovivi, yes rodrigo asked me if that resolution was ok and I acked it, so my fault. The correct resolution is 'ret = drm_dp_dpcd_probe(aux, DP_TRAINING_PATTERN_SET);' in the result not 'ret = drm_dp_dpcd_probe(aux, DP_LANE0_1_STATUS);'. Sorry for that.
09:45 lucaceresoli: mwalle: and you can either have a wrapper struct that embeds the bridge, and devm_drm_bridge_alloc() that struct, if it makes sense
09:46 sima: imre, rodrigovivi ah ok, then revert of that drm-rerere commit and retrying with dim rebuild-tip should be enough
09:46 lucaceresoli: mwalle: or you can call the low-level function __devm_drm_bridge_alloc() as done in https://lore.kernel.org/all/13d15c1414e65ffb21944d66e2820befdab54e98.1749199013.git.jani.nikula@intel.com/
09:46 imre: sima, rodrigovivi, I suppose reverting 5dd2d660323d from drm-rerere and perhaps also doing a 'dim rebuild-tip' would fix this.
09:46 sima: yeah that should usually do the trick
09:46 imre: sima, ok
09:46 sima: it's even documented as the procedure
09:46 imre: I'll answer now to sft as well
09:47 imre: sfr
09:47 sima: oh, do you have the link for that one for here?
09:47 mwalle: lucaceresoli: thanks for the pointers, i'll have a look later. right now i'm getting a refcnt overflow warning with the latest next (which is expected, i'd guess, if the bridge isn't initialized)
09:48 imre: sima, didn't answer yet, but his email is https://lore.kernel.org/all/20250716141832.5542b414@canb.auug.org.au
09:51 imre: it's the correct resolution, so no need for me to answer
09:55 mwalle: lucaceresoli: I'd probably need a wrapper to get a reference to the private struct of the driver (within the bridge_funcs), right? I.e. struct tidss_encoder_bridge { struct drm_bridge bridge; struct tidss_encoder *encoder; }. Then go from drm_bridge to tidss_encoder_bridge and use the pointer to get the original private struct
10:05 sima: imre, ah yeah that's just standard adjacent line changes stuff, standard fare for linus to sort out
10:05 sima: just need to get drm-tip fixed
10:06 lucaceresoli: mwalle: indeed the refcnt warning is expected with current -next, because kmalloc or any other "classic" allocation process won't initialize the refcnt, thus it will start from 0 hence the warning
10:09 lucaceresoli: mwalle: and yes, your code snippet looks like a good solution
10:10 glehmann: eric_engestrom: ugh, marge pushed the MR after she was unassigned: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115
10:10 eric_engestrom: yeah I just saw that, cancelled it
10:10 eric_engestrom: it's because it picked the MR, then got unassigned while it was rebasing it, and then pushed it
10:10 eric_engestrom: bad timing
12:35 alyssa: do we have a "reversed" version of nir_dominance_lca?
12:35 alyssa: query returning "first block that dominates both input blocks"
12:36 alyssa: to solve the problem of "what's the highest place in the program we can insert an instruction with given sources"
12:36 alyssa: hmm nir_opt_sink must do that..
12:38 alyssa: hmm it uses nir_dominance_lca, maybe I'm confused
12:43 alyssa:does local version first
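The query alyssa describes (the deepest block that dominates both inputs) is a lowest-common-ancestor walk over the dominance tree, which is what the name nir_dominance_lca suggests. The following is a minimal sketch of that idea only, not the NIR implementation: the `idom` dictionary, the block names, and the `dominance_lca` helper are all hypothetical stand-ins.

```python
def dominance_lca(idom, a, b):
    """Return the deepest block that dominates both a and b.

    `idom` maps each block to its immediate dominator; the entry block
    maps to None. This dictionary is an illustrative stand-in for a
    real dominance tree.
    """
    def depth(block):
        d = 0
        while idom[block] is not None:
            block = idom[block]
            d += 1
        return d

    depth_a, depth_b = depth(a), depth(b)
    # Bring both blocks to the same depth in the dominance tree.
    while depth_a > depth_b:
        a, depth_a = idom[a], depth_a - 1
    while depth_b > depth_a:
        b, depth_b = idom[b], depth_b - 1
    # Walk up in lockstep until the two paths meet.
    while a != b:
        a, b = idom[a], idom[b]
    return a


# Hypothetical CFG: entry -> (then | else) -> merge -> tail
idom = {
    "entry": None,
    "then": "entry",
    "else": "entry",
    "merge": "entry",
    "tail": "merge",
}

lca_branches = dominance_lca(idom, "then", "else")
lca_nested = dominance_lca(idom, "tail", "merge")
lca_mixed = dominance_lca(idom, "then", "tail")
```

Under these assumptions, an instruction whose sources are defined in several blocks has to be placed somewhere dominated by all of those definitions; the sketch only answers the dominance query itself.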
13:38 MrCooper: zmike: just bisected GALLIUM_HUD not working anymore to "gallium: de-pointerize pipe_surface"
13:38 zmike: uhh
13:38 zmike: you're welcome?
13:38 zmike:panics
13:47 rodrigovivi: airlied sima, on drm_netlink for ras, what do you envision as a standard user space consumer?
14:30 gfxstrand: dcbaker: How do you feel about adding a src/python?
14:30 gfxstrand: And is there a good way we could make that land in the import path of every script in the tree?
14:31 gfxstrand: Like, I would love it if we had a src/python that just showed up as a mesabuild module so you just do `import mesabuild` at the top of your script and you get stuff
14:56 eric_engestrom: gfxstrand: there's the sys.path.insert() thing, it's ugly but it's reliable
14:57 gfxstrand: Yeah
14:57 eric_engestrom: (grep for that in the tree for plenty of examples)
14:59 gfxstrand: Yeah, I found a few
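For readers unfamiliar with the sys.path.insert() approach mentioned above, here is a minimal sketch. It assumes a shared directory laid out as src/python and a module named mesabuild, as proposed in the discussion; the module contents and the `add_module_dir` helper are invented for illustration, and a temporary directory stands in for the source tree.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_module_dir(module_dir):
    """Prepend module_dir to sys.path (only once) and return the resolved path."""
    resolved = str(Path(module_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Stand-in for a shared src/python directory; in a real script the path
# would be derived from the script's own location via __file__.
_tmp = tempfile.TemporaryDirectory()
shared_dir = Path(_tmp.name) / "src" / "python"
shared_dir.mkdir(parents=True)
(shared_dir / "mesabuild.py").write_text(
    "def greeting():\n    return 'hello from mesabuild'\n"
)

inserted = add_module_dir(shared_dir)
mesabuild = importlib.import_module("mesabuild")
message = mesabuild.greeting()
```

The drawback raised in the conversation is visible here: every script has to carry its own copy of the path computation before it can import the shared module.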
15:07 eric_engestrom: gfxstrand: what kind of thing are you looking to put in there?
15:16 karolherbst: alyssa: 276501755364d72b55de810e728981e78c6ee0e0 is regressing some CL stuff on radeonsi
15:19 karolherbst: maybe some weirdo spirv handling missing...
15:21 glehmann: is it the splitting or the fusing that breaks it?
15:22 karolherbst: wish I knew
15:25 karolherbst: okay so disabling those 4 opts fixes it...
15:25 sima: rodrigovivi, tried to not think about that, maybe airlied has an idea
15:26 gfxstrand: eric_engestrom: We've got some utils in nouveau for rust generators
15:26 sima: or perhaps agd5f or someone else from amd thought about it
15:26 alyssa: karolherbst: CL should be setting `exact` everywhere
15:26 alyssa: not my bg
15:26 alyssa: bug
15:26 eric_engestrom: gfxstrand: ack; I'd be curious to see the MR when you post it :)
15:26 karolherbst: at least on the fma...
15:27 karolherbst: but yeah.. in CL the fma can't be split.. guess I'll write a patch
15:31 gfxstrand: eric_engestrom: I gave up and I'm doing something dumb now
15:31 agd5f: sima, rodrigovivi I vaguely remember looking at it. I think Hawking and Lijo provided some comments at the time. Our RAS stack doesn't currently make use of it.
15:32 sima: agd5f, it's more what should the minimal open userspace for it look like, as in how much yolo
15:40 karolherbst: alyssa: exact doesn't help?
15:41 karolherbst: like I need the fma to stay a fma for like forever
15:42 gfxstrand: exact should prevent it from being split
15:42 gfxstrand: exact means "don't do any transform on this that isn't bit-for-bit the same output"
15:42 gfxstrand: So splitting fma is definitely out
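Why splitting an fma is not bit-for-bit safe can be shown with a small worked example. This sketch uses Python doubles rather than the 32-bit floats the shaders use, and it emulates the fused operation with exact rational arithmetic; the helper names and the input values are chosen for illustration only.

```python
from fractions import Fraction


def fma_exact(a, b, c):
    """Fused multiply-add: compute a*b + c exactly, then round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def mul_then_add(a, b, c):
    """Split form: the product is rounded before the add, so two roundings."""
    return a * b + c


# a*b is exactly 1 - 2**-54, which is not representable as a double and
# rounds to 1.0 in the split form.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

fused = fma_exact(a, b, c)
split = mul_then_add(a, b, c)
```

With these inputs the fused form keeps the small residual while the split form loses it entirely, so a transform from one to the other changes the output bits.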
15:44 karolherbst: yeah... maybe it's something else going on, but it's kinda weird..
15:44 karolherbst: mhhhhh
15:47 karolherbst: nope it's defo those...
15:48 karolherbst: but it's only an issue with radeonsi
15:48 alyssa: ok but the patterns you linked are gated on the exact bit not being set so
15:49 alyssa: can you send me the NIR_DEBUG=print output please?
15:54 karolherbst: something nukes the exact flags...
15:54 karolherbst: or dunno.. mhh
15:55 alyssa: can you send me the NIR_DEBUG=print output please?
15:55 karolherbst: https://gist.githubusercontent.com/karolherbst/473fbe88c5bc5ba8fd57750a029e9095/raw/9e1065c178c2d5dbd3ecb4b4084f6c013776ca7d/gistfile1.txt
15:57 alyssa: karolherbst: vtn is failing to set the exact bit on fadd/fmul instructions
15:57 karolherbst: it's legal to merge those into ffma
15:58 alyssa: ..right, I wrote that patch didn't I.
15:58 karolherbst: but..
15:58 karolherbst: all the ffma! get cf_dceed
15:58 karolherbst: so...
15:58 karolherbst: no idea what's going on...
15:58 karolherbst: maybe just something very unfortunate
15:58 karolherbst: maybe it's just libclc being wrong
15:58 dcbaker: gfxstrand: I've wanted to do that for a long time but never gotten to it.
15:59 alyssa: karolherbst: can you comment out those four lines, then send me the NIR_DEBUG=print output for that too?
15:59 dcbaker: The options other than what eric_engestrom mentioned are: 1) use the `PYTHONPATH` environment variable, 2) use a small python wrapper script instead of `prog_python` that does the path insertion automatically, and then does the python equivalent of `exec "$@"`
16:00 karolherbst: https://gist.githubusercontent.com/karolherbst/16bb421becfb4d1472ed483904044821/raw/aa31194557da2cb088b03ebaf35f3895dc17a636/gistfile1.txt
16:00 dcbaker: I've kinda wanted to do that approach because I have this clever idea of letting that script check your python imports and generate a depfile
16:01 eric_engestrom: ooh, depfile for python would be neat
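A rough sketch of the wrapper idea, under several assumptions: it runs the target script in-process with runpy instead of replacing the process, the `run_with_module_dir` helper and the demo file names are hypothetical, and the dependency list is only a starting point for a depfile, not a finished format. It is not dcbaker's script.

```python
import runpy
import sys
import tempfile
from pathlib import Path


def run_with_module_dir(module_dir, script, args=()):
    """Run `script` as __main__ with `module_dir` importable.

    Returns (script_globals, deps), where deps lists the files under
    module_dir that were imported while the script ran.
    """
    root = Path(module_dir).resolve()
    sys.path.insert(0, str(root))
    saved_argv = sys.argv
    sys.argv = [str(script), *args]
    try:
        script_globals = runpy.run_path(str(script), run_name="__main__")
    finally:
        sys.argv = saved_argv
        sys.path.remove(str(root))

    # Anything loaded from module_dir is a candidate depfile entry.
    deps = sorted(
        str(Path(mod.__file__).resolve())
        for mod in list(sys.modules.values())
        if getattr(mod, "__file__", None)
        and root in Path(mod.__file__).resolve().parents
    )
    return script_globals, deps


with tempfile.TemporaryDirectory() as tmp:
    module_dir = Path(tmp) / "src" / "python"
    module_dir.mkdir(parents=True)
    (module_dir / "demo_helper.py").write_text("VALUE = 40\n")
    script = Path(tmp) / "gen.py"
    script.write_text(
        "import sys\n"
        "import demo_helper\n"
        "RESULT = demo_helper.VALUE + len(sys.argv[1:])\n"
    )
    script_globals, deps = run_with_module_dir(module_dir, script, ["-o", "out.h"])
    result = script_globals["RESULT"]  # 40 plus the two forwarded arguments
    dep_names = [Path(d).name for d in deps]
```

The appeal of this design is that individual scripts no longer touch sys.path themselves; the cost is one more layer between the build system and each generator.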
16:02 alyssa: karolherbst: oh that's all kinds of screwed up
16:02 alyssa: i see the problem, gimme a minute
16:02 karolherbst: but I should send a patch to set exact on all fmas anyway
16:03 karolherbst: though the SPIR-V might already flag them...
16:03 karolherbst: well CL spir-v env spec says "Correctly rounded"
16:04 karolherbst: there is mad if you don't care
16:05 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36175
16:12 alyssa: karolherbst: The problem, I think, is that libclc explicitly uses mad
16:12 alyssa: https://github.com/llvm/llvm-project/blob/78b9128250c9fe5c7f9e460a27cc28c6450fd8fd/libclc/clc/lib/generic/math/clc_sincos_helpers.inc#L9-L75
16:12 alyssa: which does not have the exact bit set
16:12 karolherbst: yeah, but it's fine to do either with that
16:12 alyssa: right but I think it expects it to be consistent which you do. maybe?
16:13 karolherbst: mhhhhhh
16:13 karolherbst: good question
16:13 karolherbst: I do decide inside vtn_opencl
16:13 karolherbst: maybe I just mark the result as exact as well then...
16:13 karolherbst: let me try that
16:14 alyssa: what?
16:15 karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/4db82ce27bd7ae65eecc2894c2ed2ed3532c5a4f
16:15 alyssa: sure, but that's not necessarily good enough
16:16 alyssa: because CLC calls its own mad function directly
16:16 alyssa: but I do agree that's probably sane
16:16 karolherbst: mhhh... right...
16:16 alyssa: mad()'s description seems to be "you can pick either one", not "this is some kind of fast-math mode"
16:17 karolherbst: yeah
16:17 alyssa: and the lack of exact is fast-math circus
16:17 alyssa: BUT
16:17 alyssa: that patch won't do anything, because that ffma is already exact (-:
16:17 karolherbst: yeah, it's great
16:17 alyssa: the problem isn't mad(), it's libclc's internal mad
16:18 karolherbst: `#pragma OPENCL FP_CONTRACT ON` impressive
16:18 alyssa: OHHH
16:18 alyssa: frick
16:18 alyssa: wait no i misread the code
16:18 alyssa: nvm
16:18 alyssa: average alyssa interaction
16:19 karolherbst: anyway, my patch on mad seems to help 🙃 or I'm going crazy
16:19 alyssa: then we have a bug elsewhere
16:19 karolherbst: yeah... it does fix it
16:20 alyssa: because exact should be set on that builder
16:20 alyssa: unless this is some nonsense where the libclc shader itself is special
16:20 karolherbst: doubtful
16:20 karolherbst: normally the translator sets the contraction mode stuff properly
16:21 karolherbst: ohhh
16:21 karolherbst: oh no
16:21 karolherbst: no no no
16:21 karolherbst: on the nir side the only difference is "ffma!" and "ffma" now with my patch
16:21 karolherbst: so I guess it's needed for the AMD backend
16:22 alyssa: now that i can believe.
16:24 karolherbst: I'll test the patch and if that solves all the other issues, we'll just set exact on fma and mad
16:24 alyssa: um, no, the story's not over here
16:25 alyssa: why is b.exact not *already* set?
16:25 alyssa: and if it's not - presumably from a FP_CONTRACT ON in libclc - why do we need to override that? libclc bug? vtn bug?
16:25 karolherbst: in the spirv?
16:25 karolherbst: I think the translator might not bother for the clc builtins to set it on the spirv level
16:25 karolherbst: I should check the spirv...
16:26 karolherbst: which uhm.. is always fun
16:27 karolherbst: "%22064 = OpExtInst %float %1 mad %22061 %float_n0_836411297 %float_1_10496962" well..
16:28 alyssa: can you post the spirv?
16:28 karolherbst: the entire thing?
16:28 karolherbst: it's like 2.7MiB
16:29 alyssa: I would like to understand why b.exact is not set
16:29 karolherbst: if it helps, I don't see any ContractionOff
16:29 alyssa: so..
16:29 alyssa: so why is b.exact not set?
16:30 karolherbst: why should it be set for everything?
16:31 alyssa: it's OpenCL, that's the default.
16:32 alyssa: ...apparently it is not
16:32 karolherbst: yeah, but the spir-v should tell us, because how would we know what the frontend expects
16:32 alyssa: // The DEFAULT value is ON.
16:32 alyssa: #pragma OPENCL FP_CONTRACT on-off-switch
16:32 alyssa: you have got to be kidding me
16:33 alyssa: this feels like a libclc bug.
16:35 karolherbst: not unlikely
16:35 karolherbst: cos requires <= 4 ulp, but with that change we go around 5
16:35 karolherbst: most of the code was written to be "good enough" for whatever hardware was targeted
16:36 karolherbst: (AMD)
16:39 alyssa: I strongly suspect the real bug here then is the libclc code explicitly asking for mad's when it should be explicitly asking for ffma's or something
16:39 alyssa: but also I don't care we can merge your patch I want to go back to reassociating fmuls which will break CL again (:
16:40 karolherbst: :D
16:40 karolherbst: sounds good
16:41 karolherbst: but anyway, on fedora the libclc spirv is at /usr/lib64/clc/spirv64-mesa3d-.spv
16:42 karolherbst: I kinda hope we can move to the LLVM SPIR-V target at some point and deal with all sorts of breakage :)
16:47 dcbaker: gfxstrand, eric_engestrom: I threw together a really quick and probably full of corners runner script, but does work and allows loading modules from `src/python`, it's the `wip/2025-07/src-python` branch on my gitlab
19:53 glehmann: alyssa: do you have a branch with the insert change you tried
19:53 alyssa: glehmann: the cursor one?
19:53 glehmann: yes
19:53 alyssa: let me dig thru reflog
19:54 glehmann: there are some shaders where the new pass does really badly, like farcry5/0195cf650255e8c2/vs
19:54 glehmann: badly == double register pressure
19:54 alyssa: do you have a branch with radv wired up?
19:54 alyssa: trade you :P
19:55 alyssa: glehmann: nir/opt-association-failed-attempt pushed
19:56 alyssa: not tested but should be ok
19:56 alyssa: (well it's build tested, and because my Mesa build includes a bunch of chunky AGX binaries, that smoke tests the pass hah)
19:56 glehmann: alyssa: https://gitlab.freedesktop.org/DadSchoorse/mesa/-/tree/radv-reassoc2, you can drop the last three commits there are just further things I tried for some cmat shaders
19:57 alyssa: fwiw I'm not convinced this pass will run to a fixed point
19:58 glehmann: it doesn't, that's why there is a loop limit 🙃
19:58 alyssa: :clown:
19:58 glehmann: running it a few more times only has benefits for radv
19:58 alyssa: interesting
19:58 alyssa: I wonder why it's not converging
19:58 alyssa: in one iter I mean
19:59 alyssa: I guess CSE'ing stuff makes other chains shorter and lets us reassociate more or something
19:59 glehmann: yeah that was the case in my cmat shader
19:59 alyssa: ah
20:01 alyssa: my suspicion is that the benefits on AGX have a lot to do with making good use of preambles
20:01 alyssa: which is good news for ir3
20:01 alyssa: but means we need different heuristics for other ISAs
20:02 alyssa:running radv under drm-shim now
20:03 glehmann: maybe instead of trying to fix this in the reassoc pass, we should write a basic scheduler that attempts to reduce register pressure
20:04 alyssa: 2 things can be true :)
20:04 glehmann: aco is pretty dumb because the input register pressure is the best possible result you get, our scheduler only makes it worse
20:04 alyssa: and yeah the AGX backend schedules for pressure
20:04 alyssa: which might also explain why my results are so much better
20:05 alyssa: (the AGX backend reg pressure scheduler is really dumb but it helps so who cares)
20:07 glehmann: something really conservative is still better than nothing
20:07 alyssa: the AGX thing is conservative in the sense that it is guaranteed to only help pressure
20:07 glehmann: maybe we should even do this in NIR
20:07 alyssa: but probably kills ILP in the process
20:08 alyssa: I don't have a cycle model of AGX so :(
20:08 glehmann: I think aco's backend schedulers would likely do a good enough job at recovering ILP
20:09 alyssa: fair
20:09 glehmann: especially for GCN, where ALU latency isn't a thing
20:09 alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/asahi/compiler/agx_pressure_schedule.c?ref_type=heads
20:09 alyssa: ^ the really dumb thing
20:10 alyssa: (i originally wrote that at collabora for bifrost, it's just as dumb/effective there too)
20:11 alyssa: when I did this for bifrost, apparently it made 60%/70% of my spills/fills go away, lol
20:12 glehmann: is there a good reason to do it in the backend?
20:12 karolherbst: zmike: did you start to expose intensity formats in zink recently?
20:12 zmike: yes?
20:13 zmike: or at least they're native-ish now
20:13 alyssa: I mean.. it's more accurate w.r.t accounting correctly for abs/neg/sat, for example
20:13 alyssa: (and anything aco_optimizer.cpp fuses, etc)
20:13 karolherbst: zmike: okay.. don't seem to work with rusticl at least
20:13 alyssa: we could probably do a NIR one but I'd probably use it in addition to the backend one and not as a replacement
20:13 glehmann: fair
20:13 zmike: karolherbst: how are you using it
20:13 zmike: also wtf cl has intensity formats?
20:14 karolherbst: probably the wrong way, why?
20:14 alyssa: zmike: yeah it's crazy
20:14 glehmann: (I just really hate writing/maintaining aco passes)
20:14 karolherbst: luminance is also a thing
20:14 karolherbst: luminance works tho
20:14 zmike: wild
20:14 alyssa: glehmann: tbh i think that says more about aco than backends in general....
20:14 karolherbst: maybe I have to set up the swizzle on the image/sampler views correctly or something? anyway, works with other drivers
20:15 zmike: are you using an image sampler?
20:15 zmike: or buffer
20:15 karolherbst: image
20:15 zmike: then you probably have to set the swizzles correctly
20:15 zmike: I think it's just RRRR though
20:15 karolherbst: yeah.. that's doable.. I just trust the uhm.. helper to do the right thing
20:15 karolherbst: u_sampler_view_default_template
20:16 karolherbst: which doesn't set RRRR for intensity it seems
20:16 zmike: I don't think that actually does anything special
20:16 zmike: you should probably copy the mesa/st handling
20:17 karolherbst: probably
20:17 karolherbst: anyway, that should be a simple fix
20:17 karolherbst: I just never bothered with swizzles, because I don't do swizzled images yet
20:17 zmike: it won't work for buffer usage though
20:17 karolherbst: that's not supported with write images anyway, right?
20:17 zmike: it should be
20:18 karolherbst: pipe_image_view doesn't have a swizzle
20:18 zmike: oh
20:18 zmike: huh
20:18 karolherbst: yeah...
20:18 karolherbst: anyway.. I can just RRRR for intensity :D
20:18 zmike: pretty sure zink could do it, but idk
20:18 karolherbst: I haven't checked if write intensity images are broken tho
20:19 karolherbst: well aren't supported anyway
20:19 glehmann: alyssa: yeah I know, aco's IR design isn't really something I would recommend
20:20 alyssa: glehmann: ok, so some preliminary notes from poking at radv:
20:20 alyssa: * the "skip global cse if we can preamble more stuff" is a loss if you don't have preambles, surprised pikachu
20:21 alyssa: * the divergence-aware ranking is a toss up if you don't have preambles, I guess because you only have integer SALU
20:22 alyssa: that doesn't account for everything, though.
20:22 glehmann: I'm actually not sure if the divergence data is close to up to date, also gfx11.5+ has float ALU too
20:23 alyssa: hmm ok
20:23 glehmann: float SALU, I mean
20:23 alyssa: running divergence at the start of the pass doesn't magically solve it tho, i tried
20:23 alyssa: my script has gpuid hardcoded to NAVI10
20:23 alyssa: what GPU_ID should I use for gfx11.5+?
20:23 glehmann: gfx1201
20:24 alyssa:reruns
21:57 zzyiwei: robclark: Hello, Sir! one more piece to assist common AHB support: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36151
22:16 sarbes: Hi. I would like to implement "normalize(vec3(xyz))" for Utgard/Lima, since there is direct HW support. Unfortunately, it seems that normalize gets lowered in NIR, so it is not entirely straight forward.
22:17 sarbes: It seems to me that I could go three different ways. 1) Undo the lowering in lima_nir_algebraic.py. 2) Introduce a normalize() op in NIR, with optional lowering. 3) Undo the lowering in a C pass.
22:17 sarbes: 1) Seems to require some tinkering with the search code, as swizzles are not supported. Without some hacking, I'm not able to match the NIR pattern.
22:18 sarbes: 2) I don't think that introducing such a "legacy" op would be accepted.
22:18 sarbes: 3) Seems to be the best solution overall, but I would like to get some confirmation.
22:20 alyssa: sarbes: I'm not seeing why you can't use an algebraic rule for that?
22:20 alyssa: oh because of the broadcast behaviour... bah
22:20 alyssa: vec4 hw was a mistake
22:21 sarbes: Yeah. The pattern is something like "('fmul', 'a', ('frsq', ('fdot3', 'a', 'a')))"
22:22 sarbes: But I would need "('fmul', 'a', ('frsq.xxx', ('fdot3', 'a', 'a')))"
22:22 alyssa: yeah I see what you mean
22:22 alyssa: this is for PP?
22:22 sarbes: Yeah.
22:25 alyssa: I guess #3 is my preference if it's all the same to you
22:26 alyssa: I wouldn't mind plumbing the op through as long as it's not invasive to other drivers, tho
22:26 alyssa: modifying nir_search for this is a hard no
22:26 alyssa: this is kind of a repeat of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33331 tbh
22:29 sarbes: I've been eying this MR since it was submitted, but there is no resolution for now.
22:32 sarbes: If #3 is the preferred route to go, so be it.
22:32 pac85: Are we doing lisp now :p
23:44 sarbes: Curiously, the op is processed by the varying unit. Same as perspective division (which I want to wire up too).
23:45 sarbes: Anyway, thanks for the input.
23:49 alyssa: sarbes: midgard can do perspective division when loading varyings, so that'd be why
23:49 alyssa: i assume utgard-pp could normalize too
23:50 alyssa: I mean if you already have a floating point divide, why not?
23:52 sarbes: Just saying. :)
23:52 sarbes: It does make sense to me to have it there.