03:40olivial: just unassigning marge from an MR is sufficient to cancel, right?
03:44olivial: ah, cancelling the CI pipeline worked
06:57phasta: tursulin: drm_sched unit tests are now also being executed by RH's quality assurance on CKI. Good work. https://datawarehouse.cki-project.org/kcidb/tests/redhat:koji-134819423-ppc64le-kernel_upt_6
08:57mwalle: hi, how is devm_drm_bridge_alloc() supposed to work if the bridge is part of an encoder struct which is in turn allocated (and initialized) by drmm_simple_encoder_alloc()?
08:59mwalle: lucaceresoli: see drivers/gpu/drm/tidss/tidss_encoder.c
09:38sima: rodrigovivi, imre said that your rerere commit 5dd2d660323d78890f92809be3413a77f8e41f07 has apparently a wrong interim conflict resolution for "drm/dp: Change AUX DPCD probe address from LANE0_1_STATUS to TRAINING_PATTERN_SET" in -fixes vs -next, and imre's in 7f2bb7f564c4c is the right one
09:38sima: can you pls try to sort this out with imre?
09:39sima: airlied, ^^ also heads-up so we make sure we don't accidentally land this, or send a bogus example conflict resolution to linus in the main merge window pr
09:39sima: imre, did you see a mail from sfr about the conflict in linux-next fly by on dri-devel?
09:40sima: you should get cc'ed if you've authored/committed one of the involved commits
09:43lucaceresoli: mwalle: this topic has been discussed between jani and mripard w.r.t. panels for devm_drm_panel_alloc(), but for bridges it's the same
09:43lucaceresoli: mwalle: https://lore.kernel.org/all/20250606-pompous-mellow-guan-1d9ea4@houat/
09:45lucaceresoli: mwalle: TL;DR: the bridge will have to be allocated dynamically (yes, that's a bit of annoyance for drivers which currently embed it, but not quite avoidable)
09:45imre: sima, rodrigovivi, yes rodrigo asked me if that resolution was ok and I acked it, so my fault. The correct resolution is 'ret = drm_dp_dpcd_probe(aux, DP_TRAINING_PATTERN_SET);' in the result not 'ret = drm_dp_dpcd_probe(aux, DP_LANE0_1_STATUS);'. Sorry for that.
09:45lucaceresoli: mwalle: and you can either have a wrapper struct that embeds the bridge, and devm_drm_bridge_alloc() that struct, if it makes sense
09:46sima: imre, rodrigovivi ah ok, then revert of that drm-rerere commit and retrying with dim rebuild-tip should be enough
09:46lucaceresoli: mwalle: or you can call the low-level function __devm_drm_bridge_alloc() as done in https://lore.kernel.org/all/13d15c1414e65ffb21944d66e2820befdab54e98.1749199013.git.jani.nikula@intel.com/
09:46imre: sima, rodrigovivi, I suppose reverting 5dd2d660323d from drm-rerere and perhaps also doing a 'dim rebuild-tip' would fix this.
09:46sima: yeah that should usually do the trick
09:46imre: sima, ok
09:46sima: it's even documented as the procedure
09:46imre: I'll answer now to sft as well
09:47imre: sfr
09:47sima: oh, do you have the link for that one for here?
09:47mwalle: lucaceresoli: thanks for the pointers, i'll have a look later. right now i'm getting a refcnt overflow warning with the latest next (as is expected, i'd guess, if the bridge isn't initialized)
09:48imre: sima, didn't answer yet, but his email is https://lore.kernel.org/all/20250716141832.5542b414@canb.auug.org.au
09:51imre: it's the correct resolution, so no need for me to answer
09:55mwalle: lucaceresoli: I'd probably need a wrapper to get a reference to the private struct of the driver (within the bridge_funcs), right? I.e. struct tidss_encoder_bridge { struct drm_bridge bridge; struct tidss_encoder *encoder; }. Then go from drm_bridge to tidss_encoder_bridge and use the pointer to get the original private struct
10:05sima: imre, ah yeah that's just standard adjacent line changes stuff, standard fare for linus to sort out
10:05sima: just need to get drm-tip fixed
10:06lucaceresoli: mwalle: indeed the refcnt warning is expected with current -next, because kmalloc or any other "classic" allocation process won't initialize the refcnt, thus it will start from 0 hence the warning
10:09lucaceresoli: mwalle: and yes, your code snippet looks like a good solution
10:10glehmann: eric_engestrom: ugh, marge pushed the MR after she was unassigned: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115
10:10eric_engestrom: yeah I just saw that, cancelled it
10:10eric_engestrom: it's because it picked the MR, then got unassigned while it was rebasing it, and then pushed it
10:10eric_engestrom: bad timing
12:35alyssa: do we have a "reversed" version of nir_dominance_lca?
12:35alyssa: query returning "first block that dominates both input blocks"
12:36alyssa: to solve the problem of "what's the highest place in the program we can insert an instruction with given sources"
12:36alyssa: hmm nir_opt_sink must do that..
12:38alyssa: hmm it uses nir_dominance_lca, maybe I'm confused
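A minimal sketch of the query alyssa describes: given an immediate-dominator map, find the deepest block that dominates both inputs. This is not Mesa's nir_dominance_lca; the `idom` dictionary and the block names are hypothetical and only illustrate the idea.

```python
def dominance_lca(idom, a, b):
    """Return the deepest block that dominates both a and b.

    idom maps each block to its immediate dominator; the entry block maps to None.
    """
    def depth(block):
        d = 0
        while idom[block] is not None:
            block = idom[block]
            d += 1
        return d

    da, db = depth(a), depth(b)
    # Bring both blocks to the same depth in the dominator tree.
    while da > db:
        a, da = idom[a], da - 1
    while db > da:
        b, db = idom[b], db - 1
    # Walk up in lockstep until the two paths meet.
    while a != b:
        a, b = idom[a], idom[b]
    return a


# Hypothetical CFG: entry -> {then, else} -> merge -> tail
idom = {"entry": None, "then": "entry", "else": "entry",
        "merge": "entry", "tail": "merge"}
```

An instruction whose sources are defined in `then` and `else` of this toy CFG has no common dominator below `entry`, while sources in `tail` and `merge` share `merge`.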
12:43alyssa:does local version first
13:38MrCooper: zmike: just bisected GALLIUM_HUD not working anymore to "gallium: de-pointerize pipe_surface"
13:38zmike: uhh
13:38zmike: you're welcome?
13:38zmike:panics
13:47rodrigovivi: airlied sima, on drm_netlink for ras, what do you envision as a standard user space consumer?
14:30gfxstrand: dcbaker: How do you feel about adding a src/python?
14:30gfxstrand: And is there a good way we could make that land in the import path of every script in the tree?
14:31gfxstrand: Like, I would love it if we had a src/python that just showed up as a mesabuild module so you just do `import mesabuild` at the top of your script and you get stuff
14:56eric_engestrom: gfxstrand: there's the sys.path.insert() thing, it's ugly but it's reliable
14:57gfxstrand: Yeah
14:57eric_engestrom: (grep for that in the tree for plenty of examples)
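A minimal sketch of the sys.path.insert() approach mentioned here, assuming a hypothetical layout where a generator script lives in src/foo/ and the shared modules live in src/python/. The helper name and both paths are illustrative, not taken from the Mesa tree.

```python
import os
import sys


def add_shared_module_dir(script_path, relative_dir):
    """Prepend a directory, given relative to the script, to sys.path."""
    here = os.path.dirname(os.path.abspath(script_path))
    module_dir = os.path.normpath(os.path.join(here, relative_dir))
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)
    return module_dir


# Hypothetical script location; a real script would pass __file__ instead.
example_dir = add_shared_module_dir(os.path.join("src", "foo", "gen.py"),
                                    os.path.join("..", "python"))
# After this, `import mesabuild` would resolve only if
# src/python/mesabuild actually exists.
```

The cost of this approach is that every script has to repeat the path computation, which is the "ugly but reliable" trade-off.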
14:59gfxstrand: Yeah, I found a few
15:07eric_engestrom: gfxstrand: what kind of thing are you looking to put in there?
15:16karolherbst: alyssa: 276501755364d72b55de810e728981e78c6ee0e0 is regressing some CL stuff on radeonsi
15:19karolherbst: maybe some weirdo spirv handling missing...
15:21glehmann: is it the splitting or the fusing that breaks it?
15:22karolherbst: wish I knew
15:25karolherbst: okay so disabling those 4 opts fixes it...
15:25sima: rodrigovivi, tried to not think about that, maybe airlied has an idea
15:26gfxstrand: eric_engestrom: We've got some utils in nouveau for rust generators
15:26sima: or perhaps agd5f or someone else from amd thought about it
15:26alyssa: karolherbst: CL should be setting `exact` everywhere
15:26alyssa: not my bg
15:26alyssa: bug
15:26eric_engestrom: gfxstrand: ack; I'd be curious to see the MR when you post it :)
15:26karolherbst: at least on the fma...
15:27karolherbst: but yeah.. in CL the fma can't be split.. guess I'll write a patch
15:31gfxstrand: eric_engestrom: I gave up and I'm doing something dumb now
15:31agd5f: sima, rodrigovivi I vaguely remember looking at it. I think Hawking and Lijo provided some comments at the time. Our RAS stack doesn't currently make use of it.
15:32sima: agd5f, it's more what should the minimal open userspace for it look like, as in how much yolo
15:40karolherbst: alyssa: exact doesn't help?
15:41karolherbst: like I need the fma to stay a fma for like forever
15:42gfxstrand: exact should prevent it from being split
15:42gfxstrand: exact means "don't do any transform on this that isn't bit-for-bit the same output"
15:42gfxstrand: So splitting fma is definitely out
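A small sketch of why splitting an fma is not bit-for-bit safe. It uses Python doubles rather than the float32 values a shader would use, and emulates the fused operation with exact rational arithmetic; the function names and input values are chosen for illustration only.

```python
from fractions import Fraction


def fma_fused(a, b, c):
    """a*b + c with a single rounding at the end, as a fused multiply-add does."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma_split(a, b, c):
    """a*b rounded to double, then + c rounded again (separate fmul and fadd)."""
    return a * b + c


a = b = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)
# The exact product is 1 + 2**-26 + 2**-54; the last term is lost when the
# product is rounded to a double on its own.
fused = fma_fused(a, b, c)
split = fma_split(a, b, c)
```

With these inputs the fused form keeps the 2**-54 residue and the split form returns zero, so the two are different transforms as far as `exact` is concerned.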
15:44karolherbst: yeah... maybe it's something else going on, but it's kinda weird..
15:44karolherbst: mhhhhh
15:47karolherbst: nope it's defo those...
15:48karolherbst: but it's only an issue with radeonsi
15:48alyssa: ok but the patterns you linked are gated on the exact bit not being set so
15:49alyssa: can you send me the NIR_DEBUG=print output please?
15:54karolherbst: something nukes the exact flags...
15:54karolherbst: or dunno.. mhh
15:55alyssa: can you send me the NIR_DEBUG=print output please?
15:55karolherbst: https://gist.githubusercontent.com/karolherbst/473fbe88c5bc5ba8fd57750a029e9095/raw/9e1065c178c2d5dbd3ecb4b4084f6c013776ca7d/gistfile1.txt
15:57alyssa: karolherbst: vtn is failing to set the exact bit on fadd/fmul instructions
15:57karolherbst: it's legal to merge those into ffma
15:58alyssa: ..right, I wrote that patch didn't I.
15:58karolherbst: but..
15:58karolherbst: all the ffma! get cf_dceed
15:58karolherbst: so...
15:58karolherbst: no idea what's going on...
15:58karolherbst: maybe just something very unfortunate
15:58karolherbst: maybe it's just libclc being wrong
15:58dcbaker: gfxstrand: I've wanted to do that for a long time but never gotten to it.
15:59alyssa: karolherbst: can you comment out those four lines, then send me the NIR_DEBUG=print output for that too?
15:59dcbaker: The options other than what eric_engestrom mentioned are: 1) use the `PYTHONPATH` environment variable, 2) use a small python wrapper script instead of `prog_python` that does the path insertion automatically, and then does the python equivalent of `exec "$@"`
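A rough sketch of option 2, assuming the wrapper receives the module directory, the target script and its arguments. It uses the standard-library runpy module; the function name and the restore-on-exit behaviour are illustrative choices, not dcbaker's actual runner script.

```python
import os
import runpy
import sys


def run_with_module_dir(module_dir, script, args):
    """Run script as __main__ with module_dir importable and args as its argv."""
    saved_path, saved_argv = list(sys.path), list(sys.argv)
    sys.path.insert(0, os.path.abspath(module_dir))
    sys.argv = [script] + list(args)
    try:
        # run_path executes the file and returns its resulting globals.
        return runpy.run_path(script, run_name="__main__")
    finally:
        # Restoring state is only needed when this is called from a larger
        # program; a standalone wrapper could simply exit here.
        sys.path[:] = saved_path
        sys.argv[:] = saved_argv
```

A standalone wrapper would call this with `sys.argv[1]` as the script and `sys.argv[2:]` as the arguments, so the wrapped script sees the same command line it would have seen when run directly.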
16:00karolherbst: https://gist.githubusercontent.com/karolherbst/16bb421becfb4d1472ed483904044821/raw/aa31194557da2cb088b03ebaf35f3895dc17a636/gistfile1.txt
16:00dcbaker: I've kinda wanted to do that approach because I have this clever idea of letting that script check your python imports and generate a depfile
16:01eric_engestrom: ooh, depfile for python would be neat
16:02alyssa: karolherbst: oh that's all kinds of screwed up
16:02alyssa: i see the problem, gimme a minute
16:02karolherbst: but I should send a patch to set exact on all fmas anyway
16:03karolherbst: though the SPIR-V might already flag them...
16:03karolherbst: well CL spir-v env spec says "Correctly rounded"
16:04karolherbst: there is mad if you don't care
16:05karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36175
16:12alyssa: karolherbst: The problem, I think, is that libclc explicitly uses mad
16:12alyssa: https://github.com/llvm/llvm-project/blob/78b9128250c9fe5c7f9e460a27cc28c6450fd8fd/libclc/clc/lib/generic/math/clc_sincos_helpers.inc#L9-L75
16:12alyssa: which does not have the exact bit set
16:12karolherbst: yeah, but it's fine to do either with that
16:12alyssa: right but I think it expects it to be consistent which you do. maybe?
16:13karolherbst: mhhhhhh
16:13karolherbst: good question
16:13karolherbst: I do decide inside vtn_opencl
16:13karolherbst: maybe I just mark the result as exact as well then...
16:13karolherbst: let me try that
16:14alyssa: what?
16:15karolherbst: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/4db82ce27bd7ae65eecc2894c2ed2ed3532c5a4f
16:15alyssa: sure, but that's not necessarily good enough
16:16alyssa: because CLC calls its own mad function directly
16:16alyssa: but I do agree that's probably sane
16:16karolherbst: mhhh... right...
16:16alyssa: mad()'s description seems to be "you can pick either one", not "this is some kind of fast-math mode"
16:17karolherbst: yeah
16:17alyssa: and the lack of exact is fast-math circus
16:17alyssa: BUT
16:17alyssa: that patch won't do anything, because that ffma is already exact (-:
16:17karolherbst: yeah, it's great
16:17alyssa: the problem isn't mad(), it's libclc's internal mad
16:18karolherbst: `#pragma OPENCL FP_CONTRACT ON` impressive
16:18alyssa: OHHH
16:18alyssa: frick
16:18alyssa: wait no i misread the code
16:18alyssa: nvm
16:18alyssa: average alyssa interaction
16:19karolherbst: anyway, my patch on mad seems to help 🙃 or I'm going crazy
16:19alyssa: then we have a bug elsewhere
16:19karolherbst: yeah... it does fix it
16:20alyssa: because exact should be set on that builder
16:20alyssa: unless this is some nonsense where the libclc shader itself is special
16:20karolherbst: doubtful
16:20karolherbst: normally the translator sets the contraction mode stuff properly
16:21karolherbst: ohhh
16:21karolherbst: oh no
16:21karolherbst: no no no
16:21karolherbst: on the nir side the only difference is "ffma!" and "ffma" now with my patch
16:21karolherbst: so I guess it's needed for the AMD backend
16:22alyssa: now that i can believe.
16:24karolherbst: I'll test the patch and if that solves all the other issues, we'll just set exact on fma and mad
16:24alyssa: um, no, the story's not over here
16:25alyssa: why is b.exact not *already* set?
16:25alyssa: and if it's not - presumably from a FP_CONTRACT ON in libclc - why do we need to override that? libclc bug? vtn bug?
16:25karolherbst: in the spirv?
16:25karolherbst: I think the translator might not bother for the clc builtins to set it on the spirv level
16:25karolherbst: I should check the spirv...
16:26karolherbst: which uhm.. is always fun
16:27karolherbst: "%22064 = OpExtInst %float %1 mad %22061 %float_n0_836411297 %float_1_10496962" well..
16:28alyssa: can you post the spirv?
16:28karolherbst: the entire thing?
16:28karolherbst: it's like 2.7MiB
16:29alyssa: I would like to understand why b.exact is not set
16:29karolherbst: if it helps, I don't see any ContractionOff
16:29alyssa: so..
16:29alyssa: so why is b.exact not set?
16:30karolherbst: why should it be set for everything?
16:31alyssa: it's OpenCL, that's the default.
16:32alyssa: ...apparently it is not
16:32karolherbst: yeah, but the spir-v should tell us, because how would we know what the frontend expects
16:32alyssa: / The DEFAULT value is ON.
16:32alyssa: #pragma OPENCL FP_CONTRACT on-off-switch
16:32alyssa: you have got to be kidding me
16:33alyssa: this feels like a libclc bug.
16:35karolherbst: not unlikely
16:35karolherbst: cos requires <= 4 ulp, but with that change we go around 5
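For reference, a simplified sketch of counting how many float32 steps separate a computed result from a reference value. The OpenCL specification defines ULP error against the infinitely precise result, so this bit-pattern distance is only an approximation of that definition; the function names are made up for the example.

```python
import struct


def f32_bits_ordered(x):
    """Map a float32 value to an integer that increases monotonically with the value."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Negative floats are sign-magnitude; fold them below zero.
    return bits if bits < 0x80000000 else 0x80000000 - bits


def ulp_distance_f32(x, y):
    """Number of representable float32 values between x and y."""
    return abs(f32_bits_ordered(x) - f32_bits_ordered(y))


# The float32 immediately above 1.0 has bit pattern 0x3F800001.
(next_after_one,) = struct.unpack("<f", struct.pack("<I", 0x3F800001))
```

A conformance-style check would compare `ulp_distance_f32(result, reference)` against the allowed bound, 4 in the case of cos as stated above.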
16:35karolherbst: most of the code was written to be "good enough" for whatever hardware was targeted
16:36karolherbst: (AMD)
16:39alyssa: I strongly suspect the real bug here then is the libclc code explicitly asking for mad's when it should be explicitly asking for ffma's or something
16:39alyssa: but also I don't care we can merge your patch I want to go back to reassociating fmuls which will break CL again (:
16:40karolherbst: :D
16:40karolherbst: sounds good
16:41karolherbst: but anyway, on fedora the libclc spirv is at /usr/lib64/clc/spirv64-mesa3d-.spv
16:42karolherbst: I kinda hope we can move to the LLVM SPIR-V target at some point and deal with all sorts of breakage :)
16:47dcbaker: gfxstrand, eric_engestrom: I threw together a really quick runner script that's probably full of corner cases, but it does work and allows loading modules from `src/python`, it's the `wip/2025-07/src-python` branch on my gitlab
19:53glehmann: alyssa: do you have a branch with the insert change you tried
19:53alyssa: glehmann: the cursor one?
19:53glehmann: yes
19:53alyssa: let me dig thru reflog
19:54glehmann: there are some shaders where the new pass does really badly, like farcry5/0195cf650255e8c2/vs
19:54glehmann: badly == double register pressure
19:54alyssa: do you have a branch with radv wired up?
19:54alyssa: trade you :P
19:55alyssa: glehmann: nir/opt-association-failed-attempt pushed
19:56alyssa: not tested but should be ok
19:56alyssa: (well it's build tested, and because my Mesa build includes a bunch of chunky AGX binaries, that smoke tests the pass hah)
19:56glehmann: alyssa: https://gitlab.freedesktop.org/DadSchoorse/mesa/-/tree/radv-reassoc2, you can drop the last three commits, they are just further things I tried for some cmat shaders
19:57alyssa: fwiw I'm not convinced this pass will run to a fixed point
19:58glehmann: it doesn't, that's why there is a loop limit 🙃
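A generic sketch of running passes until nothing changes, with an iteration cap for passes that may not converge. The pass signature (takes the IR, returns the new IR and a progress flag) and the toy pass are assumptions for illustration, not how the radv branch is structured.

```python
def run_until_stable(passes, ir, max_iterations=4):
    """Repeat the pass list until no pass reports progress or the cap is hit."""
    for iteration in range(max_iterations):
        progress = False
        for run_pass in passes:
            ir, changed = run_pass(ir)
            progress = progress or changed
        if not progress:
            return ir, iteration
    return ir, max_iterations


def halve_evens(values):
    """Toy pass: halve every non-zero even number, report whether anything changed."""
    new = [v // 2 if v and v % 2 == 0 else v for v in values]
    return new, new != values


stable, rounds = run_until_stable([halve_evens], [8, 3], max_iterations=10)
capped, capped_rounds = run_until_stable([halve_evens], [8, 3], max_iterations=2)
```

The capped run stops with work still left, which is the situation a loop limit papers over when a pass keeps finding new opportunities after each CSE round.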
19:58alyssa: :clown:
19:58glehmann: running it a few more times only has benefits for radv
19:58alyssa: interesting
19:58alyssa: I wonder why it's not converging
19:58alyssa: in one iter I mean
19:59alyssa: I guess CSE'ing stuff makes other chains shorter and lets us reassociate more or something
19:59glehmann: yeah that was the case in my cmat shader
19:59alyssa: ah
20:01alyssa: my suspicion is that the benefits on AGX have a lot to do with making good use of preambles
20:01alyssa: which is good news for ir3
20:01alyssa: but means we need different heuristics for other ISAs
20:02alyssa:running radv under drm-shim now
20:03glehmann: maybe instead of trying to fix this in the reassoc pass, we should write a basic scheduler that attempts to reduce register pressure
20:04alyssa: 2 things can be true :)
20:04glehmann: aco is pretty dumb because the input register pressure is the best possible result you get, our scheduler only makes it worse
20:04alyssa: and yeah the AGX backend schedules for pressure
20:04alyssa: which might also explain why my results are so much better
20:05alyssa: (the AGX backend reg pressure scheduler is really dumb but it helps so who cares)
20:07glehmann: something really conservative is still better than nothing
20:07alyssa: the AGX thing is conservative in the sense that it is guaranteed to only help pressure
20:07glehmann: maybe we should even do this in NIR
20:07alyssa: but probably kills ILP in the process
20:08alyssa: I don't have a cycle model of AGX so :(
20:08glehmann: I think aco's backend schedulers would likely do a good enough job at recovering ILP
20:09alyssa: fair
20:09glehmann: especially for GCN, where ALU latency isn't a thing
20:09alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/asahi/compiler/agx_pressure_schedule.c?ref_type=heads
20:09alyssa: ^ the really dumb thing
20:10alyssa: (i originally wrote that at collabora for bifrost, it's just as dumb/effective there too)
20:11alyssa: when I did this for bifrost, apparently it made 60%/70% of my spills/fills go away, lol
20:12glehmann: is there a good reason to do it in the backend?
20:12karolherbst: zmike: did you start to expose intensity formats in zink recently?
20:12zmike: yes?
20:13zmike: or at least they're native-ish now
20:13alyssa: I mean.. it's more accurate w.r.t accounting correctly for abs/neg/sat, for example
20:13alyssa: (and anything aco_optimizer.cpp fuses, etc)
20:13karolherbst: zmike: okay.. don't seem to work with rusticl at least
20:13alyssa: we could probably do a NIR one but I'd probably use it in addition to the backend one and not as a replacement
20:13glehmann: fair
20:13zmike: karolherbst: how are you using it
20:13zmike: also wtf cl has intensity formats?
20:14karolherbst: probably the wrong way, why?
20:14alyssa: zmike: yeah it's crazy
20:14glehmann: (I just really hate writing/maintaining aco passes)
20:14karolherbst: luminance is also a thing
20:14karolherbst: luminance works tho
20:14zmike: wild
20:14alyssa: glehmann: tbh i think that says more about aco than backends in general....
20:14karolherbst: maybe I have to set up the swizzle on the image/sampler views correctly or something? anyway, works with other drivers
20:15zmike: are you using an image sampler?
20:15zmike: or buffer
20:15karolherbst: image
20:15zmike: then you probably have to set the swizzles correctly
20:15zmike: I think it's just RRRR though
20:15karolherbst: yeah.. that's doable.. I just trust the uhm.. helper to do the right thing
20:15karolherbst: u_sampler_view_default_template
20:16karolherbst: which doesn't set RRRR for intensity it seems
20:16zmike: I don't think that actually does anything special
20:16zmike: you should probably copy the mesa/st handling
20:17karolherbst: probably
20:17karolherbst: anyway, that should be a simple fix
20:17karolherbst: I just never bothered with swizzles, because I don't do swizzled images yet
20:17zmike: it won't work for buffer usage though
20:17karolherbst: that's not supported with write images anyway, right?
20:17zmike: it should be
20:18karolherbst: pipe_image_view doesn't have a swizzle
20:18zmike: oh
20:18zmike: huh
20:18karolherbst: yeah...
20:18karolherbst: anyway.. I can just RRRR for intensity :D
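A toy sketch of what an RRRR swizzle does to a single-channel texel. The swizzle string notation and the helper are made up for illustration and do not correspond to the gallium sampler-view fields.

```python
def apply_swizzle(texel, swizzle):
    """Apply a 4-character swizzle (R, G, B, A, 0 or 1 per slot) to an RGBA texel."""
    r, g, b, a = texel
    source = {"R": r, "G": g, "B": b, "A": a, "0": 0.0, "1": 1.0}
    return tuple(source[ch] for ch in swizzle)


# An intensity texel stores one value in the first channel; RRRR replicates
# it to all four components when sampled.
sampled = apply_swizzle((0.25, 0.0, 0.0, 1.0), "RRRR")
```

An identity swizzle ("RGBA") would instead return the stored texel unchanged, which is why a default view template leaves intensity data in the red channel only.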
20:18zmike: pretty sure zink could do it, but idk
20:18karolherbst: I haven't checked if write intensity images are broken tho
20:19karolherbst: well aren't supported anyway
20:19glehmann: alyssa: yeah I know, aco's IR design isn't really something I would recommend
20:20alyssa: glehmann: ok, so some preliminary notes from poking at radv:
20:20alyssa: * the "skip global cse if we can preamble more stuff" is a loss if you don't have preambles, surprised pikachu
20:21alyssa: * the divergence-aware ranking is a toss up if you don't have preambles, I guess because you only have integer SALU
20:22alyssa: that doesn't account for everything, though.
20:22glehmann: I'm actually not sure if the divergence data is close to up to date, also gfx11.5+ has float ALU too
20:23alyssa: hmm ok
20:23glehmann: float SALU, I mean
20:23alyssa: running divergence at the start of the pass doesn't magically solve it tho, i tried
20:23alyssa: my script has gpuid hardcoded to NAVI10
20:23alyssa: what GPU_ID should I use for gfx11.5+?
20:23glehmann: gfx1201
20:24alyssa:reruns
21:57zzyiwei: robclark: Hello, Sir! one more piece to assist common AHB support: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36151
22:16sarbes: Hi. I would like to implement "normalize(vec3(xyz))" for Utgard/Lima, since there is direct HW support. Unfortunately, it seems that normalize gets lowered in NIR, so it is not entirely straight forward.
22:17sarbes: It seems to me that I could go three different ways. 1) Undo the lowering in lima_nir_algebraic.py. 2) Introduce a normalize() op in NIR, with optional lowering. 3) Undo the lowering in a C pass.
22:17sarbes: 1) Seems to require some tinkering with the search code, as swizzles are not supported. Without some hacking, I'm not able to match the NIR pattern.
22:18sarbes: 2) I don't think that introducing such a "legacy" op would be accepted.
22:18sarbes: 3) Seems to be the best solution overall, but I would like to get some confirmation.
22:20alyssa: sarbes: I'm not seeing why you can't use an algebraic rule for that?
22:20alyssa: oh because of the broadcast behaviour... bah
22:20alyssa: vec4 hw was a mistake
22:21sarbes: Yeah. The pattern is something like "('fmul', 'a', ('frsq', ('fdot3', 'a', 'a')))"
22:22sarbes: But I would need "('fmul', 'a', ('frsq.xxx', ('fdot3', 'a', 'a')))"
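A plain-Python sketch of what the lowered pattern computes, to make the broadcast explicit: fdot3 and frsq produce a scalar, and that scalar has to be replicated across the three components (the `.xxx` part) before the fmul. The function names mirror the NIR opcodes but are ordinary Python helpers written for this example.

```python
import math


def fdot3(a, b):
    """Three-component dot product, scalar result."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def frsq(x):
    """Reciprocal square root."""
    return 1.0 / math.sqrt(x)


def normalize3(a):
    scale = frsq(fdot3(a, a))            # scalar
    return tuple(x * scale for x in a)   # scalar broadcast to .xxx, then fmul


n = normalize3((3.0, 0.0, 4.0))
```

A pass that wants to recognise normalize therefore has to match the multiply together with the swizzle on its second source, not just the opcode tree.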
22:22alyssa: yeah I see what you mean
22:22alyssa: this is for PP?
22:22sarbes: Yeah.
22:25alyssa: I guess #3 is my preference if it's all the same to you
22:26alyssa: I wouldn't mind plumbing the op through as long as it's not invasive to other drivers, tho
22:26alyssa: modifying nir_search for this is a hard no
22:26alyssa: this is kind of a repeat of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33331 tbh
22:29sarbes: I've been eying this MR since it was submitted, but there is no resolution for now.
22:32sarbes: If #3 is the preferred route to go, so be it.
22:32pac85: Are we doing lisp now :p
23:44sarbes: Curiously, the op is processed by the varying unit. Same as perspective division (which I want to wire up too).
23:45sarbes: Anyway, thanks for the input.
23:49alyssa: sarbes: midgard can do perspective division when loading varyings, so that'd be why
23:49alyssa: i assume utgard-pp could normalize too
23:50alyssa: I mean if you already have a floating point divide, why not?
23:52sarbes: Just saying. :)
23:52sarbes: It does make sense to me to have it there.