00:00 linkmauve: Actually with EGL_NO_CONFIG_KHR I can get by.
07:02 lucaceresoli: lumag: thanks for reviewing the drm_connector_attach_encoder() series, I will send an iteration taking your suggestions
07:02 lucaceresoli: lumag: did you have a look at the "Additional rationale" section of the cover letter too?
07:02 lucaceresoli: lumag: TL;DR: I'm implementing (a full rewrite of) bridge hotplug inside the bridge-connector
07:02 lucaceresoli: lumag: code is not yet ready to be sent out but if you have an opinion on the principle I'd love to know it.
12:01 mripard: pinchartl: tomba: tzimmermann: Could you have a look at https://lore.kernel.org/all/20260320-drm-mode-config-init-v2-0-c63f1134e76c@kernel.org/ ?
12:29 pinchartl: mripard: probably not this week :-/
12:44 vsyrjala: mripard: the state create seems quite misplaced in the register() thing. also by that time i915 will have done readout+one internal commit already. not sure why you even have that kind of delayed state allocation tbh. why not just do it when creating the object?
12:46 mripard: "I" don't have anything, it's the pattern virtually every driver uses (but i915 from what you're telling me)
12:48 mripard: also, with the state readout thing, we'll have to perform the state readout when the whole device is somewhat functional anyway
12:48 mripard: so around registration
12:49 mripard: delaying the initial allocation to the registration would save us an extra allocation
12:51 vsyrjala: there is no extra allocation in i915
12:51 vsyrjala: well, apart from the internal commit which will duplicate states as usual of course
12:52 mripard: This series doesn't affect i915, so I'm a bit confused
12:52 mripard: why are you bringing it up?
12:53 vsyrjala: you are touching private objects in general are you not?
12:53 mripard: I'm touching everything but private objects
12:54 vsyrjala: then what is this https://lore.kernel.org/all/20260320-drm-mode-config-init-v2-16-c63f1134e76c@kernel.org/ ?
12:56 mripard: yeah, ok, my bad
12:58 vsyrjala: and register() really is too late for this. everything should be fully set up *before* we allow userspace to poke at the stuff
12:58 tzimmermann: mripard, i'll have a look
13:01 vsyrjala: imo allocating alongside the object is the best approach. that way you can never have an object floating around without a state
13:02 mripard: vsyrjala: let's agree to disagree then
13:03 mripard: vsyrjala: if you feel like registration is too late, when would be a good time once the device is somewhat tied up?
13:06 vsyrjala: i915 needs the states pretty much as soon as the objects are created. there isn't much between the obj create and readout
13:09 mripard: where's the readout code for i915 private objects?
13:11 vsyrjala: not sure we have any atm
13:12 mripard: ... really?
13:13 vsyrjala: i don't recall what the mst and tunnel stuff do. those might be the only private objs we have right now. everything else is intel_global_obj
13:14 vsyrjala: which is pretty much a rw locked variant of private obj
13:21 vsyrjala: anyways, intel_modeset_setup_hw_state() is where all the init/resume readout happens. in addition intel_modeset_verify_crtc() does partial readout to verify the hw state after modesets and whatnot
13:22 mripard: I'm still confused
13:23 mripard: if you don't do readout on private objs, and provide the state for every other object already. How is i915 affected by my patches, and if it is, how would you like it to be changed?
13:24 vsyrjala: i'm not 100% sure it's affected right now. but i don't want yet another "let's paint ourselves in a corner" approach that prevents some drivers from using private objects
13:25 mripard: it's pretty clear by now what you don't want
13:25 mripard: what do you want?
13:26 vsyrjala: i already said it. i think the state allocation should stay with the obj. that way there is never any obj w/o state around, and we don't need yet another untested error path to deal with the state allocation failures
13:28 mripard: and, doing so, you want 99% of the KMS drivers to not deal with state allocation failures at all
13:29 vsyrjala: no. they already have to handle the obj allocation failing. the state allocation failure then uses the same error path
13:29 mripard: no they don't
13:29 mripard: they just call drm_mode_config_reset() and call it a day
13:29 mripard: which returns void
13:29 vsyrjala: yes right now. but move it into the obj alloc and that problem gets solved as well
13:34 mripard: which would require an extra allocation later on for readout
13:34 mripard: which, btw, you absolutely have in i915: https://elixir.bootlin.com/linux/v7.0/source/drivers/gpu/drm/i915/display/intel_modeset_setup.c#L705
13:35 vsyrjala: the readout happens into the current state
18:02 karolherbst: gfxstrand: I think I'll need a judgement call in regards to ~ffma transformation rules. So float_controls2 allow flagged instructions to be reassociated and contracted, but many ~ffma rules do that on the assumption that ffma isn't a single op, but rather fmul+fadd. So I'm wondering how float_controls2 and real ffma play together there. Or if we
18:02 karolherbst: should just say: "well.. it's ffma, so rather keep it always fused and don't reassoc unless the fma itself remains intact"... buuuuut not quite sure how far we want to go there.
18:06 karolherbst: e.g. doing ~fma(a, b, c * d) -> a * (b + c) kinda feels illegal even with reassoc+contract enabled.. but dunno
18:06 karolherbst: ehhh...
18:07 karolherbst: ~fma(a, b, a * c) -> a * (b + c) I meant
18:07 karolherbst: but sadly neither SPV extension really says anything there
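
[Editor's sketch, not part of the log] The rewrite discussed above, ~fma(a, b, a * c) -> a * (b + c), is exact over the reals but can change the rounded result. The Python below is only an illustration under assumptions: it uses doubles rather than 32-bit floats, the `fma` helper is a hypothetical emulation built on exact rationals (a single rounding at the end), and the input values are picked by hand. It does not settle whether float_controls2 permits the transformation.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Hypothetical fused multiply-add: compute a*b + c exactly, round once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Hand-picked doubles: b + c is a rounding tie that collapses back to 1.0.
a, b, c = 3.0, 1.0, 2.0 ** -53

fused = fma(a, b, a * c)    # rounds 3 + 3*2^-53 once
reassociated = a * (b + c)  # rounds b + c first, then the product

print(fused, reassociated, fused == reassociated)
```

With these inputs the two expressions differ by one unit in the last place, which is the kind of change the reassociation question is about.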
18:10 alyssa: why are u2u32/unpack_64_2x32_split_x separate? x_x
18:11 alyssa: can we just.. delete unpack_64_2x32_split_x and use u2u32 always
18:12 karolherbst: alyssa: it feels like somebody asked this very question every week
18:12 karolherbst: *asks
18:12 alyssa: probably
18:12 alyssa: is there an answer
18:12 alyssa: because istg i'll go deleting
18:12 alyssa: :p
18:12 karolherbst: I can give you two made up ones
18:12 karolherbst: 1. "consistency" and 2. "history"
18:13 Sachiel: consistory
18:13 karolherbst: but yeah.. I'm all for always using u2u32
18:13 karolherbst: r-by for the change 🙃
18:14 alyssa: more insane is that lots of backends do.. different things for those 2 ops.
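
[Editor's sketch, not part of the log] The redundancy being discussed can be modelled in a few lines. The Python functions below are hypothetical stand-ins named after the NIR opcodes, assuming u2u32 on a 64-bit source truncates to the low 32 bits and unpack_64_2x32_split_x returns the low half; they are not Mesa code.

```python
MASK32 = 0xFFFFFFFF


def u2u32(x64: int) -> int:
    """Model of unsigned 64 -> 32 conversion: keep the low 32 bits."""
    return x64 & MASK32


def unpack_64_2x32_split_x(x64: int) -> int:
    """Model of the 'x' half of a 64-bit value: also the low 32 bits."""
    return x64 & MASK32


def unpack_64_2x32_split_y(x64: int) -> int:
    """Model of the 'y' half: the high 32 bits, which has no u2u32 twin."""
    return (x64 >> 32) & MASK32


value = 0x123456789ABCDEF0
print(hex(u2u32(value)), hex(unpack_64_2x32_split_x(value)))
```

Under that model the two ops agree on every input, so any difference between them lives in how backends choose to handle each one.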
18:15 karolherbst: have fun! *goes back to the fma rabbit hole*
18:15 alyssa: ...yeah this seems like too much effort
18:15 alyssa: *goes back to jay rabbit bird bunny hole*
18:15 karolherbst: I think you found your answer 🙃
18:15 mareko: a lot of things exist because they made a lot of sense in the past, but less so now
18:16 karolherbst: alyssa: I was side-eyeing intel_nir_opt_peephole_ffma and wondering if we want to delete that 🙃
18:16 alyssa: the real issue seems to be backends that don't ingest u2u32(64) for purely artificial reasons
18:16 HdkR: alyssa: Add some new ops to do it correctly, and phase out the old ones in stages, eventually fizzling out without quite replacing all of them, then we can question why we have even more conflicting ops :P
18:16 alyssa: karolherbst: at one point it did things for brw, dunno if jay cares
18:16 alyssa: and the common thing is more competent now too so idk
18:16 alyssa: Intel fma has weird restrictions but for jay i think the common thing is fine?
18:17 karolherbst: alyssa: yeah so one thing that's special is that it prevents doing ffma with two immediates
18:17 alyssa: right.
18:17 mareko: creating an issue asking drivers to stop using something redundant might also work
18:17 alyssa: but jay would probably not mind 2 immediate ffma anyway.
18:17 alyssa: brw cares more.
18:17 karolherbst: okay
18:17 alyssa: (this is a hw thing but jay is better at not doing dumb)
18:17 karolherbst: ahh
18:17 karolherbst: okay
18:18 alyssa: 2 imm ffma turns into 3 instructions on intel
18:18 karolherbst: well sounds like the answer is: I have to keep it around then :')
18:18 alyssa: vs 2 instructions for 1 imm fmul+1 imm add
18:18 alyssa: but I suspect for jay it's less cycles average.
18:18 alyssa: & maybe also brw
18:18 alyssa: idk if it's been tested
18:18 karolherbst: mhhh
18:19 karolherbst: when Nvidia still had 64 bit instructions, ffma was such a disaster
18:19 alyssa: fwiw a lot of intel compiler code exists only to workaround intel compiler code
18:19 karolherbst: only allowed a single 32 bit immediate when the dest and src2 got the same register id
18:20 karolherbst: and otherwise the max was 20 bit immediate? which was like fine most of the time
18:20 karolherbst: so I guess on older gens NVK has the same issue somehow
18:20 karolherbst: we could probably nir shader option it if we reaaaalllllyy care
18:21 alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41084 https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41085
18:21 alyssa: karolherbst: ^ these 2 make rusticl suck less on intel
18:21 alyssa: i got annoyed :p
18:21 karolherbst: not surprised
18:22 alyssa: "hello world" from 72 instructions to 44 instructions on jay
18:22 karolherbst: the optimized or default shader?
18:22 karolherbst: optimized I guess...
18:22 alyssa: optimized
18:22 alyssa: IGC is like 23 instructions for the same shader so
18:22 alyssa: we still suck
18:22 karolherbst: heh
18:22 alyssa: but yknow.
18:22 karolherbst: yeah sooo
18:22 karolherbst: intel cheats
18:23 karolherbst: they just disable 4GiB+ allocs by default
18:23 alyssa: sounds good, i'm down
18:23 karolherbst: but yeah.. I should land my ubo stuff and get ssbo stuff going
18:23 karolherbst: next month or something
18:24 karolherbst: maybe I just delete intel_nir_opt_peephole_ffma, because my goal is to not have drivers trying to be super smart in regards to ffma fusing themselves... and then it's all common code
18:25 alyssa: let me fossildb that on jay, i'm curious
18:26 karolherbst: this two constant ffma thing sounds like a limitation every hw has tbh..
18:26 karolherbst: so might as well just... do that as common code
18:26 alyssa: not all hw?
18:26 karolherbst: two 32 bit immediates?
18:26 alyssa: AGX doesn't care
18:26 alyssa: well
18:26 karolherbst: ohh right...
18:26 karolherbst: one immediate is fine
18:26 alyssa: AGX doesn't have 32-bit imms in the first place
18:26 karolherbst: heh
18:27 karolherbst: nvidia with their 128 bit instructions be like: ...
18:27 karolherbst: but yeah.. lemme know the results
18:33 glehmann: on amd it depends on the imm
18:34 alyssa: karolherbst: results are all over the place because it's affecting loop unrolling
18:34 alyssa: anholt: did you have a patch for report-fossil.py for this? am I imagining?
18:34 karolherbst: pain
18:34 karolherbst: I had fun with this in regards to other opts the other day :')
18:35 anholt: alyssa: I don't use the .py any more. Use the new rust tool.
18:35 karolherbst: ahh yeah.. fsin/fcos lowering, it was great
18:35 anholt: but, yes, new rust tool should skip stats when loop unrolling changes. not sure about py
18:36 anholt: (https://gitlab.freedesktop.org/mesa/shader-db/-/merge_requests/117)
18:36 alyssa: ahhh right
18:36 alyssa: so I was half-hallucinating it then
18:36 alyssa: thanks
18:39 alyssa: karolherbst: for non-loopy shaders, it seems still all over the place because the common fuse_ffma has higher reg pressure than the Intel pass
18:39 karolherbst: mhhhh
18:39 alyssa: (and Intel compilers panic under reg pressure)
18:39 karolherbst: oof
18:39 alyssa: at least Jay panics in linear-time!! :p
18:39 karolherbst: maybe I need to take another look, but I _could_ make this an option but...
18:41 glehmann: common fma fusion is pretty bad
18:42 glehmann: it doesn't have any heuristic to choose which mul operand to fuse
18:42 alyssa: the differences do seem to be loops & reg pressure though, the 2-constant heuristic is almost noise for jay
18:42 alyssa: keeping the intel pass but ripping out that heuristic:
18:42 alyssa: Totals from 96 (3.63% of 2647) affected shaders:
18:42 alyssa: Instrs: 184058 -> 184126 (+0.04%); split: -0.12%, +0.16%
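
[Editor's sketch, not part of the log] The percentages in the report above can be recomputed from the raw counts. The snippet below only redoes that arithmetic; the helper name `pct` is made up, and reading "split" as the helped and hurt totals whose sum is the net change is an assumption about the report format.

```python
def pct(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


affected, total = 96, 2647
before, after = 184058, 184126

share = pct(affected, total)       # fraction of shaders affected
net = pct(after - before, before)  # net instruction-count change
split_sum = round(-0.12 + 0.16, 2) # helped + hurt, as quoted

print(f"{affected} ({share:.2f}% of {total}) affected shaders")
print(f"Instrs: {before} -> {after} ({net:+.2f}%), split sum {split_sum:+.2f}%")
```

The net change is 68 instructions out of roughly 184k, which is why the 2-constant heuristic is described as almost noise here.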
18:42 glehmann: and it creates fma when the mul ultimately isn't even eliminated
18:42 alyssa: glehmann: i thought i fixed that last one
18:43 karolherbst: glehmann: so probably better to have it its own pass with a bunch of configurable heuristics or just tell either driver to do their own thing?
18:43 glehmann: alyssa: iirc you added is only used by fadd, but that is too aggressive because not all fadds have to be fused with that mul
18:44 glehmann: karolherbst: aco will continue to do its own thing for sure
18:44 glehmann: there are just too many details to deal with in NIR
18:44 alyssa: ah
18:44 karolherbst: I see
18:45 karolherbst: more details than fmad vs ffma?
18:45 karolherbst: guess scalar vs vector as well
18:46 glehmann: yes that, and which constants are free vs which needs to use large imm
18:46 karolherbst: I see
18:47 glehmann: also, we want some muls to use the free output modifier instead of being fused to fma
18:47 glehmann: and salu is another story because it has no modifiers at all, and no three operand encoding
18:47 karolherbst: mhhh
18:48 karolherbst: yeah.. on nvidia it's easy: you always have both modifiers on every source and one constant on src1 or src2 🙃 and no uniform float ops
18:49 glehmann: I thought they added uniform float ops in the latest gen?
18:49 karolherbst: could be
18:49 alyssa: intel lets you implement whatever isa you want it's great /s
18:50 karolherbst: I think only the consumer blackwells got it...
18:50 karolherbst: yeah... seems to be that the DC blackwell cards don't have it either :')
18:51 glehmann: alyssa: just use aco then
18:51 alyssa: glehmann: what did you think i was doing since november
18:52 glehmann: I thought your standard trick is to hit things with a hammer until they work like nv
18:52 zmike: I heard someone say hammer
18:55 alyssa: glehmann: yes that's plan A
18:55 alyssa: plan B is to copy from radv
18:59 glehmann: btw, looks like intel's peephole_ffma messes up float controls when modifiers are handled
19:34 mareko: going from a 60Hz monitor to 200Hz feels amazing
19:35 zmike: it sure does
19:36 karolherbst: the worst part is, there is no going back
19:36 karolherbst: using a 60hz display now is just suffering
19:38 zmike: 60hz just can't keep up with how fast gdb is scrolling through my crash traces
20:08 airlied: karolherbst: I got 4k@120 on nouveau to lock on boot yesterday
20:09 karolherbst: HDMI 2.1?
20:09 karolherbst: nice
20:10 airlied: yup
20:10 karolherbst: wait
20:10 karolherbst: "lock" as in, it doesn't work?
20:10 airlied: my first attempt only worked if I loaded the module manually, otherwise the core notifier would crash
20:10 karolherbst: ahh, so hotplugging kinda works, but not on boot?
20:11 karolherbst: mhhhh
20:11 airlied: yes, but got the init sequence to go yesterday
20:11 karolherbst: well whenever you got something to test, I have a 4K@120 and 4K@165/FHD@360 display. Though not sure if FHD@360 needs HDMI2.1...
20:12 karolherbst: ohh looks like it does
21:50 pinchartl: airlied_: ping
21:54 airlied: pinchartl: pong
21:55 pinchartl: airlied: could I get your review on https://lore.kernel.org/dri-devel/20260407104951.1781047-1-laurent.pinchart+renesas@ideasonboard.com/ ?
21:55 pinchartl: Sima asked me to get your ack
21:56 pinchartl: (and you can of course just merge the patch too if you're fine with it :-))
21:58 airlied: pinchartl: ack, though I'm not in a position to merge stuff today, have to unscrew the screw up I did last week :-P
21:58 pinchartl: :)
21:59 pinchartl: if you're fine getting that merged through drm-misc, could you reply with a r-b or ack on the list ?
22:01 airlied: done
22:19 pinchartl: airlied: thank you