07:19tzimmermann: sima, airlied, hi. could you please forward drm-next to -rc6?
07:26sima: tzimmermann, I guess backmerge and do you need something specific? or did you mean drm-fixes?
07:27tzimmermann: sima, yeah backmerge. i want to get drm-misc-next and -next-fixes to -rc6
07:32sima: tzimmermann, will look into it later today, going to the gym this morning
07:50airlied: tzimmermann: doing a backmerge at the moment
07:50airlied: needed it for msm
07:51airlied: tzimmermann: pushed out now
07:56tzimmermann: airlied, thanks
09:07jani: hwentlan__: stumbled on parse_edid_displayid_vrr() and parse_amd_vsdb() etc. in amdgpu_dm.c... why is this being added in the driver, it's all supposed to be in drm_edid.c...
09:08jani: and the idea kind of is that drivers don't modify connector->display_info
09:23tzimmermann: PSA: With -rc6 tagged, drm-misc-next-fixes is now open. Features still go into drm-misc-next. Fixes for v6.17 or stable go into drm-misc-fixes. Fixes for v6.18 go into drm-misc-next-fixes. Patches in -fixes branches should be small and have a Fixes tag.
13:58sima: airlied, thx :-)
15:51alyssa: how is b2b32 actually defined?
15:51alyssa: is (('ineg', ('b2i32', a)), ('b2b32', a)) legal?
15:52pendingchaos: IIRC b2b1(b2b32(a@1)) == a
15:52karolherbst: my gut feeling says yes, but who knows
15:52alyssa: pendingchaos: right. that's not strong enough for the rule I'd like
15:53pendingchaos: (I could be wrong, I just remember the opcode being added so that the 32-bit value can be some faster but backend specific value)
15:54alyssa: right..
15:54alyssa: Intel would like an opcode that's actually explicit 0/~0
15:54alyssa: so then we can optimize ineg/b2i32 in NIR
15:55alyssa: Or maybe the crazier rule this shader could benefit from -- i2i32(unpack_32_4x8(ineg(b2i32(x))).<whatever>) -> b2b32(x)
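For reference, a minimal sketch of how the first of those rewrites could be written, in the (search, replace) tuple style nir_opt_algebraic.py already uses; this is illustrative only and assumes b2b32 is pinned down as 0/~0, which is exactly the open question below:

```python
# Hypothetical nir_opt_algebraic.py entry (not from the tree): only sound
# if b2b32 is defined to produce 0/~0, since ineg(b2i32(a)) is exactly 0/-1.
optimizations = [
    (('ineg', ('b2i32', 'a')), ('b2b32', 'a')),
]
```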
15:56karolherbst: I wonder if there is hardware that defines those bools differently?
15:56karolherbst: afaik nvidia defines them the same as intel
15:58alyssa: ac/llvm seems to do 0/1 but could maybe be fixed
15:58karolherbst: mhhh
15:58karolherbst: could add configurable opts
15:59jenatali: HLSL defines 32-bit bools as 0/1, not 0/~0
15:59pendingchaos: ACO has always implemented it as 0/1
15:59pendingchaos: problem with just "b2b1(b2b32(a@1)) == a" is that it can break if the backend doesn't match constant folding
15:59alyssa: jenatali: that's not really relevant. we can opt_algebraic chew thru whatever.
15:59alyssa: pendingchaos: yeah, exactly
16:00karolherbst: at least on nvidia: 0/-1 and 0.0/1.0
16:00alyssa: I don't care what encoding we pick but we should really have one canonical encoding
16:00alyssa: in NIR
16:00alyssa: because yeah not matching constant folding is.. bad
16:00alyssa: and "NIR ops that change behaviour by backend" are.. bad
16:00karolherbst: could make it part of the nir options
16:01alyssa: no
16:01alyssa: r600/sfn I can't tell at a quick glance what it does
16:01alyssa: zink does 0/1
16:01alyssa: as does dxil
16:02alyssa: as does nak seemingly
16:02alyssa: well this is a mess.
16:02karolherbst: nvidia might have gotten rid of a canonical format in hw
16:02alyssa: half the backends do one thing and half do another
16:02jenatali: What's constant folding do?
16:02alyssa: jenatali: 0/~0
16:02alyssa: i think
16:02jenatali: Ouch
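For context, the convention being attributed to constant folding here, written out as a plain Python sketch (an assumption of the 0/~0 reading, not code copied from NIR):

```python
# Reference semantics for the 0/~0 convention under discussion.
def b2b32(a: bool) -> int:
    return 0xFFFFFFFF if a else 0      # true -> all-ones (~0), false -> 0

def b2i32(a: bool) -> int:
    return 1 if a else 0               # true -> 1, false -> 0

def b2b1(x: int) -> bool:
    return x != 0                      # any non-zero value reads back as true
```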
16:03karolherbst: I'm actually curious why nak does 0/1 🙃
16:03karolherbst: though I think the hw used to accept anything
16:04karolherbst: as long as it's 0 vs not 0
16:04alyssa: r600/sfn I think does 0/~0 if I'm reading the code right
16:04alyssa: it's zink/dxil, aco/ac-llvm, and nak vs everything else
16:05alyssa: zink/dxil & nak look trivial to change to 0/~0 with no/minimal perf impact
16:05alyssa: amd idk, C++ scares me, which is why I work on compilers at Intel -- frick
16:07karolherbst: just port it over to C before touching it, ez
16:08alyssa: dont tempt me with a bad time
16:08karolherbst: what scares me is that, with you, I'm not sure you wouldn't suddenly do it
16:08alyssa: of note, AGX internally uses 0/1 booleans (it's better for the hw)
16:09alyssa: but I still implemented b2b32 as 0/~0 because I thought I had to :)
16:09alyssa: seems fine \shrug/
16:09karolherbst: so in case it matters, CLC actually defines them as 0/1
16:09alyssa: it really doesn't matter what frontends or backends want, it's easy to massage to whatever
16:09alyssa: we just need to pick something and be consistent
16:11alyssa: pendingchaos: do you have concerns from an AMD perspective about changing to 0/~0?
16:11alyssa: gfxstrand: ^ from an nvidia perspective
16:13pendingchaos: b2b32 is faster for uniform booleans because it can use SCC directly
16:13pendingchaos: but because we always use b2b32 right before a shared memory store (unless that changed at some point), we would insert a copy to convert to VGPR anyway
16:13pendingchaos: for divergent booleans, 0/1 allows a trick with "a + b + b2b32(c)" to use only one instruction, but that's the shared store thing again
16:13pendingchaos: so 0/~0 for b2b32 is probably fine
16:16alyssa: the `a + b + b2b32(c)` trick being.. add-with-carry instruction?
16:17pendingchaos: yes
16:18alyssa: right..
16:18pendingchaos: the carry-in is the same representation as divergent booleans
16:18alyssa: where is the b2b32 coming from in that case? why is it not a b2i32?
16:18alyssa: it's concerning given b2b32 is currently underdefined, it seems
16:21pendingchaos: I'm not sure if that code actually appears, because IIRC b2b32 is only used for shared stores
16:22pendingchaos: the carry-in opt was probably made to optimize "a + b + (c ? 1 : 0)" instead, but both look the same to ACO at this point
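As a side note on the carry trick, a small sketch of why it wants a 0/1 boolean rather than 0/~0: the 0/1 value is literally the carry-in of a 32-bit add-with-carry (names here are illustrative, not ACO code).

```python
# a + b + (c ? 1 : 0) folded into one add-with-carry: the 0/1 boolean is
# the carry-in operand. With a 0/~0 boolean the same fold would need an
# extra negate first.
MASK32 = 0xFFFFFFFF

def add_with_carry_in(a: int, b: int, c: bool) -> int:
    return (a + b + (1 if c else 0)) & MASK32
```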
16:23alyssa: right.. so that should be fixed to only look for b2i32 instead
16:28alyssa: actually the aco opt is already fine
16:29alyssa: yeah so all of this to me sounds like "define b2b32 as 0/~0, leave b2i32 as 0/1, fix isel in a few backends, move on"
18:32alyssa: Or.. delete b2b32 altogether and just use bcsel(0, ~0) explicitly
18:33alyssa: (and make nir_b2b32 a helper that generates a bcsel)
18:33alyssa: similar to what idr did with i2b32 years ago
18:34alyssa: also probably delete b2b1 and make it a helper doing ine
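A sketch of what that lowering could look like, again in the nir_opt_algebraic-style tuple syntax (the rule list name is made up, and this assumes the 0/~0 convention for the 32-bit case):

```python
# Hypothetical lowering rules for retiring the b2b opcodes.
lower_b2b = [
    # b2b32(a) -> explicit all-ones/all-zeros select.
    (('b2b32', 'a'), ('bcsel', 'a', -1, 0)),
    # b2b1(a) -> "is the value non-zero?".
    (('b2b1', 'a'), ('ine', 'a', 0)),
]
```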
18:35alyssa: this might require aco's bcsel & ine becoming more clever to avoid regressing codegen for scalar
18:37alyssa: ir3 does a trick with ABSNEG_S which would need to be replumbed
18:38alyssa: all of zink_nir_algebraic would get deleted which is nice
18:38alyssa: ok. Yeah I think this is worth doing
18:40idr: alyssa: Not to throw a wrench in things...
18:40alyssa: /o\
18:41idr: I have a branch that I've been poking at from time to time that tries to emit 16-bit Booleans to decrease register pressure.
18:41alyssa: Great!
18:41alyssa: ..So?
18:41idr: That ends up producing some b2b32 when a b16 and a b32 would be mixed.
18:42idr: I don't know if that would run afoul of what you're thinking of doing.
18:42idr: The branch hasn't shown a clear win yet, so... *shrug*.
18:43alyssa: idr: My current proposal is simply "remove b2b1 & b2b32 opcodes, systematically convert producers to ine/bcsel, make backend's ine/bcsel smarter to match codegen if needed"
18:43idr: Okay. That sounds reasonable.
18:44idr: I've been thinking we might want type conversion opcodes in brw, but that's a topic for another day.
18:45idr: (Short version: MOV is too flexible. It's a hassle to determine, "Is this just type conversion, or is it doing other regioning nonsense too?")
18:45alyssa: Sure. I don't think that has any bearing on the NIR clean up
19:28alyssa: ..Ok, NIR trivia question..
19:28alyssa: Is ieq valid on 1-bit bools? What about ine?
19:29alyssa: (Can we do xnor in one op?)
19:33alyssa: 27 files changed, 36 insertions(+), 211 deletions(-)
19:33alyssa: Yeah...
19:39alyssa: Am I scared to CTS/shader-db this? Sure am.
19:43pendingchaos: I don't know if it's valid, but it should work with ACO anyway
19:45glehmann: alyssa: iirc both 1bit ieq and ine are valid and used
19:46alyssa: Cool
19:46alyssa: because ir3 doesn't think so (:
19:46glehmann: we should probably document/validate which ops can be used with 1bit vals, but that's annoying work
19:49alyssa: yeah..
19:49alyssa: in the interest of fairness we should also allow iadd
19:50alyssa: with equivalent behaviour to ine
19:50alyssa: ineg, with equivalent behaviour to mov
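A quick exhaustive check of those 1-bit claims (plain Python, nothing NIR-specific): at width 1, ieq behaves as xnor, ine and iadd both behave as xor, and ineg is the identity.

```python
# Width-1 sanity check: ieq == xnor, ine == iadd == xor, ineg == mov.
for a in (0, 1):
    for b in (0, 1):
        assert (1 if a == b else 0) == (a ^ b ^ 1)   # ieq is xnor
        assert (1 if a != b else 0) == (a ^ b)       # ine is xor
        assert ((a + b) & 1) == (a ^ b)              # iadd is also xor at width 1
    assert ((-a) & 1) == a                           # ineg is a no-op at width 1
```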
19:50alyssa: ...
19:50alyssa: :p
20:19alyssa: Kayden: yeah but those stages suck :p
20:19alyssa: oops
20:20jenatali: Wouldn't ineg be not, instead of mov, if we're treating i1 as 0/~0 (i.e. -1 in 2s complement)?
20:22alyssa: jenatali: -x = (~x) + 1 = ~(~x) = x
20:22alyssa: yes i am trolling
20:22jenatali: Ah yeah ok
20:23alyssa: jenatali: or if you prefer, the only bit is the sign bit
20:23jenatali: Right, -0 == 0, and -1 would be 1 but that wraps back around to -1
20:23alyssa: -(INT32_MIN) = INT32_MIN and all that jazz
20:23alyssa: likewise, -(INT1_MIN) = INT1_MIN
20:24alyssa: isn't modular arithmetic fun
20:24alyssa: `-1 = 1 mod 2`
20:24jenatali: Yep
20:25alyssa: or if you prefer, `2x = 0 mod 2` hence `x = -x mod 2`
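The same wrap-around, spelled out as a throwaway check (plain Python, illustrative only): with a single bit there is only a sign bit, so negation lands back on the same value, just like -(INT32_MIN) at 32 bits.

```python
# Width-1 analogue of -(INT32_MIN) == INT32_MIN: negation is a mov.
def ineg1(x: int) -> int:
    return (-x) & 1          # two's-complement negate, truncated to 1 bit

assert ineg1(0) == 0
assert ineg1(1) == 1         # "-1 == 1 (mod 2)"

# And the 32-bit case it mirrors:
INT32_MIN = -2**31
assert (-INT32_MIN) & 0xFFFFFFFF == INT32_MIN & 0xFFFFFFFF
```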
20:25jenatali: I'm sorry I said anything :P
20:25alyssa: i am waiting for intel shader-db to compile, i've got lots of math trolling time free :p
20:26alyssa: (building AGX shaderdb on x86 is.. a lot faster than building for the intel gpu)
20:26alyssa: compiling every fragment shader twice is great
20:29HdkR: More cores for more faster, just get like 48 :D
20:30alyssa: HdkR: I can tell when shaderdb is done based on when the fans stop (:
20:31HdkR: Oh hey, that's how I recognize that binaryninja is done
20:31alyssa: my macbook is fanless \o/
20:34HdkR: I think if I put enough radiator on this Threadripper I /could/ be fanless.
20:34alyssa: Lol
20:35dwfreed: anything can be fanless with a large enough radiator
20:35dwfreed: the problem is "large enough" is often substantially larger than one usually has space for
20:40alyssa: Oh gahhhhh
20:41alyssa: I now see why these opcodes exist.
20:41alyssa: *twiddles her blue badge in frustration*
20:44alyssa: awful. well, I tried
21:18pac85: Maybe AI could make use of those 1-bit signed integers
21:21Kayden: maybe we can lower those 1 bit numbers to tomatoes and throw them at things
21:21alyssa: pac85: vec32!
21:23pac85: lol
21:28pac85: Mmm would dot product be bit_count(a&b) & 1
21:32karolherbst: pac85: ..... well cuda has support for it
21:32pac85: Ah
21:33karolherbst: behold a 16x8x256 matmul: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-fragment-mma-168256
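For what it's worth, the formula pac85 gave is the GF(2) (mod-2) dot product of two bit-packed vectors; a small Python sketch follows (this is just the arithmetic, not a description of how the CUDA b1 MMA accumulates):

```python
# 1-bit dot product with vectors packed one element per bit.
# popcount(a & b) is the integer dot product of two 0/1 vectors;
# "& 1" reduces it mod 2, giving the GF(2) dot product.
def dot1(a: int, b: int) -> int:
    return (a & b).bit_count() & 1       # int.bit_count() needs Python 3.10+

assert dot1(0b1011, 0b0011) == 0         # two overlapping bits -> even
assert dot1(0b1011, 0b0001) == 1         # one overlapping bit  -> odd
```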
21:34karolherbst: I know for certain it's not a thing on hw 🙃
21:42pac85: Messed up lol
22:10Mis012[m]: AI likes 1.5bit, Nvidia should really get on that