IRC Logs of #dri-devel on irc.freenode.net for 2024-08-22

02:32 ishitatsuyuki: you can lateload GPU drivers, I think it's the default on arch
02:32 ishitatsuyuki: Loading it early gets rid of one annoying flicker
07:11 tzimmermann: sima, airlied, hi. can you please merge last week's PR for drm-misc-next? https://lore.kernel.org/dri-devel/20240816084109.GA229316@localhost.localdomain/
07:19 MrCooper: DemiMarie: if the improper synchronization affects data which ends up used for shader control flow, anything seems possible
07:19 sima: tzimmermann, will do that now, was heading out for a concert yesterday evening already
07:20 tzimmermann: sima, ok np
07:21 tzimmermann: spare time *is* important
08:16 sima: tzimmermann, done
09:29 jfalempe: Is it still possible to get the panic QR code reviewed for v6.12 ?
09:35 kode54: dang
09:35 kode54: I can't build mesa 32 bit from this developer's branch because libnak fails to build
09:36 sima: agd5f, for the fd fixes series from al viro I'm assuming you'll just land that in amdgpu.git?
09:36 kode54: it's also building nouveau even though I disabled nouveau
09:37 kode54: https://gist.github.com/kode54/58f1de5dac224016f0a5108c3875ea9f
09:43 kode54: oops, I somehow disabled nouveau for gallium, but not for vulkan
09:52 kode54: probably a broken tree as far as libnak is concerned, though
09:52 kode54: (not mesa/mesa?main)
10:06 kode54: happening on a branch based on a8a15dc5b585b99f823c2ef2f2edb7906d0b35d1
10:20 MrCooper: kisak: efifb or simplefb only show up on displays connected to GPUs initialized by the system firmware, which may not include e.g. an external monitor connected to a laptop with dGPU
12:00 luc: in vulkan, what does it depend on whether to normalize a varying? e.g. outUV = vec3((gl_VertexIndex << 1) & 2, gl_VertexIndex & 2, 0.0); in vs, outUV.s is supposed to range [0, 2], but it seems to range in [0, 1] when outUV is used in fs. Would varyings in vulkan be implicitly normalized?
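A minimal sketch of the arithmetic in luc's snippet. It assumes the vertex shader also uses the common full-screen-triangle idiom gl_Position = vec4(outUV.xy * 2.0 - 1.0, 0.0, 1.0); that line is not in the log and is an assumption. Under that assumption nothing is normalized: the three vertices do carry UVs up to 2, but the triangle extends to clip coordinate 3 and everything beyond 1 is clipped, so visible fragments only ever see UVs in [0, 1].

```python
# Sketch of the full-screen-triangle arithmetic from luc's snippet.
# Assumption (not in the log): gl_Position.xy = outUV.xy * 2.0 - 1.0.

def out_uv(vertex_index):
    """(s, t) exactly as computed in the vertex shader expression."""
    return ((vertex_index << 1) & 2, vertex_index & 2)

def clip_xy(uv):
    """Assumed position mapping: UV 0..2 -> clip space -1..3."""
    return (uv[0] * 2.0 - 1.0, uv[1] * 2.0 - 1.0)

def uv_at_clip(x, y):
    """Interpolated UV at a clip-space point (inverse of the affine clip_xy)."""
    return ((x + 1.0) / 2.0, (y + 1.0) / 2.0)

vertices = [out_uv(i) for i in range(3)]
print(vertices)                          # per-vertex UVs, up to 2
print([clip_xy(uv) for uv in vertices])  # per-vertex clip positions, up to 3
print(uv_at_clip(-1.0, -1.0), uv_at_clip(1.0, 1.0))  # UVs at the visible corners
```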
12:42 tzimmermann: sima, thanks. if you have a bit, could you also forward drm-next to the most recent upstream (v6.11-rc4 IIRC) ?
12:42 sima: tzimmermann, do you have some specific commit you need?
12:43 tzimmermann: sima, no. i just want all the fixes
12:43 sima: (I kinda screwed that up a bit on the last backmerge)
12:44 sima: linus doesn't like backmerges for no reason much, so we try to avoid them a bit ...
12:44 DemiMarie: MrCooper: is it sufficient for a Wayland compositor to not have shader control flow that depends on client-provided values?
12:46 DemiMarie: If so, is this something Mesa would be willing to guarantee?
12:55 DemiMarie: If not, what is sufficient?
13:00 tzimmermann: jfalempe, i've reviewed patches 1 to 3. can't really say much about the rust code.
13:01 tzimmermann: jfalempe, deadline for v6.12 is -rc6, which happens end of next week
13:01 jfalempe: tzimmermann: thanks, yes for the rust code, I already have a review by Alice Ryhl
13:02 jfalempe: tzimmermann: and I'm on PTO next week, so thanks again for your review.
13:03 tzimmermann: my comments are optional. your choice
13:15 agd5f: sima, I think that makes sense.
13:23 zmike: mareko: pls ack https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30772
13:29 daniels: eric_engestrom: do you want to take a pass on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30784
13:32 eric_engestrom: daniels: gladly, and thank you for doing this!
13:32 daniels: np
13:34 eric_engestrom: daniels: nit-pick, but why are broadcom and freedreno changes in the same commit, but everything else in its own commit?
13:35 daniels: eric_engestrom: no set reason
13:36 daniels: would you like to separate v3d from fdno?
13:36 daniels: or separate igalia from google from collabora?
13:36 eric_engestrom: I think the split of the other commits makes sense, it's just src/broadcom/ and src/freedreno/ being in the same commit that looks weird
13:36 eric_engestrom: but again, it's a nit-pick
13:36 eric_engestrom: I reviewed all the CI commits, r-b on all of them
13:37 eric_engestrom: looking at the script commits now
13:46 MrCooper: DemiMarie: it would require solving the halting problem ;)
14:22 DemiMarie: MrCooper: what do you mean?
14:22 MrCooper: what you want is Mesa deciding whether or not the shader will terminate
14:22 DemiMarie: No
14:22 DemiMarie: I don't want that
14:22 MrCooper: that's what it boils down to
14:23 DemiMarie: In the most general case, yes, but I'm not interested in the general case.
14:24 MrCooper: both GL and Vulkan shaders are general enough that it's the halting problem
14:24 DemiMarie: I'm not interested in arbitrary shaders
14:25 DemiMarie: I'm asking for rules that are sufficient to guarantee that unsynchronized concurrent modification of an input buffer will not cause out of bounds accesses.
14:26 DemiMarie: For instance, if my shader has no control flow at all (and is thus guaranteed to terminate), is this sufficient?
14:43 emersion: any client can infinite loop in the GPU, and that might reset the compositor's context
15:00 mareko: karolherbst: it looks like it's more correct for OpenCL not to evaluate constant expressions for some reason, here are examples of both CUDA and ROCm not evaluating sqrt(4) at compile time: https://godbolt.org/z/sTMb1b7WK https://cuda.godbolt.org/z/Pc38jbTro
15:01 karolherbst: mareko: it kinda depends, but yeah.. the question is if you guarantee that constant expressions evaluate to the same as done on the GPU
15:02 karolherbst: and I think CL doesn't make such a guarantee
15:03 karolherbst: but maybe we do have to run all of that at the GPU... I think it's implied by the C99 spec
15:04 mareko: that may affect perf comparisons between rusticl and other drivers
15:05 mareko: karolherbst: also, radeonsi on RDNA 3 currently destroys all other AMD drivers in clear/copy_buffer performance
15:06 karolherbst: 🙃
15:06 mareko: optimizations for other generations are coming
15:06 karolherbst: I was considering filing for conformance for rusticl+radeonsi, because AMD's official one isn't 🙃
15:07 karolherbst: but there are 2 or 3 real annoying bugs
15:07 karolherbst: and apparently idiv is also broken
15:07 karolherbst: e.g. https://gitlab.freedesktop.org/mesa/mesa/-/issues/11761
15:07 karolherbst: no idea what's wrong there, but must be radeonsi specific
15:13 mareko: radeonsi lowers idiv in NIR
15:13 karolherbst: yeah I know
15:13 karolherbst: and there was a fix needed for asahi as well
15:13 karolherbst: but even with that one it's still wrong
15:14 karolherbst: adjusted to also apply for radeonsi: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/f325a39f79db1512286c4206189fe28a4820d29c
15:14 karolherbst: but maybe I've messed up...
15:15 mareko: oh I think radeonsi implements frcp for F32 inexactly
15:15 mareko: *FP32
15:15 mareko: and we don't enable FP32 denormals either
15:15 karolherbst: ahh, that might do it
15:18 DemiMarie: emersion: that's excluded from my worries
15:21 DemiMarie: emersion: what I'm concerned about is Mesa (or LLVM) generating code that assumes there are no data races, and which becomes exploitable by a malicious client.
15:22 DemiMarie: In Qubes OS, if one client can cause another client to misrender, that is a security vulnerability.
15:23 karolherbst: mareko: if you can throw me a quick patch to try it out that would be helpful :) reverting the fix for asahi doesn't seem to make a difference in this case, so must be something else probably
15:25 mareko: karolherbst: oh we lower fdiv in NIR, and I think "(('fdiv', a, b), ('fmul', a, ('frcp', b)), 'options->lower_fdiv')," is definitely incorrect for OpenCL, forwarding fdiv to LLVM should fix it
15:25 karolherbst: asahi also lowers fdiv in nir and that one is fine
15:26 karolherbst: mhhh
15:26 karolherbst: wait
15:26 karolherbst: this is about idiv
15:26 karolherbst: I think fdiv is fine regardless
15:27 karolherbst: the only fp32 related bug I have with radeonsi is nextafter
15:27 mareko: I'm sure fdiv isn't fine, but I don't know why you're not seeing failures
15:27 karolherbst: because I already have a thing to make it more precise for CL
15:28 karolherbst: nir_scale_fdiv
15:28 karolherbst: it's also needed on e.g. nvidia
15:28 karolherbst: but CL only needs 2.5 ULP precision here, so it's good enough
15:30 karolherbst: mhh.. radeonsi has multithreading issues I haven't noticed before .. oh well..
15:30 karolherbst: anyway
15:30 karolherbst: fdiv with radeonsi: 2: divide fp32 ................passed 1.97 @ {-0x1.dafff2p-53, -0x1.dfc0e6p-84}
15:30 karolherbst: so 1.97 ULP on average
15:31 karolherbst: without scale_fdiv: ERROR: divide: -nan ulp error at {-inf, -0x1.fffffep+127}: *inf vs. -nan (0xffc00000) at index: 197 :)
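One plausible reading of that failure, sketched under assumptions the log does not confirm: with fdiv lowered to fmul(a, frcp(b)) and FP32 denormals flushed, the reciprocal of the largest finite float is a denormal, becomes -0, and -inf * -0 is NaN. The helper names below are hypothetical, and fdiv_scaled is only an illustration of pre-scaling, not the actual nir_scale_fdiv implementation.

```python
import math
import struct

F32_MIN_NORMAL = 2.0 ** -126

def f32(x):
    """Round a Python float to the nearest float32 value."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)  # too large for float32

def flush_denorm(x):
    """Flush float32 denormals to signed zero (models disabled FP32 denormals)."""
    if x != 0.0 and abs(x) < F32_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x

def frcp(b):
    """Float32 reciprocal with denormal flushing; b must be non-zero here."""
    return flush_denorm(f32(1.0 / b))

def fdiv_lowered(a, b):
    """fdiv(a, b) rewritten as fmul(a, frcp(b)), as in the quoted rule."""
    return flush_denorm(f32(a * frcp(b)))

def fdiv_scaled(a, b):
    """Hypothetical pre-scaling: shrink both operands when |b| is huge."""
    if abs(b) > 2.0 ** 126:
        a, b = f32(a * 0.25), f32(b * 0.25)
    return fdiv_lowered(a, b)

# Operands from the error line: {-inf, -0x1.fffffep+127}
a = float("-inf")
b = -float.fromhex("0x1.fffffep+127")
print(fdiv_lowered(a, b))  # reciprocal flushed to -0, so the product is NaN
print(fdiv_scaled(a, b))   # reciprocal stays normal, so the product is +inf
```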
15:35 karolherbst: mareko: but anyway, I'm more concerned about idiv atm, because that's clearly showing a bug :)
15:35 karolherbst: also with aco it seems
15:36 karolherbst: ehh wait
15:36 karolherbst: aco is fine
15:38 mareko: nir_scale_fdiv could be inefficient, the hw has dedicated v_div_scale and v_div_fixup instructions for that
15:38 karolherbst: ahh, good to know
15:39 karolherbst: I guess those do the same thing given it's called "scale"
15:39 mareko: yeah
15:40 karolherbst: could add a nir instruction and lowering for that one
15:41 mareko: just passing unlowered fdiv to LLVM should be enough
15:41 karolherbst: right.. but that won't work for e.g. aco
15:41 mareko: yeah
15:42 karolherbst: and I think other hardware might also have something like that
15:42 karolherbst: I just know that nvidia doesn't
15:48 karolherbst: the nir is identical between aco and llvm, so that's good I guess...
15:49 karolherbst: assembly: https://gist.github.com/karolherbst/4dddd5dc5686f06fe928bf3ccb3729dc
15:49 mareko: I guess we could just pass unlowered idiv to LLVM as well
15:55 bl4ckb0ne: :q
15:55 karolherbst: mhh oh right.. the idiv lowering is mostly folded, because it's constant divisors
15:55 bl4ckb0ne: wait thats not my vim term
17:43 karolherbst: mareko: it seems to be an issue with nir_opt_idiv_const actually....
17:43 karolherbst: if I disable that pass, llvm sdiv/udiv and nir_lower_idiv both work
17:43 karolherbst: if I don't, both fail as well
17:58 karolherbst: mhh, the main difference seems to be v_mul_hi_i32_i24_e32 (aco) and v_mul_hi_u32 (llvm)
18:00 karolherbst: ehh, the other way around, v_mul_hi_i32_i24_e32 is from LLVM
18:14 zamundaaa[m]: Demi: the wrong rendering would happen only in the window of the bad client
18:14 zamundaaa[m]: Which the client can already do anyways
18:15 DemiMarie: zamundaaa: is that guaranteed?
18:15 pendingchaos: that's probably the bug. v_mul_hi_u32(a, b) where signext_24_32(a)==a && signext_24_32(b)==b isn't the same as v_mul_hi_i32_i24_e32(a, b) if a or b is negative
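A small sketch of pendingchaos's point, using an assumed software model of the two instructions: v_mul_hi_u32 as the high 32 bits of the unsigned 32x32-bit product, and v_mul_hi_i32_i24 as the high 32 bits of the signed product of the sign-extended low 24 bits. The function names are illustrative and the models should be checked against the ISA documentation.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def mul_hi_u32(a, b):
    """Assumed model: high 32 bits of the unsigned 32x32-bit product."""
    return ((a & MASK32) * (b & MASK32)) >> 32

def mul_hi_i32_i24(a, b):
    """Assumed model: high 32 bits of the signed 24x24-bit product."""
    product = to_signed(a, 24) * to_signed(b, 24)
    return (product >> 32) & MASK32

neg = -3 & MASK32  # 0xFFFFFFFD: equals its own 24-bit sign extension
pos = 5

# Both operands non-negative: the two models agree.
print(hex(mul_hi_u32(0x7FFFFF, 0x7FFFFF)), hex(mul_hi_i32_i24(0x7FFFFF, 0x7FFFFF)))
# One operand negative: the unsigned and signed high halves differ.
print(hex(mul_hi_u32(neg, pos)), hex(mul_hi_i32_i24(neg, pos)))
```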
18:16 zamundaaa[m]: If the compositor doesn't do anything really stupid
18:17 zamundaaa[m]: If a GPU reset happens, that could corrupt the whole screen temporarily, but you don't need to send the compositor any data to cause one
18:17 karolherbst: pendingchaos: mhh... that would be a bug in LLVM then, no? Because nir_to_llvm simply emits LLVMBuildMul with some shifts
18:18 DemiMarie: zamundaaa: does that mean that Mesa’s optimizers are not allowed to introduce time-of-check to time-of-use issues to shaders?
18:18 karolherbst: or mhh..
18:19 zamundaaa[m]: I'm not sure what you mean?
18:20 DemiMarie:thinks if she needs to use a pastebin for a code block
18:22 karolherbst: the reporter also uses llvm-18...
18:42 DemiMarie: zamundaaa: https://gist.github.com/DemiMarie/ecd489920163017b09a5a8528fd03f27
18:43 DemiMarie: (put in a gist to not confuse IRC users)
18:44 zamundaaa[m]: I don't know if there's guarantees for that, but fetching pixels from the buffer multiple times would be bad for performance
18:44 zamundaaa[m]: So I doubt it would ever do such a thing
18:46 pendingchaos: I can imagine situations where that transformation might be beneficial
18:49 pendingchaos: whether it's allowed or not probably depends on what kind of load "c[0]" is and what's between "a = c[0]" and "b[a] = 0"
18:55 DemiMarie: In C it is allowed unless c is a pointer to an _Atomic type, but GLSL doesn’t have those and even if it did I doubt compositors would use them for these operations.
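The gist itself is not reproduced in the log, so the following is only a guess at the shape of the concern, built around the "a = c[0]" and "b[a] = 0" fragments quoted above. RacyBuffer, the bounds check, and both function names are hypothetical; the second function imitates by hand what a compiler would do if it re-loaded c[0] instead of reusing the checked value.

```python
class RacyBuffer:
    """Hypothetical shared buffer: element 0 changes after the first read,
    standing in for a client rewriting the buffer concurrently."""

    def __init__(self, first, later):
        self.values = [first, later]
        self.reads = 0

    def __getitem__(self, index):
        value = self.values[min(self.reads, 1)]
        self.reads += 1
        return value

def single_fetch(c, b):
    a = c[0]              # one load; the checked value is the used value
    if a < len(b):
        b[a] = 0

def refetched(c, b):
    if c[0] < len(b):     # time of check: first load
        b[c[0]] = 0       # time of use: second load may see a new value

target = [1, 1, 1, 1]
single_fetch(RacyBuffer(1, 100), target)
print(target)             # the in-bounds element was cleared

try:
    refetched(RacyBuffer(1, 100), [1, 1, 1, 1])
except IndexError:
    print("second load bypassed the bounds check")
```

In Python the stray index raises IndexError; in a shader the equivalent access would depend on the hardware and API robustness guarantees.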
19:00 DemiMarie: pendingchaos: that’s what makes me nervous.
19:09 DemiMarie: Are there any formats that have an alpha channel and for which there is no opaque format with the same layout?
19:09 DemiMarie: If I get an image in ARGB8888, I can tell a compositor to treat it as XRGB8888 without needing to make any copies. Are there any formats for which there is no opaque version?
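A minimal sketch of the zero-copy relabeling described above, for the one pair named in the question. The fourcc packing follows the DRM fourcc_code() convention; the OPAQUE_TWIN table and opaque_format helper are hypothetical, and the table deliberately lists only ARGB8888, since which other formats lack an opaque twin is exactly the open question.

```python
def fourcc(a, b, c, d):
    """Pack four ASCII characters the way the DRM fourcc_code() macro does."""
    return ord(a) | (ord(b) << 8) | (ord(c) << 16) | (ord(d) << 24)

DRM_FORMAT_ARGB8888 = fourcc("A", "R", "2", "4")
DRM_FORMAT_XRGB8888 = fourcc("X", "R", "2", "4")

# Hypothetical lookup table; only the pair named in the log is listed.
OPAQUE_TWIN = {DRM_FORMAT_ARGB8888: DRM_FORMAT_XRGB8888}

def opaque_format(fmt):
    """Return the same-layout opaque format, or None if the table has none."""
    return OPAQUE_TWIN.get(fmt)

# The pixel bytes stay untouched; only the format code handed on changes.
print(hex(DRM_FORMAT_ARGB8888), hex(opaque_format(DRM_FORMAT_ARGB8888)))
```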