02:32ishitatsuyuki: you can lateload GPU drivers, I think it's the default on arch
02:32ishitatsuyuki: Loading it early gets rid of one annoying flicker
07:11tzimmermann: sima, airlied, hi. can you please merge last week's PR for drm-misc-next? https://lore.kernel.org/dri-devel/20240816084109.GA229316@localhost.localdomain/
07:19MrCooper: DemiMarie: if the improper synchronization affects data which ends up used for shader control flow, anything seems possible
07:19sima: tzimmermann, will do that now, was heading out for a concert yesterday evening already
07:20tzimmermann: sima, ok np
07:21tzimmermann: spare time *is* important
08:16sima: tzimmermann, done
09:29jfalempe: Is it still possible to get the panic QR code reviewed for v6.12 ?
09:35kode54: dang
09:35kode54: I can't build mesa 32 bit from this developer's branch because libnak fails to build
09:36sima: agd5f, for the fd fixes series from al viro I'm assuming you'll just land that in amdgpu.git?
09:36kode54: it's also building nouveau even though I disabled nouveau
09:37kode54: https://gist.github.com/kode54/58f1de5dac224016f0a5108c3875ea9f
09:43kode54: oops, I somehow disabled nouveau for gallium, but not for vulkan
09:52kode54: probably a broken tree as far as libnak is concerned, though
09:52kode54: (not mesa/mesa?main)
10:06kode54: happening on a branch based on a8a15dc5b585b99f823c2ef2f2edb7906d0b35d1
10:20MrCooper: kisak: efifb or simplefb only show up on displays connected to GPUs initialized by the system firmware, which may not include e.g. an external monitor connected to a laptop with dGPU
12:00luc: in vulkan, what determines whether a varying gets normalized? e.g. outUV = vec3((gl_VertexIndex << 1) & 2, gl_VertexIndex & 2, 0.0); in the vs, outUV.s is supposed to range over [0, 2], but it seems to range over [0, 1] when outUV is used in the fs. Are varyings in vulkan implicitly normalized?
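The expression quoted above is the usual full-screen-triangle trick, and its arithmetic can be checked outside a shader. The sketch below is a minimal Python model, not anything from the log: it assumes the vertex shader also sets gl_Position = vec4(outUV.xy * 2.0 - 1.0, 0.0, 1.0), which the question does not state, and the helper names are made up for illustration.

```python
def out_uv(vertex_index):
    # Same integer arithmetic as the quoted GLSL expression.
    return ((vertex_index << 1) & 2, vertex_index & 2)


def clip_pos(uv):
    # ASSUMPTION (not in the log): position is derived from outUV as
    # gl_Position.xy = outUV.xy * 2.0 - 1.0, with w = 1.
    return (uv[0] * 2.0 - 1.0, uv[1] * 2.0 - 1.0)


def uv_at_ndc(x, y):
    # With w = 1 the interpolation is linear, so the varying seen at a
    # given NDC position is simply the inverse of clip_pos.
    return ((x + 1.0) / 2.0, (y + 1.0) / 2.0)


uvs = [out_uv(i) for i in range(3)]
positions = [clip_pos(uv) for uv in uvs]
# Varying values at the four corners of the visible [-1, 1] square.
visible_corners = [uv_at_ndc(x, y) for x in (-1.0, 1.0) for y in (-1.0, 1.0)]
```

Under that assumption nothing is normalized: the vertices do carry values up to 2, but the part of the triangle where the varying exceeds 1 lies outside the viewport and is clipped, so the fragment shader only ever sees [0, 1].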
12:42tzimmermann: sima, thanks. if you have a bit, could you also forward drm-next to the most recent upstream (v6.11-rc4 IIRC) ?
12:42sima: tzimmermann, do you have some specific commit you need?
12:43tzimmermann: sima, no. i just want all the fixes
12:43sima: (I kinda screwed that up a bit on the last backmerge)
12:44sima: linus doesn't like backmerges for no reason much, so we try to avoid them a bit ...
12:44DemiMarie: MrCooper: is it sufficient for a Wayland compositor to not have shader control flow that depends on client-provided values?
12:46DemiMarie: If so, is this something Mesa would be willing to guarantee?
12:55DemiMarie: If not, what is sufficient?
13:00tzimmermann: jfalempe, i've reviewed patches 1 to 3. can't really say much about the rust code.
13:01tzimmermann: jfalempe, deadline for v6.12 is -rc6, which happens end of next week
13:01jfalempe: tzimmermann: thanks, yes for the rust code, I already have a review by Alice Ryhl
13:02jfalempe: tzimmermann: and I'm on PTO next week, so thanks again for your review.
13:03tzimmermann: my comments are optional. your choice
13:15agd5f: sima, I think that makes sense.
13:23zmike: mareko: pls ack https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30772
13:29daniels: eric_engestrom: do you want to take a pass on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30784
13:32eric_engestrom: daniels: gladly, and thank you for doing this!
13:32daniels: np
13:34eric_engestrom: daniels: nit-pick, but why are broadcom and freedreno changes in the same commit, but everything else in its own commit?
13:35daniels: eric_engestrom: no set reason
13:36daniels: would you like to separate v3d from fdno?
13:36daniels: or separate igalia from google from collabora?
13:36eric_engestrom: I think the split of the other commits makes sense, it's just src/broadcom/ and src/freedreno/ being in the same commit that looks weird
13:36eric_engestrom: but again, it's a nit-pick
13:36eric_engestrom: I reviewed all the CI commits, r-b on all of them
13:37eric_engestrom: looking at the script commits now
13:46MrCooper: DemiMarie: it would require solving the halting problem ;)
14:22DemiMarie: MrCooper: what do you mean?
14:22MrCooper: what you want is Mesa deciding whether or not the shader will terminate
14:22DemiMarie: No
14:22DemiMarie: I don't want that
14:22MrCooper: that's what it boils down to
14:23DemiMarie: In the most general case, yes, but I'm not interested in the general case.
14:24MrCooper: both GL and Vulkan shaders are general enough that it's the halting problem
14:24DemiMarie: I'm not interested in arbitrary shaders
14:25DemiMarie: I'm asking for rules that are sufficient to guarantee that unsynchronized concurrent modification of an input buffer will not cause out of bounds accesses.
14:26DemiMarie: For instance, if my shader has no control flow at all (and is thus guaranteed to terminate), is this sufficient?
14:43emersion: any client can infinite loop in the GPU, and that might reset the compositor's context
15:00mareko: karolherbst: it looks like it's more correct for OpenCL not to evaluate constant expressions for some reason, here are examples of both CUDA and ROCm not evaluating sqrt(4) at compile time: https://godbolt.org/z/sTMb1b7WK https://cuda.godbolt.org/z/Pc38jbTro
15:01karolherbst: mareko: it kinda depends, but yeah.. the question is whether you guarantee that constant expressions evaluate to the same result as on the GPU
15:02karolherbst: and I think CL doesn't make such a guarantee
15:03karolherbst: but maybe we do have to run all of that on the GPU... I think it's implied by the C99 spec
15:04mareko: that may affect perf comparisons between rusticl and other drivers
15:05mareko: karolherbst: also, radeonsi on RDNA 3 currently destroys all other AMD drivers in clear/copy_buffer performance
15:06karolherbst: 🙃
15:06mareko: optimizations for other generations are coming
15:06karolherbst: I was considering filing for conformance for rusticl+radeonsi, because AMD's official one isn't 🙃
15:07karolherbst: but there are 2 or 3 real annoying bugs
15:07karolherbst: and apparently idiv is also broken
15:07karolherbst: e.g. https://gitlab.freedesktop.org/mesa/mesa/-/issues/11761
15:07karolherbst: no idea what's wrong there, but must be radeonsi specific
15:13mareko: radeonsi lowers idiv in NIR
15:13karolherbst: yeah I know
15:13karolherbst: and there was a fix needed for asahi as well
15:13karolherbst: but even with that one it's still wrong
15:14karolherbst: adjusted to also apply for radeonsi: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/f325a39f79db1512286c4206189fe28a4820d29c
15:14karolherbst: but maybe I've messed up...
15:14mareko: oh I think radeonsi implements frcp for F32 inexactly
15:15mareko: *FP32
15:15mareko: and we don't enable FP32 denormals either
15:15karolherbst: ahh, that might do it
15:18DemiMarie: emersion: that's excluded from my worries
15:21DemiMarie: emersion: what I'm concerned about is Mesa (or LLVM) generating code that assumes there are no data races, and which becomes exploitable by a malicious client.
15:22DemiMarie: In Qubes OS, if one client can cause another client to misrender, that is a security vulnerability.
15:23karolherbst: mareko: if you can throw me a quick patch to try it out that would be helpful :) reverting the fix for asahi doesn't seem to make a difference in this case, so must be something else probably
15:25mareko: karolherbst: oh we lower fdiv in NIR, and I think "(('fdiv', a, b), ('fmul', a, ('frcp', b)), 'options->lower_fdiv')," is definitely incorrect for OpenCL, forwarding fdiv to LLVM should fix it
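Why rewriting fdiv as fmul(a, frcp(b)) can lose precision is easy to show with plain float32 arithmetic. This is a minimal Python sketch, not Mesa code: it models frcp as a correctly rounded reciprocal, which is an assumption (real hardware reciprocals may be less accurate), and the helper names are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fdiv(a, b):
    # Correctly rounded float32 division.
    return f32(a / b)


def fdiv_lowered(a, b):
    # The lowered form: a * (1 / b), with the reciprocal rounded to
    # float32 first. ASSUMPTION: frcp is correctly rounded here.
    return f32(a * f32(1.0 / b))


a, b = 5.0, 3.0
exact = fdiv(a, b)
lowered = fdiv_lowered(a, b)
ulp = 2.0 ** -23  # float32 spacing for values in [1, 2)
```

For 5.0 / 3.0 the two results differ by one float32 ULP even with an ideal reciprocal, because the rounding error of 1/3 is amplified by the multiply.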
15:25karolherbst: asahi also lowers fdiv in nir and that one is fine
15:26karolherbst: mhhh
15:26karolherbst: wait
15:26karolherbst: this is about idiv
15:26karolherbst: I think fdiv is fine regardless
15:27karolherbst: the only fp32 related bug I have with radeonsi is nextafter
15:27mareko: I'm sure fdiv isn't fine, but I don't know why you're not seeing failures
15:27karolherbst: because I already have a thing to make it more precise for CL
15:28karolherbst: nir_scale_fdiv
15:28karolherbst: it's also needed on e.g. nvidia
15:28karolherbst: but CL only needs 2.5 ULP precision here, so it's good enough
15:30karolherbst: mhh.. radeonsi has multithreading issues I haven't noticed before .. oh well..
15:30karolherbst: anyway
15:30karolherbst: fdiv with radeonsi: 2: divide fp32 ................passed 1.97 @ {-0x1.dafff2p-53, -0x1.dfc0e6p-84}
15:30karolherbst: so 1.97 ULP in average
15:31karolherbst: without scale_fdiv: ERROR: divide: -nan ulp error at {-inf, -0x1.fffffep+127}: *inf vs. -nan (0xffc00000) at index: 197 :)
15:35karolherbst: mareko: but anyway, I'm more concerned about idiv atm, because that's clearly showing a bug :)
15:35karolherbst: also with aco it seems
15:36karolherbst: ehh wait
15:36karolherbst: aco is fine
15:38mareko: nir_scale_fdiv could be inefficient, the hw has dedicated v_div_scale and v_div_fixup instructions for that
15:38karolherbst: ahh, good to know
15:39karolherbst: I guess those do the same thing given it's called "scale"
15:39mareko: yeah
15:40karolherbst: could add a nir instruction and lowering for that one
15:41mareko: just passing unlowered fdiv to LLVM should be enough
15:41karolherbst: right.. but that won't work for e.g. aco
15:41mareko: yeah
15:42karolherbst: and I think other hardware might also have something like that
15:42karolherbst: I just know that nvidia doesn't
15:48karolherbst: the nir is identical between aco and llvm, so that's good I guess...
15:49karolherbst: assembly: https://gist.github.com/karolherbst/4dddd5dc5686f06fe928bf3ccb3729dc
15:49mareko: I guess we could just pass unlowered idiv to LLVM as well
15:55bl4ckb0ne: :q
15:55karolherbst: mhh oh right.. the idiv lowering is mostly folded, because it's constant divisors
15:55bl4ckb0ne: wait thats not my vim term
17:43karolherbst: mareko: it seems to be an issue with nir_opt_idiv_const actually....
17:43karolherbst: if I disable that pass, llvm sdiv/udiv and nir_lower_idiv both work
17:43karolherbst: if I don't, both fail as well
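For context on why a multiply-high instruction shows up at all: division by a constant is commonly rewritten as a multiply by a precomputed "magic" reciprocal followed by a shift. The sketch below is the textbook unsigned divide-by-3 case in Python; it is an illustration of the general technique and is not claimed to match what nir_opt_idiv_const actually emits.

```python
MASK32 = 0xFFFFFFFF


def umul_high(a, b):
    # Upper 32 bits of the 64-bit product of two unsigned 32-bit values.
    return ((a & MASK32) * (b & MASK32)) >> 32


def udiv3(x):
    # 0xAAAAAAAB == ceil(2**33 / 3), so x // 3 == (x * magic) >> 33
    # for every unsigned 32-bit x.
    return umul_high(x, 0xAAAAAAAB) >> 1


samples = [0, 1, 2, 3, 7, 1000, 2**31, 0xFFFFFFFE, 0xFFFFFFFF]
results = [udiv3(x) for x in samples]
```

The result is only right if the multiply-high treats its operands with the signedness the magic constant was derived for, which is why the choice of multiply-high instruction matters.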
17:58karolherbst: mhh, the main difference seems to be v_mul_hi_i32_i24_e32 (aco) and v_mul_hi_u32 (llvm)
18:00karolherbst: ehh, the other way around, v_mul_hi_i32_i24_e32 is from LLVM
18:14zamundaaa[m]: Demi: the wrong rendering would happen only in the window of the bad client
18:14zamundaaa[m]: Which the client can already do anyways
18:15DemiMarie: zamundaaa: is that guaranteed?
18:15pendingchaos: that's probably the bug. v_mul_hi_u32(a, b) where signext_24_32(a)==a && signext_24_32(b)==b isn't the same as v_mul_hi_i32_i24_e32(a, b) if a or b is negative
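The difference pendingchaos describes can be reproduced with integer arithmetic. This Python sketch uses a simplified model of the two instructions (unsigned 32x32 high half versus signed 24x24 high half); the exact hardware semantics are an assumption here and should be checked against the ISA documentation.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    # Interpret the low `bits` bits of value as a two's-complement number.
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mul_hi_u32(a, b):
    # Simplified model: high 32 bits of the unsigned 32x32 product.
    return ((a & MASK32) * (b & MASK32)) >> 32


def mul_hi_i32_i24(a, b):
    # Simplified model: high 32 bits of the signed 24x24 product.
    product = sign_extend(a, 24) * sign_extend(b, 24)
    return (product >> 32) & MASK32


positive = (0x7FFFFF, 0x7FFFFF)  # both fit in 24 bits, both non-negative
negative = (0xFFFFFFFF, 2)       # -1 as a 32-bit value, times 2
```

For the non-negative pair both models agree. For (-1, 2) the unsigned version sees 0xFFFFFFFF * 2 and returns 1, while the signed 24-bit version sees -2 and returns 0xFFFFFFFF.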
18:16zamundaaa[m]: If the compositor doesn't do anything really stupid
18:17zamundaaa[m]: If a GPU reset happens, that could corrupt the whole screen temporarily, but you don't need to send the compositor any data to cause one
18:17karolherbst: pendingchaos: mhh... that would be a bug in LLVM then, no? Because nir_to_llvm simply emits LLVMBuildMul with some shifts
18:18DemiMarie: zamundaaa: does that mean that Mesa’s optimizers are not allowed to introduce time-of-check to time-of-use issues to shaders?
18:18karolherbst: or mhh..
18:19zamundaaa[m]: I'm not sure what you mean?
18:20DemiMarie: thinks if she needs to use a pastebin for a code block
18:22karolherbst: the reporter also uses llvm-18...
18:42DemiMarie: zamundaaa: https://gist.github.com/DemiMarie/ecd489920163017b09a5a8528fd03f27
18:43DemiMarie: (put in a gist to not confuse IRC users)
18:44zamundaaa[m]: I don't know if there's guarantees for that, but fetching pixels from the buffer multiple times would be bad for performance
18:44zamundaaa[m]: So I doubt it would ever do such a thing
18:46pendingchaos: I can imagine situations where that transformation might be beneficial
18:49pendingchaos: whether it's allowed or not probably depends on what kind of load "c[0]" is and what's between "a = c[0]" and "b[a] = 0"
18:55DemiMarie: In C it is allowed unless c is a pointer to an _Atomic type, but GLSL doesn’t have those and even if it did I doubt compositors would use them for these operations.
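The concern can be made concrete with a small simulation. The gist's contents are not reproduced in the log, so the sketch below is an assumption built only from the "a = c[0]" and "b[a] = 0" fragments quoted above, with a bounds check added for illustration; RacyBuffer and both function names are hypothetical. Python's IndexError stands in for whatever an out-of-bounds store would do on real hardware.

```python
class RacyBuffer:
    """Simulates c[0] being modified concurrently: each load returns
    the next value from a scripted sequence."""

    def __init__(self, values):
        self._values = list(values)
        self._reads = 0

    def load(self):
        value = self._values[min(self._reads, len(self._values) - 1)]
        self._reads += 1
        return value


def store_single_load(c, b):
    # As written: one load, then check and use the same value.
    a = c.load()
    if a < len(b):
        b[a] = 0


def store_reloaded(c, b):
    # After a hypothetical transformation that re-fetches c[0]:
    # the checked value and the used value can differ.
    if c.load() < len(b):
        b[c.load()] = 0


safe_target = [1] * 4
store_single_load(RacyBuffer([2, 100]), safe_target)

try:
    store_reloaded(RacyBuffer([2, 100]), [1] * 4)
    escaped_bounds_check = False
except IndexError:
    escaped_bounds_check = True
```

With a single load the check and the store agree; with the re-fetch, the check passes on 2 and the store uses 100.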
19:00DemiMarie: pendingchaos: that’s what makes me nervous.
19:09DemiMarie: Are there any formats that have an alpha channel and for which there is no opaque format with the same layout?
19:09DemiMarie: If I get an image in ARGB8888, I can tell a compositor to treat it as XRGB8888 without needing to make any copies. Are there any formats for which there is no opaque version?
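The ARGB8888 to XRGB8888 reinterpretation amounts to swapping the format code while leaving the pixel data untouched. A minimal Python sketch using DRM fourcc codes follows; the pairing table is deliberately small and incomplete, the helper names are made up, and it does not answer which formats lack an opaque counterpart.

```python
def fourcc(code):
    # DRM fourcc: four ASCII characters packed little-endian into 32 bits.
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


DRM_FORMAT_ARGB8888 = fourcc("AR24")
DRM_FORMAT_XRGB8888 = fourcc("XR24")
DRM_FORMAT_ABGR8888 = fourcc("AB24")
DRM_FORMAT_XBGR8888 = fourcc("XB24")
DRM_FORMAT_ARGB2101010 = fourcc("AR30")
DRM_FORMAT_XRGB2101010 = fourcc("XR30")

# Alpha format -> same-layout format whose alpha bits are ignored.
# ASSUMPTION: illustrative subset only, not an exhaustive list.
OPAQUE_SIBLING = {
    DRM_FORMAT_ARGB8888: DRM_FORMAT_XRGB8888,
    DRM_FORMAT_ABGR8888: DRM_FORMAT_XBGR8888,
    DRM_FORMAT_ARGB2101010: DRM_FORMAT_XRGB2101010,
}


def opaque_variant(fmt):
    # Returns None when the format is not in this small table, which
    # does NOT by itself mean that no opaque variant exists.
    return OPAQUE_SIBLING.get(fmt)
```

A caller would keep the buffer as-is and only advertise opaque_variant(fmt) in place of fmt, falling back to a copy or blend path when it returns None.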