11:04 glehmann: anholt: looks like the new traces are flaky: https://mesa.pages.freedesktop.org/-/mesa/-/jobs/98168633/artifacts/results/index.html
12:10 rodrigovivi: mripard: mlankhorst tzimmermann ack on this drm-ras patch going through drm-xe-next? https://patchwork.freedesktop.org/series/164601/
12:10 rodrigovivi: we already got rv-b from netdev maintainer and we have more active ras development going on xe side... so we would like to minimize conflicts
12:25 mripard: rodrigovivi: yep, ack
12:38 rodrigovivi: thanks
13:57 karolherbst: jenatali: what are the semantics of "DXIL_INTR_FMA"? Always fused? The backend can decide?
13:59 jenatali: karolherbst: always fused but only available for doubles...
13:59 karolherbst: the only thing I've found is "Fused multiply-add. This operation is only defined in double precision. Fma(a,b,c) = a * b + c"
13:59 karolherbst: right...
13:59 karolherbst: okay
13:59 jenatali: I have a feature request to our compiler team to expose it for 32 bit
14:00 karolherbst: jenatali: okay.. anyway, fyi, I'm working on cleaning up the ffma mess, so we could deal with it being always fused, or sometimes fused, but for CL I guess you kinda want both anyway 🙃
14:00 karolherbst: or only "always fused"
14:02 jenatali: karolherbst: many thanks for cleaning this up! Dealing with libclc is going to be complicated, since you sometimes need the software fallback and sometimes not, and iirc library functions have alternate approaches embedded, for when it gets compiled without hardware fma, that are cheaper than a full software fma
14:02 karolherbst: yeah..
14:02 karolherbst: llvm-22 is also being more painful there
14:02 karolherbst: it starts to use SPV_KHR_fma :')
14:03 karolherbst: but yeah.. I'm adding three multadd variants: never fused, sometimes fused (for layered impl or llvmpipe mostly, generated by frontends) and always fused
14:03 karolherbst: so it's all nice and explicit
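
To make the distinction above concrete, here is a minimal Python sketch of why "fused" and "unfused" multiply-add are not interchangeable. It is illustrative only and is not Mesa or NIR code: the helper names `mul_add_unfused` and `mul_add_fused` are hypothetical, and the fused result is emulated with exact rational arithmetic (one rounding at the end) rather than with a hardware instruction. The inputs are chosen so that rounding the intermediate product loses the entire result.

```python
from fractions import Fraction


def mul_add_unfused(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is rounded.
    product = a * b
    return product + c


def mul_add_fused(a: float, b: float, c: float) -> float:
    # One rounding: compute a*b + c exactly as a rational, then round once.
    # This emulates fused semantics for finite inputs only (no NaN/inf/-0.0 handling).
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# a * b is exactly 1 - 2**-54, which is not representable as a double
# and rounds to 1.0, so the unfused form cancels to zero.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

unfused = mul_add_unfused(a, b, c)
fused = mul_add_fused(a, b, c)
print("unfused:", unfused)
print("fused:  ", fused)
```

With these inputs the unfused form returns exactly 0.0 while the fused form returns -2**-54, which is the kind of difference that makes it useful for an IR to say explicitly which behaviour an instruction promises.
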
14:04 karolherbst: anyway.. I'm planning to move the "do we use sw or real ffma" out of vtn_opencl
14:04 karolherbst: and make it an explicit lowering that takes the impl from libclc kinda like inlining
14:05 karolherbst: or we copy the ffma sw impl, because then vulkan drivers could also use it
14:06 lumag: CounterPillow: sorry, I'll take a look soon
14:07 jenatali: karolherbst: I think we might need 2 libclc compilations though to handle the different versions of functions that expect hardware fma or have workarounds
14:07 jenatali: The rest of that plan sounds great though
14:07 jenatali: Lemme know if you need reviews
14:07 karolherbst: jenatali: libclc has uhm... "flags" to change its behavior now
14:07 karolherbst: noinline functions that return true/false
14:08 karolherbst: e.g. to indicate denorm flushing
14:08 jenatali: Ooh ok
14:08 karolherbst: and we could override them in theory
14:08 CounterPillow: lumag: thanks :) no worries, sent out v14 already, so it's no longer a "would be convenient if it happened soon", more of a thing still on the list of outstanding stuff along with AMD's dead silence
14:08 karolherbst: ~~that's why spec constants exist, but guess there is no way in CLC to express spec constants...~~
14:08 jenatali: Been a while since I looked but makes sense
14:08 karolherbst: I think the same exists for fma, but worst case we could add it
14:14 jenatali: Fwiw DXIL does have mad() which iirc is allowed to be fused, but the point is just that when marked precise it can't be reassociated
14:14 jenatali: So I'd use the "maybe fused" instr to map to that I think
14:16 karolherbst: austriancoder: "MAD" in the etnaviv compiler is fused or unfused?
14:17 karolherbst: jenatali: yeah, I've added an option that layered impls can turn on to keep "ffma_weak" (that can be implemented as either, but the exact one has to always be fused or always unfused, as a global choice)
14:18 jenatali: Nice
14:28 karolherbst: simon-perretta-img: same question for you: "pco_fmad" is that fused or unfused? I'm kinda getting mixed signals there from the pco_isa.py file
16:13 anholt: glehmann: downside to having done my stress testing mostly outside of work hours: the x86 runners weren't loaded at the time. MR incoming.
17:44 karolherbst: jenatali: in case you want to take a peek, it's still WIP tho, but it _should_ be almost done: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41165
17:46 jenatali: 61 commits :O
17:49 karolherbst: well
17:49 karolherbst: most are small
17:50 karolherbst: and there are 7 empty commits I just added so I can group the commits
17:50 karolherbst: or rather make sure I do things in the correct order
18:01 jenatali: FWIW I'm ~80% sure that DXIL FMad is ffma_weak, not fmad, since it inherits from DXBC which has this wording for "precise": "So if component(s) of a mad instruction are tagged as PRECISE, the hardware must execute a mad (or exact equivalent), and cannot split it into a multiply followed by an add. Conversely, a multiply followed by an add, where either or both are flagged as PRECISE, cannot be merged into a fused mad."
18:01 karolherbst: jenatali: I found this: https://github.com/Microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst#fmad
18:02 karolherbst: yeah..
18:02 karolherbst: but those are fmad semantics
18:02 jenatali: Ah ok
18:02 karolherbst: precise fmad: never fused
18:02 karolherbst: precise ffmake_weak: fused or unfused, but gotta pick and stick to it
18:02 karolherbst: *ffma_weak
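
The "pick and stick to it" rule can be sketched as a lowering step with one backend-wide switch. This is a hypothetical Python model, not the code in the merge request: the enum `MulAdd`, the function `lower`, and the parameter `backend_fuses_weak` are invented for illustration, and treating `ffma` as the "always fused" name is an assumption drawn from the discussion. The fused path is again emulated with exact rationals.

```python
from enum import Enum
from fractions import Fraction


class MulAdd(Enum):
    NEVER_FUSED = "fmad"        # always mul followed by add (two roundings)
    MAYBE_FUSED = "ffma_weak"   # either, but one consistent choice per backend
    ALWAYS_FUSED = "ffma"       # assumed name for the single-rounding op


def unfused(a: float, b: float, c: float) -> float:
    product = a * b             # first rounding
    return product + c          # second rounding


def fused(a: float, b: float, c: float) -> float:
    # Exact rational evaluation followed by a single rounding (finite inputs only).
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def lower(op: MulAdd, backend_fuses_weak: bool):
    """Pick the evaluator for op; backend_fuses_weak is a single global choice."""
    if op is MulAdd.ALWAYS_FUSED:
        return fused
    if op is MulAdd.NEVER_FUSED:
        return unfused
    # MAYBE_FUSED: every instance follows the same backend-wide decision.
    return fused if backend_fuses_weak else unfused


a, b, c = 1.0 + 2.0**-27, 1.0 - 2.0**-27, -1.0
for op in MulAdd:
    for choice in (False, True):
        print(op.value, choice, lower(op, choice)(a, b, c))
```

Only the `MAYBE_FUSED` row changes with the backend switch; the other two variants give the same answer regardless, which is the property being discussed for precise operations.
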
18:02 jenatali: "cannot be split into a multiply followed by an add" makes it read as optionally fused
18:03 karolherbst: mhhh yeah.. the DXBC wording sounds different...
18:04 karolherbst: but that sounds more like real ffma
18:04 karolherbst: like even inexact ffma can be unfused, because it already allows reassoc and transformation, so you don't even have the promise it stays a mul + add at all
18:04 jenatali: But of course dxilconv upconverts one to the other so their semantics should be the same. This stuff is always dumb
18:05 karolherbst: naming is difficult 🙃
18:05 jenatali: Anyway my personal take is that we should map ffma_weak to FMad
18:05 karolherbst: any good way to confirm what's what?
18:06 jenatali: WARP implements FMad as FMA on CPUs where it's available and I haven't seen that blow up yet? :)
18:07 karolherbst: same for precise FMad?
18:07 jenatali: Yeah, precise indicates that it can't be unfused later
18:08 karolherbst: guess the DXIL docs I found are wrong then
18:09 karolherbst: would be good to have that clarified
18:09 jenatali: Yeah idk. It's fine, it's easy enough to clean up the semantics later, don't worry too much about it for my backend, I'll talk to people and figure it out
18:09 karolherbst: thanks!
18:11 karolherbst: what's a bit scary is that I pushed my MR through the VK CTS on NVK and I see this: "Pass: 717421, UnexpectedImprovement: 78, ExpectedFail: 7, Warn: 1, Skip: 726493, Duration: 51:15, Remaining: 51:54" :')
18:11 karolherbst: something about 10x6 and 12x4 formats...
18:56 alyssa: wasn't nvk conformant?
18:57 karolherbst: sure, but I think that's new stuff since then
18:58 karolherbst: it's just weird that out of all things, fma handling is fixing those tests
19:10 alyssa: ah
19:16 karolherbst: anyway yeah.. it's only 12x4 and 10x6 format tests that are fixed by that..