04:37 kode54: I was told it's normal for the Xe driver to suspend the GT on boot
04:37 kode54: that leads to no VT to log into
05:44 ickle_: diff --git a/tests/i915/gem_exec_create.c b/tests/i915/gem_exec_create.c
05:44 ickle_: index 05cc83dcea08..8551ca5722b1 100644
05:44 ickle_: --- a/tests/i915/gem_exec_create.c
05:44 ickle_: +++ b/tests/i915/gem_exec_create.c
05:44 ickle_: @@ -338,7 +338,7 @@ igt_main
05:44 ickle_: igt_require(sz);
05:44 ickle_: igt_require_memory(m->threads * count, 0, CHECK_RAM);
05:44 ickle_: if (count * sz < r->size) {
05:44 ickle_: - count = (m->threads * count * sz - r->size) / sz;
05:44 ickle_: + count = m->threads * count * sz / sz;
05:44 ickle_: igt_require_memory(count, sz, CHECK_RAM);
05:44 ickle_: }
06:07 kode54: JFS
06:07 kode54: JFC
06:07 kode54: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/272#note_1906890
08:08 MrCooper: jenatali: Marge doesn't actively watch the CI, it just gets the green/red result at the end; besides, sometimes a human watching can retry a job which failed due to a flake and prevent the pipeline from failing
09:04 luc: a concatenated symbol name like this https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/targets/dri/target.c#L7 is missing the ##drivername part in `readelf -s` output; why? is this a readelf bug?
09:04 luc: http://ix.io/4vKK
09:04 luc: lines 337–342
13:50 glassVK: hello, I am a student in search of learning resources..
13:51 glassVK: do you guys have room for beginner contributors?
14:47 alyssa: dj-death: Happy code deletion
22:44 alyssa: currently doing screen capture in firefox in sway
22:44 alyssa: ahaha the future is now
22:44 alyssa: :D
23:29 alyssa: anholt_: ok, so the hsl algorithm fail I have on AGX is similar to what you had on turnip
23:30 alyssa: The shader does `mediump float foo = r - g;` where r and g are loaded from a mediump vertex input backed by an R32_FLOAT vertex buffer
23:30 alyssa: So logically, the test expects to do fsub(f2f16(r), f2f16(g))
23:31 alyssa: However, our backend copyprop is implementing this effectively as f2f16(fsub(r, g))
23:32 alyssa: which... now I'm wondering if that's legal. certainly not if it's exact. probably fine for gles, and technically a test bug, even though there's also a driver bug that vk would hit
23:32 alyssa: The shader code *looks* innocuous enough, something like
23:32 alyssa: fadd32 r0l, r1, r2
23:33 alyssa: but.. maybe promoting 16-bit ALU to 32-bit ALU to fold away f2f16 sources isn't kosher after all
23:34 alyssa: similar problem with the destination... If we have f2f32(fadd(x, y)) the backend will fold that to
23:34 alyssa: fadd32 r0, r1l, r2l
23:34 alyssa: but again doing the add at higher precision than the NIR asks for
23:34 alyssa: unclear to me if/when doing stuff at higher precision would ever not be ok
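To make the destination-promotion case concrete, here is a small sketch (editor's addition, not from the log; the helper name `round_to` is mine) using Python's `struct` half/single packing as a stand-in for f2f16/f2f32. Doing the add at fp32 keeps a bit that the fp16 add would already have rounded away:

```python
import struct

def round_to(x, fmt):
    """Round a Python float (binary64) once to fp16 ('e') or fp32 ('f')."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

# Both inputs are exactly representable in fp16.
x = 1.0
y = 2.0 ** -12          # a quarter of fp16's ulp at 1.0 (ulp = 2^-10)

# f2f32(fadd16(x, y)): the fp16 add rounds 1 + 2^-12 back down to 1.0.
narrow = round_to(round_to(x + y, 'e'), 'f')

# Promoted fadd32(x, y): fp32 holds 1 + 2^-12 exactly.
promoted = round_to(x + y, 'f')

print(narrow, promoted)   # 1.0 1.000244140625
```

So folding the f2f32 destination conversion by promoting the add is observably not the same computation, even for tame finite inputs.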
23:34 alyssa: this also affects midgard which architecturally lacks fp16 scalar arithmetic and instead does fp32 with f2f16/f2f32 on the inputs/outputs
23:35 alyssa: (though I don't think the test is failing there, possibly by pure luck of getting vectorized and using the true fp16 vector units)
23:35 alyssa: so.. all in all, this is possibly both a driver bug and a CTS bug. unsure what to do about either one
23:37 alyssa: (By the way, why are no other gles drivers affected? because normal GPUs would fold the f2f16 into the vertex load, since it'd be dedicated vertex fetch hardware that does the memory float32 -> register fp16 internally. AGX does vertex fetch in software which makes the conversions all explicit ALU.)
23:42 alyssa: I'm now also wondering if this would run afoul of gl invariance rules, if we swap in a fast linked (vertex fetch separate from main VS, f2f16 not folded in) for a monolithic (f2f16 folded in by promoting some other ALU to fp32) program
23:42 alyssa: we don't do this yet but will soon to deal with recompile jank, and down the line zink+agxv+gpl would do this too
23:44 alyssa: so maybe the backend copyprop is bogus... but it's still not obvious to me when promoting internal operation precision is exact and when it's not
23:45 alyssa: I guess the spicy case is something like r = g = 10^6 and calculating fsub(f2f16(r), f2f16(g))
23:45 alyssa: should be nan or inf
23:45 alyssa: but f2f16(fsub(r, g)) = 0.0
23:46 alyssa: so am I not allowed to fold conversions at all? :|
23:46 alyssa: hmm, well, not quite
23:47 alyssa: I can fold alu32(f2f32(x))
23:47 alyssa: and I can fold f2f16(alu32(x))
23:47 alyssa: since we were already doing a 32-bit operation, there's no difference
23:47 alyssa: the problem case is only when we promote a 16-bit operation to 32-bit
23:48 alyssa: so can't fold f2f32(alu16(x)) or alu16(f2f16(x))
23:48 alyssa: for fadd/fmul/ffma I have separate fp16/fp32, so that's a hard and fast rule
23:48 alyssa: but for all other alu, it's all internally 32-bit (even if you convert both source and destination)
23:49 alyssa: that.. should still be fine? like, that should just be an implementation detail at that point. although, ugh, hm
23:50 alyssa: No, the invariance issue from this backend optimization in particular is specifically from the fp16 alu and fp32 alu being different hardware, and changing the opcode isn't ok
23:50 alyssa: The other cases have nothing to do with the optimizer and amount to me asking "is this hw a valid implementation of the fp16 op in NIR at all"
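For fadd at least, the "fp32 inside, converts on the edges" hardware should indeed be a valid implementation of the fp16 op: binary32's 24 significand bits meet the 2p+2 = 24 bound at which double rounding is known to be innocuous for add/sub/mul/div/sqrt (Figueroa's classic result, with fp16's p = 11). A brute-force spot check (editor's sketch, helper names mine, built on Python's `struct` packing):

```python
import math
import random
import struct

def f2f16(x):
    """Round once to fp16; overflow becomes +/-inf (round-to-nearest)."""
    try:
        return struct.unpack('e', struct.pack('e', x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def f2f32(x):
    """Round once to fp32 (inputs here never overflow fp32)."""
    return struct.unpack('f', struct.pack('f', x))[0]

def random_fp16():
    """A random finite fp16 value, drawn via a raw 16-bit pattern."""
    v = struct.unpack('e', struct.pack('H', random.getrandbits(16)))[0]
    return v if math.isfinite(v) else 0.0

random.seed(0)
for _ in range(50_000):
    a, b = random_fp16(), random_fp16()
    exact = a + b                    # exact: binary64 easily holds any fp16 sum
    direct = f2f16(exact)            # the correctly rounded fp16 fadd
    via_fp32 = f2f16(f2f32(exact))   # add at fp32, then convert the result
    assert direct == via_fp32
print("fp32-internal fadd matched correctly rounded fp16 on every sample")
```

This only blesses the "internally 32-bit with converts on both ends" shape; it says nothing about the copyprop folds that drop one of the conversions, which the earlier examples show are not value-preserving.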
23:51 alyssa: so I think as long as I disallow the opcode-switching cases I should be in the clear. I think. mediump melts my head.
23:51 alyssa: (and the opcode-switching cases are probably valid in gles if not for invariance issues with fast linking, but definitely not valid in vulkan with strict float rules)