04:11 KungFuJesus: imirkin: cpp is 4
04:11 KungFuJesus: the macro claims the colorspace is ARGB?
04:12 KungFuJesus: but the GL call is strictly saying RGBA
04:13 KungFuJesus: is nouveau expecting the bytes to already be reordered at this point?
04:13 KungFuJesus: if so, we may be looking in the wrong place
04:26 imirkin: right, so the naming conventions are all wildly confusing
04:26 imirkin: ARGB in one place is literally identical to BGRA in another
04:26 imirkin: there's also array formats vs packed formats
04:26 imirkin: which cause differences in whether they're affected by endianness or not
04:27 imirkin: anyways ... the core library takes care of ensuring that things are in the format that they should be in
04:27 imirkin: and you can assume that the core library gets it right
04:27 imirkin: so if a buffer's format is X, then it will be fed data formatted as X
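A minimal sketch of why the same pixel can be called ARGB in one place and BGRA in another: a packed format names its channels from the most significant bit of a 32-bit word, while an array format names them in memory byte order. The helper names `pack_argb` and `bytes_in_memory` are hypothetical, and endianness is passed as a parameter so the result does not depend on the host.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: pack four 8-bit channels into one 32-bit word with
// A in the most significant byte (a "packed ARGB" layout).
constexpr std::uint32_t pack_argb(std::uint8_t a, std::uint8_t r,
                                  std::uint8_t g, std::uint8_t b) {
    return (std::uint32_t{a} << 24) | (std::uint32_t{r} << 16) |
           (std::uint32_t{g} << 8) | std::uint32_t{b};
}

// Byte sequence the word occupies in memory for the given endianness.
inline std::array<std::uint8_t, 4> bytes_in_memory(std::uint32_t word,
                                                   bool little_endian) {
    std::array<std::uint8_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        // Little endian stores the least significant byte first.
        const int shift = little_endian ? 8 * i : 8 * (3 - i);
        out[i] = static_cast<std::uint8_t>(word >> shift);
    }
    return out;
}
```

On a little-endian layout the packed ARGB word reads B, G, R, A byte by byte, which is what an array-style "BGRA" name describes. On a big-endian layout the bytes read A, R, G, B, so only the packed name stays stable across endianness.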
04:34 KungFuJesus: captured the entire call stack here in perf, unstripped binaries are great for that
04:36 KungFuJesus: which function reorders the buffer before the texture is put through the push buffer?
04:37 KungFuJesus: I'm not 100% sure that the issue lies this deep in the stack. I mean, presumably radeon works with BE with full, correctly mapped textures in GL?
04:46 KungFuJesus: hmm, what's weird is a trivial texture example in glut works as expected
04:47 KungFuJesus: maybe an apitrace of that will tell me something useful
05:00 KungFuJesus: hmm, that's interesting. The crappy trivial example working from the internet is using the argument "4" for the internal format. The number of components rather than a #define from the standard table...
05:01 KungFuJesus: eh, didn't make a difference, changing it to GL_RGBA didn't break anything
05:08 imirkin: i'd recommend trying to create a minimal repro
05:08 imirkin: as that will be easiest to study
05:08 imirkin: 4 = GL_RGBA, from back when that argument was called "components", in GL 1.0
05:09 imirkin: i believe they flipped it to internalFormat in GL 1.1, but had to maintain compatibility with old programs
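A small sketch of the compatibility rule described above, where the legacy component counts 1 to 4 are treated as base formats. The function name `normalize_internal_format` and the `TOK_` constants are hypothetical (this is not Mesa's code); the numeric token values are the ones from the standard OpenGL headers.

```cpp
#include <cstdint>

// Token values copied from the standard OpenGL headers, renamed so this
// sketch compiles without including GL.
constexpr std::uint32_t TOK_RGB = 0x1907;
constexpr std::uint32_t TOK_RGBA = 0x1908;
constexpr std::uint32_t TOK_LUMINANCE = 0x1909;
constexpr std::uint32_t TOK_LUMINANCE_ALPHA = 0x190A;

// Hypothetical helper: map a legacy "components" count to the equivalent
// base internal format; any other value is passed through unchanged.
constexpr std::uint32_t normalize_internal_format(std::uint32_t f) {
    return f == 1 ? TOK_LUMINANCE
         : f == 2 ? TOK_LUMINANCE_ALPHA
         : f == 3 ? TOK_RGB
         : f == 4 ? TOK_RGBA
         : f;
}
```

Under this mapping, passing 4 and passing GL_RGBA select the same format, which is consistent with the swap making no visible difference.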
05:20 KungFuJesus: although, the mortar on this brick texture does look a little yellow
05:20 KungFuJesus: probably something to do with texture environment settings, though
05:21 KungFuJesus: yes, looks yellow on x86 machine, lol
05:24 imirkin: you can also check with software rendering
05:25 KungFuJesus: so far the only difference I can see that might make a difference is this ancient example doesn't bind the texture, doom legacy does
05:25 KungFuJesus: binding effectively maps the texture to an internal buffer object
05:27 imirkin: glBindTexture? that does nothing.
05:27 imirkin: look at what's happening at the gallium api level
05:27 imirkin: you can get a call trace with like GALLIUM_TRACE=foo.xml run-the-program
05:27 imirkin: which will create a foo.xml file which you can then inspect with a tool inside the mesa tree
05:44 KungFuJesus: hmm, there's some blend stuff happening
15:36 karolherbst: imirkin: ping on https://patchwork.freedesktop.org/series/54027/#rev2 ?
15:45 karolherbst: imirkin: btw, nvidia inverses the U flag as well for float comparisons
15:49 karolherbst: "(a >= 0.0) ? 0.0 : 1.0" -> "0 GTU a"
15:59 karolherbst: and I think we actually have to follow that as well. so if we inverse a float comparison, we have to flip the U flag, because (a == 0.0) has to be equal to !(a != 0.0) for all a
16:00 karolherbst: which isn't the case if a is NaN and if inverse(EQ) == NE
16:07 imirkin: karolherbst: forgot all about it
16:08 imirkin: inverse of unordered is still unordered
16:08 imirkin: and vice-versa
16:08 imirkin: so that's surprising
16:09 imirkin: oh right. there's reverse and inverse. which are different
16:09 imirkin: karolherbst: we have a pair of functions which i'm 99% certain get this right
16:09 karolherbst: reverse: (a > b) -> (b > a) or (a < b)
16:09 karolherbst: so the U flag remains
16:09 karolherbst: inverse is a ! operation essentially
16:10 imirkin: right
16:10 karolherbst: so we have to flip the U flag
16:10 karolherbst: imirkin: inverseCondCode doesn't flip U
16:10 karolherbst: inverse does: return static_cast<CondCode>(cc ^ 7)
16:10 karolherbst: but it should do ^ 15 I believe
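A sketch of the difference between `^ 7` and `^ 15`, assuming a bitmask model of condition codes in which bit 0 means "less", bit 1 "equal", bit 2 "greater" and bit 3 "unordered" (either operand is NaN). The enum values and the helpers `evaluate`, `inverse_keep_u`, `inverse_flip_u` and `reverse` are assumptions made for illustration, not a copy of the nouveau sources.

```cpp
#include <cmath>
#include <cstdint>

// Assumed encoding for this sketch: LT = 1, EQ = 2, GT = 4, U = 8.
enum CondCode : std::uint8_t {
    CC_LT = 1, CC_EQ = 2, CC_GT = 4, CC_NE = 5, CC_GE = 6,
    CC_U = 8, CC_LTU = 9, CC_GTU = 12, CC_NEU = 13
};

// True if the relation between a and b is one of those selected by cc.
inline bool evaluate(CondCode cc, float a, float b) {
    unsigned rel;
    if (std::isnan(a) || std::isnan(b)) rel = 8;  // unordered
    else if (a < b)                     rel = 1;
    else if (a == b)                    rel = 2;
    else                                rel = 4;
    return (cc & rel) != 0;
}

// Flips only the LT/EQ/GT bits and leaves the U bit alone.
inline CondCode inverse_keep_u(CondCode cc) {
    return static_cast<CondCode>(cc ^ 7);
}

// Flips all four bits, so the result is the logical negation even for NaN.
inline CondCode inverse_flip_u(CondCode cc) {
    return static_cast<CondCode>(cc ^ 15);
}

// Swapping the operands swaps LT and GT; the EQ and U bits are unchanged.
inline CondCode reverse(CondCode cc) {
    const unsigned v = cc;
    return static_cast<CondCode>((v & 0xA) | ((v & 1) << 2) | ((v & 4) >> 2));
}
```

In this model `inverse_keep_u(CC_EQ)` is `CC_NE`, and both `CC_EQ` and `CC_NE` evaluate to false when an operand is NaN, so the result is not a negation. `inverse_flip_u(CC_EQ)` is `CC_NEU`, which is true for NaN. Inverting `CC_GE` with `^ 15` and then reversing the operands gives `CC_GTU`, matching the "0 GTU a" form quoted earlier in the log.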
16:10 imirkin: hmmmm
16:11 imirkin: would need thought.
16:11 karolherbst: I have a patch to fix that for NE and EQ, but maybe we have to do that for all
16:11 karolherbst: yeah... was playing around with the nvidia compiler
16:11 karolherbst: and at least it flips it around
16:11 karolherbst: but it's the CL one (cl -> ptx -> sass)
16:12 karolherbst: maybe they don't bother for graphics?
16:12 karolherbst: but then again, why bother making a difference at all?
16:13 karolherbst: but all that makes sense if you say that the compiler should in the best case not change the output
16:13 karolherbst: and stay sane
16:13 karolherbst: a (a != b) in glsl is already a FNEU by definition
16:13 karolherbst: and a == b is FEQ
16:14 karolherbst: so why should !(a != b) be a FEQU after optimizations?
16:14 karolherbst: and !(a == b) be FNE
16:16 karolherbst: I highly doubt we run into that issue yet as we don't optimize any of that, but I still have some patches pending for merging the SETs into SLCTs and there it starts to matter
16:18 karolherbst: imirkin: anyway, what is your opinion on getting unordered operations on integer comparisons? ugly but okay or should we try to not do that?
17:46 imirkin: karolherbst: probably best to avoid...
18:10 karolherbst: imirkin: yeah ... I just hoped we could avoid a type aware inverseCondCode, but I guess this allows us to add asserts easily as well
18:10 karolherbst: (or just put those into the emitter or something)
18:43 karolherbst: imirkin: current idea would be something like that: https://github.com/karolherbst/mesa/commit/19adfa66901c935655672c13bed4ed89c9942a17
18:46 karolherbst: we might want to assert that only the correct ones are used though... dunno
18:46 karolherbst: inverseCondeCodeFloat(CC_TR) is kind of undefined this way
18:47 karolherbst: but CC_TR doesn't make sense to have in such a case either way
18:47 imirkin: presumably CC_FALSE is the one
18:48 karolherbst: imirkin: maybe we want to split predicational from relational CondCodes?
18:48 karolherbst: I don't really see the benefit in having one enum for both actually
18:49 imirkin: they match up nicely
18:50 karolherbst: imirkin: right, but error checking is much harder this way. like for a SLCT having a CC_TR on the comparison doesn't make much sense
18:50 karolherbst: or does it?
18:50 karolherbst: mhh, actually that's even emittable
18:51 imirkin: yeah dunno
18:52 karolherbst: maybe I add a isRelationalCondCode(CondCode) and check against that in some places... could be helpful, would need it for a proper inverse implementation anyhow
18:52 karolherbst: or at least with our current CondCode enum
18:53 karolherbst: mhhhh, CC_U
18:53 karolherbst: that's a funny one
18:53 karolherbst: that could be actually helpful in some places
18:54 karolherbst: although that's the same as (eq a a)
18:54 imirkin: well
18:54 karolherbst: slct.f32.f32.eq a b c
18:54 imirkin: interesting.
18:55 karolherbst: uhm
18:55 karolherbst: .u
18:55 karolherbst: "slct.f32.f32.u a b c"
18:55 karolherbst: otherwise it would be a set + slct
18:55 imirkin: right, so if we're taking shortcuts, a == a is always true
18:55 imirkin: but if we're not taking shortcuts, the a == a can be false
18:55 imirkin: (for floats)
18:55 karolherbst: yeah
18:55 karolherbst: a == a is the only way to check for NaN in glsl afaik
18:56 imirkin: there's a isNaN
18:56 imirkin: iirc
18:56 imirkin: http://docs.gl/sl4/isnan
18:56 karolherbst: ohh
18:56 imirkin: (and isinf)
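A tiny sketch of the self-comparison trick next to the library check, assuming IEEE 754 floats and a build without fast-math style shortcuts (with those, a compiler may fold `a == a` to true, which is the "taking shortcuts" case above). The name `is_nan_by_self_compare` is hypothetical.

```cpp
#include <cmath>
#include <limits>

// NaN is the only float value that does not compare equal to itself.
inline bool is_nan_by_self_compare(float a) {
    return !(a == a);
}

// The same check via the standard library, for comparison.
inline bool is_nan_by_library(float a) {
    return std::isnan(a);
}
```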
18:56 karolherbst: isinf is actually quite silly to implement :/
18:56 karolherbst: more expensive than let's say isNan
18:57 karolherbst: or... wait
18:57 karolherbst: let me check
18:58 karolherbst: yeah.. isinf is always a check against a second number, so we can't fold it away that easily
18:58 karolherbst: worst case there is an abs as well
18:59 karolherbst: (feq INFINITY (fabs a))
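The `(feq INFINITY (fabs a))` form written out as a sketch, assuming IEEE 754 floats. The name `is_inf_by_compare` is hypothetical. It shows the two pieces mentioned above: the absolute value, and the comparison against a second operand (the infinity constant).

```cpp
#include <cmath>
#include <limits>

// Infinite means |a| equals +INF. The ordered equality is false for NaN,
// so NaN is correctly reported as not infinite.
inline bool is_inf_by_compare(float a) {
    return std::fabs(a) == std::numeric_limits<float>::infinity();
}
```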
19:00 imirkin: no great way to do it
19:00 karolherbst: I like the CC_U... maybe we could make use of it
19:00 karolherbst: I highly doubt it's that much in use though
19:01 karolherbst: (slct.ne (set.eq a a) b c) -> (slct.u a b c)
19:02 karolherbst: there is no nu though? mhh
19:05 karolherbst: heh
19:05 karolherbst: FSET.BF.LE.AND R0, |R0|, +INF , PT ;
19:05 karolherbst: for "!isnan(a) ? 1.0 : 0.0;"
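A sketch of what that instruction appears to compute, assuming `.BF` produces 1.0 or 0.0 and `.LE` is an ordered comparison. Both are my reading of the disassembly rather than documented behaviour, and the name `not_nan_as_float` is hypothetical.

```cpp
#include <cmath>
#include <limits>

// |a| <= +INF holds for every float except NaN, because an ordered
// comparison with NaN is always false.
inline float not_nan_as_float(float a) {
    return (std::fabs(a) <= std::numeric_limits<float>::infinity()) ? 1.0f
                                                                    : 0.0f;
}
```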
19:06 karolherbst: doesn't seem like I can force nvidia to use .U
19:06 karolherbst: sad
19:07 karolherbst: ohh, that's because of the silly ptx generated