04:11KungFuJesus: imirkin: cpp is 4
04:11KungFuJesus: the macro claims the colorspace is ARGB?
04:12KungFuJesus: but the GL call is strictly saying RGBA
04:13KungFuJesus: is nouveau expecting the bytes to already be reordered at this point?
04:13KungFuJesus: if so, we may be looking in the wrong place
04:26imirkin: right, so the naming conventions are all wildly confusing
04:26imirkin: ARGB in one place is literally identical to BGRA in another
04:26imirkin: there's also array formats vs packed formats
04:26imirkin: which cause differences in whether they're affected by endianness or not
04:27imirkin: anyways ... the core library takes care of ensuring that things are in the format that they should be in
04:27imirkin: and you can assume that the core library gets it right
04:27imirkin: so if a buffer's format is X, then it will be fed data formatted as X
04:34KungFuJesus: captured the entire call stack here in perf, unstripped binaries are great for that
04:36KungFuJesus: which function reorders the beffore before the texture is put through the push buffer?
04:37KungFuJesus: I'm not 100% sure that the issue lies this deep in the stack. I mean, presumably radeon works with BE with full, correctly mapped textures in GL?
04:46KungFuJesus: hmm, what's weird is a trivial texture example in glut works as expected
04:47KungFuJesus: maybe an apitrace of that will tell me something useful
05:00KungFuJesus: hmm, that's interesting. The crappy trivial example working from the internet is using the argument "4" for the internal format. The number of components rather than a #define from the standard table...
05:01KungFuJesus: eh, didn't make a different, changing it to GL_RGBA didn't break anything
05:08imirkin: i'd recommend trying to create a minimal repro
05:08imirkin: as that will be easiest to study
05:08imirkin: 4 = GL_RGBA, back before that argument was called "components", in GL 1.0
05:09imirkin: i believe they flipped it to internalFormat in GL 1.1, but had to maintain compatibility with old programs
05:20KungFuJesus: although, the mortar on this brick texture does look a little yellow
05:20KungFuJesus: probably something to do with texture environment settings, though
05:21KungFuJesus: yes, looks yellow on x86 machine, lol
05:24imirkin: you can also check with software rendering
05:25KungFuJesus: so far the only difference I can see that might make a difference is this ancient example doesn't bind the texture, doom legacy does
05:25KungFuJesus: binding effectively maps the texture to an internal buffer object
05:27imirkin: glBindTexture? that does nothing.
05:27imirkin: look at what's happening at the gallium api level
05:27imirkin: you can get a call trace with like GALLIUM_TRACE=foo.xml run-the-program
05:27imirkin: which will create a foo.xml file which you can then inspect with a tool inside the mesa tree
05:44KungFuJesus: hmm, there's some blend stuff happening
15:36karolherbst: imirkin: ping on https://patchwork.freedesktop.org/series/54027/#rev2 ?
15:45karolherbst: imirkin: btw, nvidia inverses the U flag as well for float comparisons
15:49karolherbst: "(a >= 0.0) ? 0.0 : 1.0" -> "0 GTU a"
15:59karolherbst: and I think we actually have to follow that as well. so if we inverse a float comparison, we have to flip the U flag, because (a == 0) has to be equal to !(a == 0) for all as
15:59karolherbst: (a == 0.0) == !(a != 0.0)
16:00karolherbst: which isn't the case if a is NaN and if inverse(EQ) == NEU
16:07imirkin: karolherbst: forgot all about it
16:08imirkin: inverse of unordered is still unordered
16:08imirkin: and vice-versa
16:08imirkin: so that's surprising
16:09imirkin: oh right. there's reverse and inverse. which are different
16:09imirkin: karolherbst: we have a pair of functions which i'm 99% certain get this right
16:09karolherbst: reverse: (a > b) -> (b > a) or (a < b)
16:09karolherbst: so the U flag remanse
16:09karolherbst: inverse is a ! operation essentially
16:10karolherbst: so we have to flip the U flag
16:10karolherbst: imirkin: inverseCondCode doesn't flip U
16:10karolherbst: inverse does: return static_cast<CondCode>(cc ^ 7)
16:10karolherbst: but it should do ^ 15 I believe
16:11imirkin: would need thought.
16:11karolherbst: I have a patch to fix that for NE and EQ, but maybe we have to do that for all
16:11karolherbst: yeah... was playing around with the nvidia compiler
16:11karolherbst: and at least it flips it around
16:11karolherbst: but it's the CL one (cl -> ptx -> sass)
16:12karolherbst: maybe they don't bother for graphics?
16:12karolherbst: but then again, why bother making a difference at all?
16:13karolherbst: but all that makes sense if you say that the compiler should in the best case not change the output
16:13karolherbst: and remain sanity
16:13karolherbst: a (a != b) in glsl is already a FNEU by definition
16:13karolherbst: and a == b is FEQ
16:14karolherbst: so why should !(a != b) be a FEQU after optimizations?
16:14karolherbst: and !(a == b) be FNE
16:16karolherbst: I highly doubt we run into that issue yet as we don't optimize any of that, but I still have some patches pending for merging the SETs into SLCTs and there it starts to matter
16:18karolherbst: imirkin: anyway, what is your opinion on getting unordered operations on integer comparisons? ugly but okay or should we try to not do that?
17:46imirkin: karolherbst: probably best to avoid...
18:10karolherbst: imirkin: yeah ... I just hoped we could avoid a type aware inverseCondCode, but I guess this allows us to add asserts easily as well
18:10karolherbst: (or just put those into the emiter or something)
18:43karolherbst: imirkin: current idea would be something like that: https://github.com/karolherbst/mesa/commit/19adfa66901c935655672c13bed4ed89c9942a17
18:46karolherbst: we might want to assert that only the correct ones are used though... dunno
18:46karolherbst: inverseCondeCodeFloat(CC_TR) is kind of undefined this way
18:47karolherbst: but CC_TR doesn't make sense to have in such a case either way
18:47imirkin: presumably CC_FALSE is the one
18:48karolherbst: imirkin: maybe we want to split predicational from relational CondCodes?
18:48karolherbst: I don't really see the benefit in having one enum for both actually
18:49imirkin: they match up nicely
18:50karolherbst: imirkin: right, but error checking is much harder this way. like for a SLCT having a CC_TR on the comparison doesn't make much sense
18:50karolherbst: or does it?
18:50karolherbst: mhh, actually that's even emitable
18:51imirkin: yeah dunno
18:52karolherbst: maybe I add a isRelationalCondCode(CondeCode) and check against that in some places... could be helpful, would need it for a proper inverse implementation anyhow
18:52karolherbst: or at least with our current CodeCode enum
18:53karolherbst: mhhhh, CC_U
18:53karolherbst: that's a funny one
18:53karolherbst: that could be actually helpful in some places
18:54karolherbst: allthough that's the same as (eq a a)
18:54karolherbst: slct.f32.f32.eq a b c
18:55karolherbst: "slct.f32.f32.u a b c"
18:55karolherbst: otherwise it would be a set + slct
18:55imirkin: right, so if we're taking shortcuts, a == a is always true
18:55imirkin: but if we're not taking shortcuts, the a == a can be false
18:55imirkin: (for floats)
18:55karolherbst: a == a is the only way to check for NaN in glsl afaik
18:56imirkin: there's a isNaN
18:56imirkin: (and isinf)
18:56karolherbst: isinf is actually quite silly to implement :/
18:56karolherbst: more expensive than let's say isNan
18:57karolherbst: or... wait
18:57karolherbst: let me check
18:58karolherbst: yeah.. isFinf is always a check against a second number, so we can't fold it away that easily
18:58karolherbst: worst case there is an abs as well
18:59karolherbst: (feq INFINITY (fabs a))
19:00imirkin: no great way to do it
19:00karolherbst: I like the CC_U... maybe we could make use of it
19:00karolherbst: I highly doubt it's that much in use though
19:01karolherbst: (slct.ne (set.eq a a) b c) -> (slct.u a b c)
19:02karolherbst: there is no nu though? mhh
19:05karolherbst: FSET.BF.LE.AND R0, |R0|, +INF , PT ;
19:05karolherbst: for "!isnan(a) ? 1.0 : 0.0;"
19:06karolherbst: doesn't seem like that I can force nvidia to use .U
19:07karolherbst: ohh, that's because of the silly ptx generated