19:07 daniels: going to have ~5min GitLab downtime for an upgrade, please bear with us
19:21 karolherbst: ehhh..
19:22 daniels: karolherbst: ?
19:22 daniels: it's back now
19:22 imirkin: probably just upset at opencl..
19:22 daniels: haha, entirely understandable
19:22 karolherbst: nope big endian and formats
19:22 daniels: karolherbst: re _build_$CROSS, I'd definitely support that, it makes a lot of sense
19:22 karolherbst: https://gist.github.com/karolherbst/c7456c05472505d4e40742d4136dc49b
19:23 karolherbst: ....
19:23 karolherbst: find the mistake
19:23 imirkin: another good target for upset-ness...
19:23 daniels: the mistake is big-endian, hth
19:23 karolherbst: just look at the code
19:23 daniels: (also that your 'big-endian' target is a big-endian CPU with a little-endian bus with a ??-endian device)
19:23 karolherbst: doesn't matter for the code
19:23 imirkin: karolherbst: heh
19:24 imirkin: karolherbst: rgb should be bgr
19:24 karolherbst: yep
19:24 imirkin: they flipped it twice
19:24 karolherbst: correct
19:24 imirkin: both the rgb vs bgr and the values
19:24 imirkin: so the two branches are identical :)
19:24 daniels: standard
19:24 karolherbst: :)
19:25 daniels: ISTR the last person looking into BE NV on here was imirkin, who found a flip and concluded that the only way anything could ever possibly have worked was a matching flip somewhere else
19:25 karolherbst: somebody "fixed" u_format.csv
19:25 karolherbst: and I think it is all bonkers
19:25 karolherbst: also
19:25 daniels: karolherbst: as long as they introduced a matching flip somewhere else, np
19:25 karolherbst: the code is bonkers anyway
19:25 imirkin: daniels: yeah, there's lots of flipping going on in many places
19:25 imirkin: daniels: but the bug karolherbst is talking about is a very localized one specific to one format
19:25 karolherbst: channel order is not what big endian changes
19:26 karolherbst: it just happens to work for channel sizes having byte aligned sizes
19:26 karolherbst: but all packed stuff is just bonkers
19:26 imirkin: daniels: the issue was mostly that with drmAddFB (not the 2 variant), you could only specify "depth" but not "channel order"
19:26 imirkin: and so a stack would just do whatever it wanted
19:26 karolherbst: daniels: nope, this breaks channel value
19:26 daniels: yeah, and xf86-video-nouveau is special there
19:26 karolherbst: the order is _not_ the issue
19:27 imirkin: but with drmAddFB2, we added formats to these things. and in some places got them wrong.
19:27 karolherbst: FAILED: {0.096774, 0.888889, 0.000000, 1.000000} obtained
19:27 karolherbst: {0.000000, 0.000000, 1.000000, 1.000000} expected
19:27 imirkin: yeah, i mean RGB565 in BE != BGR565 in LE
19:27 karolherbst: exactly
19:27 imirkin: it works for RGBx8888
19:28 karolherbst: hence all the flipping around is the wrong approach
19:28 karolherbst: in registers it's all correct anyway
19:28 karolherbst: the problem just occurs if you load more than one byte from memory
19:28 imirkin: karolherbst: yeah, that code is just wrong. but it's doubly wrong for having the two branches be identical.
19:28 karolherbst: yeah
19:28 karolherbst: sooo
19:29 imirkin: (and unlike usual, where two wrongs make a right, that is not the case here.)
19:29 karolherbst: the bug is "uint16_t value = *(const uint16_t *)src;"
19:29 karolherbst: not something else
19:29 karolherbst: on big endian we should probably just read byte by byte
19:29 karolherbst: or swap after read
19:29 karolherbst: or have a le_load thingy
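A minimal sketch of the "le_load thingy" idea, assuming the packed 16-bit value is stored little-endian in memory (the log leaves open whether that is actually the right definition). The names `le_load16` and `unpack_b5g6r5` are hypothetical, not Mesa functions, and the bit layout (blue in bits 0-4, green in bits 5-10, red in bits 11-15) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 16-bit value stored little-endian in memory,
 * byte by byte, so the result does not depend on the host byte order. */
static inline uint16_t le_load16(const uint8_t *src)
{
    assert(src != NULL);
    return (uint16_t)(src[0] | ((uint16_t)src[1] << 8));
}

/* Hypothetical unpack of a 5/6/5 word into float RGBA.
 * Assumed layout: blue = bits 0-4, green = bits 5-10, red = bits 11-15. */
static inline void unpack_b5g6r5(uint16_t value, float rgba[4])
{
    assert(rgba != NULL);
    rgba[0] = (float)((value >> 11) & 0x1f) / 31.0f; /* red   */
    rgba[1] = (float)((value >> 5) & 0x3f) / 63.0f;  /* green */
    rgba[2] = (float)(value & 0x1f) / 31.0f;         /* blue  */
    rgba[3] = 1.0f;                                  /* alpha */
}
```

With the bytes `{0x1f, 0x00}`, `le_load16` yields `0x001f` and the unpack gives pure blue. Feeding the byte-swapped word `0x1f00` instead (what a plain `*(const uint16_t *)src` would read on a big-endian host) gives roughly `{0.096774, 0.888889, 0.0, 1.0}`, which is the same shape of failure as the values quoted above.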
19:30 imirkin: but ... is bgr565 defined as LE byte order?
19:30 karolherbst: I guess?
19:30 imirkin: i don't think so
19:30 karolherbst: it's a packed format, isn't it?
19:30 imirkin: actually the code might be correct =]
19:30 imirkin: i.e. that there's no difference for packed formats
19:30 imirkin: i dunno
19:30 karolherbst: I wouldn't be surprised if it's not defined at all
19:30 imirkin: this stuff gives me a headache
19:30 karolherbst: yeah...
19:31 karolherbst: the issue is, it's not clear how stuff is supposed to be
19:31 karolherbst: but
19:31 karolherbst: packed is usually all le
19:31 imirkin: normally you're dealing with hardware, where whatever the hardware does is the definition of correctness
19:31 daniels: 'is <XX> defined as <XX> byte order?' - the only correct answer is empirical
19:31 karolherbst: so LE it is
19:31 daniels: cf. the last time anyone tried to reverse-engineer what DRM_FORMAT_* meant on non-LE
19:32 imirkin: daniels: i think we settled on something there though
19:32 daniels: either you observe it and stick to it, or you define it and you fix the world + keep it fixed
19:32 imirkin: except it's behind a quirk not to break existing drivers
19:32 karolherbst: I'd say it's all LE and we just fix up the loads
19:32 karolherbst: although that can hurt
19:32 daniels: imirkin: I think the settlement was 'don't even look at it, let alone breathe on it'
19:32 karolherbst: but again, why should we care
19:32 daniels: karolherbst: you're the one who started this conversation, so you tell me :P
19:33 karolherbst: daniels: let's say some things changed and I am responsible for some things I wasn't before, in exchange for things I had to deal with which I won't have to anymore
19:33 karolherbst: :p
19:33 karolherbst: it comes with some other internal advantages though
19:34 daniels: karolherbst: no judgement :)
19:34 karolherbst: anyway, I think I am slowly getting the bigger picture of everything here
19:34 karolherbst: atm I am just unsure how things are supposed to be
19:35 daniels: Dante's Inferno is quite a big picture, yes
19:35 karolherbst: honestly.. I am close to just ripping out this entire u_format BE mess, doing a regression test, figuring out that nothing regresses and going from there
19:40 imirkin: the problem is that llvmpipe is basically the opposite of hardware
19:40 imirkin: so there have been many sequences of "fixes" for llvmpipe
19:40 imirkin: which actively break stuff for hardware running on BE
19:40 imirkin: this is primarily because the concepts of BE CPU and BE GPU are conflated
19:41 imirkin: on at least nvidia hardware, the GPU remains LE
19:41 imirkin: with some nice helpers to occasionally do byteswapping for you to make life more interesting
19:42 karolherbst: yeah...
19:42 karolherbst: it's annoying
19:42 karolherbst: with llvmpipe I only see channel orders flipped
19:42 imirkin: whereas llvmpipe is naturally in whatever endianness the CPU is
19:42 imirkin: (one could obviously invest work to make it not do that, but... why)
19:42 karolherbst: imirkin: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7689
19:42 karolherbst: that's a nice one
20:22 airlied: imirkin: not sure you can ever make GL or Vulkan work on a BE CPU + LE GPU combo
20:22 airlied: modern GL that is
20:22 airlied: you'd at least need all the apps to be aware of what endian the GPU is as well
20:23 karolherbst: airlied: memory mapped stuff I assume?
20:23 karolherbst: yeah.. that'd be all bonkers
20:24 karolherbst: although... depends
20:25 karolherbst: I think if you do all BE/LE handling inside the GPU, it might even work
20:32 bnieuwenhuizen: karolherbst: is it still a LE GPU then?
20:33 karolherbst: bnieuwenhuizen: I meant in shaders or macro code or whatever
20:33 karolherbst: or process memory before executing stuff
20:33 karolherbst: I don't think drivers can actually handle it correctly if the application can map stuff
20:33 bnieuwenhuizen: well, if that includes rendertargets and texture data I'm pretty sure that scuttles the performance :)
20:33 Plagman: if there's a gitlab admin around, the last comment there looks like a spambot: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3850
20:33 karolherbst: well
20:34 karolherbst: sometimes you just have to tell the user that doing stupid things also leads to terrible perf
20:37 karolherbst: bnieuwenhuizen: but the point is, that you can probably get away with texture data by using the formats where it doesn't matter what arch you are on
20:37 karolherbst: but yeah...
20:37 karolherbst: it's all terrible
20:40 karolherbst: daniels: re _build, one could probably also just add a volume for _build ...
20:40 karolherbst: I think I'll try that first and see if this works well enough
20:41 imirkin: airlied: yeah, i think GL 4.3 would be impossible. GL 4.2 ... maybe?
20:41 imirkin: i'd have to think about whether images could work. it's not clearly broken, but also not clearly working
20:44 karolherbst: mhh, that "rm -rf _build" is problematic then mhh
20:44 karolherbst: oh well
21:38 karolherbst: imirkin: okay.. no matter how you think about any of the packed format layout stuff... the util_format_b5g6r5_unorm_fetch_rgba is wrong
21:38 karolherbst: the input is not void*, but it is uint8_t*
21:39 karolherbst: the question is rather, should we make it all opaque, meaning we deal with void* or should we be very explicit internally and have it all LE
21:39 karolherbst: any thoughts on that?
21:41 airlied: imirkin: yeah I think GL 4.1 is about where I'd draw the line
21:41 airlied: I'm happy to mostly give up on BE/LE combos
21:44 karolherbst: mhhh
21:45 imirkin: airlied: the stuff that shipped was up to nv4x, aka DX9 aka GL2, so those higher things were never really considered i suspect
21:48 glennk: not a huge list of BE systems with pcie support afaik
21:48 airlied: imirkin: actually fp64 vertex fetch is broken in mesa as well
21:48 airlied: have to rework the sneaky use 2 32-bit things gallium does to a 64-bit thing
21:49 airlied: since you end up casting (uint64_t *) to (uint32_t *) and boom
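A small sketch of why viewing one 64-bit value as two 32-bit words goes wrong: which half sits first in memory depends on the host byte order. The function names are hypothetical and this is not the gallium vertex-fetch code, just the underlying effect. `memcpy` is used in place of the pointer cast to keep the example free of aliasing problems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable split: works on the value, not on its memory layout. */
static inline void split_u64(uint64_t v, uint32_t *lo, uint32_t *hi)
{
    assert(lo != NULL && hi != NULL);
    *lo = (uint32_t)(v & 0xffffffffu);
    *hi = (uint32_t)(v >> 32);
}

/* What ((uint32_t *)&v)[0] would observe: the 32-bit word that sits first
 * in memory. This is the low half on LE hosts and the high half on BE. */
static inline uint32_t first_word_in_memory(uint64_t v)
{
    uint32_t w;
    memcpy(&w, &v, sizeof w);
    return w;
}

/* Runtime host byte-order probe (assumes a plain LE or BE host). */
static inline int host_is_little_endian(void)
{
    const uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

Code that assumes "word 0 is the low half" therefore only holds on little-endian hosts; splitting by shifts, as in `split_u64`, avoids the dependency.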
21:49 imirkin: yeah, but that's fixable
21:49 imirkin: whereas i think like SSBO is fundamentally impossible
21:50 airlied: yeah QBOs were another really hard one
21:50 karolherbst: imirkin: why though?
21:50 imirkin: karolherbst: the application would have to be aware
21:50 airlied: at least all our qbo 64-bit test cases are broken even on pure BE
21:50 karolherbst: imirkin: you can fix the shader
21:50 karolherbst: just have to swap things around
21:50 imirkin: hmmmmm
21:51 karolherbst: not saying it's great
21:51 karolherbst: but
21:51 karolherbst: ...
21:51 imirkin: maybe you're right
21:51 imirkin: how would atomic things work?
21:51 imirkin: (e.g. increment)
21:51 karolherbst: cas?
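A CPU-side illustration of the "cas?" idea, using C11 atomics rather than shader code: if a counter lives in memory byte-swapped relative to the side doing the arithmetic, a plain atomic add produces the wrong value, but a compare-and-swap loop can swap, increment, and swap back. `bswap32` and `atomic_inc_swapped` are hypothetical names; how a GPU shader would express this is not covered here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static inline uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Atomically increment a counter stored byte-swapped in memory.
 * Returns the previous logical (un-swapped) value. */
static inline uint32_t atomic_inc_swapped(_Atomic uint32_t *p)
{
    assert(p != NULL);
    uint32_t old_raw = atomic_load(p);
    uint32_t new_raw;
    do {
        /* Swap to logical order, add one, swap back to storage order. */
        new_raw = bswap32(bswap32(old_raw) + 1u);
        /* On failure, old_raw is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak(p, &old_raw, new_raw));
    return bswap32(old_raw);
}
```

The loop is correct but serializes every increment through a retry, which is the performance cost joked about in the surrounding lines.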
21:51 imirkin: hehe
21:51 imirkin: performance ftw! :)
21:51 karolherbst: well
21:51 karolherbst: :D
21:52 karolherbst: at least it is correct
21:53 karolherbst: could probably check what nvidia is doing...
21:53 karolherbst: ohh wait
21:53 karolherbst: "unsupported"
21:53 karolherbst: oh well
21:53 imirkin: "not supporting it"
21:53 imirkin: ;)
21:54 imirkin: but like
21:54 imirkin: let's say you use ssbo to write to a buffer
21:54 imirkin: then feed it in as an index buffer
21:55 karolherbst: mhhh
21:55 imirkin: even if you had crazy auto-swap detection
21:55 karolherbst: yeah, I guess those things will break
21:55 imirkin: it will fail since it doesn't know if you're doing it as 32-bit or 16-bit indices
22:06 airlied: yeah I think nvidia knew it was unsupportable, and that's why we have ppc64le
22:07 airlied: and we should just ignore mixed endian, and bash llvmpipe/s390 into shape
22:07 karolherbst: yeah
22:07 airlied: glennk: I did have a GM107 in my G5 at one point
22:08 karolherbst: airlied: that GPU can be "BE" though
22:08 airlied: it might even still be in there
22:08 karolherbst: for ... some parts
22:08 airlied: karolherbst: it can't really
22:08 karolherbst: ahh, I guess memory was all LE still?
22:08 airlied: yeah it isn't BE enough to solve the problem
22:08 karolherbst: how did apple make that stuff work btw?
22:08 karolherbst: just that the features breaking that stuff weren't there?
22:08 airlied: they moved to intel after nv40 :-P
22:08 karolherbst: figures
22:08 glennk: airlied, last gen g5 right?
22:09 airlied: glennk: yeah the last one
22:09 airlied: you can get a s390 with PCIE slots :-P
22:09 karolherbst:runs
22:09 airlied: I think it would be easier to just buy an s390 and a ppc64le offload processor :-P
22:09 glennk: i guess if decimal floating point is your thing...
22:10 karolherbst: "just buy an s390"
22:10 airlied: karolherbst: yeah like if you wanted to connect a gpu to an s390, putting a ppc64le in between would probably not faze the budget
22:10 karolherbst: true
22:11 karolherbst: or like.. buy a proper rendering farm using ppc64le and just do your stuff on s390
22:21 karolherbst: okay.. I think I have a plan
22:29 karolherbst: yeah lol
22:30 karolherbst: removing the big endian stuff in u_format.csv doesn't change the failing tests in the format unit tests
22:30 karolherbst: so much for that
22:30 karolherbst: so.. now let's implement and fix it for real
22:45 karolherbst: ahh, some stuff regresses after I remove the generated code, but that's stuff working "by accident", like 16/8 bit format things
22:56 karolherbst: nice.. it works :)
22:56 karolherbst: unpack fixed
22:58 karolherbst: pack looks to be slightly more annoying
22:59 karolherbst: ohh.. actually not
23:00 imirkin: karolherbst: all the BE support does is byteswap some things at the "boundaries". internally the GPU remains LE
23:00 karolherbst: yeah, so "memory remains LE"
23:00 imirkin: which is esp fun for GART
23:01 imirkin: i couldn't wrap my head around it
23:01 karolherbst: I flashed a "windows" AMD GPU once for my power mac
23:01 karolherbst: that was fun
23:01 imirkin: anyways, sadly my G5 suffered a power-supply-based death
23:01 karolherbst: I have access to a few BE PPC machines with nvidia GPUs
23:02 imirkin: so i can't test anything related to that stuff
23:02 karolherbst: but also no motivation to actually care
23:02 karolherbst: but I think it would be nice to not break stuff...
23:04 imirkin: hmmm... there's a G4 powermac on craigslist for $40
23:04 imirkin: oh, but it's out in the boonies. nevermind.
23:05 karolherbst: progress
23:05 karolherbst: https://gist.github.com/karolherbst/a8843be303dac899ce39b38e295cc4d1/revisions
23:05 karolherbst: \o/
23:05 karolherbst: that's master to my initial format BE fixes :)
23:05 imirkin: oh, and those apparently had ATI Rage 128's
23:06 karolherbst: still some things to fix
23:07 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7689/diffs :)
23:07 karolherbst: so no format BE nonsense in the CSV anymore and more fixed stuff
23:07 karolherbst: I'd call it a win
23:08 imirkin: karolherbst: you know there's a __bswapN primitive right?
23:08 karolherbst: "ups"
23:08 imirkin: karolherbst: and also a cpu_to_be etc
23:08 karolherbst: I think I should use those instead of writing my loops :D
23:08 karolherbst: where is that cpu_to_be thing though?
23:09 imirkin: btw, dunno if you considered this, but the tests could also be buggered
23:09 karolherbst: nope, the test is fine
23:09 karolherbst: it operates on "unpacked u8 arrays"
23:09 imirkin: util_cpu_to_le32
23:09 karolherbst: so this is just the unpack/pack stuff
23:09 imirkin: for example
23:10 karolherbst: but anyway, the thing before was also broken
23:10 karolherbst: so...
23:10 imirkin: swapping bytes means that you're reading a LE value as BE etc
23:10 imirkin: which should not be happening in GL...
23:10 karolherbst: yeah.. but the issue is a bit more annoying
23:10 karolherbst: ultimately you have uint8_t arrays of data
23:11 karolherbst: and we do 16/32 bit casts into it
23:11 imirkin: yeah, that doesn't work ;)
23:11 karolherbst: so you end up with *(uint16_t*)dst = value
23:11 karolherbst: and things just fall apart
23:11 imirkin: well
23:11 imirkin: depends where dst came from
23:11 imirkin: it actually could work just fine
23:11 imirkin: that will write the logical value as in CPU-endian bytes
23:12 imirkin: of course if something else down the line expects those to be not-cpu-endian, then fail
23:12 karolherbst: there is no such thing as "CPU-endian"
23:12 imirkin: sure there is.
23:12 karolherbst: the CPU is all LE
23:12 karolherbst: or well
23:12 karolherbst: it doesn't matter
23:12 imirkin: unless it's BE
23:12 karolherbst: is the correct answer
23:12 karolherbst: no
23:12 karolherbst: it has no endianness
23:12 imirkin: e.g. in the ARCH_CPU_IS_BE case
23:12 karolherbst: in the register it's all the same
23:12 imirkin: in the register, yes
23:12 imirkin: but on writing the value out, it matters
23:12 karolherbst: what matters is how things get written to/read from memory
23:13 imirkin: so if you're doing manual swaps
23:13 imirkin: that means that you're wanting to do the opposite of what the CPU normally wants to do
23:13 karolherbst: yeah, I mean I know what you are trying to say
23:14 karolherbst: but in the end it would all end in pain
23:14 karolherbst: I think..
23:14 imirkin: doing byteswaps is the painful direction imo...
23:14 karolherbst: _if_ packed formats are in the native endianness format, then sure
23:14 imirkin: which they most certainly are
23:14 karolherbst: but mhh
23:14 imirkin: that's like the definition of a packed format
23:14 karolherbst: sure?
23:14 imirkin: the PROBLEM is when you have an array format
23:14 karolherbst: what about if you want to share resource data between archs?
23:14 imirkin: and you ASSUME that you can treat it as a packed format
23:14 imirkin: which you totally cannot
23:15 imirkin: then it's tough shit
23:15 karolherbst: texture data being the same on BE and LE binaries
23:15 karolherbst: well
23:15 karolherbst: there were games being supported on both
23:15 imirkin: if you're uploading it as a packed type, you have to make sure you store it in that way
23:15 imirkin: yeah, they were probably using e.g. GL_UNSIGNED_BYTE type
23:16 karolherbst: hopefully
23:16 imirkin: or the data was being deserialized from some semi-compressed format on disk
23:16 imirkin: and packed by the CPU into the CPU's endianness
23:18 karolherbst: but ultimately it's a huge mess as for 565 you'd use GL_UNSIGNED_SHORT_5_6_5 instead
23:18 karolherbst: or whatever variant you are interested in
23:19 karolherbst: ohhhh
23:19 karolherbst: GL _does_ specify order
23:19 imirkin: in the 16-bit word, yes
23:19 karolherbst: yes
23:19 imirkin: but not how that 16-bit word is laid out into bytes
23:19 karolherbst: mhhhhh
23:19 imirkin: in fact, it even has GL_SWAP_BYTES
23:19 imirkin: which allows you to flip things around
23:20 imirkin: probably another way that those games made things work
23:20 karolherbst: ehhh
23:21 karolherbst: mhh
23:21 karolherbst: no, I think OpenGL spec is kind of LE
23:21 karolherbst: there are some hints
23:21 imirkin: nope.
23:21 karolherbst: like table 8.4 in the 4.6 spec
23:21 karolherbst: but maybe that's also "element" stuff
23:21 karolherbst: and totally ignores the bigger picture of BE/LE memory
23:22 imirkin: i think you're taking the wrong approach. rather than adding lots of byte swapping, you should be removing it from wherever it's messing things up
23:22 imirkin: maybe the test itself.
23:22 karolherbst: yeah.. maybe
23:22 karolherbst: but I don't think the test does any ordering
23:22 imirkin: i'm not saying it's the test
23:22 imirkin: just that it COULD be the test
23:22 karolherbst: right
23:23 imirkin: (i haven't read it)
23:23 karolherbst: and maybe X is buggy and should do some swapping as well
23:23 imirkin: there are lots of different options
23:23 imirkin: and it's always tempting to just add more swapping
23:23 karolherbst: thing is
23:23 karolherbst: GL_SWAP_BYTES is not really legal for packed formats afaik
23:24 imirkin: ah, maybe not
23:24 imirkin: i'd have to RTFS
23:24 karolherbst: what does "Element Size" even mean
23:24 karolherbst: the full pixel or channels?
23:24 karolherbst: I guess channels
23:24 imirkin: good question.
23:24 karolherbst: in which case it's only defined for 8/16/32 bits
23:24 imirkin: presumably just a single channel
23:25 karolherbst: which also makes sense if elements are channels
23:25 imirkin: or could be a pixel.
23:25 karolherbst: yeah.. but 8 bit pixel?
23:25 imirkin: 223
23:25 karolherbst: and why not defined for 64bit?
23:25 karolherbst: or 128bit?
23:25 imirkin: because there are no packed formats that go that high?
23:26 karolherbst: but this is for all formats
23:26 imirkin: yeah
23:26 karolherbst: also it talks about types
23:26 karolherbst: and stuff
23:26 imirkin: but there just aren't any that go above 32-bit
23:26 karolherbst: I think it's channels in this case
23:26 imirkin: like a type=GL_RGB5_E9 is 32-bit
23:26 karolherbst: sure
23:27 imirkin: er. RGB9_E5 =]
23:27 karolherbst: ahh
23:27 karolherbst: elements are channel values
23:27 karolherbst: and all values together are called "groups"
23:27 karolherbst: "These elements are grouped into sets of one, two, three, or four values, depending on the format, to form a group."
23:28 imirkin: yeah, but is RGB9_E5 one element or 4
23:28 imirkin: er, 3
23:28 imirkin: it might just be 1 element
23:28 karolherbst: but the elements are not of size 8/16 or 32
23:29 karolherbst: ahh, maybe
23:29 imirkin: i.e. packed vs array
23:29 imirkin: array would be 4
23:29 imirkin: packed would be 1?
23:29 karolherbst: yeah, I guess that would make sense
23:29 karolherbst: UNSIGNED_INT_5_9_9_9_REV == uint
23:30 airlied: also just because GL doesn't define something doesn't mean gallium shouldn't
23:30 airlied: Vulkan defines things afaik at this level
23:30 karolherbst: airlied: right, but the question is, can we get around ditching conversions rather than adding more
23:31 karolherbst: and it kind of comes down to, what format should data be stored when you have texture data
23:31 karolherbst: does GL demand LE order, because things would just break
23:31 imirkin: GL definitely does not
23:31 airlied: I'd just define formats like vulkan doesn't explicitly for gallium
23:31 airlied: then fix GL to work on top
23:31 karolherbst: yeah...
23:31 airlied: like vulkan does rather
23:32 karolherbst: atm, I'd treat an interface giving me a uint8_t* thing as uint8_t memory
23:32 karolherbst: and if you take a 16 bit pointer, you have to adjust
23:32 karolherbst: because it's uint8_t data, not uint16_t
23:32 karolherbst: then, after we fix all the usages
23:32 imirkin: then you're attaching endianness to data that just doesn't have it
23:32 karolherbst: we can take a look and see where to optimize stuff
23:33 karolherbst: imirkin: well, then the API should be void*
23:33 imirkin: better to assume that it's in CPU-endian
23:33 imirkin: and if you want something else, do byteswaps
23:33 karolherbst: not really though
23:33 karolherbst: because it would also be broken on BE
23:34 karolherbst: taking a 16 bit pointer into 8 bit memory just works on LE, because it's the natural order of data
23:34 karolherbst: but if you do that on BE you are in bigger trouble anyway
23:34 karolherbst: because the order of data just doesn't match anymore
23:35 imirkin: that also just works on BE
23:35 imirkin: just as much as it works on LE
23:35 karolherbst: I am sure it does not
23:35 airlied: pretty sure it doesn't
23:36 karolherbst: I already wrote a C test :)
23:36 imirkin: you mean typecasting to u8 doesn't work
23:36 karolherbst: you have a u8 array
23:36 imirkin: like if you have u16 data in memory
23:36 karolherbst: and take the data by index
23:36 imirkin: and you treat it as u8, then *(u8*)x won't be equal to (u8)*(u16 *)x
23:36 karolherbst: on LE it is all in natural order, when you write uint16_t data into it
23:37 imirkin: but as long as data is written and interpreted consistently, it just doesn't matter
23:37 karolherbst: so le16.lo == lu8[0] and so forth
23:37 karolherbst: but on BE that doesn't work
23:37 imirkin: correct
23:37 imirkin: but why would you necessarily care about that?
23:37 karolherbst: because the interface is giving you a uint8_t
23:37 karolherbst: and you now write uint16_t values into it by casting the pointer to uint16_t
23:38 imirkin: ok
23:38 imirkin: and who will be reading these values?
23:38 karolherbst: soo
23:38 karolherbst: there are two parts
23:38 karolherbst: void* and the unpacked uint8_t data
23:38 imirkin: one problem is that array format vs packed is the same on LE but byteswapped on BE
23:38 karolherbst: the interface says: extract the stuff from void* and put it into the correct order into the uint8_t array
23:39 karolherbst: value by value
23:39 imirkin: which order is the correct order?
23:39 karolherbst: rgba
23:39 imirkin: that's ... not a byte order.
23:40 imirkin: you could say "i want rgba8 array data", in which case dst[0] == r, dst[1] == g, etc
23:40 imirkin: but you could say "i want rgba8 int32-packed data"
23:40 karolherbst: ehm.. wait, I think I have it the wrong way
23:41 imirkin: in which case the correct thing depends on the endianness
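The array-versus-packed distinction above can be sketched as follows. Both function names are hypothetical, and the bit positions chosen for the packed word (r in bits 0-7 up to a in bits 24-31) are an assumption for illustration, not a statement about any particular gallium format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Array layout: one byte per channel, same memory order on every host. */
static inline void store_rgba8_array(uint8_t dst[4], uint8_t r, uint8_t g,
                                     uint8_t b, uint8_t a)
{
    assert(dst != NULL);
    dst[0] = r;
    dst[1] = g;
    dst[2] = b;
    dst[3] = a;
}

/* Packed layout: channels at fixed bit positions of one 32-bit word,
 * and that word is then stored in the host's own byte order. */
static inline void store_rgba8_packed(uint8_t dst[4], uint8_t r, uint8_t g,
                                      uint8_t b, uint8_t a)
{
    assert(dst != NULL);
    const uint32_t word = (uint32_t)r | ((uint32_t)g << 8) |
                          ((uint32_t)b << 16) | ((uint32_t)a << 24);
    memcpy(dst, &word, sizeof word);
}

/* Runtime host byte-order probe (assumes a plain LE or BE host). */
static inline int host_is_le(void)
{
    const uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian host both functions leave the same four bytes in memory; on a big-endian host the packed store comes out byte-reversed relative to the array store, which is the "same on LE but byteswapped on BE" point made earlier.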
23:41 karolherbst: void* is for the unpacked stuff
23:41 karolherbst: and uint8_t is the packed data
23:41 imirkin: same point as above applies.
23:43 karolherbst: I think in the end it really boils down to how we define "packed data" inside gallium
23:44 imirkin: packed data is always in cpu-endian
23:45 karolherbst: yeah.. I think in the end that's probably what we should end up doing
23:50 karolherbst: imirkin, airlied: would you agree that the macros in src/util/format/u_format_tests.c would need to be adjusted to the endianness?
23:53 airlied: when I last looked I think I thought so
23:53 karolherbst: okay
23:53 karolherbst: let's see how fixing it at this point works out
23:53 airlied: but they do use bit shifts
23:54 karolherbst: and?
23:54 karolherbst: in the CPU it's all LE
23:54 karolherbst: they define arrays of u8 data for 8/16/32 bit values
23:54 airlied: yup so I thought maybe they didn't need changes
23:55 karolherbst: so the first element contains the MSB on BE
23:55 karolherbst: LSB on LE
23:55 karolherbst: so we just need to turn around the order of stored values
23:55 karolherbst: LE: #define PACKED_1x16(x) {(x) & 0xff, (x) >> 8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
23:55 karolherbst: BE: #define PACKED_1x16(x) {(x) >> 8, (x) & 0xff, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
23:56 karolherbst: and so on
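One way the two macro variants above could be selected at compile time, as a sketch only. It assumes the GCC/Clang predefined macros `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__` and falls back to the LE variant when they are absent; the real test file may use a different switch. The trailing zero bytes of the 16-byte array are left to implicit zero-initialization here, and `check_packed_1x16` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: GCC/Clang-style byte-order macros; anything else is
 * treated as little-endian in this sketch. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define PACKED_1x16(x) {(uint8_t)((x) >> 8), (uint8_t)((x) & 0xff)}
#else
#define PACKED_1x16(x) {(uint8_t)((x) & 0xff), (uint8_t)((x) >> 8)}
#endif

/* Check that the byte array produced by the macro matches how the host
 * itself lays out the same 16-bit value in memory. */
static inline void check_packed_1x16(uint16_t v)
{
    const uint8_t expected[16] = PACKED_1x16(v); /* rest is zero-filled */
    uint8_t native[sizeof v];
    memcpy(native, &v, sizeof v);
    assert(memcmp(expected, native, sizeof v) == 0);
}
```

With this selection the test vectors describe packed data in host byte order, which matches the "packed data is always in cpu-endian" position reached above.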
23:59 airlied:hasn't had enough booze to work it out, but I think it's a good place to start discussion