02:31 gentlewonder: I have never had any hardware from my collection die. imirkin, I wonder what happens to the card's fan controller for it to die? Does some analog element, such as a capacitor on the board, need replacing or something?
02:33 gentlewonder: This is the only type of failure I have had, once, and that was with an x86 AMD Abit motherboard. I got a tall capacitor on the mobo replaced and it worked again, though back in those days I did not know what function this capacitor served.
02:35 gentlewonder: it is a wild guess, but maybe some of the lower pulsed voltages are generated on the mobo itself using those, but that needs researching.
02:41 gentlewonder: this would mostly be identified by visual inspection of the board, because the failing capacitors develop a dusty deposit on the surrounding metal, similar to corrosion, when they die. We have a simplified slangish term for this, we say "punnis".
02:43 gentlewonder: often they also look swollen, or at least sometimes.
02:43 gentlewonder: so you should replace such ones with capacitors of similar characteristics using a soldering iron.
02:49 gentlewonder: generally during NV34 days , later some info has been leaked that those VLIW kind of chips used xtensa vliw generators for their modified isa based chips. I had one , if i remember correctly it had no control flow which had to be implemented by other means, and software based vertex shaders.
02:54 gentlewonder: this kind of chip is very easy to handle, there are host based caches and indirect buffer caches, which need to be kept hot, and those are not clocked either, then you need to add flow control a fast one, and make correct registers allocated. which in most cases work without doing that naturally, but just to be sure that things get forwarded to the next variable without delays.
02:59 gentlewonder: the flow control is primarily done with ringbuffer pseudo-ops in the ring stream, but since on those chips everything goes through the ring, this is easy.
03:04 gentlewonder: I am aware that anarsoul even figured it out for utgard, which is similar chip, but has hw shaders for vertex pipeline.
03:19 gentlewonder: if you ever wondered how it technically works so that texture l2 reads incur less than cycles delay, the answer came from the ATTILA gpu simulator, and this thing is managed by operand forwarding on top of the texture arbiter, which has wake-up logic; it works based on registered clock differences.
03:28 gentlewonder: so the primary concern there appears to be whether bubbles that fill in the hazards following the long-latency ops will wake up the l1 cache state in the FSM. Unfortunately no, which is why you need new intrinsics for those chips.
03:47 gentlewonder: it can always push to the ring buffer async unless all texture units that process those are currently occupied, then the ring is not being read by the GPU, and the pipeline is stalled.
04:02 gentlewonder: hence mareko and also nha have discussed on when in vertex pipeline or fragment pipeline the texture unit will be freed say in case of varyings, or whatever arbiter present on vertex pipelines, when the pipeline can progress ahead, is when the chip generates the read request to this varying, or in some cases when the write to the framebuffer is done
04:02 gentlewonder: or the vertex position and whatnot
04:10 gentlewonder: imirkin can correct me if i am wrong, but i think you can fill in the arbiters queues, but on chip pipeline is executed in order for the ring in question
04:12 gentlewonder: You would say of course in-order, cause this is how a FIFO works in hw, right, cause this is first in first out, and it is removed from the pipeline without OoO instruction processing when it completes its task
04:19 gentlewonder: hence to look at in computer terminology what ring buffer is mostly, a deeply pipelined in-order processing core right?
04:20 gentlewonder: sometimes on SIMD unified or unified superscalar shader engines, however, the shader processor is decoupled as async, and this can be OoO
04:20 gentlewonder: like in-order dispatch out of order execution and in-order commit
04:26 gentlewonder: my cambodian google can not access arm norway patents anymore, but barrelled definitely does not mean OoO capabilities for the ring, it is just a term for deep pipelining to fill the pipeline fifo in deep ways
04:27 gentlewonder: cause as you know NVIDIA acquired ARM, the patents are inaccessible by the google engine, somewhere i had those bookmarked but they are not needed anymore.
04:32 gentlewonder: barelled shader should mean in-order vliw equivalence, which means bundle of 4instructions can be executed in parallel
04:33 gentlewonder: barrelled that was.
04:41 gentlewonder: Well yeah if something in the bundle graduates or all the bundle is graduated , and it should happen in-order, it can process another bundle in parallel when compiler reordered things so that this is possible
04:41 gentlewonder: that is the vliw strength: due to cheaper resource usage it can add more alus
04:45 gentlewonder: but yeah unified vliw shader engines have OoO caps, but specialized like r400-r300 and nv34 and utgard do not
04:45 gentlewonder: but ringbuffer is in-order always for the MMIO FF
04:47 gentlewonder: if things need an alu, alu units will be forwarded for up to one bundle of 4 consecutive OPS of shading; if it is a pseudo ringbuffer op, it executes in place in the FF without using shading alus.
04:48 gentlewonder: i do not think there are multiple barrells of rasterizers i.e instances of any of the MMIO ff pipelines.
05:07 gentlewonder: so the FF is happening before the fragment shading, and all it does is lock-step feed pixels/threads in its FF pipelining, in order, to the VLIW fragment pipeline
05:12 gentlewonder: So one needs to look at the pipeline of ogl to understand how this thing works, however compute shaders on the latest OGL pipe are async somehow, this pipeline i maybe do not entirely know.
05:14 gentlewonder: i did not follow the intels jekstrand hw semaphores and fences talks very carefully in that type of pipeline.
05:20 gentlewonder: what i think i remember and need to relook, that compute shaders plug to the pipeline before the fragment shader and after the rasterizer, but can work async to fragment shaders too, so programmers responsibility is to sort of like use the barriers in between them right
05:27 gentlewonder: i studied the first, the most complex SIMD/VLIW unified shader architectures and OoO cores similar to mali latest gpus, soon nvidia :) before i realised finally that those are the most complex cores in current technology, but i studied them well and became an expert on those as well.
05:29 gentlewonder: in case of GCN this is in-order issue, which is guaranteed from the empty issue window and the inflight buffer arbiters' round robin, so that out-of-order completions of in-order executions look like in-order issues :)
05:37 gentlewonder: those simd wf arbiters , fetch arbiters reorder stuff into issue queues in the center of the chip, and get the current fetch wfid, later they can be consumed if wanted in the execution order.
09:55 tagr: skeggsb: any thoughts on the syncfd/syncobj patch series?
10:00 karolherbst: tagr: is that required to fix that flickering issue with mutter?
10:02 tagr: well, at least conceptually it should fix that
10:03 tagr: the idea is to exchange a sync FD between the two driver so that they can wait on each other
10:04 tagr: but I may not have tested it under the particular situation that you're running into
10:04 karolherbst: mhhh
10:04 karolherbst: I am wondering why that is needed at all as with i915+nouveau it also more or less works? Or maybe it doesn't...
10:05 karolherbst: but I don't see tearing/flickering at least
10:05 karolherbst: also not in below 60 fps workloads
10:05 karolherbst: not sure if that is solved inside i915 or nouveau though
10:05 karolherbst: or if the solution is crappy performance wise as I know we have performance issues when offloading a lot of data
10:06 karolherbst: like only being able to saturate 30% of the PCIe bandwidth and stuff...
10:06 karolherbst: and still running into issues
10:06 tagr: karolherbst: yeah, what's a bit confusing to me is that it doesn't happen at higher EMC frequencies and reclocking
10:06 karolherbst: I assume stuff is just finished earlier
10:06 karolherbst: and the hw is fast enough to finish its job before it gets accessed or so
10:07 tagr: because if this was purely synchronization then I would expect it to also happen (although perhaps not as frequently) with reclocking
10:08 tagr: hm... I'm not exactly sure how the synchronization works on the GPU at this level
10:08 karolherbst: tagr: mhhh, can you select clocks between min/max and see if lowering step by step makes it worse?
10:08 tagr: I was assuming that the eglSwapBuffers() (or whatever equivalent there is) was ensuring that rendering to the buffer was complete
10:08 karolherbst: tagr: ohh, I think I remember something
10:08 karolherbst: we had an issue like that with i915+nouveau at some point
10:08 karolherbst: and intel started to sync hard on the offloaded thing somehow
10:09 tagr: karolherbst: on Jetson Nano there are only two frequencies, but I may be able to try on Jetson TX1 where we have more
10:09 karolherbst: which also means if the offloaded thing runs below 60 fps, i915 runs below 60fps as well
10:09 karolherbst: and the full desktop starts to stutter
10:09 tagr: karolherbst: okay, makes sense
10:09 karolherbst: and this also caused weird flickering/tearing issues :)
10:10 karolherbst: not tearing, but tiles were either from the previous rendering or the current one
10:10 karolherbst: and you saw a rough line
10:10 tagr: that's basically what the syncfd patches are supposed to resolve, the idea is to take that sync FD from nouveau, pass it into KMS during the page-flip and have the page-flip wait for the sync FD to signal before actually performing the flip
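The ordering tagr describes (producer signals a fence, the page-flip waits on it before touching the buffer) can be pictured with a toy model. This is only a sketch in plain Python threads: `Fence`, `render` and `page_flip` are hypothetical names, and nothing here uses the DRM, KMS or nouveau interfaces.

```python
import threading


class Fence:
    """Toy stand-in for a sync FD: it signals once rendering has finished."""

    def __init__(self):
        self._done = threading.Event()

    def signal(self):
        self._done.set()

    def wait(self, timeout=None):
        # Returns True if the fence signalled, False on timeout.
        return self._done.wait(timeout)


def render(buffer, fence):
    # Producer (the GPU in the discussion): fill the frame, then signal.
    for i in range(len(buffer)):
        buffer[i] = 0xFF
    fence.signal()


def page_flip(buffer, fence, timeout=5.0):
    # Consumer (the display side): wait for the fence before reading the frame.
    if not fence.wait(timeout):
        raise TimeoutError("fence never signalled")
    return bytes(buffer)  # snapshot of what would be scanned out


def flip_demo():
    buffer = bytearray(16)
    fence = Fence()
    gpu = threading.Thread(target=render, args=(buffer, fence))
    gpu.start()
    frame = page_flip(buffer, fence)
    gpu.join()
    return frame
```

Because `page_flip` blocks until `render` has signalled, the snapshot can never contain a half-written frame, which is the property the patches are meant to provide across the two drivers.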
10:11 karolherbst: right
10:11 karolherbst: but I am wondering why i915 can already do that
10:11 karolherbst: but I think there is a way to do that on the kernel level
10:11 tagr: that wait is currently a software wait, but we could potentially implement that via Tegra's syncpoint mechanism (though I'm not sure if Tegra210 support that)
10:11 tagr: (on the GPU side)
10:12 tagr: I suppose there's perhaps a way to do that using DMA-BUF
10:12 karolherbst: but I also don't remember who I was talking to back then when I found this "regression" as my desktop started to get laggy under heavy load and it wasn't the case before :)
10:12 karolherbst: and it was caused by a kernel update
10:12 karolherbst: and I used to have a patch to disable that
10:12 karolherbst: but I lost that one
10:12 tagr: but again, I'm wondering why it's even necessary because eglSwapBuffers() is supposed to block until all rendering is done, isn't it?
10:13 karolherbst: I think the driver has to do something
10:13 karolherbst: for scanout I mean
10:13 karolherbst: soo.. you have the rendering, but also the compositing
10:13 karolherbst: and both need to sync
10:13 karolherbst: and the compositor also needs to sync on the GPU it offloaded stuff to
10:14 karolherbst: or well.. the device used for scanout has to do that
10:14 karolherbst: and I think that's what i915 added at some point
10:15 tagr: so what we do in the display driver is implement ->prepare_fb(), which does drm_gem_fb_prepare() and pinning of memory
10:15 karolherbst: so.. before that you even had to disable vsync on two levels: inside the application _and_ inside the compositor, and it still could mess up
10:15 karolherbst: with the new thing you only need to vsync inside the compositor
10:15 karolherbst: more or less
10:15 karolherbst: mhhh
10:16 karolherbst: yeah.. I don't remember the details sadly, but there is something somewhere inside i915 showing how they are doing this
10:16 tagr: drm_gem_fb_prepare() does the magic that's needed to make the syncfd work
10:17 tagr: memory pinning should take care of caches, but that shouldn't be relevant here because display accesses memory directly
10:17 tagr: so as long as the GPU has finished work and flushed its caches, display should be able to see the whole frame
10:17 tagr: but I suspect that that's not what's happening
10:18 tagr: I'll take a look at i915 to see if I can find anything
10:25 tagr: hm... I wonder if perhaps we're releasing the scanout buffer too early
10:25 tagr: so that we don't actually have a problem with the GPU still rendering to a buffer that's already being scanned out, but rather the GPU already starting to render to the next frame while that buffer is still being scanned out
10:26 tagr: but that really shouldn't be happening with double-buffering
17:44 gentlewonder: And btw, issue queues and instruction buffer, and dispatchers are optional, and bunch alus of various kinds on SIMD unified. So hence NVIDIA acquisition of ARM is very controversial decision, it is bad decision imo. ok arm already wasted large amount of silicon and metal resources, they have lot of software for their ISAs around the world, but new tech comes like GERAM, carbon nanotubes and software wise they are (ARM) lot weaker than XILINX or
17:44 gentlewonder: INTEL. for their configurable fpgas. They have good nonleaking transistors and flash tech, and can change their wiring technology. So good deal for ARM looking bad for NVIDIA.
18:00 gentlewonder: miaow can be optimized in great detail, cause inflight buffers are wires, 40x40 fetch arbiters are already there to start with, they can lead to decoding arbiters of the same amount, getting rid of dispatcher, instruction buffers and also issue queues, then the functional units of long latency ops should be lost (however this needs software support later), and add more adders and or multipliers, like on fpga's
18:11 gentlewonder: i mean those long latency ops they can be done on chip in place execution also on fpgas routed from frontend to new intrinsics, but sw intrinsic in case of new algorithms is imo more flexible.
18:13 gentlewonder: cause reformatting and generating all the new intrinsics on the chip takes quite some time to synthesize and reconfigure everything.
18:19 gentlewonder: for instance on fpga's they adder for luts and interconnection networks, they have 32 bit flipflops, and on their new transistors, forming a bus of wires arbitrated over takes hence 32LUTs for 1024wired bus.
18:26 gentlewonder: with a smarter carry based addition , this can be optimized down to couple or slightly more LUTS, needs a bit of work, then rest of the digital switches like those transistors or flash cells even can be used to form a powerful parallel many-core allready.
18:49 gentlewonder: Now the MMIO we talked about, that ff stuff, graduates in-order and feeds the pixels to the programmable stages; now MMIO, as the name implies, on the other hand is handled by the IO bus, which is something like AXI4 amba or pci, pcie, agp and such
18:50 gentlewonder: i am not a particularly big expert on those yet (requires one free day more :))
18:50 gentlewonder: but those are master slave based pipelined transactional buses, many transactions can be in flight
18:52 gentlewonder: they offer bursts and locked master slave relations , and it is a higher pilotage, but those transactions complete pipelined and out or in order, both are possible as i remember.
19:00 gentlewonder: if by definition they can be out-of-order or async so those terms are sort of like hand to hand, implying that when there is async burst things can also complete ooo
19:01 gentlewonder: those are synopsys IP cores that i minorly looked at, this async burst comes with fast devsel capability of the cycle 2 which is data out phase which can be quite long
19:01 gentlewonder: the address cycle or address phase can be probably also self-timed and nonclocked, this is idsel stuff.
19:02 gentlewonder: so idsel and devsel and their graduations according to wikipedia are also not clocked, maybe only on wishbone bus they are, dunno
19:09 gentlewonder: in either case DMA engine or MMIO is async in respect to the shaders, but i think when you go doing MMIO from the cpu to the device and you have write combined memory on both sides, this is going to end up scheduling very fast
19:27 gentlewonder: anyways, the wishbone bus left me with the impression that it may not have the async non-clocked capabilities, however if people care about this interconnect, this can be easily added.
19:33 gentlewonder: if you have IO arbiters in sw and there have to be, cause everything that goes to GPU should be non-hazardous on the cpu source iobus side, i.e always in order for any GPU
19:34 gentlewonder: then let's leave the paradigm the same, no out-of-order bus transaction completions, i.e bursts of the same sizes for MMIO
19:41 gentlewonder: so all this is not very relevant, since uncachable write combined stuff will be processed right away, and current IO infra needs no changes hence.
19:51 gentlewonder: instead of changing the IO code, for in-order processors i plan to make a new scheduler all in all, which will be compatible with the current IO, zero-cycle MMIO should be the result, currently inside branches it is one cycle in mesa probably
19:54 gentlewonder: but the state tracking as i talked longer time ago, all the state that program generates, has to be registered into compressed buffers to get rid of the state tracking delay, that needs a bit of end-user program parsing and generating this upfront from the drivers flow
19:54 gentlewonder: then compress it, and starts to run it without cpu overhead anymore
19:54 gentlewonder: present
20:02 gentlewonder: that the state tracker of OPENGL server client model has to be there, isn't wrong programming of course, though in opengl it is bad programming to use cpu conditionals, then still it is entirely allowed.
20:10 gentlewonder: I head off now finally, i have not got much to do around Opengl , than only to look at the compute shader including 4.6-5 or some ogl pipeline.
20:11 gentlewonder: i do that asap. but not during near weeks.
20:19 KungFuJesus: imirkin: Where should I look if I suspect texture filtering may be putting the card into a specific little endian mode?
20:19 imirkin: KungFuJesus: i saw your comments earlier ... that makes zero sense
20:19 imirkin: (i believe you, but it makes zero sense)
20:20 imirkin: the card doesn't have a big-endian mode. it's always little-endian.
20:20 imirkin: the card does have some facilities that make operating it from a big-endian cpu a little more pleasant
20:24 KungFuJesus: interestingly...the swizzle in this file is different based on endianness: https://cgit.freedesktop.org/nouveau/xf86-video-nouveau/tree/src/nv40_xv_tex.c#n174
20:25 KungFuJesus: but "xv" in this is probably "xvideo"?
20:25 KungFuJesus: I think that file's barking up the wrong tree
20:25 imirkin: yes, xv is xvideo
20:25 imirkin: let's seee....
20:25 imirkin: ah yes
20:25 imirkin: but note that it's only for A8L8
20:25 imirkin: not for e.g. ARGB8
20:26 imirkin: and i think that's because it's trying to do something dodgy to begin with
20:26 KungFuJesus: freaking ARB, there's too many texture formats lol
20:26 imirkin: it's also just as likely that this never worked correctly
20:27 KungFuJesus: I'll grab some screenshots for you with and without texture filtering enabled
20:28 KungFuJesus: It's not something I can really observe all that well in doom because every surface is textured, so basically when texture filtering parameters aren't set you get a big sea of white polygons
20:29 KungFuJesus: there's definitely some glColor being set for the mario logos, though, heh
20:29 imirkin: so ...
20:29 imirkin: i'd encourage you to test simpler things than "doom"
20:29 imirkin: e.g. if you have a theory that linear texturing is fubar
20:29 imirkin: then write a *trivial* application
20:29 KungFuJesus: yeah, I tried without any luck to produce a minimal example with a single texture
20:29 imirkin: that demonstrates this
20:30 imirkin: piglit is good for this
20:30 KungFuJesus: also, did your G5 die a liquid cooling system death?
20:30 KungFuJesus: Mine tried to but I converted my quad to quad air cooling :)
20:30 imirkin: it wasn't one of the liquid-cooled ones
20:30 KungFuJesus: ah
20:31 imirkin: the power supply just went sour as far as i can tell
20:31 imirkin: i tried removing it
20:31 KungFuJesus: oh, well that's not hard to find a replacement for
20:31 imirkin: but you basically have to remove EVERYTHING if you want to take it out
20:31 KungFuJesus: yep
20:31 KungFuJesus: I tried to flush and replenish the cooling system...it did not go well. I suspect the pumps have just lost their head pressure for the thermal load I was giving it
20:32 KungFuJesus: turns out, the 2.4 GHz heatsinks are good enough. You just need a hacksaw
20:32 imirkin: yeah, i just have no idea how to take it apart
20:32 imirkin: i looked at some instructions
20:32 imirkin: but they were much more involved than i was willing to go
20:32 imirkin: and that says something ... i'm not so easily discouraged.
20:34 KungFuJesus: yeah you have to take out the whole board
20:35 KungFuJesus: I tried to buy a replacement LCS + CPU from macpartstore.com...they're pretty sketch. Somehow they ended up sending me basically an entire Powermac G5 when the first LCS didn't work. So yeah, I have a spare parts machine
20:36 KungFuJesus: Their idea of testing a "working system" is to see if it boots and can work at idle temperatures. Even at idle the thing would runaway eventually
20:38 KungFuJesus: oh btw I think there's some unchecked path somewhere in the texture upload too. Wish I saved the error message, but eventually one of the "pushed" parameters in nouveau gave some failure message and then my file system stopped working :-/. The "hires textures" feature of the game caused this. It probably ought to have panic'd the kernel. I can probably reproduce it, but if it's barfing on file
20:38 KungFuJesus: system memory I'm not sure I want to reproduce it.
20:45 KungFuJesus: imirkin: I'm using the "legacy GL" renderer (v1.3 max features) just to minimize the complexity. It does still use shaders, though
20:45 imirkin: the hardware uses textures too
20:45 imirkin: i actually find GL 2+ easier to use than the fixed function pipeline
20:45 imirkin: s/textures/shaders/ of course
20:47 KungFuJesus: https://imgur.com/a/Y8aGsDX
20:47 imirkin: yea.....
20:47 imirkin: curious though
20:47 KungFuJesus: actually the glColor components might be right in all of those, it could be the alpha setting for some of the textures
20:48 imirkin: oh hm. i guess if it's ABGR, then pure red would look the same.
20:48 KungFuJesus: but unlike in doom, GL_UNPACK_SWAP_BYTES does not invert the problem
20:49 imirkin: again, i'd strongly recommend making a minimal repro
20:49 KungFuJesus: I have verified through those printfs in that window (against a working little endian machine) that the values are all correct so there's no endianness bug in reading the textures
20:49 KungFuJesus: that's the problem, I can't seem to find the minimal stack of GL calls that make this happen
20:50 imirkin: forget that
20:50 imirkin: just write a program that tests your thesis
20:50 KungFuJesus: tried to all of Sunday and came back empty
20:50 imirkin: hm ok
20:50 imirkin: so it's nothing so simple.
20:50 KungFuJesus: what's weird is that in my attempts to reproduce this, GL_UNPACK_SWAP_BYTES was ignored. I couldn't make the endianness of the textures wrong when I had tried
20:51 KungFuJesus: (other than swizzling manually in memory, of course)
20:51 imirkin: so that supports my theory that this is not the linear vs flat thing as you think it is.
20:51 imirkin: oooh, i wonder what happens if you add SRGB to the mix
20:51 imirkin: can you check if srgb is enabled for the failing cases?
20:52 KungFuJesus: sure, how do I do that?
20:52 imirkin: look for a glEnable(something)
20:52 KungFuJesus: wasn't aware that GL had colorspaces
20:52 imirkin: it doesn't. just srgb.
20:52 KungFuJesus: hah, there's a lot of glEnables
20:52 KungFuJesus: let me grep
20:53 imirkin: hold on
20:53 imirkin: actually it'd be a different internal format for the texture
20:53 imirkin: although for framebuffer it'd be like glEnable(GL_FRAMEBUFFER_SRGB)
20:54 KungFuJesus: so both of these examples are using SDL for the windowing management...
20:54 imirkin: and yeah, check the precise GL format of the textures
20:54 KungFuJesus: wonder if that's the link
20:55 KungFuJesus: if I pop glEnable(GL_FRAMEBUFFER_SRGB) into my minimal test code, would that work? Should I search the apitrace?
20:55 imirkin: welllll
20:55 imirkin: that'd make the fbo srgb
20:56 imirkin: but it sounds like maybe the textures need to be srgb to fail
20:56 imirkin: like ... i totally believe that we never notice that we got our textures messed up
20:56 imirkin: and it just happens to work out
20:56 imirkin: coz who cares which component is which
20:56 imirkin: but srgb affects colors but not alpha
20:56 imirkin: which would create an imbalance there
20:56 imirkin: it could also be doing srgb stuff in the shader directly, leading to the same problems
20:57 imirkin: although again, that wouldn't depend on sampler settings
20:57 imirkin: so srgb is the only "other" thing the sampler does
20:57 imirkin: which could cause this
20:58 KungFuJesus: I can tell you the texture formats, one sec
20:59 KungFuJesus: ugh, hold on, need to rebuild qapitrace
21:00 imirkin: oh
21:00 imirkin: btw, there's some fun issues with apitrace
21:00 KungFuJesus: right, non-native endianness bug
21:00 imirkin: if you take the apitrace recorded on BE and replay it on LE
21:01 imirkin: i filed a bug about that a while back
21:01 KungFuJesus: yep, this is on the BE machine, should be good
21:01 imirkin: https://github.com/apitrace/apitrace/issues/601
21:02 KungFuJesus: man 4 cores are not enough on a 90 nm machine from 2005
21:04 imirkin: i treated the G5 same as my ARM boards - nfsroot, etc. it was about as powerful ;)
21:07 KungFuJesus: lol
21:07 KungFuJesus: ah, it was a qt lib that needs rebuilding. apitrace dump it is
21:10 KungFuJesus: GL_RGBA seems to be the "format" for the texture creation calls. Should I be looking for something else?
21:11 imirkin: that's legal
21:11 imirkin: but actually you're looking for internalFormat
21:11 imirkin: not for format
21:11 imirkin: in the glTexImage* calls
21:12 imirkin: should be the first arg, iirc
21:12 imirkin: format/type are at the end and specify the format/type of the data
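For reference, in `glTexImage2D` the `internalformat` argument follows `target` and `level`, while `format` and `type` come right before the pixel pointer. A small sketch for pulling those three fields out of a text dump is below. It assumes the dump prints each call as `<number> glTexImage2D(name = value, ...)`; that line layout and the helper name `texture_formats` are assumptions for illustration, so adjust the patterns to whatever the dump actually contains.

```python
import re

# Assumed dump layout: "<call number> glTexImage*(name = value, ...)".
CALL_RE = re.compile(r"^\s*(?P<no>\d+)\s+(?P<name>glTexImage\w*)\((?P<args>.*)\)")
ARG_RE = re.compile(r"(\w+)\s*=\s*([^,()]+)")


def texture_formats(dump_lines):
    """Return (call number, internalformat, format, type) per glTexImage* call."""
    found = []
    for line in dump_lines:
        match = CALL_RE.match(line)
        if match is None:
            continue  # not a texture upload call
        args = {key: value.strip() for key, value in ARG_RE.findall(match.group("args"))}
        found.append(
            (
                int(match.group("no")),
                args.get("internalformat"),
                args.get("format"),
                args.get("type"),
            )
        )
    return found
```

Feeding it the lines of a dump gives one tuple per upload, so an sRGB internal format (or any unexpected one) stands out without reading every call by hand.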
21:13 KungFuJesus: dpaste.com/86PWMEPJW
21:13 imirkin: right ok
21:13 imirkin: GL_RGBA is also legal for internal format :)
21:14 imirkin: SWAP_BYTES won't have any effect for GL_UNSIGNED_BYTE data type, iirc?
21:14 imirkin: i never quite remember how that pack/unpack stuff works precisely
21:14 KungFuJesus: imirkin: it's not supposed to, though I think the PACK_ALIGNMENT can be affected by it
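A minimal model of why GL_UNPACK_SWAP_BYTES is a no-op for GL_UNSIGNED_BYTE: the swap reverses the bytes inside each multi-byte element, and a one-byte element has nothing to reverse. `unpack_swap_bytes` is a hypothetical helper written for illustration, not mesa code.

```python
def unpack_swap_bytes(data, element_size):
    """Reverse the byte order inside each element of element_size bytes."""
    if element_size < 1 or len(data) % element_size:
        raise ValueError("data length must be a multiple of element_size")
    swapped = bytearray()
    for start in range(0, len(data), element_size):
        swapped += data[start:start + element_size][::-1]
    return bytes(swapped)


pixel = bytes([0x11, 0x22, 0x33, 0x44])

# One-byte components (the GL_UNSIGNED_BYTE case): nothing changes.
as_bytes = unpack_swap_bytes(pixel, 1)

# One 32-bit element (a packed type): the whole pixel is reversed.
as_word = unpack_swap_bytes(pixel, 4)
```

Here `as_bytes` equals `pixel`, while `as_word` is `44 33 22 11`, which matches the observation that toggling the flag did nothing in the byte-per-component test case.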
21:15 imirkin: all i know is that mesa handles it :)
21:15 imirkin: and it magically comes out into a normalized set of formats on the driver end
21:15 KungFuJesus: is there a different internalformat I should try?
21:15 imirkin: no
21:15 imirkin: i was hoping this would be an srgb format
21:16 imirkin: there's some other way to flip on srgb though
21:16 imirkin: can you just grep through the dump for SRGB?
21:16 KungFuJesus: sure
21:17 KungFuJesus: grep -i for srgb came up empty
21:17 imirkin: ok. so i'm wrong.
21:17 imirkin: could be the mechanism of texture upload where things get messed up
21:18 KungFuJesus: dpaste.com/HTL7GF38D
21:19 KungFuJesus: do those "mask" values look correct?
21:20 imirkin: for ARGB sure
21:20 KungFuJesus: are those binary masks against a 32bit word?
21:20 imirkin: yes
21:20 KungFuJesus: eh, it's RGBA, though
21:20 imirkin: how do you know
21:20 imirkin: this is the X visual
21:21 imirkin: BGRA (which is ARGB-int32 in little endian) is quite common
21:21 KungFuJesus: oh, I'm looking at the attrib list just before it for chooseVisual
21:21 imirkin: and frequently the only scanout-able thing
21:22 imirkin: that's just about the number of bits present
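The remark that BGRA is ARGB-int32 in little endian can be checked with a few lines. The sketch below packs one 32-bit word with A in the top byte and prints how it sits in memory for each byte order; the helper names are made up for the example.

```python
import struct


def pack_argb(a, r, g, b):
    """Build a 32-bit word with A in the top byte and B in the bottom byte."""
    return (a << 24) | (r << 16) | (g << 8) | b


def bytes_in_memory(word, little_endian):
    """Return the four bytes of word as they would be stored in memory."""
    return struct.pack("<I" if little_endian else ">I", word)


word = pack_argb(0xAA, 0x11, 0x22, 0x33)  # A=AA R=11 G=22 B=33
print(bytes_in_memory(word, True).hex())   # 332211aa -> B, G, R, A
print(bytes_in_memory(word, False).hex())  # aa112233 -> A, R, G, B
```

The same masks against a 32-bit word therefore describe a different byte sequence on a big-endian host, which is why the format is endian-sensitive.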
21:28 KungFuJesus: where can I intercept the texture upload path?
21:29 imirkin: you really first need to get a minimal repro
21:29 imirkin: try to figure out what's special about this
21:30 KungFuJesus: It's unfortunate the graphics pipeline's immediate mode is pretty much a lie. It'd be helpful to be able to walkthrough the code to see at which point the textures are completely wrong. However, even with fixed function hardware, the issue probably takes place long before anything drew to the screen
21:34 KungFuJesus: with this dump I think by frame 3 we have the texture issues
21:50 KungFuJesus: I uploaded the apitrace for this to that bug
21:50 KungFuJesus: https://gitlab.freedesktop.org/mesa/mesa/-/issues/1167
22:13 KungFuJesus: anything in those apitraces that look remotely suspect?
22:28 KungFuJesus: imirkin: I think I have a minimal working example, maybe
22:29 KungFuJesus: https://imgur.com/a/TTFKOnT
22:33 imirkin: cool
22:33 KungFuJesus: I first tried that with a brick texture. What was misleading was the fact that only the white mortar looked off, and it was only slightly yellow
22:34 KungFuJesus: I think red textures are a bad test for this, it may have been easier to produce all along
22:34 imirkin: and flipping it between linear and nearest causes the difference?
22:34 imirkin: yeah, red is bad
22:34 imirkin: since RGBA -- A and R are interchangeable if they're both 1 :)
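To see why an opaque red texel hides the problem, here is a small check. It assumes, purely for illustration, that the corruption is a full reversal of the four components; the log does not establish which swap actually happens.

```python
def reversed_components(pixel):
    """Return the pixel with its four components in reverse order."""
    return tuple(reversed(pixel))


def exposes_swap(pixel):
    """True if reversing the components changes the pixel at all."""
    return reversed_components(pixel) != pixel


OPAQUE_RED = (255, 0, 0, 255)        # R and A are both 255: reads the same reversed
OPAQUE_BLUE = (0, 0, 255, 255)       # becomes (255, 255, 0, 0) when reversed
DISTINCT = (0x10, 0x40, 0x80, 0xFF)  # four different values: always changes

for name, pixel in [("red", OPAQUE_RED), ("blue", OPAQUE_BLUE), ("distinct", DISTINCT)]:
    print(name, exposes_swap(pixel))
```

A test texture whose texels use four distinct component values makes any reordering visible, whichever components end up swapped.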
22:34 KungFuJesus: Neither make a difference, I don't think, but let me try
22:35 KungFuJesus: nope, they both screw it up :)
22:35 KungFuJesus: but it's pretty basic, hah
22:35 imirkin: yay!
22:35 imirkin: so it probably has more to do with texture upload
22:35 imirkin: than anything else
22:37 imirkin: there are probably different paths ... or formats ... and they get screwed up
22:37 KungFuJesus: totally ripped from the internet from like a novice example of textures in GL, but it works
22:37 imirkin: don't argue with success
22:38 KungFuJesus: alright, so I'll upload the PPM texture + source code on the bugzilla?
22:38 KungFuJesus: err, gitlab
22:38 imirkin: i mean ... heh. you can. but i probably won't look at it
22:38 imirkin: if you have questions how something in particular works, happy to answer
22:42 KungFuJesus: how can I verify the correctness of the texture uploads within nouveau/mesa?
22:44 imirkin: yeah ... so it's a total mindfuck
22:44 imirkin: coz it's not important what data is there. it's important how the GPU interprets that data.
22:45 imirkin: and if you interpret it on the CPU, then you might be in for a surprise.
22:45 imirkin: what format are you uploading the data as?
22:45 KungFuJesus: "GL_RGBA"
22:45 KungFuJesus: lol, whatever that ends up being
22:45 imirkin: no, i mean
22:45 imirkin: the bytes
22:45 KungFuJesus: oh, uncompressed texture
22:45 imirkin: GL_RGBA + GL_UNSIGNED_BYTE ?
22:45 KungFuJesus: yep
22:47 imirkin: ok, so that ends up being uploaded as BGRA8888_UNORM
22:47 imirkin: which is an endian-sensitive format
22:48 imirkin: (mesa core will do the byte-swapping for you)
22:48 imirkin: so ...
22:49 imirkin: there's some shit where the display formats are a bit retarded and lies
22:49 imirkin: my guess is that the texture format might be unpacked? dunno
22:50 imirkin: but otoh, bgra8888 being broken would show up in lots of places. dunno.
22:50 imirkin: i gave up pretty quickly ;)
22:50 imirkin: i seem to recall it also mattered how the texture was uploaded
22:50 imirkin: i had problems with vertex/index buffers though
22:50 imirkin: sorry for being non-specific. it's a tough problem.
22:50 imirkin: you have to really wrap your head around what's going on
22:51 imirkin: i never did.
22:53 KungFuJesus: gotta head out, will explore this later, though