IRC Logs of #nouveau on irc.freenode.net for 2025-06-02

00:11 mhenning[d]: don't be sorry! It's a good question to ask
00:13 mhenning[d]: but yeah, between the fact that there are only 2 directions and the difficulty of actually writing the abstraction, I think the abstraction isn't worth it for this one
15:45 snowycoder[d]: mhenning[d]: gfxstrand[d]
15:45 snowycoder[d]: Can I ask you for advice?
15:45 snowycoder[d]: I implemented the texbar insertion algorithm in two ways:
15:45 snowycoder[d]: The first way is what codegen does:
15:45 snowycoder[d]: - For each tex instruction find the result uses
15:46 snowycoder[d]: - Use Dijkstra to compute the minimum number of pushes between the texture push and its use; that will be the barrier level
15:46 snowycoder[d]: - Merge barrier levels
15:46 snowycoder[d]: - Cull the barrier levels with data-flow analysis of the stack size.
15:46 snowycoder[d]: The second way is another method that I dreamed up last night:
15:46 snowycoder[d]: - Use data-flow analysis to simulate where each register is on the stack
15:46 snowycoder[d]: - When we use that register, pop it
15:46 snowycoder[d]: - When we add a tex instr, push the register it writes
15:46 snowycoder[d]: My method is much simpler (200 LoC vs 500) and does far fewer iterations, but it needs 255 bytes per block during data-flow analysis (1 u8 for each register).
15:46 snowycoder[d]: It should also compute less conservative bounds in some edge-cases (I can write more about this, but I didn't see it in real shaders).
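A minimal sketch of the second approach described above, restricted to a single straight-line block. This is not the code from the linked branch. The names `Instr`, `Out`, `insert_texbars`, and `NOT_PENDING` are hypothetical, and the sketch assumes that a texbar of level N waits until at most N texture operations are still outstanding. It keeps one u8 per register, as in the description, and leaves out the cross-block data-flow merge and the case where a register is overwritten while its texture result is still pending.

```rust
// One u8 per register, matching the 255 bytes per block mentioned above.
// Register indices are assumed to be in 0..255.
const NUM_REGS: usize = 255;
// Sentinel: the register is not waiting on any texture result.
const NOT_PENDING: u8 = u8::MAX;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Instr {
    Tex { dst: u8 }, // texture op writing register `dst`
    Use { src: u8 }, // any instruction reading register `src`
}

#[derive(Debug, PartialEq)]
enum Out {
    Instr(Instr),
    TexBar(u8), // assumed meaning: wait until at most N tex ops are outstanding
}

fn insert_texbars(block: &[Instr]) -> Vec<Out> {
    // depth[r] = number of tex ops pushed after the one that writes r.
    let mut depth = [NOT_PENDING; NUM_REGS];
    let mut out = Vec::new();

    for &instr in block {
        match instr {
            Instr::Tex { dst } => {
                // Every pending result moves one slot deeper. Clamping below
                // the sentinel can only make a later barrier more conservative.
                for d in depth.iter_mut() {
                    if *d != NOT_PENDING {
                        *d = (*d + 1).min(NOT_PENDING - 1);
                    }
                }
                depth[usize::from(dst)] = 0;
            }
            Instr::Use { src } => {
                let level = depth[usize::from(src)];
                if level != NOT_PENDING {
                    out.push(Out::TexBar(level));
                    // The barrier also completes everything older than `src`.
                    for d in depth.iter_mut() {
                        if *d != NOT_PENDING && *d >= level {
                            *d = NOT_PENDING;
                        }
                    }
                }
            }
        }
        out.push(Out::Instr(instr));
    }
    out
}

fn main() {
    let block = [
        Instr::Tex { dst: 0 },
        Instr::Tex { dst: 1 },
        Instr::Use { src: 0 },
        Instr::Use { src: 1 },
    ];
    for o in insert_texbars(&block) {
        println!("{:?}", o);
    }
}
```

In this sketch, reading the older result first needs a level-1 barrier followed by a level-0 barrier, while reading the newest result first needs a single level-0 barrier that also covers the older one.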
15:46 snowycoder[d]: Can you just skim the Draft code when you have time and tell me if I messed up something? It seems too good to be true.
15:46 snowycoder[d]: Code is here in the last two commits: https://gitlab.freedesktop.org/SnowyCoder/mesa/-/commits/texbar?ref_type=heads
15:47 snowycoder[d]: I can also create two branches or open a Draft MR if it's more comfortable.
16:00 trevormindy: What we have today is tremendous compute power and storage capability already. I do not own such gpu which has less than 32bit integer, all atom hw and sw shaders allow 32bit integer ops yeah on GPU too, then i have powervr sgx540mp2 Google ai says it can do both 16bit and 32bit integers already from earlier like 535 that gumstix has, and r300 works on 32bit components too mostly but
16:00 trevormindy: floats, so it's the only one not able to do integer 32bit but 24bit mantissa based there, but this hardware is laptop that needs soldering on the ac contact pins, which i am too lazy to perform. As matter of fact i never met a gpu having limited to minimum allowed at es2.0. But all it matters quite few, the procedures have been tested with highest success so far. So in short you get nvidia
16:00 trevormindy: cards stable maybe next year, and by then i already maintain super powerful codepath. snowycoder[d] i had forget already some gpu code, but the description makes sense. It's the memory instruction and warp synchronization as i see.
16:38 mhenning[d]: snowycoder[d]: Sure, I'll take a look later today
16:57 lucathegreedy: technically anything related to my execution is global generic shader which is very straightforward and always appearing in the same form even on graphics, none of this type of scheduling or synchronization between dependencies is required. You'd end up just simply reordering manually that to the best performance. So nothing in my labs depends on this codepath in the compiler. If we talk
16:57 lucathegreedy: as to going back to those procedures i posted, you start off by memory loads, but the register dependencies are taken care of by the hardware, everything is symmetrically done, so 112-32=80+36 there is nothing that makes the war,raw,waw dependency to happen, everything is loaded to registers , i never depend on your work at all, cause from day one you have been kicking the guy who made
16:57 lucathegreedy: all available and made all your work and you scam around like you are the developers.
19:56 jasonram: it's actually that you should know this type of code is wrong already now, reason being the hardware according to google's AI is already that strong 25 years ago,where you might choose to implement sync and sched, games are not going to run well and use tremendous cooling etc. hence i am saying right away if you continue on such path I never am interested in any of this. (from 82.145.63.*)
19:58 HdkR: Interesting, a hostname and alias name mismatch this time around.