03:13rhyskidd: Lyude: yup, interested
03:14rhyskidd: access to displayport standards via membership would be useful (I don't have an employer who can provide that otherwise)
03:14rhyskidd: you may have seen that nv released register docs on their display hardware's crc functionality, so another building block is in place to support nouveau testing via igt
03:15rhyskidd: but, i'm still limited in the time i can spend on nouveau, so if the number of memberships is limited in any way -- i'd rather they go to other interested parties first before me
03:16rhyskidd: i should be a current X.org member
04:03imirkin: rhyskidd: Lyude has been working on adding crc support to nouveau
04:20rhyskidd: ah ok -- i've been off irc for a little while
04:20rhyskidd: Lyude: how is that going?
04:22imirkin: this is the latest thing that's published, but i think there's been more work since: https://gitlab.freedesktop.org/lyudess/linux/commits/wip/nouveau-crc-v1
12:08AndrewR: hm, so I was reading https://www.khronos.org/opengl/wiki/Compute_Shader and finally got at least a vague understanding of why all those shader storage/image extensions exist :} But is my nv92 still out of luck because compute shaders require something newer (hardware-wise) than what it has on-chip?
12:09AndrewR: also, what was the problem with images in OpenCL (on nouveau)? I naively assumed they should be 'like textures', but then... I realized later that GPUs have their own sort-of MMU for a reason? Like, you can upload something into a logically contiguous space, yet physically it will be in different areas of memory, and the other way around?
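A toy model of the remapping AndrewR describes, assuming 4 KiB pages and a single flat page table (real NVIDIA GPU page tables are multi-level and support larger page sizes, so this only illustrates the idea): a buffer that is contiguous in the GPU's virtual address space can be backed by physical pages scattered anywhere. `translate` and `PAGE_SIZE` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, for illustration only


def translate(page_table: dict, vaddr: int) -> int:
    """Map a GPU virtual address to a physical address via a flat page table.

    page_table maps virtual page number -> physical page number.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


# Three virtually contiguous pages backed by non-contiguous physical pages.
table = {0x100: 0x7, 0x101: 0x2A, 0x102: 0x3}
for va in (0x100000, 0x101010, 0x102FFF):
    print(hex(va), "->", hex(translate(table, va)))
```

The offset within a page is preserved; only the page number is remapped, which is why the physical layout can differ arbitrarily at page granularity.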
12:12karolherbst: AndrewR: well... the reason why we don't have compute shaders on tesla is quite silly
12:12karolherbst: technically we could do compute shaders, but..
12:12karolherbst: 1. we can't do ssbos and/or images inside fragment shaders
12:13karolherbst: that's actually the reason
12:13AndrewR: karolherbst, thanks....
12:14karolherbst: AndrewR: you could just force on the extension and see what happens though...
12:14karolherbst: we could even fix all the bugs
12:14karolherbst: but we can't expose it because of the spec
12:15karolherbst: AndrewR: images allow you to write into texture data
12:15karolherbst: which has all sorts of implications
12:16karolherbst: also.. we have to fix clover in a few places, where it enforces AMD-like behaviour
12:19AndrewR: karolherbst, I don't have any useful (borrowed from the internet) code to run there yet :} I was thinking about an MPEG-like decoder/encoder, but that usually involves more than parallel stages... and I'm definitely not capable enough to even understand the math behind the simplest decoder :/ so, if there's an example on the net... I can try to see what happens if I force those shaders on for nv92
12:19karolherbst: AndrewR: well, OpenCL is usually better suited for that kind of work
12:22AndrewR: karolherbst, but it's still not quite there for the reasons you outlined :} and I'm nowhere near good enough to fix this.....
12:28AndrewR: I mostly see vendors promising big things, not the code itself :} http://www.radgametools.com/bnkhist.htm - "Bink 2 now has GPU video decoding! This is generally two to three times faster than CPU-only decoding alone! BinkGPU uses the CPU to decode the bitstream, and compute shaders on the GPU for everything else! Available for Windows DirectX 11, Linux GL 4.3, Sony PS4 and Microsoft Xbox One." So, this was used... in one form or another (I just remember you wanted to demo something in 2020, and hoped to provide something simple... well, apparently even simple things aren't _that_ simple)
12:29karolherbst: but that uses graphics though
12:34AndrewR: karolherbst, https://pdfs.semanticscholar.org/7a16/60f0f2dad76481f3223d0a8048ebb213fc3a.pdf - "Partial Image Decoding On The GPU For Mobile Web Browsers", 2014 ....
12:54karolherbst: imirkin: uff.. one value isn't processed at all :/
13:03karolherbst: it's not a technical bug
13:03karolherbst: it's a bug inside the RA algorithm
13:05karolherbst: seriously... I'd just port it over to util/register_allocate because of the lacking documentation and the terrible code
20:22imirkin: karolherbst: remind me the TGSI of the "minimal" failing shader?
20:23karolherbst: imirkin: https://gist.github.com/karolherbst/9214680816fa1c981e46b20d1e6570af
20:23karolherbst: it got even smaller
20:24karolherbst: ra output: https://gist.githubusercontent.com/karolherbst/0b5fb5b39e9ff4011c37147ddf0f33cb/raw/24bff0117e706eb22220de4ce1ffa54a2e43dfde/gistfile1.txt
20:24karolherbst: value 156 is placed into this "hi" list
20:25karolherbst: because of deg 60/60
20:45imirkin: it should def retry, so if it's not retrying, something very weird is going on
20:49karolherbst: well, it retries.. but.. it marks that value for spilling, and the next round just does the same. The thing is just, why should that value be spilled anyway?
20:49karolherbst: there are enough regs
20:49karolherbst: but then.. why does it work with other gens..
20:49karolherbst: it works with c0 as well
20:51karolherbst: 0xf0: RIG_Node[%158]($-1): 4 colors, weight inf, deg 84/252
20:52karolherbst: so.. even though 84 seems high, it's still good
20:52karolherbst: 0xc0: RIG_Node[%158]($-1): 3 colors, weight inf, deg 48/61
20:52karolherbst: now we have 48
20:52karolherbst: I guess that's because it's 96 bits
20:53karolherbst: the "degree" calculation is doing something stupid
20:55karolherbst: or maybe we just have to adjust RelDegree a bit? ....
20:55karolherbst: the lack of documentation here is just super annoying
20:56karolherbst: why was that RelDegree table chosen that way
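The limits printed in the two RIG_Node dumps above are consistent with counting how many start positions a value of `colors` contiguous registers has in a register file of `nregs` registers: 63 - 3 + 1 = 61 for the 3-colour value on 0xc0 (63 GPRs per thread) and 255 - 4 + 1 = 252 for the 4-colour value on 0xf0 (255 GPRs per thread). That is an inference from the printed numbers, not a reading of nv50_ir's source; `degree_limit` is a hypothetical helper that ignores alignment.

```python
def degree_limit(nregs: int, colors: int) -> int:
    """Start positions for a value spanning `colors` contiguous registers.

    Hypothetical model: alignment constraints are ignored.
    """
    return nregs - colors + 1


print(degree_limit(63, 3))   # 0xc0 dump: deg 48/61
print(degree_limit(255, 4))  # 0xf0 dump: deg 84/252
```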
21:00imirkin: yeah, there's an issue with how we spill wide values
21:01imirkin: so we spill it, but it doesn't end up reducing register pressure at all
21:01imirkin: coz of the constraint moves we re-add
21:01imirkin: i've seen this before
21:01karolherbst: well, that's before spilling though
21:02karolherbst: so.. we calculate this degree and degreeLimit
21:02karolherbst: if degree >= degreeLimit, we add it to the list of values we want to spill
21:02karolherbst: this value obviously doesn't have to be spilled, it even has a spill weight of inf
21:02karolherbst: although, that might actually be another issue
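In Chaitin/Briggs-style graph-colouring allocators, the split described above is the "simplify" step: a node whose weighted degree is below its limit is guaranteed a register once its neighbours are coloured, so it goes on the low list; anything else lands on the high list as a spill candidate. A minimal sketch using the degree/limit pairs printed in the log; the `Node` class and `partition` function are hypothetical, not nv50_ir's.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    degree: int        # weighted interference degree
    degree_limit: int  # below this, a colour is guaranteed


def partition(nodes):
    """Split nodes into trivially colourable ('lo') and potential spills ('hi')."""
    lo, hi = [], []
    for n in nodes:
        (hi if n.degree >= n.degree_limit else lo).append(n.name)
    return lo, hi


nodes = [Node("%156", 60, 60), Node("%158@0xf0", 84, 252), Node("%158@0xc0", 48, 61)]
print(partition(nodes))  # (['%158@0xf0', '%158@0xc0'], ['%156'])
```

Landing on the high list only makes a value a candidate; an optimistic (Briggs-style) allocator still tries to colour it before deciding to spill.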
21:03imirkin: there are a _lot_ of issues
21:03imirkin: imo attempting to redo RA entirely is a giant boondoggle
21:04karolherbst: sure, but that might give us the benefit of having code that others understand as well
21:04imirkin: instead focus on moving away from libdrm_nouveau so we can have threading
21:04imirkin: that seems like a wildly better use of time
21:05karolherbst: ohh, I agree, I am just waiting on skeggsb on that, as he wanted to give me something for that, or let's say we are in discussion about the entire topic and he wanted to share something with me
21:05karolherbst: so I decided to wait on that
21:07karolherbst: imirkin: anyway... I am convinced it's not even fixable with the current UAPI.. or only with a big pile of pain around it.
21:07karolherbst: and libdrm_nouveau is just a small issue overall
21:07karolherbst: this entire bug is bigger than just that
21:08karolherbst: how I see it, we require resizable pushbuffers and have to rewrite all of nvc0 to have this "one pushbuffer per operation" thing going on
21:08karolherbst: otherwise threads will just step on each other's toes
21:09karolherbst: and we have to get rid of this pointless context switching we have as well
21:10karolherbst: but right now we allocate a bo, so what happens if that's too small?
21:10karolherbst: sure, we can allocate a bigger one and copy data
21:10karolherbst: but this still doesn't fix the issue, that contexts can just mess with the hw context state as they wish
21:10karolherbst: and at some point they are out of sync with the state each gl context assumes it has
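The "allocate a bigger one and copy data" step can be sketched as a command buffer that doubles its capacity when it runs out of room. As karolherbst notes, this solves only the sizing problem, not the shared hardware-state problem. `GrowablePushBuf` is a hypothetical model, not libdrm_nouveau's pushbuf.

```python
class GrowablePushBuf:
    """Command buffer that grows by allocating a larger backing store and copying."""

    def __init__(self, capacity_words: int = 4):
        self.storage = [0] * max(1, capacity_words)  # stands in for a BO
        self.used = 0

    def _grow(self, needed: int) -> None:
        new_cap = len(self.storage)
        while new_cap < needed:
            new_cap *= 2
        new_storage = [0] * new_cap
        new_storage[: self.used] = self.storage[: self.used]  # copy existing commands
        self.storage = new_storage

    def push(self, *words: int) -> None:
        if self.used + len(words) > len(self.storage):
            self._grow(self.used + len(words))
        self.storage[self.used : self.used + len(words)] = words
        self.used += len(words)


pb = GrowablePushBuf(4)
pb.push(1, 2, 3)
pb.push(4, 5, 6)  # exceeds 4 words, so the buffer grows to 8 and keeps 1, 2, 3
print(len(pb.storage), pb.storage[: pb.used])  # 8 [1, 2, 3, 4, 5, 6]
```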
21:11imirkin: i think you've gone too far.
21:11karolherbst: I am convinced that's what has to happen in the end. It doesn't have to be done in one step though
21:11imirkin: there's an approach that lets you not do all this
21:11imirkin: the basic problem right now is that we don't control when a submit happens
21:11imirkin: and thus when a kick happens
21:12imirkin: which messes everything up
21:12imirkin: an API that makes this explicit rather than implicit would allow us to fix everything
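Why implicit kicks hurt can be shown with a toy push buffer that submits on its own whenever it fills up: a single draw's commands can then be split across two submissions, and anything another context does in between lands in the middle. With an explicit API the caller decides where the submit boundary is. `ImplicitPushBuf` and `ExplicitPushBuf` are hypothetical models, not the nouveau UAPI.

```python
class ImplicitPushBuf:
    """Submits automatically when full, so the caller doesn't control kick points."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = []
        self.submits = []  # each entry is one kernel submission

    def push(self, cmd):
        if len(self.pending) == self.capacity:
            self.kick()  # implicit: happens in the middle of whatever the caller was doing
        self.pending.append(cmd)

    def kick(self):
        if self.pending:
            self.submits.append(self.pending)
            self.pending = []


class ExplicitPushBuf(ImplicitPushBuf):
    """Never submits on its own; the caller calls kick() at a boundary it chooses."""

    def push(self, cmd):
        self.pending.append(cmd)  # unbounded here; a real one would grow its buffer


draw = ["bind_vbo", "set_state", "draw"]
implicit, explicit = ImplicitPushBuf(2), ExplicitPushBuf(2)
for pb in (implicit, explicit):
    for cmd in draw:
        pb.push(cmd)
    pb.kick()
print(implicit.submits)  # [['bind_vbo', 'set_state'], ['draw']]
print(explicit.submits)  # [['bind_vbo', 'set_state', 'draw']]
```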
21:12karolherbst: sure, that's the pushbuf part of the issue
21:12karolherbst: and everybody agrees that implicit pushes are bad
21:12karolherbst: how do you verify that the gl state of a context matches the hw state, as long as every other context can just do submits?
21:12imirkin: same way we do it now
21:12imirkin: only one context is active, etc
21:13karolherbst: who says it's not broken?
21:13imirkin: i do.
21:13karolherbst: how are you able to verify that with multiple threads?
21:13karolherbst: then it's pointless to do multiple threads
21:13imirkin: the implicit push is what killed the locking scheme
21:13imirkin: yes, entirely pointless.
21:13imirkin: we can't change the hardware.
21:13karolherbst: we need a solution which doesn't involve locks
21:14karolherbst: radeonsi does the state tracking thing as well
21:14imirkin: threads aren't entirely pointless btw
21:14karolherbst: and I am sure they don't lock threads
21:14imirkin: they allow you to do a bunch of stuff in parallel, like DMA ops, etc
21:14karolherbst: what about CPU overhead?
21:14karolherbst: that's not irrelevant either
21:14imirkin: another good reason for threads
21:14imirkin: but none of that is defeated by locking around submits / current ops
21:14karolherbst: if we lock it becomes an issue
21:15imirkin: all we're doing is just serializing the core interactions with hw
21:15imirkin: i think that's perfectly reasonable.
21:15karolherbst: ehhh, that's beside the point
21:15karolherbst: I meant what we do about the graph context state
21:15imirkin: it's attached to the context
21:15karolherbst: as every gl context could just change it
21:16karolherbst: stepping over other gl contexts
21:16imirkin: and passed around when switching the "main" context
21:16imirkin: yeah, that could cause a bit of extra emits
21:16imirkin: not a big deal imo
21:16karolherbst: how can we do that, if we have 4 active threads doing context operations?
21:16imirkin: "don't do that"
21:16imirkin: that's also not how software works.
21:16karolherbst: well.. software might do it
21:16imirkin: sure. we can't optimize for everything all at once.
21:16karolherbst: I don't want to go for a solution, which already has known issues
21:17karolherbst: because then we just have to fix it later again
21:17imirkin: it's not something that would cause breakage
21:17karolherbst: it might cause wrong rendering or something
21:17karolherbst: maybe crash the context
21:17imirkin: doing draws from multiple contexts sucks. there's no real way around it.
21:17imirkin: definitely not
21:17imirkin: you re-emit the things that you need to when switching contexts
21:17imirkin: and you keep track of the "current" graph state
21:18imirkin: this is all already done.
21:18karolherbst: but then you have the concurrency issue
21:18imirkin: what issue
21:18karolherbst: multiple threads doing context operations and you end up switching the main context while another thread is still busy
21:18imirkin: you block that switch. that's the whole point of locks.
21:18imirkin: also at the gallium level, this only happens when you call draw()
21:18karolherbst: but then we are at perf issue again ;)
21:19karolherbst: and there is a solution without locks already
21:19karolherbst: radeonsi is doing it
21:19imirkin: that solution is to make everything a lot slower
21:19imirkin: i don't think that's a great solution.
21:19karolherbst: well, and I don't think throwing locks at it is a solution either
21:19imirkin: instead of only re-emitting the things that need changing, you re-emit everything
21:19karolherbst: it's a workaround
21:19karolherbst: uhm.. no
21:19karolherbst: you can do a diff
21:19imirkin: diff against what
21:19karolherbst: you just need to track the state
21:19imirkin: we already do that.
21:20imirkin: or something like that
21:20imirkin: and it's copied when switching the "active" context
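The "diff against what" exchange comes down to shadowing: keep a copy of what was last emitted to the hardware and, when another context becomes active, emit only the entries that differ instead of re-emitting everything. A sketch with plain dicts standing in for the driver's state structures; `switch_context` and the method names are hypothetical.

```python
def switch_context(hw_shadow: dict, new_ctx_state: dict) -> list:
    """Return the (method, value) pairs to emit when new_ctx_state becomes current.

    hw_shadow mirrors what the hardware was last programmed with and is updated
    in place, so only entries that actually differ are re-emitted.
    """
    emits = []
    for method, value in sorted(new_ctx_state.items()):
        if hw_shadow.get(method) != value:
            emits.append((method, value))
            hw_shadow[method] = value
    return emits


hw = {}
ctx_a = {"BLEND_ENABLE": 1, "DEPTH_TEST": 1, "VIEWPORT_W": 1920}
ctx_b = {"BLEND_ENABLE": 0, "DEPTH_TEST": 1, "VIEWPORT_W": 1920}
print(switch_context(hw, ctx_a))  # first switch emits everything
print(switch_context(hw, ctx_b))  # [('BLEND_ENABLE', 0)]
```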
21:20karolherbst: I know it tracks a bit
21:20karolherbst: but not everything
21:20karolherbst: or maybe it's everything
21:20imirkin: It Works (tm)
21:20imirkin: if you think otherwise, just read more code, until you agree.
21:21imirkin: look at what it tracks, and look at what we do when switching active contexts
21:21karolherbst: but the thing is, that we can do the one submit per draw thing, we just need to rework how we do the tracking, how we update it, and have a solution which doesn't require us locking everything
21:21imirkin: locking around draw is eminently reasonable.
21:22karolherbst: as long as we push out the things in one go without other threads interrupting
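Locking around the draw, so that one thread's state emission and submit go out in one go, can be sketched with a single screen-wide lock. Each thread's commands end up contiguous in the shared stream, which is the "without other threads interrupting" property. The `Screen` class and its `draw` method are hypothetical.

```python
import threading


class Screen:
    """Shared hardware channel; only one context may emit and submit at a time."""

    def __init__(self):
        self.lock = threading.Lock()
        self.stream = []  # commands in the order the hardware would see them

    def draw(self, ctx_name: str, n_cmds: int) -> None:
        with self.lock:  # serialize emission + submit; other work can stay parallel
            for i in range(n_cmds):
                self.stream.append((ctx_name, i))


screen = Screen()
threads = [threading.Thread(target=screen.draw, args=(f"ctx{t}", 100)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every context's 100 commands appear as one uninterrupted run.
runs = [screen.stream[i][0] for i in range(0, len(screen.stream), 100)]
assert all(screen.stream[k][0] == runs[k // 100] for k in range(len(screen.stream)))
print(sorted(runs))  # ['ctx0', 'ctx1', 'ctx2', 'ctx3']
```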
21:38karolherbst: imirkin: another annoying thing is operations on screens, where we write into the push buffer: they might not trigger a push, but which push buffer should be used? And we still have the resizing issue there
21:40imirkin: which is why you must have exactly one "active" context
21:42karolherbst: but what if that happens from a different thread?
21:43karolherbst: I mean sure, we could solve this with more locking, but then not only draws lock, but other calls as well
21:53imirkin: basically only draws and to a much smaller degree things like transfers will have an inner lock
21:53imirkin: there should be no locks around waits
23:37karolherbst: imirkin: btw, any idea why the reldegree table is that big? vec16 type of things?
23:38karolherbst: no idea how colors could be anything bigger than 4, as we really only care about predicates/gprs/address file, or is there something else to consider?
23:39imirkin: on nv50
23:39imirkin: we treat everything as 16-bit
23:40imirkin: i.e. each color is a 16-bit color. so up to 8, definitely
23:40imirkin: (coz mul takes 16-bit half-regs)
23:41karolherbst: sure, but the table is still 17x17, not 9x9
23:41karolherbst: not that it matters much, just wondering
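The colour arithmetic is easy to check: a colour is one allocation unit, 16 bits on nv50 and (judging by the "3 colors" for the 96-bit value in the 0xc0 dump) 32 bits on the later targets, so a value's colour count is its width divided by the unit size. A 9x9 table indexed from 1 covers up to 8 colours, i.e. a 128-bit value on nv50; the log doesn't say why the RelDegree table has the extra headroom to 16. `colors` is a hypothetical helper.

```python
def colors(bits: int, unit_bits: int) -> int:
    """Number of allocation units ("colours") a value of `bits` bits occupies."""
    return -(-bits // unit_bits)  # ceiling division


# nv50: 16-bit colours, so a 32-bit GPR is 2 colours and a 128-bit value is 8
print(colors(32, 16), colors(128, 16))  # 2 8
# later targets: 32-bit colours, matching "3 colors" for the 96-bit value at 0xc0
print(colors(96, 32), colors(128, 32))  # 3 4
# a 1-indexed table needs max_colours + 1 rows: 9 for 128-bit values on nv50
print(colors(128, 16) + 1)  # 9
```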
23:42karolherbst: heh.. fun, my example crashes on nv50
23:43karolherbst: in nv50_ir::LValue::isUniform getInsn returns NULL, funky
23:43karolherbst: texlod 3D $r0 $s0 r___ f32 %r42 %r36 %r37 %r38 %r43
23:43karolherbst: and the lod source is checked.. wait
23:44karolherbst: it's indeed never defined
23:45karolherbst: found a bug then :) no idea if we should care though
23:45karolherbst: imirkin: textureLod called with an undefined lod value...
23:50imirkin: isUniform is kinda busted.
23:51imirkin: anyways, yeah, probably needs some mild adjustment
23:51karolherbst: well, if there is no definition, we can assume the value is uniform :) uniformly undefined :p
23:51imirkin: for nv50, texlod/bias must take a dynamically uniform value for lod/bias
23:52karolherbst: just.. depending on the implications that might not be the greatest idea
23:52imirkin: so if it's not, we do a per-lane dance
23:52imirkin: executing one lane at a time
23:52imirkin: times 4 ops
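A rough model of the "per-lane dance": when the lod operand is not dynamically uniform, the fetch is repeated once per lane with only that lane enabled. The simulation counts issued fetches; `sample` and `texlod` are hypothetical stand-ins, the fetch is modelled only by which mip level it reads, and `lods` is assumed non-empty.

```python
def sample(lod: int) -> str:
    """Stand-in for the actual texture fetch: reports which mip level was read."""
    return f"mip{lod}"


def texlod(lods: list) -> tuple:
    """Return per-lane results and the number of texture instructions issued."""
    if all(lod == lods[0] for lod in lods):
        return [sample(lods[0])] * len(lods), 1  # uniform: one fetch for the group
    results = []
    for lane_lod in lods:  # non-uniform: one lane enabled per iteration
        results.append(sample(lane_lod))
    return results, len(lods)


print(texlod([2, 2, 2, 2]))  # (['mip2', 'mip2', 'mip2', 'mip2'], 1)
print(texlod([0, 1, 1, 3]))  # (['mip0', 'mip1', 'mip1', 'mip3'], 4)
```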
23:52karolherbst: right... just what do we do if somebody messes up their glsl code and the lod is undefined?
23:52imirkin: i don't think anything messes up bigtime, it just uses the lod from a single lane to determine which lod to retrieve from
23:52karolherbst: I am sure this leads to weird applications bugs anyway
23:53imirkin: yeah, obviously
23:53imirkin: but compiler probably shouldn't crash
23:55karolherbst: question is, what should isUniform return when there is no definition?
23:55karolherbst: true or false?
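Whichever answer is chosen, the crash itself comes from dereferencing a missing definition. A sketch of a null-safe check with hypothetical `Value`/`Instruction` classes (not nv50_ir's `LValue`), where the policy for undefined values is a parameter: `False` keeps the per-lane path, which is correct for any input, while `True` avoids the loop since the value is garbage anyway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instruction:
    op: str
    uniform_result: bool  # hypothetical: op yields the same value in every lane


@dataclass
class Value:
    name: str
    insn: Optional[Instruction] = None  # defining instruction, None if never defined


def is_uniform(v: Value, undefined_is_uniform: bool = False) -> bool:
    if v.insn is None:
        # No definition (e.g. textureLod with an undefined lod): don't crash,
        # just apply the chosen policy.
        return undefined_is_uniform
    return v.insn.uniform_result


lod = Value("lod")  # never defined, like the lod operand of the texlod above
print(is_uniform(lod))                                     # False
print(is_uniform(lod, undefined_is_uniform=True))          # True
print(is_uniform(Value("%r1", Instruction("mov", True))))  # True
```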