11:35 cwabbott: imirkin: you're talking about the last hunk, right? I don't really remember why I put it there, but I just looked at it again and I think I understand why... one optimization I left out of the commit message is that when inserting constraint moves, it will skip inserting a move for any MERGE sources that only have one use (the MERGE itself)
11:37 cwabbott: the assumption being that if we did insert a move, that move would be immediately successfully coalesced anyways so it would never need to be emitted
11:40 cwabbott: but what if you had something like "uint64 a = merge(c, d); uint64 b = ...; uint128 x = merge(a, b);"? then we'd be forced to merge "a" with "x", but those are both compound and that's not implemented properly (see the TODO added in the other hunk)
12:09 karolherbst: imirkin, Lyude: any thoughts? https://github.com/karolherbst/mesa/commit/300838cf0687f104e9c85546d13c1c09663e04f0 this seems to work just fine for nvc0. But wondering if there is more verification of the pushbuf API I should add wrappers for
12:09 karolherbst: like setting bufctx, but I think it doesn't actually touch the pb
15:03 imirkin: cwabbott: that particular if moves the MERGE/etc closer to the usage point. anyways, i'll think about that.
15:04 cwabbott: imirkin: the moving isn't the main point
15:04 cwabbott: the main point is the early return
15:04 imirkin: right, it also doesn't add the constraint move
15:04 cwabbott: the moving only happens if it's an immediate or load
15:04 cwabbott: so it's actually kind of a rematerialization
15:04 imirkin: yeah
15:06 cwabbott: the problem is with a MERGE that's the source of another MERGE
15:06 cwabbott: I think if you don't insert a mov in between it blows up
15:07 cwabbott: iirc the original shader was using a MERGE of a MERGE of a MERGE to build an 8-component value
15:08 cwabbott: and the RA doesn't support merging two values that are already compound (i.e. coalesced with a MERGE or SPLIT) so you can't really combine all 3 at once
15:09 cwabbott: it just doesn't combine the masks correctly or something
15:10 imirkin: right. thanks for the context.
15:38 karolherbst: imirkin: btw, any opinions on the multithreading fix? I think it's technically correct and that I got rid of all unexpected/random kicks. I think long term we might want something completely different, but at least with this we don't need to do a bigger rework
15:38 imirkin: link?
15:38 imirkin: fyi i have zero gitlab notifications enabled.
15:39 karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8440
15:40 imirkin: remember that nouveau_bo_map() can kick too.
15:40 karolherbst: mhhh... let's see
15:41 karolherbst: right
15:41 karolherbst: but that's not a problem
15:41 imirkin: anyways, i have work now
15:41 imirkin: i'll try to look in the evening
15:41 imirkin: but tbh i'd rather convince myself that cwabbott's thing is right
15:41 karolherbst: sure
15:42 karolherbst: mhh, nouveau_bo_wait could indeed cause issues I think... depends on where we actually call nouveau_bo_map
15:42 imirkin: yes.
15:42 imirkin: it's a giant clusterf***
15:43 karolherbst: yep
15:43 imirkin: the thing i had also worked 99% of the time
15:43 imirkin: but 1% of the time it deadlocked ;)
15:43 karolherbst: well, I have 0 deadlocks :)
15:43 imirkin: that's the right number of deadlocks.
15:43 karolherbst: yep
15:44 imirkin: anyways, even if hypothetically i'm a fan of the approach, i'll want to soak test it for a while
15:44 imirkin: coz this stuff is quite subtle
15:44 karolherbst: ohh, I think I know why nouveau_bo_map doesn't end up kicking, because there is nothing in the pushbuffer :D
15:44 karolherbst: yeah
15:44 imirkin: i was a fan of my original approach too, so ... it doesn't necessarily bode well :)
15:44 karolherbst: :D
15:44 karolherbst: any tests you were testing with?
15:44 imirkin: nouveau_bo_map doesn't *ALWAYS* kick
15:44 imirkin: but it *can* kick
15:44 imirkin: well, there are some multithreading tests in piglit
15:45 karolherbst: yeah, but with my approach, the pushbuffer is already kicked once the "draw" or whatever releases the pushbuffer
15:45 karolherbst: so there is nothing left to flush out
15:45 imirkin: i also remember warsow2 at the time was a problem. they've disabled MT on nouveau, but you can force it with something
15:45 imirkin: (e.g. renaming the driver to noveau)
15:45 karolherbst: ahh, right, warsow2
15:45 karolherbst: and chromium
15:45 karolherbst: I just checked with deqp fornow
15:45 karolherbst: *for now
15:46 imirkin: and just running various random games
15:46 imirkin: rename the driver to like noveau though
15:46 imirkin: in case people have workarounds
15:46 karolherbst: mhhh
15:47 karolherbst: imirkin: do you know which piglit tests are affected? I bet there are none called "multithreaded" or so
15:47 imirkin: actually that's exactly what they're called
15:47 karolherbst: in deqp they are
15:47 imirkin: might be like "multithread" or "concurrency"
15:47 karolherbst: sure you mean piglit?
15:47 imirkin: yes
15:48 karolherbst: ohh.. glx-multithread
15:48 imirkin: glx-*
15:48 imirkin: yea
15:48 imirkin: and friends.
15:50 karolherbst: mhh "[68/68] skip: 68" :D
15:50 imirkin: might be using MESA_multithread_something?
15:50 imirkin: which isn't implemented
15:50 karolherbst: let's see
15:50 karolherbst: ohh
15:50 karolherbst: I used gbm
15:51 imirkin: lol
15:51 imirkin: yeah that won't do it
15:53 karolherbst: mhh [9/9] skip: 4, pass: 2, crash: 3
15:55 karolherbst: that race in kernelspace is quite annoying
15:56 karolherbst: "glx-multithread: ../src/gallium/drivers/nouveau/nvc0/nvc0_winsys.h:117: BEGIN_NVC0: Assertion `mtx_trylock(&pushbuf_data(push)->push_lock) == thrd_busy' failed." :)
15:56 karolherbst: nice, all crashes are asserts
15:57 karolherbst: but yeah, some tests require GLX_MESA_multithread_makecurrent
15:58 karolherbst: ahh, nvc0_miptree_transfer_map
15:58 karolherbst: mhh
16:05 karolherbst: ohh, that one test causes some bigger issues, nice
16:09 imirkin: yeah. multithreading is annoying.
16:09 imirkin: GLX_MESA_multithread_makecurrent is not supported by anything afaik, so don't worry about that.
16:09 imirkin: it was an early idea, i guess
16:35 karolherbst: mhhh, right.. nouveau_buffer is racy as hell
16:35 karolherbst: the mm_slab stuff especially
16:35 karolherbst: but all the mm bits in the end
16:37 karolherbst: [9/9] skip: 4, pass: 4, crash: 1 :)
16:37 karolherbst: "glx-multithread-buffer: ../src/gallium/auxiliary/tgsi/tgsi_from_mesa.c:292: tgsi_get_sysval_semantic: Assertion `!"Unexpected system value to TGSI"' failed." :(
16:40 imirkin: delightful, isn't it
16:41 karolherbst: mhhh.. it's actually my fault for doing silly things :D
16:41 imirkin: was the silly thing "working on nouveau"?
16:43 karolherbst: well.. I shift 1 by values bigger than 63
16:44 imirkin: you're living too far in the future
16:45 karolherbst: yep
16:45 imirkin: ILP128
16:45 imirkin: it's coming!
16:46 karolherbst: well.. compiler already support it sometimes :D
16:55 imirkin: on 64-bit platforms
17:59 karolherbst: imirkin: actually.. I think we might want to get rid of info->sv because we really only care about certain "types" and having bools feels more natural here
18:00 karolherbst: or am I missing something?
18:00 imirkin: i don't remember what info->sv is.
18:00 imirkin: i can look tonight
18:00 karolherbst: nv50_ir_prog_info.sv for system values
18:00 karolherbst: nv50 does some weird stuff though
18:01 imirkin: we need the locations
18:01 imirkin: of the sysvals
18:01 imirkin: which vary by platform
18:01 karolherbst: sure
18:01 karolherbst: but that's handled inside the driver
18:01 imirkin: sorta?
18:01 imirkin: there's a callback
18:01 karolherbst: for system values?
18:01 imirkin: maybe not
18:01 imirkin: maybe only for inputs/outputs
18:01 karolherbst: it's just all constant data
18:01 imirkin: i forget :)
18:01 karolherbst: yeah
18:02 karolherbst: imirkin: for nvc0: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/nouveau/nvc0/nvc0_program.c#L243-265
18:02 karolherbst: checking 4 bools feels kind of more efficient than a loop
18:03 karolherbst: although that's probably optimized away
18:03 karolherbst: mhh
18:03 karolherbst: maybe not
18:03 karolherbst: well. should be
18:03 karolherbst: anyway
18:03 karolherbst: will see if there is something tricky I didn't catch
22:06 karolherbst: imirkin: I think you want to remove the turing chipset until I fix all the issues. Will enable it then
22:37 imirkin: karolherbst: yeah - looks broken :)
22:37 imirkin: was just about to ask you
22:37 karolherbst: i hit the exact same issue when running the mt tests
22:37 imirkin: but it's unrelated to mt
22:37 karolherbst: correct
22:37 imirkin: it's the sysval thing?
22:37 karolherbst: yes
22:37 imirkin: but you made it sound like it was fixable in nouveau
22:37 imirkin: whereas this is in tgsi_from_mesa
22:37 karolherbst: anyway, thanks for wiring the shim up, now we can shout at people breaking our compiler :p
22:38 imirkin: it was all at anholt's urging
22:38 karolherbst: imirkin: it's complicated... it can be fixed in nouveau, but it's also quite broken in nir actually
22:38 karolherbst: turns out, we have more than 64 sys vals now
22:38 imirkin: ok cool
22:38 karolherbst: and I was thinking instead of keeping that huge array in our compiler, we only set the fields we are actually interested in
22:39 imirkin: so why does turing hit this?
22:39 imirkin: and not others?
22:39 karolherbst: volta+ uses nir by default
22:39 imirkin: so is all nir broken then?
22:39 karolherbst: yes
22:39 imirkin: why aren't other (nir) drivers hitting this?
22:39 imirkin: oh
22:39 karolherbst: because they don't translate the MESA to TGSI enums :)
22:39 imirkin: you call some function in tgsi_from_mesa.c directly?
22:40 karolherbst: the assert is inside tgsi_get_sysval_semantic
22:40 karolherbst: but we also scan all system values
22:40 imirkin: right, but it's not being called from common code
22:40 karolherbst: nir->info.system_values_read is a 64 bit value, but as we have more system values, we wrap around the bit check
22:40 karolherbst: right
22:40 imirkin: which is why this only happens on TU102
22:40 karolherbst: it's all codegen
22:40 karolherbst: ahh, yeah
22:40 imirkin: well, nv50_ir_from_nir
22:40 karolherbst: correct
22:41 karolherbst: I think it makes sense to rework the system val handling but ... nv50 is a bit annoying here :/
22:41 karolherbst: quick and dirty fix is " for (uint8_t i = 0; i < 64; ++i) {" instead of all system values
22:42 karolherbst: anyway, I will dig deeper tomorrow and see what's the best fix here
22:42 karolherbst: getting rid of that huge sv array would also have some shader caching benefits
22:45 imirkin: sounds good
22:47 karolherbst: ohh, actually, we only save set elements.. so the benefit isn't all that big :D