11:35cwabbott: imirkin: you're talking about the last hunk, right? I don't really remember why I put it there, but I just looked at it again and I think I understand why... one optimization I left out of the commit message is that when inserting constraint moves, it will skip inserting a move for any MERGE sources that only have one use (the MERGE itself)
11:37cwabbott: the assumption being that if we did insert a move, that move would be immediately successfully coalesced anyways so it would never need to be emitted
11:40cwabbott: but what if you had something like "uint64 a = merge(c, d); uint64 b = ...; uint128 x = merge(a, b);"? then we'd be forced to merge "a" with "x", but those are both compound and that's not implemented properly (see the TODO added in the other hunk)
12:09karolherbst: imirkin, Lyude: any thoughts? https://github.com/karolherbst/mesa/commit/300838cf0687f104e9c85546d13c1c09663e04f0 this seems to work just fine for nvc0. But wondering if there is more verification of the pushbuf API I should add wrappers for
12:09karolherbst: like setting bufctx, but I think it doesn't actually touch the pb
15:03imirkin: cwabbott: that particular if moves the MERGE/etc closer to the usage point. anyways, i'll think about that.
15:04cwabbott: imirkin: the moving isn't the main point
15:04cwabbott: the main point is the early return
15:04imirkin: right, it also doesn't add the constraint move
15:04cwabbott: the moving only happens if it's an immediate or load
15:04cwabbott: so it's actually kind of a rematerialization
15:04imirkin: yeah
15:06cwabbott: the problem is with a MERGE that's the source of another MERGE
15:06cwabbott: I think if you don't insert a mov in between it blows up
15:07cwabbott: iirc the original shader was using a MERGE of a MERGE of a MERGE to build an 8-component value
15:08cwabbott: and the RA doesn't support merging two values that are already compound (i.e. coalesced with a MERGE or SPLIT) so you can't really combine all 3 at once
15:09cwabbott: it just doesn't combine the masks correctly or something
15:10imirkin: right. thanks for the context.
15:38karolherbst: imirkin: btw, any opinions on the multithreading fix? I think it's technically correct and that I got rid of all unexpected/random kicks. I think long term we might want something completely different, but at least with this we don't need to do a bigger rework
15:38imirkin: link?
15:38imirkin: fyi i have zero gitlab notifications enabled.
15:39karolherbst: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8440
15:40imirkin: remember that nouveau_bo_map() can kick too.
15:40karolherbst: mhhh... let's see
15:41karolherbst: right
15:41karolherbst: but that's not a problem
15:41imirkin: anyways, i have work now
15:41imirkin: i'll try to look in the evening
15:41imirkin: but tbh i'd rather convince myself that cwabbott's thing is right
15:41karolherbst: sure
15:42karolherbst: mhh, nouveau_bo_wait could indeed cause issues I think... depends on where we actually call nouveau_bo_map
15:42imirkin: yes.
15:42imirkin: it's a giant clusterf***
15:43karolherbst: yep
15:43imirkin: the thing i had also worked 99% of the time
15:43imirkin: but 1% of the time it deadlocked ;)
15:43karolherbst: well, I have 0 deadlocks :)
15:43imirkin: that's the right number of deadlocks.
15:43karolherbst: yep
15:44imirkin: anyways, even if hypothetically i'm a fan of the approach, i'll want to soak test it for a while
15:44imirkin: coz this stuff is quite subtle
15:44karolherbst: ohh, I think I know why nouveau_bo_map doesn't end up kicking, because there is nothing in the pushbuffer :D
15:44karolherbst: yeah
15:44imirkin: i was a fan of my original approach too, so ... it doesn't necessarily bode well :)
15:44karolherbst: :D
15:44karolherbst: any tests you were testing with?
15:44imirkin: nouveau_bo_map doesn't *ALWAYS* kick
15:44imirkin: but it *can* kick
15:44imirkin: well, there are some multithreading tests in piglit
15:45karolherbst: yeah, but with my approach, the pushbuffer is already kicked once the "draw" or whatever releases the pushbuffer
15:45karolherbst: so there is nothing left to flush out
15:45imirkin: i also remember warsow2 at the time was a problem. they've disabled MT on nouveau, but you can force it with something
15:45imirkin: (e.g. renaming the driver to noveau)
15:45karolherbst: ahh, right, warsow2
15:45karolherbst: and chromium
15:45karolherbst: I just checked with deqp fornow
15:45karolherbst: *for now
15:46imirkin: and just running various random games
15:46imirkin: rename the driver to like noveau though
15:46imirkin: in case people have workarounds
15:46karolherbst: mhhh
15:47karolherbst: imirkin: do you know which piglit tests are affected? I bet there are none called "multithreaded" or so
15:47imirkin: actually that's exactly what they're called
15:47karolherbst: in deqp they are
15:47imirkin: might be like "multithread" or "concurrency"
15:47karolherbst: sure you mean piglit?
15:47imirkin: yes
15:48karolherbst: ohh.. glx-multithread
15:48imirkin: glx-*
15:48imirkin: yea
15:48imirkin: and friends.
15:50karolherbst: mhh "[68/68] skip: 68" :D
15:50imirkin: might be using MESA_multithread_something?
15:50imirkin: which isn't implemented
15:50karolherbst: let's see
15:50karolherbst: ohh
15:50karolherbst: I used gbm
15:51imirkin: lol
15:51imirkin: yeah that won't do it
15:53karolherbst: mhh [9/9] skip: 4, pass: 2, crash: 3
15:55karolherbst: that race in kernelspace is quite annoying
15:56karolherbst: "glx-multithread: ../src/gallium/drivers/nouveau/nvc0/nvc0_winsys.h:117: BEGIN_NVC0: Assertion `mtx_trylock(&pushbuf_data(push)->push_lock) == thrd_busy' failed." :)
15:56karolherbst: nice, all crashes are asserts
15:57karolherbst: but yeah, some tests require GLX_MESA_multithread_makecurrent
15:58karolherbst: ahh, nvc0_miptree_transfer_map
15:58karolherbst: mhh
16:05karolherbst: ohh, that one test causes some bigger issues, nice
16:09imirkin: yeah. multithreading is annoying.
16:09imirkin: GLX_MESA_multithread_makecurrent is not supported by anything afaik, so don't worry about that.
16:09imirkin: it was an early idea, i guess
16:35karolherbst: mhhh, right.. nouveau_buffer is racy as hell
16:35karolherbst: the mm_slab stuff especially
16:35karolherbst: but all the mm bits in the end
16:37karolherbst: [9/9] skip: 4, pass: 4, crash: 1 :)
16:37karolherbst: "glx-multithread-buffer: ../src/gallium/auxiliary/tgsi/tgsi_from_mesa.c:292: tgsi_get_sysval_semantic: Assertion `!"Unexpected system value to TGSI"' failed." :(
16:40imirkin: delightful, isn't it
16:41karolherbst: mhhh.. it's actually my fault for doing silly things :D
16:41imirkin: was the silly thing "working on nouveau"?
16:43karolherbst: well.. I shift 1 by values bigger than 63
16:44imirkin: you're living too far in the future
16:45karolherbst: yep
16:45imirkin: ILP128
16:45imirkin: it's coming!
16:46karolherbst: well.. compiler already support it sometimes :D
16:55imirkin: on 64-bit platforms
17:59karolherbst: imirkin: actually.. I think we might want to get rid of info->sv because we really only care about certain "types" and having bools feels more natural here
18:00karolherbst: or am I missing something?
18:00imirkin: i don't remember what info->sv is.
18:00imirkin: i can look tonight
18:00karolherbst: nv50_ir_prog_info.sv for system values
18:00karolherbst: nv50 does some weird stuff though
18:01imirkin: we need the locations
18:01imirkin: of the sysvals
18:01imirkin: which vary by platform
18:01karolherbst: sure
18:01karolherbst: but that's handled inside the driver
18:01imirkin: sorta?
18:01imirkin: there's a callback
18:01karolherbst: for system values?
18:01imirkin: maybe not
18:01imirkin: maybe only for inputs/outputs
18:01karolherbst: it's just all constant data
18:01imirkin: i forget :)
18:01karolherbst: yeah
18:02karolherbst: imirkin: for nvc0: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/nouveau/nvc0/nvc0_program.c#L243-265
18:02karolherbst: checking 4 bools feels kind of more efficient than a loop
18:03karolherbst: although that's probably optimized away
18:03karolherbst: mhh
18:03karolherbst: maybe not
18:03karolherbst: well. should be
18:03karolherbst: anyway
18:03karolherbst: will see if there is something tricky I didn't catch
22:06karolherbst: imirkin: I think you want to remove the turing chipset until I fix all the issues. Will enable it then
22:37imirkin: karolherbst: yeah - looks broken :)
22:37imirkin: was just about to ask you
22:37karolherbst: i hit the exact same issue when running the mt tests
22:37imirkin: but it's unrelated to mt
22:37karolherbst: correct
22:37imirkin: it's the sysval thing?
22:37karolherbst: yes
22:37imirkin: but you made it sound like it was fixable in nouveau
22:37imirkin: whereas this is in tgsi_from_mesa
22:37karolherbst: anyway, thanks for wiring the shim up, now we can shout at people breaking our compiler :p
22:38imirkin: it was all at anholt's urging
22:38karolherbst: imirkin: it's complicated... it can be fixed in nouveau, but it's also quite broken in nir actually
22:38karolherbst: turns out, we have more than 64 sys vals now
22:38imirkin: ok cool
22:38karolherbst: and I was thinking instead of keeping that huge array in our compiler, we only set the fields we are actually interested in
22:39imirkin: so why does turing hit this?
22:39imirkin: and not others?
22:39karolherbst: volta+ uses nir by default
22:39imirkin: so is all nir broken then?
22:39karolherbst: yes
22:39imirkin: why aren't other (nir) drivers hitting this?
22:39imirkin: oh
22:39karolherbst: because they don't translate the MESA to TGSI enums :)
22:39imirkin: you call some function in tgsi_from_mesa.c directly?
22:40karolherbst: the assert is inside tgsi_get_sysval_semantic
22:40karolherbst: but we also scan all system values
22:40imirkin: right, but it's not being called from common code
22:40karolherbst: nir->info.system_values_read is a 64 bit value, but as we have more system values, we wrap around the bit check
22:40karolherbst: right
22:40imirkin: which is why this only happens on TU102
22:40karolherbst: it's all codegen
22:40karolherbst: ahh, yeah
22:40imirkin: well, nv50_ir_from_nir
22:40karolherbst: correct
22:41karolherbst: I think it makes sense to rework the system val handling but ... nv50 is a bit annoying here :/
22:41karolherbst: quick and dirty fix is " for (uint8_t i = 0; i < 64; ++i) {" instead of all system values
22:42karolherbst: anyway, I will dig deeper tomorrow and see what's the best fix here
22:42karolherbst: getting rid of that huge sv array would also have some shader caching benefits
22:45imirkin: sounds good
22:47karolherbst: ohh, actually, we only save set elements.. so the benefit isn't all that big :D