02:15mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1476039793845600448/image.png?ex=699facae&is=699e5b2e&hm=837480a1b52766bf2babe84b5cf62412aacd1ee69781b03b901668c3cb69934b&
02:15mangodev[d]: should this list of args be in a specific order for nouveau things to work correctly?
02:16airlied: no
02:16airlied: gsp rm should be the default anyways
02:24mangodev[d]: okay
02:24mangodev[d]: so nothing should be conflicting with `nouveau.atomic=1`?
06:15asdqueerfromeu[d]: mangodev[d]: I think the GSP option should be `nouveau.config=NvGspRm=1` instead
06:15mangodev[d]: asdqueerfromeu[d]: should that also be for atomic?
06:16asdqueerfromeu[d]: mangodev[d]: No
06:16mangodev[d]: ah
06:16mangodev[d]: i'm mainly asking because my atomic modeset doesn't seem to work
06:18asdqueerfromeu[d]: Also `nvme_load=yes` might be redundant (because my (NVMe) drive is getting detected even without it)
07:15mangodev[d]: asdqueerfromeu[d]: good to know
07:15mangodev[d]: just came with my default distro grub config, so i just left it be
07:15mangodev[d]: -# i also don't have any NVME drives so it doesn't really matter anyway, all of my drives are SATA SSDs
11:51karolherbst[d]: okay.. we have to optimize our ssbo bound checks a bit 😄
11:54karolherbst[d]: so I'm seeing +50% perf with bound checks disabled around SSBOs, and that's a lot more than I anticipated would ever matter
12:06karolherbst[d]: phomes_[d]: do you want to check a few games with a patch that is _likely_ to break some? But I'd still like to know how big the impact is across the board.
12:06phomes_[d]: sure
12:07karolherbst[d]: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/ce453e70df4c7d25bf4e8d22d5fe232666d26b68
12:07karolherbst[d]: it just disables all bound checks around ssbos, so some games might just run into invalid memory accesses
12:08karolherbst[d]: but... our implementation is also really bad, so games that only do sound accesses might actually see big improvements
12:25phomes_[d]: atomic heart goes from 60->64. Beyond a steel sky 57->58
12:25phomes_[d]: lego builds journey crashes
12:25karolherbst[d]: heh
12:26phomes_[d]: the rest performs the same
12:31karolherbst[d]: okay, thanks for testing!
14:17marysaka[d]: Is anyone working on ZBC at the moment (or planning to in the near future)? I might start looking into that next week if no one is on it
15:08Mary: x512[m]: Pushed the branch
15:10x512[m]: This: https://gitlab.freedesktop.org/marysaka/mesa/-/commits/nvk-openrm-580 ?
15:11Mary: yes
15:11x512[m]: Thanks.
16:52mhenning[d]: karolherbst[d]: oh yeah, that's expected. we wait on all variable latency instructions at the end of a basic block. I've been meaning to extend snowy's cross-block pass in calc_instr_deps to maxwell+ but I haven't gotten around to it
16:54karolherbst[d]: mhenning[d]: yeah.. though I think what also hurt is the entire control flow + bssy + bsync around it.. I have a shader that went from 12k to 8k instructions when I disabled the checks
16:55karolherbst[d]: but solving that kinda requires proper prediction.. _however_... loads have an input predicate to force the result to be 0 starting with Turing
16:55karolherbst[d]: so I was wondering if I want to hook that up...
16:55mhenning[d]: Do all loads have that? I thought you said it was just shared
16:56karolherbst[d]: shared only has the stride on the non-uniform address, and ULDC with a global VA also has an input predicate, but the other ULDC variants don't
16:57karolherbst[d]: but yeah.. `LDG` seems to have it as well
16:57karolherbst[d]: it's uhm.. well hidden, so I might not have noticed earlier
16:57karolherbst[d]: but yeah.. LDL and LDS don't have it
16:58karolherbst[d]: but not really sure what you refer to precisely here, because I don't think I said that shared loads have that predicate..
16:58karolherbst[d]: anyway.. LDG seems to have it and that's all that matters for this
16:59karolherbst[d]: ohh.. not even LD has it
16:59karolherbst[d]: or uhm.. maybe it does..
16:59karolherbst[d]: yeah.. so LD, LDG and ULDC with a global VA
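The idea above (feed the bounds-check result into the load as an input predicate, so an out-of-bounds load returns 0 without any control flow around it) can be sketched as a small software model. This is only an illustration: `load_branchy`, `load_predicated` and `predicated_load` are hypothetical names, not NAK or hardware identifiers, and the model says nothing about the real instruction encoding.

```rust
/// Branchy form: the bounds check is real control flow around the load.
/// On hardware this is where the extra branch/sync instructions come from.
fn load_branchy(ssbo: &[u32], index: usize) -> u32 {
    if index < ssbo.len() {
        ssbo[index]
    } else {
        0
    }
}

/// Hypothetical model of a load that takes an input predicate:
/// when `pred` is false the result is forced to 0 and memory is not read.
fn predicated_load(mem: &[u32], index: usize, pred: bool) -> u32 {
    if pred {
        // `get` keeps the model itself safe even if a caller passes a bad index.
        mem.get(index).copied().unwrap_or(0)
    } else {
        0
    }
}

/// Predicated form: the comparison only produces a predicate value,
/// and the load consumes it. No control flow is needed at this level.
fn load_predicated(ssbo: &[u32], index: usize) -> u32 {
    let in_bounds = index < ssbo.len();
    predicated_load(ssbo, index, in_bounds)
}
```

Both forms return the same value for every index; the difference the chat is after is only in how many instructions and how much control flow the compiler has to emit.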
17:00mhenning[d]: Anyway, that could work. For the ones that don't have it in hardware we could even lower to 2 instructions really late (post-RA) without much difficulty
17:00karolherbst[d]: well.. we don't need it for shared or local, do we?
17:01mhenning[d]: I guess yeah
17:01karolherbst[d]: but for stores we'll need to figure out proper instruction predication, because I'd rather not hack it up for stores even just as a temporary workaround
17:06karolherbst[d]: kinda impressive that our load_global_nv will have 4 sources...
17:06karolherbst[d]: ehh 3
17:16karolherbst[d]: ohhh.. I found something interesting
17:17karolherbst[d]: mhenning[d]: atm, we spill a UPred to a full UGPR, right?
17:17mhenning[d]: I think so yeah
17:17karolherbst[d]: yeah.. so I found something really nice
17:17karolherbst[d]: UP2UR and UR2UP
17:17karolherbst[d]: which are bit wise writes
17:18mhenning[d]: yeah, there's also P2R and R2P
17:18mhenning[d]: at least the way those work is that they spill all registers into a byte of a gpr
17:18karolherbst[d]: ohh yeah, there is
17:19karolherbst[d]: right.. P2R seems to work on bytes
17:19mhenning[d]: which makes them hard to integrate with the current way we do spilling, which assumes you spill one value at a time
17:19karolherbst[d]: ohh wait.. I got confused, the bit selection is on the input
17:19karolherbst[d]: UP2UR also writes full bytes
17:19karolherbst[d]: but you can mask it seems?
17:20mhenning[d]: even then you can only spill register 3 to a bit in position 3 mod 8
17:21mhenning[d]: anyway, that's something I'm aware of that doesn't feel too high priority
17:22karolherbst[d]: yeah.. not sure it would help much, but it could a little
17:25karolherbst[d]: though I think this can be packed.. mhh
17:26karolherbst[d]: at least with UR2UP you can select the byte _and_ apply a mask
17:27karolherbst[d]: could be useful for a more optimal u2b? Depending on things
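The packing constraint discussed above (predicate i can only land on bit i of the selected byte, i.e. bit position i mod 8 of the register) can be modeled in a few lines. This is a rough sketch based only on what the chat says: the function names, the 8-bit predicate set, and the assumption that unmasked bits are preserved are all hypothetical, not confirmed P2R/R2P or UP2UR/UR2UP semantics.

```rust
/// Model of a P2R-style write: copy the predicate bits selected by `mask`
/// into byte `byte_sel` of `gpr`. Predicate i always lands on bit i of that
/// byte. Assumption: bits outside the mask are left untouched.
fn preds_to_reg(gpr: u32, preds: u8, byte_sel: u32, mask: u8) -> u32 {
    assert!(byte_sel < 4, "a 32-bit register has only 4 bytes");
    let shift = byte_sel * 8;
    let keep = !((mask as u32) << shift);
    (gpr & keep) | (((preds & mask) as u32) << shift)
}

/// Model of an R2P-style read: take byte `byte_sel` of `gpr` and update
/// only the predicates selected by `mask`.
fn reg_to_preds(preds: u8, gpr: u32, byte_sel: u32, mask: u8) -> u8 {
    assert!(byte_sel < 4, "a 32-bit register has only 4 bytes");
    let byte = (gpr >> (byte_sel * 8)) as u8;
    (preds & !mask) | (byte & mask)
}
```

In this model a single register could hold four spilled copies of the same predicate (one per byte), but never two different predicates in the same bit lane of one byte, which is why it fits awkwardly with a spiller that handles one value at a time.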
17:43karolherbst[d]: ATOMG is such a disaster now... https://gitlab.freedesktop.org/mesa/mesa/-/blob/f7580d6c1f372265812fd505ae0b74810ed104e5/src/nouveau/compiler/nak/sm70_encode.rs#L3448-3478
17:44karolherbst[d]: I wonder if I want to rethink my "always pick the UGPR form" approach, because it's totally not working for ATOMG
17:44karolherbst[d]: but also "immediate offset is 23 bits but only in UGPR form" is also a disaster...
17:44karolherbst[d]: ohh I haven't even fixed that part of the code yet...
17:45karolherbst[d]: anyway, this just gets back to the issue that NAK might just randomly spill UGPRs to GPRs and it's kinda a mess
18:57airlied[d]: karolherbst[d]: so for coopmat loads we in theory should ignore bounds checking if the feature bit isn't set
18:57karolherbst[d]: right...
18:58airlied[d]: But I don't think vtn and nir are ready for it
18:58karolherbst[d]: I was considering that it might also be okay to only check for the entire matrix
18:58karolherbst[d]: instead of each individual access
18:58airlied[d]: We would need to introduce another state so the SSBO loads we generate are unbound while normal ones stay bound
18:58karolherbst[d]: right....
18:59karolherbst[d]: I have some ideas on how to do that, will probably try to prototype something next week then
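The "check the entire matrix once instead of each individual access" idea from above can be sketched like this. It is a minimal model under stated assumptions: a row-major matrix with `stride >= cols`, hypothetical function names, and zero-fill as the out-of-bounds result. Whether returning all zeros for a partially out-of-bounds matrix is acceptable is a question for the spec and is not settled in the chat.

```rust
/// Per-element checks: every access is tested on its own.
fn load_matrix_per_element(
    buf: &[f32], base: usize, rows: usize, cols: usize, stride: usize,
) -> Vec<f32> {
    let mut out = Vec::with_capacity(rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            let idx = base + r * stride + c;
            out.push(buf.get(idx).copied().unwrap_or(0.0));
        }
    }
    out
}

/// One check for the whole matrix: test only the last element's index.
/// Assumes stride >= cols, so the last element has the largest index.
fn load_matrix_one_check(
    buf: &[f32], base: usize, rows: usize, cols: usize, stride: usize,
) -> Vec<f32> {
    if rows == 0 || cols == 0 {
        return Vec::new();
    }
    let last = base + (rows - 1) * stride + (cols - 1);
    if last >= buf.len() {
        // The whole matrix is treated as out of bounds.
        return vec![0.0; rows * cols];
    }
    let mut out = Vec::with_capacity(rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            // Rust still checks this index; on a GPU these would be
            // unchecked loads, which is the point of the single check.
            out.push(buf[base + r * stride + c]);
        }
    }
    out
}
```

The two versions agree for matrices that are fully in bounds and differ only when a matrix straddles the end of the buffer: the per-element version returns the valid part, the single-check version returns zeros for everything.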
19:00airlied[d]: I expect for cmat perf it makes a big difference
19:00karolherbst[d]: yeah.. I saw +50% by just disabling the checks
19:02karolherbst[d]: but there are a handful of things that can be done here to improve the situation, just need to come up with a good plan and such