00:15 mhenning[d]: phomes_[d]: Want to test https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33613 when you get a chance? I'm thinking we should just land it if we don't have evidence of a perf issue with it.
00:41 phomes_[d]: Things look pretty normal except for The Surge 2 which drops by 2 fps
00:49 airlied[d]: clearing origin and blocksizes didn't seem to help
01:44 karolherbst[d]: mhhhh
01:45 karolherbst[d]: phomes_[d]: I wonder if that disables load/store vectorization, but... does the negative impact go away with my MRs?
01:49 karolherbst[d]: anyway.. digging into the scalar TEX insanity for sm50 again and it's like.. pain
01:51 airlied[d]: though writing a small test to do just that operation doesn't get a fault either
02:40 airlied[d]: this might be some state wierdness alright, since my simple test, even with guard pages passes fine
02:46 airlied[d]: maybe I'm wrong and this is a texture sampling fault
03:33 mhenning[d]: phomes_[d]: Okay, thanks. The spreadsheet says The Surge 2 is a vulkan native game? It's not supposed to be affected.
03:36 airlied[d]: okay got a crucible test to reproduce it with a texture from linear
04:14 airlied[d]: and prop gives back the same memory size
04:48 airlied[d]: I feel I ask this and forget, if I have a crucible test on nvidia, how are we dumping command streams these days?
04:56 mhenning[d]: airlied[d]: envyhooks https://gitlab.freedesktop.org/nouveau/envyhooks
04:56 airlied[d]: ah I guessed right, now to try and dance the kernel + nvidia driver versions
05:14 airlied[d]: do we have any way to dump descriptor sets like texture header
06:08 gfxstrand[d]: no
06:11 airlied[d]: I've filed https://gitlab.freedesktop.org/mesa/mesa/-/work_items/15277 for now
06:11 airlied[d]: going to hack fedora 44 to just overallocate height
06:11 airlied[d]: because it's blocking the release
06:11 airlied[d]: but it feels like SPH or context register magic
07:13 phomes_[d]: mhenning[d]: I double checked. It is affected.
07:13 phomes_[d]: | branch | min ssbo align = 16 | min ssbo align = 4 |
07:13 phomes_[d]: | :--- | :---: | :---: |
07:13 phomes_[d]: | old baseline | 59 | 57 |
07:13 phomes_[d]: | current main | 60 | 57 |
07:13 phomes_[d]: | main + karol misc MRs | 65 | 64 |
07:13 phomes_[d]: | branch | min ssbo align = 16 | min ssbo align = 4 |
07:13 phomes_[d]: | :--- | :---: | :---: |
07:13 phomes_[d]: | old baseline | 59 | 57 |
07:13 phomes_[d]: | current main | 60 | 57 |
07:13 phomes_[d]: | main + karol misc MRs | 65 | 64 |
08:40 marysaka[d]: airlied[d]: That remind me I really need to make envyhooks more like a library that you could type stuffs to dump memory ect... problem is that depending of the allocation we might never have a CPU mapping (or even something mapeable) so I was considering adding support for creating a context and sending some commands around to copy things ect....
09:22 karolherbst[d]: mhenning[d]: maybe it's native but uses vkd3d internally? Who sets the engine name? The game?
09:25 chikuwad[d]: the game does afaik
09:36 karolherbst[d]: phomes_[d]: maybe we want to know what engine name the games set and add that info to the sheet?
09:37 karolherbst[d]: for linux native ones
09:38 phomes_[d]: yes. I used mangohud to get the value. I will drop in here and print the engine name and put that in the sheet: https://github.com/flightlessmango/MangoHud/blob/b7730c978cacbb44d2ac7cc537dd5d91560e4365/src/vulkan.cpp#L1976
09:52 chikuwad[d]: https://www.pcgamingwiki.com/ also has a lot of info on games
09:53 chikuwad[d]: including but not limited to engine names, graphics APIs, support for EAX/surround audio, community patches/fixes, etc
14:55 phomes_[d]: the sector promotion off MR has a nice performance boost for most games
14:55 karolherbst[d]: sector promotion off?
14:56 phomes_[d]: 40939
14:56 karolherbst[d]: ohh https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40939
14:56 karolherbst[d]: heh...
14:56 karolherbst[d]: seems like promotion wastes bandwidth then...
15:05 mohamexiety[d]: out of all things
15:05 mohamexiety[d]: 🐸
15:06 karolherbst[d]: yeah though I suspect it thrashed the L2 cache constantly?
15:06 karolherbst[d]: I wouldn't be surprised if nvidia makes it depend on the texture size or something
15:07 karolherbst[d]: maybe it helps with compression not doing that
15:07 karolherbst[d]: though it seems to hurt dirt rally
15:16 phomes_[d]: I can loop dirt rally a few time on baseline and that MR to see if it was just noise
15:33 gfxstrand[d]: phomes_[d]: Seriously? 😩
15:34 karolherbst[d]: gfxstrand[d]: have you verified the encoding of `TEX` in 272e8ec461a637ed41b1bb5a23f792d07a2eca01? Codegen uses `0xc038` and that's also what nvdisasm is happy about
15:34 karolherbst[d]: doesn't seem to like `0x0380`
15:34 gfxstrand[d]: I would expect prefetching to help, generally. 🤷🏻‍♀️
15:34 karolherbst[d]: gfxstrand[d]: only if the entire texture is used, no?
15:34 karolherbst[d]: or well..
15:34 karolherbst[d]: big parts of it
15:35 gfxstrand[d]: If you’re interpolating, I would expect fetching 2x2 blocks would be fine.
15:35 karolherbst[d]: otherwise you just move things out of L2 that are actually used to replace it with things you don't use
15:35 karolherbst[d]: mhhhh
15:35 karolherbst[d]: yeah...
15:35 gfxstrand[d]: But we don’t really know what’s being promoted to what
15:35 karolherbst[d]: but games are complex (tm) these days, who knows what they are up to
15:35 karolherbst[d]: right....
15:36 karolherbst[d]: maybe it only helps with small textures? Did we ever check what nvidia is up to there?
15:41 gfxstrand[d]: karolherbst[d]: 🤷🏻‍♀️ Pretty sure I got it from somewhere but it may be a typo.
15:42 karolherbst[d]: okay.. I'll CTS the TEXS changes for sm50 anyway.. but yeah it looks like a typo
15:42 karolherbst[d]: there are also a few issues: https://gitlab.freedesktop.org/mesa/mesa/-/work_items/10842
15:42 karolherbst[d]: but I think that's before that change?
15:43 karolherbst[d]: anyway, I'll verify
15:43 gfxstrand[d]: I’m a little scared that CTS didn’t catch that. Those paths should be pretty easy for CTS to hit
15:43 karolherbst[d]: well.. maybe we never use bound textures for some reasons 🙃
15:44 gfxstrand[d]: Oh, we’re using them
15:44 karolherbst[d]: on maxwell?
15:44 mhenning[d]: karolherbst[d]: That bug is ancient. It's for maxwell back when maxwell wasn't supported.
15:44 gfxstrand[d]: No, not pre-Volta
15:44 karolherbst[d]: yeah.. so we don't hit that path
15:44 karolherbst[d]: but also we don't use it pre volta? 🙃 for TEXS that's kinda required sadly
15:45 gfxstrand[d]: That’s on the list of reasons perf kinda sucks pre-Volta.
15:45 karolherbst[d]: okay..
15:45 karolherbst[d]: I wonder if it wasn't enabled due to this encoding issue...
15:45 karolherbst[d]: I can look into it...
15:46 karolherbst[d]: given there is some interested in using NVK+Zink pre volta, somebody would have to look into it sooner or later anyway..
15:47 gfxstrand[d]: It’s not enabled because pre-Volta, you can only fetch texture handles from one chuf so we have to gather them somehow. On Volta+, as long as it’s early enough in the descriptor set to be on the cbuf range, you get the cbuf path.
15:47 karolherbst[d]: ahhh right...
15:47 karolherbst[d]: that sounds a lot of work to get TEXS going.. so maybe I just say "yeah.. that's for later" and call it a day
15:48 karolherbst[d]: which I guess means https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40900 is ready for review. I can move the bug fix out to its own MR...
15:48 gfxstrand[d]: We can do texs on Volta+
15:48 karolherbst[d]: oh yeah, sure
15:49 karolherbst[d]: but I already started it for sm50 and it's a disaster as expected: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/d3979c1aad8e1e9202b8b5437cdfe37ad9fc1f32
15:49 gfxstrand[d]: 😅
15:49 karolherbst[d]: yeah...
15:50 karolherbst[d]: they merge a couple of flags into a single 4 bit mask (target, lod, depth compare...)
15:50 karolherbst[d]: and the write mask is a 3 bit flag 🙂
15:50 karolherbst[d]: who would have known that using another 8 bit for a dest reg is gonna cause this
15:51 karolherbst[d]: ahh and offsets are also encoded in the 4 bits
15:51 karolherbst[d]: it's great
17:14 karolherbst[d]: mhh hopper is actually SM9.0, no?
17:15 mohamexiety[d]: yes
17:16 karolherbst[d]: mhh when was the UGPR limit increased? Atm a lot of code uses sm10.0 for that
17:19 karolherbst[d]: ahh no, that seems to be SM10.0 indeed
18:19 _lyude[d]: airlied[d]: BTW - taking a look atm to see if I can better understand the possible flashing screen fix I came up with. Was worried it might have been a dead-end but I actually /think/ I understand what could be racing here if I'm understanding how all of this works correctly.
18:19 _lyude[d]: So - we start with a gem object with a vmm, it has a fence on vma->fence. We enter `nouveau_gem_object_close` while it has a fence, which causes us to call `nouveau_gem_object_unmap()`. `vmm->refs` is lowered to 0 at this point, but it's still present within the list until the work queued by `nouveau_cli_work_queue` gets executed. That work calls `nouveau_gem_object_delete_work` which then finally
18:19 _lyude[d]: decrements it (again! but it seems to check for <= 0, which is a bit strange) and removes it from the list. But in the time between the work being queued and the work actually executing, if I'm not mistaken it should be entirely possible for another `nouveau_vma_new()` call to pick up the `vmm` scheduled for destruction. Since `nouveau_gem_object_delete_work` doesn't grab any locks, this would
18:19 _lyude[d]: likely race
18:20 _lyude[d]: I think in this case too we do really just need to enforce the dma_resv locking for anything touching the refcount or trying to search through `nvbo->vma_list`
18:26 Lyude: so at least I know that's racing, but now I guess I need to figure out how this trickles back down to framebuffer creation failing
19:58 airlied[d]: _lyude[d]: that sounds like we need one of those get unless zero type of refcounts
19:59 _lyude[d]: That as well, though I don't think the count going negative is the break since the code does actually check for that
19:59 airlied[d]: no but when ref is 0 we shouldn't create a new ref
19:59 airlied[d]: or when ref is 0 we shouldn't proceed to add a new vma
20:00 chikuwad[d]: MSMR running at ~80fps on NVK, high preset, 1080p native
20:00 chikuwad[d]: love to see it
20:00 _lyude[d]: Yeah - we do need to make sure that access is synchronized though regardless
20:01 _lyude[d]: I will finish up the patches I've got so far and add some extra spots that seem to be missing the right locks
22:13 karolherbst[d]: that shader with all the loads with my UGPR + GPR and range analysis + more nir_opt_offset changes:
22:13 karolherbst[d]: Instruction count: 846 -> 707
22:13 karolherbst[d]: Static cycle count: 3398 -> 2922
22:13 karolherbst[d]: Max warps/SM: 28
22:13 karolherbst[d]: Spills to mem: 0
22:13 karolherbst[d]: Spills to reg: 0
22:13 karolherbst[d]: Fills from mem: 0
22:13 karolherbst[d]: Fills from reg: 0
22:13 karolherbst[d]: Num GPRs: 72
22:13 karolherbst[d]: SLM size: 0
22:14 karolherbst[d]: this one: https://gist.githubusercontent.com/karolherbst/e4a00bd3ae19523c3775e6709cce0f87/raw/bf43d792c56bdd8fbd2af542c87a57c94278f5d8/gistfile1.txt
22:15 karolherbst[d]: but I think it would be better to start landing the UGPR stuff after the release got branched out...
22:15 karolherbst[d]: I'd rather have more testing on that one..
22:36 airlied[d]: phomes_[d]: how big an fps diff was that MR?
22:38 phomes_[d]: `Atomic heart 1
22:38 phomes_[d]: Deep Rock Galactic 2
22:38 phomes_[d]: Lego Builders Journey 0
22:38 phomes_[d]: DiRT Rally 2.0 0
22:38 phomes_[d]: Recipe for disaster 6
22:38 phomes_[d]: Subnautica 3
22:38 phomes_[d]: Urban Trial Playground 2
22:38 phomes_[d]: X-com 2 1
22:38 phomes_[d]: Beyond a steel sky 3
22:38 phomes_[d]: Serious Sam 2017 13
22:38 phomes_[d]: The Surge 2 0
22:38 phomes_[d]: X4 Foundations 0`
22:39 phomes_[d]: increase in fps
23:15 airlied[d]: nice
23:37 karolherbst[d]: phomes_[d]: so what did make the surge not regress anymore? 🙃
23:37 karolherbst[d]: a rebase?
23:38 karolherbst[d]: ehh wait
23:38 karolherbst[d]: that was the other MR...
23:38 phomes_[d]: Yes that was a different one
23:40 karolherbst[d]: accidental perf improvement
23:40 karolherbst[d]: wanna test something experimental?
23:40 karolherbst[d]: 😄
23:41 phomes_[d]: Yes
23:41 karolherbst[d]: https://gitlab.freedesktop.org/karolherbst/mesa/-/tree/nak/opt/ugpr_range
23:42 dwfreed: karolherbst: asrivats matched this ban: *!*@38.0.0.0/8
23:42 karolherbst[d]: I'm sure that will help a couple of things
23:42 dwfreed: /8 seems a little wide
23:42 karolherbst: uhh.. right
23:42 karolherbst: I wanted to remove that one
23:42 karolherbst: done
23:44 karolherbst[d]: _but_ I havne't properly tested that branch after the rebasing.. but the CTS _seems_ happy enough now