01:51tiredchiku[d]: at what point do we start having game-specific workarounds
06:41phomes_[d]: there are already some workarounds. See nvk_instance.c
06:41phomes_[d]: also https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29892 for a few more
06:54asdqueerfromeu[d]: phomes_[d]: Why weren't all of RADV's zero VRAM workarounds moved over to common driconf?
08:02phomes_[d]: I did not move the section of zink-on-radv cases for a few reasons. I wanted to test them on nvk+zink first, and I also think they should probably go in the shared driver=zink section instead of the individual vulkan drivers. Lastly I was worried about CSGO, because it is now vulkan native: if the setting is only needed for radv+zink, it would still match on all vulkan drivers even where it isn't needed.
08:11phomes_[d]: driconf settings are a bit of a mess. We can match on driver, engine and executable, but we also namespace settings per driver and sometimes per graphics API
08:12phomes_[d]: the entire thing could use a big cleanup but I was just trying to fix some game issues 🙂
09:57tiredchiku[d]: drivers have had workarounds/fixes for games for ages
10:04notthatclippy[d]: Why is this a userspace thing even? Shouldn't the kernel be handing out already zeroed vram pages?
10:09pendingchaos: radv has a separate driconf because steamos updates radv separately from radeonsi: https://gitlab.freedesktop.org/mesa/mesa/-/commit/53ca85ac2a1acf1476c4b494f5fdfa2cc39c644c
10:09pendingchaos: so that the radv and radeonsi workarounds can be updated without affecting the other
10:10notthatclippy[d]: Oh, wait, dejavu. https://discord.com/channels/1033216351990456371/1034184951790305330/1262525227448139998 / https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2024-07-15
11:26DodoGTA: pendingchaos: Maybe Mesa should have its own driconf splitter (so that other Vulkan drivers like NVK could benefit from independence)?
11:26DodoGTA: Also is your radv_fossils repository only for personal use (or are there some trusted people that have access)?
13:56pendingchaos: DodoGTA: radv_fossils is shared with some other radv/aco devs
14:32gfxstrand[d]: phomes_[d]: What's currently holding that up? Are the RADV folks still nervous or is it just waiting on me?
14:34gfxstrand[d]: notthatclippy[d]: Arguably, yes. Or at least it should be zeroing pages that don't come from our process. But it's not. 😫
14:40karolherbst[d]: I thought there was an MR to move those into common places?
14:40notthatclippy[d]: Per the linked chat, it's a perf issue if done synchronously. nvidia.ko uses a Copy Engine on the GPU to do it async, but plugging that into the common kernel code is... pain.
14:40karolherbst[d]: oh it's the one linked here
14:40karolherbst[d]: yeah...
14:40karolherbst[d]: sooo
14:41karolherbst[d]: the problem on linux, and specifically with nouveau, is that it's using TTM
14:41karolherbst[d]: and nobody wants to add this to ttm afaik
14:41karolherbst[d]: you could start getting really smart with buckets of unused VRAM and the like, but I don't think any infra like that exists
14:42karolherbst[d]: drivers could e.g. always async clear after freeing memory, but like...
14:43phomes_[d]: I split the patches in a way where we first introduce the shared DRI_CONF_VK_ZERO_VRAM setting and then use it in NVK for the game issues I have observed. We can drop the last two patches if RADV would rather not use the shared setting
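(For illustration only, and not the actual MR code: a minimal sketch of what honoring a per-app zero-VRAM driconf knob can look like at allocation time. The `device`/`bo` types and the `bo_alloc_vram`/`bo_clear_async` helpers are hypothetical.)

```c
#include <stdbool.h>
#include <stdint.h>
#include <vulkan/vulkan.h>

/* All names below are hypothetical, for illustration only. */
struct bo;
struct device {
   bool app_zero_vram;   /* filled in from the per-application driconf option */
};

VkResult bo_alloc_vram(struct device *dev, uint64_t size, struct bo **out);
void bo_clear_async(struct device *dev, struct bo *bo, uint32_t value, uint64_t size);

VkResult
alloc_device_memory(struct device *dev, uint64_t size, struct bo **bo_out)
{
   struct bo *bo;
   VkResult result = bo_alloc_vram(dev, size, &bo);
   if (result != VK_SUCCESS)
      return result;

   /* Clear the new allocation when the zero-VRAM workaround is enabled,
    * so the app never sees stale data from a previous owner. */
   if (dev->app_zero_vram)
      bo_clear_async(dev, bo, 0, size);

   *bo_out = bo;
   return VK_SUCCESS;
}
```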
14:44notthatclippy[d]: karolherbst[d]: Does mesa have a knob for this? It would be useful for sensitive apps like browsers to make sure they don't leak the image
14:44karolherbst[d]: well.. no
14:44karolherbst[d]: atm drivers leak everything and applications can just fetch the old data
14:45karolherbst[d]: maybe amdgpu has a kernel flag these days to clear pages globally?
14:46karolherbst[d]: yeah.. I don't think it exists
14:46karolherbst[d]: application can just ask to get zeroed memory
14:47notthatclippy[d]: That's different from asking that your memory be zeroed after you free it. Though I guess it could just manually zero it, but that requires more control over it all
14:47karolherbst[d]: yeah, sure
14:47triang3l[d]: notthatclippy[d]: If that was done on Mesa's side, something like a process shutdown would still leak the memory contents
14:47karolherbst[d]: yeah, it needs to be done in the kernel, but nobody has put in the work yet afaik
14:47notthatclippy[d]: Yeah...
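(For reference on the allocation-time side of that distinction: on amdgpu, "just ask for zeroed memory" can be done per-BO through libdrm with the VRAM_CLEARED flag. A rough sketch, assuming libdrm_amdgpu:)

```c
#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Allocate a VRAM BO and ask the kernel to hand it back already cleared.
 * This is the per-allocation form of "just ask for zeroed memory". */
int alloc_cleared_vram(amdgpu_device_handle dev, uint64_t size,
                       amdgpu_bo_handle *bo_out)
{
   struct amdgpu_bo_alloc_request req = {
      .alloc_size = size,
      .phys_alignment = 4096,
      .preferred_heap = AMDGPU_GEM_DOMAIN_VRAM,
      .flags = AMDGPU_GEM_CREATE_VRAM_CLEARED,
   };
   return amdgpu_bo_alloc(dev, &req, bo_out);
}
```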
15:53gfxstrand[d]: Yeah. i915 did but those were system RAM pages. For VRAM, there's a pile of tracking that TTM would need to add for which process or DRM file most recently used a page to know whether or not it actually needs to be cleared. Then you'd have to either zero with a WC map or zero with the copy engine, each of which is potentially problematic depending on `$stuff`.
19:01notthatclippy[d]: For whatever it's worth, the NV driver doesn't track the last owner process of a page. When freed, pages are added to a to-be-scrubbed list, where they're asynchronously zeroed by CE and then moved to the scrubbed heap. All allocations come from the scrubbed heap. Allocations block only if there's insufficient scrubbed memory and the to-be-scrubbed list has the necessary bits.
19:01notthatclippy[d]: AFAIK this bit never had any perf issues in any of the obvious places you'd expect them; it just never shows up in real world scenarios.
19:02notthatclippy[d]: (it had _other_ perf issues in the past, mostly related to initiating a scrub and handling completion inside a critical section, which starved out other internal bookkeeping, which in turn starved other things, etc)
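(A rough sketch of the scrub-list scheme described above, with hypothetical types and copy-engine helpers rather than anything from nvidia.ko or TTM:)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and copy-engine helpers, for illustration only. */
struct vram_page { struct vram_page *next; uint64_t addr; };

struct allocator {
   struct vram_page *to_scrub;    /* freed, still contains old data */
   struct vram_page *scrubbed;    /* zeroed, ready to hand out */
};

void ce_clear_page(uint64_t addr);   /* enqueue a clear on the copy engine */
void ce_wait_idle(void);             /* wait for pending clears to finish */

/* Freed pages are never handed out directly; they just join the
 * to-be-scrubbed list. */
void free_page(struct allocator *a, struct vram_page *p)
{
   p->next = a->to_scrub;
   a->to_scrub = p;
}

/* Normally run asynchronously (e.g. from a worker): submit clears to the
 * copy engine for everything on the to-be-scrubbed list and, once they have
 * completed, move the pages to the scrubbed pool.  The real design would
 * move each page on its individual completion; we batch here for brevity. */
void scrub_some(struct allocator *a)
{
   struct vram_page *batch = a->to_scrub;
   a->to_scrub = NULL;

   for (struct vram_page *p = batch; p; p = p->next)
      ce_clear_page(p->addr);
   ce_wait_idle();

   while (batch) {
      struct vram_page *p = batch;
      batch = p->next;
      p->next = a->scrubbed;
      a->scrubbed = p;
   }
}

/* Allocations come only from the scrubbed pool.  The only case that can
 * block is when the pool is empty and we must wait for the scrubber. */
struct vram_page *alloc_page(struct allocator *a)
{
   if (!a->scrubbed)
      scrub_some(a);

   struct vram_page *p = a->scrubbed;
   if (p)
      a->scrubbed = p->next;
   return p;
}
```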
19:03gfxstrand[d]: That doesn't seem like a bad solution. We'd just need to implement something like it in TTM.
19:03gfxstrand[d]: If there are perf problems, those can likely be solved in userspace by caching and sub-allocating so as not to thrash the scrub pile.
19:03karolherbst[d]: notthatclippy[d]: it could be a bit smarter and keep a small reserve bucket of memory for applications that free it, but yeah... if that's good enough, that's good to know
19:04karolherbst[d]: gfxstrand[d]: oh yeah.. good idea
19:04notthatclippy[d]: Oh, mesa doesn't do that already?
19:04karolherbst[d]: some drivers do
19:04gfxstrand[d]: Most mesa GL drivers have a cache
19:04notthatclippy[d]: All of the NV userspace components are pretty heavy on suballocating.
19:04gfxstrand[d]: sub-allocating, less so, but caching is usually a thing
19:05karolherbst[d]: caching == reusing memory instead of giving it back to the kernel
19:05gfxstrand[d]: Vulkan drivers leave it to the client to sub-allocate
19:06karolherbst[d]: so some drivers just keep a bucket of memory allocations before actually freeing memory, so they can quickly reuse those
19:06karolherbst[d]: and yeah.. I guess we could just tell drivers to deal with it in userspace
19:06karolherbst[d]: (the less needs to be done in ttm, the better)
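(A minimal sketch of that kind of userspace cache: power-of-two size buckets, reuse before going back to the kernel. The `bo` type and the `kernel_alloc_bo`/`kernel_free_bo` entry points are hypothetical.)

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical BO type and kernel entry points, for illustration only. */
struct bo { struct bo *next; uint64_t size; };
struct bo *kernel_alloc_bo(uint64_t size);
void kernel_free_bo(struct bo *bo);

#define NUM_BUCKETS 32
static struct bo *cache[NUM_BUCKETS];   /* one free-list per power-of-two size */

static unsigned bucket_for(uint64_t size)
{
   unsigned b = 0;
   while ((1ull << b) < size && b < NUM_BUCKETS - 1)
      b++;
   return b;
}

/* Reuse a cached BO of the right size class if we have one, otherwise fall
 * back to a fresh kernel allocation. */
struct bo *bo_alloc_cached(uint64_t size)
{
   unsigned b = bucket_for(size);
   if (cache[b]) {
      struct bo *bo = cache[b];
      cache[b] = bo->next;
      return bo;
   }
   return kernel_alloc_bo(1ull << b);
}

/* Instead of returning the BO to the kernel, park it in the cache so the
 * next allocation of that size class is cheap. */
void bo_free_cached(struct bo *bo)
{
   unsigned b = bucket_for(bo->size);
   bo->next = cache[b];
   cache[b] = bo;
}

/* Drop everything back to the kernel (e.g. under memory pressure, or on
 * context teardown). */
void bo_cache_trim(void)
{
   for (unsigned b = 0; b < NUM_BUCKETS; b++) {
      while (cache[b]) {
         struct bo *bo = cache[b];
         cache[b] = bo->next;
         kernel_free_bo(bo);
      }
   }
}
```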
19:07karolherbst[d]: mhhhh
19:07karolherbst[d]: it's kinda sad we don't have a strong "there is high memory pressure, please free your stuff" thing going on in linux
19:07karolherbst[d]: this could e.g. force mesa drivers to clear those caches
19:09asdqueerfromeu[d]: karolherbst[d]: Does Windows do that?
19:09karolherbst[d]: not sure, but macos sure does
19:09notthatclippy[d]: I assume there's no unified multithreading model for the mesa drivers? So events delivered to a side thread may or may not be able to do anything useful?
19:11gfxstrand[d]: No there isn't
19:13notthatclippy[d]: Well, that would be quite the effort to implement the thing Karol suggested then.
19:14notthatclippy[d]: On the other hand, it would be relatively easy for us to do in the proprietary stack, but I don't think it ever came up. Not that we're a gold standard of driver quality or anything :D
19:18gfxstrand[d]: Vulkan does have a feedback mechanism where clients can query their current budget and adjust
19:18gfxstrand[d]: But there isn't a signal
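(That's the VK_EXT_memory_budget query; roughly, a client polls it and trims its own caches as usage approaches the budget:)

```c
#include <stdio.h>
#include <vulkan/vulkan.h>

/* Query the current per-heap budget/usage (requires VK_EXT_memory_budget). */
void print_memory_budget(VkPhysicalDevice pdev)
{
   VkPhysicalDeviceMemoryBudgetPropertiesEXT budget = {
      .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_BUDGET_PROPERTIES_EXT,
   };
   VkPhysicalDeviceMemoryProperties2 props = {
      .sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MEMORY_PROPERTIES_2,
      .pNext = &budget,
   };
   vkGetPhysicalDeviceMemoryProperties2(pdev, &props);

   for (uint32_t i = 0; i < props.memoryProperties.memoryHeapCount; i++) {
      printf("heap %u: usage %llu / budget %llu bytes\n", i,
             (unsigned long long)budget.heapUsage[i],
             (unsigned long long)budget.heapBudget[i]);
   }
}
```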
19:20notthatclippy[d]: If the app is supposed to handle the caching and release, then it can also implement an async polling thing, but I wouldn't rely on (m)any apps actually doing that. Unless it's part of the driver, it's not gonna be any use.
19:21gfxstrand[d]: For the GL drivers with caches, they often use memadvise to tell the kernel it's safe to throw unused bits of their cache away in high-pressure scenarios.
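(On i915, for example, that's the GEM madvise ioctl: a cached-but-idle BO is marked DONTNEED so the kernel may purge it under pressure, then marked WILLNEED again, checking `retained`, before reuse. A minimal sketch:)

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Mark a cached, currently unused BO as purgeable.  The kernel may now
 * reclaim its pages when memory is tight. */
static void bo_mark_purgeable(int fd, uint32_t handle)
{
   struct drm_i915_gem_madvise madv = {
      .handle = handle,
      .madv = I915_MADV_DONTNEED,
   };
   ioctl(fd, DRM_IOCTL_I915_GEM_MADVISE, &madv);
}

/* Before reusing a BO from the cache, mark it needed again.  If 'retained'
 * comes back 0, the kernel already purged it and a new BO must be allocated. */
static bool bo_mark_needed(int fd, uint32_t handle)
{
   struct drm_i915_gem_madvise madv = {
      .handle = handle,
      .madv = I915_MADV_WILLNEED,
   };
   if (ioctl(fd, DRM_IOCTL_I915_GEM_MADVISE, &madv) != 0)
      return false;
   return madv.retained;
}
```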
19:21karolherbst[d]: could also be a special return value from submit ioctls
19:22karolherbst[d]: and then the driver just checks and does a thing if there is a "please clear caches" thing set
19:22gfxstrand[d]: karolherbst[d]: Please, no
19:23karolherbst[d]: you mean making the hot path even more expensive ain't a great idea?
19:23notthatclippy[d]: It also doesn't work for a case where an app has some allocations but then isn't doing anything actively.
19:23karolherbst[d]: right...
19:24karolherbst[d]: though the new signal handler API could actually allow for some things I guess...
19:24karolherbst[d]: but anyway, that's just a nice to have
19:24gfxstrand[d]: I mean
19:24gfxstrand[d]: a) Why would returning it from an ioctl be the right place to get that information? Wouldn't you want an app that isn't actively submitting to trim down?
19:24gfxstrand[d]: b) The kernel has no way of knowing whether or not the cache is trimmed so it'll just spam that to all clients whenever things are tight at which point it may as well be a query.
19:24gfxstrand[d]: c) Very VK_SUBOPTIMAL_KHR vibes...
19:25karolherbst[d]: right
19:25karolherbst[d]: on macos it's solved on a toolkit level
19:25karolherbst[d]: sadly we don't have a layer like that in linux really
19:25gfxstrand[d]: There has been some discussion of trying to bolt something on to the memory priority extension.
19:26gfxstrand[d]: But realistically, this is why the kernel evicts stuff to system RAM and/or disk.
19:26karolherbst[d]: right
19:26karolherbst[d]: GPUs also have enough VRAM so it doesn't really matter all that much anymore
19:27gfxstrand[d]: *laughs in Chrome tabs*
19:27karolherbst[d]: ah yeah...
19:27karolherbst[d]: but that's like RAM :blobcatnotlikethis:
19:27karolherbst[d]: firefox is the same tho
19:28gfxstrand[d]: Yeah, but we have a real problem right now for Chrome on not-high-memory machines with trying to use Vulkan and/or Zink since there's currently no memadvise equivalent.
19:28karolherbst[d]: I see
19:28gfxstrand[d]: I've been chatting on/off with Google people about it for a while but it has yet to trickle to the top of the priority list.
19:28karolherbst[d]: yeah, not saying I wouldn't love to see things like that happening, it's just... yeah...
19:29gfxstrand[d]: Also, designing something like that for Vulkan is a lot more annoying than for GL, unfortunately.
19:29karolherbst[d]: people have other priorities for an issue which doesn't impact developers all that much with their beefy machines
19:29karolherbst[d]: like I have 64GiB RAM + 70GiB swap 🙃 for reasons (tm)
19:30gfxstrand[d]: hehe. Yeah...
19:33redsheep[d]: karolherbst[d]: I'd say it's recently become less true that GPU VRAM is ahead of requirements. 8 GB cards continue to be sold, and at least for games that's getting tighter and tighter all the time.
19:34redsheep[d]: Things felt pretty comfy a few years ago, but I've seen some games start to show signs of memory pressure at really extreme settings even with 24 GB of VRAM, let alone 8
19:35redsheep[d]: Usually you have to put the resolution at 8k to make that happen with 24 GB, but it can happen
19:36karolherbst[d]: yeah, but mesa ditching a few MiB of cached allocations here and there won't make much of a difference with heavy gaming
19:36karolherbst[d]: sure it helps a bit
19:37redsheep[d]: Yeah, fair
19:37karolherbst[d]: I mean, it's a good idea to have those things in place, but it's more of an issue if you are low on VRAM anyway (like your GPU has 2GiB of VRAM) and you have 10 GUI apps each caching like 50MiB of VRAM or so
20:39zmike[d]: it's my vram
20:39zmike[d]: you can have it over my cold, crashed driver-hands