IRC Logs of #dri-devel on irc.freenode.net for 2025-05-13

07:08 daniels: rburton: afraid not
09:27 mlankhorst: airlied: if still awake, why is evicted memory not accounted?
09:39 mlankhorst: seems at least inconsistent how TTM_PL_FLAG_MEMCG is cleared, since amdgpu_bo_placement_from_domain can set multiple placements with MEMCG flag set
09:49 mlankhorst: I rebased anyway, but need to test on something other than eviction :)
09:49 mlankhorst: https://gitlab.freedesktop.org/mlankhorst/kernel/-/commits/ttm-memcg-nouveau
10:01 rburton: so today i discovered ninjatrace and the first package i tried it on was mesa. looks like src/intel/perf/libintel_perf.a took over a minute to link...
10:02 psykose: when i tried it there was some intel file that took 60s to build, nothing took that long to link though
10:02 psykose: maybe ld.bfd takes that long
10:03 rburton: the target is src/intel/perf/libintel_perf.a.p/meson-generated_.._intel_perf_metrics.c.o so actually maybe that's the compile
10:03 kode54: I mean, it may take that long to link if you're enabling LTO
10:04 psykose: that is the compile yeah, same one
10:04 rburton: pretty obvious outlier https://usercontent.irccloud-cdn.com/file/WlJ2JtHo/Screenshot%202025-05-13%20at%2011.04.18.png
10:36 daniels: rburton: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12544
10:53 rburton: daniels: see i knew you were useful for something
11:09 kode54: lolol component of ACO taking 52 seconds
11:09 phasta: ever since the URL change my dim setup just seems broken. I now cloned a new instance, but something still seems broken. When updating branches, it asks me for names for the new remotes. https://paste.debian.net/1374457/
11:09 kode54: but I guess that's better than just building the whole of LLVM
11:09 phasta: Do I have to delete the old remotes first?
13:08 jani: phasta: add this to your .ssh/config:
13:08 jani: Host gitlab.freedesktop.org
13:08 jani: Hostname ssh.gitlab.freedesktop.org
13:09 jani: after that, the updates should work(tm)
13:09 jani: it's a bit of a chicken and egg situation, and we didn't get a heads up
13:10 phasta: jani, thx!
13:11 phasta: There's nothing more annoying than having to repair your screw driver while you actually have to drive a lot of screws…
13:12 jani: yeah...
13:12 jani: btw see also https://gitlab.freedesktop.org/drm/maintainer-tools/-/issues/20. it's not perfect
13:14 jani: phasta: always https://www.youtube.com/watch?v=5W4NFcamRhM
13:25 phasta: changing an ssh link is scientific proof for the butterfly effect ;)
13:27 kode54: have to change the ssh link because the web link is a CDN that's behind SNI and does host sharing and therefore can't possibly do SSH forwarding
13:36 phasta: What is the "rerere" cache?
13:39 ukleinek: phasta: that's where git stores merge conflict resolutions
13:40 ukleinek: it stands for "REuse REcorded REsolution"
13:47 phasta: https://paste.debian.net/1374477/ I should have stayed in bed today
13:50 phasta: hm, one also needs to reset rerere in the linux tree. Ah well.
14:47 karolherbst: it goes without saying, but if you see people instigating or pirating/leaking specifications, please let us (CoC team) know. Can't have anybody do that on our platforms...
15:38 Lynne: reacting in this way and explicitly pointing it out; I'd assume you've just seen the jpeg2000 specifications
15:39 eric_engestrom: mesa devs: please write any (future) news-worthy note about your work that will be in 25.2.0 in this issue: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13155
15:39 eric_engestrom: (and sorry for taking so long to put something like this in place...)
15:52 alyssa: karolherbst: can I pirate the vulkan spec
15:53 alyssa: pirates 🤝 dragon
15:53 ccr: :)
15:53 alyssa: eric_engestrom: huge +1 for doing that on the issue tracker instead of in-tree, docs/relnotes.txt are the silliest kind of merge conflicts there are and this should solve that nicely
15:54 alyssa: thank you!
15:54 alyssa: (I just categorically do not use docs/relnotes or docs/features anymore largely because the conflicts are silly and we have vulkaninfo for that)
15:55 eric_engestrom: ❤️
15:57 eric_engestrom: and yeah, putting that in a text file was untenable, having a folder with a bunch of files would solve that but feels over-engineered; this is a much simpler solution and it might get noisy (we'll see how people use it), but with the structure I suggested I think it should work well
15:59 alyssa: nod
15:59 alyssa: meanwhile apparently it's "uprev CTS for unrelated reason and now i have a bunch of new test fails to fix" day
15:59 alyssa: yahoo
16:16 alyssa: second CTS bug of the day, yippee
16:27 karolherbst: Lynne: the jpeg2000 specification?
16:30 alyssa: how is 'dEQP-VK.pipeline.monolithic.input_attribute_offset.vec2.offset_*.padded.*.*' failing, that wasn't even touched
16:35 alyssa: rg3igalia: any idea?
16:35 alyssa: the padded/packed ones are failing, overlapping is not
16:35 alyssa: that whole file was last touched in 2023 so I'm.. confused
16:36 alyssa: I'm kinda suspicious of, like, UB in the CTS revealed by bumping to gcc-15 or something? IDK
16:44 alyssa: is that failing for anyone else that bumped CTS + gcc-15 recently?
16:46 Lynne: karolherbst: renowned to be the worst written specifications in the world. if it were the script to a hit TV show, the fanbase would trigger riots
16:47 robclark: alyssa: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120089 ?
16:47 alyssa: robclark: gah, thanks.
16:48 robclark: I don't remember offhand _which_ tests were broken.. but it was related to gcc 15 so I guess the same
16:48 alyssa: yeah it's these
16:48 alyssa: can I build cts with clang?
16:49 robclark: I would expect so.. goog/android kinda likes clang
16:49 alyssa: i kinda do too but feel vaguely guilty about it
16:49 cwabbott: just set -O2
16:50 robclark: or that, yeah
16:51 alyssa: maybe there's something to this whole "run debian and let other people hit all the bugs first" thing
16:51 alyssa: :P
16:51 cwabbott: I definitely felt like the guinea pig when I hit that
16:52 cwabbott: upgraded from f40 to f42 to get vulkan 1.4 headers, and immediately everything is on fire
16:52 alyssa: yeah...
16:53 ccr: insert "this is fine" meme here?
16:55 cwabbott: also I learned from this that CMake uses -O3 by default for Release and -O2 for RelWithDebInfo
16:55 cwabbott: a great CMake moment
16:56 daniels: takes me back tbh https://usercontent.irccloud-cdn.com/file/ebtlzmxk/1747155360.JPG
16:59 zmike: hahah
17:01 ccr: it was egcs all along
17:07 mdnavare_: Thanks airlied , I was able to use the igt tools dpcd read, that uses the dev/drm_dp_aux0
19:35 airlied: mlankhorst: evicted isn't accounted as we can't agree on how that should work, so we've punted on making a decision
19:36 airlied: the problem with accounting eviction is, what do you do if the owner of the object has no space left in their cgroup?
19:55 sima: memcg aware shrinker
19:55 sima: which is terrible
19:56 airlied: sima: that is the concept, but it still doesn't really solve the wtf do we do part of the problem
19:57 airlied: like we only evict the clients that have space?
19:58 airlied: though memcg aware ttm pools is already giving me the run away vibe
20:04 sima: airlied, I think we'd evict as usual, but try to shrink in system memory to keep them within limits
20:05 sima: and if that fails, kill them
20:05 sima: blame sysadmin for setting inconsistent limits
20:05 sima: I guess could instead skip to another client, but oh dear does that feel like better running shoes are needed
20:07 airlied: I think I'm in the build what we can, and see if the solutions fall out
20:08 sima: yeah same
20:31 mlankhorst: airlied: it can force it into the cgroup and go over limit then kill
20:32 mlankhorst: airlied: I'm having troubles making the code count a non-zero GPU cgroup limit at all though for the normal case
20:37 airlied: mlankhorst: but killing a misc process due to another process just allocating VRAM seems wrong
20:38 airlied: since we don't want to double account VRAM allocation in main memory as well "just in case"
20:38 mlankhorst: memory.max on the other hand will first set the
20:38 mlankhorst: limit to prevent new charges, and then reclaim and OOM kill until the
20:38 mlankhorst: new limit is met - or the task writing to memory.max is killed.
20:39 airlied: the problem is eviction doesn't happen in the process context of the process that owns the evicted object
20:39 airlied: there is no way to return an ENOMEM to that process to let it handle things
20:39 mlankhorst: Yeah that's the fun part
20:39 airlied: so we could go over the limit, and the next time it calls malloc it will blow up
20:40 mlankhorst: If we go over max (hard limit), you die
20:40 airlied: but if the first thing it does is push the object back into VRAM then the eviction shouldn't have killed it
20:40 mlankhorst: if we go over high (soft limit), new allocations fail in the other process
20:40 airlied: since the eviction was only temporary over commit caused by another process
20:40 mlankhorst: Which it should be able to deal with, or the limit should not have been set
20:41 airlied: like you could have 4GB of VRAM allocated, it seems wrong to cause you to die because another process allocates 1GB
20:41 airlied: now maybe we can solve it via dmem limits
20:41 mlankhorst: That's entirely doable
20:41 airlied: so that you can't overcommit VRAM to cause evictions
20:41 mlankhorst: Exactly :)
20:41 airlied: but I still have trouble just saying evictions should definitely blow up the original process
20:42 airlied: hence why evictions are not being focused on yet :-)
20:42 mlankhorst: They definitely could, but that's why it's recommended to set memory.high, not memory.max
20:42 airlied: the initial series is just to get accounting, so we can know which cgroup is taking all the ram, I haven't really gotten to failing allocs
20:43 mlankhorst: Ooo memcg aware shrinker
20:43 mlankhorst: You could re-use the dmem one probably, perhaps we should just cgroup to unify it
20:43 airlied: I haven't worked out how to use dmem properly yet either :-)
20:44 airlied: I only see a toplevel dmem in /sys/fs/cgroups
20:45 mlankhorst: mkdir igt; echo +dmem > cgroup.subtree_control; cd igt; echo $$ > cgroup.procs; ./xe_evict
20:47 mlankhorst: and watch -n.1 'grep gpu memory.stat; cat dmem.current' somewhere for pretty numbers
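
The numbers that the `watch` command above prints from dmem.current can also be read programmatically. A minimal Python sketch, assuming dmem.current holds one `<region> <bytes>` pair per line as in the kernel's cgroup v2 documentation; the region names and values in `sample` are illustrative, not taken from this log, and `parse_dmem` is a hypothetical helper name:

```python
def parse_dmem(text):
    """Parse dmem.current-style text into {region: bytes}.

    Assumes one "<region> <value>" pair per line; the literal value
    "max" (used in limit files such as dmem.max) maps to None.
    """
    usage = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        region, value = parts
        usage[region] = None if value == "max" else int(value)
    return usage


# Illustrative sample; real region names depend on the driver and device.
sample = "drm/0000:03:00.0/vram0 12550144\ndrm/0000:03:00.0/stolen 8650752\n"
usage = parse_dmem(sample)
for region, current in usage.items():
    print(region, current)
```

In practice the text would come from reading `dmem.current` inside the cgroup directory created above (the `igt` directory in mlankhorst's example).
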
20:55 airlied: ah nice, I should probably write some igts to play around with this a bit more
20:59 sima: airlied, imo the other processes should be sufficiently vram limited if you have system memory limits that are too tight
20:59 sima: like if you set stupid limits, you get to keep the pieces
20:59 sima: I think the only legit use-case would be if you've marked the surplus vram allocations as purgeable
21:00 sima: but in that case we'd just ditch them instead of moving into system memory
21:00 sima: like if you have 2 cgroups A and B and swapping out dmem of A into system memory would push it over memcg limits
21:01 sima: then you need to limit dmem of B to make sure that doesn't happen
21:01 sima: if you don't, it's just a silly misconfig imo
21:03 airlied: sima: yeah but Christian disagreed with that recently, but I think more in the case where we kill B even if it doesn't have a dmem limit because we can't satisfy its VRAM allocation due to the system configuration
21:04 sima: yeah I think in that case we need to shoot A first and release all its memory
21:04 sima: which currently we can't
21:04 sima: otoh if B needs more dmem and you didn't configure your limits to make sure that's possible
21:04 sima: again seems like silly misconfig to me?
21:05 sima: essentially "no limit" = "you get what's left"
21:06 sima: and with the dmem/memcg split you need to think a bit harder to make sure it all works out
21:08 sima: it also depends how we implement limits, but the simplest essentially means that the upper hard limit for all other groups is the minimal amount of dmem you will be left with
21:08 sima: everything above needs to have room for in memcg or it can blow up
21:09 sima: e.g. with 10gb dmem, A and B both have 6gb max you need 2gb of memcg headroom for each for swapout or it could fail in funny ways
21:10 sima: or 2gb of purgeable memory that doesn't need memcg for eviction
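
sima's arithmetic above can be checked with a small Python sketch. It assumes the simplest model described in the discussion: each cgroup is only guaranteed the dmem left over once every other group has used its full hard limit, and anything it holds beyond that may be evicted to system memory and needs memcg headroom (or must be purgeable). The function name and units (GB) are hypothetical, chosen for illustration only:

```python
def swapout_headroom(total_dmem, dmem_max):
    """Worst-case memcg headroom each cgroup needs for evicted dmem.

    total_dmem: device memory available in total.
    dmem_max:   {cgroup name: hard dmem limit}, same unit as total_dmem.
    """
    headroom = {}
    for name, own_max in dmem_max.items():
        others = sum(limit for other, limit in dmem_max.items() if other != name)
        # What is left for this group if all other groups hit their limits.
        guaranteed = max(0, total_dmem - others)
        # Anything allowed above the guaranteed amount may be swapped out.
        headroom[name] = max(0, own_max - guaranteed)
    return headroom


# 10 GB of dmem, A and B both limited to 6 GB.
result = swapout_headroom(10, {"A": 6, "B": 6})
print(result)
```

With 10 GB of dmem and 6 GB limits for both A and B, each group is guaranteed only 4 GB, so each needs 2 GB of memcg headroom, matching the numbers in the discussion. If the limits sum to no more than the total (for example 4 GB each), the required headroom is zero.
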
21:12 mlankhorst: Dying is also the extreme case, it could be some memory that is madvised could be purged for example, or filesystem caches written
21:13 sima: yeah, we should at least trigger the other memcg shrinking if we don't have a ttm memcg-aware shrinker yet
21:13 sima: no idea how that works though
21:35 mlankhorst: the flag should be the other way around then, MEMCG_FORCE for eviction
23:04 zzoon: Lynne: have you checked https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34866 ?
23:53 Pie-jacker875: hey I would like to make a bug report but I'm a bit of a noob and am not sure how to get some of the information. steamvr_room_setup crashes and I'm not sure what would be useful to include. It's a unity application, so I have the player.log file for one.