00:39daniels: mareko: ask ajax and MrCooper, but I think they only really need llvmpipe and spice
00:43daniels: jenatali: thank you!
00:44jenatali: I'd been putting it off. I really hate building LLVM
00:45daniels: me too buddy
00:59jenatali: Apparently the Vulkan runtime no longer installs unattended with /S but now the SDK includes it?
01:54zmike: tarceri: actually I assigned for you to make sure it goes in since it's blocking another MR from landing
02:15mareko: MrCooper: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33211/diffs?commit_id=70398ff5140891899927590c46d27ef8c48c6898
02:45jenatali: Ugh how do I see which LLVM module is needed but missing?
02:49airlied: usually grep
07:45MrCooper: mareko: technically I'm in a different team now (focus on mutter & Xwayland), as is ajax, so you rather need to ask airlied or José Exposito; AFAIR we do support amdgpu with acceleration on ppc64el in RHEL in principle though, so not having any CI coverage isn't great
08:45sima: dakr, good mail, thanks for doing the wrestling
08:46sima: also chatted with airlied and we're at 15+ years of dma-api maintainers randomly nacking stuff gpu drivers want/need
10:57sima: DemiMarie, on the amd/virtio discussion, all these issues you point out are why I think there's either pup(FOLL_LONGTERM) or real hw support so that the iommu/gpu handles page faults/invalidations at the hw level
10:57sima: and the mmu notifiers just pass tlb flush commands forward as needed
10:57sima: anything else indeed just falls apart everywhere at the seams
13:25zmike: mareko: is LINEAR really not supported for RGBA32F formats?
13:28zmike: cuz it seems to work...
14:56DemiMarie: sima: In this case pup(FOLL_LONGTERM) is even more attractive because device memory is just virtual memory.
14:57DemiMarie: sima: Can the forced migration to device memory be done reliably?
14:58DemiMarie: Also, time to bypass the DMA API maintainers and send something directly to Linus?
15:00phasta: You should think long-term. Are fixes and reworks then also going to be sent directly to him 3 years down the road?
15:02sima: DemiMarie, I didn't really follow that part since it was about virtio specific things
15:03sima: the kernel really can't, because if you do this like hmm you again need hw support for pagefaults
15:03sima: plus hmm cannot guarantee migration to device memory
15:04DemiMarie: sima: the idea I had is to move the pages to device memory and leave them there
15:05sima: anon memory probably freaks out to no end if it's suddenly device memory without a struct page
15:05DemiMarie: If you don't have HW support for pagefaults then it's up to the host kernel to fail the operation.
15:05DemiMarie: What about device memory with a struct page?
15:05sima: you could do it as coherent device memory, then anon memory works in your device memory (unlike device private memory that hmm uses)
15:06sima: but you're again stuck on the core mm's inability to guarantee migration
15:06sima: migration is all best effort
15:06DemiMarie: stop_machine()? Only half joking.
15:06sima: not enough
15:07DemiMarie: Why can't migration be reliable?
15:07sima: linux core mm does a lot of randomly grabbing a page/folio reference, and those all block migration
15:07sima: with enough whacking it mostly works for stuff like cma or memory hotunplug with zone_movable, but it's brittle
15:08DemiMarie: What about make_device_exclusive_range() or similar, but without the exclusive part?
15:08sima: pup(FOLL_LONGTERM) is one of the pieces to make it less brittle, so that you know whether an elevated refcount is temporary and more retrying should help
15:08sima: or a permanent pin, and more retrying is only going to heat the world
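
The distinction sima draws here (a temporary reference is worth retrying, a long-term pin is not) can be sketched as a toy model. This is only an illustration: `Page`, `try_migrate`, and the `wait` callback are hypothetical names invented for this sketch, not kernel APIs, and real migration is far more involved.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Toy stand-in for a page; not a kernel structure."""
    temp_refs: int = 0      # short-lived references, expected to drop soon
    longterm_pins: int = 0  # pins taken with a FOLL_LONGTERM-like flag


def try_migrate(page, max_attempts=3, wait=None):
    """Best-effort migration. Returns 'migrated', 'pinned', or 'busy'."""
    for _ in range(max_attempts):
        if page.longterm_pins > 0:
            # Permanent pin: more retrying would only burn CPU.
            return "pinned"
        if page.temp_refs == 0:
            # Nobody holds a reference, so the move can go ahead.
            return "migrated"
        # Elevated refcount that is believed to be temporary: give the
        # holders a chance to drop it, then look again.
        if wait is not None:
            wait(page)
    # Still referenced after all attempts: migration stays best effort.
    return "busy"
```

Without the separate pin count the caller could not tell "pinned" from "busy", which is the brittleness being described.
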
15:08sima: DemiMarie, that doesn't move anything
15:09heat:the world
15:10sima: DemiMarie, I guess you could try with coherent device memory and just migrating really, really hard
15:10sima: then you're at the same peril as cma or memory hotunplug
15:10DemiMarie: sima: could there be a way to lock out anyone who tries to grab a reference?
15:10sima: but for perf critical stuff like hmm migration it's fundamentally fallible
15:11sima: DemiMarie, disable all the cool features like transparent hugepages
15:11sima: numa load balancing
15:11sima: ksm
15:11sima: writeback too iirc
15:11sima: constantly more getting added
15:11sima: defo direct i/o
15:11DemiMarie: sima: I meant "grab a mutex so they block"
15:12sima: no
15:12sima: DemiMarie, https://chaos.social/@sima/113911739075079093
15:12heat: in theory you could do that but you'd create "heating the world" on the opposite, refgrabbing direction
15:12DemiMarie: Why is that?
15:13sima: see link but tldr is the linux core mm is designed on the principle that quicksand is awesome
15:13heat: because if there was a refcount lock-out you'd spin on folio_get
15:13heat: because there isn't, you spin on page migration (or fail)
15:14heat: it's way easier to fail page migration than failing a normal-ass refcount
15:14sima: it's also that core mm is lockless to the max
15:14DemiMarie: For performance reasons?
15:14heat: yes
15:14sima: so even if you hold a reference and the lock for something, it's really surprising how little guarantees that often gives you
15:15sima: like the entire pte walking is just pure yolo, and it happens absolutely everywhere all the time
15:15DemiMarie: Why does it not crash? RCU?
15:15heat: hey it's not pure yolo it's homebred RCU
15:15sima: some of the best people in the world banging their heads at it for decades
15:16heat: gup_fast generally just disables interrupts and doesn't use RCU
15:16sima: heat, oh yeah it's a work of art
15:16heat: to free a page table you need to do a TLB shootdown thus IPI thus if your IRQs are disabled it's safe to traverse
15:16heat: it is in effect homebred RCU
15:17sima: there's also so much fun due to locking inversions
15:17sima: where you lookup a thing, grab the locks and then recheck whether you got the right one
15:17sima: and there's fundamentally no way to just take a lock to make things stable
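
The "look up a thing, grab the locks, then recheck" pattern can be sketched in a few lines. This is a userspace toy under loud assumptions: `Entry`, `Table`, and `lookup_locked` are hypothetical names, and a Python dict stands in for the lockless lookup structure. It does not model the core mm, where, as noted above, no single lock makes things stable.

```python
import threading


class Entry:
    def __init__(self, key):
        self.key = key
        self.lock = threading.Lock()
        self.dead = False  # set under self.lock once removed from the table


class Table:
    def __init__(self):
        self._entries = {}

    def insert(self, key):
        entry = Entry(key)
        self._entries[key] = entry
        return entry

    def remove(self, key):
        entry = self._entries.pop(key)
        with entry.lock:
            entry.dead = True

    def lookup_locked(self, key, max_tries=10):
        """Return the entry for key with its lock held, or None."""
        for _ in range(max_tries):
            entry = self._entries.get(key)  # lookup without holding any lock
            if entry is None:
                return None
            entry.lock.acquire()
            # Recheck: the entry may have been removed or replaced between
            # the lookup and the lock acquisition.
            if not entry.dead and self._entries.get(key) is entry:
                return entry
            entry.lock.release()  # wrong or stale object, start over
        return None
```

The caller is responsible for releasing `entry.lock` on the returned entry.
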
15:17sima: and it's getting worse every year, like with lockless vma traversals and page faults
15:17DemiMarie: I wonder at what point it would actually have been faster (dev time wise) to formally prove the whole thing correct and not have to do the debugging.
15:18sima: DemiMarie, open random file in mm/ and stand back in awe at the if ladders
15:18sima: especially anything handling pagetable entries
15:18sima: but yeah formal proof probably good idea
15:19sima: but the issue is also, what do you even want to prove
15:19DemiMarie: "no memory corruption"
15:19sima: because some things look very, very fishy from a "will it livelock" pov
15:19sima: not even close to enough
15:19DemiMarie: no deadlocks, no livelocks, etc
15:19sima: the livelocks are real pain
15:20sima: and often stochastic stuff
15:20sima: like the race windows align such that you win often enough to never pile up, but if you'd have consistently bad luck you'd pile up
15:20DemiMarie:wonders if past a certain point people should just be using multiple machines, rather than trying to make mm scale to huge machines
15:20sima: yes
15:20sima: cloud didn't happen just for fun
15:20heat: this is not just about making mm scale to huge machines
15:21heat: small machines are also heavily impacted
15:21heat: big locks suck
15:21sima: yeah small cros tend to really thrash mm
15:21heat: the per-vma locking patches address problems <checks notes> in android when apps create like 80 threads at startup
15:21DemiMarie: Big locks suck unless you care about reliability and security way more than performance. I suspect that is why OpenBSD is so full of them.
15:22heat: OpenBSD is full of them because it's a hobby kernel
15:22sima: yup
15:22heat: they would like to get rid of them and are slowly doing so
15:22sima: that too
15:22sima: like I think core mm is probably one place where rust won't help
15:23sima: like some of the memory barrier comments in there are just pure nightmare fodder
15:23DemiMarie: ATS might, though. That's full dependent & linear types.
15:23sima: since it's not just about your cpu code, but also about stuff like how tlb fetches actually walk pagetables on your machine
15:24heat: like, yes big locks make for simpler code, which is nice for security and reliability. but they also make you prone to suffer terrible choking on those huge locks, thus a reliability problem (and in effect, probably a security one, depending on what you're running)
15:25sima: DemiMarie, I think more formal proofing would be good, afaik only rcu in upstream linux is fully formally proved
15:26DemiMarie: sima: I was thinking of extracting core mm from F* or Coq.
15:27DemiMarie: heat: I think safety critical systems prefer to use multiple components that are individually single-threaded. They can scale by having many cores that don't share memory.
15:27sima: DemiMarie, e.g. https://lore.kernel.org/dri-devel/887df26d-b8bb-48df-af2f-21b220ef22e6@redhat.com/ last paragraph
15:27sima: device-exclusive was added, but not everywhere, boom in way too many places
15:28DemiMarie: Honestly I think userptr is rather cursed.
15:31DemiMarie: Can migration be reliable enough to make uAPI depend on it?
15:33DemiMarie: I also wonder if this could be dealt with using hypervisor magic: "hey, that page of mine is a blob object now"
15:42mareko: zmike: why wouldn't it be supported?
15:42zmike: mareko: I have an MR to fix
17:54jenatali: Ugh. Meson 1.5.1 can't use CMake to find LLVM 19
17:54jenatali: What a mess
17:55daniels: jenatali: ...
17:56jenatali: Means I need to rebuild the primary Windows container too to get a new meson apparently
17:58daniels:twitches
17:58daniels: that was a deeply unpleasant time of my life
17:58daniels: the bit where I broke up with my long-term girlfriend was probably way less damaging than Windows + Meson + LLVM + CMake + CI
17:59jenatali: Yeah... I got the build working locally with llvm19 so at least I'm pretty confident that just bumping meson should work
17:59daniels: heading out now, fingers crossed for you tho :)
18:11dj-death: daniels: and you do this for work...
18:49jenatali: Aaaand new meson doesn't install without long paths enabled
18:50jenatali: I hate dependency updates
18:57mareko: wouldn't it be nice if LLVM wasn't required by Mesa
19:03jenatali: Mhmm
19:04jenatali: LLVM as a runtime dependency is terrible
19:07kisak: mareko: hypothetically, how would you feel about delaying pulling llvm<18 support until after mesa 25.0-branchpoint and hopefully radeonsi/ACO is good to go for the newer AMD gfx generations by the time 25.1 rolls around? ~non sequitur~ If the mesa build sees llvm 15 is around, but not usable with radeonsi/llvm, will it automatically build radeonsi/ACO or will it fail the build as requirements not met
19:07kisak: for radeonsi/llvm?
19:09kisak: jenatali: llvm being too new for meson autodetect is a chronic issue. Over in Debian land, the build system adds in the equivalent to
19:09kisak: export PATH:=/usr/lib/llvm-15/bin/:$(PATH)
19:10jenatali: Yeah but Windows doesn't do llvm-config :(
19:10kisak: well, that's dandy
19:11jenatali: Fun, LLVM 19 requires /Zc:preprocessor for MSVC to be able to compile its headers
19:11jenatali: Hopefully Mesa likes that too
19:15jenatali: Looks like yes, phew
19:20dcbaker: jenatali: we shouldn’t require long paths in meson. That sounds like a bug on our end
19:21jenatali: dcbaker: It was a test that got run during chocolatey install that was too long
19:21jenatali: I'll grab the log, one sec
19:22alyssa: mareko: llvmpipe's existence makes that kind of a nonstarter..
19:23jenatali: dcbaker: Ah pip, not choco. Log: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/70278327#L391
19:24jenatali: And I was wrong it's not meson it's numpy :(
19:24jenatali: Oh it's meson's tests running as part of numpy's install. Gross
19:25dcbaker: jenatali: of course it’s cmake… and of course it’s in numpy which has a vendored copy of meson while we get some of their stuff upstream…
19:25dcbaker: I wonder if I can ask the numpy folks to not run our tests on install
19:26jenatali: Seems like the right call
19:30dcbaker: Although that’s also an old version of numpy and numpy >=2.0 should work
19:35mareko: kisak: I can delay that. LLVM isn't required by AMD drivers and ACO is used when LLVM is disabled at build time, but it's also not a tested or optimized configuration on RDNA 1-4. It's possible that when you enable llvmpipe, it also enables LLVM for radeonsi.
19:36mareko: radeonsi+ACO likely won't be ready by 25.1
19:37jenatali: dcbaker: There's an issue with some of Mesa's scripts that prevents it from working with >= 2.0
19:38dcbaker: Sigh. I guess I only fixed piglit. I'll probably fix that
19:38jenatali: Oh maybe it was piglit, I don't remember. That same container gets used to build both
19:40jenatali: Yeah https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29649#note_2493559 says it was piglit
19:40jenatali: Should've checked if that constraint could be removed. Oh well
21:01jenatali: Uh... glsl compiler warnings test is failing with access violation (segfault) and I don't repro it :(
21:04DemiMarie: sima: Actually, there is another option: try to migrate the pages, and if that is not possible, either return an error to userspace or leave the pages on the CPU and try again later.
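
The fallback DemiMarie proposes can be sketched as a small policy function. Everything here is an assumption made for illustration: `place_pages`, the `try_move` callback, and the `fail_hard` switch are hypothetical, and `EBUSY` is just one plausible error to hand back to userspace.

```python
import errno


def place_pages(pages, try_move, fail_hard=False):
    """Try to move each page to device memory.

    Returns (moved, deferred). Deferred pages stay on the CPU and can be
    retried later. With fail_hard=True the first failure raises instead;
    pages moved before the failure are left where they are.
    """
    moved, deferred = [], []
    for page in pages:
        if try_move(page):
            moved.append(page)
        elif fail_hard:
            # Option 1: report the failure to the caller (userspace).
            raise OSError(errno.EBUSY, "page could not be migrated")
        else:
            # Option 2: leave the page on the CPU and try again later.
            deferred.append(page)
    return moved, deferred
```

A caller would feed the `deferred` list back into `place_pages` on a later attempt.
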
21:10jenatali: Uh... and passed on re-run. That's not good