08:24 jfalempe: What is the recommended way to fix dim with the new gitlab url? I can fetch from git@ssh.gitlab.freedesktop.org:drm/kernel.git but not from "git@gitlab.freedesktop.org:drm/kernel.git" or "ssh://git@gitlab.freedesktop.org/drm/kernel.git"
08:24 jfalempe: and dim setup always adds the latter, which doesn't work.
08:26 mlankhorst: jfalempe: ssh.gitlab.freedesktop.org should work
08:27 mlankhorst: Host gitlab.freedesktop.org
08:27 mlankhorst: Hostname=ssh.gitlab.freedesktop.org
08:27 mlankhorst: in ~/.ssh/config
08:27 jfalempe: mlankhorst: thanks, I will try that.
08:28 jani: yes, that's for the first step, and after that drm-rerere update will switch to ssh.gitlab...
08:28 jani: there was really no way to automate this after the fact
08:38 jfalempe: Thanks, that worked :)
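For reference, the workaround above amounts to a ~/.ssh/config stanza roughly like the following minimal sketch; the User line is an assumption based on the git@ URLs in the remotes and is normally redundant when the remote URL already supplies it:

    # Route SSH traffic for gitlab.freedesktop.org through the dedicated SSH endpoint
    Host gitlab.freedesktop.org
        Hostname ssh.gitlab.freedesktop.org
        User git    # assumed from the git@... remote URLs above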
09:36 sima: mlankhorst, tzimmermann the backmerge in drm-misc-next lost the depends BROKEN from linus for DRM_HEADER_TEST due to 8e623137f112eb86ad949e3bcb6c0e5ae11a092a moving it
09:37 sima: I guess this needs a fixup or even more headlines about turds
09:41 tzimmermann: sima ok
09:42 sima: tzimmermann, thanks for taking care, and feel free to add my upfront a-b tag so that patch isn't stuck
09:52 tzimmermann: sima, patch is on dri-devel. if i hear nothing, i'll merge it this afternoon
09:53 sima: tzimmermann, lgtm
10:07 mlankhorst: Not the turds!
11:33 K900: zmike FYI some people on the NixOS side made enough noise about me removing XA from the package that we had to put it back
11:33 zmike: wat
11:33 K900: I personally would love to see that thing die
11:34 K900: But there's like
11:34 K900: A guy
11:34 K900: Who's xkcd_workflow.jpg for xf86-video-vmware
11:34 K900: Thought you might want to be aware
11:34 K900: https://github.com/NixOS/nixpkgs/pull/392492
11:35 K900: On the other hand, if it gets killed upstream, I will be able to just tell that guy to go pound sand
11:35 K900: Which I am also fine with
11:35 zmike: there's nothing stopping the one guy from staying with mesa 25.1
11:36 K900: Yes, and there's also nothing stopping the one guy from complaining when his weird Mesa 25.1 setup breaks two years later
11:36 K900: But I guess we'll cross that bridge when we come to it
11:36 K900: Also I'll probably re-kill it in 25.1
11:37 K900: I really just don't care anymore
11:38 zmike: imo stay in sync with upstream
11:39 K900: Well we don't ship nine
11:39 K900: Or omx
11:39 K900: Or clover
11:39 K900: And no one is complaining about those
11:39 zmike: good to know
12:10 mairacanal: sima, if you have some time, could you ack https://lore.kernel.org/dri-devel/8e277f667296ca5d95bed270ee981aee0e5c3086.camel@igalia.com/T/? it's the resubmission of the patch I mentioned yesterday
13:42 sima: mairacanal, ack and apologies
13:47 bbrezillon: sima: when you find some time, I'd like to have your opinion on https://patchwork.kernel.org/project/dri-devel/list/?series=949896 (non-blocking allocation in the GPU fault handler path)
13:55 sima: bbrezillon, I guess my question is what userspace does if a job fails to allocate
13:55 sima: it avoids issues in the kernel, but if that means userspace can't really use it, we haven't gained much
13:58 bbrezillon: sima: it does what it does today when a job faults
13:59 bbrezillon: on panfrost (old GPUs), it just ignores the fault and continues as if the GPU had done what was expected
14:00 bbrezillon: on panthor, the execution context is flagged as faulty and must be destroyed/recreated
14:00 bbrezillon: we have code handling that in mesa/gallium
14:00 bbrezillon: and panvk just reports it as a DEVICE_LOST
14:00 sima: bbrezillon, yeah if userspace is ok with that then I think it's all ok
14:01 sima: it's probably not great though, since it just randomly makes the gpu unreliable under memory pressure
14:01 sima: but if you say it's the same as now then eh ...
14:01 sima: standardizing this more definitely sounds like a good idea
14:01 sima: did christian könig comment on any earlier versions?
14:01 bbrezillon: it's the same as now, without the potential deadlock, I guess
14:02 sima: ah then I guess it's better for sure
14:02 bbrezillon: I didn't Cc Christian on that patchset :-/
14:02 sima: yeah it's maybe also more for gfxstrand
14:02 sima: was also more thinking about maybe adding some howto overview sections
14:02 sima: but that's all orthogonal I think
14:03 bbrezillon: yeah, I can definitely add more docs
14:03 sima: oh alyssa also knows how to suffer through this with less DEVICE_LOST
14:03 alyssa: [citation needed]
14:04 sima: alyssa, apple streamout memory allocation fun
14:04 sima: or well, just preallocating it all
14:05 bbrezillon: sima: it's mostly infra patches in this patchset. Right now panfrost/panthor don't try hard when an allocation failure happens (NO_RETRY|NO_RECLAIM), but I guess this can be extended
14:05 alyssa: yeah asahi just preallocates a giant buffer and hopes you never run out
14:05 alyssa: i do not recommend it but i don't have a better option
14:05 sima: strictly speaking you can't do more than NO_RECLAIM in fence critical sections
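As a minimal sketch of the constraint being discussed here: an allocation made from a dma-fence signalling path (such as a GPU fault handler) must not enter direct reclaim, because reclaim may end up waiting on the very fences that path is supposed to signal. The helper below is hypothetical; only the GFP flags are the point:

    /* Hypothetical helper: allocate backing memory from a GPU fault handler.
     * Inside a dma-fence signalling critical section, direct reclaim is off
     * limits, so GFP_NOWAIT-style flags are the ceiling and failure has to
     * be handled by failing the job rather than by retrying harder.
     */
    static struct page *fault_alloc_page(void)
    {
            /* no direct reclaim, and don't warn on (expected) failure */
            return alloc_page(GFP_NOWAIT | __GFP_NOWARN);
    }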
14:05 sima: tzimmermann, reminds me, I owe you a long explainer on the atomic commit fence critical section pain
14:05 bbrezillon: yeah, that's what I thought
14:06 bbrezillon: fortunately, on CSF HW we have a way out
14:06 sima: the slightly better one is to preallocate in the kernel when we run such a context, so that the emergency reserve could be shared between contexts
14:06 sima: but it means you need to in-order schedule these
14:06 bbrezillon: (exception handler called on OOM, to flush the primitives)
14:07 sima: yeah, but that's only for tiler cache, not for emulating gs stuff
14:07 alyssa: bbrezillon: partial renders don't solve the general problem you get with geom/tess/xfb
14:07 bbrezillon: nah, emulating GS is another problem indeed
14:07 sima: or whatever the exact painful combo was, I'm really good at forgetting that nightmare fuel again :-D
14:07 alyssa: sima: any of geom/tess/xfb/mesh emulation hits this on asahi
14:07 sima: bbrezillon, but yeah tiler flush gets you out of this
14:08 alyssa: probably we could do better in a few simple cases but yeah
14:08 sima: tzimmermann, I guess ping me when I should start typing, or whether you prefer some mail somewhere
14:11 sima: bbrezillon, on the docs, that's really more a "would be nice, in some separate patch set" thing, just trying to distill all the various discussions into some docs and linking to infrastructure like your sparse shmem bo
14:12 sima: and defo cc: alyssa, gfxstrand and christian könig on that one as the people who have real understanding
14:13 alyssa: me? understanding? kernel code?
14:13 alyssa: i understand just enough to know i'm screwed and no more (:
14:14 sima: alyssa, oh also userspace uapi aspects
14:14 sima: plus someone gets to upstream the apple gem driver :-P
14:16 alyssa: kernels are magical things that come from dnf install
14:16 alyssa: what is upstream? what is driver?
14:18 sima: well, it's all just there to run userspace, so who cares what is kernel and what is hw :-P
14:26 bbrezillon: sima: I mean, if we can have the docs with the changes, that's probably better :-)
14:27 bbrezillon: and I don't mind writing extensive docs for something I worked on, because at least I understand it (or I'm supposed to understand it)
14:28 bbrezillon: and just wanted to know if the overall approach is somewhat sound, or if we're going to hit a wall at some point
14:30 sima: well it's the same design wall as always, but just from scrolling around in the series it looks reasonable
14:44 alyssa: i think i might be able to do better in a few special cases with geom/tess/xfb
14:44 alyssa: but for the general thing.. yeah, we're kinda screwed here..
14:55 alyssa: sima: so here's what I don't get exactly
14:55 alyssa: why is "GPU gets unreliable under memory pressure" a.. bad thing, exactly?
14:55 alyssa: like. under memory pressure, you're going to get device loss in userspace anyway due to allocating command streams and stuff
14:55 bbrezillon: that ^
14:56 bbrezillon: I don't get it either tbh :-)
14:56 alyssa: so whether we get device loss right now or loss in 10ms from now, seems, sort of irrelevant at that point
14:56 alyssa: I guess potentially the difference is that the CPU side alloc is allowed to go trigger the shrinker and succeed
14:58 sima: alyssa, we get unreliable much earlier than userspace would notice otherwise
14:58 sima: because we can only do GFP_NORECLAIM
14:58 sima: whereas actual allocation failures are GFP_KERNEL or GFP_USER
14:59 sima: and the kernel loves to fill all the memory with caches, only leaving watermarks free, so if you exhaust those quickly enough you're out of luck
14:59 alyssa: right.. so it's about whether we can e.g. evict page caches and such?
14:59 sima: despite that there's potentially enormous amounts of memory around that's trivially reclaimable
14:59 sima: alyssa, yeah
14:59 alyssa: right ok
14:59 sima: well even entirely clean cache dropping isn't possible
14:59 sima: because locking
14:59 bbrezillon: but there are flags for that, no?
15:00 bbrezillon: like, retry-but-not-too-hard
15:00 sima: yeah it's the GFP hierarchy, but since dma_fence are maximally nasty you can't do any of them
15:00 bbrezillon: I remember i915 progressively increasing the reclaimness
15:01 sima: and rule of thumb is that already in the io-path you better have mempools because GFP_NOIO can just fail with sufficient amounts of bad luck
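To illustrate the mempool point: IO-path code typically backs its GFP_NOIO allocations with a preallocated reserve, so forward progress never depends on reclaim succeeding. A rough sketch with the standard kernel mempool API (the element count and size here are arbitrary):

    #include <linux/mempool.h>

    static mempool_t *io_pool;

    /* Reserve 16 objects of 256 bytes up front; mempool_alloc() falls back
     * to that reserve when the underlying kmalloc() fails, so the IO path
     * keeps making progress under memory pressure as long as elements are
     * eventually returned with mempool_free().
     */
    static int io_pool_init(void)
    {
            io_pool = mempool_create_kmalloc_pool(16, 256);
            return io_pool ? 0 : -ENOMEM;
    }

    static void *io_pool_get(void)
    {
            return mempool_alloc(io_pool, GFP_NOIO);
    }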
15:01 sima: yeah but that's either not in dma-fence paths, or where it was, that code was very much not a good idea
15:01 alyssa: bbrezillon: is the drm r4l call right now?
15:01 sima: once you don't hold nasty amounts of locks, you can set a lot more GFP flags to get at memory that's reclaimable
15:02 alyssa: or did i just miss it an hour ago because of timezones
15:02 bbrezillon: alyssa: was an hour ago :-/
15:02 alyssa: d'oh.
15:02 sima: bbrezillon, like i915 had a GFP_NOFS path (avoids shrinkers) when holding locks, and then a lock-drop+retry fallback
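The fallback pattern described above has roughly this shape; the GFP levels and lock helpers are placeholders and this is only a sketch of the idea, not the actual i915 code:

    /* Try a restricted allocation while holding the lock; if that fails,
     * drop the lock so full reclaim is allowed, then retake the lock and
     * revalidate whatever state the lock was protecting.
     */
    buf = kmalloc(size, GFP_NOFS | __GFP_NOWARN);
    if (!buf) {
            obj_unlock(obj);
            buf = kmalloc(size, GFP_KERNEL);
            obj_lock(obj);
            if (!buf)
                    return -ENOMEM;
    }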
15:02 alyssa: screwed by daylight savings once again. oh well, next week hopefully then
15:13 bbrezillon: sima: I see. Is there no way we can say: I want weak reclaim, and don't use that shrinker...
15:13 sima: oh there is
15:13 sima: you don't get dma_fence at the end of the batch
15:14 sima: but you need some kind of preempt
15:14 sima: so for compute workloads this works really well, if you want interop, not so much
15:19 bbrezillon: hm, not even trying to flush the FS cache is likely going to introduce a perf regression on panthor, and a higher failure rate on panfrost...
15:19 bbrezillon: because as you said, Linux tends to fill those caches as much as possible
15:20 bbrezillon: and relies on reclaim to flush those on memory pressure
15:59 cwabbott: alyssa: gfxstrand: why do we have a separate load_global_constant when afaict it's identical to load_global with CAN_REORDER access bit set?
16:10 alyssa: cwabbott: i think "historical accident & we should merge"
16:11 alyssa: i would offer to do the refactor but i'm spread really thin right now
16:25 cwabbott: alyssa: apparently right now we lower load_global(_constant) to load_global_ir3 which has 2x32 address and offset
16:26 cwabbott: I guess I should translate load_global_constant to that with READ_ONLY+CAN_REORDER rather than creating a load_global_constant_ir3?
16:26 alyssa: should be able to yea
16:27 cwabbott: the reason I'm looking into it is that apparently PoE 2 runs terribly in part because it uses raw pointers instead of UBOs and we're not moving them to the preamble
16:28 alyssa: oof
16:32 dj-death: cwabbott: on intel we can use a different cache on some HW if I remember correctly
16:33 dj-death: constant cache vs data cache
16:33 cwabbott: right, but that's what ACCESS_NON_WRITEABLE is for
16:33 cwabbott: you don't need a separate intrinsic for that
16:35 dj-death: yeah, might need to be careful with the invalidation flags of the vulkan API though
16:35 cwabbott: seems like it was initially added by gfxstrand for OpenCL const memory (where it was already redundant) then Kayden made NonWriteable loads use it so now it's really the same thing
16:36 dj-death: it might be fine then
16:40 Kayden: Yeah, alyssa and I had talked about that a while ago. load_global with an access flag makes more sense
16:41 Kayden: just hadn't gotten around to refactoring
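A rough sketch of what that refactor could look like as a NIR lowering callback, assuming load_global_constant carries an access index and that something like the nir_shader_intrinsics_pass helper is used to walk the shader; this is not an actual Mesa patch and the details would need checking:

    /* Rewrite load_global_constant into load_global plus equivalent access
     * flags.  The source/dest layout of the two intrinsics matches, so only
     * the opcode and the access bits change.
     */
    static bool
    lower_load_global_constant(nir_builder *b, nir_intrinsic_instr *intrin,
                               void *data)
    {
            if (intrin->intrinsic != nir_intrinsic_load_global_constant)
                    return false;

            intrin->intrinsic = nir_intrinsic_load_global;
            nir_intrinsic_set_access(intrin,
                                     nir_intrinsic_access(intrin) |
                                     ACCESS_NON_WRITEABLE | ACCESS_CAN_REORDER);
            return true;
    }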
17:54 Lynne: dj-death: https://github.com/cyanreg/intel_bda_test
17:54 Lynne: finally got around to writing a small program to replicate the issue I'm seeing with BDA on intel
17:55 Lynne: it prevents the vulkan ffv1 code in ffmpeg from running so we had to disable it
17:55 Lynne: meson build ; ./build/intel_bda_test, you don't need anything but libavutil and shaderc to compile
17:56 Lynne: main.c at the bottom has some switches to turn on validation layers and change device
18:07 dj-death: Lynne: cool, thanks
18:09 dj-death: Lynne: will get to it at some point, just not in the next few days
18:09 alyssa: hakzsam: oh, ughh, I remember how we got into this mess with shifts
18:10 alyssa: nir can't represent ALU instructions where both sources can be independently sized
18:12 alyssa: and that requirement seems very tricky to lift due to algebraic's bitsize matching
18:13 Lynne: dj-death: thanks
18:14 Lynne: pretty sure it's a buffer descriptor issue
18:14 Lynne: but I could be wrong
18:26 alyssa: ok. and i'm not super motivated to solve that because even with int8/int16, backends can usually fold away the extra u2u32's
18:26 alyssa: so it's.. ugly but not actually hurting any competent backend, I expect
18:48 mareko: alyssa: but shifts can already have src0 = 16 bits, src1 = 32 bits
18:48 alyssa: yes, so..?
18:48 mareko: so they are independently sized
18:49 alyssa: not in the sense that src1 must always be 32-bit
18:49 alyssa: you can't have two arbitrary pairs of bitsizes
18:50 mareko: that's all nice, but it's a problem for 16-bit vec2 vectorization that doesn't allow 32 bits in shifts
18:52 alyssa: ugh! right, that!
18:52 alyssa: forgot about that detail. yep, /that/ was why I got annoyed about this years ago.
18:54 alyssa: hmm in principle aco could ingest a (i16vec2 = ishl i16vec2, i32vec2)
18:54 alyssa: translating it to a hardware ishl + a pack
18:55 alyssa: the question is whether aco's optimizer can sanely optimize out pack-of-upcasts
18:55 alyssa: that's definitely... easier to do in nir. but can't be done without changing the ir.
18:55 alyssa: gah
18:55 alyssa: this is all coming back to me now >.>
18:59 mareko: if there was u2u32_for_shift, it could be handled in the vectorize callback as a special case
19:00 mareko: or u162u32_for_shift
19:04 alyssa: mareko: how do you mean?
19:06 mareko: alyssa: u2u32 isn't vectorized to vec2, but u162u32_for_shift would be
19:07 mareko: and the backend will replace it with 16-bit src1
23:13 cmarcelo: does anyone use vkrunner as a _library_?
23:16 cmarcelo: (trying to gauge if we could retire that offering and keep only the vkrunner executable).