IRC Logs of #dri-devel on irc.freenode.net for 2025-03-17

01:03 mareko: do people prefer a per-instruction (tex and intrinsic) CAN_SPECULATE flag, or just shader_info::speculating_can_page_fault?
01:19 mareko: are there any ops where CAN_REORDER doesn't mean CAN_SPECULATE?
01:19 mareko: non-memory ops, that is
09:24 mlankhorst: sima: I'm looking into preventing overcommit on xe, and there are cases that we want to ensure sysmem is not evicted too. For vram, we can use dmem cg, but how should we handle system memory?
09:26 K900: ]'[
09:26 K900: Oops
09:26 K900: Cat says hit
09:26 K900: *hi
09:58 ella-0: Hi. Does mesa have a clear policy on the use of LLMs and other AI "assisted" developer tools when contributing?
10:24 sima: mlankhorst, need to allocate memory with GFP_ACCOUNTING or whatever the exact flag is
10:24 sima: and then make sure we account against the right memcg
10:24 sima: and all behind a kconfig/module_opt opt-in so we don't break anything existing
10:24 sima: at least that's what I thought the consensus was with all the discussions with tejun and others
10:32 mripard: I guess it's kind of related to the patches I sent last week too, but how we should track system memory isn't really clear to me
10:33 mripard: for GEM, if we track differently the memory allocations between backends, it will look super weird and be a nightmare to configure
10:33 mripard: plus the fun things like the driver has changed backends, so now your allocations aren't tracked by the same cgroup anymore
10:34 mripard: so I really think we should be allocating all buffers through dmem, but then it's also weird to allocate something from GFP but not have it show up in memcg I guess
10:38 mripard: (plus, concepts like region size in dmem don't map super well to, for example, the main DMA allocator, which doesn't really have a fixed size AFAIK)
10:45 sima: mripard, we need to allocate all buffers through memcg GFP_ACCOUNT
10:45 mripard: sima: it doesn't make sense for some of them
10:45 sima: iirc tejun has been pretty clear that dmem should be only for stuff that memcg wouldn't track
10:46 sima: mripard, for cma it might make sense to track it in both memcg and a cma limit (which could be dmem, not sure)
10:47 javierm: sima: cma is not really system memory since that region is carved out IIUC
10:47 mripard: Then dmem is useless to me I guess. I wish I had learned about that like a year ago or something.
10:49 sima: javierm, it's not carved out, it's shared
10:49 mripard: ish
10:49 sima: and relies on memory migration to throw stuff out
10:49 mripard: it can be reused for cache
10:49 sima: yeah it's only for GFP_MOVEABLE
10:49 mripard: that's it
10:49 sima: anon memory too I thought
10:50 sima: essentially anything that try_to_migrate() can get moving
10:50 mripard: oh right, maybe
10:50 javierm: sima, mripard: ah, I see
10:50 sima: (exceptions apply, which is where the fun starts)
10:50 mripard: but still, it's not shared-shared :)
10:50 javierm: I didn't know that. I thought it could only be used for cma allocs
10:50 sima: mripard, yeah but in really big machines there's entire numa nodes which are GFP_MOVEABLE only
10:50 sima: to allow hotunplug
10:50 sima: and those aren't accounted differently in memcg either afaik
10:51 sima: javierm, nah that was the old carveout, and people didn't like it because it wastes memory when not in use
10:51 javierm: sima: got it
10:52 mripard: sima: eventually, it's a matter of API and what you allocate your memory from
10:53 mripard: with heaps, v4l2 dma contiguous backend, or GEM DMA backend, you would allocate the buffer from dma_alloc_attrs
10:53 mripard: and you have no guarantee that it will be in A) RAM, B) under Linux control
10:55 mripard: and rolling a dice to know in which cgroup a call to dma_alloc_attrs is going to be tracked doesn't sound great
10:55 sima: hm right getting GFP_ACCOUNTING in there could be really tricky
10:56 sima: mripard, I guess it would need to be a new dma_alloc attr so you get accounting, if it's a struct page backed memory range
10:56 sima: since you might not even get cma but plain discontig pages, made contig with the magic of iommu
10:58 mripard: I don't see what's wrong with just accounting it in both
10:59 mripard: reworking dma_alloc_attrs to get pages / folios (and easier accounting) was something I discussed in my series last week too
11:00 mripard: but that's kind of separate from the discussion about how we want to account it
11:00 mripard: and I think any solution based around "CMA/dma-coherent goes one way, GFP the other" is not going to work
11:01 sima: mripard, but then allocating shmem pages in dmem feels very wrong imo
11:02 mripard: because then, if we configure our system to limit the dmem allocations, and then we lose that accounting over an upgrade, it's a UAPI regression
11:02 mripard: and all we need to do to trigger it is to merge a patch switching GEM backends, and we merged plenty of those
11:09 Company: mareko: since you asked yesterday: I tested my benchmark on AMD - it's 6500, so gen 10.5 I think - and I get 1650fps with radeonsi and 1100fps with zink. So it's not as bad as with Intel, but still bad
11:12 Company: benchmark is GSK_RENDERER=ngl GDK_DEBUG=no-vsync GSK_DEBUG=full-redraw gtk4-widget-factory - and I run it on F42, so Mesa 25.0 and GTK 4.18
11:38 sima: mripard, yeah that's why I'm not a big fan of accounting cma as dmem
11:38 sima: it's just memory, that's where it physically is, and if we change the allocator that fact won't change
11:39 sima: but if you account it as dmem then there's all the fun like "so ... userptr"
11:39 sima: and similar things
11:39 sima: way back I thought special dmem for all system memory gem allocations would be good, but tejun convinced me that the corner cases are a bit too awkward
11:40 sima: mripard, and do we really have that many changes of dma vs shmem backends in drivers? I'd assumed that's a hw constraint
11:40 sima: or are people picking the dma backend for lack of knowledge?
11:47 javierm: sima: it also happens in the other direction IIRC, people using the GEM shmem backend but then the driver switching to GEM DMA to avoid shadow buffering
11:48 sima: hm
11:49 sima: well this wouldn't be an issue if dmem isn't used for accounting of either still
11:49 sima: I think the one that's awkward is switching from ttm to shmem for some of the tiny vram cards we have
11:50 sima: but then I'm not sure people really care that much about that 4M framebuffer, if it's even that large
11:56 sima: mripard, javierm I'm also not super worried about regressions at the fringes, because by default we don't enforce any limits
11:56 sima: so even if we do have a regression I think there's a small chance we'll get an actual bug report
12:10 mripard: sima: if we don't account CMA, then we shouldn't account any heap or similar regions
12:10 mripard: in which case, memcg == dmem
12:10 sima: why memcg == dmem?
12:11 mripard: if we account only system memory in dmem, what's the point
12:11 sima: we don't account system memory in dmem at all
12:11 sima: only vram or well device memory
12:11 sima: that was the point of dmem
12:11 mripard: what do you call "device memory" ?
12:11 sima: stuff that's not tracked by memcg essentially
12:12 sima: *possibly tracked
12:12 mripard: this is.. worse ?
12:12 sima: so "doesn't have struct page or is a zone_device range"
12:12 sima: how so?
12:12 sima: btw gtg now, but maybe chat with mlankhorst about this
12:12 mripard: going back to dma_alloc_attrs, the memory regions to allocate from are set in the DT, for that one device
12:13 sima: since I thought we had pretty solid agreement that dmem is not going to track anything in system memory
12:13 mripard: so for one particular device, on the same platform, using the same driver and kernel, you might get dmem or memcg accounting?
12:13 sima: yeah
12:13 mripard: how is that remotely workable for distros?
12:13 sima: I mean, that's already the case for all the big gpu drivers
12:13 sima: if it's igpu, it's entirely memcg
12:13 sima: if it's dgpu, it's dmem + overflow stuff tracked with memcg
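
The split sima describes above can be written down as a small decision rule. This is only a sketch of that stated position ("doesn't have struct page or is a zone_device range" goes to dmem, everything else to memcg), not kernel code, and mripard disputes the rule in this same conversation. The function name and its boolean parameters are hypothetical and exist only for illustration.

```python
def accounting_controller(has_struct_page: bool, is_zone_device: bool) -> str:
    """Return which cgroup controller would account an allocation under
    the rule quoted in the log. Hypothetical helper, not a kernel API."""
    # "Device memory" in sima's sense: no struct page, or a ZONE_DEVICE range.
    if not has_struct_page or is_zone_device:
        return "dmem"
    # Ordinary system memory (including CMA, which is shared and struct-page
    # backed according to the discussion) would be tracked by memcg.
    return "memcg"


# iGPU: buffers live in system memory.
igpu_buffer = accounting_controller(has_struct_page=True, is_zone_device=False)
# dGPU: VRAM is device memory; overflow into system memory goes to memcg.
dgpu_vram = accounting_controller(has_struct_page=False, is_zone_device=False)
dgpu_overflow = accounting_controller(has_struct_page=True, is_zone_device=False)

if __name__ == "__main__":
    print(igpu_buffer, dgpu_vram, dgpu_overflow)
```

Under this rule the controller depends on where the memory physically is, not on which allocator or GEM backend handed it out. That is the property sima argues for, and it is also what makes the dma_alloc_attrs case awkward.
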
12:13 mripard: AMD at least doesn't do any tracking
12:14 sima: I don't think distros can set limits for this realistically
12:14 robmur01: the problem with dma_alloc is that the caller must assume it's dmem and treat it as such, but under the hood it may well not be :(
12:14 sima: ok really gtg now, ttl
12:15 robmur01: mripard: BTW I think I saw you replied, will catch up on the thread later
13:38 mripard: sima: if distros can't use it, I'll go back to what I was saying then. dmem is useless to me, I've wasted my time, and we still have a problem to solve.
13:45 calico: Hello, someone's here?
13:46 K900: No
13:46 calico: I'll take this as a yes
13:46 calico: any Mesa devs on here?
13:47 pixelcluster: https://dontasktoask.com/
13:47 calico: k I just wanted to make sure I ended up in the right place before posting my copypaste question
13:49 calico: Here's an issue I'm getting on my SolidRun HC LX2 which has a Radeon RX 550 GPU plugged into the PCIe 8X port. Here's some pics of the issue: https://imgur.com/a/KyAHnYd
13:49 calico: At the time (2-3 years ago), the only existing patch was that one made by jnettlet for Mesa (from SolidRun):
13:49 calico: https://raw.githubusercontent.com/void-linux/void-packages/35f67c4694a73b869205cf50baa84c5e4ab71ade/srcpkgs/mesa/patches/0001-radeonsi-On-Aarch64-force-persistent-buffers-to-GTT.patch
13:49 calico: If I remember well, it worked kinda well on Mesa 22.
13:49 calico: I tried it recently on Mesa 24-25 and it doesn't seem to work anymore.
13:49 calico: I also tried the new patch: that one is a replacement of glibc libmemcpy: https://github.com/jnettlet/cortex_a72_memcpy/
13:49 calico: According to jnettlet, it should be used with this DRI config ... https://gist.githubusercontent.com/jnettlet/1f461487bee9c3e2a2d994f25441717d/raw/35df03e7bfa75ab34da90d6cfaaeb12fcfaf89b5/01-honeycomb-x11.conf ...
13:49 calico: So I applied both (memcpy + dri conf), restarted the PC and it changed nothing.
13:49 calico: According to the ld.so man page, I should add this "/usr/lib/aarch64-linux-gnu/libmemcpy.so" into "/etc/ld.so.preload" in order to override the glibc memcpy and I think I should put the dri conf here: /usr/share/drirc.d/01-honeycomb-x11.conf
13:49 calico: So, well, I don't know what I did wrong. It seems to work for both jnettlet and for other guys in the SR community ... even on Mesa 25
13:52 psykose: try `LD_DEBUG=all <someapp>` and check the preload is actually loaded
13:53 calico: According to lsof it is indeed loaded ... but I could try LD_DEBUG=all again ...
13:54 calico: btw I also did some tracing with `strace -o log glxinfo` and the drirc.d conf seems to be loaded correctly.
13:59 calico: psykose: any way to pipe ld_debug output to file only?
14:00 psykose: 2> file
14:01 calico: also found LD_DEBUG_OUTPUT
14:03 calico: will take a while
14:06 alyssa: mareko: per-instruction CAN_SPECULATE is useful to model cases when we know a given instruction is safe but we don't know that in general
14:06 alyssa: e.g. accesses to internal driver memory that is guaranteed to be there is SPECULATE even if we don't have that guarantee on app accesses
14:07 alyssa: honeykrisp uses this in a few places
14:07 alyssa: also for us speculating load/store is different from speculating textures
14:09 calico: psykose: result ... export LD_DEBUG_OUTPUT=ld_debug.txt; LD_DEBUG=all firefox shows libmemcpy.so is loaded http://0x0.st/8Qli.18567
14:09 psykose: yea looks fine
14:10 calico: about the 01-honeycomb-x11.conf ... is there a way to force all apps to disable -GL_ARB_buffer_storage? like a "*"
14:11 calico: btw I'm on gentoo
14:11 calico: and I compiled libmemcpy myself through an ebuild
14:13 psykose: something like MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage probably
14:13 psykose: in env
14:19 calico: just added it into /etc/env.d/99mesa and it is still broken
14:19 psykose: is it actually set
14:22 mareko: alyssa: GL can speculate anything that's reorderable
14:23 calico: psykose: not sure it was set? but then I ran export MESA_EXTENSION_OVERRIDE=-GL_ARB_buffer_storage before starting Xorg and it worked
14:23 psykose: work as in it fixed the issue or
14:23 calico: seems to have fixed it yes
14:23 psykose: what's the name of your xorg executable
14:24 mareko: alyssa: ddx/ddy can't be sunk into control flow, but can be hoisted (i.e. speculated) out of control flow
14:25 calico: psykose: there's /usr/bin/Xorg and there's also /usr/bin/X that points to /usr/bin/Xorg ...
14:25 psykose: that should work then afaik
14:26 psykose: not sure why executable="Xorg" didn't match it
14:26 calico: I'll try to add X to the conf file
14:26 calico: I shouldn't have to add Firefox and other apps to that file too right?
14:27 psykose: if you run `ps aux | grep X` do you see Xorg or X
14:27 psykose: i forget if symlinks resolve the name or not
14:28 psykose: a quick test says apparently no
14:28 psykose: which makes sense
14:28 calico: I see X ...
14:28 psykose: :)
14:28 psykose: well there you go
14:29 calico: damn
14:29 calico: three weeks lost for a freaking X
14:30 psykose: happens
14:30 psykose: and yeah you don't have to set it for apps pretty sure
14:31 psykose: so just another line in drirc should be the only change
14:31 psykose: or figuring out why stuff is using X on gentoo i guess, but that doesn't matter really
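
The mismatch found above (a rule written for "Xorg" while the running process shows up as "X") comes down to whether a name is taken from the path as invoked or from the file the symlink points to. The sketch below only illustrates that difference with a temporary symlink. It does not reproduce how Mesa actually determines the process name, and the helper names are hypothetical.

```python
import os
import tempfile


def invoked_name(path: str) -> str:
    """Basename of the path as it was invoked; symlinks are not followed."""
    return os.path.basename(path)


def resolved_name(path: str) -> str:
    """Basename of the real file after following symlinks."""
    return os.path.basename(os.path.realpath(path))


def demo():
    """Create Xorg plus a symlink X -> Xorg and report both names for X."""
    with tempfile.TemporaryDirectory() as d:
        real = os.path.join(d, "Xorg")
        link = os.path.join(d, "X")
        open(real, "w").close()   # stand-in for the real binary
        os.symlink("Xorg", link)  # mirrors /usr/bin/X -> /usr/bin/Xorg
        return invoked_name(link), resolved_name(link)


if __name__ == "__main__":
    print(demo())
```

A rule keyed on the invoked name never sees "Xorg" when the server is started through the X symlink, which is consistent with psykose's quick test and with the fix of adding a second entry for "X".
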
14:36 calico: just wish there will be a proper fix someday ...
14:37 calico: not just patching the memcpy and disabling the ARB buffer ...
14:38 calico: but well, thank you a lot psykose you made my day
14:38 K900: I'm pretty sure this is attempting to work around hardware errata
14:38 K900: So the proper fix would require a new silicon stepping
14:40 calico: well ... my Sapphire RX 550 from 2017 cost about CAD 200 or even less ... I don't think AMD or Nvidia would make a GPU that cheap anymore
14:42 K900: It's not the GPU that's the problem
14:42 K900: It's the SoC
14:42 psykose: reminds me of the altra pci errata
14:42 psykose: every generation of arm stuff has cursed pcie somehow :(
14:43 calico: I know the GPU is not the problem ... I was just saying it's not worth putting a CAD $700+ Radeon RX 5700 XT or whatever newer high-end GPU on a SR HC LX2 workstation :P
14:44 psykose: that's actually 3 gens old now!
14:44 psykose: :D
14:48 sima: mripard, well if distros want to use it, then it also needs to work for amd and other big gpu
14:49 sima: and as-is dmem isn't enforcing yet
14:49 sima: plus I'm not sure how distros will use this with your design, you still need to somehow tell them which dmem a gpu is allocating from
14:49 sima: and how much there is
14:50 sima: feels like that's the missing part, once you can autoconfig first time around you can just do that again if the hw changes
14:50 sima: and I kinda count "different dt" under different hw
14:50 sima: at least for regressions
14:50 K900: calico: I don't think the exact GPU matters here
14:50 mripard: sima: I agree that it needs to be more widespread, see the series in your inbox from last week
14:50 K900: It's very likely the PCIe interface that's broken
14:50 K900: So any GPU will have some sort of issue
14:51 calico: k
14:51 sima: mripard, yeah I'm behind on mail :-/
14:52 mripard: but if you say that we can't use it because it's very wrong, I'm not going to waste more time on it
14:53 mripard: also, "different DT" might also mean "different firmware"
14:53 mripard: which, again, can change over an upgrade
14:59 sima: I guess I need to catch up on mails and ponder this some
15:07 calico: btw on the same HW, I also got the GPU to fail to start on kernel 6.12.16 ... so I had to downgrade to 6.6: http://0x0.st/8QlW.txt (dmesg logs with colors)
15:10 calico: should probably post that on a radeon IRC channel if any
17:28 calico: psykose (or any dev here): Could you help me as well with this? http://0x0.st/8Q0A.html ... Xorg refuses to start on my old ASUS A8V motherboard that has a Radeon HD 4650 AGP GPU (I also tried another HD 4650 to make sure)
17:28 calico: Some years ago I also tried older kernel versions and it was also broken.
17:30 psykose: no idea personally, never used one of those
17:33 calico: k but how can I track down the failure easily? ftrace?
17:34 calico: that thing has been broken for years
17:36 calico: the "radeon r600" should be supported by mesa btw
19:05 benjaminl: if I want to put a 64-bit int in driconf, is just making it a string and parsing it in the driver a reasonable option, or should I add a new type?
19:08 alyssa: mareko: "GL can speculate anything that's reorderable"
19:08 alyssa: I don't think that's true
19:09 alyssa: consider a dynamically indexed load_ubo, in an app that does not use robustness
19:09 alyssa: a shader like `if (bound_the_whole_ubo) { load_ubo(big index) }`
19:09 alyssa: and the app only binds the whole ubo sometimes
19:10 alyssa: if you speculate that load, you might get a page fault, depending on your hardware
19:10 alyssa: "but AMD's load_ubo is robust, it's fine" I hear you complain
19:10 jenatali: "does not use robustness" - is that a thing in GL?
19:10 alyssa: the whole point of can_speculate is communicating that precisely to NIR
19:10 alyssa: jenatali: yes, it's the default
19:10 alyssa: robust buffer access is only added in gl4.x/gles3.2
19:11 jenatali: Ooh
19:11 alyssa: in practice competent hw will be robust anyway because of d3d
19:11 jenatali: Right
19:11 alyssa: but it's not a spec requirement by any means
19:11 alyssa: and it's not true for loads of mobile hw
19:11 jenatali: Makes sense, I just had my "older API == robust" glasses on I guess
19:12 alyssa: it's more like "microsoft == robust" :p
19:12 jenatali: :P
19:13 glehmann: GL probably also cannot speculate bindless textures?
19:15 alyssa: probably not, no
19:57 mareko: alyssa: that seems like all memory ops need a can_speculate flag
19:57 mareko: including tex
19:57 mareko: deduced from API usage and NIR options
19:58 alyssa: right
19:59 alyssa: but again, it can't be figured out from NIR options
20:01 mareko: can_speculate = options->robust && !gl_bindless;
20:02 mareko: or even bindless textures can set it if the driver/hw is robust for it
20:03 alyssa: that misses a lot of cases, including cases that are per-instruction
20:03 mareko: examples?
20:04 alyssa: ok, let's suppose you have SGPRs for the # of samples, and for the address of a table of sample positions
20:04 alyssa: and you lower load_sample_pos to load_global_constant(SGPR base address + math)
20:04 alyssa: that load can always be speculated, because we guarantee inside the driver that the sample position table is always present and filled out
20:05 alyssa: but (depending on API usage) none of the loads originating from the app can be
20:05 alyssa: only the driver knows
20:05 alyssa: hence why we have a per-intrinsic flag (and you're right that we need a per-tex flag too)
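
alyssa's argument for a per-instruction flag can be shown with a toy model: two loads that are both reorderable, where only the driver-generated one is known to be safe to hoist out of control flow. This is a sketch, not NIR's data structures or API. The `Load` class and `hoist_out_of_if` are hypothetical names; only the `can_reorder` / `can_speculate` flag names and the two example loads come from the discussion.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    can_reorder: bool    # no ordering hazard with other memory operations
    can_speculate: bool  # safe to execute even if the branch is not taken


def hoistable(load: Load) -> bool:
    """A load may leave its guarding branch only if it is both
    reorderable and guaranteed not to fault when run speculatively."""
    return load.can_reorder and load.can_speculate


def hoist_out_of_if(body):
    """Split an if-body into loads hoisted above the branch and loads kept inside."""
    hoisted = [l for l in body if hoistable(l)]
    kept = [l for l in body if not hoistable(l)]
    return hoisted, kept


# App load behind `if (bound_the_whole_ubo)`: reorderable, but it may page
# fault on hardware that is not robust, so it is not marked speculatable.
app_ubo = Load("load_ubo(big index)", can_reorder=True, can_speculate=False)
# Driver-internal load from the sample position table, which the driver
# guarantees is always present and filled out.
sample_pos = Load("load_global_constant(sample position table)",
                  can_reorder=True, can_speculate=True)

hoisted, kept = hoist_out_of_if([app_ubo, sample_pos])

if __name__ == "__main__":
    print([l.name for l in hoisted], [l.name for l in kept])
```

A single shader-wide setting would have to treat both loads the same way, so it either blocks hoisting the driver load or allows hoisting the app load. The per-instruction bit lets the driver express what only it knows.
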
20:05 mareko: load_sample_pos lowering can clamp the address
20:06 alyssa: right, how does the driver tell NIR that it's clamped? by setting the can_speculate bit on that load
20:06 mareko: though load_global_constant would have a speculatable flag
20:06 alyssa: it already does.
20:06 alyssa: maybe I don't understand what you're proposing
20:07 mareko: can_reorder, and if there is access, then also can_speculate, are required of course
20:08 mareko: and then we just need to determine that in glsl_to_nir and other NIR generators
20:11 mareko: for that determination, options->robust_{regular, bindless} are required, and that should be it
20:12 mareko: DXVK/DX9-11 can also set can_speculate on almost everything reorderable for RADV
20:12 alyssa: how would DXVK forward that information thru?
20:12 alyssa: new vk ext?
20:12 mareko: I guess
21:14 JEEB: &1
21:39 glehmann: alyssa: without descriptor indexing, descriptors have to be valid if they are statically used. so all accesses can be speculated
21:42 glehmann: well, if buffer/image access is always robust, which isn't required in core vk
22:02 DemiMarie: calico: the only feasible way to get the GPU to work on a platform with busted non-device PCI BARs is to map the BAR as device memory (I forget which type) to work around the hardware bugs, and then patch the kernel to emulate unaligned access
22:02 DemiMarie: calico: I don’t know if that is your specific problem
22:03 DemiMarie: Asahi might carry an out of tree patch for this as it would allow using eGPUs on Apple silicon. It is unlikely that such a patch would be accepted upstream.
22:04 calico: DemiMarie: you're talking about my gpu problem on the SR HC LX2?
22:04 DemiMarie: calico: one possible cause of GPU problems on ARM
22:04 DemiMarie: I don’t know if it is your particular problem as I know nothing about that particular SoC
22:05 DemiMarie: What I do know is that many Arm platforms do not support cache-coherent PCIe BAR memory
22:05 calico: k thx for the info ... will tell it to jnettlet
22:08 calico: DemiMarie: what about my other issue with the Radeon HD 4650 AGP?
22:08 DemiMarie: calico: if it is real AGP I would consider the hardware obsolete and replace it if possible
22:09 DemiMarie: AGP was a predecessor to PCIe
22:09 DemiMarie: actually no, but it is definitely obsolete
22:09 calico: I know ... but I'd like a lot to get the ASUS A8V of my childhood to work again :P
22:10 DemiMarie: calico: Software rendering would be the simplest thing to try
22:12 calico: uuuh ... I think the driver for cards of that era (2005-2010) is also broken for PCIe ones ...
22:12 calico: even though Mesa should support it
22:14 DemiMarie: Does software rendering work?
22:16 calico: you mean with radeon.modeset=0 or nomodeset? ... yes
23:33 calico: DemiMarie: fancy tty also works