08:00sima: alyssa, thanks for typing up the more reliable tiler heap heuristics, saved me some work
08:00sima: bbrezillon, ^^ was about to type that up for you re your "possible perf regressions" comment from yesterday
08:01sima: bbrezillon, it's probably still reasonable to have the page fault handler at least try to save the day with GFP_NORECLAIM allocations
08:03tanty: cmarcelo, from Igalia's side I don't think we are using vkrunner as a library.
08:18bbrezillon: sima: yeah, alyssa's comment got me thinking too, and I think I can come up with something for panthor's tiler heap that would try a non-blocking allocation, then, if it fails, fall back to incremental rendering, and defer blocking allocation to a separate workqueue, so more chunks are available next time the tiler needs them
08:19bbrezillon: but I still have no solution for older GPUs, which we're asked to support Vulkan on, and on which I'd love to use the same transparent shrinker (swapout on reclaim, swapin on next VM use)
08:22sima: bbrezillon, I have a really horrendous idea
08:22bbrezillon: shoot
08:22sima: let me do some workout to decide whether I want to attach my name to it before I go type
08:22sima: it's like ... terrible
08:22bbrezillon: :D
08:26sima: it might also not work, depending how picky the hw is
08:39bbrezillon: compulsively hitting the "Get Mail" button to see sima's crazy idea :-)
08:48sima: bbrezillon, I'll ping you
08:48sima: also got distracted writing some notes instead of doing work-out
08:49bbrezillon: Good training
10:01daniels: sima: as long as it's better than forcing all rendering to be completely synchronous on hardware which can't do incremental rendering :)
10:54sima: daniels, it's not strictly better, but I think like overcommit we can pile up enough very clever tricks that it's good enough in practice without breaking too many fundamental pillars
10:54sima: also it's getting worse, the longer I'm drafting notes :-P
12:01sima: bbrezillon, daniels I'm sorry :-(
12:27bbrezillon: sima: don't be. The global mempool idea is something Arm wanted to have for this kind of on-demand/unpredictable allocations (it's called JIT-something in kbase)
12:28sima: bbrezillon, even for panthor?
12:28sima: I guess it might still be beneficial, but maybe with slightly less aggressive limit heuristics so we don't waste too much memory
12:30sima: bbrezillon, and the horrible thing is just step 2, trying to get an actual dynamic memory requirement out of a dead context
12:32bbrezillon: oh yeah
13:32alyssa: daniels: you don't like mesa statically allocating 128MB for every VkDevice just in case???
13:48sima: 128MB for you, 128MB for you, 128MB for everyone!
13:48sima: who doesn't like that
13:52MrCooper: was hoping for a car or something
14:05alyssa: unfortunately the witcher 3 is OOMing the 128MB
14:05alyssa: so i might need to bump to 512M or something everywhere
14:05alyssa: :p
14:18kusma: mareko: I'm having some issues understanding the interaction of AMD_framebuffer_multisample_advanced and ARB_internalformat_query... The problem arises when some formats support higher multisample counts than others, yet AMD_framebuffer_multisample_advanced says that an implementation *must* support up to MAX_COLOR_FRAMEBUFFER_SAMPLES_AMD samples, and specifying more samples than that is illegal... That erases that flexibility again,
14:18kusma: doesn't it?
14:19kusma: We're also applying that limitation in Mesa for all renderbuffers, but I think it should only apply when RenderbufferStorageMultisampleAdvancedAMD is called...
14:37daniels: alyssa: on the bright side, Collabora could really monetise our production-ready branch with awesome memory savings from just reverting to how it was before
14:38alyssa: real
14:39bbrezillon: $$$$$$
14:41alyssa: unrelated, but does anyone else get annoyed by dozens of "Image Type operand of OpTypeSampledImage should not have a Dim of Buffer." when running vk cts
14:53zmike: yes
14:57alyssa: how do we fix this
15:00zmike: impossible
15:07alyssa: k
15:17melissawen: if userspace tries an async page flip and gets an EINVAL rejection, userspace will try to perform a sync flip, right? will it try async flipping again, or not anymore after the first rejection?
15:19zamundaaa[m]: melissawen: yes, it'll fall back to sync
15:20zamundaaa[m]: Or maybe first async with some planes removed or sth. The usual mess with atomic tests
15:23melissawen: zamundaaa[m], I see... so userspace can approach the rejection with trial and error?
15:23zamundaaa[m]: Yes
15:26melissawen: I see. Thanks for clarifying! I'm in a context where AMD driver rejects an async page flip when the fb mem type changes: https://lore.kernel.org/amd-gfx/20230804182054.142988-1-hamza.mahfooz@amd.com/
15:27emersion: melissawen: it can fall back to front buffer rendering like Xserver does
15:27emersion: ie, mutate the current front buffer
15:32melissawen: So, at some point, the AMD driver allocates the FB in VRAM, but when there is no more space in VRAM, it falls back to allocating in GTT without explicitly mentioning the mem type change. So, the async flip test in IGT started to fail because of these dual mem types :/
15:33melissawen: I'm trying to figure out what is the right way to implement this mem type change condition in the IGT test
15:33melissawen: considering userspace perspective
15:34melissawen: I don't think userspace is aware of these two mem types to decide the right conditions to try to async flip
15:35emersion: async flip is allowed to fail for any driver-specific reason
15:36melissawen: emersion, yeah, but how to better translate this expectation in the current async flip test?
15:37melissawen: maybe just allowing failure, and then skipping the test?
15:37melissawen: or trying a different set of planes, for example? or... just failing the test and, that's it?
15:40emersion: yeah, i'd say skipping is fine
15:40emersion: i wonder how other tests requiring hw features behave?
15:40emersion: like, plane tests
15:41emersion: (could skip on -EINVAL and hard fail on anything else)
15:43melissawen: I suspect the current implementation isn't generic enough about hw features. It creates a scenario in which the test should never fail for any reason, but that scenario is probably only valid for a subset of hw.
15:43emersion: indeed
15:44melissawen: and then specific limitations were added over time
15:44melissawen: I wonder if I should add another specific limitation for AMD's case or what hahaha
15:45melissawen: I think I'll follow your suggestion to just skip if EINVAL
15:46melissawen: thanks .o/\o.
15:47emersion: cool, feel free to CC me :P
15:50melissawen: sure!
16:23cmarcelo: tanty: thanks, good to know. I might write up an MR to start a wider discussion then.
17:38alyssa: gfxstrand: for sparseBuffer emulation, I'll need to use a different address for loads vs stores/atomics of SSBOs
17:38alyssa: I can calculate the required fixup in shader, the buffer descriptor remains the same thing that nvk uses
17:39alyssa: any preferences on what that looks like wrt the common code?
17:39alyssa: easiest thing for me is probably a special address mode
17:40alyssa: or a pair of new modes (one robust, one not)
17:41alyssa: I just don't know how much weirdness we are ok putting in lower_explicit_io
17:41alyssa: otoh, if bbrezillon goes thru with sparse on panvk, this will be used there too lol
18:05sima: alyssa, I guess minimally zink needs to pass this up through gl as arb_context_robustness fail
18:05sima: for everything else you don't replay work, you just eat the gpu hang and potentially a bunch of corrupted rendering and hope that's enough
18:06sima: it's not any different from current gl "recovery"
18:09alyssa: ouch
18:14sima: alyssa, well hence the bracketing with the tricks in step 1 and quirks in 3 to try and make sure this doesn't actually ever happen
18:14sima: or once per game you run or something like that
18:14sima: and I did say this is terrible
18:19sima: bbrezillon, I don't think you gain anything from a background task, because memory reclaim will already try to make more memory available, to help the current job
18:20sima: and resizing on the next cs ioctl avoids all the pain with async errors and potential deadlocks
18:20sima: I guess I'll reply on-list too
20:59mareko: kusma: yes, that extension doesn't allow different sample counts for different formats
21:02kusma: Ack, thanks!