03:45 zzoon: Lynne: can you share a media file for testing the failure?
07:46 tzimmermann: airlied, sima, hi! please merge the drm-misc-next-fixes PR at https://lore.kernel.org/dri-devel/20250313180135.GA276891@linux.fritz.box/
07:52 airlied: tzimmermann: done
07:52 tzimmermann: thanks
08:09 tzimmermann: sima, what was the reason we cannot do begin_cpu_access outside of atomic_update? specifically this patch: https://patchwork.freedesktop.org/patch/507240/?series=109768&rev=1
08:51 sima: tzimmermann, it's like vmap/vunmap, it can take the reservation lock and do all kinds of other nasty things that you can't do in the main commit path because it might deadlock with dma_fence
08:51 sima: so only in the prepare/unprepare and related fb hooks, which are carefully placed before the point of no return and after we've already signalled completion
08:51 sima: or do you mean something else?
08:58 tzimmermann: sima, i'm not sure i understood your answer. we call ->begin_fb_access right after ->prepare_fb. it does the vmap if necessary. would also be the natural place to call begin_cpu_access for the vmap'ed GEM buffer. yet that happens only in atomic_update. is that really related to fencing?
09:00 sima: tzimmermann, hm that might be a bug actually
09:01 sima: tzimmermann, I guess I'm not clear on what your question is, that patch you linked looks like we should have it
09:01 sima: hm
09:02 sima: ok I'm wrong
09:02 sima: so from locking pov, we need that patch
09:02 sima: but from a correctness pov, we need the current code
09:02 sima: cache coherency correctness I mean
09:02 tzimmermann: indeed. and there was a discussion back then. you mentioned that begin_cpu_access does 2 different things. and at least one of them was problematic
09:03 sima: so for correctness we need to 1. wait for rendering to finish, which needs to be done in the async part (and which is ok)
09:03 tzimmermann: ok
09:03 sima: 2. _after_ that, flush caches, so that the cpu reads are coherent for those drivers that "scan out" using cpu copies
09:03 sima: that part might take locks which are too nasty
09:03 tzimmermann: makes sense
09:03 sima: which is an oopsie
09:04 sima: like a "I kinda screwed up atomic commit semantics for dma_fence/sync_file really fundamentally" oopsie
09:04 sima: and I think when we discussed that, we figured it's better to have correct scanout at the cost of maybe a lockdep splat
09:05 sima: I did come up with some ideas how to sort this mess out, but a) it's lots of work b) not sure it's good enough c) need to quickly go for a grocery run so would need to chat a bit later
09:05 sima: but spoiler: this is a really nasty locking hierarchy design snafu here unfortunately
09:06 tzimmermann: then get your breakfast first :)
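For context, a minimal sketch of the two placements being weighed above, in a hypothetical driver that "scans out" with CPU copies. The foo_plane_* names are invented; drm_gem_fb_begin_cpu_access()/drm_gem_fb_end_cpu_access() and the ->begin_fb_access plane helper hook are the existing helpers referenced in the discussion. A real driver would pick only one of the two placements:

```c
#include <linux/dma-direction.h>

#include <drm/drm_atomic.h>
#include <drm/drm_framebuffer.h>
#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_modeset_helper_vtables.h>
#include <drm/drm_plane.h>

/*
 * Locking-friendly placement (what the linked patch moves towards):
 * ->begin_fb_access runs before the commit's point of no return, so it may
 * take dma_resv and other sleeping locks. But the GPU may not have finished
 * rendering yet, so flushing CPU caches here does not guarantee the CPU
 * copy sees the final pixels.
 */
static int foo_plane_begin_fb_access(struct drm_plane *plane,
				     struct drm_plane_state *new_state)
{
	if (!new_state->fb)
		return 0;

	return drm_gem_fb_begin_cpu_access(new_state->fb, DMA_FROM_DEVICE);
}

/*
 * Coherency-friendly placement (the current code): ->atomic_update runs
 * after the atomic helpers have waited for the rendering fences, so the
 * cache flush is correct here. The problem is that this is inside the
 * dma_fence signalling critical path, where the locks begin_cpu_access may
 * take can deadlock (hence the lockdep splat mentioned above).
 */
static void foo_plane_atomic_update(struct drm_plane *plane,
				    struct drm_atomic_state *state)
{
	struct drm_plane_state *new_state =
		drm_atomic_get_new_plane_state(state, plane);

	if (!new_state->fb)
		return;

	drm_gem_fb_begin_cpu_access(new_state->fb, DMA_FROM_DEVICE);
	/* ... copy the framebuffer contents out with the CPU here ... */
	drm_gem_fb_end_cpu_access(new_state->fb, DMA_FROM_DEVICE);
}

static const struct drm_plane_helper_funcs foo_plane_helper_funcs = {
	.begin_fb_access = foo_plane_begin_fb_access,
	.atomic_update = foo_plane_atomic_update,
	/* ... */
};
```

Neither spot satisfies both constraints at once: only ->atomic_update runs after the rendering fences have been waited on, but only ->begin_fb_access is allowed to take the locks that begin_cpu_access may need, which is the locking-hierarchy snafu described above.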
13:08 K900: Allow me to bump https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33890/ again now that Gitlab is hopefully all the way up
14:14 Lynne: zzoon: I can replicate with any stream
14:15 Lynne: alder lake
15:33 dj-death: is the offset of load_push_constant guaranteed to be uniform?
15:34 dj-death: because the divergence analysis is always considering the result of that intrinsic as uniform
15:34 dj-death: doesn't sound right...
15:34 pendingchaos: yes
15:35 dj-death: dEQP-VK.pipeline.monolithic.push_constant.graphics_pipeline.dynamic_index_vert_command2 seems to disagree
15:36 dj-death: a vertex input value is used as an index to a push constant
15:36 pendingchaos: https://docs.vulkan.org/spec/latest/chapters/interfaces.html#interfaces-resources-pushconst
15:36 pendingchaos: "Any member of a push constant block that is declared as an array must only be accessed with dynamically uniform indices."
15:36 alyssa: could be bogus CTS
15:37 dj-death: yeah
15:37 dj-death: looks like it
15:37 dj-death: int arr_selector = int(abs(gl_Position.x) * 0.0000001 + 2);
15:37 dj-death: matInst.arrType[int(matInst.index[arr_selector])
15:37 dj-death: and matInst is a push constant block
15:38 * dj-death files a bug
15:40 dj-death: welll
15:40 dj-death: "dynamically uniform"
15:40 dj-death: can the divergence analysis tell?
15:41 jenatali: Interesting, D3D's rules around push constants are even stronger than that, they need to be literals
15:43 pendingchaos: tell what?
15:44 alyssa: dj-death: dynamically uniform is stronger than not divergent, but e.g. nir_opt_preamble does that analysis
15:45 dj-death: ok, then I filed https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/5659
15:51 zmike: daniels / MrCooper / ???: any ideas about solving https://gitlab.freedesktop.org/mesa/piglit/-/issues/112
15:54 dj-death: alyssa: could someone make the argument that it's dynamically uniform because they know only one lane is going to run? ;)
15:54 dj-death: but actually the code does not allow you to make the deduction
15:55 MrCooper: zmike: not offhand
15:57 alyssa: dj-death: Uhhh
15:58 dj-death: I mean in a fragment shader with helper lanes
15:59 dj-death: it's probably difficult to make that argument
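A minimal GLSL sketch of the distinction being discussed, loosely mirroring the CTS snippet quoted at 15:37; the block layout and names here are illustrative, not the exact CTS shader:

```glsl
#version 450

layout(push_constant) uniform Mat {
    vec4 arrType[4];
    int index[4];
} matInst;

layout(location = 0) in vec4 in_val;   // per-vertex / per-fragment value
layout(location = 0) out vec4 out_color;

void main() {
    // Fine per the spec quote above: the index is itself taken from the
    // push constant block with a constant subscript, so it is dynamically
    // uniform by construction.
    vec4 ok = matInst.arrType[matInst.index[0]];

    // What the CTS shader effectively does: the index is derived from a
    // per-invocation value, so nothing in the language guarantees it is
    // dynamically uniform, even if in practice it always evaluates to 2.
    int arr_selector = int(abs(in_val.x) * 0.0000001 + 2);
    vec4 suspect = matInst.arrType[matInst.index[arr_selector]];

    out_color = ok + suspect;
}
```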
16:03 cmarcelo: is marge already ready to be used in mesa/mesa?
16:03 jenatali: I think not yet
16:09 dj-death: someone assigned it, it seems
17:53 benjaminl: https://maintenance.gitlab.freedesktop.org/ says that marge is done
17:54 benjaminl: (maybe wasn't a couple hours ago though)
18:06 jenatali: Marge itself is, but Mesa pipelines are still on the floor
18:07 jenatali: So attempting to assign to marge will run failing pipelines and just waste CI cycles
20:24 karolherbst: mhhh.. NIR_DEBUG=sweep to sweep memory after each pass to see if peak memory usage could be lowered?
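A rough sketch of that idea, assuming a hypothetical debug option wired around pass invocation (the wrapper and the flag are made up; nir_sweep() is the existing mark-and-sweep helper over the shader's ralloc memory):

```c
#include <stdbool.h>

#include "nir.h"

/* Hypothetical wrapper illustrating the suggestion above: when the
 * (made-up) sweep option is enabled, run nir_sweep() after every pass.
 * nir_sweep() keeps anything still referenced by the program and frees the
 * dead memory passes leave behind, so peak memory usage can be measured
 * and lowered per pass instead of accumulating until compilation ends. */
static bool
run_pass_and_maybe_sweep(nir_shader *shader, bool (*pass)(nir_shader *),
                         bool sweep_after_each_pass)
{
   bool progress = pass(shader);

   if (sweep_after_each_pass)
      nir_sweep(shader);

   return progress;
}
```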