07:29 pq: DemiMarie, I think KMS not having any way to cancel a pending buffer swap (while waiting for fences) may have factored in.
07:36 mripard: mlankhorst: I don't think the fbdev buffer address fix should be cherry-picked
09:35 mlankhorst: mripard: Hmm, it was marked as fixes, throw it out?
09:35 mlankhorst: Mostly because of Cc: <stable@vger.kernel.org> # v6.4+
09:36 mlankhorst: but I guess it would rather belong to drm-misc-fixes in that case
09:44 mripard: mlankhorst: it's not so much that it shouldn't be backported, but more that it's likely to trigger regressions and we should give it a test run in next
10:17 mlankhorst: mripard: shall I rebase without that patch?
12:00 mripard: no, I guess that's too late now
12:00 mripard: we'll see
12:01 mripard: it's not clear to me why we would want to cherry-pick fixes into drm-misc-next-fixes though if nobody asked for it, but it's kind of a separate discussion
13:38 zmike: DavidHeidelberg: I ran piglit and didn't see any new fails
14:57 DemiMarie: pq: In retrospect, was forbidding indefinite DMA fences the right decision?
15:16 pq: DemiMarie, I don't know. I suppose it is, as long as KMS has no way to cancel. Trying to show a buffer with a fence that never signals would require a reboot to fix, or something.
15:17 pq: to get the screen update again, I mean
15:17 pq: I have no idea of any other aspect.
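(As a concrete illustration of the point above: with the atomic KMS API, userspace attaches a sync_file fence to a plane via the IN_FENCE_FD property, and the kernel holds the flip until that fence signals, with no way to withdraw the commit afterwards. A minimal, hedged C sketch using libdrm follows; the helper, the object IDs and the omitted error handling are illustrative assumptions, not code from this discussion.)

    #include <stdint.h>
    #include <string.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>

    /* Look up a KMS property id by name on an object (illustrative helper). */
    static uint32_t prop_id_by_name(int fd, uint32_t obj_id, uint32_t obj_type,
                                    const char *name)
    {
        drmModeObjectProperties *props =
            drmModeObjectGetProperties(fd, obj_id, obj_type);
        uint32_t id = 0;

        for (uint32_t i = 0; props && i < props->count_props; i++) {
            drmModePropertyRes *p = drmModeGetProperty(fd, props->props[i]);
            if (p && !strcmp(p->name, name))
                id = p->prop_id;
            drmModeFreeProperty(p);
        }
        drmModeFreeObjectProperties(props);
        return id;
    }

    /* Queue a flip of fb_id onto plane_id, gated on fence_fd (a sync_file).
     * The commit completes only when the fence signals; KMS offers no way to
     * cancel it, so a fence that never signals blocks further screen updates. */
    int flip_with_fence(int fd, uint32_t plane_id, uint32_t crtc_id,
                        uint32_t fb_id, int fence_fd)
    {
        drmModeAtomicReq *req = drmModeAtomicAlloc();
        int ret;

        drmModeAtomicAddProperty(req, plane_id,
            prop_id_by_name(fd, plane_id, DRM_MODE_OBJECT_PLANE, "FB_ID"), fb_id);
        drmModeAtomicAddProperty(req, plane_id,
            prop_id_by_name(fd, plane_id, DRM_MODE_OBJECT_PLANE, "CRTC_ID"), crtc_id);
        drmModeAtomicAddProperty(req, plane_id,
            prop_id_by_name(fd, plane_id, DRM_MODE_OBJECT_PLANE, "IN_FENCE_FD"),
            (uint64_t)fence_fd);

        ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
        drmModeAtomicFree(req);
        return ret;
    }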
15:25 stsquad: I'm trying to get vkmark up and running on an arm64 platform with AMD but it refuses to detect anything (vulkaninfo works). Looking at strace I see: openat(AT_FDCWD, "/home/alex/src/mesa.git/build/src/amd/vulkan/libvulkan.so.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
15:25 stsquad: but looking at my build I see
15:25 stsquad: ./src/amd/vulkan/libvulkan_radeon.so
15:26 stsquad: so have I failed to build some component?
15:27 stsquad: I configured mesa with
15:27 stsquad: meson setup -Dvulkan-drivers=amd,virtio -Dgallium-drivers=radeonsi,r300,r600,virgl build
15:31 HdkR: stsquad: You need to do an install step to get the symlinks set up correctly
15:33 pendingchaos: libvulkan.so.1 is the Vulkan loader, not Mesa
15:33 sima: tursulin, looked at your wq question mail and decided that this is too hard ...
15:54 stsquad: HdkR: ahh, the docs suggested meson devenv should do the trick, or should this point at an install path?
16:34 MrCooper: DemiMarie pq: my understanding is the biggest issue isn't KMS but the fact that memory reclaim may wait for DMA fences to signal, so if that never happens, it could result in deadlock deep down in kernel memory management code
16:51 tursulin: sima: It is a bit hard. Just strikes me that if there is a reason why some have to use ordered and some can cope with concurrency, it would be good to document those criteria. Or figure out if something is even broken.
16:52 tursulin: If someone tackles the kthread_work(er) conversion it would be important to know.
17:10 colinmarc: @stsquad meson devenv should work for vulkan, I've used that before. you can try setting VK_LOADER_DEBUG=all. The error message is not complaining about libvulkan1.so missing, it's coming from that library complaining about something else
17:11 colinmarc: Err, ignore the second part, I read it incorrectly
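(For reference, a minimal sketch of the two approaches mentioned above; the build path matches the one in the strace output, but the exact ICD manifest name and whether meson's devenv exports the loader variables depend on the local Mesa build, so treat these as assumptions.)

    # Option 1: run inside meson's dev environment, which should point the
    # Vulkan loader at the freshly built driver
    meson devenv -C ~/src/mesa.git/build vkmark

    # Option 2: point the loader at the built radv ICD manifest by hand and
    # make it log its driver search
    export VK_ICD_FILENAMES=~/src/mesa.git/build/src/amd/vulkan/radeon_icd.aarch64.json
    VK_LOADER_DEBUG=all vkmark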
17:50 DemiMarie: MrCooper: why does memory reclaim need to wait for fence completion?
17:54 lynxeye: DemiMarie: Since the GPU driver pins the memory used for a job, freeing/moving those pages must wait for the fence signaling completion of that job.
17:54 DemiMarie: lynxeye: why is pinning necessary?
17:55 lynxeye: DemiMarie: because most GPUs and certainly the interactive rendering jobs don't like page faults
17:56 DemiMarie: lynxeye: why is that?
17:57 DemiMarie: I thought that anything that wanted hard realtime needed to use mlock().
17:58 DemiMarie: And mlock()’d memory is not eligible to be reclaimed.
18:00 lynxeye: DemiMarie: a GPU fundamentally is a device that is designed to get around the memory latency-bandwidth product issue by keeping a huge number of memory requests in flight. Having even a small percentage of those requests fault on translation errors that need CPU interaction to resolve would stop the whole machinery.
18:02 DemiMarie: lynxeye: because the CPU would not be able to keep up with the rate at which the GPU accesses data?
18:02 DemiMarie: Or because the GPU cannot keep making requests after a fault?
18:05 lynxeye: DemiMarie: The GPU might be able to make a bit of progress after a fault, but essentially page faults make a hugely parallel machine wait for a much less parallel machine (your CPU) to resolve the faults. Each fault just wastes resources on the GPU side, potentially keeping hundreds of shader cores idle while waiting for the fault to be resolved.
18:06 DemiMarie: lynxeye: I see. Why is demand paging of VRAM supported at all then?
18:08 DemiMarie: Also, was marking memory used by the GPU as indefinitely ineligible for reclaim considered? Compute workloads definitely do not guarantee completion in a bounded time.
18:09 lynxeye: because obviously there are workloads that can't fit the whole working set into VRAM
18:09 DemiMarie: Does compute need to use explicitly pinned buffers that are subject to RLIMIT_MEMLOCK and that the kernel does not attempt to reclaim?
18:09 DemiMarie: lynxeye: do these workloads have anything resembling decent performance?
18:12 lynxeye: For GPUs that don't have working page faults the GPU memory in the compute job effectively becomes mlocked for the duration of the job. Some of the big compute guys can and will do page faults, but the performance of the workload also really depends on faults being the exception rather than the norm.
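(To make the reclaim angle concrete: a hedged, schematic kernel-style sketch of a GEM shrinker that must wait for a buffer's fences before it can unpin and free the pages. demo_bo, demo_lru and demo_bo_purge are hypothetical names, not any real driver's code.)

    #include <linux/dma-resv.h>
    #include <linux/list.h>
    #include <linux/sched.h>
    #include <linux/shrinker.h>
    #include <drm/drm_gem.h>

    /* Hypothetical buffer object wrapper and LRU, purely for illustration. */
    struct demo_bo {
        struct drm_gem_object base;
        struct list_head lru;
    };

    static LIST_HEAD(demo_lru);
    static unsigned long demo_bo_purge(struct demo_bo *bo); /* hypothetical */

    static unsigned long demo_shrinker_scan(struct shrinker *shrinker,
                                            struct shrink_control *sc)
    {
        struct demo_bo *bo, *tmp;
        unsigned long freed = 0;

        list_for_each_entry_safe(bo, tmp, &demo_lru, lru) {
            /* The pages are pinned while the GPU job runs, so before they can
             * be freed we must wait for the fences in the buffer's reservation
             * object.  If one of them never signals, direct reclaim blocks
             * here, and with it whatever allocation triggered reclaim: the
             * deadlock that motivates forbidding indefinite fences. */
            if (dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_READ,
                                      false, MAX_SCHEDULE_TIMEOUT) <= 0)
                continue;

            freed += demo_bo_purge(bo);
            if (freed >= sc->nr_to_scan)
                break;
        }
        return freed;
    }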
18:34 stsquad: colinmarc: the vulkan-tools vkcube-wayland just crashes weston (I don't get much of a useful backtrace); vkmark can't figure out the window system with either kms or wayland
18:35 stsquad: loader debug doesn't show anything really
22:58 DavidHeidelberg: zmike: that's good, I think only the GL-CTS sort tests are failing now
23:00 DavidHeidelberg: if you had Matrix I would invite you to our room regarding the X11+EGL transp
23:14 zmike: DavidHeidelberg: I ran those and they passed assuming you're talking about the ones mentioned in the MR
23:25 DavidHeidelberg: Oh.. OK, which GPU?
23:25 DavidHeidelberg: Maybe some GPUs offer some format which breaks it OR the backport isn't enough, but I reproduced the problem even on the main branch I think
23:27 DavidHeidelberg: zmike: ^
23:41 zmike: DavidHeidelberg: tried multiple amd ones