09:25 Company: Lynne: if you're interested, I'd value your input on https://gitlab.gnome.org/GNOME/gtk/-/issues/6673#note_2099143 - and if it's right, how to handle it (or YUV dmabuf configuration in general)
09:27 Company: I don't even know if chroma should be interpolated with linear or nearest
09:42 Lynne: linear, always
09:43 Company: so would that be a driver bug then?
09:44 Lynne: I'm not sure what happens, chroma should be untouched until it comes time to convert the image to rgb
09:45 Company: in any case: If it's always linear, we don't need to have a property to set the filter
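For context: when the YUV dmabuf ends up being sampled through Vulkan's sampler YCbCr conversion, the filter under discussion is the chromaFilter field. A minimal sketch, assuming an NV12-style two-plane image; the format, colour model, range and chroma siting below are illustrative, not taken from the linked issue:

```c
#include <vulkan/vulkan.h>

/* Illustrative only: fix chroma interpolation to linear rather than exposing
 * it as a property, per the "linear, always" advice above. */
static const VkSamplerYcbcrConversionCreateInfo conv_info = {
	.sType         = VK_STRUCTURE_TYPE_SAMPLER_YCBCR_CONVERSION_CREATE_INFO,
	.format        = VK_FORMAT_G8_B8R8_2PLANE_420_UNORM,   /* e.g. NV12 */
	.ycbcrModel    = VK_SAMPLER_YCBCR_MODEL_CONVERSION_YCBCR_709,
	.ycbcrRange    = VK_SAMPLER_YCBCR_RANGE_ITU_NARROW,
	.components    = { VK_COMPONENT_SWIZZLE_IDENTITY, VK_COMPONENT_SWIZZLE_IDENTITY,
	                   VK_COMPONENT_SWIZZLE_IDENTITY, VK_COMPONENT_SWIZZLE_IDENTITY },
	.xChromaOffset = VK_CHROMA_LOCATION_COSITED_EVEN,
	.yChromaOffset = VK_CHROMA_LOCATION_COSITED_EVEN,
	.chromaFilter  = VK_FILTER_LINEAR,    /* "linear, always" */
	.forceExplicitReconstruction = VK_FALSE,
};
/* vkCreateSamplerYcbcrConversion(device, &conv_info, NULL, &conversion); */
```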
12:54 robclark: sima, I'm wondering how the wq-based drm sched doesn't cause fence signaling path issues.. i.e. isn't it possible for the thread that work gets run on to be blocked in reclaim for some unrelated work?
15:45 sima: robclark, it's supposed to be multi-threaded enough
15:45 sima: but yeah the driver can block itself, or its wq
15:48 robclark: sima: but, I think we want something with dedicated thread(s), not something that shares threads with other parts of the kernel
15:51 sima: robclark, wq maintainer consensus is a very strong "nope" because everyone creating their dedicated threads causes massive overhead
15:51 sima: so now the wq subsystem folds them all together and forks new ones as needed
15:51 sima: or at least that's the promise
15:52 robclark: right, but forking a new one requires allocating memory, so we can't rely on that
15:52 robclark: idk if we could use kthread_work somehow.. but current thing seems broken
15:52 sima: robclark, hm, not sure anyone thought about that one before, since it's rather common to do a lot of stuff with wq ..
15:53 sima: robclark, tdr was also a work item, so we had that issue already beforehand
15:53 sima: maybe there's some "this is needed for reclaim" flag for creating the wq?
15:54 robclark: idk, there is a highpriority flag, but idk if that will really help
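For reference, these are the two existing alloc_workqueue() flags being alluded to; a minimal sketch with the workqueue name and variables purely illustrative. WQ_HIGHPRI only moves the work to the higher-priority worker pool, while WQ_MEM_RECLAIM is the flag that exists for "needed for reclaim" users: it pre-creates a dedicated rescuer thread so the queue can make forward progress even when new pool workers can't be forked.

```c
#include <linux/workqueue.h>

static struct workqueue_struct *submit_wq;	/* illustrative name */

static int example_sched_wq_init(void)
{
	/*
	 * WQ_MEM_RECLAIM: a rescuer thread is allocated up front, so work
	 * can still run when the kernel cannot fork new pool workers.
	 * WQ_HIGHPRI: run on the high-priority (lower nice) worker pool.
	 */
	submit_wq = alloc_workqueue("drm-sched-submit",
				    WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!submit_wq)
		return -ENOMEM;
	return 0;
}
```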
15:54 sima: robclark, the much bigger issue is that I think all drm/sched drivers still allocate in their ->run callback, so that's much worse
15:54 robclark: hmm, I don't
15:54 sima: and I've kinda not gotten around to that for years ...
15:54 sima: well at least back when I checked last time around they were all splatting nicely
15:55 robclark: I'm not quite to the point of enabling drm/sched fence signaling annotations, but that should be the goal
15:55 sima: yeah
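The annotations being talked about here are dma_fence_begin_signalling()/dma_fence_end_signalling(). A minimal sketch of how a driver might wrap its job-submission path; the function and variable names are made up:

```c
#include <linux/dma-fence.h>

static void example_push_job(struct dma_fence *done_fence)
{
	/*
	 * Everything between begin/end is a fence-signalling critical
	 * section: lockdep is taught that memory allocation and other
	 * reclaim-tainted locks are not allowed in here, so offenders
	 * splat instead of deadlocking rarely at runtime.
	 */
	bool cookie = dma_fence_begin_signalling();

	/* ... write the job to the ring; no GFP_KERNEL allocations ... */

	dma_fence_signal(done_fence);
	dma_fence_end_signalling(cookie);
}
```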
15:55 robclark: yeah, there were some indirect issues with runpm/interconnect/etc
15:55 robclark: some of my fixes for that have landed
15:55 sima: just kinda thinking that we should get to that first before pondering whether wq itself is good enough ...
15:55 sima: yeah I remember all the rpm discussion, good fun
15:55 robclark: most of them weren't _real_ issues, just the same lock tangled up in runtime vs startup allocations
15:56 robclark: whereas the wq issue is I think a legit deadlock that we can hit
15:56 sima: robclark, yeah ... otoh I'm hungry and it's time to prep some food :-)
15:56 sima: ttyl
15:57 robclark: (also, gabertron was seeing some jank issues traced back to the wq migration... which is kinda what got me wondering if using a wq is actually fundamentally sound)
16:00 sima: robclark, yeah maybe we need to ask really strongly for dedicated thread again
16:00 sima: like I think it can still be on-demand allocated, but it must not be freed as long as something is queued up
16:01 robclark: maybe we can ask for a thread that is lockdep annotated for reclaim
16:01 sima: otoh that's only good enough for classic end-of-batch dma_fence jobs and not the preempt variant and all the other guc stuff xe needs to push through
16:01 sima: robclark, there's enough for that already, if you add dma_fence annotations
16:01 robclark: ofc then we have to finish fixing all the fence signaling annotation issues, but it would be a good way of catching anyone using the thread to do allocations
16:01 sima: the issue is what happens for on-demand thread allocation ...
16:02 sima: since that's not covered by the current wq annotations, we can't toss in our own annotations (or just do stuff like fs_reclaim annotations directly)
16:02 robclark: yeah, we'd have to have threads already existing.. but one or a small # might be sufficient
16:05 sima: robclark, I think (but am very unsure) that just one emergency thread per system, which does nothing but process dma_fence work items, should be enough
16:07 robclark: I think so
16:07 robclark: if the wq maintainers don't want to accept a drm-specific WQ flag, we could pitch it as WQ_RECLAIM_DISALLOWED and have the annotation at the wq level
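The "fs_reclaim annotations directly" option mentioned a few lines up commonly means priming lockdep once at init time. A rough sketch under that assumption, with the lock name invented for illustration: after priming, any GFP_KERNEL allocation done while holding the lock produces a lockdep splat rather than a hard-to-hit deadlock.

```c
#include <linux/gfp.h>
#include <linux/mutex.h>
#include <linux/sched/mm.h>

static DEFINE_MUTEX(example_submit_lock);	/* illustrative */

static void example_prime_reclaim_annotations(void)
{
	/*
	 * Pretend, once, that the submit lock is taken from the reclaim
	 * path.  Lockdep records the reclaim -> lock dependency, so any
	 * later allocation under the lock closes the cycle and splats.
	 */
	fs_reclaim_acquire(GFP_KERNEL);
	mutex_lock(&example_submit_lock);
	mutex_unlock(&example_submit_lock);
	fs_reclaim_release(GFP_KERNEL);
}
```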
16:16 tursulin: robclark, sima: kthread_work is what I mentioned in the early days of drm sched wq rework
16:16 tursulin: memory is hazy, but I thought it would mean less of a latency regression while still being able to allocate one or many workers as per driver needs
16:18 tursulin: robclark, sima: also, if you can spare the time, https://lore.kernel.org/dri-devel/20240403182951.724488-1-adrian.larumbe@collabora.com/ could use a 2nd+ opinion on binary vs text for a list of drm clients in sysfs
16:25 robclark: tursulin: if we can make kthread_work work, I think that would be the best option
16:26 robclark: tursulin: re: fdinfo sysfs, I think the best excuse I can think of for wanting text over binary is scripting
16:26 sima: tursulin, uh I think that needs an ack from gregkh or we really can't just land it
16:27 sima: the entire idea I mean, irrespective of the binary vs ascii part
16:30 robclark: hmm, gregkh wasn't cc'd... that should probably be corrected when next version is sent
16:30 robclark: but we defn need something like this
16:31 sima: robclark, yeah but maybe in procfs instead?
16:31 sima: like at least to me sysfs feels very wrong
16:33 robclark: hmm, perhaps.. it is kinda proc related
16:36 tursulin: or, wait for it, drmfs of some sort
16:37 robclark: idk about drmfs... sounds a bit like encouraging drivers to create too much uabi ;-)
16:38 robclark: mbrost, tursulin: so I'm wondering, would it work as a wq alternative to just have the driver pass in a kthread_worker? Then drivers could choose one kthread_worker per scheduler (to match the old behavior) or a fixed #? Like I guess guc has a fixed # of priority levels, so one kthread_worker per priority level seems like it should work?
16:39 daniels: alarumbe: ^
16:40 tursulin: robclark: haha maybe on drmfs.. could be handy for per client stuff like debug streams, profiling, who knows
16:41 tursulin: kthread_work I think should be possible as almost a drop-in replacement. It's been a long time since those discussions, so maybe I am forgetting something, but I don't think so
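A rough sketch of the kthread_work shape being discussed, with all names illustrative: the worker (and its thread) is created once at init time, so queueing later never allocates and nothing else in the kernel can run work on that thread. One worker per scheduler, or one per GuC priority level, is then just a question of how many of these get created.

```c
#include <linux/err.h>
#include <linux/kthread.h>

struct example_sched {				/* illustrative */
	struct kthread_worker *worker;
	struct kthread_work    submit_work;
};

static void example_submit_fn(struct kthread_work *work)
{
	/* push queued jobs to the hardware */
}

static int example_sched_init(struct example_sched *s)
{
	/* Dedicated thread created up front; no on-demand forking later. */
	s->worker = kthread_create_worker(0, "drm-sched");
	if (IS_ERR(s->worker))
		return PTR_ERR(s->worker);

	kthread_init_work(&s->submit_work, example_submit_fn);
	return 0;
}

/* later, from the scheduler: kthread_queue_work(s->worker, &s->submit_work); */
```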
16:43 alarumbe: hi, I was preparing a v3 of the drm clients patch series, but this time around I'll be using a specific ioctl
16:43 robclark: hmm, I think I like the procfs (or sysfs) better than ioctl
16:44 alarumbe: I realised sysfs is not really the way to go, because the list of PIDs could potentially exceed the size of a single page, and binary attributes don't really work either, because they're meant for binary data that doesn't change between read() calls
16:45 alarumbe: I thought of using the private field in the file pointer to store all the PIDs when calling show() with offset=0, but then we wouldn't know when to free it because we've no control over the release() method for the file
16:46 alarumbe: I've a working implementation using an ioctl that somehow feels more natural than all the ugly hacks I thought of on top of the sysfs attribute interface
16:47 mbrost: robclark: I missed the beginning of this conversation, what is the issue with a wq?
16:48 alarumbe: I'd rather still keep the ioctl output binary because there's already a debugfs file that gives the list of clients, and I thought this new thing was meant to service the needs of UM profilers
16:54 robclark: mbrost: a wq shares threads w/ other drivers, so no guarantee that the thread isn't blocked on reclaim
16:55 mbrost: even a dedicated ordered work queue?
16:55 mbrost: e.g. one from alloc_ordered_workqueue
16:55 robclark: I believe so.. but I haven't dug completely thru the wq code
16:56 mbrost: that doesn't seem right but if it is, then yea that is a problem
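For reference, a minimal sketch of the dedicated ordered workqueue mbrost mentions (names illustrative). alloc_ordered_workqueue() gives strictly in-order execution (one item at a time), but the work still executes on threads from the shared unbound worker pools; adding WQ_MEM_RECLAIM is what provides a pre-created rescuer thread as the forward-progress guarantee under memory pressure.

```c
#include <linux/workqueue.h>

static struct workqueue_struct *example_ordered_wq;	/* illustrative */

static int example_ordered_wq_init(void)
{
	/* Ordered (one item at a time) + rescuer for memory-reclaim safety. */
	example_ordered_wq = alloc_ordered_workqueue("example-submit",
						     WQ_MEM_RECLAIM);
	if (!example_ordered_wq)
		return -ENOMEM;
	return 0;
}
```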
16:57 robclark: alarumbe: tbh I'd be fine if we define the procfs as just listing the first N clients.. at some point a viz tool will have a hard time showing an unbounded # of processes
16:58 alarumbe: I thought about that, 'why would anyone need to list more than 1024 DRM clients simultaneously on an ncurses utility'
16:58 mbrost: did this pop up in 6.9? We had some issues hitting deadlocks in Xe due to our workqueue usage / workqueue changes in 6.9
16:58 alarumbe: but then after trying to cram as many pids as possible into a single page, I thought maybe DRM ioctl would do a better job
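Purely to illustrate the ioctl direction alarumbe describes: this is not the uAPI from the series above, and every name here is hypothetical. A two-pass query sidesteps both the one-page sysfs limit and any fixed cap on the number of clients.

```c
#include <linux/types.h>

/*
 * Hypothetical sketch, NOT the interface from the linked series: call once
 * with count_clients = 0 to learn how many clients exist, allocate a buffer,
 * then call again to fill it.
 */
struct example_drm_get_clients {
	__u64 pids_ptr;		/* user pointer to an array of __u32 PIDs */
	__u32 count_clients;	/* in: array capacity, out: clients available */
	__u32 pad;		/* must be zero */
};
```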
16:59 mbrost: the wq changes in 6.9 I'm referring to: https://www.phoronix.com/news/Linux-6.9-Workqueue
17:05 mbrost: we abused the system_wq (or system_unbound_wq) in Xe and deadlocked ourselves, fixed it by using an ordered work queue
17:05 mbrost: let me try to find a link to that thread
17:08 mbrost: https://lore.kernel.org/intel-xe/20240328182147.4169656-1-matthew.brost@intel.com/
17:10 mbrost: If you have wq questions, maybe ask Tejun Heo. He was very responsive when we pinged him.
17:14 sima: mbrost, robclark yeah I think dedicated ordered wq with maybe a flag really should work
17:14 sima: or wq code needs to be fixed
17:16 mbrost: the system work queues are definitely shared, that is how we deadlocked ourselves in Xe
17:53 robclark: mbrost: I am not sure we've actually seen a deadlock, but gabertron was mentioning janks, I _think_ with v6.9
20:56 sima: mbrost, yeah I think even with all the reworks system_wq still doesn't guarantee any isolation against other stuff that's scheduled there