07:49airlied: Venemo: hey you might know, do mesh shaders outputs get fixed function clipped like any outputs from the vert stages?
08:20javierm: danvet: I wrote on Friday the patch you suggested for mutter but have some issues with the compat layer, the damage clips are added to the plane's atomic state before committing
08:21javierm: danvet: I've added my findings in the MR https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2979, it would be great if you could take a look when you have some time
08:21javierm: tzimmermann: probably you can help here too ^
08:42bbrezillon: danvet: With drm_sched_resubmit_jobs() being deprecated, I wonder who's supposed to set the job's parent field back to something non-NULL before drm_sched_start() is called (those are set to NULL in drm_sched_stop(), when the fence callback is removed). Are drivers supposed to manually iterate over the pending_list to set it up?
08:48dj-death: any radv folk willing to glance over the NIR/Spirv bits of KHR_ray_tracing_position_fetch? : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22734
09:24javierm: emersion: ah, Zack did post a v2 that I missed before https://lore.kernel.org/all/50fd57193508f33a1e559ef74599c9e52764850d.camel@vmware.com/T/
09:30mareko: airlied: yes
09:31mareko: airlied: mesh shaders are just like vertex shaders, but are launched as compute shaders that must produce "VS outputs"
09:36emersion: javierm: ah, so what's missing?
09:38javierm: emersion: only a single thing that danvet mentioned is missing, I've just asked Zack on the mailing list if he plans to post a v3 or if we could help with that
09:38javierm: emersion: https://lore.kernel.org/all/YvPfedG%2FuLQNFG7e@phenom.ffwll.local/
09:39javierm: basically not using a DRIVER_VIRTUAL driver cap and instead adding a new plane type
09:42javierm: a DRM_PLANE_TYPE_VIRTUAL_CURSOR or something like that. Since, other than the cursor, the virtual machine KMS drivers behave per the uAPI contract
09:46javierm: emersion: but I'm very happy that this is almost ready to land, I first thought that we would need to implement the whole thing to get virtio-gpu out of the mutter atomic deny list
09:50emersion: hm, not sure a new plane type makes sense
09:50emersion: you'd also need the not-fun part: igt
10:12danvet: javierm, don't use dirtyfb and page_flip together
10:12danvet: dirtyfb is for frontbuffer rendering, no page flip
10:12danvet: and hence the fb check to make sure that you actually report damage for the current fb
10:12danvet: page_flip is for a new buffer and has an implicit damage of everything
10:13danvet: if you want both page flip and damage, you must use atomic
10:13danvet: javierm, I don't have an account there, please add this to the mr ^^^
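(As a concrete illustration of the atomic path danvet describes, a minimal userspace sketch, not from the discussion itself: it assumes libdrm, and that the caller has already looked up the plane's "FB_ID" and "FB_DAMAGE_CLIPS" property IDs and created the new framebuffer. The flip and its damage go into a single atomic commit.)

```c
/* Sketch: page flip + damage via the atomic API. Build with
 * `pkg-config --cflags --libs libdrm`; error handling trimmed. */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_mode.h>

static int flip_with_damage(int fd, uint32_t plane_id, uint32_t fb_id_prop,
                            uint32_t damage_prop, uint32_t new_fb)
{
	/* FB_DAMAGE_CLIPS is a blob of struct drm_mode_rect */
	struct drm_mode_rect clip = { .x1 = 0, .y1 = 0, .x2 = 64, .y2 = 64 };
	uint32_t blob_id;
	int ret = drmModeCreatePropertyBlob(fd, &clip, sizeof(clip), &blob_id);
	if (ret)
		return ret;

	drmModeAtomicReqPtr req = drmModeAtomicAlloc();
	drmModeAtomicAddProperty(req, plane_id, fb_id_prop, new_fb);   /* new buffer */
	drmModeAtomicAddProperty(req, plane_id, damage_prop, blob_id); /* its damage */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT, NULL);

	drmModeAtomicFree(req);
	drmModeDestroyPropertyBlob(fd, blob_id);
	return ret;
}
```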
10:14danvet: bbrezillon, I'm lost, or well not on top of sched discussions enough to have any idea of what you should do :-/
10:14danvet: emersion, btw on your hdr hackfest summary, unstable uabi isn't a thing in upstream
10:15emersion: hm
10:15danvet: all the "hide it behind knobs" we've done was for uapi we planned to keep forever, but weren't sure we have all the pieces and the kernel+userspace stack was bug-free enough to unleash it on users
10:15danvet: but uapi you plan to actually break/remove is a completely different kind of thing
10:16danvet: unless you are very careful about not leaking users, and make sure all users can fall back to something so you won't get regression reports
10:16danvet: it'll be a regression and the experimental uabi becomes rather permanent
10:16emersion: i'm personally in the "let's not merge vendor specific stuff" camp FWIW
10:17emersion: but this is something we wanted to discuss on-list
10:17emersion: maybe it's no big deal if it's stable
10:17danvet: I'm no fan of vendor kms props either
10:18emersion: harry seemed okay with maintaining this uAPI forever for this hw
10:18danvet: your writeup at least sounded like that's the compromise because you can remove it again
10:18danvet: and there's solid chances that's not how it'll play out
10:18emersion: yeah… that's how we wanted to compromise
10:18danvet: yeah if it's for some specific gpu just to get things going it might be ok
10:18danvet: yeah that's not really how it works :-)
10:19emersion: okay, good to know
10:19emersion: yup, it would just be for AMD DCN 3
10:19emersion: but then nothing stops somebody else from wanting to do the same thing
10:19emersion: or expand to more hw
10:19emersion: then things get into a meh state
10:20danvet: yeah I think we need really solid reasons why we really can't pull off a good per-plane color mgmt api without going vendor specific at first
10:20emersion: anyways, this is all mostly to deal with the generic uAPI being stuck
10:20emersion: i really hope we can un-stuck it now
10:21emersion: a lot of frustration seems to come from the cost of adding new uAPI
10:21emersion: (from me first tbh)
10:21emersion: we've also discussed mandatory vkms impl for new uAPI BTW
10:22emersion: (which is at odds with the above)
10:22emersion: (vkms not caught up enough for this to be practical just yet anyways)
10:23Venemo: airlied: yes, mesh shader output rasterization works exactly the same as any other
10:24javierm: danvet: ah, I see. So my misunderstanding was how the legacy KMS API should be used. Thanks for the clarification, I'll just drop that MR then and instead focus on using atomic for virtio-gpu
10:25danvet: emersion, yeah I think vkms for blending uapi would be nice, so that you can validate the igts against something everyone can use
10:25emersion: javierm: nice :)
10:27javierm: emersion, danvet: it was meant to be a temporary workaround but I understand now that legacy KMS + damage clipping + page flip isn't a supported combination
10:28javierm: emersion, danvet: about uAPI, I remember that in media/v4l2 at some point new features were merged but with the uAPI just not exposed
10:29danvet: yeah that'd be an option too
10:29javierm: that seems to be a good compromise to at least experiment and try to find patterns between the different drivers to define a generic uAPI
10:30emersion: uAPI not exposed?
10:30emersion: how does that work?
10:31javierm: emersion: people will need to run a patched kernel but it's better than keeping all the patches out-of-tree for a long time
10:31emersion: ah, so just a patch to #define ENABLE_WHATEVER basically?
10:31emersion: that's a bit weird
10:32javierm: emersion: it is, yeah. But better than getting stuck with a uAPI that was defined too early
10:32emersion: yea
10:33ccr: what is needed is a time machine. travel to future to acquire the perfect uAPI and see what mistakes were made, etc.
10:38bbrezillon: danvet: any idea who I should ask?
10:39danvet: könig? or whoever marked that function obsolete
10:40bbrezillon: also tried hooking up native fence support, as you suggested last time, and it's not clear to me when the core can check the FW seqno against the drm_sched_job->parent->seqno
10:41bbrezillon: danvet: yep, it's könig, but he doesn't seem to be on IRC
10:41danvet: yeah könig's not on irc
10:41bbrezillon: oh well, I'll write an email
11:24karolherbst: jenatali: if you have some time, mind reviewing the clc patch in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22776 ?
11:29gawin: karolherbst: where should that snippet for nv30 be put? have you sent a patch for the next stable release?
11:29karolherbst: I have no idea yet
11:30karolherbst: would have to figure out what the best place for it would be
11:30karolherbst: I've asked Ben tho
14:32swick[m]: emersion: you mean the adaptive sync CTS?
14:32emersion: yea
14:33emersion: there are more CTSes but haven't looked at the others
14:33swick[m]: not being able to read DisplayID from the dedicated bus doesn't seem to be a problem... I've not seen a single monitor exposing data there
14:33emersion: do you have a command to read everything from user-space via I2C?
14:35emersion: i tried to come up with one but pretty sure i got it wrong
14:37swick[m]: ugh, on a custom kernel without i2c-dev rn..
14:38swick[m]: I used i2cdetect and i2cget FWIW
14:39emersion: yea that's what i (tried to) use as well
14:42swick[m]: and EDDC specifies the addresses
14:52emersion: i'm trying `i2cget 3 0xA0 0x00 b 32` and it's not working
14:52emersion: "SET Segment 0, device A0h or A4h, start address 00h, READ 128 bytes"
15:11bbrezillon: mbrost__: Do you have a branch with the latest drm_sched-kthread-to-wq patches, or should I assume https://gitlab.freedesktop.org/drm/xe/kernel/-/commit/18b968ab6a98eb1801a385d6a6a82967f3ebb322 is the latest version?
15:12mbrost__: This my latest version of Xe + DRM scheduler changes: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-upstreaming/-/tree/upstream_prep?ref_type=heads
15:14mbrost__: I want to post another, non-RFC, rev of this soon too https://patchwork.freedesktop.org/series/116055/
15:15mbrost__: The branch I shared should be more or less Xe based on that RFC + feedback
15:16mbrost__: oh, I might have mis-understood your question... multitasking...
15:18mbrost__: the version you posted is close, in the branch I shared I allow the user to pass in the run_wq
15:19mbrost__: apply this on top... https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-upstreaming/-/commit/d55965795198583f222f2e191cd3edfbdfa64d6a
15:47swick[m]: emersion: what do you mean with not working? most displays just don't have anything there...
15:47emersion: i mean that the parameters i give to i2cget are invalid
15:47emersion: it complains about them being out of range
15:49swick[m]: there was a gotcha wrt the address but I already forgot
15:57swick[m]: emersion: ah yes, the address i2c tools wants is the one without the read/write bit, so A0h right shifted by one
15:59swick[m]: we should probably add a script to libdisplay-info to read the EDID and DisplayID blobs directly from i2c
15:59emersion: that would be handy
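(For reference, a rough sketch of what such a tool could do using the Linux i2c-dev interface, sidestepping i2cget's argument quirks. Needs CONFIG_I2C_DEV and appropriate permissions; the bus number is a placeholder, and per E-DDC the A0h device address becomes 0x50 once the read/write bit is stripped, as swick noted above.)

```c
/* Sketch: dump the first 128-byte EDID block over i2c-dev. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

int main(void)
{
	unsigned char offset = 0x00, edid[128];
	int fd = open("/dev/i2c-3", O_RDWR);	/* placeholder bus */

	/* 0xA0 >> 1 == 0x50: i2c-dev wants the 7-bit address */
	if (fd < 0 || ioctl(fd, I2C_SLAVE, 0x50) < 0)
		return 1;
	/* set start address 00h, then read one 128-byte block */
	if (write(fd, &offset, 1) != 1 ||
	    read(fd, edid, sizeof(edid)) != (ssize_t)sizeof(edid))
		return 1;
	fwrite(edid, 1, sizeof(edid), stdout);
	close(fd);
	return 0;
}
```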
16:00emersion: hm still getting the error
16:00emersion: Error: Chip address out of range (0x08-0x77)!
16:01emersion: maybe i'm not understanding what chip vs. data address is?
16:02swick[m]: i2cget 3 0x50 0x00 does not work?
16:03emersion: bleh sorry, i used 0x80 by mistake
16:04emersion: i get the expected "read failed"
16:04emersion: hm, but that's the same address for EDID though?
16:05emersion: the spec says, for EDID:
16:05emersion: "SET Segment 0, device A0h, start address 00h, READ 128 bytes"
16:05emersion: and for DisplayID:
16:05emersion: "SET Segment 0, device A0h or A4h, start address 00h, READ 128 bytes"
16:05emersion: now i'm confused
16:05bbrezillon: mbrost__: thanks, I think I had a version with the custom run_wq already
16:05emersion: i'd like to check that i can grab the EDID at least
16:05mbrost__: cool, that should be the latest
16:06swick[m]: emersion: A0h can contain either EDID or DisplayID, A4h can only contain DisplayID afaiu
16:06emersion: i see
16:06swick[m]: but A0h should be readable
16:06mbrost__: I'll probably get all drm sched changes out today on the list
16:06emersion: hm
16:11bbrezillon: mbrost__: drm_sched_main() still has a loop to dequeue jobs until there's a reason to stop dequeuing (dep not signaled, or all job slots filled). I was wondering if it wouldn't be better to dequeue one job at a time (or at least limit the number of jobs you dequeue) so that you don't block other work items scheduled on the same run_wq.
16:12bbrezillon: That's particularly important if we want to use the same ordered (single-threaded) workqueue for both the job dequeueing and tdr, to guarantee that nothing tries to queue stuff to the HW while we're resetting it
16:14bbrezillon: mbrost__: which is basically what ckönig suggested here https://lore.kernel.org/dri-devel/5c4f4e89-6126-7701-2023-2628db1b7caa@amd.com/
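(A hypothetical sketch of that scheme, for clarity: all names are invented, and it assumes a drm_sched variant that accepts a caller-provided run_wq, as in the branch above. Because the workqueue is ordered, at most one work item runs at a time, so the reset work can only execute once no submit work is in flight.)

```c
#include <linux/workqueue.h>

struct my_gpu {
	struct workqueue_struct *sched_wq;	/* shared by submission and reset */
	struct work_struct reset_work;
};

static void my_gpu_reset_worker(struct work_struct *work)
{
	/* reset the GPU/FW; on an ordered wq no drm_sched submit work
	 * can be executing concurrently, so nothing races the reset */
}

static int my_gpu_sched_init(struct my_gpu *gpu)
{
	/* ordered == single-threaded: one work item at a time */
	gpu->sched_wq = alloc_ordered_workqueue("my-gpu-sched", 0);
	if (!gpu->sched_wq)
		return -ENOMEM;
	INIT_WORK(&gpu->reset_work, my_gpu_reset_worker);
	/* pass gpu->sched_wq as run_wq to every drm_gpu_scheduler, and
	 * queue_work(gpu->sched_wq, &gpu->reset_work) on a global hang */
	return 0;
}
```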
16:33jenatali: Down to 41 CTS fails :)
17:04mbrost: bbrezillon, that is kinda tricky
17:05mbrost: give me a few and let see what it looks like
17:08mbrost: but let's say you use the system_wq, you can have as many cores on the machine running in parallel as long as each item is its own work_struct
17:08mbrost: the system_wq is just a wrapper for a pool of kthreads I think
17:10mbrost: so IMO I don't see each work_struct running in a loop while it has work as that big of a deal
17:11mbrost: Another option would be to pass in a unique run_wq to each scheduler, and now you have timeslicing
17:11mbrost: if you have more work_struct than cores...
17:12mbrost: or pass in a few ordered wq to many schedulers
17:14mbrost: but either way, let's prototype 1 dequeue per pass of the main loop if you think that would be better, I'm not going to argue this one either way
17:17bbrezillon: mbrost: so, multithreaded workqueue improves the situation, yes, but ckönig was actually suggesting using a single-threaded workqueue, so we don't have to call drm_sched_stop/start() to stop/start the drm_sched while we're resetting the GPU
17:19mbrost: well you can pass in any run_wq you want, but I'm confused how you could get away with not calling start / stop
17:19mbrost: the existing code calls start / stop on the kthread
17:20mbrost: we call start / stop in our reset flows and they're rock solid, esp when compared to the i915
17:21mbrost: do you mean the same ordered wq as the reset one / tdr?
17:22bbrezillon: well, if you have a single-threaded wq for your drm_sched workers, and you queue your reset worker to this queue, you're guaranteed that the reset function will only be executed when all drm_sched workers are idle
17:22bbrezillon: yes, that's what ckönig suggested, if I'm correct
17:23mbrost: yea that is true
17:24mbrost: nothing preventing you from doing that now but yea in that case i guess dequeuing 1 item would make a bit more sense
17:25mbrost: i don't think I want to design Xe to share a WQ though, calling start / stop is easy enough
17:25bbrezillon: mbrost: ok. I guess we need a replacement for drm_sched_job_resubmit() then...
17:26bbrezillon: didn't check what you were using in the Xe driver
17:26mbrost: and like I said our reset flows are rock solid, I've written tests to hammer these paths as I was super paranoid about this not working after working on the i915
17:26mbrost: we just kill the entity scheduler if it hangs
17:27bbrezillon: drm_sched_stop()
17:27bbrezillon: I guess
17:27mbrost: no recovery, I think this is faith's idea for all VM bind drivers
17:28mbrost: We have a buy in from our UMDs
17:28mbrost: 1 sec, i'll point you to our TDR callback
17:29bbrezillon: you mean remaining jobs are just cancelled?
17:29mbrost: https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L786
17:29mbrost: yea and entity is banned
17:31mbrost: that function should be pretty well commented
17:32bbrezillon: hm, okay. So that's for the 'one entity is going crazy, but the GPU and FW-proc are still functional' case
17:33mbrost: yes
17:33bbrezillon: we do have global hangs where we need to stop all entities, reset the GPU and FW, and start all entities (and by start, I mean kick the previously active entities at the FW level, and call drm_sched_start())
17:34mbrost: We also have GT resets (entire GPU is trashed) but in practice that should be impossible or very, very rare
17:34mbrost: in that case, we try to find the bad entity, ban it, resubmit the others
17:35bbrezillon: that's where synchronization gets messy, because the reset operation, which is always queued on a workqueue, has to wait for all drm_sched workers to be idle
17:36mbrost: A GT reset basically should only be triggered by junk hardware or if our KMD has a bug
17:36mbrost: yea, let me point you to our GT reset code...
17:37mbrost: https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1362
17:37mbrost: https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1401
17:38bbrezillon: mbrost: would you mind replying to Christian regarding the drm_sched_{stop,start}()? I feel like your opinions are diverging here, and more and more drivers keep depending on your work ;-)
17:38mbrost: yea we do have to loop over every entity and stop it
17:38bbrezillon: yep, that's what we were planning to do initially
17:38mbrost: there is nothing in the current interfaces stopping you from doing it either way
17:39bbrezillon: it's just that, with drm_sched_resubmit_jobs() being deprecated, I was a bit lost as to what drivers were supposed to do to reassign the drm_sched_job::parent fence
17:40mbrost: that is news to me it being deprecated
17:41mbrost: that seems like a unilateral decision, drivers should be allowed to do this either way
17:42bbrezillon: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3&id=5efbe6aa7a0e
17:43mbrost: is this merged?
17:43bbrezillon: it's in Linus' tree :)
17:43mbrost: ugh... I need to pay better attention to the list
17:44bbrezillon: was merged in 6.3-rc1 apparently
17:45mbrost: Xe doesn't need most of the nonsense in that function anyways
17:45mbrost: really all we need is to loop over pending jobs and call run_job
17:47bbrezillon: well, the only thing we'd need in the powervr driver is a loop re-assigning the parent fence, we don't even need to call run_job() (the ringbuf should already be filled and its content preserved, unless I'm mistaken)
17:47mbrost: ok maybe we just open code the resubmit
17:48bbrezillon: just felt odd to iterate over the pending_list manually
17:48mbrost: in Xe the parent fence would be the same anyways
17:48bbrezillon: but if that's how it's supposed to be done, I'm fine with that
17:48bbrezillon: yep, same here, we don't re-allocate the fence
17:49bbrezillon: it's the same object, same seqno
17:49mbrost: and ref count is still correct too
17:50bbrezillon: in any case, we should document what drivers are expected to do between the stop and start calls, because it's unclear right now
17:51bbrezillon: like, the bare minimum is to re-assign parent fences, otherwise pending jobs are dropped when drm_sched_start() is called
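(Roughly what such an open-coded replacement for the deprecated helper could look like, as a sketch only: whether the parent pointer lives on the job or on job->s_fence varies across kernel versions, and the refcounting follows the driver's run_job convention. Locking and error paths omitted.)

```c
/* Walk pending_list between drm_sched_stop() and drm_sched_start() and
 * re-attach a hardware fence to each job so drm_sched_start() doesn't
 * drop them. For FW fences that survive the reset (same object, same
 * seqno, as discussed below) this just restores the pointer. */
static void my_sched_resubmit(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *s_job;

	list_for_each_entry(s_job, &sched->pending_list, list) {
		struct dma_fence *fence = sched->ops->run_job(s_job);

		if (IS_ERR_OR_NULL(fence)) {
			s_job->s_fence->parent = NULL;
			continue;
		}
		s_job->s_fence->parent = dma_fence_get(fence);
		dma_fence_put(fence);	/* drop run_job's reference */
	}
}
```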
17:51mbrost: again I don't think we should force a driver to do anything
17:53mbrost: I'm confused by that last statement
17:53bbrezillon: well, it's not about forcing them to do anything, but if you call drm_sched_start() after having kicked the queue, your driver is likely to end up with use-after-free bugs
17:53mbrost: in Xe you can run_job as many times as you want you always get the same fence back
17:53bbrezillon: at least that's what happens if you keep in-flight jobs in some internal list/queue
17:54mbrost: which is assigned to the parent
17:55mbrost: it is a fence which looks at a memory location for a value to be greater than or equal to the job's seqno
17:55bbrezillon: oh, you're talking about this statement https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler/sched_main.c#L562
17:55bbrezillon: yep, same here: u32 mem object containing the seqno
17:57bbrezillon: and the dma_fence is just a wrapper around that, with dma_fence::seqno being checked against the FW fence object seqno
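(For the archives, a loose sketch of the kind of FW-seqno fence being described here; the names are made up and the memory-mapping details are driver specific.)

```c
#include <linux/dma-fence.h>
#include <linux/io.h>

struct fw_fence {
	struct dma_fence base;
	u32 __iomem *fw_seqno;	/* u32 the FW bumps as jobs complete */
};

static bool fw_fence_signaled(struct dma_fence *f)
{
	struct fw_fence *fence = container_of(f, struct fw_fence, base);

	/* "greater or equal", wraparound-safe, so one counter covers
	 * every job on the ring */
	return (s32)(readl(fence->fw_seqno) - (u32)f->seqno) >= 0;
}

static const char *fw_fence_name(struct dma_fence *f)
{
	return "fw-fence";
}

static const struct dma_fence_ops fw_fence_ops = {
	.get_driver_name = fw_fence_name,
	.get_timeline_name = fw_fence_name,
	.signaled = fw_fence_signaled,
};
```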
17:57bbrezillon: run_job() does queue things to the ringbuf though
17:57mbrost: i think I understand what you are getting at
17:58bbrezillon: but we can easily add a "submitted" boolean to avoid resubmitting it
17:59bbrezillon: and even if we were actually resubmitting, that's not a big deal, as long as we reset the ringbuf pointer before kicking the queue
17:59mbrost: we stop the signaling of fences returned from run_job in our reset flows
17:59mbrost: xe_hw_fence_irq_stop
17:59mbrost: xe_hw_fence_irq_start
18:03mbrost: resets are confusing
18:04mbrost: if it would help I can post a kernel doc patch explaining how Xe does these and avoids races
18:04mbrost: I might have already done this at one point...
18:07bbrezillon: dunno, I really thought everyone was on-board with this drm_sched_resubmit_jobs() deprecation
18:08bbrezillon: looks like it's not really the case...
18:13mbrost: If you do a GT reset (think of it as a power cycle) and you don't want to lose all pending work, run_job needs to be called again
18:13mbrost: on all pending jobs
18:14mbrost: get rid of drm_sched_resubmit_jobs() fine, Xe will just have a loop where it calls run job
18:16bbrezillon: then what's the point of deprecating it, if people keep open-coding it...
18:16mbrost: yea that's what i'm confused about
18:17bbrezillon: I mean, I can code it differently, and optimize the resubmit logic, but why should I bother, it's not a hot-path anyway
18:17mbrost: how would a pending job complete after a power cycle unless you stick it back on the hardware
18:21bbrezillon: tried reading amdgpu code, and I don't quite get how they do it...
18:31mbrost: i might have lost a few comments
18:34mbrost: 'implement recovery after a job timeout' - yea that isn't how Xe is using it
18:34mbrost: Xe uses it after the entire device is reset
21:05mbrost: bbrezillon: drm sched changes we discussed: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-upstreaming/-/commit/2cdb17a975e12e9206b5a9c433581699c67c619f
21:05anholt_: Have you wished that piglit didn't take so long? I have an MR for you! https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/804
21:05mbrost: Seems to work just fine, it would be fun if we had some benchmarks to see if this made a difference
21:08jenatali: anholt_: <3
21:09zmike: anholt++
21:12anholt_: I swear we need to have static int called = 0; assert(called++ < 1000) in piglit_probe_pixel.
21:14jenatali: Sounds like a good idea to me
21:37DemiMarie: danvet emersion: What about hiding the uAPI behind BROKEN?
21:37DemiMarie: I assume no distro will ship a CONFIG_BROKEN=y kernel, and if they do (or patch out the dependency) they get to keep both pieces.
21:40emersion: DemiMarie: what is CONFIG_BROKEN?
21:49DemiMarie: emersion: It is a catchall, never-enabled Kconfig entry used to disable broken code (hence the name). If there is a Kconfig that should never actually be set, one can use `depends on BROKEN` to make sure it is in fact never set.
21:51emersion: i see
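(The Kconfig pattern DemiMarie describes would look something like this; the symbol names are invented.)

```kconfig
config DRM_FOO_EXPERIMENTAL_UAPI
	bool "foo experimental uAPI (never enabled)"
	depends on DRM_FOO && BROKEN
	help
	  Exposes in-development uAPI. Since BROKEN can never be set,
	  enabling this requires patching the dependency out, i.e. a
	  deliberately non-production kernel.
```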
23:21airlied: zmike: you any clues on the lvp memory model fails that CI is seeing?
23:33penguin42: TToTD: If you hit a PROFILING_INFO_NOT_AVAILABLE it might be because you missed a wait() on an event - it took me a while to figure that out
23:33penguin42: especially since ROCm didn't complain about it
23:34zmike: airlied: I had a ticket where I was tracking all the known fails
23:34zmike: if it's not in there I have no idea
23:38airlied: ah there was a flake lodged by DavidHeidelberg[m]
23:41DavidHeidelberg[m]: ?
23:44airlied: you logged a lavapipe flake a while back, seems to be more prevalent now, will have to make the effort to track it down I suppose
23:44airlied: zmike: just saw 20520 fell through the cracks
23:46DavidHeidelberg[m]: right
23:46jenatali: Anybody know who I should ping to get opinions on !22800?
23:47airlied: jenatali: whoever wrote your qsort :-P
23:48jenatali: Heh, blame the C standard authors for not requiring it to be stable :P
23:50zmike: airlied: oh I forgot about that
23:50zmike: I'm not actually sure it's correct now that I look again
23:52airlied: Venemo, mareko : do mesh shaders interact with transform feedback?
23:53airlied: ah d3d12 doesn't seem to support it at least
23:53jenatali: Right
23:56jenatali: Huh, actually having working caches seems to have taken ~20 minutes off my CTS run time. I'll take it
23:56jenatali: Maybe it means I can bump the CI factor from 4 to 3...
23:59zmike: airlied: no, there is no xfb with mesh