07:49airlied: Venemo: hey you might know, do mesh shaders outputs get fixed function clipped like any outputs from the vert stages?
08:20javierm: danvet: I wrote on Friday the patch you suggested for mutter but have some issues with the compat layer, the damage clips are added to the plane's atomic state before committing
08:21javierm: danvet: I've added my findings in the MR https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2979, it would be great if you could take a look when you have some time
08:21javierm: tzimmermann: probably you can help here too ^
08:42bbrezillon: danvet: With drm_sched_resubmit_jobs() being deprecated, I wonder who's supposed to set the job's parent field back to something non-NULL before drm_sched_start() is called (those are set to NULL in drm_sched_stop(), when the fence callback is removed). Are drivers supposed to manually iterate over the pending_list to set it up?
08:48dj-death: any radv folk willing to glance over the NIR/Spirv bits of KHR_ray_tracing_position_fetch? : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22734
09:24javierm: emersion: ah, Zack did post a v2 that I missed before https://lore.kernel.org/all/50fd57193508f33a1e559ef74599c9e52764850d.camel@vmware.com/T/
09:30mareko: airlied: yes
09:31mareko: airlied: mesh shaders are just like vertex shaders, but are launched as compute shaders that must produce "VS outputs"
09:36emersion: javierm: ah, so what's missing?
09:38javierm: emersion: only a single thing that danvet mentioned is missing, I've just asked Zack on the mailing list if he plans to post a v3 or if we could help with that
09:38javierm: emersion: https://lore.kernel.org/all/YvPfedG%2FuLQNFG7e@phenom.ffwll.local/
09:39javierm: basically not using a DRIVER_VIRTUAL driver cap and instead adding a new plane type
09:42javierm: a DRM_PLANE_TYPE_VIRTUAL_CURSOR or something like that. Since, other than the cursor, the virtual machine KMS drivers behave per the uAPI contract
09:46javierm: emersion: but I'm very happy that this is almost ready to land, I first thought that we would need to implement the whole thing to get virtio-gpu out of the mutter atomic deny list
09:50emersion: hm, not sure a new plane type makes sense
09:50emersion: you'd also need the not-fun part: igt
10:12danvet: javierm, don't use dirtyfb and page_flip together
10:12danvet: dirtyfb is for frontbuffer rendering, no page flip
10:12danvet: and hence the fb check to make sure that you actually report damage for the current fb
10:12danvet: page_flip is for a new buffer and has an implicit damage of everything
10:13danvet: if you want both page flip and damage, you must use atomic
10:13danvet: javierm, I don't have an account there, please add this to the mr ^^^
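(As a concrete illustration of the atomic path danvet describes, a minimal userspace sketch, not from the discussion itself: it assumes libdrm, and that the caller has already looked up the plane's "FB_ID" and "FB_DAMAGE_CLIPS" property IDs and created the new framebuffer. The flip and its damage go into a single atomic commit.)

```c
/* Sketch: page flip + damage via the atomic API. Build with
 * `pkg-config --cflags --libs libdrm`; error handling trimmed. */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_mode.h>

static int flip_with_damage(int fd, uint32_t plane_id, uint32_t fb_id_prop,
                            uint32_t damage_prop, uint32_t new_fb)
{
	/* FB_DAMAGE_CLIPS is a blob of struct drm_mode_rect */
	struct drm_mode_rect clip = { .x1 = 0, .y1 = 0, .x2 = 64, .y2 = 64 };
	uint32_t blob_id;
	int ret = drmModeCreatePropertyBlob(fd, &clip, sizeof(clip), &blob_id);
	if (ret)
		return ret;

	drmModeAtomicReqPtr req = drmModeAtomicAlloc();
	drmModeAtomicAddProperty(req, plane_id, fb_id_prop, new_fb);   /* new buffer */
	drmModeAtomicAddProperty(req, plane_id, damage_prop, blob_id); /* its damage */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_PAGE_FLIP_EVENT, NULL);

	drmModeAtomicFree(req);
	drmModeDestroyPropertyBlob(fd, blob_id);
	return ret;
}
```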
10:14danvet: bbrezillon, I'm lost, or well not on top of sched discussions enough to have any idea of what you should do :-/
10:14danvet: emersion, btw on your hdr hackfest summary, unstable uabi isn't a thing in upstream
10:15emersion: hm
10:15danvet: all the "hide it behind knobs" we've done was for uapi we planned to keep forever, but weren't sure we have all the pieces and the kernel+userspace stack was bug-free enough to unleash it on users
10:15danvet: but uapi you plan to actually break/remove is a completely different kind of thing
10:16danvet: unless you are very careful about not leaking users, and make sure all users can fall back to something so you won't get regression reports
10:16danvet: it'll be a regression and the experimental uabi becomes rather permanent
10:16emersion: i'm personally in the "let's not merge vendor specific stuff" camp FWIW
10:17emersion: but this is something we wanted to discuss on-list
10:17emersion: maybe it's no big deal if it's stable
10:17danvet: I'm no fan of vendor kms props either
10:18emersion: harry seemed okay with maintaining this uAPI forever for this hw
10:18danvet: your writeup at least sounded like that's the compromise because you can remove it again
10:18danvet: and there's solid chances that's not how it'll play out
10:18emersion: yeah… that's how we wanted to compromise
10:18danvet: yeah if it's for some specific gpu just to get things going it might be ok
10:18danvet: yeah that's not really how it works :-)
10:19emersion: okay, good to know
10:19emersion: yup, it would just be for AMD DCN 3
10:19emersion: but then nothing stops somebody else from wanting to do the same thing
10:19emersion: or expand to more hw
10:19emersion: then things get into a meh state
10:20danvet: yeah I think we need really solid reasons why we really can't pull off a good per-plane color mgmt api without going vendor specific at first
10:20emersion: anyways, this is all mostly to deal with the generic uAPI being stuck
10:20emersion: i really hope we can un-stuck it now
10:21emersion: a lot of frustration seems to come from the cost of adding new uAPI
10:21emersion: (from me first tbh)
10:21emersion: we've also discussed mandatory vkms impl for new uAPI BTW
10:22emersion: (which is at odds with the above)
10:22emersion: (vkms not caught up enough for this to be practical just yet anyways)
10:23Venemo: airlied: yes, mesh shader output rasterization works exactly the same as any other
10:24javierm: danvet: ah, I see. So my misunderstanding was how the legacy KMS API should be used. Thanks for the clarification, I'll just drop that MR then and instead focus on using atomic for virtio-gpu
10:25danvet: emersion, yeah I think vkms for blending uapi would be nice, so that you can validate the igts against something everyone can use
10:25emersion: javierm: nice :)
10:27javierm: emersion, danvet: it was meant to be a temporary workaround but I understand now that legacy KMS + damage clipping + page flip isn't a supported combination
10:28javierm: emersion, danvet: about uAPI, I remember that in media/v4l2 at some point new features were merged but with the uAPI just not exposed
10:29danvet: yeah that'd be an option too
10:29javierm: that seems to be a good compromise to at least experiment and try to find patterns between the different drivers to define a generic uAPI
10:30emersion: uAPI not exposed?
10:30emersion: how does that work?
10:31javierm: emersion: people will need to run a patched kernel but it's better than keeping all the patches out-of-tree for a long time
10:31emersion: ah, so just a patch to #define ENABLE_WHATEVER basically?
10:31emersion: that's a bit weird
10:32javierm: emersion: it is, yeah. But better than getting stuck with a uAPI that was defined too early
10:32emersion: yea
10:33ccr: what is needed is a time machine. travel to future to acquire the perfect uAPI and see what mistakes were made, etc.
10:38bbrezillon: danvet: any idea who I should ask?
10:39danvet: könig? or whoever marked that function obsolete
10:40bbrezillon: also tried hooking up native fence support, as you suggested last time, and it's not clear to me when the core can check the FW seqno against the drm_sched_job->parent->seqno
10:41bbrezillon: danvet: yep, it's könig, but he doesn't seem to be on IRC
10:41danvet: yeah könig's not on irc
10:41bbrezillon: oh well, I'll write an email
11:24karolherbst: jenatali: if you have some time, mind reviewing the clc patch in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22776 ?
11:29gawin: karolherbst: where should that snippet for nv30 be put? have you sent a patch for the next stable release?
11:29karolherbst: I have no idea yet
11:30karolherbst: would have to figure out what the best place for it would be
11:30karolherbst: I've asked Ben tho
14:32swick[m]: emersion: you mean the adaptive sync CTS?
14:32emersion: yea
14:33emersion: there are more CTSes but haven't looked at the others
14:33swick[m]: not being able to read DisplayID from the dedicated bus doesn't seem to be a problem... I've not seen a single monitor exposing data there
14:33emersion: do you have a command to read everything from user-space via I2C?
14:35emersion: i tried to come up with one but pretty sure i got it wrong
14:37swick[m]: ugh, on a custom kernel without i2c-dev rn..
14:38swick[m]: I used i2cdetect and i2cget FWIW
14:39emersion: yea that's what i (tried to) use as well
14:42swick[m]: and EDDC specifies the addresses
14:52emersion: i'm trying `i2cget 3 0xA0 0x00 b 32` and it's not working
14:52emersion: "SET Segment 0, device A0h or A4h, start address 00h, READ 128 bytes"
15:11bbrezillon: mbrost__: Do you have a branch with the latest drm_sched-kthread-to-wq patches, or should I assume https://gitlab.freedesktop.org/drm/xe/kernel/-/commit/18b968ab6a98eb1801a385d6a6a82967f3ebb322 is the latest version?
15:12mbrost__: This my latest version of Xe + DRM scheduler changes: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-upstreaming/-/tree/upstream_prep?ref_type=heads
15:14mbrost__: I want to post another, non-RFC, rev of this soon too https://patchwork.freedesktop.org/series/116055/
15:15mbrost__: The branch I shared should be more or less Xe based on that RFC + feedback
15:16mbrost__: oh, I might have mis-understood your question... multitasking...
15:18mbrost__: the version you posted is close, in the branch I shared I allow the user to pass in the run_wq
15:19mbrost__: apply this on top... https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-upstreaming/-/commit/d55965795198583f222f2e191cd3edfbdfa64d6a
15:47swick[m]: emersion: what do you mean with not working? most displays just don't have anything there...
15:47emersion: i mean that the parameters i give to i2cget are invalid
15:47emersion: it complains about them being out of range
15:49swick[m]: there was a gotcha wrt the address but I already forgot
15:57swick[m]: emersion: ah yes, the address i2c tools wants is the one without the read/write bit, so A0h right shifted by one
15:59swick[m]: we should probably add a script to libdisplay-info to read the EDID and DisplayID blobs directly from i2c
15:59emersion: that would be handy
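(For reference, a rough sketch of what such a tool could do using the Linux i2c-dev interface, sidestepping i2cget's argument quirks. Needs CONFIG_I2C_DEV and appropriate permissions; the bus number is a placeholder, and per E-DDC the A0h device address becomes 0x50 once the read/write bit is stripped, as swick noted above.)

```c
/* Sketch: dump the first 128-byte EDID block over i2c-dev. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/i2c-dev.h>

int main(void)
{
	unsigned char offset = 0x00, edid[128];
	int fd = open("/dev/i2c-3", O_RDWR);	/* placeholder bus */

	/* 0xA0 >> 1 == 0x50: i2c-dev wants the 7-bit address */
	if (fd < 0 || ioctl(fd, I2C_SLAVE, 0x50) < 0)
		return 1;
	/* set start address 00h, then read one 128-byte block */
	if (write(fd, &offset, 1) != 1 ||
	    read(fd, edid, sizeof(edid)) != (ssize_t)sizeof(edid))
		return 1;
	fwrite(edid, 1, sizeof(edid), stdout);
	close(fd);
	return 0;
}
```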
16:00emersion: hm still getting the error
16:00emersion: Error: Chip address out of range (0x08-0x77)!
16:01emersion: maybe i'm not understanding what chip vs. data address is?
16:02swick[m]: i2cget 3 0x50 0x00 does not work?
16:03emersion: bleh sorry, i used 0x80 by mistake
16:04emersion: i get the expected "read failed"
16:04emersion: hm, but that's the same address for EDID though?
16:05emersion: the spec says, for EDID:
16:05emersion: "SET Segment 0, device A0h, start address 00h, READ 128 bytes"
16:05emersion: and for DisplayID:
16:05emersion: "SET Segment 0, device A0h or A4h, start address 00h, READ 128 bytes"
16:05emersion: now i'm confused
16:05bbrezillon: mbrost__: thanks, I think I had a version with the custom run_wq already
16:05emersion: i'd like to check that i can grab the EDID at least
16:05mbrost__: cool, that should be the latest
16:06swick[m]: emersion: A0h can contain either EDID or DisplayID, A4h can only contain DisplayID afaiu
16:06emersion: i see
16:06swick[m]: but A0h should be readable
16:06mbrost__: I'll probably get all drm sched changes out today on the list
16:06emersion: hm
16:11bbrezillon: mbrost__: drm_sched_main() still has a loop to dequeue jobs until there's a reason to stop dequeuing (dep not signaled, or all job slots filled). I was wondering if it wouldn't be better to dequeue one job at a time (or at least limit the number of jobs you dequeue) so that you don't block other work items scheduled on the same run_wq.
16:12bbrezillon: That's particularly important if we want to use the same ordered (single-threaded) workqueue for both the job dequeueing and tdr, to guarantee that nothing tries to queue stuff to the HW while we're resetting it
16:14bbrezillon: mbrost__: which is basically what ckönig suggested here https://lore.kernel.org/dri-devel/5c4f4e89-6126-7701-2023-2628db1b7caa@amd.com/
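(A hypothetical sketch of that scheme, for clarity: all names are invented, and it assumes a drm_sched variant that accepts a caller-provided run_wq, as in the branch above. Because the workqueue is ordered, at most one work item runs at a time, so the reset work can only execute once no submit work is in flight.)

```c
#include <linux/workqueue.h>

struct my_gpu {
	struct workqueue_struct *sched_wq;	/* shared by submission and reset */
	struct work_struct reset_work;
};

static void my_gpu_reset_worker(struct work_struct *work)
{
	/* reset the GPU/FW; on an ordered wq no drm_sched submit work
	 * can be executing concurrently, so nothing races the reset */
}

static int my_gpu_sched_init(struct my_gpu *gpu)
{
	/* ordered == single-threaded: one work item at a time */
	gpu->sched_wq = alloc_ordered_workqueue("my-gpu-sched", 0);
	if (!gpu->sched_wq)
		return -ENOMEM;
	INIT_WORK(&gpu->reset_work, my_gpu_reset_worker);
	/* pass gpu->sched_wq as run_wq to every drm_gpu_scheduler, and
	 * queue_work(gpu->sched_wq, &gpu->reset_work) on a global hang */
	return 0;
}
```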
16:33jenatali: Down to 41 CTS fails :)
17:04mbrost: bbrezillon, that is kinda tricky
17:05mbrost: give me a few and let see what it looks like
17:08mbrost: but let's say you use the system_wq, you can have as many cores on the machine running in parallel as long as each item is its own work_struct
17:08mbrost: the system_wq is just a wrapper for a pool of kthreads I think
17:10mbrost: so IMO I don't see each work_struct running in a loop while it has work as that big of a deal
17:11mbrost: Another option would be to pass in a unique run_wq to each scheduler, and now you have timeslicing
17:11mbrost: if you have more work_struct than cores...
17:12mbrost: or pass in a few ordered wq to many schedulers
17:14mbrost: but either way, let's prototype 1 dequeue per pass of the main loop if you think that would be better, I'm not going to argue this one either way
17:17bbrezillon: mbrost: so, multithreaded workqueue improves the situation, yes, but ckönig was actually suggesting using a single-threaded workqueue, so we don't have to call drm_sched_stop/start() to stop/start the drm_sched while we're resetting the GPU
17:19mbrost: well you can pass in any run_wq you want, but I'm confused how you could get away with not calling start / stop
17:19mbrost: the existing code calls start / stop on the kthread
17:20mbrost: we call start / stop in our reset flows and they're rock solid, esp when compared to the i915
17:21mbrost: do you mean the same ordered wq as the reset one / tdr?
17:22bbrezillon: well, if you have a single-threaded wq for your drm_sched workers, and you queue your reset worker to this queue, you're guaranteed that the reset function will only be executed when all drm_sched workers are idle
17:22bbrezillon: yes, that's what ckönig suggested, if I'm correct
17:23mbrost: yea that is true
17:24mbrost: nothing preventing you from doing that now but yea in that case i guess dequeuing 1 item would make a bit more sense
17:25mbrost: i don't think I want to design Xe to share a WQ though, calling start / stop is easy enough
17:25bbrezillon: mbrost: ok. I guess we need a replacement for drm_sched_job_resubmit() then...
17:26bbrezillon: didn't check what you were using in the Xe driver
17:26mbrost: and like I said our reset flows are rock solid, I've written tests to hammer these paths as I was super paranoid about this not working after working on the i915
17:26mbrost: we just kill the entity scheduler if it hangs
17:27bbrezillon: drm_sched_stop()
17:27bbrezillon: I guess
17:27mbrost: no recovery, I think this is faith's idea for all VM bind drivers
17:28mbrost: We have a buy in from our UMDs
17:28mbrost: 1 sec, i'll point you to our TDR callback
17:29bbrezillon: you mean remaining jobs are just cancelled?
17:29mbrost: https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L786
17:29mbrost: yea and entity is banned
17:31mbrost: that function should be pretty well commented
17:32bbrezillon: hm, okay. So that's for the 'one entity is going crazy, but the GPU and FW-proc are still functional' case
17:33mbrost: yes
17:33bbrezillon: we do have global hangs where we need to stop all entities, reset the GPU and FW, and start all entities (and by start, I mean kick the previously active entities at the FW level, and call drm_sched_start())
17:34mbrost: We also have GT resets (entire GPU is trashed) but in practice that should be impossible or very, very rare
17:34mbrost: in that case, we try to find the bad entity, ban it, resubmit the others
17:35bbrezillon: that's where synchronization gets messy, because the reset operation, which is always queued on a workqueue, has to wait for all drm_sched workers to be idle
17:36mbrost: A GT reset basically should only be triggered by junk hardware or if our KMD has a bug
17:36mbrost: yea, let me point you to our GT reset code...
17:37mbrost: https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1362
17:37mbrost: https://gitlab.freedesktop.org/drm/xe/kernel/-/blob/drm-xe-next/drivers/gpu/drm/xe/xe_guc_submit.c#L1401
17:38bbrezillon: mbrost: would you mind replying to Christian regarding the drm_sched_{stop,start}()? I feel like your opinions are diverging here, and more and more drivers keep depending on your work ;-)
17:38mbrost: yea we do have to loop over every entity and stop it
17:38bbrezillon: yep, that's what we were planning to do initially
17:38mbrost: there is nothing in the current interfaces stopping you from doing it either way
17:39bbrezillon: it's just that, with drm_sched_resubmit_jobs() being deprecated, I was a bit lost as to what drivers were supposed to do to reassign the drm_sched_job::parent fence
17:40mbrost: that is news to me it being deprecated
17:41mbrost: that seems like a unilateral decision, drivers should be allowed to do this either way
17:42bbrezillon: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3&id=5efbe6aa7a0e
17:43mbrost: is this merged?
17:43bbrezillon: it's in Linus' tree :)
17:43mbrost: ugh... I need to pay better attention to the list
17:44bbrezillon: was merged in 6.3-rc1 apparently
17:45mbrost: Xe doesn't need most of the nonsense in that function anyways
17:45mbrost: really all we need is to loop over pending jobs and call run_job
17:47bbrezillon: well, the only thing we'd need in the powervr driver is a loop re-assigning the parent fence, we don't even need to call run_job() (the ringbuf should already be filled and its content preserved, unless I'm mistaken)
17:47mbrost: ok maybe we just open code the resubmit
17:48bbrezillon: just felt odd to iterate over the pending_list manually
17:48mbrost: in Xe the parent fence would be the same anyways
17:48bbrezillon: but if that's how it's supposed to be done, I'm fine with that
17:48bbrezillon: yep, same here, we don't re-allocate the fence
17:49bbrezillon: it's the same object, same seqno
17:49mbrost: and ref count is still correct too
17:50bbrezillon: in any case, we should document what drivers are expected to do between the stop and start calls, because it's unclear right now
17:51bbrezillon: like, the bare minimum is to re-assign parent fences, otherwise pending jobs are dropped when drm_sched_start() is called
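(Roughly what such an open-coded replacement for the deprecated helper could look like, as a sketch only: whether the parent pointer lives on the job or on job->s_fence varies across kernel versions, and the refcounting follows the driver's run_job convention. Locking and error paths omitted.)

```c
/* Walk pending_list between drm_sched_stop() and drm_sched_start() and
 * re-attach a hardware fence to each job so drm_sched_start() doesn't
 * drop them. For FW fences that survive the reset (same object, same
 * seqno, as discussed below) this just restores the pointer. */
static void my_sched_resubmit(struct drm_gpu_scheduler *sched)
{
	struct drm_sched_job *s_job;

	list_for_each_entry(s_job, &sched->pending_list, list) {
		struct dma_fence *fence = sched->ops->run_job(s_job);

		if (IS_ERR_OR_NULL(fence)) {
			s_job->s_fence->parent = NULL;
			continue;
		}
		s_job->s_fence->parent = dma_fence_get(fence);
		dma_fence_put(fence);	/* drop run_job's reference */
	}
}
```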
17:51mbrost: again I don't think we should force a driver to do anything
17:53mbrost: I'm confused by that last statement
17:53bbrezillon: well, it's not about forcing them to do anything, but if you call drm_sched_start() after having kicked the queue, your driver is likely to end up with use-after-free bugs
17:53mbrost: in Xe you can run_job as many times as you want you always get the same fence back
17:53bbrezillon: at least that's what happens if you keep in-flight jobs in some internal list/queue
17:54mbrost: which is assigned to the parent
17:55mbrost: it is a fence which looks at a memory location for a value to be greater than or equal to the job's seqno
17:55bbrezillon: oh, you're talking about this statement https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler/sched_main.c#L562
17:55bbrezillon: yep, same here: u32 mem object containing the seqno
17:57bbrezillon: and the dma_fence is just a wrapper around that, with dma_fence::seqno being checked against the FW fence object seqno
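(For the archives, a loose sketch of the kind of FW-seqno fence being described here; the names are made up and the memory-mapping details are driver specific.)

```c
#include <linux/dma-fence.h>
#include <linux/io.h>

struct fw_fence {
	struct dma_fence base;
	u32 __iomem *fw_seqno;	/* u32 the FW bumps as jobs complete */
};

static bool fw_fence_signaled(struct dma_fence *f)
{
	struct fw_fence *fence = container_of(f, struct fw_fence, base);

	/* "greater or equal", wraparound-safe, so one counter covers
	 * every job on the ring */
	return (s32)(readl(fence->fw_seqno) - (u32)f->seqno) >= 0;
}

static const char *fw_fence_name(struct dma_fence *f)
{
	return "fw-fence";
}

static const struct dma_fence_ops fw_fence_ops = {
	.get_driver_name = fw_fence_name,
	.get_timeline_name = fw_fence_name,
	.signaled = fw_fence_signaled,
};
```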
17:57bbrezillon: run_job() does queue things to the ringbuf though
17:57mbrost: i think I understand what you are getting at
17:58bbrezillon: but we can easily add a "submitted" boolean to avoid resubmitting it
17:59bbrezillon: and even if we were actually resubmitting, that's not a big deal, as long as we reset the ringbuf pointer before kicking the queue
17:59mbrost: we stop the signaling of fences returned from run_job in our reset flows
17:59mbrost: xe_hw_fence_irq_stop
17:59mbrost: xe_hw_fence_irq_start
18:03mbrost: resets are confusing
18:04mbrost: if it would help I can post a kernel doc patch explaining how Xe does these and avoids races
18:04mbrost: I might have already done this at one point...
18:07bbrezillon: dunno, I really thought everyone was on-board with this drm_sched_resubmit_jobs() deprecation
18:08bbrezillon: looks like it's not really the case...
18:13mbrost: If you do a GT reset (think of it as a power cycle) and you don't want to lose all pending work, run_job needs to be called again
18:13mbrost: on all pending jobs
18:14mbrost: get rid of drm_sched_resubmit_jobs() fine, Xe will just have a loop where it calls run job
18:16bbrezillon: then what's the point of deprecating it, if people keep open-coding it...
18:16mbrost: yea that's what i'm confused about
18:17bbrezillon: I mean, I can code it differently, and optimize the resubmit logic, but why should I bother, it's not a hot-path anyway
18:17mbrost: how would a pending job complete after a power cycle unless you stick it back on the hardware
18:21bbrezillon: tried reading amdgpu code, and I don't quite get how they do it...
18:31mbrost: i might have lost a few comments
18:34mbrost: 'implement recovery after a job timeout' - yea that isn't how Xe is using it
18:34mbrost: Xe uses it after the entire device is reset
21:05mbrost: bbrezillon: drm sched changes we discussed: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-upstreaming/-/commit/2cdb17a975e12e9206b5a9c433581699c67c619f
21:05anholt_: Have you wished that piglit didn't take so long? I have an MR for you! https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/804
21:05mbrost: Seems to work just fine, it would be fun if we had some benchmarks to see if this made a difference
21:08jenatali: anholt_: <3
21:09zmike: anholt++
21:12anholt_: I swear we need to have static int called = 0; assert(called++ < 1000) in piglit_probe_pixel.
21:14jenatali: Sounds like a good idea to me
21:37DemiMarie: danvet emersion: What about hiding the uAPI behind BROKEN?
21:37DemiMarie: I assume no distro will ship a CONFIG_BROKEN=y kernel, and if they do (or patch out the dependency) they get to keep both pieces.
21:40emersion: DemiMarie: what is CONFIG_BROKEN?
21:49DemiMarie: emersion: It is a catchall, never-enabled Kconfig entry used to disable broken code (hence the name). If there is a Kconfig that should never actually be set, one can use `depends on BROKEN` to make sure it is in fact never set.
21:51emersion: i see
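(The Kconfig pattern DemiMarie describes would look something like this; the symbol names are invented.)

```kconfig
config DRM_FOO_EXPERIMENTAL_UAPI
	bool "foo experimental uAPI (never enabled)"
	depends on DRM_FOO && BROKEN
	help
	  Exposes in-development uAPI. Since BROKEN can never be set,
	  enabling this requires patching the dependency out, i.e. a
	  deliberately non-production kernel.
```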
23:21airlied: zmike: you any clues on the lvp memory model fails that CI is seeing?
23:33penguin42: TToTD: If you hit a PROFILING_INFO_NOT_AVAILABLE it might be because you missed a wait() on an event - it took me a while to figure that out
23:33penguin42: especially since ROCm didn't complain about it
23:34zmike: airlied: I had a ticket where I was tracking all the known fails
23:34zmike: if it's not in there I have no idea
23:38airlied: ah there was a flake lodged by DavidHeidelberg[m]
23:41DavidHeidelberg[m]: ?
23:44airlied: you logged a lavapipe flake a while back, seems to be more prevalent now, will have to make the effort to track it down I suppose
23:44airlied: zmike: just saw 20520 fell through the cracks
23:46DavidHeidelberg[m]: right
23:46jenatali: Anybody know who I should ping to get opinions on !22800?
23:47airlied: jenatali: whoever wrote your qsort :-P
23:48jenatali: Heh, blame the C standard authors for not requiring it to be stable :P
23:50zmike: airlied: oh I forgot about that
23:50zmike: I'm not actually sure it's correct now that I look again
23:52airlied: Venemo, mareko : do mesh shaders interact with transform feedback?
23:53airlied: ah d3d12 doesn't seem to support it at least
23:53jenatali: Right
23:56jenatali: Huh, actually having working caches seems to have taken ~20 minutes off my CTS run time. I'll take it
23:56jenatali: Maybe it means I can bump the CI factor from 4 to 3...
23:59zmike: airlied: no, there is no xfb with mesh