09:11kode54: eesh, someone posting badly formatted HTML mails to the amd-gfx list
09:12kode54: in which they take an HTML mail with their original message indented and someone replying to them inline, don't indent it further, and post their replies as "<first name>: <reply>"
09:13kode54: the indenting is also completely discarded by the plaintext version their mailer sent
09:15kode54: why should I not be surprised to find this: <meta name="Generator" content="Microsoft Word 15 (filtered medium)">
12:15mareko: the days of plain text email are gone
12:17superkuh: Be the change you want to see in the world.
12:22mareko: tell that to politicians
12:22Venemo: hahaha
12:25superkuh: runs his own mailserver that accepts mail from *everyone* and sends plain text.
19:29Venemo: agd5f: sorry Alex, but I still don't get your reply on GitLab
19:29Venemo: maybe I'm asking the wrong questions
19:29agd5f: Windows uses user queues so each process has at least one dedicated queue that is used only by that process
19:30Venemo: how is that different from user queues on Linux?
19:30agd5f: it's not that fundamentally different. Just a lot of work to get there.
19:31Venemo: I don't understand what you are telling me. a few days ago I understood your replies on issue 11759 as "it's impossible to implement proper isolation using user queues"
19:32Venemo: but now you are telling me that it's possible, just not done yet?
19:33agd5f: you can't use user queues as kernel queues. User queues are not a drop in replacement for kernel queues
19:33Venemo: I also don't understand how windows solves the issue that you outlined
19:33Venemo: how does their kernel submit anything to the gfx ring without hitting the same problem
19:34agd5f: with kernel queues, there is, for example, 1 global gfx queue and all processes submit work to the kernel and the kernel schedules those jobs on that single queue.
19:34Venemo: yes, I understand it thus far
19:34agd5f: each job submitted to the kernel queue has a vmid associated with it so that it operates within its own GPU virtual address space.
19:35agd5f: with user queues, you need 1 queue per user process because only that process can use that queue
19:35Venemo: yes
19:36agd5f: user queues require adapting the whole stack to that programming model.
19:36Venemo: I imagined that we'd have 1 (or more) queue per process on Linux as well, so I don't see the problem yet
19:37agd5f: with kernel queues there is only 1 queue for all processes.
19:37agd5f: all apps' jobs funnel into 1 queue
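
A toy C sketch of the kernel-queue model agd5f describes above: one global gfx ring owned by the kernel, every process submitting through the same kernel entry point, and each job tagged with that process's VMID for address-space isolation. This is an illustration only, not actual amdgpu code; all names are made up.

    /* toy model: one kernel-owned gfx ring shared by all processes */
    #include <stdio.h>

    #define RING_SIZE 8

    struct job {
        int vmid;   /* GPU virtual address space the job runs in */
        int cmd;    /* stand-in for a command buffer */
    };

    static struct job gfx_ring[RING_SIZE];  /* the single global gfx queue */
    static int ring_head;

    /* every process goes through this one entry point; the kernel serializes them */
    static void kernel_submit(int process_vmid, int cmd)
    {
        gfx_ring[ring_head % RING_SIZE] = (struct job){ .vmid = process_vmid, .cmd = cmd };
        ring_head++;
    }

    int main(void)
    {
        kernel_submit(1, 100);   /* process A */
        kernel_submit(2, 200);   /* process B, isolated by its own VMID */
        kernel_submit(1, 101);   /* process A again */

        for (int i = 0; i < ring_head; i++)
            printf("ring slot %d: vmid=%d cmd=%d\n", i, gfx_ring[i].vmid, gfx_ring[i].cmd);
        return 0;
    }
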
19:39Venemo: right
19:40Venemo: does Windows also have kernel queues or does it rely on just user queues?
19:40agd5f: for gfx11 and newer, it uses user queues
19:40agd5f: for older parts, they use kernel queues
19:41Venemo: okay
19:41Venemo: but you said earlier that this is impossible on Linux because of an isolation issue
19:42Venemo: so how come Windows doesn't have that issue?
19:42agd5f: it's impossible to use user queues as kernel queues due to the isolation issue. I.e., you can't have 1 user queue and submit work from all applications to that single queue
19:42agd5f: user queues have to be per process
19:43Venemo: okay, let me stop you there. so we are basically now talking about compatibility with old user space that doesn't know user queues yet vs. a hypothetical new kernel that only has user queues and still wants to be compatible with those old processes. do I understand that correctly?
19:45agd5f: you can mix the two within the limits of the hardware. E.g., if the hardware has 2 hardware queue threads, you can use one for a kernel queue and 1 for all user queues.
19:45agd5f: and the user queues would be dynamically mapped in and out of that 1 queue thread by the MES firmware
19:46Venemo: okay, so you're saying that it's impossible to expose both hardware queue threads (I assume this is the same concept as a "ring") on user queues, as long as we want to also expose a kernel queue?
19:47agd5f: right. You are limited to two hardware queue threads; those can be used by 2 kernel queues, 1 kernel queue and X user queues, or X user queues
19:47Venemo: okay, I think I am with you thus far
19:48agd5f: so if we were to enable two kernel queues, we could not enable any user queues because all of the threads would be full
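
A tiny worked example of that budget, under the assumptions stated above: two hardware queue threads, each kernel queue permanently pinning one thread, and any number of user queues sharing a single thread via MES. The names are illustrative, not driver API.

    #include <stdbool.h>
    #include <stdio.h>

    #define HW_QUEUE_THREADS 2

    /* a kernel queue costs one thread; all user queues together cost one thread */
    static bool config_fits(int kernel_queues, bool want_user_queues)
    {
        int threads_needed = kernel_queues + (want_user_queues ? 1 : 0);
        return threads_needed <= HW_QUEUE_THREADS;
    }

    int main(void)
    {
        printf("2 kernel queues, no user queues: %s\n", config_fits(2, false) ? "fits" : "no room");
        printf("1 kernel queue + user queues:    %s\n", config_fits(1, true)  ? "fits" : "no room");
        printf("2 kernel queues + user queues:   %s\n", config_fits(2, true)  ? "fits" : "no room");
        return 0;
    }
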
19:49Venemo: now, my question is, why couldn't the kernel take care of those old applications by creating a user queue for them under the hood? that would still mean 1 user queue / process
19:50agd5f: it wouldn't be backwards compatible at the IOCTL level. The UMD has to allocate the memory for the queue and its metadata
19:50agd5f: since it has to be in the user's GPU address space
19:51Venemo: but the kernel is responsible for memory management, isn't it? so I don't see why it couldn't allocate something in that process's GPU address space
19:53agd5f: It could, but then, we'd need some way to tell userspace not to use ranges that the kernel allocated for the queue and we'd have to prepopulate memory for all of the available queues for every process.
19:53Venemo: ah, that sounds tedious
19:54agd5f: FWIW, here's the current user queue kernel code: https://gitlab.freedesktop.org/contactshashanksharma/userq-integration/-/commits/integration-staging?ref_type=heads
19:58Venemo: so, basically in order to stay compatible (at least on already existing HW), the kernel has to have a kernel queue, and therefore there is no way to expose the two gfx rings for user queues
19:58Venemo: is that a correct understanding of the situation?
20:00agd5f: we need to always have a kernel queue for backwards compat. If we were to add a second kernel queue, that would prevent us from ever enabling user queues.
20:00Venemo: got it
20:00Venemo: as a workaround to this situation, could the kernel allocate the kernel queues on-demand?
20:01Venemo: so when every userspace process uses user queues, it wouldn't need to allocate a kernel queue
20:02agd5f: I guess in theory, although the overhead of mapping and unmapping the kernel queue(s) may be significant
20:02Venemo: I think that would be an acceptable compromise
20:04Venemo: what do you think about that?
20:05agd5f: I'm not sure it would be worth the effort. It would be a fair amount of work to implement and then I think the performance would probably suck if you ended up needing a mix of kernel and user queues
20:06Venemo: :(
20:07Venemo: okay, so in order to get access to two different GFX rings on GFX11, we (userspace) would hypothetically need to use both the kernel queue and a user queue?
20:08Venemo: that sounds a bit excessive to be honest
20:08Venemo: but if there is no other way...
20:08agd5f: for user queues, you could create two user queues with different priorities
20:08Venemo: right but those would still run on the same gfx ring, right? so they wouldn't be as good as two queues on two different rings
20:09airlied: also are the kernel queues kernel wide or per process?
20:09airlied: so if one process needs a second kernel queue, does that stop others getting user queues?
20:09agd5f: the user queues have priority levels
20:09Venemo: airlied: yes, Alex already explained that a few messages above
20:09Venemo: airlied: if the kernel exposed two kernel queues, there wouldn't be any room for any user queues
20:10airlied: but I'm wondering about the dynamic approach
20:10agd5f: and the MES firmware will schedule/preempt across the user queues to meet the priority levels
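
For context, a hedged Vulkan sketch of one way a client could ask for two graphics queues at different priorities, which a driver built on user queues could in principle map onto two user queues with different MES priority levels. physical_device and gfx_family are assumed to come from the usual instance/queue-family enumeration done elsewhere; nothing here is claimed to be what RADV actually does.

    /* build: cc two_gfx_queues.c -lvulkan */
    #include <vulkan/vulkan.h>
    #include <stdio.h>

    VkDevice create_device_with_two_gfx_queues(VkPhysicalDevice physical_device,
                                               uint32_t gfx_family)
    {
        /* relative priorities within the family: 1.0 = high, 0.5 = normal */
        float priorities[2] = { 1.0f, 0.5f };

        VkDeviceQueueCreateInfo queue_info = {
            .sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO,
            .queueFamilyIndex = gfx_family,
            .queueCount       = 2,      /* requires the family to advertise >= 2 queues */
            .pQueuePriorities = priorities,
        };

        VkDeviceCreateInfo device_info = {
            .sType                = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO,
            .queueCreateInfoCount = 1,
            .pQueueCreateInfos    = &queue_info,
        };

        VkDevice device = VK_NULL_HANDLE;
        if (vkCreateDevice(physical_device, &device_info, NULL, &device) != VK_SUCCESS) {
            fprintf(stderr, "vkCreateDevice failed\n");
            return VK_NULL_HANDLE;
        }

        /* fetch both queues; the app would submit latency-sensitive work to queue 0 */
        VkQueue high_prio, normal_prio;
        vkGetDeviceQueue(device, gfx_family, 0, &high_prio);
        vkGetDeviceQueue(device, gfx_family, 1, &normal_prio);
        return device;
    }
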
20:10Venemo: okay
20:10airlied: like instantiating a second kernel queue if userspace asks for it, but if that blocks user queues for other processes it wouldn't be workable
20:11Venemo: airlied: [22:05:03] <agd5f> I'm not sure it would be worth the effort. It would be a fair amount of work to implement and then I think the performance would probably suck if you ended up needing a mix of kernel and user queues
20:11agd5f: yeah, you'd have to wait for the kernel queue to drain and then unmap it before you could start scheduling user queues
20:11agd5f: kernel queues are not preemptable
20:12Venemo: IMO it would still be a workable approach though, because it's very unlikely that userspace would actually need both
20:12airlied: well the question is how would we trickle the multi queue exposure up to apps
20:13airlied: perhaps if it was a drirc thing it might be doable, or keyed off some vulkan app identifiers
20:13airlied: otherwise it would be a container pain in the ass if an old driver in a flatpak kept asking for 2 queues for no reason
20:13Venemo: if we go with this approach, then either the user has new Mesa which would use user queues only; or old mesa which only uses kernel queues. I find it extremely unlikely that somebody would run two FSR3 games at the same time using different Mesa versions, and if they do, they deserve slow perf
20:13airlied: the question is how to have the mesa request multiple queues
20:14Venemo: with user queues
20:14airlied: if I have a mesa that wants kernel queues, how does it know when to ask for them
20:14Venemo: I mean, that mesa already exists, it would be the same as it is right now, no?
20:14airlied: or do we trust that if vulkan exposes them apps will only use one anyways except for fsr3
20:15Venemo: oh okay, I see your dilemma
20:15airlied: you don't want some containerised app with older mesa always asking for two queues from the kernel when the rest of your userspace is on new shiny user queues
20:15agd5f: gfx10 already exposes 2 gfx queues. but the high priority one requires supervisor. You could drop that bit and use that to see how things perform with two kernel queues
20:16Venemo: assuming that user queues are implemented in a timely manner, I'd be okay to only expose more than 1 gfx queue from mesa on gfx11 only when user queues are supported
20:17Venemo: in a previous discussion, I think 6.13 was mentioned as a target, that's only a couple of months from now, isn't it?
20:19agd5f: what about software priorities? The GPU scheduler already supports priority levels
20:19agd5f: would that help?
20:19Venemo: agd5f: are you now talking about user or kernel queues?
20:19agd5f: kernel queues
20:20agd5f: so when the jobs are picked off the scheduling entities, the higher priority jobs would get picked first
20:21agd5f: create two contexts in the UMD and then set the ctx priority higher for one of them
20:22agd5f: and submit the high priority work to the higher priority context
20:22Venemo: would those two contexts share the same GPU address space?
20:22agd5f: the GPU scheduler should take that into account when the jobs are scheduled on the hw
20:22agd5f: yes
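
A hedged sketch of the software-priority path agd5f suggests here, using the libdrm amdgpu wrappers: two contexts created in the same process (so they share one GPU address space), one of them with a higher priority, so the kernel GPU scheduler picks its jobs first. The render-node path is illustrative and error handling is trimmed; high priority may be subject to permission checks on some parts.

    /* build: cc ctx_prio.c $(pkg-config --cflags --libs libdrm_amdgpu) */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <amdgpu.h>
    #include <amdgpu_drm.h>

    int main(void)
    {
        int fd = open("/dev/dri/renderD128", O_RDWR);   /* render node; path may differ */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        uint32_t major, minor;
        amdgpu_device_handle dev;
        if (amdgpu_device_initialize(fd, &major, &minor, &dev)) {
            fprintf(stderr, "amdgpu_device_initialize failed\n");
            return 1;
        }

        /* two contexts, same process, same GPU VM: one normal, one high priority */
        amdgpu_context_handle ctx_normal, ctx_high;
        amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_NORMAL, &ctx_normal);
        amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH, &ctx_high);

        /* command submission (amdgpu_cs_submit) would target ctx_high for the
         * latency-sensitive work and ctx_normal for everything else */

        amdgpu_cs_ctx_free(ctx_high);
        amdgpu_cs_ctx_free(ctx_normal);
        amdgpu_device_deinitialize(dev);
        return 0;
    }
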
20:23Venemo: I assume it is doable. maybe VKD3D-Proton could do it even (create another Vulkan device and submit to that)
20:23Venemo: it doesn't sound as clean as simply exposing another queue, though
20:25agd5f: well, I'm not sure to what extent having two queues will ultimately help. there is only one set of shader hardware
20:26Venemo: I'm not sure what FSR3 does exactly, so I'll need to ask Hans-Kristian to elaborate a bit. but assuming that it relies on a 2nd graphics queue, then it is tedious to implement that using an entirely different context
20:28agd5f: The important aspect of user queues is the preemption part. I.e., a lower priority queues can be preempted
20:29Venemo: assuming that both the kernel and userspace are in good shape w.r.t the implementation of user queues, would it be possible to only have user queues on GFX12? and no kernel queues at all?
20:29agd5f: that was our original goal, but user queues weren't ready in time
20:30Venemo: you still have a lot of time till GFX12, no?
20:30agd5f: support is already upstream in 6.11
20:30Venemo: damn
20:31Venemo: well, I mean I'm happy that the support is there. just unfortunate w.r.t user queues
20:56Venemo: thank you agd5f I appreciate a lot that you've explained all of this to me
20:57Venemo: I've summarized this discussion on the gitlab as well, hopefully we'll be able to find the right solution
21:24agd5f: Venemo, no problem.