09:40jfalempe: sima: I sent a v8 of drm_panic yesterday, can you check if the locking is better this time ?
09:42jfalempe: sima: I directly register the primary plane now, so there is no need to walk through the plane list.
09:45sima: jfalempe, will try to take a look, still behind on mails
09:46sima: walking the plane list shouldn't be an issue though, since the planes are invariant over the lifetime of a drm_device
09:46sima: but primary plane is probably the most expected case, so might still be good idea to avoid issues
09:47jfalempe: sima: that simplifies the code a bit too, since the primary plane is what you really want to draw the panic screen.
10:03sima: jfalempe, yeah makes sense
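For context, a rough sketch of what the plane-list walk looks like with the stock DRM iterators (drm_for_each_plane, DRM_PLANE_TYPE_PRIMARY and drm_crtc_mask are upstream helpers; the function itself is illustrative, not the drm_panic code):

```c
/* Illustrative only: find a CRTC's primary plane by walking the plane list,
 * which is invariant over the lifetime of the drm_device. */
#include <drm/drm_crtc.h>
#include <drm/drm_device.h>
#include <drm/drm_plane.h>

static struct drm_plane *find_primary_plane(struct drm_device *dev,
					    struct drm_crtc *crtc)
{
	struct drm_plane *plane;

	drm_for_each_plane(plane, dev) {
		if (plane->type == DRM_PLANE_TYPE_PRIMARY &&
		    (plane->possible_crtcs & drm_crtc_mask(crtc)))
			return plane;
	}
	return NULL;
}
```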
12:51dorcaslitunya: Hello, my name is Dorcas and I am currently interested in participating in X.Org Endless Vacation of Code. I am currently searching for a mentor and an available project that I can work on for this program. I am interested in drm projects but I am also open to any available project and very willing to learn, and eager to grow through the mentor and the program. Any leads appreciated! In case I am unavailable on irc,
12:51dorcaslitunya: I can be found by email at anonolitunya@gmail.com
14:11tzimmermann: sima, airlied, everybody seems to be sending feature PRs for drm-next. do you still accept them? abhinav__ asked for a final drm-misc-next PR for something needed by msm
14:14dj-death: gfxstrand: did I provide a compelling enough reason on : https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27822 ?
14:15sima: tzimmermann, I feel like I'll leave that to airlied :-)
14:15tzimmermann: ok. i have no preference
14:15sima: yeah same
14:55gfxstrand: eric_engestrom: Are you around? Could you take a very quick look at the meson bits of !27832
14:56gfxstrand: dj-death: Did you see my comment on the descriptor buffer MR itself?
14:58dj-death: gfxstrand: yeah
14:58dj-death: gfxstrand: I think I implemented what you suggested
14:58dj-death: gfxstrand: it's relocated constants
14:58dj-death: gfxstrand: I just need access to the value to relocate in the shader when doing the shader upload
14:59dj-death: gfxstrand: once the runtime change is merged, then the ugly thing you pointed out in the DB MR will go away
15:03gfxstrand: dj-death: My suggestion was that you store the SAMPLER_STATE and whatever else you need to create the sampler in the anv_shader_bin itself and serialize it along with the rest of the shader. Then the only thing needed on upload is the device, not the pipeline layout.
15:03gfxstrand: I haven't thought through what a mess that'll make of YCbCr, though.
15:05dj-death: gfxstrand: so every shader upload would allocate a SAMPLER_STATE?
15:06dj-death: gfxstrand: the spec guarantees that the used sampler stays alive during the lifetime of the shader, so that would be an extra copy for not much
15:06meymar: karolherbst: mwk eat shit idiots.
15:09meymar: Whereas if the world came down to depending on anything you do, it would actually look like as big a chaos as the code you write.
15:10gfxstrand: dj-death: I was assuming your hashing would fix it
15:11dj-death: gfxstrand: yeah, even if I do what you suggest, that's still not solving the cache lookup
15:12dj-death: oh... I see what you're suggesting now
15:12dj-death: yeah I guess that works
15:12dj-death: just have to have additional samplers allocated
15:12dj-death: but sounds closer to what sampler synthesizing does
15:12dj-death: should remove the need for the change
15:13gfxstrand: Yeah, that was my thought
15:13gfxstrand: That way the sampler is really part of the shader and gets cached properly and you're not trusting the app to give you the same thing in the descriptor set every time.
15:13gfxstrand: I mean, they had better otherwise it won't work and/or you had better hash and cache properly.
15:14gfxstrand: But it also prevents issues where you have exactly the same shader, different by one sampler bit, and you get a cache collision
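A rough sketch of the shape gfxstrand is describing, with purely hypothetical names (the real anv_shader_bin layout and the GENX SAMPLER_STATE packing differ): the sampler state is stored and serialized with the shader, so the upload path only needs the device rather than the pipeline layout.

```c
#include <stdint.h>

/* Hypothetical layout: embedded-sampler state carried inside the shader
 * binary and serialized with it, instead of being rebuilt from the
 * pipeline layout at upload time. */
struct embedded_sampler_data {
	uint32_t sampler_state[4];   /* packed SAMPLER_STATE dwords */
	uint32_t border_color[4];    /* border color payload, if any */
};

struct shader_bin_sketch {
	/* ... compiled kernel, prog_data, relocations ... */
	uint32_t embedded_sampler_count;
	struct embedded_sampler_data *embedded_samplers; /* hashed and cached with the shader */
};
```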
15:15dj-death: yeah we're already hashing things properly
15:15dj-death: it's just the sampler creation that's not what you suggest
15:16dj-death: just border colors...
15:17gfxstrand: *sigh* Border color...
15:17gfxstrand: Why did we add border color back to Vulkan? 😢
15:18daniels: wasn't it for zmike?
15:19zmike: yes and thank you
15:20dj-death: gfxstrand: thanks btw
15:21gfxstrand: yw
15:21gfxstrand: Sorry for chucking a spanner in the works. 😅
15:23dj-death: there is still time to ship this sucker this week
15:23dj-death: unless CI fails me
16:29mripard: sima: if everything's working fine for the drm gitlab repo, could you let sfr know to update linux-next?
16:30sima: mripard, ah good point
16:30mripard: I'm not sure I have the street cred to do so :)
18:25DemiMarie: Is it possible to guarantee that GPUs will not crash by validating the commands produced by the compiler?
18:25airlied: no
18:25DemiMarie: Why?
18:25DemiMarie: I assume no firmware or hardware bugs.
18:25airlied: turing completeness
18:26DemiMarie: That only applies for arbitrary commands, not commands that are required to conform to certain restrictions.
18:26airlied: compilers produce instructions from arbitrary shader input
18:27DemiMarie: See: Native Client
18:27DemiMarie: “Crash” means “faults” for this purpose.
18:29airlied: faulting is fine as long as the gpu and driver recovers
18:29airlied: you cannot avoid faults since someone can just write a shader that takes an arbitrary ubo value and tries to load from it
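A trivial OpenCL C illustration of that point, purely for reference: the offset is runtime, app-controlled data, so no amount of static validation of the compiled code can rule out the faulting access.

```c
/* Illustration only: data[offsets[gid]] can fault for an arbitrary
 * app-supplied offset, regardless of how the code was validated. */
__kernel void arbitrary_load(__global const ulong *offsets,
                             __global const float *data,
                             __global float *out)
{
    size_t gid = get_global_id(0);
    out[gid] = data[offsets[gid]];
}
```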
18:30DemiMarie: How often is recovery successful in practice?
18:31airlied: depends on the gpu drivers and gpu I suppose, intel, nvidia seem to be pretty good, amd has its issues but faults usually don't take it down
18:32lstrano: DemiMarie: you assume no firmware or hardware bugs?
18:37DemiMarie: lstrano: No unknown ones, yes. Same as how people writing code for a CPU assume no unknown CPU hardware or microcode bugs.
19:01Company: DemiMarie: can't you just write a program that crashes when it validates correctly and doesn't crash when the program would crash - and then feed it itself?
19:02DemiMarie: Company: Such a program would be rejected.
19:03Company: that's my attempt at the halting problem
19:03DemiMarie: airlied: Is it mostly VCN problems that bring down the entire GPU?
19:03Company: like, isn't that literally the halting problem you're trying to solve?
19:04DemiMarie: Company: No. What I am asking is not, “Will XYZ program break the GPU?”, but rather “Does XYZ program conform to the restrictions I have set, which are enough to ensure that there are no unrecoverable faults?”.
19:04agd5f: DemiMarie, GPU page faults never directly hang the GPU. They trigger an interrupt, writes are dropped, and reads return 0 or something like that. Where it can be a problem is if the page fault leads to a problem due to the data returned. E.g., if the data is garbage and some hardware is waiting on a bit in that memory that never changes
19:05DemiMarie: agd5f: which hardware would that be?
19:05agd5f: AMD GPUs
19:05DemiMarie: Do AMD GPUs have such uninterruptable polling loops?
19:06DemiMarie: That seems like a hardware or firmware bug to me.
19:06Company: DemiMarie: but that turns your accepted programs into a small subset of all programs, because Turing-completeness would cause the halting problem
19:06DemiMarie: Company: which is fine in this use-case
19:06DemiMarie: because any program can be transformed into one that is in this subset.
19:06agd5f: DemiMarie, fixed function hardware can't be preempted
19:07DemiMarie: but this discussion is also too hypothetical and so should probably be dropped
19:11mattst88: yay for self-awareness :D
19:12DemiMarie: agd5f: can whatever is feeding it be preempted and forced to give the fixed-function hardware what it needs, so that it returns garbage instead of hanging forever?
19:13DemiMarie: agd5f: the problem is that resetting the GPU seems to often take out the thing that froze, but also a bunch of other stuff
19:14DemiMarie: the blast radius is the problem
19:17agd5f: DemiMarie, you can take a perf hit and only allow a single application on gfx at a given time
19:17DemiMarie: agd5f: how big a perf hit?
19:18DemiMarie: Fundamentally, my question is about why the hardware is designed the way it is.
19:18agd5f: depends on what you are trying to run
19:18DemiMarie: agd5f: typical desktop
19:19agd5f: mainly single app use cases probably not that much. will impact more when you have multiple apps active since they can't run in parallel
19:19DemiMarie: It’s not a question about how to work around the problems with existing hardware, but rather me trying to understand why hardware is designed as it is.
19:20agd5f: DemiMarie, to maximize throughput
19:20DemiMarie: agd5f: how does a wide blast radius help with that?
19:21pixelcluster: agd5f: re: "GPU page faults never directly hang the GPU" really? always? this doesn't quite match what I see, AFAICT when page faults happen, the faulting waves get halted and I'm not sure if there is a way to un-halt them?
19:21DemiMarie: for instance, why can’t whatever feeds the rasterizers just be forced to feed the rasterizer with fake input (zeros?) if it is taking too long?
19:23agd5f: pixelcluster, there is a debug module parameter you can set to force a halt on a fault, but that is not the default
19:24pixelcluster: but why does my compute shader get halted then if I use global_store_dword to write to NULL? this happens without any debug parameters set, on any kernel version
19:24DemiMarie: agd5f: in practice, does a fault ever not result in a hang, unless one is doing a pure compute workload?
19:25agd5f: DemiMarie, yes
19:26DemiMarie: agd5f: what does the rasterizer do if the vertex shader faults?
19:26abhinav__: yes sima airlied one more PR from misc would be great for our feature
19:26agd5f: DemiMarie, consider something like a page fault in a command buffer or a resource descriptor or sampler descriptor. When that state gets loaded into the state machine, it might be an invalid combination that results in a deadlock somewhere in the hardware state.
19:26DemiMarie: agd5f: because the hardware is not designed to be robust against malicious input?
19:27DemiMarie: that seems like a hardware bug to me
19:27agd5f: DemiMarie, too many combinations of state to validate
19:28DemiMarie: agd5f: and they don’t try to use e.g. formal methods that work no matter how complex the system is?
19:28DemiMarie: more generally, why is the rasterizer shared between all the shader cores?
19:28DemiMarie: that seems to be the cause of the behavior that I am observing
19:29DemiMarie: AMD’s fix for LeftoverLocals being “do not let different contexts run in parallel” tells me that, unless I am missing something, there is a lot of internal sharing between different contexts on the GPU
19:30eric_engestrom: gfxstrand: I'm around, but I've hurt my wrist on top of already having a ton of non-work stuff taking all my time (and it will only increase for the next 2 months :/); looking at your MR now
19:31DemiMarie: robclark: for AMD GPUs, I’m wondering if not allowing more than one VM to use the GPU in parallel would be a good idea.
19:31agd5f: DemiMarie, the design is largely the same for everyone's GPUs.
19:32pixelcluster: having hardware potentially deadlock is not that big of a deal yeah
19:32DemiMarie: is it guaranteed to only be a deadlock, and not e.g. a corruption of state belonging to other’s shaders (which would potentially allow privilege escalation)?
19:33DemiMarie: or a leak of state from other’s shaders?
19:33pixelcluster: not having a way to shut down everything from one specific context (and leaving everything from other contexts untouched) is an issue, and it would seem like other vendors can provide that
19:33DemiMarie: pixelcluster: exactly!
19:34DemiMarie: robclark: my concern is that AMD GPUs might have a lot more state (architectural or otherwise) that is shared between contexts than it seems
19:35robclark: limiting to single process using gpu at a time is a thing you want for preventing information leaks.. not sure if released yet but AFAIU something is in the works (not sure if fw or kernel level)
19:35pixelcluster: the rasterizers being shared isn't an issue in practice, I'm pretty sure
19:35DemiMarie: robclark: on AMD or everywhere?
19:36pixelcluster: I don't think more than one app can use raster hw at a time anyway, and that has always been this way
19:36robclark: well, probably everywhere for GPUs which can have multiple processes active at a time (think LL type issues)
19:36DemiMarie: I don’t want to get GPU acceleration shipped only for marmarek to have to issue a Qubes Security Bulletin for “oops, we didn’t properly isolate stuff on the GPU.”
19:36DemiMarie: robclark: I thought Intel and Nvidia were immune to LL
19:37agd5f: DemiMarie, there is no more or less state shared between contexts. The issue is that there is only one of each fixed function block so it can only be used by one thing at a time. If a particular fixed function block hangs you have to reset the whole engine. Maybe other vendors can just reset a particular block.
19:37DemiMarie: agd5f: I suspect being able to reset individual blocks is the difference
19:37robclark: I think intel already had sufficient state clearing (which I think implies they don't have multiple processes running at same time).. nv I'm not really sure the situation, we don't use 'em in any chromebooks
19:39DemiMarie: Intel having enough state clearing is good
19:40DemiMarie: If it is a kernel fix, what about forbidding multiple VMs from sharing the GPU at the same time, and then letting the guest decide if it wants to let multiple processes from itself run simultaneously?
19:42DemiMarie: robclark: also is it okay if I send you a direct message about something that is related to Chrome security but not graphics?
19:42robclark: I'm not sure what level of granularity is possible.. but if it is possible it could be useful to isolate VMs without isolating every process
19:42robclark: sure
19:43robclark: note that with nctx the host kernel sees a single process with multiple dev file open()s for a single VM
19:45DemiMarie: robclark: I think that would be useful too
19:51airlied: mripard: maybe just send one more PR I don't think it'll matter at this point
20:01DemiMarie: robclark: I’m pinging you a few times in various Qubes issues
20:20alyssa: karolherbst: do we support work_group_scan_*?
20:20karolherbst: no
20:20alyssa: k
20:20karolherbst: those are crazy to implement anyway
20:20alyssa: :(
20:21alyssa:wants them for geometry shaders
20:21karolherbst: they are like subgroup ops, but over the entire block
20:21alyssa: i know that's why i want them
20:21karolherbst: I think there is nothing against implementing them, I just suspect it's a lot of lowering
20:21alyssa: sure
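A sketch of the kind of lowering being discussed, assuming uniform control flow and an illustrative helper name (where the scratch lives and how it is sized is up to the driver): do a per-subgroup scan, publish each subgroup's total to local memory, then add the preceding totals back.

```c
/* Illustrative lowering of work_group_scan_inclusive_add built from
 * subgroup ops; assumes all work items reach this point. */
static uint wg_scan_inclusive_add(uint x, __local uint *subgroup_sums)
{
    uint sg      = get_sub_group_id();
    uint sg_scan = sub_group_scan_inclusive_add(x);

    /* Last lane of each subgroup publishes that subgroup's total. */
    if (get_sub_group_local_id() == get_sub_group_size() - 1)
        subgroup_sums[sg] = sg_scan;
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Add the totals of all preceding subgroups. */
    uint base = 0;
    for (uint i = 0; i < sg; i++)
        base += subgroup_sums[i];

    return base + sg_scan;
}
```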
20:21alyssa: also, is there any standard ballot in CL?
20:22alyssa: I polyfilled it in asahi_clc but that seems.. wrong
20:22karolherbst: there is cl_khr_subgroup_ballot
20:23alyssa: ahaha nice thanks :)
20:23karolherbst: it's in no way wired up, because I think the CTS requires weird features to test it or something...
20:23karolherbst: but I doubt it's hard to implement and might just work once the clc bits are added
20:23alyssa: meh I'll just use my hack for now
20:24karolherbst: okay
20:24alyssa:doing gpu horrors
20:24karolherbst: guess I should look into this soon, so it's better supported in mesa
20:24dj-death: karolherbst: how does that work in divergent control flow?
20:25DemiMarie: Thanks to everyone here for taking their time to help me as I try to understand GPUs and design GPU acceleration for Qubes OS!
20:25karolherbst: dj-death: through cl_khr_subgroup_non_uniform_vote maybe?
20:25Sachiel: karolherbst: speaking of CL, does it have opinions on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27281 ? (I will get back to this one of these days)
20:26karolherbst: dj-death: ohh wait.. cl_khr_subgroup_ballot also has non uniform variants
20:26karolherbst: mhhh
20:26karolherbst: "These functions need not be encountered by all work items in a sub-group executing the kernel."
20:26karolherbst: :')
20:27dj-death: I guess you have to barrier on shared memory
20:27karolherbst: I guess..
20:27dj-death: huh
20:27dj-death: actually that can't work with a barrier
20:27dj-death: for subgroup that's okay
20:28karolherbst: ohh right...
20:28dj-death: just use the execution mask to drop the unwanted lanes
20:28dj-death: but what's the criteria to have all items of the workgroup to come and do the scan or whatever?
20:29karolherbst: there are a couple of functions defined, but it sounds like for that extension not all threads actually have to reach the op
20:32dj-death: karolherbst: yeah, but then you still want to wait for the one that might reach it ;)
20:32karolherbst: mhh I guess...
20:35gfxstrand: tjaalton: Are you still the Debian/Ubuntu Mesa packager?
20:40jenatali: Huh. Who managed to get turnip working on Windows? https://vulkan.gpuinfo.org/displayreport.php?id=27282
20:45robclark: jenatali: fexemu + proton?
20:48alyssa: ^ that
20:52jenatali: Cool
20:59dj-death: robclark: what's the kernel interface though?
21:07karolherbst: what kernel interface?
21:07karolherbst: fex intercepts the API calls
21:12robclark: fex would be using drm/msm + mesa compiled for x86 (unless thunking)..
21:14dj-death: oh
21:14dj-death: so mesa is under fex
21:15robclark: fex does have a way to thunk to native aarch64 .so, so it could either be x86 or aarch64
21:16dj-death: does that really qualify as "turnip on windows" then? ;)
21:16robclark: the thunking thing seems harder to get going on a non-multi-arch distro, so I've not been able to try that myself
21:16jenatali: dj-death: I just meant that it was listed on gpuinfo as a Windows entry
21:16dj-death: yeah I get it :)
21:16dj-death: just having fun I guess
21:17robclark: it is a kinda funny thing to see
21:36gfxstrand: Ooh, I should totally submit an NVK vulkaninfo from wine to gpuinfo.org just to mess with people...
21:38robclark: indeed
21:40airlied: gfxstrand: someone already has
21:40airlied: https://vulkan.gpuinfo.org/displayreport.php?id=28542#device is I assume nvk
21:41jenatali: That's Linux though?
21:42airlied: I do wonder if we ported NVK to the nvidia linux kernel API how far a jump to windows would be :-P
21:43jenatali: I thought gfxstrand already prototyped NVK on WDDM
21:45gfxstrand: I haven't actually done the work to get NVK running on WDDM
21:45gfxstrand: I know what work needs to be done and I have the tooling to do it. It just hasn't been a high priority.
21:47airlied: well the WDDM API is the documented bit, it's the other driver-private bits that I wonder how much they correspond
21:49alyssa: karolherbst: tempting to implement work_group_* stuff ughh
21:49karolherbst: yeah... but also...
21:50karolherbst: but I don't have to do all the things :P
21:50alyssa: I'll do libagx_work_group_* routines in .cl for now I guess
21:50karolherbst: yeah.. guess I could just copy those later or something...
21:51karolherbst: I already see us implementing spirv opcodes with internal clc code....
21:53alyssa: karolherbst: "error: non-kernel function variable cannot be declared in local address space"
21:53alyssa: oh come on.
21:53karolherbst: you need to specify the size
21:54alyssa: not using real cl kernels is biting me
21:54karolherbst: huh?
21:55karolherbst: where do you try to define the __local variable, because you should be able to do that in any function
21:55karolherbst: or do you need it variable sized?
21:56alyssa: "because you should be able to do that in any function" clang seems to disagree
21:56karolherbst: mh?
21:56karolherbst: how are you declaring it?
21:56karolherbst: https://registry.khronos.org/OpenCL/specs/3.0-unified/html/OpenCL_C.html#local-or-local
21:56gfxstrand: airlied: I've already done that part twice... I find it hard to believe that NVIDIA would be the insane one of the three.
21:57karolherbst: ohh wait
21:57karolherbst: maybe you can only do that in kernel functions? mhhh
21:57karolherbst: 🙃
21:58karolherbst: but yeah.. makes sense because otherwise you wouldn't be able to calculate how much local memory a kernel entry point would use...
21:58karolherbst: alyssa: I guess you need to pass it in as a pointer and then deal with the size at runtime, kinda like the variable shared mem stuff
21:59alyssa: exciting
22:00karolherbst: though.. I mean it's driver internal stuff, just keep track of how much shared memory you need 🙃
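For reference, the shape of the workaround being described, in OpenCL C with illustrative names and sizes: the __local scratch is declared in the kernel entry point (where clang allows it) and passed down as a pointer, and the driver just accounts for the total shared memory.

```c
/* Illustrative only: __local variables may only be declared at kernel scope,
 * so the entry point owns the scratch and helpers take a pointer. */
static void helper(__local uint *scratch)
{
    scratch[get_local_id(0)] = (uint)get_local_id(0);
    barrier(CLK_LOCAL_MEM_FENCE);
    /* ... use scratch ... */
}

__kernel void entry(void)
{
    __local uint scratch[64];   /* size fixed here; the driver tracks the total */
    helper(scratch);
}
```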
22:02alyssa: (-:
22:02alyssa:will continue this Later
22:55mdnavare_: hwentlan_: Ping ?
23:12mdnavare_: hwentlan_: vsyrjala airlied sima : For a VRR mode, per the VESA definition, the VTOTAL value is stretched out to match the VRR refresh rate. Since for VRR we ignore MSA, the VSE and VSS pulses are ignored by the sink, so would it suffice to create a VRR mode that only has a different VTOTAL value, calculated as vtotal = clock / (htotal * vrr_refresh_rate)?
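A worked example of that calculation with illustrative numbers: a 1080p mode with a 148.5 MHz pixel clock and htotal 2200 has vtotal 1125 at 60 Hz; stretching only vtotal for a 48 Hz VRR rate gives 148500000 / (2200 * 48) ≈ 1406, with hdisplay/vdisplay and the sync pulses left untouched.

```c
/* Illustrative helper for the proposed relation
 *   vtotal = clock / (htotal * vrr_refresh_rate)
 * e.g. 148500000 / (2200 * 48) ~= 1406 for the example above. */
static unsigned int vrr_stretched_vtotal(unsigned int clock_hz,
                                         unsigned int htotal,
                                         unsigned int vrr_refresh_hz)
{
    return clock_hz / (htotal * vrr_refresh_hz);
}
```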
23:36Kayden: gfxstrand: congratulations on 1.3 :)
23:36gfxstrand: Kayden: Thanks!
23:36gfxstrand: I've now written 2 conformant 1.3 drivers. :D
23:40airlied: gfxstrand: you caught up :-P
23:44gfxstrand: To myself, yes.
23:44gfxstrand: Oh, right...
23:44gfxstrand: airlied: But you can't really claim to have made RADV support 1.3
23:50gfxstrand: So here's a question: If I rewrite NVK in rust, does that count as having written 3 Vulkan drivers?
23:51alyssa: yes
23:51alyssa: do it
23:51alyssa: infinite conformant vk driver glitch
23:53airlied: gfxstrand: hey having a company step in and staff up a team to make radv support 1.3 seems like a better win :-P
23:53alyssa: skill issue
23:53airlied: but yes a rust rewrite could win, I don't think you can nerdsnipe me into lvp rust :-P
23:58zmike: surely you'd have to start from the foundation of lavapipe anyway
23:58zmike: and rewrite llvm in rust