IRC Logs of #dri-devel on irc.freenode.net for 2024-03-27

00:36 zmike: I hope it blends
01:28 ILOVEPIE: Can someone point me to where I can find information on the nir_intrinsic_first_invocation instruction? I noticed it was missing from the nouveau compiler and it's causing issues with running a specific piece of software through NVK. I wanted to try my hand at fixing that since it seems to be a rather simple instruction.
01:39 alyssa: nir_intrinsics.py
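For context on the question above: first_invocation yields the subgroup index of the lowest-numbered active invocation, and one common way to lower it is a find-lowest-set-bit over a ballot of the active invocations. The sketch below models only that semantics, assuming the active lanes are given as an integer bitmask. The helper name is hypothetical and is not a Mesa or NVK API.

```python
def first_invocation(active_mask: int) -> int:
    """Index of the lowest active invocation in a subgroup.

    Hypothetical helper for illustration: `active_mask` has bit i set
    when invocation i is active (i.e. a ballot of `true`).
    """
    if active_mask == 0:
        raise ValueError("a subgroup executing the intrinsic has at least one active invocation")
    # (mask & -mask) isolates the lowest set bit; bit_length() - 1 is its index.
    return (active_mask & -active_mask).bit_length() - 1


# Lanes 3, 4 and 9 active: lane 3 is the first invocation.
print(first_invocation((1 << 3) | (1 << 4) | (1 << 9)))  # 3
# All 32 lanes active: lane 0.
print(first_invocation(0xFFFFFFFF))  # 0
```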
08:33 tursulin: demarchi: that one slipped my mind.. mr uploaded
09:23 karolherbst: jenatali: fyi, I started to work again on program scope variables in case you are interested: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24515
09:24 karolherbst: it's kinda cursed, but maybe you have better ideas?
09:29 karolherbst: but would also be helpful to get input from others, because that feature is really cursed...
10:53 jenatali: karolherbst: I'll take a look
11:00 karolherbst: jenatali: I wonder if it makes sense to implement this for real from the start, but that's really annoying to do actually... my current plan is to have an entirely new entry point to vtn with a custom vtn_handle_constant which just emits it into an init entry point....
11:02 karolherbst: and have some nir opts which can optimize stores to the global var to a value in the initializer.. but that's even more work :')
11:06 jenatali: karolherbst: I'm traveling today so if I don't take a look by end of week, tag me in the MR so I get an email I can use as a reminder
11:06 karolherbst: okay, have fun!
11:07 jenatali: These IRC pings come in at 4am and by the time I'm actually in front of a PC I sometimes have forgotten :)
11:10 karolherbst: heh
12:47 Hazematman: Hey I've had this MR sitting for a little while right now to enable dma buf import & export support in llvmpipe & lavapipe. I was hoping I could get some feedback on it and get it landed https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27805
12:59 zmike: I'm getting to it
12:59 zmike: it's been a busy week
13:00 mareko: robclark: I'd like it to have a better comment after return is_drm_shim;
13:02 mareko: zmike: I think you need to re-run divergence analysis because nir_opt_varyings sets it to vertex divergence
13:02 zmike: hm ok, will try
13:02 zmike: thanks
13:05 Hazematman: Thanks zmike! :)
13:10 demarchi: tursulin: merged, thanks :)
13:10 mripard: jani, demarchi: I added a new patch to the Kconfig series, could you review it? https://lore.kernel.org/all/20240327-kms-kconfig-helpers-v3-7-eafee11b84b3@kernel.org/
13:11 demarchi: mripard: will do
13:12 demarchi: mripard: jani karolherbst could you take a look at https://gitlab.freedesktop.org/drm/maintainer-tools/-/merge_requests/46 ?
13:12 demarchi: related to the move of (most) repos to gitlab
13:13 mripard: demarchi: done :)
13:28 zmike: mareko: yeah that seems to fix things, thanks again
13:45 robclark: mareko: newer version just marks userspace fences signaled after submit.. but still OoM's mid way thru drawoverhead, so I guess I'm missing something
13:50 alyssa: jenatali: how do samplers in descriptor sets work in dozen?
13:50 alyssa: given that dx12 has a limit of 2048 samplers in the heap, but vk allows creating 4000 samplers?
13:52 alyssa: particularly with EDI/update-after-bind
13:53 jenatali: alyssa: we bumped the limit due to that VK requirement
13:53 alyssa: how can that work? wasn't the limit already tight on nvidia?
13:54 jenatali: The 2048 limit comes from NV. They have 4096 max, which we partition into 2048 app samplers, 2032 static samplers across all possible root signatures (like descriptor set layouts), and 16 driver-reserved
13:54 jenatali: To get 4k you have to give up on the static samplers, which aren't beneficial for them from a perf POV anyway, just developer convenience
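A back-of-the-envelope check of the sampler budget jenatali describes, using only the figures quoted above (4096 hardware samplers split into 2048 app, 2032 static, 16 driver-reserved) and the 4000 samplers alyssa mentions for Vulkan. The constant and function names are illustrative, not part of any D3D12 or Dozen API.

```python
# Figures as quoted in the log; names are illustrative only.
HW_SAMPLER_LIMIT = 4096   # NVIDIA hardware maximum
APP_HEAP_SAMPLERS = 2048  # shader-visible sampler heap for the app
STATIC_SAMPLERS = 2032    # static samplers across all root signatures
DRIVER_RESERVED = 16      # reserved for the driver


def partition_fits(app: int, static: int, reserved: int,
                   limit: int = HW_SAMPLER_LIMIT) -> bool:
    """True if the three sampler pools fit within the hardware limit."""
    return app + static + reserved <= limit


# Original split uses exactly 4096.
print(partition_fits(APP_HEAP_SAMPLERS, STATIC_SAMPLERS, DRIVER_RESERVED))  # True
# A 4000-sampler app heap does not fit next to the full static pool...
print(partition_fits(4000, STATIC_SAMPLERS, DRIVER_RESERVED))  # False
# ...only once the static pool shrinks to 4096 - 16 - 4000 = 80.
print(partition_fits(4000, 80, DRIVER_RESERVED))  # True
```

This is why reaching ~4k app samplers means giving up (most of) the static samplers on that hardware.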
13:54 alyssa: (and does this cause a problem in turn where vk needs to bump limits for vkd3d-proton, and then we just end up in a layering feedback loop until both vk and d3d12 require infinite samplers?)
13:54 alyssa: :P
13:55 DemiMarie: 🤣
13:55 DemiMarie: Has anyone actually tried VK on D3D12 on VK on D3D12?
13:55 DemiMarie: I guess a Vulkan app running under Wine would be VK on D3D12 on VK.
13:56 karolherbst: nah, they just do vulkan directly for most part
13:56 jenatali: Why? It'd just be vk
13:56 DemiMarie: I thought Windows didn’t guarantee that Vulkan was present, so one must bring one’s own VK-on-D3D12 implementation.
13:57 mareko: zmike: divergence information currently is not guaranteed to be correct or up-to-date anyway if you have run any passes after divergence analysis, so any pass that uses "divergent" should have it re-run
13:57 zmike: got it
13:57 zmike: makes sense
13:58 mareko: Daniel suggested making it metadata that a pass can "require"
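A minimal sketch of the "required metadata" idea, assuming a scheme similar in spirit to how NIR already tracks analyses such as dominance through nir_metadata_require()/nir_metadata_preserve(). Everything below (class, flag and pass names) is hypothetical and only models the bookkeeping: a pass declares that it needs divergence info, and the analysis is re-run only when an earlier pass left it stale.

```python
from enum import Flag, auto


class Metadata(Flag):
    """Hypothetical analysis-metadata bits; names are illustrative only."""
    NONE = 0
    DIVERGENCE = auto()


class Shader:
    def __init__(self):
        self.valid = Metadata.NONE  # which analyses are currently up to date
        self.analysis_runs = 0      # how often divergence analysis ran

    def require(self, needed: Metadata) -> None:
        """Run any analysis in `needed` whose cached result is stale."""
        if Metadata.DIVERGENCE in needed and Metadata.DIVERGENCE not in self.valid:
            self.analysis_runs += 1  # stand-in for re-running divergence analysis
            self.valid |= Metadata.DIVERGENCE

    def preserve(self, kept: Metadata) -> None:
        """End of a pass: everything not explicitly preserved becomes stale."""
        self.valid &= kept


def rewriting_pass(shader: Shader) -> None:
    # Hypothetical pass that changes the IR without keeping divergence valid
    # (standing in for a pass such as nir_opt_varyings).
    shader.preserve(Metadata.NONE)


def pass_using_divergence(shader: Shader) -> None:
    # Declares its dependency instead of trusting leftover "divergent" flags.
    shader.require(Metadata.DIVERGENCE)


s = Shader()
pass_using_divergence(s)  # first use: analysis runs
pass_using_divergence(s)  # still valid: no re-run
rewriting_pass(s)         # invalidates divergence
pass_using_divergence(s)  # stale: analysis runs again
print(s.analysis_runs)    # 2
```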
13:58 karolherbst: DemiMarie: why would that matter for windows?
13:58 karolherbst: *wine
13:59 DemiMarie: karolherbst: I wasn’t sure if there was a way to check if the driver used Vulkan
13:59 DemiMarie: *supported
14:01 jenatali: Demi: apps can and do assume it's present
14:01 jenatali: Dozen exists partly for that reason, so we can provide a fallback for the case where the driver doesn't provide it
14:04 jenatali: alyssa: I don't think that'll cause problems for vkd3d. But does that answer your question?
14:06 glehmann: > To get 4k you have to give up on the static samplers, which aren't beneficial for them from a perf POV anyway
14:06 glehmann: they are beneficial for AMD perf
14:09 jenatali: glehmann: right, I meant specifically for NV
14:09 jenatali: The way we lifted to 4k is for drivers to specify a separate limit for static samplers
14:27 alyssa: jenatali: mind linking relevant dx12 spec? not having luck searching
14:28 jenatali: alyssa: https://microsoft.github.io/DirectX-Specs/d3d/VulkanOn12.html#sampler-descriptor-heap-size-increase
14:28 alyssa: thx
14:29 jenatali: Note that we only have this problem for update-after-bind descriptors. We have a path which copies descriptors at draw time which has no limit on the number of samplers alive at a time, just per pipeline layout
14:30 jenatali: But we can only use that path if we don't enable the descriptor indexing extension
14:30 alyssa: yep
14:30 alyssa: jenatali: tangentially related -- does dx12 spec a limit for static samplers accessed in a single shader?
14:31 jenatali: alyssa: no, 2032 total alive at a time though
14:31 alyssa: ack
14:31 alyssa: thanks
14:32 jfalempe: sima: regarding drm panic, I only have one change from the v10 https://lists.freedesktop.org/archives/dri-devel/2024-March/446198.html. May I send a v11 with just that ?
14:42 canonjet: https://codeql.github.com/codeql-query-help/javascript/js-shift-out-of-range/
14:46 zmike: can I get a quack on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28429
14:47 sima: jfalempe, yeah macro sounds like a good idea if that helps
14:47 sima: and best to match semantics of the spinlock functions as close as possible
14:49 canonjet: dwfreed, so you look at the onion server or tor service exit node, and freeze my socket which would freeze any of the gui functionality to initiate new widget through the main oftc connection, it actually has visible side effects on all clients. but you could do it more intelligent i assume that i would not understand i am cut off.
16:00 tomeu: so, I have my userspace driver for the rockchip NPU already doing useful stuff, but the UABI header for their (out of tree) kernel driver is GPLv2 only
16:00 tomeu: how much of a problem is that to get the code merged into Mesa?
16:01 alyssa: tomeu: a lot, afaik
16:01 alyssa: this is part of why no kbase in mesa
16:01 tomeu: yeah, I vaguely remembered that being a problem
16:03 tomeu: alyssa: that said, won't you happen to have a commit lying around with an empty skeleton of a DRM driver? maybe from asahi?
16:04 alyssa: I pretend that I've never done kernel work
16:35 robclark: alyssa, tomeu: tu _does_ have support for downstream kgsl uapi... although, it isn't, like, the primary uabi, just an alternative so one can tu on android vendor kernels
16:42 alyssa: robclark: but the kgsl uabi headers aren't gpl..?
16:43 robclark: we do have uabi headers for a bunch of kernel drivers which are gpl, incl panfrost
16:44 alyssa: robclark: the uabi headers are specifically not gpl, unlike the driver
16:44 alyssa: (although panfrost was relicensed as mit recently)
16:45 alyssa: Linux-syscall-note
16:45 alyssa: though if kbase & rknpu have that I guess it's ok
16:47 robclark: looks like the syscall note is not specifically called out for a bunch of drm-uapi.. but I guess it is implied in that they are kernel uabi.. possibly some of that should be fixed on kernel side and sync'd back to mesa (ianal)
16:47 alyssa: wheeeee.
16:49 tomeu: this out-of-tree driver is plain gplv2... :(
16:50 DavidHeidelberg: maybe the header isn't copyrightable?
16:52 DavidHeidelberg: License is usually everywhere, but that doesn't imply it's really applicable every time it's present
17:45 DavidHeidelberg: zmike: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28400 retry the failed job quickly
17:46 DavidHeidelberg: it still may pass
17:53 gfxstrand: cwabbott, robclark, alyssa: Can you double-check this paragraph of my upcoming blog post?
17:53 gfxstrand: https://paste.centos.org/view/300b8b3a
17:55 gfxstrand: I'm not intending for it to be an exact, detailed description of how control-flow works, just a hand-wavy description that's good enough for reasoning about re-convergence for the sake of a blog post.
18:09 robclark: gfxstrand: maybe of note, on qc we need to mark potential recovergence points
18:37 gfxstrand: robclark: Yes but getting re-convergence right is based on high IPs, right?
18:40 cwabbott: yeah, it's the same as ARM - the lowest IP is always the one executing and parked invocations have higher IPs
18:40 cwabbott: and also just like ARM you need to mark points where active threads fall through into parked threads
18:44 cwabbott: it's implemented as a priority queue of (inactive IP, inactive thread mask) instead of a simple vector IP, which we unfortunately have to care about in the compiler because we have to program a register with the max size, but that's too much detail for a blog post
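A toy model of the scheme cwabbott describes: only the invocations at the lowest instruction pointer execute, and every other group is parked in a priority queue of (IP, thread mask). This is an illustrative sketch, not any vendor's actual hardware. In particular, it merges groups automatically whenever the active IP reaches a parked IP, whereas per the discussion above, real Qualcomm and Arm hardware needs the compiler to mark those fall-through points.

```python
import heapq


class Warp:
    """Toy IP-ordered reconvergence model (illustrative, not real HW)."""

    def __init__(self, mask: int):
        self.ip = 0
        self.mask = mask      # active thread mask
        self.parked = []      # heap of (ip, mask) for inactive groups

    def _reconverge(self):
        # If a parked group sits at a lower IP, it runs first.
        if self.parked and self.parked[0][0] < self.ip:
            heapq.heappush(self.parked, (self.ip, self.mask))
            self.ip, self.mask = heapq.heappop(self.parked)
        # Merge every parked group waiting at the current IP.
        while self.parked and self.parked[0][0] == self.ip:
            _, m = heapq.heappop(self.parked)
            self.mask |= m

    def step(self):
        """Non-branching instruction: advance by one."""
        self.ip += 1
        self._reconverge()

    def branch(self, taken_mask: int, target: int):
        """Lanes in `taken_mask` jump to `target`; the rest fall through."""
        groups = [(self.ip + 1, self.mask & ~taken_mask),
                  (target, self.mask & taken_mask)]
        groups = sorted(g for g in groups if g[1])
        self.ip, self.mask = groups[0]     # lowest IP becomes active
        for g in groups[1:]:
            heapq.heappush(self.parked, g)  # higher IP is parked
        self._reconverge()


# if/else diamond: IP0 branch, IP1-2 "then" (IP2 jumps to the join),
# IP3 "else", IP4 join.
w = Warp(0b1111)
w.branch(0b0011, 3)  # lanes 0-1 take the else path
trace = [(w.ip, w.mask)]
w.step()             # "then" body: IP1 -> IP2
w.branch(0b1111, 4)  # "then" group jumps to IP4; parked "else" (IP3) is now lowest
trace.append((w.ip, w.mask))
w.step()             # "else" reaches IP4 and merges with the parked "then" group
trace.append((w.ip, w.mask))
print(trace)  # [(1, 12), (3, 3), (4, 15)]
```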
18:44 alyssa: gfxstrand: ack for apple & arm
18:48 cwabbott: gfxstrand: there's also a fun corner case where NIR's block ordering doesn't work, for continues
18:49 cwabbott: that's a case where always having continue constructs all the time would be helpful
18:49 cwabbott: I have to manually create one in ir3, and panfrost is probably just broken
18:52 alyssa: can confirm
19:14 gfxstrand: cwabbott: Right
19:15 gfxstrand: cwabbott: We could have a pass that just adds empty continue constructs everywhere.
19:59 kiarash: hi
20:54 karolherbst: jenatali: btw, I'll just do the initializer lowering to init kernel code right away.. it's just a lot of work, so I guess you won't have to look at the MR for a while
20:55 DemiMarie: gfxstrand: wow, I was not expecting GPU ISAs to be that weird!
21:04 gfxstrand: Oh yeah, they get funky
21:15 mareko: is AMD the only not having a weird ISA for once?
21:15 Sachiel: there's nothing weird about the Intel ISA
21:16 Sachiel: except for all the weird bits
21:58 karolherbst: I hate this feature...
22:13 gfxstrand: mareko: I think AMD GCN and NVIDIA Volta+ both make sense in their own way and everything else is some weird eldritch horror in between the tow.
22:13 gfxstrand: *two
22:52 DemiMarie: gfxstrand: which one is closer to what the actual execution units see?
22:53 DemiMarie: in other words, which requires the least control logic?
22:59 zmike: eric_engestrom: I'm not sure anyone has said this recently, but I have to say it
22:59 zmike: thanks.