06:07noodle: is it too late for the new nouveau vulkan driver to make it into 23.2?
07:23airlied: noodle: there would be no point including it even if it could make it
07:24airlied: having a release of an in-development driver would waste a lot of time
07:29noodle: airlied: virtio-experimental is however released
07:34airlied: yeah which is a bad idea since some kernel patches are not upstream
08:00zzoon_vacations_till_6th_Aug[m: airlied: IIRC, you already have a branch for av1 decoding for anv. so you're going to work on it for landing the feature? (to anv)
10:12emersion: daniels: what can i do to unstick the vk wl tearing MR?
12:20zmike: how is drm-shim supposed to work? I'm trying it for r600 and it doesn't seem to be picking up the shim at all
12:25zmike: cc alyssa
12:26daniels: emersion: I've got it on our list to try to push through
12:27emersion: ty!
12:49zmike: aha I got it
13:38alyssa: gfxstrand: does it make sense to have textures that are simultaneously bindless and non-bindless (indicated with a backend_flag)
13:38alyssa: this is making sense to me did i not sleep enough
13:40zmike: how would a bindless and non-bindless texture work
13:41alyssa: for gl, regular non-bindless texture with a texture_index(+texture_offset), and also a texture_handle that points to the descriptor in memory (i.e. texture_handle == binding_table_base_address + texture_index*stride)
13:41alyssa: for vk, same thing but with more descriptor sets
13:42alyssa: so that way the hardware can use the non-bindless part but software can get the address of the descriptor from the bindless part
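A minimal C sketch of the dual texture alyssa describes. It assumes the binding table is a flat array of descriptors with a fixed stride. `DESCRIPTOR_STRIDE` (24 bytes here) and `texture_handle()` are hypothetical names and values, not AGX's real layout. The non-bindless half stays a plain index that the hardware consumes. The bindless half is just the descriptor's address, derived from that same index.

```c
#include <stdint.h>

/* Hypothetical descriptor size in bytes; the real AGX stride may differ. */
#define DESCRIPTOR_STRIDE 24u

/* The "bindless" half of a dual texture: the address of the descriptor that
 * the non-bindless texture_index (+ optional texture_offset) refers to,
 * i.e. binding_table_base + index * stride. */
static uint64_t texture_handle(uint64_t binding_table_base,
                               uint32_t texture_index,
                               uint32_t texture_offset)
{
    return binding_table_base +
           (uint64_t)(texture_index + texture_offset) * DESCRIPTOR_STRIDE;
}
```

Software that needs the descriptor itself, for example to read the layer count, can load from this address while the sample instruction keeps using the index.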
13:42alyssa: maybe my monolithic texture lowering is Bad and that's the problem
13:42alyssa: a relevant case is reading from an array texture
13:42zmike: sounds hard to comprehend, but my brain is very smooth
13:42alyssa: since we use the hardware array texture read but also we clamp the array index in software
13:43alyssa: and that all happens in the backend
13:43alyssa: so the driver needs to turn all array texture reads into bindless access since the clamping creates a txs which only works on bindless
13:43alyssa: but then the hardware read ends up being bindless too for no good reason
13:44alyssa: but maybe the real answer here is that the clamping should happen before the driver decides whether to force bindless access, so the txs is separate from the tex
13:44alyssa: that seems significantly more sensible actually..
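A sketch of the software array-index clamp. It assumes GL's layer selection rule: round with floor(r + 0.5), then clamp to [0, d - 1]. The layer count d has to come from a size query (the txs), which is why the clamp needs the descriptor at all. `clamp_array_layer()` is a hypothetical helper, not backend code.

```c
/* coord is the unnormalized array coordinate r; layer_count is d, as a txs
 * query would return it (assumed >= 1). Returns
 * max(0, min(d - 1, floor(r + 0.5))). NaN is mapped to layer 0 here, which
 * is an assumption of this sketch. */
static int clamp_array_layer(float coord, int layer_count)
{
    float r = coord + 0.5f;
    if (!(r >= 1.0f))            /* floor(r) <= 0, or NaN */
        return 0;
    if (r >= (float)layer_count) /* floor(r) >= d */
        return layer_count - 1;
    return (int)r;               /* r > 0, so truncation equals floor */
}
```

Splitting the txs and the clamp out before the bindless decision, as alyssa concludes, lets only this size query go through the descriptor address while the actual texture read stays non-bindless.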
13:44alyssa: thanks faith
14:14DavidHeidelberg[m]: I'm thinking about a small weekend hackfest after XDC; who would be interested in joining? One of the topics would be CI, but I'd be happy if anyone interested in working on any part of Mesa joined.
14:52alyssa: I kinda wish gallium had a state tracker (~:
14:54HdkR: VKGallium
15:18gfxstrand: alyssa: It's not totally crazy
15:19gfxstrand: alyssa: Like, you could reserve 5 bits of backend_flag to store the set index or something like that if you wanted.
15:19gfxstrand: And make the semantics u[set_idx] + bindless_handle
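A sketch of gfxstrand's suggestion. It assumes the low 5 bits of a 32-bit backend flags word are reserved for the descriptor set index, which allows up to 32 sets. It also assumes `u[set_idx]` is a table of per-set base addresses. All names (`pack_set_index`, `set_bases`, `descriptor_address`) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical layout: the low 5 bits of backend_flags hold the set index. */
#define SET_INDEX_BITS 5u
#define SET_INDEX_MASK ((1u << SET_INDEX_BITS) - 1u)

/* Stores set_idx in the reserved bits, preserving all other flag bits. */
static uint32_t pack_set_index(uint32_t backend_flags, uint32_t set_idx)
{
    return (backend_flags & ~SET_INDEX_MASK) | (set_idx & SET_INDEX_MASK);
}

static uint32_t unpack_set_index(uint32_t backend_flags)
{
    return backend_flags & SET_INDEX_MASK;
}

/* u[set_idx] + bindless_handle: set_bases stands in for "u", and the
 * bindless handle is treated as a byte offset within that set. */
static uint64_t descriptor_address(const uint64_t *set_bases,
                                   uint32_t backend_flags,
                                   uint32_t bindless_handle)
{
    return set_bases[unpack_set_index(backend_flags)] + bindless_handle;
}
```

Because the set index is a compile-time constant for textures and images in Vulkan (see the discussion that follows), only the offset has to be a runtime value.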
15:25alyssa: gfxstrand: hmm that's actually reasonable as heck
15:32gfxstrand: alyssa: In Vulkan, the descriptor set index will always be known at compile time for textures/images.
15:32gfxstrand: Exact descriptor offset won't but the index will.
15:32alyssa: sure
15:33alyssa: same for GL with my fake GL descriptor sets
15:33alyssa: is that terrible?
15:33alyssa: that I have descriptor sets in my GL driver? :P
15:33gfxstrand: Sure, you can just fix it to 0 or 1 or wherever you put the fake descriptor set
15:33gfxstrand: Terrible? Not at all.
15:34alyssa: no, I have multiple :-D
15:34gfxstrand: Sure, why not?
15:34alyssa: :-D
15:34alyssa: They're not really descriptor sets
15:34alyssa: It's just that, when merging shader stages, a single hardware shader needs to be able to access the binding tables from multiple shader stages
15:34alyssa: so I model this the same way Zink+AGXV would ... descriptor sets per stage
15:35gfxstrand: Sure
15:36DemiMarie: Why is writing a Vulkan driver and using Zink for OpenGL much harder than writing an OpenGL driver directly, given that even if Vulkan is not a great fit to the hardware, OpenGL seems worse?
15:37DemiMarie: alyssa: I trust that if you say it is, then it is. I’m just curious _why_ it is.
15:38gfxstrand: Vulkan has a lot more boilerplate to get going. It also has a lot less room for hacks.
15:39gfxstrand: Hardware doesn't support primitive restart? On GL, you can scan from the CPU. Not great for performance, but great for getting a driver off the ground.
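A minimal sketch of the CPU fallback for primitive restart, assuming 32-bit indices. The idea is to scan the index buffer and split it into separate draws at every restart index, then submit each range as its own draw with the original topology. `split_at_restart()` and `struct sub_draw` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One draw without restart: indices [start, start + count). */
struct sub_draw {
    size_t start;
    size_t count;
};

/* Splits the index buffer at every occurrence of restart_index. Writes at
 * most max_draws ranges to out and returns how many were written. Empty
 * ranges (consecutive restarts) are skipped. */
static size_t split_at_restart(const uint32_t *indices, size_t n,
                               uint32_t restart_index,
                               struct sub_draw *out, size_t max_draws)
{
    size_t written = 0;
    size_t start = 0;

    for (size_t i = 0; i <= n; i++) {
        /* i == n acts as a final implicit restart; indices[n] is never read. */
        if (i == n || indices[i] == restart_index) {
            if (i > start && written < max_draws) {
                out[written].start = start;
                out[written].count = i - start;
                written++;
            }
            start = i + 1;
        }
    }
    return written;
}
```

The scan is O(n) per draw and needs a CPU-readable index buffer, which is why it is a bring-up hack rather than a fast path.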
15:39i509vcb: There is also the fact that for zink to get started you need some pretty non-default extensions
15:39gfxstrand: Queries funky? In GL, you can do whatever you want. In Vulkan, you have to figure out how to get it to fit in a query pool and use compute shaders as needed to make vkCmdCopyQueryPoolResults() happen.
15:39gfxstrand: There's that, too.
15:40gfxstrand: A Vulkan 1.0 driver capable of running some Vulkan apps and a Zink driver are not the same feature level.
15:41DemiMarie: Why does Zink need so many extensions?
15:41i509vcb: I'm trying to understand why I am getting asserts with trying to setup sync types in agxv before I see if it's a kernel bug. https://gitlab.freedesktop.org/-/snippets/7672
15:41alyssa: 15:40 <gfxstrand> A Vulkan 1.0 driver capable of running some Vulkan apps and a Zink driver are not the same feature level.
15:41i509vcb: The assert I hit has this "We can only have one timeline mode" comment
15:41alyssa: this is the bigger thing for me
15:41alyssa: I already have a GL 3.1 driver, right
15:41alyssa: I would rather spend my time on ARB_geometry_shader so we get a GL 3.3 driver
15:42alyssa: instead of agxv plumbing so Zink gives us a GL 2.1 driver and DXVK/VKD3D don't work
15:42alyssa: Eventually, we'll probably have a gl4 native driver and zink+agxv will be gl4 and we'll look at performance comparisons to decide the fate of the native driver
15:42DemiMarie: Thanks alyssa!
15:42alyssa: in the mean time ... zink does not solve any of my problems, and it creates a bunch of new ones
15:43i509vcb: DemiMarie: some features zink requires beyond minimum Vulkan 1.0 are only available through extensions. VK_EXT_custom_border_color is one of those requirements apparently
15:43i509vcb: Plus if you don't want to languish in slow wl_shm presentation you need to implement things like VK_EXT_external_memory_dma_buf
15:43gfxstrand: That's a bit of a salty take but not too far off.
15:43alyssa: gfxstrand: I mean, Zink does solve problems
15:43alyssa: Just not any of the ones I have
15:43alyssa: at least not right now
15:44DemiMarie: alyssa: to be clear, this is not a feature request (and I hope it did not come across as such!)
15:44gfxstrand: It also depends a lot on your priorities. If you're bringing up hardware that the Linux desktop has never run on, what are you going to do first? A GL driver that's able to run GNOME? Or a Vulkan driver that's able to run Zink at a level where it can run GNOME?
15:44DemiMarie: makes sense
15:44alyssa: Yup
15:44i509vcb: Imagination seems to have taken the zink route from what I recall?
15:44alyssa: Gallium makes it really easy to hack enough together for a GL 2 driver that can run GNOME
15:44alyssa: i509vcb: Imagination has a different set of problems than we do, though
15:45gfxstrand: I mean, sure, if you're trying to reduce the over-all work required to get to Vulkan + GL nirvana, Vulkan-first might make sense. But if your goal is to run GNOME as quickly as possible, writing a gallium driver is the way to do that.
15:45alyssa: ++
15:45gfxstrand: Part of the reason why I can forget about GL for NVK is because the nouveau GL driver can already run GNOME.
15:45alyssa: +++
15:45gfxstrand: Sure, it sucks, but it can run GNOME.
15:46gfxstrand: And its ability to run GNOME is good enough to composite whatever game you're running on NVK+DXVK.
15:46alyssa: ++++
15:46DemiMarie: How much of this is because Mesa has more helpers for OpenGL than for Vulkan?
15:46gfxstrand: So for going from where we are today to the maximum gaming experience as fast as possible, forgetting about GL entirely (including Zink, sorry) and focusing on solid Vulkan with DXVK features is the path.
15:47gfxstrand: DemiMarie: Some? Vulkan makes it harder to have helpers, but there are also definite gaps and we're working on filling those gaps.
15:47HdkR: Perfect for running Neverball under
15:47gfxstrand: But, also, how wicked that curve is depends on hardware.
15:48gfxstrand: On NVIDIA, we don't need many of those helpers because the hardware has basically everything built-in.
15:48alyssa: AGX is basically a really fast software rasterizer ;~P
15:48gfxstrand: Literally the only things not built in are MSAA resolves and blits and blits are kinda there but horrible.
15:48DemiMarie: gfxstrand: I see. Is part of it because Vulkan requires features that would typically be brought up later?
15:48alyssa: We need ALL the helpers :D
15:48gfxstrand: Not really
15:48gfxstrand: But certain things like copying images to/from tiled memory are pretty basic things you don't think about.
15:48ids1024[m]: You're also probably more likely to be running Vulkan-based games on an Nvidia GPU than on Apple Silicon anyway. Though that would be interesting with a good Vulkan driver and fast x86 emulation.
15:49DemiMarie:forgot that games are the main reason for these fancy new APIs
15:49gfxstrand: In theory FEX-EMU should make that tractable.
15:49buduar: Why not go with GL ES+EGL and write a really performant driver for whatever GPU accelerator? It's not very hard.
15:49HdkR: gfxstrand: <3
15:49alyssa: ids1024[m]: Right, that's the other part of the calculus ... NVIDIA GPUs have full hardware support for {geometry shaders, tessellation shaders, transform feedback, ..} and it's reasonably straightforward to implement in the drivers
15:50gfxstrand: buduar: I can't tell if that's sarcastic or not.
15:50alyssa: So going from 0 to DXVK on NVIDIA hardware is a lot more straightforward than AGX where none of the above has effective hardware support
15:50gfxstrand: :D
15:50alyssa: but a Vulkan driver that doesn't support those features (dumb as they are) won't be able to run any games other than, like, VkQuake
15:50DemiMarie: alyssa: is MoltenVK helpful at all, at least in terms of “how do I translate X to something AGX actually implements”?
15:50i509vcb: I guess you could describe agx as being very shader heavy?
15:51alyssa: i509vcb: yeah
15:51alyssa: DemiMarie: absolutely not
15:51alyssa: moltenvk is a massive pile of hacks
15:51buduar: gfxstrand, it's not sarcastic. ES does everything correctly, the precision can be lifted, because they save die area; ES is best.
15:51alyssa: and moltenvk is broken in all the places you would expect given where agx doesn't have support for things
15:52ids1024[m]: DemiMarie: > forgot that games are the main reason for these fancy new APIs
15:52ids1024[m]: For better or worse, for most normal graphics stuff on Linux that isn't games, you just need GLES 2.0 or so. Maybe some fancy professional video software also does fancy things with Vulkan.
15:52gfxstrand: buduar: First off, "really performant driver" is "very hard" no matter what API or hardware.
15:52alyssa: TBH, seeing moltenvk claim support for stuff makes me immensely sad because we're trying to do things Right but they get to advertise the punch sooner by layering hacks on hacks and shipping the broken thing fast
15:52buduar: gfxstrand, it's only tiny extension , have a look at this https://github.com/jermp/s_indexes
15:52DemiMarie: alyssa: wow, I was not expecting that!
15:52DemiMarie: Does AGX have any fixed function stuff at all?
15:53i509vcb: Well GLES 3.2 is nice to have. From what I recall there is some HDR related stuff that a wayland compositor can actually use there
15:53i509vcb: (or was it 3.0?)
15:53alyssa: DemiMarie: sure. It's got a rasterizer, texture fetch hardware, and .. yeah those are the biggies
15:54buduar: the groundwork has been there for so many years, the natural continued extension is only that, and it's magical
15:54alyssa: Depth/stencil unit, primitive assembly, clipping/culling
15:54alyssa: It does have a tessellator but it's not sufficient for any of GL/ES/VK/D3D
15:55alyssa: MoltenVK is broken in precisely those places; Apple's GL driver falls back to tessellating on the CPU and your performance goes off a cliff
15:55i509vcb: Metal apparently has mesh-related stuff advertised, but I imagine the way the hardware implements it could be very weird
15:55alyssa: i509vcb: there's no mesh hardware, it's done entirely software
15:55i509vcb: oof
15:55i509vcb: that sounds brutal
15:55DemiMarie: alyssa: that’s interesting, not least because it tells me what stuff genuinely cannot be emulated in shaders efficiently
15:56alyssa: our current understanding is that the mesh shaders run as compute kernels that generate geometry by something like device_generated_commands, creating draws with regular vertex shaders
15:56DemiMarie: GPU-side JIT?
15:57alyssa: as far as we know, the only trick it has (that an application doesn't have) is a mechanism to allocate memory dynamically from a shader
15:57alyssa: but even that is implemented in firmware with a kernel dance, not hardware
15:57buduar: gfxstrand, also look at this what they managed to hack on dma https://people.ece.cornell.edu/land/courses/ece4760/RP2040/C_SDK_DMA_machine/DMA_machine_rp2040.html
15:57i509vcb: I'd guess if agx is so shader heavy, Apple would try to put as much die space as possible into compute/shader execution
15:57alyssa: i509vcb: well yeah, that's the tradeoff. drop all the fixed function hardware and you can get more shader cores
15:58alyssa: for implementing Metal, agx is the right design
15:58alyssa: for D3D or VK or GL... less great.
15:58alyssa: but critically, entirely possible.
15:58alyssa: I want to defeat the narrative that AGX somehow "can't" support conformant GL and Vulkan
15:58gfxstrand: buduar: I don't see how any of those links have anything to do with what's being discussed.
15:58DemiMarie: how much will not having that hardware hurt Vulkan performance?
15:59alyssa: It can. Apple chooses not to.
15:59alyssa: That's a political choice and one that Apple should not be making
15:59i509vcb: I've found agx to be quite performant from my use with the gl 3.1 driver
15:59alyssa: and everywhere MoltenVK fails conformance, that's on Apple
16:00DemiMarie: alyssa: why is that?
16:00DemiMarie: why is it on Apple and not a MoltenVK bug?
16:00i509vcb: Someone is certainly going to get the wild idea of trying to run mesa's asahi driver on macOS to get proper Vulkan eventually
16:01alyssa: i'd rather people switch to linux :~)
16:01DemiMarie: I had the same thought
16:02buduar: gfxstrand, but i do see, cause vulkan there is no need to handle any cpu threads, the compilation is so tiny it's bus traffic only, dma can do that
16:02buduar: when you order bunch of loops in the compiler, dma can handle it
16:02buduar: but this correct compiling is quite tiny
16:03DemiMarie: Also does this channel have logs?
16:03gfxstrand: Yeah, no... That's not how any of this works.
16:03i509vcb: DemiMarie: yes
16:03DemiMarie: i509vcb: where?
16:03i509vcb: https://oftc.irclog.whitequark.org/dri-devel/
16:03i509vcb: same place as #wayland
16:04DemiMarie: i509vcb: thanks
16:04i509vcb: Back to what I was initially here to ask...
16:04i509vcb: gfxstrand: on the snippet I linked above, what would be typically causing that weird assert for vk_sync?
16:08gfxstrand: i509vcb: Good question. That's definitely odd.
16:08gfxstrand: Oh, that assert has a comment on it! You have more than one timeline type
16:08i509vcb: This happens in agxv if you were wondering
16:09gfxstrand: i509vcb: Does your kernel driver support timeline sync objects?
16:09gfxstrand: If so, then you don't need all that `sync_timeline_type` stuff.
16:10i509vcb: From what I recall yes, but it's untested
16:10gfxstrand: Sounds like a good time to test it!
16:10i509vcb: So I guess I'll need to talk to lina about finding the bugs then
16:11gfxstrand: The other option is that you can do `device->drm_syncobj_type.features &= ~VK_SYNC_FEATURE_TIMELINE` to disable timeline sync objs.
16:11gfxstrand: And then the emulation will work fine.
16:14i509vcb: yup I guess it's a problem with the kernel driver since the timeline deqps just hang forever
16:15i509vcb: Hmm, although the emulation trips a different assert apparently
16:15gfxstrand: We really should make the vk_drm_syncobj_get_type() take a `bool supports_timelines`
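A sketch of the `bool supports_timelines` idea. It uses made-up stand-ins for the vk_sync feature bits rather than Mesa's real definitions. The helper clears the timeline bit when the kernel's timeline support cannot be trusted. That has the same effect as the `device->drm_syncobj_type.features &= ~VK_SYNC_FEATURE_TIMELINE` workaround, so the runtime's timeline emulation takes over. `sketch_syncobj_get_type()` and `struct sketch_sync_type` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up stand-ins for the vk_sync feature bits; values are arbitrary. */
#define SKETCH_SYNC_FEATURE_BINARY   (1u << 0)
#define SKETCH_SYNC_FEATURE_TIMELINE (1u << 1)
#define SKETCH_SYNC_FEATURE_GPU_WAIT (1u << 2)

struct sketch_sync_type {
    uint32_t features;
};

/* Hypothetical variant of the suggested signature: the caller states whether
 * the kernel's timeline syncobjs should be used. If not, the timeline bit is
 * cleared so the runtime falls back to emulating timelines on binary
 * syncobjs. */
static struct sketch_sync_type
sketch_syncobj_get_type(uint32_t kernel_features, bool supports_timelines)
{
    struct sketch_sync_type type = { .features = kernel_features };
    if (!supports_timelines)
        type.features &= ~SKETCH_SYNC_FEATURE_TIMELINE;
    return type;
}
```

Making it a parameter forces each driver to decide explicitly, instead of inheriting whatever the kernel advertises.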
16:22alyssa: i509vcb: timeline sync needs to work for the kernel merge, so please branch off the driver with real timeline sync and a deqp case hitting the kernel bug and send it to lina for debug
16:22alyssa: thanks :)
16:23i509vcb: ok so I guess it's time to build kernels
16:23gfxstrand: Yes, it should work for the kernel merge
16:26buduar: Sure it does work so, if you offload to dma, there is no need for threads on CPU, and there's no need to do any locking with performance reasons. And there is no need to fixate sport results by killing underaged Estonian kids. Thread would issue bus instructions and after that alu, for performance reasons it's not needed, human is not able to trace such perf. If you compile correctly there's no CPU threads needed to fill the pipeline with more
16:26buduar: data, leave them to os smp.
17:40alyssa: zmike: Does Zink support stores&atomics from geometry shaders?
17:40alyssa: (provided the underlying vulkan driver supports vertexPipelineStoresAndAtomics, I mean)
17:40alyssa: If so -- I am wondering if it is subtly broken
17:41alyssa: The Vulkan spec ("9.8.1 Geometry Shader Execution") implies a geometry shader might be invoked multiple times
17:41alyssa: but the GL spec ("7.13.1 Shader Memory Access Ordering") implies a geometry shader is invoked exactly once per primitive
17:49anholt: alyssa: our conclusion has been that the GL spec didn't really mean that, and tests have been fixed over time to allow multiple execution.
17:49alyssa: anholt: Alright :+1:
17:50alyssa: So I can implement the Vulkan behaviour even for GL and hopefully everyone is happy?
17:50anholt: (or, maybe, the GL spec meant that at the time, but they realized whoops, and also nobody needed that detail, so we all just pretended that's what it meant all along)
17:50alyssa:doesn't understand how side effects in vertex shaders can possibly be *useful*, but..
17:53alyssa: whole bunch of KHR-GLES31 tests look bogus
17:54alyssa: KHR-GLES31.core.shader_atomic_counters.basic-usage-vs, KHR-GLES31.core.shader_atomic_counters.advanced-usage-multi-stage, etc
17:55alyssa:can make the tests pass with enough hacks but that's not really the point
19:54bylaws: alyssa: adreno geom shaders execute once per vertex in both GL and VK so should definitely be fine
19:55alyssa: :+1:
19:55alyssa: wait the geom shader is once per vertex?!
19:57bylaws: Output vertices I mean
19:57bylaws: It's invoked max_vertices times per prim
20:02alyssa: that's.. also bizarre, wow, lol
20:22robclark: alyssa: it does let the GS run for each output vertex in parallel
20:24alyssa: robclark: fair enough. I guess that helps reduce divergence and stuff?
20:26robclark: I guess it really depends on the structure of the GS..
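A toy CPU model contrasting one geometry-shader invocation per primitive with one invocation per output vertex. It is not how Adreno actually schedules work. In the per-vertex model, each invocation reruns the whole body and keeps only its own vertex, so the outputs match. However, a store or atomic in the body runs `MAX_VERTICES` times instead of once. That side effect is modeled by `gs_side_effect()`, a hypothetical stand-in, and it is exactly the case alyssa's stores-and-atomics question is about.

```c
#define MAX_VERTICES 3

/* Stand-in for a store/atomic in the GS source: runs once per execution of
 * the shader body. */
static void gs_side_effect(int *counter)
{
    (*counter)++;
}

/* Model A: one invocation per input primitive; the body emits every vertex. */
static void run_per_primitive(float input, float out[MAX_VERTICES], int *counter)
{
    gs_side_effect(counter);
    for (int v = 0; v < MAX_VERTICES; v++)
        out[v] = input + (float)v;
}

/* Model B: the body runs once per output vertex. Each invocation computes the
 * same program but keeps only the vertex matching its own index, so the
 * invocations are independent and could run in parallel. */
static void run_per_output_vertex(float input, float out[MAX_VERTICES], int *counter)
{
    for (int inv = 0; inv < MAX_VERTICES; inv++) {
        gs_side_effect(counter);           /* whole body re-executes */
        for (int v = 0; v < MAX_VERTICES; v++)
            if (v == inv)
                out[v] = input + (float)v; /* only "our" vertex is kept */
    }
}
```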
21:06doras: karolherbst: is anything needed for https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/318?
21:09karolherbst: not really. I should probably just merge it...
21:11karolherbst: somebody would have to make a release; I do not know how to make one, but I guess it's fine to wait anyway
21:48doras: karolherbst: thanks. I agree that it should be fine to wait.
23:07karolherbst: I'm really running into the weirdest issues... fs_visitor::split_virtual_grfs where num_vars is 4750009 :')
23:07gfxstrand: Uh...
23:07karolherbst: ehh.. might be a memory corruption actually
23:08karolherbst: maybe not
23:08karolherbst: the nir_shader be like `constant_data_size: 4000000,`
23:09gfxstrand: hehe
23:09karolherbst: yeah.. nir_print_shader in gdb throws a `Cannot access memory at address 0x7ffffeddcb6f`
23:09karolherbst: something smells
23:09karolherbst: prog vars with chip-spv seem to work great, except...
23:10karolherbst: probably some weirdo overflow somewhere...
23:10gfxstrand: valgrind is your friend
23:11karolherbst: valgrind is not my bf anymore, libasan is my new bff
23:11karolherbst: but it's kinda weird
23:12karolherbst: I don't even know why gdb complains about it, because I don't even know what that pointer is supposed to be
23:12karolherbst: the nir is somewhere else
23:12karolherbst: nir_print_shader is somewhere else...
23:13karolherbst: ehhh... probably my stack just got trashed
23:19karolherbst: "con 64 %6750014 = load_const (0x00000000003d08cc = 3999948)" mhhhhh
23:20karolherbst: we probably shouldn't inline massive loops or something
23:20karolherbst: actually.. what's the spirv
23:22karolherbst: the spirv literally has 65 SSA values
23:26karolherbst: yeah.... something really weird happens and a very small nir explodes massively in size
23:27karolherbst: "decl_var constant INTERP_MODE_NONE float[1000000] __chip_var__initializer = null" ah yes....
23:28karolherbst: gfxstrand: https://gist.githubusercontent.com/karolherbst/c1f06f1320bcd541792081b38e6203ee/raw/50df0a0f376475e47c2c58fe8abcc3c879a4f4c5/gistfile1.txt
23:28karolherbst: we _might_ want to lower that to a loop
23:30gfxstrand: karolherbst: lol
23:30karolherbst: yeah.. it's the memcpy lowering
23:31gfxstrand: Yeah, we probably want a threshold on the size. :joy:
23:31karolherbst: or we always lower it to a loop and let the loop unroller do its magic
23:32gfxstrand: And maybe something smarter if we know alignments and that the size is well-aligned.
23:32karolherbst: myyy
23:32karolherbst: mhhh
23:32gfxstrand: Because the actual loop case copies one byte at a time
23:32karolherbst: maybe
23:32karolherbst: yeah...
23:32karolherbst: I think it's one element rather
23:33karolherbst: or is it really byte?
23:33gfxstrand: byte
23:33karolherbst: it's all based on derefs still tho
23:33gfxstrand: It's memcpy, mate
23:33karolherbst: ehhh
23:33karolherbst: right..
23:33karolherbst: though
23:33gfxstrand: (Imagine I said that in a semi-passable Australian accent)
23:33karolherbst: the lowered nir uses uvec4
23:34karolherbst: ahh yeah
23:34karolherbst: memcpy lowering isn't that dumb
23:34gfxstrand: We could probably make the loop do vec4...
23:34karolherbst: lower_memcpy has this "copy_type_for_byte_size" function which decides how big an element is
23:34karolherbst: and the biggest thing is vec4
23:34gfxstrand: Like, emit 3 loops: copy in vec4s, copy what's left in dwords, copy what's left in bytes.
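A plain-C sketch of the three-loop copy gfxstrand proposes for the variable-size path. It copies 16-byte chunks (standing in for uvec4), then 4-byte dwords, then the byte tail. This is not Mesa's nir_lower_memcpy code. `copy_three_loops()` and `struct copy_stats` are hypothetical. Each element goes through a fixed-size `memcpy`, so unaligned pointers stay well-defined in C. Non-overlapping buffers are assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-byte element standing in for a uvec4. */
typedef struct {
    uint32_t v[4];
} vec4_u32;

/* Iteration counts per loop, to show how much work each stage does. */
struct copy_stats {
    size_t vec4s;
    size_t dwords;
    size_t bytes;
};

static struct copy_stats copy_three_loops(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    struct copy_stats st = {0, 0, 0};
    size_t i = 0;

    /* Loop 1: as many whole 16-byte chunks as fit. */
    for (; size - i >= sizeof(vec4_u32); i += sizeof(vec4_u32), st.vec4s++) {
        vec4_u32 tmp;
        memcpy(&tmp, s + i, sizeof tmp);
        memcpy(d + i, &tmp, sizeof tmp);
    }
    /* Loop 2: at most three dwords remain after loop 1. */
    for (; size - i >= sizeof(uint32_t); i += sizeof(uint32_t), st.dwords++) {
        uint32_t tmp;
        memcpy(&tmp, s + i, sizeof tmp);
        memcpy(d + i, &tmp, sizeof tmp);
    }
    /* Loop 3: at most three bytes remain after loop 2. */
    for (; i < size; i++, st.bytes++)
        d[i] = s[i];

    return st;
}
```

For the 4,000,000-byte `float[1000000]` initializer above, the first loop does all the work in 250,000 iterations with no tail. A byte-at-a-time loop would need 4,000,000 iterations.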
23:35karolherbst: it's already done it seems :D
23:35gfxstrand: Not in the loop case, it isn't
23:35karolherbst: you even wrote the code
23:35karolherbst: ahh
23:35karolherbst: ohh, that's what you meant
23:35gfxstrand: The unrolled case can do anything
23:35gfxstrand: The loop case needs work
23:36karolherbst: I see...
23:36karolherbst: uhh
23:36karolherbst: right
23:36karolherbst: if the size is a constant then we are smart
23:36gfxstrand: Yup
23:36karolherbst: otherwise we'd need peak memcpy...
23:36karolherbst: uhm...
23:36karolherbst: mhhh
23:37karolherbst: let's just make the const size thing also emit a loop and let the loop unroller optimize it for now... maybe
23:37karolherbst: we can always optimize the variable thing later
23:37karolherbst: mhhh
23:38karolherbst: might as well just merge those branches and be smarter about element size selection...
23:38karolherbst: maybe it wouldn't be too bad after all
23:38karolherbst: I can try to write the code... shouldn't be tooooo bad
23:39gfxstrand: I'm already typing
23:39karolherbst: okay
23:41karolherbst: but I like how chip-spv handrolls their Initializer kernel and doesn't use the SPIR-V initializer stuff... saves me the trouble of implementing that as well
23:44karolherbst: well.. it's faster this way anyway as then you are not limited to one thread...