10:18 akien: Heya! Let me know if this isn't the right channel to discuss this.
10:19 akien: On F43 (mesa 25.2.7), when initializing a Vulkan app (Godot), I get the following warning:
10:19 akien: `terminator_CreateInstance: Received return code -3 from call to vkCreateInstance in ICD /usr/lib64/libvulkan_dzn.so. Skipping this driver.`
10:19 akien: I assume the Vulkan loader tries all ICDs to see which ones are suitable, but dzn is the only one to raise a warning, though there are a bunch more packaged that aren't relevant for my hardware. I also assume Dozen is only relevant when running under Windows, i.e. for WSL.
10:20 akien: I'm trying to figure out whether I should patch Godot to silence that warning (I would still want to know if e.g. `libvulkan_radeon.so` were to fail, so I can't outright ignore init warnings), or whether it's something to look at in Fedora packaging or Mesa upstream.
10:20 K900: I'm pretty sure this is fixed in 25.3 so that dzn returns the correct error code
10:21 jannau: should be fixed by https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38611
10:22 akien: Ah that's great, thanks! I thought I had searched for that error on GitLab but somehow I missed Dave's issue and MR.
10:23 jannau: fix is even in mesa 25.2.8
10:23 akien: Awesome, it should find its way in Fedora 43 soon enough then.
10:23 akien: It's not a big bother either way but I wanted to check whether that's something worth fixing, which it seems it was :)
11:20 Tom^: does creating a GL_TEXTURE_2D on a compressed-modifier dmabuf incur some penalty? I'm not sure if I'm seeing things or just measuring this wrong, but using GL_TEXTURE_EXTERNAL_OES is faster in those cases
14:46 zmike: mareko: I'm looking at some perf items related to hash table lookups and I see a couple years back you replaced some hash_table usage with sparse array; is that globally applicable?
14:56 glehmann: does anyone have hardware that's supported by NIR and that flushes output but not input denorms?
14:56 glehmann: SPIR-V doesn't require input flushing but I think we should
14:57 glehmann: makes optimizations easier
15:14 soreau: zmike: I think the git log breadcrumbs include reuse_gl_names env var, so probably yes?
15:15 zmike: well the question is if it's globally applicable then why don't the main hash_table and set implementations just wrap that sort of mechanism
15:15 soreau: WIP :)
15:16 zmike: I guess maybe it's not quite the same since GL names always pass around the id/key whereas hash_table/set have to calculate it and there may be collisions
15:29 mareko: zmike: I don't know what you mean; the GL ID reuse is not an optimization, it's for compatibility with apps
15:30 zmike: mareko: your commit describes the _mesa_HashTable change as a 19% performance improvement
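A minimal sketch of the util_sparse_array pattern being discussed (the element type and node size below are made up, not what Mesa's GL name tables actually use). Lookups index directly by integer ID into lazily allocated nodes, so there is no hashing and no collision handling, which is why it doesn't trivially generalize to arbitrary hash_table/set keys:

```c
#include "util/sparse_array.h"

/* Illustrative payload type, not a real Mesa structure. */
struct my_object {
   uint32_t refcount;
};

static struct util_sparse_array objects;

void
objects_init(void)
{
   /* 512 elements per node; the node size must be a power of two. */
   util_sparse_array_init(&objects, sizeof(struct my_object), 512);
}

struct my_object *
object_lookup(uint32_t id)
{
   /* Allocates the containing node on first use and returns a stable,
    * zero-initialized element pointer; no hash, no probing. */
   return util_sparse_array_get(&objects, id);
}
```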
15:35 anonymix007[m]: Answering my own question: spirv_to_nir doesn't set driver_location but nir_to_spirv in zink uses it. What's supposed to set it then?
15:35 zmike: the driver
15:45 glehmann: in theory nothing, it's deprecated :)
15:46 anonymix007[m]: I assume the driver is zink in this case, and it clearly doesn't. Any ideas why?
15:46 anonymix007[m]: glehmann: So I wonder how it is supposed to work then and what I can do about this.
15:48 zmike: grep -r driver_location
15:50 anonymix007[m]: zmike: does zink require doing something special for NIR shaders generated by spirv_to_nir from Vulkan SPIR-V? I assume this path was never tested, so maybe it needs something to work
15:50 zmike: I can't answer that off the top of my head
16:07 anonymix007[m]: Should I file an issue about this? I can manually set the correct values and these do not get overwritten, so clearly none of the lines with 'driver_location = ' are reached for the respective variables
16:07 zmike: you can file issues, but they're not likely to be resolved given things work as expected for the current usages
16:08 zmike: if you're trying to use existing code in new ways, it's your responsibility to make that happen
16:11 ccr: to boldly go where no one has gone before?
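A hedged sketch of the kind of driver-side code the suggested grep turns up: after spirv_to_nir, the driver assigns driver_location itself, commonly via nir_assign_io_var_locations. Whether this particular pass is the right fix for the zink-on-d3d10umd path above is an assumption, not something zink documents:

```c
#include "nir.h"

/* Sketch only: assign driver_location for I/O variables the way several
 * Vulkan drivers do after spirv_to_nir.  The caller and placement are
 * hypothetical. */
static void
assign_io_locations(nir_shader *nir)
{
   nir_assign_io_var_locations(nir, nir_var_shader_in,
                               &nir->num_inputs, nir->info.stage);
   nir_assign_io_var_locations(nir, nir_var_shader_out,
                               &nir->num_outputs, nir->info.stage);
}
```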
17:13 alyssa: anonymix007[m]: what are you trying to do anyway?
17:23 anonymix007[m]: alyssa: I'm trying to make d3d10umd work with zink. Right now it uses TGSI which is completely broken (the whole shader is "optimized" out somewhere), so the triangle is not rendered at all. I replaced the DXBC-to-TGSI translation with dxbc-spirv and spirv_to_nir, and now I can at least see the triangle, but the colors are wrong.
17:27 anholt: I think you're going to have a much easier time just figuring out what went wrong in your tgsi-to-nir such that you lost your shader contents.
17:28 anholt: like, sure, dx10umd should probably be ported to emit nir directly at some point, but get things running first.
17:40 mareko: zmike: any optimizations to GL ID management are on top of the new GL ID allocator that always returns the lowest unused ID
17:54 anonymix007[m]: anholt: I'm not so sure about that one. TGSI looks reasonable, tgsi_to_nir also looks reasonable, but ZINK_DEBUG=tgsi,nir,spirv prints empty shaders: https://paste.debian.net/hidden/b1c1ea84
17:56 anholt: sounds like you need to move your debugging earlier and see what's happening inside d3d10umd, then.
17:56 anonymix007[m]: At this point I'm starting to consider adding PIPE_SHADER_IR_SPIRV to just bypass whatever zink does for TGSI/NIR and use that SPIR-V directly instead.
17:58 mareko: fixing tgsi_to_nir is probably the easiest approach
17:58 anholt: I'm going to bet that you're going to face *so* much more pain trying to rig that up. You're assuming that d3d10umd on the frontend and zink in the middle don't modify shaders in the process of translation (re-layout of bindings, for example), which does not sound like a remotely valid assumption to me.
18:01 mareko: the system of shader resource bindings is crazy different between DX10 and VK
18:02 mareko: there is also DXVK BTW
18:02 anonymix007[m]: mareko: I made a simple tool to just test what tgsi_to_nir does to the shader and it looks reasonable (that is, even nir_to_spirv produces the same results for that). I bet some of the passes in zink are breaking it later
18:03 mareko: why make dx10umd work though?
18:05 anonymix007[m]: Have you heard of that niche operating system for video games called Windows? It requires having some sort of DirectX UMD. I'm porting Venus to it
18:07 mareko: dx10 on zink on venus? oh wow
18:07 anonymix007[m]: DXVK is not a UMD, so making it work as one is much harder. d3d10umd is almost there
18:09 mareko: my position on venus is that it's not a commercially viable solution, so hopefully nobody is expecting any ROI from it
18:12 anonymix007[m]: The only alternative is gfxstream and it doesn't even work in Linux currently. There's not much choice here. And no, native context is not an option because Intel and Nvidia GPUs exist
18:13 anonymix007[m]: @anholt: for reference, d3d10umd works with virgl, so I don't think there is a problem there.
18:14 glehmann: isn't gfxstream just NIH venus, or does it have some technical advantage?
18:14 anholt: you're talking about a frontend on top of two different gallium drivers. behavior of the frontend will differ.
18:15 mareko: native context is possible to implement, far easier to do than it seems, and I think it's the only solution that has a non-zero chance of bringing some business/funding for the project in the long term
18:15 anholt: you're saying, and you pasted, that the tgsi coming into zink is empty when it shouldn't be. So, the next step is to figure out why you've got empty tgsi coming in.
18:16 anholt: (though, mareko is completely right here)
18:16 mattst88: glehmann: the most hilarious part of saying that gfxstream is "NIH venus" is that both came from google :P
18:17 dcbaker: mattst88: I mean, what could be more Google than Google not realizing that it's writing both?
18:17 anonymix007[m]: No-no-no, TGSI is not empty. NIR and SPIR-V are
18:18 zmike: dcbaker: google realizing it's writing both but they're managed by different teams and thus it's impossible to stop either one?
18:18 anholt: what's going on in lines 1-6 of your paste?
18:19 anonymix007[m]: These are empty shaders, that's fine and expected. The actual shaders for the triangle are later
18:19 mattst88: zmike: heh, spot on
18:20 zmike: bigcos
18:20 mareko: wasn't native context also created by google?
18:21 anholt: mareko: by google engineers not quite fully defying management to prove that venus was a bad idea, yes.
18:22 mareko: oh don't we all love big companies
18:28 mareko: if venus-on-radv can't outperform radv-on-native-context and run Steam games as well as the latter, any investment into projects layered on top of venus is a total waste
18:28 anonymix007[m]: Assuming you ignore that Intel and Nvidia GPUs exist, sure
18:29 anholt: Just sort out native context for intel and nv. that's way less effort than making venus work, let alone not suck.
18:29 mattst88: there's an existing MR for Intel at least
18:29 anonymix007[m]: Not for new ones with xe driver
18:31 anonymix007[m]: If anything, I plan to make native context work on Windows later, but starting with Venus makes sense because of it being more cross-platform right now and because it seems to fit into WDDM a bit better than native contexts
18:33 mareko: as long as everybody accepts that nobody will make any money from venus and nobody will want to game on it, then whatever
18:34 robclark: not sure the state of venus on windows host.. but also not sure how the host side of nctx would work with windows hosts either
18:35 robclark: windows kinda being someone else's problem and all
18:36 mareko: Steam on Linux potentially running games same or better than Windows is also a thing
18:36 anonymix007[m]: robclark: It probably doesn't work in windows host, but the goal here is windows guest
18:38 mareko: BTW, if you want to run Windows games in Qemu, AMD has the best solution for that
18:40 anonymix007[m]: I hope it's not "use an insanely priced server GPU with SR-IOV", is it? Sounds interesting if not; do you have more details?
18:41 mareko: we've had the solution for several years now
18:42 mareko: we've been talking about it here today
18:43 mareko: Windows games running on Proton running on Linux using RADV using native context, forwarding driver ioctls directly to the host kernel (skipping all host userspace other than Qemu)
18:45 anholt: glehmann, alyssa: curious about your thoughts: one of the problems faced by the algebraic pattern test is deciding if the input values we're testing invoke UB of the opcode. What do you think of having nir_opcodes.py define expressions like the constant evaluation for deciding if it would be UB?
18:48 anonymix007[m]: mareko: Oh, so the solution is "don't use Windows at all". That's probably good enough for games, but not so much for other software. Wine is good, but sometimes it's just not an option and one has to use windows.
18:48 anholt: (bitfield insert/extract beyond bit size, f2i/u* overflowing, s/udot_*_sat with the dot product part overflowing, etc.)
18:48 mareko: in the future we'll also have direct-to-HW command buffer submission from guest userspace in Qemu, bypassing guest kernel, Qemu, host libraries, and host kernel
18:49 mareko: that already has experimental support in Mesa
18:50 anonymix007[m]: Anyway, Venus is not a problem here, I'm quite sure I would see similar issues with native contexts. To sum up, TGSI shaders from d3d10umd are broken with zink (because it optimizes them out completely for whatever reason), as well as NIR (converted from SPIR-V) shaders (because driver_location never gets set)
18:53 anonymix007[m]: If I run tgsi_to_nir manually, the resulting shaders look sane though: https://paste.debian.net/hidden/beb0eb09
18:56 mareko: there are no problems, native context is extremely close to native AMD performance, it's just RADV+radeonsi+ROCm, and it's been the most commercially successful solution that I know of, already used by customers and ready to be used by new customers today
18:57 alyssa: do we not have native context on Intel?
18:57 alyssa: we should fix that
18:58 alyssa: digetx: need review?
18:58 alyssa: Mary: could i interest you in native context nvidia (:
18:59 Mary: alyssa: for nouveau? I wonder how hard it would be to type/test but maybe I can do something quickly
18:59 alyssa: o:)
19:00 alyssa: anholt: I think - unfortunately - that's the wrong direction for us to move in :/
19:00 alyssa: as I was saying the other day, we will medium-term want NIR to adopt poison/freeze with LLVM semantics
19:01 anholt: that feels orthogonal?
19:01 alyssa: under that model, anything that doesn't literally crash the GPU to execute (page faults or whatever) would be defined to return Poison on out-of-bounds inputs
19:01 alyssa: so those expressions amount to strengthening the existing constant folding code to return Poison (some special sentinel value I guess) for those inputs
19:02 anholt: yeah, "would this opcode return poison if executed with these values?"
19:02 alyssa: and then poison has well-defined semantics anyway so you wouldn't be skipping it in the unit tests, you would just be checking that both sides do compatible things
19:03 Mary: alyssa: still figuring out nvidia kernel driver dma-buf mess for my testing backend but planning to switch to mesh shader after that so might take a while before I look at native context
19:03 alyssa: fundamentally once we have proper poison semantics in nir, there is never* UB for pure-ALU
19:04 alyssa: so there'll be no test skipping needed
19:04 anholt: cool, but we still need some definition in the opcode of when it's poison, yeah?
19:05 anonymix007[m]: mareko: so native context radeonsi won't optimize out those TGSI shaders into nops, will it?
19:05 alyssa: Yes but that's most naturally part of the constant folding definition
19:05 alyssa: unless you really don't want the sentinel I guess
19:05 alyssa: (which might be fair, it seems annoying to do 64-bit alu on 64-bit CPU..)
19:05 anholt: ok, so not a separate expression, but in the main expression, we can also set a poison flag?
19:05 alyssa: ok, I guess you've convinced me
19:06 alyssa: poison isn't a flag, it's a value in itself
19:06 alyssa: a "u32 -> u32" unop will actually be defined with domain and range [0, 2^32 - 1] U {Poison}
19:06 anholt: I'm meaning in the sense of our constant expression evaluation signature for now -- nir_const_value doesn't have a poison bit, and I don't think we're defining one today, so I need this information to go sideband somewhere.
19:07 alyssa: Yeah, I don't have a good solution to that right now
19:07 alyssa: so if a second expression is the sane thing to do, I guess that's okay
19:07 anholt: but it seems to me like doing this work sets us up for the future poison work.
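A purely hypothetical sketch of the "sideband poison" idea from this exchange: pair the folded value with a flag, since nir_const_value has no poison bit today. None of these names exist in NIR; the predicate just illustrates the kind of per-opcode expression being proposed (here, a bitfield extract reading past the bit size, one of the UB cases anholt lists above):

```c
#include <stdbool.h>
#include "nir.h"

/* Hypothetical pairing of a folded constant with a poison flag, carried
 * sideband because nir_const_value cannot represent poison. */
struct folded_value {
   nir_const_value value;
   bool is_poison;   /* result is undefined/poison for these inputs */
};

/* Example per-opcode predicate: unsigned bitfield extract whose
 * offset + bits reads beyond the operand's bit size. */
static bool
ubfe_would_poison(unsigned offset, unsigned bits, unsigned bit_size)
{
   return offset + bits > bit_size;
}
```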
19:07 mareko: anonymix007[m]: they are the same drivers as on the host, just Vulkan, GL, VAAPI and ROCm (HIP and OpenCL), just install Steam in Qemu and play games
19:09 mareko: just need to merge native context into Qemu
19:09 alyssa: * this diverges from LLVM, where integer division by zero is immediate-UB because that traps on common CPUs... I am not aware of GPUs with those semantics and things are a lot simpler if we don't have them in NIR.
19:10 HdkR: Divide by zero can also return zero on common cpus ;)
19:10 alyssa: eh?
19:11 HdkR: ARM is pretty common you know
19:12 pac85: so does fex have to emulate trapping on div by 0?
19:13 HdkR: :blobsweat:
19:13 HdkR: We haven't implemented ALU faults for integer or floats.
19:14 pac85: so once fex is a common implementation, it will be common for x86 to return 0 on div by 0 :p
19:14 alyssa: i'm gonna write a game that triggers garbage collection with the loop `uint16_t timer = ~0; for(;;) { 1 / (timer--); usleep() }` and then put my x86 game on Steam and file a bug report when it doesn't work on Frame
19:14 alyssa: o:)
19:14 HdkR: nooo
19:15 HdkR: Cortex doesn't even support float exceptions, so we can't sanely implement those until Oryon or Apple Silicon D:
19:20 glehmann: I'm always surprised that amd gpu designers spend die space on float exceptions, it's such a barely used feature
19:22 HdkR: For those 0% of users, it would be quite a boon to performance!
19:49 mareko: we also have shader trap handlers for those exceptions installed by the kernel driver
19:50 airlied: I'm assuming HPC needs it
19:51 mareko: no clue
19:54 mareko: we use the trap handler for other things, like page faults and on-demand paging, or continuing on a page fault instead of hanging (waves hang by default on page faults unless the kernel driver installs a trap handler and does some things with it)
19:54 anonymix007[m]: Looks like tgsi_to_nir_noscreen produces reasonable NIR, but tgsi_to_nir's output is basically empty
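For context, a sketch of the two entry points being compared, with a hypothetical caller. The screen-taking variant additionally runs ttn_finalize_nir() (screen-driven lowering and optimization), which is apparently where the shader ends up empty; the noscreen variant stops right after translation:

```c
#include "nir/tgsi_to_nir.h"
#include "util/ralloc.h"

/* Hypothetical helper contrasting the two translation paths.  "tokens",
 * "screen" and "options" stand in for whatever d3d10umd/zink actually pass. */
nir_shader *
translate_both_ways(const void *tokens, struct pipe_screen *screen,
                    const nir_shader_compiler_options *options)
{
   /* Raw translation only: no ttn_finalize_nir(), no screen-driven lowering. */
   nir_shader *raw = tgsi_to_nir_noscreen(tokens, options);
   ralloc_free(raw);

   /* Translation plus ttn_finalize_nir() against the screen's NIR options. */
   return tgsi_to_nir(tokens, screen, false);
}
```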
19:55 HdkR: Emulating the exceptions without hardware supporting it is a terrible perf hit. Could definitely see some compute workloads being accelerated somehow :D
20:02 mareko: I'm of the opinion that GPU page faults should result in a segfault of the process
20:05 mareko: and SIGFPE for integer division by 0 on the GPU
20:06 glehmann: I would be happy if a gpu page fault only killed the process without a gpu reset
20:07 mareko: a month ago or so, amd-staging-drm-next got a change that waves no longer hang on a VM fault, but instead they just continue now
20:08 mareko: at least that's my understanding of what was implemented
20:15 mareko: the trap handler is quite nice for implementing something like gdb for AMD GPUs
20:47 anonymix007[m]: I *think* there is an issue with nir_lower_returns. tgsi_to_nir+nir_to_spirv shows the following (still correct) shader: https://paste.debian.net/hidden/7b80e957 if I short-circuit ttn_finalize_nir
20:48 anonymix007[m]: And the following already incorrect shader if I add a return immediately after nir_lower_returns: https://paste.debian.net/hidden/027939db
21:00 anonymix007[m]: Oh, wait, it is already incorrect, there is an "if true return". Surely that's the problem