08:00 dolphin: airlied, sima: Added you folks to the discussion thread about peek/poke access adding for EU debugger
08:00 dolphin: thread "[PATCH v6 2/8] drm/ttm: Add ttm_bo_access"
08:06 dolphin: feel free to pull in any relevant folks who might have opinions on the parasitic thread debugging vs. direct debugging
09:29 sima: dolphin, mbrost already pinged me about this
09:29 dolphin: ah, ok, good. just returning from some time off so behind on mails
09:29 sima: I'll reply with some thoughts, but I think the main suggestion I'll drop is that maybe we should have a "how is gpu debugging supposed to work" doc section
09:29 sima: and collect acks from driver and probably more importantly, gdb folks
09:30 sima: otherwise I fear a massive derail is going to happen here
09:30 sima: also I think from the earlier parts of the thread, there's a confusion around "midlayer or not and what does it even mean"
09:30 sima: which I think would be good to document in ttm DOC: sections too
09:30 sima: at least I suggested mbrost to type up such a patch (but he's not around here rn)
09:32 dolphin: yeah, there definitely seem to be different layers to the discussion, I tried to answer the high level "why do this?"
10:10 sima: dolphin, airlied replied
10:11 dolphin: I see your reply only?
10:11 dolphin: ah, probably add ":" there and it makes sense
10:15 sima: ah yes
10:15 sima: at least I didn't murder someone with missing punctuation :-P
10:16 sima: but was close, almost typed done instead of replied
10:20 dolphin: but yeah, overall on the details I expect there of course will be plenty of discussion, but will be good to first understand if folks are aligned at the high level
11:17 Lynne: it's sad that the state of compute-only vulkan profiling is still exactly non-existent these days
11:18 Lynne: modifying mesa to dump anyway crashes the GPU, nvidia's tools are graphics-only
11:20 Lynne: using clockRealtimeEXT is like threading a needle with an excavator
14:01 sima: Lynne, https://www.youtube.com/watch?v=nvR4daRVKWQ
14:05 dolphin: Christian still doesn't use IRC?
14:10 sima: not that I know of
14:15 dolphin: ok, will then probably have to take the time to answer the email at length tomorrow
16:18 alyssa: llvm 19 don't hurt me :(
16:21 HdkR: alyssa: Too late, it has already happened
16:21 alyssa: yeah...
16:25 alyssa: llvm19 getting me:
16:26 alyssa: function_temp array float[3], deref_cast to float *, store_deref
16:26 alyssa: and then deref_array+load_array
16:26 alyssa: is.. NIR expected to be able to handle that?
16:29 alyssa: karolherbst: ^
16:34 jenatali: That looks fine?
16:35 jenatali: IIRC there was a change Karol had in flight to optimize the cast with no pointer math into an array deref with index 0
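The rewrite jenatali mentions (turning a cast with no pointer arithmetic into an array deref with index 0) can be sketched on a toy deref model. This is not NIR's API: `toy_type`, `toy_deref`, and `opt_cast_to_array_zero` are hypothetical stand-ins, and whatever form Karol's change takes has to cover many more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for NIR types and derefs. */
typedef enum { TYPE_FLOAT, TYPE_ARRAY } type_kind;

typedef struct toy_type {
    type_kind kind;
    const struct toy_type *elem; /* element type when kind == TYPE_ARRAY */
    unsigned length;             /* array length when kind == TYPE_ARRAY */
} toy_type;

typedef enum { DEREF_VAR, DEREF_CAST, DEREF_ARRAY } deref_kind;

typedef struct toy_deref {
    deref_kind kind;
    const toy_type *type;     /* type of what this deref points at */
    struct toy_deref *parent; /* NULL for DEREF_VAR */
    unsigned index;           /* constant index, used by DEREF_ARRAY */
} toy_deref;

/* If `d` casts an array-typed deref to a pointer to its element type,
 * with no pointer arithmetic, rewrite it in place into
 * deref_array(parent, 0). Returns true if anything changed. */
static bool
opt_cast_to_array_zero(toy_deref *d)
{
    assert(d != NULL);
    if (d->kind != DEREF_CAST || d->parent == NULL)
        return false;

    const toy_type *src = d->parent->type;
    if (src->kind != TYPE_ARRAY || src->elem != d->type)
        return false;

    d->kind = DEREF_ARRAY; /* parent and type stay the same */
    d->index = 0;
    return true;
}
```

After the rewrite, the store and a later load through `deref_array(var, 0)` share a deref shape, which is presumably what a pass like `nir_opt_copy_prop_vars` needs in order to match them.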
16:35 alyssa: jenatali: I think the issue is casting array to pointer and then nir_opt_copy_prop_vars can't see it
16:36 alyssa: can probably add a pass to drop the cast and then should work maybe
16:36 alyssa: but yeah if there's an MR I can cherrypick that's preferable :)
16:37 alyssa: let's try https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27640
16:38 alyssa: hm no not that
16:45 * alyssa doesn't understand derefs enough for this
16:59 karolherbst: alyssa: yeah.. it's an annoying issue
17:00 karolherbst: the tldr is that various nir passes fight each other here
17:00 karolherbst: we need to make passes smarter I think
17:01 karolherbst: like the issue in LLVM is, that everything is just a plain pointer, if you think of C based inheritance, then it's almost impossible to add the right cast
17:01 karolherbst: sometimes a field at offset 0 is used, sometimes the outer thing
17:02 karolherbst: in that MR I _kinda_ tried to take into account how the pointer is used, but there are too many corner cases to do it sanely
17:04 alyssa: what do we do
17:12 karolherbst: I think we have to stop relying on the cast here. Maybe we should just assume it's alright, at least for CL and just trust the application isn't silly (tm), because data layouts are all well defined
17:12 karolherbst: the actual type of the source data doesn't actually matter with CL
17:12 karolherbst: but that's gonna be funky with the way we do derefs
17:15 karolherbst: sadly that idea doesn't work well with scratch memory..
17:40 zf: Is there a nice way I can dump the commands sent by mesa to the AMD GPU?
17:40 zf: (I'm getting visual corruption trying to decode H.264 with VA, but not with Vulkan, and staring at my own code several times has not helped me find where I'm filling parameters wrong, so I figure I might try attacking the problem from the other end)
17:56 karolherbst: alyssa: the main issue is, that C has weird rules. Like you can have a temporary array of uint8_t[128], but then always cast it to the same struct. Atm even if we make all those passes more intelligent, we'd fail to promote this to SSA values. And the issue you are seeing there is simply a specialized variant of that (where source and dest types are
17:56 karolherbst: somewhat related). I was considering a pass which works past IO lowering, but then if you have a single indirect, then that pass won't work anymore. But one thing we could do is, if we have a copy, we could try to introspect the incoming types and see if we could construct a path (with offset/idx 0) to the other (sub)type using deref_array and
17:56 karolherbst: deref_struct. Like if a is struct { float[4] } and the b is float[4], we simply replace a with deref_struct(a, 0). If a is struct { struct T t; float } and b is struct { struct T t; double }, we'd use deref_struct(..., 0) on both
17:57 karolherbst: just need to traverse the entire type chain probably
17:57 karolherbst: and throw away more casts
17:58 karolherbst: so maybe just need to make the cast optimization smarter? mhhh
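Karol's idea for copies between mismatched types (walk both type chains and apply `deref_struct(..., 0)` or `deref_array(..., 0)` until the two sides meet) can be sketched as follows. The `type_t` representation is hypothetical and records only the type found at offset 0 of each aggregate; types are compared by pointer identity, assuming they are interned. The function reports how many offset-0 steps each side needs; step k on a side is a `deref_struct` if the k-th type in that side's chain is a struct, and a `deref_array` if it is an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { T_FLOAT, T_DOUBLE, T_ARRAY, T_STRUCT } kind_t;

/* Hypothetical type node: only the type found at offset 0 is recorded
 * (the element type of an array, or the type of member 0 of a struct). */
typedef struct type {
    kind_t kind;
    const struct type *first;
} type_t;

#define MAX_DEPTH 16

/* chain[0] = t, chain[k+1] = type at offset 0 inside chain[k], down to a scalar. */
static size_t
offset0_chain(const type_t *t, const type_t **chain)
{
    size_t n = 0;
    while (t != NULL && n < MAX_DEPTH) {
        chain[n++] = t;
        t = (t->kind == T_ARRAY || t->kind == T_STRUCT) ? t->first : NULL;
    }
    assert(t == NULL && "type nesting too deep for this sketch");
    return n;
}

/* Find the outermost type reachable from both a and b through offset-0
 * derefs. On success, *steps_a and *steps_b say how many offset-0
 * deref_struct/deref_array steps to apply to each side. */
static bool
common_offset0_path(const type_t *a, const type_t *b,
                    size_t *steps_a, size_t *steps_b)
{
    const type_t *ca[MAX_DEPTH], *cb[MAX_DEPTH];
    size_t na = offset0_chain(a, ca);
    size_t nb = offset0_chain(b, cb);

    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (ca[i] == cb[j]) {
                *steps_a = i;
                *steps_b = j;
                return true;
            }
        }
    }
    return false; /* no shared type at offset 0: keep the cast */
}
```

For a = struct { float[4] } and b = float[4] this gives one step on a and none on b; for the two structs that share a leading struct T it gives one `deref_struct` on each side, matching the two examples Karol gives above. Because each type in a chain strictly contains the next, the first match found is also the one needing the fewest steps on both sides.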
18:05 alyssa: ok..
18:06 alyssa: karolherbst: is this going to go away when we switch to llvm's spirv backend?
18:10 karolherbst: alyssa: maybe? At least we can enable optimization levels which might get rid of some of those things for us...
18:11 karolherbst: but that doesn't help anybody giving us spir-v from old tooling or something
18:11 karolherbst: maybe I'll take another look once I'm done with SVM or so
18:19 nowrep: zf: https://gitlab.freedesktop.org/nowrep/mesa/-/commits/ac-vcn_dec_msg it's very incomplete but maybe it can help
18:37 zf: ah, thanks!
18:37 zf: thanks for the fix for bug 12057, also :-)
18:43 alyssa: karolherbst: i'm not worried about spir-v in the wild
18:44 karolherbst: right, but I am
18:44 karolherbst: I really should try to figure out a good solution there, it's just not very critical in terms of functionality 🙃
18:45 alyssa: this is blocking F41 release, so.. it's critical for us at least..
18:45 alyssa: current plan is to backport the whole llvm18 stack..
18:46 karolherbst: ohh, is it crashing? Or simply "it requires scratch memory, but we don't handle that case"?
18:46 alyssa: the latter
18:47 karolherbst: right... Intel had the same issue, they just set scratch to 0 and moved one
18:47 karolherbst: *on
18:47 alyssa: no i mean
18:47 alyssa: load_scratch/store_scratch with llvm19 wasn't there with 18
18:47 alyssa: no asahi driver shaders need scratch or spilling, this is asserted
18:47 karolherbst: yeah, but intel has a backend pass to clean it up
18:47 karolherbst: so they opt to ssa a bit later
18:48 jenatali: Oh this is asahi's internal shaders, fun
18:48 karolherbst: but regardless of existing workarounds, I'd like to get it fixed at some point
18:48 alyssa: jenatali: yeah, although that's ultimately just a smoke test for cl being horribly broken or not
18:49 jenatali: I mean, CL needs real scratch
18:49 karolherbst: maybe let me think about this issue this week and see if I come up with anything
18:49 alyssa: Yes, but if code didn't use scratch on fedora 40, it suddenly spilling is a severe perf regression
18:49 jenatali: Oh sure
18:50 alyssa: ultimately this is all a symptom of the LLVM SPIR-V circus
18:50 karolherbst: yeah... I think intel checks the offset and turns matching thing to an ssa value
18:50 jenatali: Yep
18:50 karolherbst: we could have a more generic pass for that, but that still keeps the core issue there
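A more generic version of what Karol describes Intel doing (checking scratch offsets and turning matching accesses into SSA values) amounts to store-to-load forwarding. The sketch below is not Intel's backend pass or NIR code: it works on a hypothetical straight-line list of 32-bit scratch accesses, where a load at a constant offset reuses the SSA value last stored at that offset, and any store it cannot track conservatively forgets everything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: scratch is tracked in 32-bit slots, and every access is 32 bits. */
#define MAX_SLOTS 64

typedef enum { OP_STORE_SCRATCH, OP_LOAD_SCRATCH, OP_OTHER } op_t;

typedef struct {
    op_t op;
    bool const_offset; /* false for an indirect (non-constant) offset */
    unsigned offset;   /* byte offset, meaningful when const_offset */
    int src;           /* store: SSA id of the value written */
    int replacement;   /* load: SSA id to use instead, or -1 if unknown */
} instr_t;

/* Straight-line store-to-load forwarding over scratch accesses. */
static void
forward_scratch(instr_t *instrs, size_t n)
{
    int known[MAX_SLOTS]; /* SSA id last stored in each slot, or -1 */
    assert(instrs != NULL || n == 0);
    for (size_t s = 0; s < MAX_SLOTS; s++)
        known[s] = -1;

    for (size_t i = 0; i < n; i++) {
        instr_t *in = &instrs[i];
        bool tracked = in->const_offset && in->offset % 4 == 0 &&
                       in->offset / 4 < MAX_SLOTS;

        if (in->op == OP_STORE_SCRATCH) {
            if (tracked) {
                known[in->offset / 4] = in->src;
            } else {
                /* Indirect or untracked store: it may alias any slot. */
                for (size_t s = 0; s < MAX_SLOTS; s++)
                    known[s] = -1;
            }
        } else if (in->op == OP_LOAD_SCRATCH) {
            in->replacement = tracked ? known[in->offset / 4] : -1;
        }
    }
}
```

As Karol says, this only treats the symptom: the casts that forced the variable into scratch are still there, and a single indirect store defeats the forwarding.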
18:50 karolherbst: alyssa: I don't think the spir-v backend in llvm is gonna change that, because the LLVM IR doesn't have that knowledge. The solution everybody goes for atm is just add more spir-v extensions 🙃
18:51 alyssa: i see.
18:51 karolherbst: and... that extension won't map nicely to nir anyway
18:51 karolherbst: I think
18:51 jenatali: karolherbst: I disagree. DXIL isn't going to support scratch, and the HLSL -> SPIR-V pipeline should also be capable of producing DXIL as part of the HLSL upstreaming
18:52 karolherbst: https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_untyped_pointers.asciidoc
18:52 jenatali: So I think the requisite info is going to be added to LLVM IR to stop it doing dumb things
18:52 karolherbst: jenatali they removed it, and I'm sure they won't bring it back
18:52 jenatali: Yeah we'll see about that
18:52 karolherbst: well :)
18:53 karolherbst: but as it stands, there are LLVM versions where there is simply a pointer type without specifics
18:53 jenatali: Sure, but I think it will get better in the future
18:53 karolherbst: mhh, how so?
18:54 zf: oh that's interesting. this fixes it: <https://paste.debian.net/1335223/>
18:54 karolherbst: but in any case, I kinda wanted to solve this issue, because it's kinda fixable, just requires more code (tm)
18:55 zf: nowrep: does this look potentially correct? no idea if this is a driver bug or an application bug; VA is so underspecified...
18:55 jenatali: Folks are interested in getting Clang to produce graphics/Vulkan shaders, which have a lot more restrictions than CL
18:55 jenatali: This stuff will need to be solved before the SPIR-V layer for that to work
18:55 karolherbst: make "SPV_KHR_untyped_pointers" required in the next vulkan version, done
18:57 karolherbst: or well.. it simply relies on SPV_KHR_untyped_pointers if you use LLVM to compile to vulkan spir-v
18:58 karolherbst: maybe I'm just less optimistic than you on that one :D
18:58 nowrep: zf: you should create the context with the coded size in the bitstream, so for h264 that's always 16 aligned
18:59 zf: hmm okay, and recreate it if it changes?
19:00 zf: the Vulkan side seems to be fine with this, but maybe that's a Vulkan (spec?) bug
19:01 nowrep: yeah recreate if the size changes
19:03 zf: should I also be recreating it for Vulkan? it seems to work fine as-is...
19:04 nowrep: yes
19:06 zf: got it, thanks
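nowrep's advice is to create the decode context with the coded size from the bitstream (rather than, say, the cropped display size) and to recreate it when that size changes. For H.264 the coded size follows from the SPS fields defined in the spec; the struct and helper names below are hypothetical, and the actual teardown and re-creation (for example `vaDestroyContext`/`vaCreateContext` on the VA side) is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* SPS fields that determine the coded size, named as in the H.264 spec. */
typedef struct {
    unsigned pic_width_in_mbs_minus1;
    unsigned pic_height_in_map_units_minus1;
    unsigned frame_mbs_only_flag; /* 0 or 1 */
} h264_sps_size;

typedef struct {
    unsigned width, height;
} coded_size;

/* Coded frame size in pixels: whole macroblocks, so always a multiple of 16. */
static coded_size
h264_coded_size(const h264_sps_size *sps)
{
    assert(sps->frame_mbs_only_flag <= 1);
    unsigned width_mbs = sps->pic_width_in_mbs_minus1 + 1;
    /* Field-coded streams count map units per field, hence the factor of 2. */
    unsigned height_mbs = (2 - sps->frame_mbs_only_flag) *
                          (sps->pic_height_in_map_units_minus1 + 1);
    coded_size s = { width_mbs * 16, height_mbs * 16 };
    return s;
}

/* Hypothetical policy: recreate the context whenever the coded size changes. */
static bool
needs_new_context(coded_size current, coded_size next)
{
    return current.width != next.width || current.height != next.height;
}
```

A progressive 1080p stream, for instance, has a coded height of 1088 (68 macroblock rows), so a context created with the 1920x1080 display size would be smaller than the coded size.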
19:09 alyssa: karolherbst: if perf is trash on all drivers, that's not going to fly :p
19:10 karolherbst: that's true, but I mean, we can fix this on a nir level, just have to make the optimizations smarter 🙃
19:11 karolherbst: generally the idea would be to be more relaxed comparing those two sides of copies (or load/stores) and figure out how to make it work.
19:13 karolherbst: alyssa: weird idea: try adding a padding member at offset 0 🙃 and see if that is a suitable workaround. I think that changes the generated code enough. But might end up with something similar
19:17 karolherbst: mhh though in your case it's an array...
19:24 zf: xrandr
19:24 zf: oops, wrong window :P
19:32 lina: I worked out how to ship llvm18 spirv stuff on F41 so at least this isn't on fire any more ^^
20:10 dj-death: alyssa: welcome to our world, I'm sorry.
22:28 karolherbst: yeah.. I really should bump up the prio on that one 🙃