00:26gfxstrand: I really hate that this stupid query crash cost me like 4 hours of CTS runtime today. *grumble*
00:28karolherbst: *pat* *pat*
00:30gfxstrand: *prrrr*
00:30epony: popcorn teatime
02:13i509vcb: In the Vulkan spec there is a mention of VK_ERROR_UNKNOWN possibly being returned by any command. But outside of that does the spec saying vkWhatever returns some specified error codes mean other error codes are technically valid to return?
02:14i509vcb: Question is related to me noticing that vkCreateWaylandSurfaceKHR states VK_ERROR_OUT_OF_HOST_MEMORY and VK_ERROR_OUT_OF_DEVICE_MEMORY are failure return codes, but Mesa can return VK_ERROR_SURFACE_LOST_KHR if you happen to hit some code paths
02:15i509vcb: s/valid/invalid
02:20zmike: sounds like bug
02:21i509vcb: It kind of makes sense that you could instantly lose a surface because the wl_display you gave to vulkan had a protocol error, but the spec seems to ignore the existence of SURFACE_LOST in that case
03:03DemiMarie: Someone ought to fuzz Mesa and make sure it returns VK_ERROR_OUT_OF_HOST_MEMORY where needed. /hj
03:51Lynne: it's not like linux will ever return null for malloc, sadly, but on windows it's possible
04:03ishitatsuyuki: there are some CTS tests that use a custom allocator to simulate this. it's the C programmer's greatest foe ;P
04:15HdkR: Linux not returning null for malloc? That's easy, just run out of virtual address space
04:17airlied: just be a 32-bit game :-P
04:19HdkR: Yea, a 32-bit game is easy mode :D
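For reference, a rough sketch of the kind of custom allocator mentioned above for simulating host allocation failure (this is illustrative, not the actual CTS code; the callback signatures and VkAllocationCallbacks fields are from the Vulkan headers):

    #include <stddef.h>
    #include <vulkan/vulkan.h>

    /* Allocation callbacks that always fail, to exercise the driver's
     * VK_ERROR_OUT_OF_HOST_MEMORY paths. */
    static VKAPI_ATTR void *VKAPI_CALL fail_alloc(void *user_data, size_t size,
                                                  size_t alignment,
                                                  VkSystemAllocationScope scope)
    {
       (void)user_data; (void)size; (void)alignment; (void)scope;
       return NULL;
    }

    static VKAPI_ATTR void *VKAPI_CALL fail_realloc(void *user_data, void *original,
                                                    size_t size, size_t alignment,
                                                    VkSystemAllocationScope scope)
    {
       (void)user_data; (void)original; (void)size; (void)alignment; (void)scope;
       return NULL;
    }

    static VKAPI_ATTR void VKAPI_CALL noop_free(void *user_data, void *memory)
    {
       (void)user_data; (void)memory;
    }

    static const VkAllocationCallbacks fail_allocator = {
       .pfnAllocation   = fail_alloc,
       .pfnReallocation = fail_realloc,
       .pfnFree         = noop_free,
    };

Passing &fail_allocator to something like vkCreateInstance should make a well-behaved driver return VK_ERROR_OUT_OF_HOST_MEMORY instead of crashing.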
04:22kode54: that reminds me
04:22kode54: I can't get Yuzu to run on Vulkan on ANV right now
04:22kode54: it's dying and throwing a terminating exception because some Vulkan call returns VK_ERROR_UNKNOWN
04:23kode54: naturally, the console output doesn't say where this is being thrown from
04:33airlied: alyssa: do you do function calls? :-) 24687 has some spirv/nir bits
05:06Lynne: ishitatsuyuki: it's my greatest regret, writing all this nice and neat resilient code to cascade all errors, but never actually using it or seeing it run
05:07Lynne: linux should give oom'd programs at least half a chance of closing carefully by letting malloc return null, but far too much code has been written under the assumption it won't
05:23HdkR: Lynne: I would love some inotify system to register low memory situations. Kind of like cgroup notifications but app level
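Something in that direction already exists in the kernel: PSI triggers on /proc/pressure/memory can be polled for pressure events. A rough sketch, assuming a kernel with PSI enabled; the thresholds are made up for illustration:

    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Ask to be woken when "some" tasks are stalled on memory for more
     * than 100ms within a 1s window, then wait for POLLPRI events. */
    int main(void)
    {
       int fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
       if (fd < 0) {
          perror("open /proc/pressure/memory");
          return 1;
       }

       const char trig[] = "some 100000 1000000";
       if (write(fd, trig, strlen(trig) + 1) < 0) {
          perror("write trigger");
          return 1;
       }

       struct pollfd pfd = { .fd = fd, .events = POLLPRI };
       while (poll(&pfd, 1, -1) > 0) {
          if (pfd.revents & POLLPRI)
             fprintf(stderr, "memory pressure event\n");
       }
       return 0;
    }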
05:24kode54: dj-death: do I need to poke that issue where I posted that trace?
05:26dj-death: kode54: that would be helpful
05:26kode54: will do
05:26dj-death: kode54: I thought you said it was a crash
05:26kode54: I meant the one where i915 was running slowly for one game
05:26kode54: I added the traces and generated a new log
05:27kode54: but then I didn't realize you went on holiday
05:27dj-death: normally all VK_ERROR_* should go through vk_errorf
05:27kode54: this is a different thing
05:27dj-death: there should be a trace somewhere
05:27kode54: VK_ERROR_UNKNOWN was Yuzu
05:27kode54: the log I added traces for was Borderlands: GOTY Enhanced
05:27kode54: I'm apparently tracking multiple issues
05:27kode54: in different hings
05:27kode54: *things
05:28kode54: not sure what to do about yuzu
05:28kode54: I'll have to look at the source code to see why it's just throwing an exception
05:30kode54: yuzu doesn't have a single reference to vk_errorf
05:31dj-death: kode54: I meant the vulkan driver
05:31kode54: oh
05:33kode54: how do I get those messages if the app isn't able to show them?
05:33kode54: which environment variable do I need to set to make them all just dump to the console?
05:34dj-death: kode54: that's the one? : https://store.steampowered.com/app/729040/Borderlands_Game_of_the_Year_Enhanced/
05:34kode54: yeah, that's the one
05:34kode54: I somehow got it for free for owning the original GOTY edition
05:36kode54: that game runs like crap on i915.ko, and causes a GPU crash on xe.ko
05:36dj-death: kode54: unfortunately you need to recompile mesa with this bit of code enabled I think : https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/vulkan/runtime/vk_log.c#L130
05:36dj-death: the message should end up on the console
05:36kode54: oh no, I have to build a debug build
05:36kode54: I suppose I should be running debug builds by default when testing out xe.ko
05:37dj-death: it's more than debug build I think
05:37dj-death: you also need to turn that line into : #if 1
05:37kode54: oh
05:37kode54: or -DDEBUG ?
05:37dj-death: or that yes
05:38kode54: yeah, my default build setup uses the mesa-tkg-git PKGBUILD and scripts
05:38kode54: and that does NDEBUG by default
05:38dj-death: maybe we should have a MESA_VK_LOG_FILENAME variable and write all the traces if enabled
05:38kode54: that may be a good idea
05:38dj-death: if it's just for yuzu you can also build your own repo
05:39dj-death: and set VK_ICD_FILENAMES= to the json file of the anv driver
05:39dj-death: like configure the repo with meson
05:39dj-death: then build : ninja -C build src/intel/vulkan/libvulkan_intel.so src/intel/vulkan/intel_devenv_icd.x86_64.json
05:39dj-death: and export VK_ICD_FILENAMES=$PWD/build/src/intel/vulkan/intel_devenv_icd.x86_64.json
05:40dj-death: the vulkan loader will pick up that driver when the app creates the VkInstance
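Roughly, the whole dance in one go (the meson options here are just one plausible configuration; -Dbuildtype=debug also covers the DEBUG-only logging discussed above):

    # configure a local mesa checkout
    meson setup build -Dbuildtype=debug -Dvulkan-drivers=intel
    # build only the anv driver plus its development ICD manifest
    ninja -C build src/intel/vulkan/libvulkan_intel.so src/intel/vulkan/intel_devenv_icd.x86_64.json
    # point the Vulkan loader at that build for the current shell
    export VK_ICD_FILENAMES=$PWD/build/src/intel/vulkan/intel_devenv_icd.x86_64.json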
05:41kode54: gotcha
05:42dj-death: kode54: just before I go and buy Borderland GOTY, you don't reproduce the problem with Borderland 3?
05:42kode54: I can try
05:42kode54: let me install that
05:43dj-death: thanks
05:43kode54: should probably take me about 20 minutes to install that
05:43dj-death: because I already have that one
05:43kode54: I mean, I was playing Borderlands 3 at one point, but there was an annoying issue that I didn't like with it
05:43kode54: where the masked water textures and such would randomly flicker through the rest of the world
05:44kode54: this happened on both i915 and xe
05:44dj-death: :(
05:44kode54: it went away if I set the environment variable for full sync, but that destroyed my frame rate
05:45kode54: I need to test it again
05:45kode54: let me install it first
05:46kode54: this is getting tight too
05:46kode54: this will leave me with about 45GB of free space
05:46kode54: I need to rebalance my installed junk
05:47kode54: maybe I just have a junk video card
06:15kode54: okay
06:15kode54: I enabled that block of debug code
06:15kode54: what do I need to pass in env to get all messages now?
06:19dj-death: kode54: you don't, with that activated it should print out on the console
06:19kode54: it didn't print anything that wasn't printed before
06:20dj-death: hmm okay strange
06:20dj-death: and now you have a debug build?
06:21kode54: I'll try that next
06:21dj-death: maybe also comment that early return : https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/vulkan/runtime/vk_log.c#L114
06:26kode54: https://www.irccloud.com/pastebin/GcycHPmD/
06:27kode54: looks like yuzu did a booboo
06:29dj-death: ah yeah
06:30dj-death: validation layers might have caught that
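For reference, the Khronos validation layer can usually be forced on from the environment without modifying the app, something like:

    # assumes the Vulkan validation layers package is installed
    VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation yuzu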
06:50tzimmermann: javierm, hi. may i ask you for some reviews?
06:52epony: yes
06:52epony: I review you now from this.
06:52epony: ah, it's for smb else
06:52epony: ok
06:59kode54: I started BL3 about 20 minutes ago
06:59kode54: it's still preparing vulkan shaders
07:00epony: ok
07:00epony: how is vulcan now?
07:01epony: amb assadore serveall
07:06kode54: 62% done processing
07:14kode54: now it's 64% done
07:15kode54: what the hell kind of shaders does this game use
07:15epony: shady
07:17HdkR: kode54: If you still have the DEBUG build, all the validation really slows down shader compilation
07:17kode54: it's a release build with that debug message output enabled
07:17kode54: it normally takes this long every time I first run BL3
07:18kode54: the game, after all, does download 2GB of shaders and transcoded videos
07:18HdkR: UE4 title, probably has a million shader variants as well :D
07:18epony: can it not YT?
07:18epony: why trans coded
07:24kode54: probably doesn't help that I've got a R7 2700
07:24kode54: I've already been told that bottlenecks my GPU
07:26HdkR: Fossilize shader compilation also will thread out to the number of CPU cores you have. So pretty good scaling
07:26kode54: commandline says --num-threads 15
07:27HdkR: Indeed
07:28HdkR: Maximum threads subtract one is easy math :D
07:28kode54: it just bumped up to 65%, then rolled back to 64%
07:32javierm: tzimmermann: sure
07:33tzimmermann: javierm, thanks, i have more fbdev cleanups in https://patchwork.freedesktop.org/series/122976/ and https://patchwork.freedesktop.org/series/123017/
07:34javierm: tzimmermann: you are welcome, I'll try to do it later today
07:45kode54: okay
07:45kode54: I hit the skip button
07:45kode54: now it's gone green "running"
07:45kode54: and no window has appeared yet
07:48kode54: fine, I'll reboot to stable kernel and switch to stable mesa and see how long this takes to boot up
07:50epony: transcode with GPU
07:50epony: it has a lot of stream processors
07:51epony: in CPU is a drama
07:51kode54: did somebody say something
07:52epony: are you transcoding in CPU?
07:52epony: how much memory you have?
07:53kode54: starting from scratch on 23.1.6
07:53epony: how much of it moves in 1 second through CPU and how much does it bulge over the data set (memory expand vs data input)
07:54epony: is it really running the threads on all cores.. with good saturation?
07:55epony: can you offload it to the GPU
07:55epony: check your cache eviction rates
08:13kode54: fossilize just finished
08:14kode54: now it's doing the claptrap walk animation that used to have a progress display, but no longer does
08:14kode54: done
08:20kode54: yeah, none of the pipeline recreation lag that BL:GOTY Enhanced has
08:20kode54: but it still has the flickering water
08:20kode54: I'll record a video
08:31kode54: https://f.losno.co/v/bl3-benchmark.mp4
08:31kode54: I manage better frame rates under Windows, by quite a bit, on the same settings (Ultra)
08:34kode54: oh, from the other game
08:34kode54: https://gitlab.freedesktop.org/mesa/mesa/uploads/41d19a60a85ab244c18843a2864066da/trace.perfetto-trace.4.zst
08:34kode54: was that useful?
08:37kj: gfxstrand: to double check, it's not acceptable to compile down to SPIR-V or NIR (and serialize) at build time for internal shaders?
08:38kj: So the options would be to check in glsl and glsl_to_nir(), or build the shader in nir at runtime
08:41kj: Asking because for pvr we still need to unhardcode some internal shaders, so it would be nice to write them in something higher level than rogue ir (which is what we've done atm)
08:43kj: I recall a conversation here about setting up a common way of doing things for internal shaders but not sure what happened with that
08:50dj-death: kode54: thanks yeah
08:50dj-death: kode54: really looks like a window system issue
08:50kode54: which window system should I use?
08:51dj-death: kode54: looks like the game is like waiting to get a new buffer for an image for like 160ms
08:51kode54: is there one especially suited to this task?
08:51dj-death: kode54: I think most people use gnome-shell
08:51kode54: Xorg or Wayland?
08:51kode54: I have nasty window scaling glitches on both gnome and plasma
08:52kode54: resizing the scaling of one output to 200% causes the window shadows to glitch out
08:52dj-death: both should work
08:52kode54: I'll try gnome again
08:54dj-death: kode54: I can see on the graph that the GPU is completely idle at times : https://i.imgur.com/TxaAtBy.png
08:55dj-death: kode54: and every time that seems to be because it ran out of swapchain buffers
08:55dj-death: and it's waiting for one to come back from the compositor
08:55kode54: weird
08:55dj-death: but in the middle of that you have 10+ frames that went fine
08:55dj-death: each around 16ms
08:57dj-death: not ruling out some driver issue but it's really strange...
08:57dj-death: that doesn't look like a GPU programming issue
08:57dj-death: more like a WSI problem
09:15kode54: okay
09:15kode54: it happens under Gnome too
09:16kode54: could this also be that annoying as hell TPM bug
09:40dj-death: TPM?
09:50kode54: nope, it wasn't fTPM
09:50kode54: how did you find that it was waiting on frames from the compositor?
10:13dj-death: kode54: if you look at the timeline
10:13kode54: I don't know what I'm looking for
10:13kode54: and I have no idea how to zoom in or find more detail from what was logged
10:13dj-death: kode54: in the row that has the name of the app
10:14kode54: ok
10:14dj-death: you click on the row, it'll expand
10:14dj-death: then you see sub-rows that are the threads in the app
10:14kode54: what app are you using to view this trace?
10:14kode54: I'm using a web site
10:14dj-death: you can zoom in/out with Ctrl+scroll-up/down
10:14dj-death: yeah me too
10:14dj-death: https://ui.perfetto.dev/
10:15dj-death: and load the trace file
10:15kode54: yes, and I see rows named after the app
10:15dj-death: you should see a bunch of rows in the app that are threads of the WSI
10:15kode54: with frames, and a bunch of different items that are gapped where the delays were
10:15dj-death: "WSI swapchain q ..."
10:16dj-death: the "pull present queue" are when the thread is waiting for a new swapchain buffer to be available
10:16kode54: oh
10:16kode54: I was looking at the wrong thing
10:16dj-death: that matches one image of the app
10:16kode54: I was looking at Borderlands-#-something rows
10:16kode54: no idea what those are logging
10:17dj-death: mine is called "Z:\home\chris\..."
10:17dj-death: so those "pull present queue" items are a thread being blocked waiting for a free buffer
10:17dj-death: but you can see that they wait for 95ms, 80ms, etc...
10:18dj-death: one is really bad at 160ms
10:18dj-death: that's way too much
10:18kode54: I see that
10:18kode54: and it's happening under Gnome too
10:18dj-death: it should be almost immediate
10:18kode54: could it be my kernel? I'm using a non-distro kernel
10:20dj-death: I don't know to be honest
10:20dj-death: never seen something like that
10:21dj-death: what's really strange is the recreation of swapchains constantly
10:22dj-death: that's not driven by the driver but by the app/dxgi
10:23kode54: dxvk bug tracker told me that they recreate the swap chain if it times out
10:28dj-death: kode54: yeah but what's odd is that for each swapchain, they appear to only do a single AcquireFrame
10:29dj-death: I see the app is polling a query as well
10:29dj-death: maybe there is some issue there
10:30dj-death: well yeah there might be a kernel issue after all
10:30dj-death: vkQueueSubmit is blocked for 90+ms
10:31dj-death: that's blocking the WSI as well I bet
10:48kode54: the thing is
10:48kode54: I can't even test if this is an xe/i915 thing
10:48kode54: it won't even run on xe.ko
11:07dj-death: kode54: trace.perfetto-trace.4.zst was recorded on Xe ?
11:07kode54: no, i915
11:07kode54: I can't even get it to run without crashing the GuC on Xe
11:08kode54: it just starts up to a black screen, then gets a crash notice about lost DX11 device
11:09kode54: and the kernel dumps a useless GPU core text file to a /sys file
11:09kode54: since there's no usable GuC info in it
11:10kode54: it was even suggested that the GuC info that is there is from after it's already been restarted, so doubly useless
11:11dj-death: alright
11:12kode54: ah, it wasn't a DX11 device lost
11:12kode54: it was General Protection Fault
11:13kode54: [ 1908.101030] xe 0000:28:00.0: [drm] Engine reset: guc_id=133
11:13kode54: [ 1908.108626] xe 0000:28:00.0: [drm] Timedout job: seqno=4294967188, guc_id=133, flags=0x8
11:13kode54: yup, it crashed
11:13kode54: would love to have usable dumping so we can get to the bottom of these GuC crashes
11:13dj-death: if only we could have the type of dma-fence the kernel is waiting on
11:14kode54: oh, the other one? that would be great
11:56kode54: maybe DSB?
12:01swizzlefowl: Hello
12:11zamundaaa[m]: MrCooper: I'm doing a bunch of performance work for KWin and found something pretty unexpected
12:12zamundaaa[m]: When I'm rendering to a gbm buffer (imported as an EGLImage, which is used as the color attachment for an fbo) and call glFinish(), the fds of the buffer aren't always immediately readable afterwards
12:13zamundaaa[m]: This seems to happen quite seldom on AMD, and more often on Intel. Are my expectations for how this works just wrong, or could there be some driver bugs involved?
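A minimal sketch of the readiness check being described, assuming fd is the dma-buf file descriptor exported from the gbm buffer (whether POLLIN is the right event to wait on here is exactly the kind of detail in question):

    #include <poll.h>

    /* Returns non-zero if polling the dma-buf fd for "readable" succeeds
     * immediately, i.e. the pending work on the buffer has signaled. */
    static int buffer_ready(int fd)
    {
       struct pollfd pfd = { .fd = fd, .events = POLLIN };
       return poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLIN);
    }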
13:13DemiMarie: kode54: Fuzz the GuC and report the bugs to Intel?
13:22DemiMarie: Sorry
13:23DemiMarie: Asahi Lina has some experience debugging firmware crashes.
13:33alyssa: kj: compiling glsl/cl to spir-v at build-time is fine.
13:33alyssa: compiling to nir & serializing is somewhat more sketchy. but intel goes all the way to hw binaries at build time so YMMV i guess
13:34alyssa: i'll probably send out common code for doing CL kernels at build time in a reasonably generic way. I have a vague plan to let CL C be usable for certain vertex/fragment shaders too, if that's something that's needed.
13:35alyssa: Not sure what kinds of shaders you're talking about. For small stuff usually nir_builder is the right call, it's just the chunky monkey kernels that really benefit.
13:36DavidHeidelberg[m]: eric_engestrom: the build job limit sounds like good idea, MR Ack :)
13:49kj: alyssa: thanks, we have compute shaders for queries which are fairly simple, but there's also a whole bunch of shaders used for transfer stuff which might be a bit more involved (haven't looked in depth there)
13:51penguin42: If I've got a 'Compute Shader LLVM IR' dumped from rusticl debug, is there anyway I can push it through llvm to see what it does?
13:51alyssa: kj: yeah.. I've done some pretty intense nir_builder but yeah working with C is a lot nicer :p
13:52alyssa: see agx_nir_lower_texture.c if you want to be scared :p
13:52alyssa: (not really a candidate for CL at this point)
13:53karolherbst: penguin42: llvm has tooling for their stuff, like llvm-dis and whatnot
13:54penguin42: karolherbst: Yeh so I've seen those, but I don't know how to go from the debug output of mesa to feeding that into llvm so I can play around with llvm to see why it's doing what it's doing
14:07karolherbst: yeah... sadly I don't really know much about how to dig into those things deeper from an AMD perspective, might want to ask around in #radeon as there are some LLVM developers who might be able to help out with it
14:07penguin42: karolherbst: Ack, do you know how to throw that debug into llvm ?
14:08karolherbst: no idea how to trigger the normal pipeline, however there is AMD_DEBUG=asm or something to print the actual hardware level IR
14:08karolherbst: or something
14:09penguin42: karolherbst: Yeh so I have the IR and I have the asm, I wanted to play around with what was inbetween; mostly this is trying to understand where that weird load ordering thing came from
14:09karolherbst: ahh
14:09karolherbst: I guess for that you'll have to compile LLVM and see what passes it runs
14:09karolherbst: maybe there is an LLVM option to print what it runs
14:10karolherbst: dunno
14:10penguin42: yeh I guess there is once I can figure out how to run it :-)
14:10karolherbst: just use your local LLVM build instead of your system one
14:11penguin42: karolherbst: But what options do I pass to llvm to take that IR and spit out that asm?
14:11penguin42: karolherbst: I don't see any of that at the moment because I only have the rusticl debug
14:12karolherbst: there is no simple solution here
14:12penguin42: ok
14:12karolherbst: your best bet is to just use the current mesa code
14:12karolherbst: and whatever radeonsi is doing
14:12karolherbst: I don't know if you can do that stuff on the cli even
14:13penguin42: ok - that's what I was really after, because I was assuming doing it from the CLI would be the best way to add llvm debug/tracing/etc
14:14karolherbst: there might be a way, I just don't know it
14:14karolherbst: but you can also just write a small tool doing the same thing
14:14karolherbst: e.g. copying the pipeline radeonsi is using
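For experimenting on the command line, LLVM's stock tools can get reasonably close, though they won't reproduce radeonsi's exact pass pipeline; roughly, assuming the rusticl dump is a valid .ll file and picking an RDNA2 part as an example target:

    # run the standard mid-level optimizer, printing the IR after each pass
    opt -S -O2 -print-after-all input.ll -o input.opt.ll
    # lower to AMDGPU assembly with the backend
    llc -march=amdgcn -mcpu=gfx1030 input.opt.ll -o input.s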
14:15penguin42: karolherbst: I can see optimising shader code can drive people nuts; I was finding one change to my shader made rusticl faster and ROCm slower or the other way around
14:16karolherbst: yeah...
14:16karolherbst: optimizing compilers be like that
14:16karolherbst: at some point it's hard to find those changes which are always a benefit
14:16karolherbst: so, some stuff gets slower, some stuff gets faster
14:17penguin42: karolherbst: I'm suspecting some of this might be 'bank clashes' - but wth knows; AMD's pretty profiling tools look like they need their kernel drivers
14:18karolherbst: yeah.. could be
14:18karolherbst: though, did you try umr?
14:18penguin42: what's umr?
14:18karolherbst: https://gitlab.freedesktop.org/tomstdenis/umr
14:18karolherbst: though not sure it would help here
14:19penguin42: karolherbst: Ooh, one to try later
14:42eric_engestrom: DavidHeidelberg[m]: thanks! mind saying that on the MR? :P
14:46eric_engestrom: pendingchaos: (sorry for the delay, been doing too many things lately and I forgot to read my mentions here) I'll continue making new 23.1.x releases until 23.2.0 is out, no matter how long it takes
16:23Lynne: cargo is nice and all until a fresh sync takes 500 megabytes and you're on a bad connection, and a crate decides it absolutely must use nightly, as all crates do
16:23Venemo: is gitlab down again?
16:24Lynne: just a throwaway comment
16:52alyssa: something something cargo cult
16:52gfxstrand: :P
16:53karolherbst: 🦀
17:01anarsoul: Lynne: just don't do sync when you're on a bad connection
17:04Lynne: having a bad connection is hardly a choice
17:05tnt: I've got an application causing : "[drm] GPU HANG: ecode 12:1:85dcfdfb, in ngscopeclient" (intel 12th gen, vulkan app).
17:06tnt: How would one go about tracing what's going on ?
17:19cmarcelo: does anyone foresee glsl_function_type() being useful again (the only user was spirv, but it stopped using it in favor of its own implementation, so right now it's dead code)? deciding here whether I can just remove it or should improve it along with other changes as part of an ongoing MR.
17:33dcbaker: mattst88: did you mean to request a review from marge on the intel_clc series, or did you mean to assign it to marge?
17:33mattst88: dcbaker: lol, derp
17:34mattst88: thanks for noticing that
17:35dcbaker: that looks good to me, btw. I'd still like to get it to where we don't need to do that, but there are so many assumptions in meson's implementation that things are for the host and only the host that it's turning into a slugfest with some seriously annoying problems
17:36alyssa: dcbaker: I'm experimenting with a generic mesa_clc
17:36dcbaker: so, same problem then?
17:36alyssa: current tentative plan is that it goes CLC->SPIR-V but does not touch any NIR
17:37alyssa: which should be a lot fewer deps but yes, same problem until clang can do that itself
17:38dcbaker: the good news is Mesa isn't the only project with this problem, it turns out that other big complex projects run into the same issue when cross compiling
17:38dcbaker: at least, good in that everyone agrees it needs to happen, lol
17:56DemiMarie: Sorry for all the unanswerable questions I asked earlier!
18:11alyssa: nod
18:12alyssa: i expect by end-of-year asahi will have a hard build-dep on clc
18:13alyssa: We have a significant need for it and we have buy-in from Fedora, and Intel's already doing it for raytracing, so sure yeah why not
18:13alyssa: and asahi only needs to build on arm and x86, so no LLVM problem
18:16dcbaker: sigh. LLVM.
18:26dcbaker: karolherbst: I'm going to start reviewing the next round of the crates in meson work next week, I know gfxstrand has been trying it out. Are there any crates you need/want?
18:31alyssa: dcbaker: Debian has some build rules to disable LLVM on exotic architectures (':
18:31alyssa: So no CLC deps in common code without angering somebody.
18:31alyssa: Although... if they're cross building it shouldn't matter?
18:32karolherbst: dcbaker: uhm.. mostly just syn and serde
18:32alyssa: Like you should be able to run intel_clc/mesa_clc on the host with the host LLVM and then do an LLVM-free target mesa build using the precompiled kernels
18:32alyssa: you don't get Rusticl support but the BVH kernels etc should work fine in that set up
18:32alyssa: we dont support that on the mesa side but we.. probably could?
18:33dcbaker: karolherbst: cool, I'm pretty sure syn already works
18:33dcbaker: I'll make sure we test out serde
18:33karolherbst: cool
18:33dcbaker: alyssa: yeah.... I'm just not sure how you'd go about supporting OpenCL without llvm/clang at this point
18:34karolherbst: there are probably random others I might want to use in the future, but those would be a good start to drop some code
18:35alyssa: dcbaker: ~~gcc-spirv when~~ delet
18:36dcbaker: alyssa: I mean, if someone else wants to write the code and it works...
18:36alyssa: dcbaker: :p
18:37alyssa: I don't love runtime LLVM deps, honestly I build -Dllvm=disabled myself up until now
18:37alyssa: but I feel entitled to a buildtime LLVM dep o:)
18:37alyssa: (I already build mesa with clang, this is just more of that :D)
18:38dcbaker: I don't care that much about buildtime deps, but I usually build with gcc and getting the right version of LLVM can be a real pain sometimes
18:39alyssa: I do know an llvm spirv target was talked about, I wonder if mesa_clc will be obsoleted in due time..
18:40alyssa: except for libclc, src/compiler/clc/ isn't doing much that clang couldn't do itself..
18:40alyssa: :q
18:41karolherbst: yeah.. maybe.. if it's not causing any regressions in the CTS that is
18:41alyssa: yeah
19:00airlied: zmike: you should have made that blog a monetized twitter post, controversy gets clicks!
19:06zmike: airlied: there's no controversy there, just people who agree with me and people who are wrong
19:10airlied: that's the attitude to get more elon bucks :-P
19:19alyssa: I wonder what it'll take to get this loop unrolled: https://rosenzweig.io/hmm
19:19alyssa: It's from a... familiar piece of source code.
19:21alyssa: hmm I wonder how clang unrolls this
19:21alyssa: ...does clang not unroll this? T.T
19:21pendingchaos: that looks like an overly complicated IF statement
19:21alyssa: pendingchaos: correct
19:23alyssa: it's from doing something i'm really not supposed to =D
19:26alyssa: clang seems to eliminate the backwards branch when compiled for my cpu
19:43alyssa: an extra opt_algebraic rule removes a big chunk of loop header, but still not enough for unrolling..
19:44alyssa: oh... opencl is nopping out __builtin_expect I guess..
19:44alyssa: is it?
19:45karolherbst: yeah.. we ignore it for now
19:45alyssa: karolherbst: at what point is it being ignored?
19:45karolherbst: inside vtn
19:45alyssa: alright
19:45alyssa: I wonder if I should plumb that through. Or perhaps more likely, see if I can get clc to do a bit more LLVM opts
19:46alyssa: oh right we -O0 right.
19:46alyssa: erg
19:46karolherbst: yeah.. builtin_expect isn't _that_ terrible to pipe through, it's just that it's one of those 80/20 things
19:46alyssa: right.. ugh..
19:46karolherbst: alyssa: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23852
19:46alyssa: For this particular kernel, if I build with -O3, it gets unrolled in llvm just fine
19:47karolherbst: ahh...
19:47karolherbst: I think there is strictly nothing against using more opts, just some opts break the tooling
19:47karolherbst: and the translator gives up
19:47alyssa: yeah..
19:47alyssa: it's just annoying because with -O3 the final NIR is excellent
19:47karolherbst: mhh
19:48karolherbst: anyway, in this commit specifically I played around with enabling some llvm opts: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23852/diffs?commit_id=4b18b0770154aec4ad905bba9856db1cd47b5d60
19:48karolherbst: and those are safe
19:48alyssa: ack
19:48karolherbst: however we can always allow callers to enable more or something
19:49karolherbst: but I wouldn't enable that for everything
19:49karolherbst: maybe that stuff gets better once the spirv backend lands
19:49alyssa: Alternatively, NIR's optimizer should be good enough to handle all this? O:)
19:49karolherbst: yeah... well.. hopefully :)
19:50alyssa: needs expect plumbed thru for this guy
19:50karolherbst: my motivation with that MR was to cut down the size of cached blobs
19:51alyssa: hmm wait is expect even doing what i expect
19:51karolherbst: what do you expect expect is doing?
19:51karolherbst: but yes, it's very cursed
19:51alyssa: I mixed up expect and assume
19:51karolherbst: and not trivial
19:52karolherbst: right...
19:52karolherbst: expect is the more trivial thing, assume is cursed to implement
19:54alyssa: OOOOOKAY
19:54alyssa: so it turns out I've been doing something absolutely stupid in mesa for years
19:55alyssa: kewl.
19:55karolherbst: heh
19:55alyssa: well, loop's gone now, with -O0
19:55karolherbst: :D
19:55karolherbst: do we want to know?
19:56alyssa: No
19:56alyssa: Even so, with -O3 there's a big pile of load/store_scratch that goes away with -O0
19:56karolherbst: yeah....
19:56alyssa: i'm guessing I'm missing some NIR copyprop pass somewhere
19:56alyssa: seems like we should be able to see thru that
19:57karolherbst: so you have scratch stuff with O3 but not with O0?
19:57karolherbst: anyway.. you can always copy whatever rusticl is doing to get rid of those things
19:57karolherbst: the entire pipeline is cursed
19:57karolherbst: kinda
19:57karolherbst: but it kinda also works
20:01airlied: the translator docs I think state O0 only is supported, anything else is a crap shoot
20:01karolherbst: yeah.. I'm also a bit hesitant to even land some of the llvm opt work because of this
20:01airlied: but yeah it might be worth running some llvm passes, but also I think we can do a lot on the NIR side to step up
20:01karolherbst: it's literally not tested anywhere
20:01airlied: and close the gaps
20:01airlied: if we want NIR to be a real life compute compiler
20:01karolherbst: right.. I didn't want to get more optimized binaries, just smaller ones
20:02karolherbst: and I was able to reduce binary sizes by like 60%
20:02karolherbst: it's just not very stable
20:02airlied: I suppose in theory spirv-opt could be used as well, if it ever did anything useful :-P
20:02karolherbst: it speeds up caching/ reduces peak memory usage and other benefits, sadly we probably can't rely on it
20:02karolherbst: yeah... that gives me another 25% reduction
20:03karolherbst: it's all in the MR
20:03karolherbst: sadly.. I can't use the impressive `MergeFunctions` LLVM pass
20:03karolherbst: so the benefits are all kinda smallish
20:03karolherbst: `MergeFunctions` generates function pointers in a few places
20:08karolherbst: alyssa: anyway, I think the asahi CL stuff is ready to land, I've listed all the remaining problems, but nothing stands out really and I mitigated the linear image issue as much as possible. Now you simply can't map 3D images, but whatever. Maybe I should just assign to marge and... figure out timestamps after that
20:18alyssa: karolherbst: I had scratch with O0 but it disappeared with O3. I think I screwed up my pass order or whatever, will look at it harder, NIR should be able to breeze thru this
20:19karolherbst: yeah.. it should
20:19karolherbst: just run all the passes 5 times or something
20:19alyssa: Lol
20:19alyssa: 20:01 airlied | if we want NIR to be a real life compute compiler
20:20alyssa: Yeah... A big chunk of stuff that CL wants, VK also wants and we don't have any LLVM to cheat off there. So trying to get NIR into shape seems like the better long-term approach, idk
20:20karolherbst: yeah
20:20alyssa: I'm good with asahicl being merged
20:55alyssa: k, this is interesting:
20:55alyssa: 64 %5 = deref_var &__const.hello.cfg (constant struct.AGX_USC_TEXTURE)
20:55alyssa: 64 %7 = deref_cast (uvec4 *)%5 (constant uvec4) (ptr_stride=16, align_mul=0, align_offset=0)
20:55alyssa: 32x4 %8 = @load_deref (%7) (access=none)
20:56alyssa: nir_opt_deref doesn't remove the cast and nir_opt_constant_folding doesn't see through the cast, so that turns into a load_global (!) instead of a load_const
20:57karolherbst: constant isn't in shader constant memory though
20:58karolherbst: it's just a ubo (more or less)
20:58karolherbst: just with global addressing
20:58alyssa: still supposed to be constant folded.
20:58karolherbst: does the constant variable have a constant initializer?
20:59alyssa: yes
20:59alyssa: it's just the cast in the way
21:00karolherbst: so without the cast it would constant fold? Mhh.. normally we kinda drop pointless casts but there are a few restrictions in place
21:01karolherbst: but yeah, if the source is constant known at compile time we should constant fold it
21:07alyssa: also, deref_ptr_as_array
21:08alyssa: 64 %5 = deref_var &__const.hello.cfg (constant struct.AGX_USC_TEXTURE)
21:08alyssa: 64 %9 = deref_cast (uvec2 *)%5 (constant uvec2) (ptr_stride=8, align_mul=0, align_offset=0)
21:08alyssa: 64 %11 = deref_ptr_as_array &(*%9)[2] (constant uvec2) // &(*(uvec2 *)%5)[2]
21:08alyssa: 32x2 %12 = @load_deref (%11) (access=none)
21:09alyssa: admittedly I don't fully understand what's happening here but it should also constant fold..
21:09karolherbst: it might be that most of the folding only reliably works once IO is lowered
21:10alyssa: But it's too late by then, since lowering I/O turns this into load_global(load_constant_base_ptr)
21:10karolherbst: mhhh... yeah, it shouldn't...
21:10alyssa: this needs to be optimized away in derefs
21:11karolherbst: I think the problem here is, that casting a constant memory pointer to a different address space is kinda UB
21:12alyssa: there's no constant memory pointer in the C code?
21:12alyssa: *CL kernel
21:12alyssa: just literals that clang decided to turn into constant memory
21:12karolherbst: right... it's kinda weird tbh
21:13karolherbst: what's the CLC source?
21:13alyssa: agx_pack(..) { ..}
21:20karolherbst: not really sure what that gets generated into, but in theory it should just be a stack variable getting fields assigned, so I'm kinda confused why it's doing this kinda nonsense in nir
21:21karolherbst: or rather, I don't see where this cast would come from
21:22karolherbst: does it look better in the spir-v? though I suspect not
21:22karolherbst: or maybe?
21:23karolherbst: what does the nir straight out of spirv_to_nir look like?
21:25karolherbst: but anyway... casting from constant to generic is just not legal
21:26karolherbst: the thing is... because it's all coming from C you also can't just drop random cast, because $reasons
21:28karolherbst: like e.g. if you'd do (global* int)some_local_memory_ptr, you also can't just load from the local address, because it's technically a bug in the source code and UB
21:28karolherbst: but only in the sense of your pointer is probably pointing to invalid memory
21:29karolherbst: but what if you do (local* int)(global* int)...
21:30alyssa: there's no cast to global?
21:30alyssa: what's happening is that there's a constant struct
21:30alyssa: that would be fine, if we split the struct with split_struct_vars
21:30alyssa: but that bails on complex uses, because of the deref_ptr_as_array
21:31alyssa: which in turn nir_deref.c claims should be eliminated by nir_opt_deref but that's not happening
21:31karolherbst: ehhh wait.. I misinterpreted the " 64 %9 = deref_cast (uvec2 *)%5 (constant uvec2) (ptr_stride=8, align_mul=0, align_offset=0)" thing...
21:31alyssa: presumably my pass order is busted.
21:32karolherbst: if it's a pointless cast, opt_deref should kinda be able to get rid of it
21:32karolherbst: alyssa: btw, did you call explicit_type?
21:32karolherbst: some of the passes rely on explicit type information
21:33alyssa: yes
21:33alyssa: https://rosenzweig.io/hello.cl
21:33alyssa: Here's a simple reproducer
21:33alyssa: right after vtn, this looks like https://rosenzweig.io/spirv-to-nir.txt
21:34karolherbst: huh...
21:34karolherbst: that's a lot of stuff...
21:34alyssa: after all the lowering/opt passes, this ends up as a mess with scratch access https://rosenzweig.io/final.txt
21:35alyssa: not sure if rusticl fares any better
21:37karolherbst: https://gist.githubusercontent.com/karolherbst/6a2981d5cf7fecc82c7011840220c664/raw/675bd58ae198c7f5ec0f34f5af6d3d55afa92f51/gistfile1.txt
21:37karolherbst: let me write to a struct instead
21:38karolherbst: indeed...
21:39alyssa: karolherbst: that code doesn't write to a struct?
21:39karolherbst: yo, that's kinda rude of nir :')
21:39karolherbst: yeah, I have it local now and it also uses scratch
21:39alyssa: Joy
21:40karolherbst: funky...
21:41alyssa: very basic issue: why is nir_split_struct_vars failing on https://rosenzweig.io/why.txt?
21:42alyssa: presumably the cast from struct to uvec
21:43karolherbst: probably
21:44alyssa: which is being inserted by nir_lower_memcpy, I think
21:44karolherbst: it's kinda funky that with the constant struct initializer nir at some point does load_const, but for whatever reason it thinks it should pipe that through scratch memory...
21:45karolherbst: ohhhhh
21:45karolherbst: uhhhhh
21:45karolherbst: it's this bug
21:45karolherbst: I hate it
21:46karolherbst: I remember now...
21:46karolherbst: alyssa: https://gist.githubusercontent.com/karolherbst/127d028e6c6cca2d74d5c5bba3d321e0/raw/09975229af07124a613468646701f6e54d7e4128/gistfile1.txt
21:46karolherbst: this is kinda the reason
21:46karolherbst: the deref chains for store and load are different
21:46karolherbst: so we fail to see they are equal
21:47karolherbst: or rather point to equal things
21:47karolherbst: there are dumb LLVM reasons for it, and the translator also not being super nice to us
21:48karolherbst: so when storing it, you have explicit struct member accesses
21:48karolherbst: but on load you don't have the struct information and it just does raw vec/scalar loads
21:48karolherbst: it's really annoying
21:48karolherbst: however, we should still be able to optimize it away :D
21:49karolherbst: it's just that our opt_deref isn't smart enough for that yet
21:49alyssa: alright..
21:49karolherbst: I think there is an MR for that...
21:50karolherbst: maybe not
21:51karolherbst: gfxstrand might remember
21:51alyssa: in this case at least, the obvious problem is that we're lowering memcpy to a raw copy of bytes, which fundamentally impedes other opts
21:52karolherbst: but yeah.. I think ultimately this is something we can only clean up after io lowering
21:52karolherbst: mhhh.. yeah, just..
21:52karolherbst: In my example there is no memcpy
21:52alyssa: two solutions are to either lower memcpys of structs to memcpys of each element separately, if we know it's tightly packed & so on
21:52alyssa: or to teach struct lowering to delete memcpies
21:52alyssa: uh
21:52alyssa: or to teach struct splitting to split memcpys
21:53karolherbst: ehh wait.. I failed to use my search function.. there is a memcpy
21:53karolherbst: let's see...
21:54karolherbst: yeah soo.. we can't do much useful with that memcpy
21:54karolherbst: it's just taking the raw pointer and copies the function_temp stuff into it
21:55karolherbst: so out of LLVM/SPIR-V it's already a plain byte copy
21:55karolherbst: and not much we can really do about it
21:55karolherbst: and I don't think that before IO lowering is the place where we could actually resolve that, because we'd have to know the actual offsets the load/stores go to
21:56karolherbst: in order to propagate it
21:58karolherbst: the downside of doing this after io lowering is, that we already allocated scratch space
22:00karolherbst: I honestly don't know what would be the best path forward here
22:01karolherbst: maybe we can convince LLVM to not do this nonsense? but then we can also get spir-v doing it anyway
22:04alyssa: ok.. I think we can split the memcpy, at least in the simple case I'm looking at
22:06alyssa: but the original case didn't have a memcpy there, just stores with deref_ptr_as_array..
22:07alyssa: oh, but there's legitimately a cast happening in that one
22:08alyssa: even though it's a cast between.. morally equivalent things
22:08karolherbst: I wonder if the better strategy is to simply convert everything to byte arrays as an intermediate step before io lowering.. :D
22:09karolherbst: but that's going to be messy
22:12alyssa: i mean.. trying to unlower scratch back to SSA sounds like you're in for a bad time
22:12karolherbst: yup
22:13karolherbst: I think all solutions here are messy in one or the other way
22:14alyssa: i think the memcpy is the root brokenness
22:14karolherbst: sure
22:17alyssa: This is the nonsense that we get with everything up until lowering memcpys, triggered in my kernel by passing a struct around:
22:17alyssa: https://rosenzweig.io/zeroing.txt
22:17karolherbst: yeah...
22:17karolherbst: there isn't really anything you can do on a deref level
22:17alyssa: Why not?
22:18alyssa: It feels like we "should" be able to split that memcpy_deref into a memcpy_deref for each struct element
22:18karolherbst: sure, but we don't copy the struct, we copy pointers to bytes
22:18alyssa: so..?
22:18alyssa: we have derefs, we can see thru the casts
22:19karolherbst: fair enough
22:19alyssa: really annoying that LLVM makes us jump through these hoops for such a trivial example though
22:19karolherbst: yep
22:20karolherbst: I do wonder though: if we get rid of the casts, does that get optimized away?
22:20karolherbst: after memcpy lowering I mean
22:20alyssa: it's not valid to get rid of them straight up
22:20alyssa: that's a memcpy between a struct and a u8
22:20karolherbst: mhh.. fair enough
22:21karolherbst: is it like that straight away or is that constant array after some opts?
22:21karolherbst: like in the initial nir it should still be all structs or not?
22:22alyssa: it's like that in the initial nir because llvm suuuucks
22:22karolherbst: pain
22:23alyssa: look on the bright side, I can pass pointers to structs as function arguments, that's cool
22:23karolherbst: :D
22:23karolherbst: yeah...
22:24alyssa: wait..
22:25alyssa: oh come on!
22:25alyssa: i switched to passing the struct by value instead, you're still not happy nir?
22:27alyssa: oh.. yeah, it's choking on the struct initializer which llvm is helpfully turning into a memcpy from raw bytes
22:27karolherbst: yes, llvm best compiler, very helpful
22:29alyssa: the good news is that I can deal with the struct initializer nonsense..
22:30alyssa: oh. strictly no i can't on this struct because there's padding :~)
22:31alyssa: Actually I have no idea if that's legal C or not
22:31alyssa: casting between a struct ptr and a u8* and putting stuff into the padding bytes and expecting that to work
22:32karolherbst: well.. why not?
22:33karolherbst: you might read/write random garbage, but besides that?
22:33alyssa: padding rules are implementation defined so..
22:33karolherbst: they shouldn't be
22:33karolherbst: the C struct layout is kinda very strictly defined
22:35karolherbst: but maybe it's technically implementation defined, but I think everybody kinda follows the same rules on each platform? dunno
22:36karolherbst: the fun part is, that the CL CTS tests this stuff
22:37alyssa: it's ludicrous to me that llvm is going so far as casting to u8*.
22:37karolherbst: well...
22:37karolherbst: I'm sure they have their good reasons
22:37karolherbst: but yeah...
23:01alyssa: intel raytracing on arm64 when