01:40alyssa: glehmann: "honestly broken beyond belief" is the state of the art for opengl yes
09:04zzoon_: Lynne: I couldn't replicate the corruption you mentioned during seeking... I'm not sure if it's some other hw issue..
09:17Lynne: zzoon_: try "mpv intel_corruption.ts --no-config --speed=3 --loop --hwdec=vulkan"
10:06zzoon_: Lynne: The same...
10:06zzoon_: Not sure if you shared the media correctly, but.. I only got this still shot (no motion): https://drive.google.com/file/d/1QFK0MwCRdzg2mjaqdqj-nvgO5SymAsF1/view?usp=drive_link
10:13Lynne: zzoon_: this is what I get - https://0x0.st/Pbb5.mp4
10:13Lynne: it's bad
10:13Lynne: do you use sway?
10:14zzoon_: no
10:14zzoon_: hmm worth trying on wayland? I'm still on X11
10:17Lynne: yes, it's likely a sync issue, and wayland properly uses semaphores/dmabuf fences to present
10:21zzoon_: okay will try a bit later. thanks.
10:34zzoon_: still the same on mutter..
10:34zzoon_: Lynne: maybe I should try on sway
10:34Lynne: use the vulkan backend
10:35Lynne: export WLR_RENDERER=vulkan
10:39zzoon_: hmm. still the same.
10:49Lynne: git master mesa?
10:50zzoon_: sure..
10:50zzoon_: actually it's main.. anyway
10:50Lynne: yeah, muscle memory
10:56Lynne: can you make sure vulkan decoding is even used?
10:56Lynne: also, what hardware are you running on?
11:02zzoon_: sure.. I made sure by adding a printf for each command..
11:03zzoon_: Intel Corporation Alder Lake-P GT2 this is mine
11:06Lynne: try increasing --speed to 10, 20, 30
11:27zzoon[m]: the same even on sway
11:34Lynne: which ffmpeg version?
11:36zzoon[m]: ffmpeg version N-122611-g7e9fe341df... (full message at <https://matrix.org/oftc/media/v1/media/download/AewhkNWj2LGV9haC6VpnkbPIOyHKynz_6h3L8BeqhTJMhr0lXIUJv-gNhftr0F4_qRdadh0xOUa0ca0MMCHJvrNCecZjzmAAAG1hdHJpeC5vcmcvWnZidVJrTUxUV2hzcWpJcExSdkl1RnFP>)
11:58pta2002: hello, I'm trying to get a pixel 6a which runs a g78 gpu to work with panfrost. so far I've copied over basically everything from the g57 gpu (on both panfrost and mesa) (which according to everything I've seen so far is basically identical) and it mostly works, but there's some graphical glitches while scrolling or during animations (they go away after a few frames). is there any way i can get more concrete debugging information to try to
11:58pta2002: figure out what's causing this? i'm assuming i'm missing something in some settings for mesa or panfrost, but i'm basically just guessing at this point
12:27Lynne: zzoon[m]: out of curiosity, could you try with the Xe kernel drivers?
12:56zzoon[m]: Lynne: will try it later.
13:45alyssa: pta2002: run conformance tests (dEQP-GLES3, dEQP-GLES31, etc) and find out what tests are failing
13:46alyssa: https://github.com/KhronosGroup/VK-GL-CTS
13:46alyssa: https://lib.rs/crates/deqp-runner is helpful
13:46alyssa: MESA_SHADER_CACHE_DIR=/dev/shm EGL_PLATFORM=surfaceless deqp-runner run --deqp ~/GLCTS/glcts --output output --timeout 60 --caselist /dev/shm/caselist.txt --skips /dev/shm/skips.txt --tests-per-group=1000 -- --deqp-surface-type=pbuffer --deqp-gl-config-name=rgba8888d24s8ms0 --deqp-surface-width=400 --deqp-surface-height=300 --deqp-log-images=disable
13:46alyssa: --deqp-log-shader-sources=disable
13:46alyssa: ^ something like that
13:47alyssa: caselists are in ~/VK-GL-CTS/external/openglcts/data/gl_cts/data/mustpass/
13:47alyssa: skips lists are in e.g. ~/mesa/.gitlab-ci/all-skips.txt ~/mesa/src/asahi/ci/asahi-skips.txt
14:48glehmann: Is there a script to update mesa CI expectations or do I have to go through all fail files and manually add a broken test?
14:53valentine: it might be best to add them to all-skips.txt if the tests are broken
14:53valentine: there isn't an easy way to update all expectations
14:58pta2002: Thanks alyssa, will do
15:18alyssa: Lynne: why shouldn't put_symbol be inlined? seems like it's called only once?
15:19alyssa: unless one of these loops is secretly getting unrolled?
15:21Lynne: yeah, fair, that's not the best example of put_symbol, but here is: https://code.ffmpeg.org/FFmpeg/FFmpeg/src/branch/master/libavcodec/vulkan/ffv1_enc_setup.comp#L69
15:23alyssa: Lynne: I wonder if you'd get better perf by replacing that code with a loop
15:23Lynne: in a WIP I've merged both setup and encode shaders and both put_symbol and put_isymbol (they're identical, except put_isymbol reads an extra bit for the sign)
15:23Lynne: which one?
15:23alyssa: write_slice_header
15:23alyssa: in effect, re-rolling the put_usymbol loop
15:23alyssa: I do generally agree that mesa should support real function calls of course
15:23alyssa: just curious what all the use cases end up being
15:24alyssa: but I agree with the radv devs - the hard part is an inlining heuristic that doesn't regress existing stuff
15:29daniels: glehmann: https://gitlab.freedesktop.org/gfx-ci/ci-collate
15:31Lynne: alyssa: dunno, but I kinda doubt that games would be impacted much by the worst case scenario, compared to what ffmpeg would gain in perf
15:55glehmann: I still doubt that ffmpeg would gain much
15:56glehmann: RT benefits this much because unreal has like hundreds if not thousands of shaders in one RT pipeline object
16:04alyssa: glehmann: fwiw Apple has stated they saw (presumably non-rt things) speed up with not-inlining everything specifically due to i-cache footprint
16:05alyssa: glehmann: page 25 https://llvm.org/devmtg/2017-10/slides/Chandrasekaran-Maggioni-Apple%20LLVM%20GPU%20Compiler.pdf
16:05alyssa: pages 30 and 31 are also interesting
16:06glehmann: but the function here isn't even big?
16:07alyssa: not speaking to ffmpeg specifically, just Mesa general
16:09glehmann: in general sure, there could be cases
16:09glehmann: but an issue is that most content comes through dxil and dxc already inlines everything
16:15alyssa: ah. lolz.
16:15alyssa: well that would ensure that our inlining heuristic can't regress games =P
18:10pixelcluster: I mean on the function call stuff the biggest prior work we have are GPGPU compiler stacks (the use cases that would benefit from not inlining everything are most likely very concentrated there)
18:13pixelcluster: I've heard stories of some real person trying to feed their real code into RADV and inlining everything ballooned the code so hard it ran OOM at 80GiB memory usage
18:13pixelcluster: inlining everything at the SPIR-V level resulted in a ~12GiB large SPIR-V binary
18:15pixelcluster: not sure if stuff like this is really that uncommon in GPGPU land
18:16alyssa: pixelcluster: entertaining.
18:16alyssa: we have a chicken & egg problem where backends don't support this because nir doesn't, but nir doesn't because backends don't
18:16alyssa: but I guess with ACO supporting it now we can maybe hatch the egg
18:17mareko: or boil it
18:17alyssa: or poach it
18:18pixelcluster: we could yeah, although the ACO side also has some funny things to figure out
18:18pixelcluster: mostly around how the details of a generic compute calling convention would look I suppose
18:18alyssa: sure
18:18alyssa: and you'll definitely want interprocedural RA too
18:18pixelcluster: we might be able to copy LLVM's homework on the calling convention side
18:19pixelcluster: the way I wrote the ABI structs in ACO right now was more or less a setup for doing that :D
18:19pixelcluster: also yeah interprocedural opts
18:19pixelcluster: those are probably going to be fun
18:19alyssa: IPRA feels like it should be straightforward but I haven't implemented it.
18:20alyssa: (Take a topological sort of the call graph, process leaves first, propagate registers up instead of using a fixed ABI?)
18:20alyssa: (break cycles by using the "real" calling convention instead)
18:21pixelcluster: pretty much, I suppose it shouldn't be too bad
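The IPRA scheme alyssa sketches in the parentheses above could look roughly like this toy Python. Everything here is invented for illustration: the function name `ipra_clobbers`, the register names `r0`..`r7`, and the `ABI_CLOBBERS` set are not from any real driver; a real implementation would run over the backend's call graph and register file.

```python
# Toy interprocedural clobber propagation: walk the call graph
# bottom-up (leaves first), record which registers each function
# actually touches transitively, and let callers avoid only those
# instead of assuming the fixed ABI's full caller-saved set.
# Functions on a call-graph cycle conservatively fall back to the
# "real" calling convention (the full ABI clobber set).

ABI_CLOBBERS = frozenset({"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7"})

def ipra_clobbers(call_graph, local_clobbers):
    """call_graph: {fn: [callees]}; local_clobbers: {fn: regs the
    function's own code touches}. Returns the transitive clobber set
    per function; cycle members get the full ABI set."""
    result = {}
    on_stack = set()

    def visit(fn):
        if fn in result:
            return result[fn]
        if fn in on_stack:
            # back edge: break the cycle with the fixed convention
            return ABI_CLOBBERS
        on_stack.add(fn)
        clob = set(local_clobbers[fn])
        for callee in call_graph.get(fn, []):
            clob |= visit(callee)
        on_stack.discard(fn)
        result[fn] = frozenset(clob)
        return result[fn]

    for fn in call_graph:
        visit(fn)
    return result
```

With a small acyclic graph, a leaf's clobbers are just its own registers and each caller's set is the union over its callees, which is exactly the "propagate registers up" step.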
18:21alyssa: LLVM does ... not do it that way, afaik
18:21alyssa: but,
18:21alyssa: .
18:28Lynne: pixelcluster: we print the spir-v size and the binary size (via vkGetShaderBinaryDataEXT()), never noticed bloat
18:28Lynne: though as much as I'd like to say we don't have horrible code in ffmpeg, most of the time drivers just return the SPIR-V size from vkGetShaderBinaryDataEXT()
18:30pixelcluster: tbh in those shaders I've seen, I'm not sure inlining anything would be worth much, not sure if I'm missing something?
18:30Company: I have a question about VK_EXT_descriptor_heap: What drivers/gpus are going to implement that?
18:31Company: because I'm wondering if it's a viable plan to transition GTK's Vulkan renderer to that extension and revert everything else back to OpenGL
18:32dcbaker: Company: RADV and ANV at least have opened MRs for it
18:32Company: yeah, I expect the desktop drivers to all handle it
18:33Company: I'm mostly interested in how it's looking on mobile
18:33alyssa: Company: heaps are expected to be the future of vulkan
18:33glehmann: hasvk will never implement it, not sure about v3dv
18:33alyssa: so I expect eventually every maintained driver will implement it for new hardware
18:33alyssa: no promises about the back catalog, though
18:33Company: yeah, I mostly care about that back catalog
18:33alyssa: but tbh OpenGL is probably fine for back catalog
18:34alyssa: and there's a chunk of old hardware where GL might be faster than heap tbh
18:34Company: it kinda depends how far back
18:35Company: the main reason I want it is because then D3D support is basically s/vk/d3d12/
18:35glehmann: heap means drivers can't do residency tracking, so there is no information on which bos need to be in vram
18:36glehmann: but only radv does that anyway I think, and we want to drop it
18:36pixelcluster: i mean didn't desc buffer also mean that already
18:36pixelcluster: and buffer device address
18:36mareko: why would GTK want to go past VK 1.0
18:37glehmann: pixelcluster: is gtk using those?
18:37pixelcluster: oh gtk
18:37Company: it's not
18:37Company: mareko: because Windows support - descriptor heap means I can share lots of code with D3D12
18:38glehmann: one maybe relevant vk driver is the powervr stuff, I doubt they will support heaps anytime soon
18:38Company: (also because I hate descriptor set layouts and pools)
18:38glehmann: and there is no real gl driver
18:39mareko: zink?
18:39mareko: or angle
18:39zmike: they use zink
18:39zmike: and have even submitted conformance for it
18:40glehmann: Company: I understand hating layouts, but what is the issue with pools? just create one with reasonable descriptor counts, allocate from it until it fails, move on to the next one, reuse the old one when the gpu is done with it
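The pool-recycling scheme glehmann describes could be modeled as a toy like the following. This is purely illustrative Python with invented names (`Pool`, `PoolRecycler`); a real renderer would use vkCreateDescriptorPool / vkAllocateDescriptorSets / vkResetDescriptorPool and gate recycling on a fence or timeline semaphore.

```python
# Toy model: allocate from the current pool until it runs out, then
# retire it and move to a fresh (or recycled) pool; once the GPU is
# done with the work referencing the retired pools, they go back on
# the free list for reuse.

class Pool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_alloc(self, n=1):
        if self.used + n > self.capacity:
            return False  # roughly VK_ERROR_OUT_OF_POOL_MEMORY
        self.used += n
        return True

    def reset(self):
        self.used = 0  # roughly vkResetDescriptorPool

class PoolRecycler:
    def __init__(self, capacity):
        self.capacity = capacity
        self.free = []        # pools whose GPU work has completed
        self.in_flight = []   # pools referenced by pending GPU work
        self.current = Pool(capacity)

    def alloc_set(self):
        if not self.current.try_alloc():
            # current pool full: retire it, grab a recycled or new one
            self.in_flight.append(self.current)
            self.current = self.free.pop() if self.free else Pool(self.capacity)
            self.current.reset()
            assert self.current.try_alloc()
        return self.current

    def gpu_finished(self):
        # fence signalled: every retired pool can be reused
        self.free.extend(self.in_flight)
        self.in_flight.clear()
```

The point of the design is that no individual descriptor set is ever freed: whole pools are reset and reused once the GPU is provably done with them.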
18:42Company: glehmann: GTK problem - I don't have a good infrastructure for managing pools - though I guess once I need to manage heaps, I'll encounter almost the same problem
18:44Company: so, I guess the big question for me is Adreno and Mali
18:44glehmann: mesa only, or also android drivers?
18:44glehmann: because good luck getting heaps on those
18:44Company: Mesa primarily
18:45Company: GTK Android only does GL anyway currently
18:45Company: mac is more interesting
18:46Company: I should actually ask that: what about kosmickrisp?
18:46Company: I had just assumed that it's a modern driver, so it's gonna get it eventually, but I have no idea how Metal works
18:57airlied: pixelcluster: llvmpipe has some backend function support, and I've hooked up the spirv hints
19:38alyssa: Company: heaps are annoying on top of Metal, but I made sure the EXT released with a specific exception to ensure it is possible to implement
19:38alyssa: so kosmickrisp is expected to eventually support heaps, though it's pretty low down the prio list (they still have to do 1.4 i think)
19:41Company: right
19:41Company: so it's all possible, but not urgent - which means I don't need to hurry with my plan
19:43HdkR: I wonder if descriptor heaps are why I got a bug report about updating FEX's Vulkan thunks for "really necessary" extensions.
19:44glehmann: alyssa: what exception?
19:44glehmann: afaiu, the issue with heaps is that metal has sampler objects
19:44glehmann: and the count is quite limited on older hw
19:45alyssa: glehmann: making capture/replay optional
19:45alyssa: to allow *krisp drivers to cheat at samplers and do objects internally like vkd3d-proton works on desc sets.
19:45alyssa: if you do capture/replay you need to implement sampler heap properly.
19:47alyssa: (at one point I had an unhinged plan to do capture/replay heap on m1 in hk, but lol no thanks.)
19:47alyssa: (It involved a compute kernel that wrote the hw sampler heap on the device-timeline.)
19:49glehmann: so basically, you have to cache and pray that you don't run out of sampler objects?
19:58alyssa: well that's already the case.
19:58alyssa: with descriptor sets on either the native or layered impls.
19:59alyssa: M3 is the first that supports bindless properly.
21:06anholt: daniels: I'm trying ci-collate for the first time, and having a bad time.
21:13valentine: I'm always somewhat reluctant to recommend it.
21:16anholt: 15 minutes to generate a truncated patch that doesn't apply.
21:18valentine: your --local-clone needs to be checked out to the target branch
21:27daniels: yeah, those really need to bubble up the list; you need to pass non-obvious options to get the behaviour you want unless you're a special-purpose script
21:29daniels: `ci-collate patch --local-clone $(pwd) --branch $(git branch --show-current) --patch-name ${name-of-output-you-want}.diff ${pipeline_id}` is what you're after
21:48sergi: If you add --artifacts */failurescsv it is much more conservative on preparing a patch
21:49karolherbst: okay.. how can I force softpipe these days..
21:53karolherbst: but I also have tests failing in CI with software for an MR that doesn't even touch anything softpipe would run into.. maybe something else is going on...
21:59anholt: sergi: do you mean failures.csv.zst?
21:59anholt: oh, not zst, results is the one that's zstd
22:08sergi: results are usually compressed with zstd, but failures is usually small and doesn't need compression. I missed a dot in the extension in the previous message
22:11sergi: you can specify the artifact from which it takes the information about expectations. The */ at the beginning uses a feature of ci-collate to search the artifacts for the file you are interested in. That's because this file is not always at the same path.
22:14sergi: also, using failures.csv keeps memory usage smaller. Needing results.csv.zst has a significant impact on the memory used (something I need to work on improving). Working in parallel across the jobs in the pipeline doesn't help much either.
22:16anholt: really need to fix whatever it is that's driving you to use results.csv (flakes, I'm guessing?)
22:18sergi: Yes, this comes from a request to process the flakes. And that opened the door to multiple interpretations of what different people do when they manually process those files
22:20anholt: so, you need a patch to deqp-runner to also output a flakes.txt so you don't have to download the massive file. maybe just have flakes listed in failures.txt -- I think it'll all work out for the "use failures.txt as the baseline+includes for a rerun" usecase.
22:24sergi: yes, that can help. And I also need time ;) I think I haven't worked on that project for the last 3 months. Perhaps the last merge is even older.
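The post-processing anholt suggests could be sketched as below. This assumes deqp-runner's results.csv uses a plain `test-name,Status` line format with `Flake` as one of the statuses; `extract_flakes` and the sample data are invented for illustration, and the format should be checked against an actual results file.

```python
# Until deqp-runner grows a flakes.txt output, something like this
# could pull the flaked tests out of a decompressed results.csv.
import csv

def extract_flakes(lines):
    """Return the sorted test names whose recorded status is Flake."""
    return sorted(name for name, status in csv.reader(lines)
                  if status == "Flake")

# hypothetical sample rows standing in for a real results.csv
sample = [
    "dEQP-GLES3.functional.foo,Pass",
    "dEQP-GLES3.functional.bar,Flake",
    "dEQP-GLES3.functional.baz,Fail",
]
```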
22:30pta2002: So I've run through the CTS for the g78 and there are 15 or so total test failures, seemingly related to texture compression (dEQP-GLES3.functional.texture.compressed.astc.block_size_remainder.*), however unfortunately I've got no idea if this is the same thing causing the rendering glitches. The generated images are just black squares, so that doesn't help me much. The only pointers I've got are that this seems to happen consistently with transparency, or with rapidly moving images (in which case it then seems to pass after a bit). I've looked at the mali kbase driver and the one thing that differs between the g57 and g78 is that the g78 has some different settings in the mmu, which does not seem to be exposed at all in panfrost from a quick glance. Could this be the reason?
22:31alyssa: pta2002: -> #panfrost
22:31pta2002: Didn't know about that channel, thanks, will ask there
23:59daniels: anholt: why separate value-per-line files rather than one file?