00:18 dwlsalmeida[d]: gfxstrand[d]: Meanwhile here in Brazil even my watercooler is giving out because of the heat
00:18 dwlsalmeida[d]: Gotta use fans, since the building I am at will not let me install my AC
00:19 dwlsalmeida[d]: Said fans just push the winds of hell directly on your face
00:20 dwlsalmeida[d]: What I wouldn’t give for weather where you actually need a heater instead
00:21 orowith2os[d]: Give me mild all year round weather 😭
00:21 gfxstrand[d]: dwlsalmeida[d]: Oh, I only need a heater because I'm on the other side of the equator. You can't survive summer here without AC.
00:22 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1333592926621794386/image.png?ex=679974b2&is=67982332&hm=70e41b80957d684b9d19cc6cf76b8ebbf78d28f0a3023e0398355931db096d5f&
00:22 gfxstrand[d]: I have no idea what's blowing up. 😭
00:23 gfxstrand[d]: I'm not seeing anything in stderr
00:25 mhenning[d]: yeah, a few weeks ago I tried to reproduce your instruction encoding error with the "dragon age the veilguard character creator", and I got that instead
00:25 mhenning[d]: didn't actually look at it too carefully though
00:26 gfxstrand[d]: I'm going to let it finish compiling all the other shaders and see if I can isolate it a bit.
00:26 dwlsalmeida[d]: gfxstrand[d]: I know someone that actually planned to live half the year in Canada and the other half here, to avoid their winters. Visas and money aside, what a fantastic idea lol
00:27 dwlsalmeida[d]: Except they’d endure shitty 40c weather in both places lol
00:27 gfxstrand[d]: Yeah, that's the problem.
00:28 gfxstrand[d]: I generally prefer the heat to the cold but I'm not a huge fan of either.
00:28 gfxstrand[d]: gfxstrand[d]: I wonder if fossilize is blowing up. I see a `Fossilize WARN`.
00:28 airlied[d]: california in winter and brisbane in winter
00:42 gfxstrand[d]: gfxstrand[d]: I think this is a wine bug. The only error the driver is throwing is COMPILE_REQUIRED. Maybe my winevulkan isn't new enough?
00:48 gfxstrand[d]: But that seems crazy, right? Like Wine itself shouldn't be doing that sort of error checking, right?
00:50 gfxstrand[d]: https://tenor.com/view/padme-gif-24474700
00:54 mhenning[d]: gfxstrand[d]: looks like that status might be something different than the actual function status https://github.com/wine-mirror/wine/blob/master/dlls/winevulkan/loader_thunks.c#L2958
00:54 gfxstrand[d]: gfxstrand[d]: Annoyingly, that's a generated file so I can't go look at it in GitHub.
00:55 gfxstrand[d]: interesting
01:00 gfxstrand[d]: Yeah, it's an NTSTATUS that comes from somewhere
01:01 gfxstrand[d]: So... something failed to thunk?
01:03 orowith2os[d]: Its telling you to go thunk yourself
01:03 orowith2os[d]: I'll leave now
01:04 orowith2os[d]: Are you using upstream wine or something else?
01:04 gfxstrand[d]: Whatever proton is the default
01:04 orowith2os[d]: Ah, cause I was boutta ask, if it's upstream, to try with dxvk....
01:05 orowith2os[d]: Try Experimental, just to see
01:05 gfxstrand[d]: I'm going to try with Proton-Experimental
01:05 mhenning[d]: Maybe we segfaulted or something and that's what the unix signal becomes
01:06 orowith2os[d]: Renderdoc maybe
01:06 mhenning[d]: or maybe we're trying to unwind or something and wine can't let that work across the different abis
01:08 gfxstrand[d]: If it would tell me what `status` is, that might be helpful. :frog_upside_down:
01:09 gfxstrand[d]: Okay, now I'm getting a bajillion threads worth of `fossilize_replay`. That's new.
01:11 gfxstrand[d]: On the upside, this means it's using all 32 threads of Ryzen to do so.
01:15 gfxstrand[d]: Yeah, I still get them with Experimental
01:17 HdkR: Yea, that error would likely mean that the trampoline failed rather than the function returning an error
01:18 gfxstrand[d]: Any idea why that would happen?
01:19 HdkR: Usually missing symbols
01:19 gfxstrand[d]: I'm pretty sure `vkCreateComputePipelines` isn't missing
01:20 HdkR: like vkGetDeviceProcAddr didn't return it for whatever reason
01:21 HdkR: Might be able to PROTON_LOG=1 WINEDEBUG=+vulkan to get more information dumped to $HOME/steam-<appid>.log
01:23 gfxstrand[d]: Okay, I can give that a try
01:24 HdkR: Looks like vkCreateComputePipelines, as long as it exists can only return STATUS_SUCCESS
01:26 gfxstrand[d]: I've got lots of `0200:trace:vulkan:thunk64_vkCreateComputePipelines 0x1635a80, 0x0, 1, 0x7eafec40, (nil), 0xd9b22f10`
01:26 gfxstrand[d]: Only 3 of them seem to fail
01:26 gfxstrand[d]: I have no idea how to find the fails in this mess
01:28 HdkR: I would suspect that it would be one of the last ones :D
01:29 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1333609770740879541/steam-1845910.log.xz?ex=67998462&is=679832e2&hm=f2981077a7b5422a442a750f912d72d6628fe313f5498688b0c6282e7ac6530e&
01:29 gfxstrand[d]: If anyone wants to look.
01:29 gfxstrand[d]: HdkR: Nah, it's kinda early on but not at the start
01:31 HdkR: 0214:trace:vulkan:thunk64_vkCreateComputePipelines 0x1635a80, 0x0, 1, 0x7effec40, (nil), 0xda1877a0
01:31 HdkR: 0210:err:seh:handle_syscall_fault Syscall stack overrun. (L"!status && \"vkCreateComputePipelines\"",L"../src-wine/dlls/winevulkan/loader_thunks.c",2775)
01:31 HdkR: :thonk:
01:32 HdkR: So you used too much stack?
01:34 HdkR: Looks like wine's default "kernel" stack size is 1MB
01:35 HdkR: Might be able to use WINE_KERNEL_STACK_SIZE to override it to something larger.
01:35 zmike[d]: gfxstrand[d]: no, I had to disable HIC in zink to get anything to run
01:36 HdkR: WINE_KERNEL_STACK_SIZE=8192 I think should give 8MB stack
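Putting the debugging knobs from this exchange together, a Steam launch-options line might look like the following (a sketch; `WINE_KERNEL_STACK_SIZE` is in KiB, and whether your Proton build honors it is worth verifying):

```shell
# Sketch: Steam launch options combining the env vars discussed above.
# PROTON_LOG=1 dumps a log to $HOME/steam-<appid>.log,
# WINEDEBUG=+vulkan traces the winevulkan thunks,
# WINE_KERNEL_STACK_SIZE=8192 raises wine's "kernel" stack from 1 MiB to 8 MiB.
PROTON_LOG=1 WINEDEBUG=+vulkan WINE_KERNEL_STACK_SIZE=8192 %command%
```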
01:44 gfxstrand[d]: Okay, I'll give that a try tomorrow.
01:46 gfxstrand[d]: Some recursive thing is probably getting out of control. The question, then, is why?
04:24 gfxstrand[d]: HdkR: That did the trick
04:24 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1333653818629427242/image.png?ex=6799ad68&is=67985be8&hm=815c84234e7dd806bb293fec2185fd84fa52fc5b52cfbbfd61d9064ae7efad3f&
04:26 gfxstrand[d]: I suspect this is just a case of Rust has really deep stacks when built in debug mode
04:29 gfxstrand[d]: Well, I can get to the character creator. The framerate suuuuucks, though.
04:29 HdkR: gfxstrand[d]: Woo \o/
04:31 tiredchiku[d]: zmike[d]: interesting, I haven't needed anything like this to run plasma wayland or gnome wayland
04:42 mhenning[d]: gfxstrand[d]: is https://gitlab.freedesktop.org/mesa/mesa/-/issues/12183 fixed then? or maybe you're on a different machine now?
04:47 gfxstrand[d]: Both tests were on 4060s. I mean, something clearly went wrong but I can't reproduce it now. 🫤
04:48 gfxstrand[d]: I do, however, need to get a renderdoc trace and figure out why it's slow.
04:48 gfxstrand[d]: Which I just realized is probably because I still have wine filling my disk with logs. 😅
04:49 mhenning[d]: I'm sure that doesn't help
04:49 gfxstrand[d]: Logging every Vulkan call has a way of slowing things down.
04:49 orowith2os[d]: gfxstrand[d]: Uh oh, what's the solution :ferrisballSweat: are ya gonna just abort on panic, instead of unwinding?
04:50 gfxstrand[d]: orowith2os[d]: First I need to test with a release build and see if it has the problem.
04:50 orowith2os[d]: Fair
04:50 gfxstrand[d]: Release probably shrinks the stack 10x
04:50 gfxstrand[d]: Rust is REALLY stack heavy in debug builds.
04:51 gfxstrand[d]: I would like to see if I can find the shader and repro, though.
04:51 orowith2os[d]: Maybe you could build debug mode with some opts to help...? Though not sure how applicable it is here
04:51 gfxstrand[d]: 🤷🏻‍♀️
04:51 orowith2os[d]: I've had to do a bare minimum opt level for debug mode before to get some stuff working reliably (div/mod opts)
04:51 orowith2os[d]: So not out of the question
05:29 gfxstrand[d]: gfxstrand[d]: Even without the disk logs, it's still dog slow
05:30 gfxstrand[d]: Like, 0.5 FPS slow
05:31 redsheep[d]: gfxstrand[d]: That's exactly what I saw with doom eternal when that was broken, was difficult to find that rust was debug when I didn't expect it to be
05:33 redsheep[d]: Seems like there may be something in nvk that would be better off not using the stack to avoid that pitfall
05:36 redsheep[d]: I haven't looked recently but would assume something is doing quite a lot of recursion
05:37 gfxstrand[d]: Compilers are often recursion-heavy
05:37 gfxstrand[d]: If we can find the culprit, we might be able to reduce it some, though.
05:39 mhenning[d]: In the doom eternal case it was get_ssa_or_phi, and I didn't see a trivial way to e.g. make it non-recursive https://gitlab.freedesktop.org/mesa/mesa/-/issues/11279
05:40 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1333673074213523487/snapshot.png?ex=6799bf57&is=67986dd7&hm=56efda532338fcd4a8654cde901c538f7812c0a86a0bee33c172860c287d7917&
05:40 gfxstrand[d]: Good news is that it renders. Bad news is the 2s frame times.
05:41 gfxstrand[d]: I kinda wonder if my GPU isn't clocking up for some reason. I feel no heat coming off the GPU fan.
05:41 mhenning[d]: could be stalled
05:42 redsheep[d]: You said it was ada, so can't not be gsp... I haven't yet run across any situation where I could come up with good evidence of low clocks
05:42 redsheep[d]: Obviously very hard to try to divine without being able to actually read them
05:42 gfxstrand[d]: Could be not putting anything in VRAM. 🤡
05:43 gfxstrand[d]: The CPU is at 150%
05:44 redsheep[d]: Meaning 1.5 threads worth, right? Mangohud from your screenshot is 4% which matches that math
05:46 gfxstrand[d]: Yeah
05:50 redsheep[d]: Napkin math tells me that is vaguely the performance that would be consistent with absolutely everything going over the pcie bus
05:52 gfxstrand[d]: We've had people complain about how VKD3D-Proton sees our memory heaps.
05:52 gfxstrand[d]: I could believe it's type/heap order or something.
05:53 tiredchiku[d]: gfxstrand[d]: yes
05:53 redsheep[d]: I think that theory is probably consistent with keeping a cpu thread hammered too
05:54 tiredchiku[d]: I have info hang on I'll get on my pc
05:54 gfxstrand[d]: Well, I got to the point where I could save the game so at least tomorrow I'll be starting in the game and not menus.
05:55 gfxstrand[d]: mhenning[d]: There are ways but ugh...
05:56 tiredchiku[d]: so
05:56 tiredchiku[d]: when vkd3d-proton exceeded the HVV budget, the fallback logic only scanned for DEVICE_LOCAL | HOST_VISIBLE, and landed on HVV again
05:56 gfxstrand[d]: FYI: I'm going to bed. I'll read stuff in the morning.
05:57 tiredchiku[d]: causing games to crash
05:57 tiredchiku[d]: because we don't have an uncached system memory type
05:57 mhenning[d]: gfxstrand[d]: Yeah, it's an annoying one.
05:57 tiredchiku[d]: tiredchiku[d]: they did patch the logic to also mask out HVV when needed so it didn't happen: https://github.com/HansKristian-Work/vkd3d-proton/pull/2269
08:58 BigSmarty: hello
08:58 BigSmarty: can someone give me a link to the freedesktop discord server? I can't find it anymore
09:20 tiredchiku[d]: https://discord.gg/7zNsjcKH
09:31 bigsmarty[d]: hello
09:31 bigsmarty[d]: I would like to get involved with nvk development but I'm unsure as to how to do that on nixos
09:31 bigsmarty[d]: can someone who develops on nixos give me a few hints?
09:31 bigsmarty[d]: tiredchiku[d]: thanks
09:32 tiredchiku[d]: you're welcome
09:32 tiredchiku[d]: development looks the same regardless of the OS you're on, just clone the repo and start poking
09:32 tiredchiku[d]: testing, on the other hand
09:33 tiredchiku[d]: you'll have to build your code and run it to test
09:33 tiredchiku[d]: I believe magic_rb[d] has some experience with testing on nix
09:33 magic_rb[d]: hi! yes i do
09:34 magic_rb[d]: but
09:34 magic_rb[d]: to be completely fair, if you want to develop nvk, install arch :(
09:34 gfxstrand[d]: Just build, install anywhere, and then set `VK_ICD_FILENAMES=$INSTALL/share/vulkan/icd.d/nouveau_icd.x86_64.json` and off you go
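As a concrete sketch of that flow (the prefix and the manifest filename are assumptions — check what your Mesa build actually installs under `share/vulkan/icd.d/`):

```shell
# Sketch: build Mesa with NVK into a local prefix, then point the
# Vulkan loader at the resulting ICD manifest.
meson setup build --prefix="$HOME/mesa-nvk" -Dvulkan-drivers=nouveau
ninja -C build install
export VK_ICD_FILENAMES="$HOME/mesa-nvk/share/vulkan/icd.d/nouveau_icd.x86_64.json"
vulkaninfo --summary   # should now report NVK as the driver
```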
09:34 magic_rb[d]: not really on nixos sadly
09:34 magic_rb[d]: i think
09:36 magic_rb[d]: it might work, but there is also very good chance it wont. due to this mess https://paste.tomsmeding.com/je6RhYfd
09:37 tiredchiku[d]: :thonk:
09:37 magic_rb[d]: you might be able to do what `chaotic-nyx` does, they `LD_PRELOAD` something forcibly to override the mesa version used normally, otherwise you'd have to recompile all of nixos
09:38 magic_rb[d]: https://github.com/chaotic-cx/nyx/blob/main/modules/nixos/mesa-git.nix#L23 this
09:38 magic_rb[d]: so if you do that locally within a shell pointing to whatever you build imperatively it might work, but you're for sure not installing the thing you build imperatively anywhere useful
09:40 magic_rb[d]: ah no things changed, its simpler now, setting `GBM_BACKENDS_PATH` and `GBM_BACKEND` should be enough i think, maybe with what faith suggested
09:40 magic_rb[d]: anyway, if you do try it bigsmarty[d] feel free to ping me, I'll try to help as much as possible, but if you don't want to fight nixos, install arch, it'll be quicker
09:44 bigsmarty[d]: thanks 👍
10:15 gfxstrand[d]: gfxstrand[d]: Found one issue. I need to fix up the NonWriteable thing.
10:15 gfxstrand[d]: But I need to sleep first
10:20 asdqueerfromeu[d]: gfxstrand[d]: And speaking of huge stacks: <https://github.com/chromiumembedded/cef/issues/3616>
11:18 esdrastarsis[d]: bigsmarty[d]: I created a flake to compile mesa in an isolated nix shell, it's working for me
11:19 esdrastarsis[d]: I did my first commit to nvk using my flake
11:25 bigsmarty[d]: could you share it with me?
12:56 zmike[d]: ugh is the suspend bug back in recent kernels?
14:32 zmike[d]: update: applying https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33269 fixes weston enough that it can render its own process in drm but not enough that any GL clients can run
14:33 zmike[d]: hm or does it
14:33 zmike[d]: nope, we good
14:33 zmike[d]: that fixes everything
14:33 tiredchiku[d]: still weird to me that weston blows up like that
14:36 mohamexiety[d]: I wonder if this is specifically a Turing thing only :thonk:
14:37 tiredchiku[d]: could be
14:37 tiredchiku[d]: actually
14:37 tiredchiku[d]: let's verify
14:37 mohamexiety[d]: iirc even NVIDIA only enabled rebar officially on Ampere onwards (theoretically Turing should be capable)
14:38 tiredchiku[d]: that is correct
14:39 mohamexiety[d]: zmike[d]: but yeah if you have a spare machine tiredchiku[d]
14:39 tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1333808667438481408/rn_image_picker_lib_temp_11f4cef8-469a-449b-9937-71c1c5b5bb54.jpg?ex=679a3d9f&is=6798ec1f&hm=b85f51c7e4ccb490f7d38360e8a4fe056168f884c738beb4096c5ae3c4f5aaab&
14:39 tiredchiku[d]: also I'll never understand why gnome needs so many entries
14:40 tiredchiku[d]: 3 out of 4 do the same thing
14:41 orowith2os[d]: I think it's just interpreting the session files differently
14:41 tiredchiku[d]: this is also a thing on GDM
14:41 orowith2os[d]: Huh, weird
14:41 orowith2os[d]: Dunno then
14:43 tiredchiku[d]: mohamexiety[d]: I have just one machine but I'm always down to testing :)
14:43 tiredchiku[d]: checking on my ampere card rn
14:46 tiredchiku[d]: bah
14:47 tiredchiku[d]: couldn't open seat over ssh
14:47 tiredchiku[d]: what am I missing
14:47 zmike[d]: set up seatd
14:48 tiredchiku[d]: pain
14:48 tiredchiku[d]: same command works fine if I do it in a tty
14:48 tiredchiku[d]: uses logind backend
14:51 zmike[d]: I think all I had to do was install it...
14:51 tiredchiku[d]: started the seatd service, still the same
14:51 tiredchiku[d]: zmike[d]: fedora?
14:52 zmike[d]: yep
14:52 zmike[d]: may need to add your user to `seat` group
14:52 zmike[d]: or `tty`
14:52 zmike[d]: (and then re-login)
14:54 tiredchiku[d]: I see, thanks
14:55 tiredchiku[d]: yeah that was it, thanks :ThumbsUp:
14:55 tiredchiku[d]: yeah, steam client opens fine on my machine
14:55 tiredchiku[d]: ampere
14:56 tiredchiku[d]: so this might be a turing specific issue
14:56 tiredchiku[d]: zmike[d]: did this, opened a terminal in weston, verified vulkan and gl using nvk and zink respectively, then launched steam from said terminal
14:57 zmike[d]: I was just trying to launch glxgears
14:57 zmike[d]: it didn't have to be anything complex
14:57 tiredchiku[d]: steam was the first x11 app I could think of :lul:
14:57 tiredchiku[d]: but glxgears is fine too
14:58 tiredchiku[d]: a comfy 5800fps
14:58 zmike[d]: annoying
15:00 tiredchiku[d]: very
15:01 tiredchiku[d]: also I made sure to switch my default target to multi-user target and not graphical
15:02 tiredchiku[d]: and rebooted, to make sure a display server wasn't running
17:30 zmike[d]: how is x86 cross-compile supposed to work with rust again? fedora doesn't ship an i686 package and rustup isn't being helpful
17:32 karolherbst[d]: `1.78-i686-unknown-linux-gnu` or whatever version you need
17:32 karolherbst[d]: ehh as the version to install via `rustup`
17:32 karolherbst[d]: mhh apparently there is also `rustup target add i686-unknown-linux-gnu`
17:33 karolherbst[d]: it's per toolchain anyway
17:33 karolherbst[d]: (the target stuff)
17:34 zmike[d]: I tried that and it said it was already up-to-date
17:34 zmike[d]: but then my i686 mesa build fails
17:34 karolherbst[d]: probably need to pass the correct flags in your meson cross file
17:34 zmike[d]: I thought I did
17:34 zmike[d]: it's the same cross file I use everywhere else
17:34 karolherbst[d]: mhh..
17:35 karolherbst[d]: what's the error anyway?
17:35 karolherbst[d]: maybe it's something unrelated
17:35 zmike[d]: error[E0463]: can't find crate for `std`
17:35 karolherbst[d]: mhhhh
17:35 karolherbst[d]: that sounds related
17:35 karolherbst[d]: is `rustup target list --installed` listing it?
17:36 zmike[d]: yes
17:37 karolherbst[d]: odd...
17:37 karolherbst[d]: I know that I've tried with the native 32 bit toolchain
17:37 karolherbst[d]: not the extra target one
17:37 zmike[d]: I'm pretty sure I've done it before too
17:37 zmike[d]: but today it isn't working
17:37 karolherbst[d]: soo there are two differences
17:37 karolherbst[d]: if you use the target you'd use your normal rustc
17:37 karolherbst[d]: if you use the native 32 bit toolchain, you'd use rustc from that
17:38 karolherbst[d]: what's the rustc invocation?
17:38 zmike[d]: `rustc -C linker=gcc --color=always -C debug-assertions=no -C overflow-checks=no --crate-type rlib --edition=2021 -C opt-level=3 --target=i686-unknown-linux-gnu --crate-name compiler --emit dep-info=src/compiler/rust/compiler.d --emit link=src/compiler/rust/libcompiler.rlib --out-dir src/compiler/rust/libcompiler.rlib.p -C metadata=56103fb@@compiler@sta
17:38 zmike[d]: -lstatic:-bundle,+verbatim=libcompiler_c_helpers.a -Lsrc/compiler/rust src/compiler/rust/libcompiler.rlib.p/structured/lib.rs`
17:38 zmike[d]: which is probably using my PATH one that's 64bit
17:39 karolherbst[d]: yeah..
17:39 karolherbst[d]: I'm using `/home/kherbst/.rustup/toolchains/1.78-i686-unknown-linux-gnu/bin/rustc` as rustc in the cross file
17:39 zmike[d]: how do I tell it to use the right rustc
17:39 karolherbst[d]: `rust = '/home/kherbst/.rustup/toolchains/1.78-i686-unknown-linux-gnu/bin/rustc'`
17:40 karolherbst[d]: in the `binaries` section
17:40 karolherbst[d]: though I think the custom target _should_ work, but might also be something meson needs to do
17:40 karolherbst[d]: or something else going wrong
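Pulling the pieces of this exchange together, a 32-bit cross setup might look like the following sketch (the rustup toolchain path is an example — substitute whatever `rustup toolchain list -v` reports on your machine):

```shell
# Sketch: write an i686 meson cross file with an explicit 32-bit rustc,
# then configure a 32-bit build directory with it.
cat > cross-i686.ini <<'EOF'
[binaries]
c = 'gcc'
cpp = 'g++'
strip = 'strip'
pkg-config = 'pkg-config'
# Point at the native 32-bit toolchain's rustc so its std is found.
rust = '/home/you/.rustup/toolchains/1.78-i686-unknown-linux-gnu/bin/rustc'

[built-in options]
c_args = ['-m32']
c_link_args = ['-m32']
cpp_args = ['-m32']
cpp_link_args = ['-m32']

[host_machine]
system = 'linux'
cpu_family = 'x86'
cpu = 'i686'
endian = 'little'
EOF

meson setup build32 --cross-file cross-i686.ini
```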
17:40 zmike[d]: aha
17:40 zmike[d]: yeah I didn't have the path set for rustc
17:40 zmike[d]: how foolish
17:41 karolherbst[d]: cross compilation is a mess, I'm happy when I get binaries and stop thinking about it and hope it never breaks
17:46 zmike[d]: seems like now the failure is compiling subprojects using the wrong compiler
17:47 karolherbst[d]: *sigh*
17:47 zmike[d]: error[E0514]: found crate `paste` compiled by an incompatible version of rustc
17:47 zmike[d]: --> ../src/nouveau/nil/tic.rs:18:5
17:47 zmike[d]: |
17:47 zmike[d]: 18 | use paste::paste;
17:47 zmike[d]: | ^^^^^
17:47 zmike[d]: |
17:47 zmike[d]: = note: the following crate versions were found:
17:47 zmike[d]: crate `paste` compiled by rustc 1.84.0 (9fc6b4312 2025-01-07) (Fedora 1.84.0-3.fc41): /home/zmike/src/mesa/release32/subprojects/paste-1.0.14/libpaste.so
17:47 zmike[d]: = help: please recompile that crate using this compiler (rustc 1.78.0 (9b00956e5 2024-04-29)) (consider running `cargo clean` first)
17:47 zmike[d]: error: aborting due to 1 previous error
17:47 karolherbst[d]: ohhh
17:47 karolherbst[d]: right
17:47 karolherbst[d]: throw away your build directory
17:47 karolherbst[d]: or
17:48 karolherbst[d]: just the compiled rust stuff
17:48 karolherbst[d]: whatever is easier
17:48 karolherbst[d]: there are some bugs around in meson not being able to properly detect when it needs to rebuild stuff
17:49 zmike[d]: I deleted the whole build dir though
17:49 zmike[d]: apparently not hard enough
17:49 zmike[d]: seems like it worked this time
17:49 karolherbst[d]: nice
17:52 zmike[d]: no I was wrong again lmao
17:52 zmike[d]: same error
17:53 karolherbst[d]: oh no
17:53 zmike[d]: it's pulling in system paste I guess
17:53 zmike[d]: or
17:53 karolherbst[d]: meson does that now?
17:53 zmike[d]: no, I was right the first time and it's just using the wrong compiler for subprojects
17:54 karolherbst[d]: pain...
17:57 zmike[d]: not sure how to resolve that
17:57 karolherbst[d]: file a meson bug
18:19 gfxstrand[d]: zmike[d]: Can this be reproduced with the X11 or Wayland backends?
18:22 zmike[d]: what is "this"
18:22 zmike[d]: the successful use of weston?
18:24 asdqueerfromeu[d]: zmike[d]: I wonder how that cmdline would look with `gcc`/`clang` compiling C/C++ code
18:31 gfxstrand[d]: zmike[d]: I was hoping for the crash so I can maybe debug it but yes. 🙃
18:34 zmike[d]: I would assume it applies to all since it's the same codepaths for client rendering?
18:35 gfxstrand[d]: Okay. If that's the case, that's certainly easier. Less SSH and crazy environment variable nonsense involved. Running things on KMS sucks.
18:36 zmike[d]: just ssh in from your phone like the rest of us you big baby
18:39 karolherbst[d]: ~~you have SSH terminals on your phones?~~
18:44 tiredchiku[d]: termux
18:44 tiredchiku[d]: juicessh
18:44 tiredchiku[d]: both serve the purpose
18:44 mohamexiety[d]: oh damn I forgot that was a possibility
18:45 tiredchiku[d]: technically I do have a spare machine but it's headless
18:45 tiredchiku[d]: so it gets ssh'd into
18:45 tiredchiku[d]: and not out of
18:45 karolherbst[d]: fair
19:06 gfxstrand[d]: karolherbst[d]: How else do you think I use GitLab from my phone? 🤡
19:07 karolherbst[d]: why do you need ssh for that?
19:09 tiredchiku[d]: I'm losing faith...
19:14 gfxstrand[d]: karolherbst[d]: It's a joke. I use a web browser on my phone.
19:14 gfxstrand[d]: SSH from a phone is for crazy people. There was a time when I was crazy people...
19:15 karolherbst[d]: I've heard about people doing X11 ssh forwarding for business purposes on their phone...
19:19 airlied[d]: I do irc on my phone with ssh, not sure id want to do anything else
19:29 karolherbst[d]: not doing IRC at all on your phone 😛
19:34 gfxstrand[d]: When you realize your fossilize build predates your username change...
19:39 gfxstrand[d]: Ugh... fossilize can't play this pipeline for some reason
19:39 gfxstrand[d]: `Fossilize WARN: Descriptor set layout 000000004acae9ac is not supported. Skipping.`
19:42 gfxstrand[d]: Of course it doesn't tell me WHY it's not supported. :facepalm:
19:54 asdqueerfromeu[d]: gfxstrand[d]: The relevant function is quite complex (so I can see why they didn't try to add logging): <https://github.com/ValveSoftware/Fossilize/blob/4dc8828ac0c78888d3dc9d75b7dbc5ff3496c05e/cli/fossilize_feature_filter.cpp#L1662>
19:55 gfxstrand[d]: It's something with mutable descriptors
19:55 asdqueerfromeu[d]: I also have this vagueness problem with the `patch` program (it doesn't tell why a hunk `FAILED`)
19:58 esdrastarsis[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1333889020484386846/flake.nix?ex=679a8874&is=679936f4&hm=9bebd71ce99242d09ee327f64a5fcb3aecc10a26ba58f9e7429c171c2bfa596c&
19:58 esdrastarsis[d]: bigsmarty[d]:
20:01 gfxstrand[d]: gfxstrand[d]: RenderDoc is clearly putting it in the json file but Fossilize can't pick up on it. 😕
20:44 gfxstrand[d]: https://github.com/baldurk/renderdoc/pull/3527
20:44 gfxstrand[d]: Gotta fix the tools before we can fix the driver. 🤠
20:49 bigsmarty[d]: esdrastarsis[d]: Oh lol i thought it automated driver switching somehow
20:49 bigsmarty[d]: Thanks 🙏
20:55 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1333903258674004031/snapshot.png?ex=679a95b7&is=67994437&hm=c3b22fdea556b757a3be466689a7349a509af38415fa70a07370e5f5bab2bb48&
20:55 gfxstrand[d]: Still only 7 FPS but that's still about a 15-20x improvement over last night.
20:55 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33276
20:57 gfxstrand[d]: I know VKD3D-Proton hits that path hard. I'm not sure how much DXVK is going to be affected, though. Probably some.
21:02 karolherbst[d]: the system caching mode probably also doesn't help much
21:03 karolherbst[d]: kinda want to avoid system at all cost
21:04 karolherbst[d]: as in.. if you aren't doing CUDA and SVM atomics, you probably don't need system
21:04 gfxstrand[d]: Yeah. I need to figure out what we actually want to do there. I suspect we can use CTA for almost everything.
21:04 karolherbst[d]: system means it has to synchronize with everything on the system
21:04 karolherbst[d]: including any access over PCIe
21:04 gfxstrand[d]: Yeah...
21:05 karolherbst[d]: the next one is called .GPU
21:05 karolherbst[d]: and then .VC, which I guess is for virtualized GPUs
21:05 karolherbst[d]: so instead of the full gpu, only the vGPU
21:06 karolherbst[d]: or maybe it's the context...
21:06 karolherbst[d]: whatever VC means
21:06 gfxstrand[d]: As long as caches remain consistent and it's only ordering, CTA is probably enough for Vulkan.
21:06 karolherbst[d]: maybe
21:07 gfxstrand[d]: We emit explicit cache management instructions for memory model ops.
21:07 karolherbst[d]: it's called scope
21:07 gfxstrand[d]: But we could certainly drop to .VC
21:07 karolherbst[d]: should be fine
21:08 karolherbst[d]: there is also .PRIVATE 😄
21:08 karolherbst[d]: not really sure what the point of that is
21:08 karolherbst[d]: maybe local memory accessed through LD
21:09 karolherbst[d]: ehh wait
21:09 karolherbst[d]: it means something entirely different...
21:09 karolherbst[d]: huh......
21:09 karolherbst[d]: uhhh
21:09 karolherbst[d]: .PRIVATE is _very_ important for perf 🙃
21:09 karolherbst[d]: basically means anything outside the specified scope won't be able to observe the operation at aall
21:10 karolherbst[d]: which I think means it won't leave the cache
21:10 karolherbst[d]: though not sure how often you could use it
21:17 gfxstrand[d]: Again, a lot of this depends on how consistent things stay vs. ordering and propagation guarantees. We can get away with a lot in Vulkan as long as conflicting writes are handled correctly.
21:21 mhenning[d]: ^ That's one of the things I started working on - using weaker memory orderings where appropriate
21:22 mhenning[d]: This matches what nvidia does afaict: https://gitlab.freedesktop.org/mhenning/mesa/-/commit/6342ffabfe834d69f1e590023043d22d43c18a54
21:23 mhenning[d]: Although I've been meaning to do more reverse engineering to check that before I actually submit the MR
21:24 gfxstrand[d]: That doesn't look too crazy.
21:25 tiredchiku[d]: nvk go brr soon <a:HypedAsFuckBoi:1295407078327844934>
21:25 phomes_[d]: how long should I expect a full cts run to take? Am I doing something wrong if it is > 24 hours?
21:26 gfxstrand[d]: Yeah. I can run it in an hour.
21:26 phomes_[d]: with deqp-runner and external/vulkancts/mustpass/main/vk-default.txt
21:26 gfxstrand[d]: Are you using deqp-runner?
21:27 phomes_[d]: yes. With external/vulkancts/mustpass/main/vk-default.txt
21:27 gfxstrand[d]: https://github.com/gfxstrand/linux-setup-scripts/blob/main/scripts/devel/run-deqp.sh
21:28 gfxstrand[d]: There's some extra nonsense in there you don't need for sharding across two GPUs so that probably won't work verbatim for you.
21:28 mhenning[d]: My runs take about 2 hours with 8 threads
21:28 mhenning[d]: on a single gpu
21:28 gfxstrand[d]: But the important things are:
21:28 gfxstrand[d]: 1. Use a release build of the CTS
21:28 gfxstrand[d]: 2. Increase the number of tests per run
21:30 phomes_[d]: ok. I will make sure it is cts and adjust your script to my single gpu
21:30 karolherbst[d]: when I run the CL CTS on nvidia the runtime goes up 5x, because.... nvidia 🙃
21:30 gfxstrand[d]: Oh, and don't log images. But that just saves disk space more than time.
21:32 gfxstrand[d]: Oh, and I disable WSI tests and the object management tests. The former take forever and don't test anything interesting on drivers and the latter have a nasty habit of running you out of resources if run in parallel.
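Those tips translate to roughly this invocation (a sketch for a single GPU; the flag values are examples, and the skip patterns are the ones mentioned above — see the linked run-deqp.sh for the full version):

```shell
# Sketch: single-GPU Vulkan CTS run with deqp-runner.
# Assumes a *release* build of deqp-vk.
cat > skips.txt <<'EOF'
dEQP-VK.wsi.*
dEQP-VK.api.object_management.*
EOF

deqp-runner run \
    --deqp ./deqp-vk \
    --caselist external/vulkancts/mustpass/main/vk-default.txt \
    --skips skips.txt \
    --output results \
    --jobs 8 \
    --tests-per-group 1000 \
    -- \
    --deqp-log-images=disable
```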
21:34 mhenning[d]: karolherbst[d]: I haven't managed to find any situation where the nvidia vulkan driver or cuda compiler mark a read or write as `.PRIVATE`, which makes me guess it isn't super important.
21:35 gfxstrand[d]: Or just too hard.
21:35 mhenning[d]: Right, it's possible it comes up in niche situations
21:35 gfxstrand[d]: Also, not all of them exist on all hardware. 🙃
21:36 gfxstrand[d]: Or at least that's my memory
21:36 asdqueerfromeu[d]: gfxstrand[d]: Just like how `gcc` is RAM-intensive when compiling C++ code of popular software (I haven't really tested `clang` yet)
21:36 mhenning[d]: gfxstrand[d]: I think it's pretty consistent post volta?
21:37 mhenning[d]: ampere does remove some encodings for things that don't make sense, in order to free up some bits
21:37 gfxstrand[d]: I may be misremembering. I know there's some special casing we do in NAK.
21:37 gfxstrand[d]: But maybe it's just the Ampere thing
21:39 asdqueerfromeu[d]: gfxstrand[d]: Have you ever NAKed a NAK patch? 🥁
21:40 karolherbst[d]: mhenning[d]: huh.. interesting...
21:40 karolherbst[d]: I could see it matter for temporary writes...
21:41 karolherbst[d]: but it's definitely a flag to make it skip actually writing out to memory / higher-level caches
21:41 mhenning[d]: If that's the case I could see it being pretty hard to use
21:42 mhenning[d]: Like, you need to not care whether your write actually applies or not
21:42 karolherbst[d]: well.. outside the scope that is
21:42 karolherbst[d]: so it might matter for the CTA, but nothing on higher levels
21:43 karolherbst[d]: though I can see why outside of compute it might be really hard to find good uses for it
21:44 karolherbst[d]: maybe it's used more in mesh + raytracing?
21:44 karolherbst[d]: more the latter
21:44 karolherbst[d]: tho dunno
21:45 mhenning[d]: If it skips writing out to memory, then the write vanishes as soon as the cache is evicted, right? And that can happen any time
21:59 mhenning[d]: karolherbst[d]: Oh, wait, do you mean that once the CTA exits, a .CTA.PRIVATE will be removed from cache, but it can still be evicted before cta exit? That would be a lot more useful
22:02 karolherbst[d]: maybe? It's a bit unclear, it just states that .PRIVATE is useful when anything outside the scope can't observe the write
22:02 karolherbst[d]: well.. or the operation
22:03 karolherbst[d]: I can see it mattering for lock situations, where maybe a subgroup operates on some shared memory, but ends up writing a "final" value right before releasing a lock
22:04 karolherbst[d]: maybe it's useful when you run out of shared memory and want to use global memory as a temporary storage
22:05 karolherbst[d]: I wouldn't be surprised if LDL/STL are implicitly .PRIVATE because they don't have that flag
22:06 karolherbst[d]: though LDS/STS doesn't either
22:06 karolherbst[d]: which is fair, because it's literally L1 cache I think... or was it L2?
22:07 karolherbst[d]: ehh L1
22:39 gfxstrand[d]: mhenning[d]: Mind making that a draft MR and posting it, rebased on !33276? Also, we should give image load/store the same treatment.
22:40 gfxstrand[d]: I'm a little less confident in that but the best way to gain confidence is to post a draft and have people try it out.
22:40 mhenning[d]: Sure, I can do that tonight
22:41 mhenning[d]: Although I do think that running more weird spir-v programs through the proprietary stack is another good way to gain confidence
22:47 gfxstrand[d]: yeah
23:08 gfxstrand[d]: I wonder how much these if statements around loads are hurting us.
23:09 gfxstrand[d]: I have a half a plan for adding predication to NAK but ugh...
23:12 mhenning[d]: Yeah, I've been thinking about predication too
23:13 mhenning[d]: Although it would also really help if calc_instr_deps could see across control flow, so the branch didn't always need to wait for the load to complete
23:19 esdrastarsis[d]: bigsmarty[d]: No problem, to use nvk just set the VK_DRIVER_FILES=path-to-the-json-file as in any linux distro. My flake installs the vulkan-cts package, so you don't need to build cts as well, just run it.
23:19 gfxstrand[d]: That's really tricky to get right with divergence sticking its ugly head in there.
23:21 gfxstrand[d]: mhenning[d]: My tentative plan for predication is to add a `SrcRef` to `Dst` which is the value to use when the predicate is false. In RA or legalize, we'll have to watch out for the case where that `SrcRef` points to something that lives beyond the instruction and insert a copy. Then RA can guarantee that the two live in the same registers.
23:23 gfxstrand[d]: That by itself isn't too bad. The bigger issue is if we want to make liveness predicate-aware so we can alias variables with opposite predicates.
23:25 mhenning[d]: That sounds a little annoying in that it creates special cases anywhere that we deal with dests
23:25 gfxstrand[d]: Yes it does
23:26 mhenning[d]: nv50 deals with it by having a special operation that's a bit like a phi, except it can appear anywhere in a block and both defs are in the same block. That operation then tells RA to coalesce
23:26 mhenning[d]: which I think is a reasonable design
23:27 gfxstrand[d]: Yeah, I've seen some papers that talk about that design.
23:28 gfxstrand[d]: But then you have to deal with what happens when copy-prop propagates the false value into your phi(lite) and you have interference.
23:28 mhenning[d]: Another option I was thinking of is to just do the predication after RA, so nothing before that needs to worry about it - RA can just treat it as a normal if and you get the coalescing "for free"
23:29 gfxstrand[d]: If the phi(lite) comes later in the program, RA can no longer just run top-to-bottom.
23:31 mhenning[d]: gfxstrand[d]: If you interfere, it can always just become a bcsel.
23:31 gfxstrand[d]: mhenning[d]: That works for basic flattening, assuming a clean if/else. I'm not sure if it works in general or if we want to be able to specifically emit predicated code.
23:32 gfxstrand[d]: mhenning[d]: Yeah, at that point the phi(ish) is a "try to coalesce if you can" hint and you emit `sel` or nothing later. That works.
23:32 gfxstrand[d]: I've thought about that, too.
23:32 mhenning[d]: gfxstrand[d]: What more general cases do you want to handle? Getting leaf if/else statements gets you most of the way there imo
23:33 gfxstrand[d]: But then you need to know the predicate in the phi(ish) and need guarantees that that predicate matches the one in the "then" instruction. It also induces weird ordering restrictions that are hard to reason about in the IR unless both instructions are clearly predicated with `p` and `!p`.
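(A toy sketch of the "phi-lite as a coalescing hint" fallback discussed above: if the two predicated defs interfere, the phi-lite lowers to a `sel`; otherwise it vanishes and RA gives all three values one register. Names are illustrative, not nv50's or NAK's actual IR.)

```python
def lower_phi_lite(phi_dst, then_def, else_def, pred, interferes):
    """Return the instructions replacing a phi-lite.
    `interferes(a, b)` answers whether two values must live in
    different registers (e.g. after copy-prop pushed a long-lived
    value into the phi-lite)."""
    if interferes(then_def, else_def):
        # Can't coalesce: materialize the merge as a select.
        return [("sel", phi_dst, pred, then_def, else_def)]
    # Coalesced: RA assigns then_def/else_def/phi_dst one register,
    # so the phi-lite disappears entirely.
    return []
```

The key property is that the hint is never load-bearing: interference always has the `sel` escape hatch, so copy-prop can't make the program un-allocatable.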
23:34 gfxstrand[d]: mhenning[d]: I'm not sure, TBH. I've been thinking about predicated loads and trying to make sure they're predicated rather than just hoping.
23:38 gfxstrand[d]: Adding a post-RA flattening pass is certainly the easiest first step.
23:38 mhenning[d]: I kind of like after-RA predication because it feels like a simple place to start, and I think it will get us most of the way there
23:38 gfxstrand[d]: (Annoyingly, that extends the live range of the if condition but I think that's likely okay in a bunch of cases.)
23:38 mhenning[d]: Yeah, I think we should start there and make it more complicated if we have a justification
23:39 mhenning[d]: gfxstrand[d]: You're always free to skip predication if the value doesn't live long enough
23:40 gfxstrand[d]: Yeah, it's easy enough to scan both sides of the if and see if the condition predicate is ever written.
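(The safety scan described above might look like this toy check: a post-RA flattening pass can only predicate an if/else if neither side redefines the condition register, and only when both sides are short enough that executing both beats branching. Instructions here are hypothetical `(op, dst, srcs)` tuples, not real NAK IR.)

```python
def can_predicate(cond_reg, then_instrs, else_instrs, max_len=4):
    """Flattening is legal only if the condition register is never
    written in either branch (otherwise later instructions would be
    predicated on the wrong value). `max_len` is an arbitrary
    profitability cutoff, since predication executes both sides."""
    for _op, dst, _srcs in then_instrs + else_instrs:
        if dst == cond_reg:
            return False
    return len(then_instrs) <= max_len and len(else_instrs) <= max_len
```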