03:13 gfxstrand[d]: Thanks for catching that?
05:44 cubanismo[d]: x512[m]: No insight there. The setup looks correct based on the code block, assuming the handles are sane. I assume the flip request isn't returning an error.
05:44 cubanismo[d]: Note the NVKMS ioctl can return success even if the command fails in some cases. I don't know if flip requests are one of them.
05:45 cubanismo[d]: You have to check the reply structure status in those cases.
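A minimal sketch of the two-level check cubanismo describes, assuming a request whose ioctl can return 0 while the reply structure still carries a failure status. `FlipReply`, `ReplyStatus`, and `flip_succeeded` are hypothetical stand-ins for illustration, not the real NVKMS structures or field names.

```rust
// Hypothetical types: NOT the real NVKMS structures, just a model of
// "the ioctl call succeeded, but the command inside it may still have failed".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ReplyStatus {
    Success,
    Failed(u32), // hypothetical command-specific error code
}

#[derive(Debug)]
struct FlipReply {
    status: ReplyStatus,
}

/// Treat a request as successful only if both layers report success:
/// the ioctl return value (transport) and the reply's status field (command).
fn flip_succeeded(ioctl_ret: i32, reply: &FlipReply) -> Result<(), String> {
    if ioctl_ret != 0 {
        return Err(format!("ioctl failed with return value {ioctl_ret}"));
    }
    match reply.status {
        ReplyStatus::Success => Ok(()),
        ReplyStatus::Failed(code) => Err(format!(
            "ioctl returned 0, but reply status is error {code}"
        )),
    }
}

fn main() {
    let ok = FlipReply { status: ReplyStatus::Success };
    let bad = FlipReply { status: ReplyStatus::Failed(7) };
    assert!(flip_succeeded(0, &ok).is_ok());
    assert!(flip_succeeded(0, &bad).is_err()); // ioctl "succeeded", command did not
    assert!(flip_succeeded(-22, &ok).is_err()); // ioctl itself failed
}
```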
05:56 sonicadvance1[d]: Oh, nvk supports host_image_copy doesn't it.
05:56 sonicadvance1[d]: bweh, I'm going to need to look at that implementation one day aren't I.
05:59 sonicadvance1[d]: Fun fact, don't use atomics to write in to uncached memory :headempty:
06:12 x512[m]: cubanismo[d]: Do semaphore operations depend on notifiers? Is it allowed to specify semaphores but not notifiers?
07:54 kar1m0[d]: mohamexiety[d]: so for some reason wuthering waves is just a black screen with nouveau, it works on prop driver though
07:54 kar1m0[d]: not sure if it is my issue, since I did pass DXVK_FILTER_DEVICE_NAME=4080
07:54 kar1m0[d]: and it worked on the dgpu
07:54 kar1m0[d]: I wonder if it is an error or if I just had to wait
07:54 kar1m0[d]: anyway can I retrieve some logs perhaps?
07:54 kar1m0[d]: that might help
07:55 chikuwad[d]: I'm guessing you've tried the -dx11 argument?
09:40 mohamexiety[d]: sonicadvance1[d]: It’s super straightforward dw!
09:40 mohamexiety[d]: ~~and also not all that fast compared to prop but I had no clue how to improve back then~~
10:33 violet_purple_red[d]: chikuwad[d]: No it's dx12
10:33 violet_purple_red[d]: Wuthering waves doesn't even open with -dx11
10:34 marysaka[d]: let me download it on my test bench and do a quick test
11:18 marysaka[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1435226764011110420/snapshot.png?ex=690b3295&is=6909e115&hm=4eeae8511e457c39466e0c962a1b2caad1e44816df51d4e87d5ff95c48c629c7&
11:18 marysaka[d]: kar1m0[d]: seems to run here
11:19 marysaka[d]: testing on 25.2 on my 4060
11:20 violet_purple_red[d]: marysaka[d]: Hm interesting
11:20 violet_purple_red[d]: Did you do anything specific?
11:21 violet_purple_red[d]: Because for me it is just a black screen
11:21 marysaka[d]: violet_purple_red[d]: cmdline is `STEAMDECK=1 WINE_GSTREAMER=1 PROTON_USE_NTSYNC=1 %command%`
11:21 violet_purple_red[d]: marysaka[d]: I should also do PROTON_LOG=1 when I test it
11:21 violet_purple_red[d]: And send the log here
11:24 marysaka[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1435228111112503397/snapshot.png?ex=690b33d6&is=6909e256&hm=3f68ce8431440f2b7034bbf1fb0d5229cbea732fb0592c8350424aca35cf04d1&
11:24 marysaka[d]: yeah no it just runs for me here
11:24 kar1m0[d]: is it playable?
11:25 kar1m0[d]: marysaka[d]: what fps are you getting?
11:27 marysaka[d]: unsure, I'm testing via my KVM atm
11:32 x512[m]: marysaka[d]: It triggers an anticheat error for me.
11:32 marysaka[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1435230250329051146/snapshot.png?ex=690b35d4&is=6909e454&hm=8f8287ff1c19289c87b3d8e9f5464508b536145a5ea8fd93fd06b72c9fe50c8f&
11:33 marysaka[d]: seems like it
11:33 esdrastarsis[d]: 60 fps locked?
11:33 marysaka[d]: would have to move my test bench to play on it anyway
11:33 marysaka[d]: might be a good excuse to rest and play with NVK today 😄
11:34 marysaka[d]: Also mind you I'm testing with 6.18-rc3 and mohamexiety[d] compression patch on the kernel side (not testing the MR itself)
11:34 marysaka[d]: so it might be selecting bigger page sizes, but I'm unsure if that would have any effect anyway
11:34 marysaka[d]: x512[m]: do you have STEAMDECK=1 set?
11:34 mohamexiety[d]: there's a 120 fps option too
11:35 marysaka[d]: I have a 60Hz "display" (aka my KVM)
11:35 mohamexiety[d]: ahh
11:35 marysaka[d]: I should move my test bench to my 4k display anyway
11:38 x512[m]: marysaka[d]: Seems to work with this flag. But significantly slower than on Windows.
11:40 marysaka[d]: I mean I'm not sure what you are expecting...? Especially considering we are running with DX12 here, we have no context priority and you don't even have compression
11:40 marysaka[d]: so of course there are going to be differences still
11:42 x512[m]: But it works...
11:48 kar1m0[d]: x512[m]: I mean first of all wuthering waves isn't native on linux like it is on windows, second of all you're using nvk with a directx12 game, it running at all is already a miracle
12:03 marysaka[d]: I do wonder how much of a difference the compression MR might make :aki_thonk:
12:04 marysaka[d]: would also be nice to have numbers on how much you get on Windows with and without DLSS with highest settings (and on what GPU)
12:04 marysaka[d]: and then also what you get on linux with blobs for both scenarios too
12:09 kar1m0[d]: I mean I can only compare assassin's creed valhalla and marvel rivals
12:09 kar1m0[d]: since both of those games I played on windows and linux
12:09 kar1m0[d]: and with prop and nvk
12:33 pac85[d]: What are all the differences between NV mesh and ext mesh?
12:37 pac85[d]: Skimming through the spec.
12:37 pac85[d]: * Workgroup size can't be changed from task when spawning mesh
12:37 pac85[d]: * Output size can't be set from mesh
12:37 pac85[d]: Is this right?
16:18 tdaven[d]: There is a big table here that lists the differences:
16:18 tdaven[d]: https://www.khronos.org/blog/mesh-shading-for-vulkan
16:18 pac85[d]: thx!
16:45 sonicadvance1[d]: mohamexiety[d]: At least nvk has a larger incentive to optimize it for x86 compared to the likes of Panfrost/Turnip/v3d/<insert other ARM GPU drivers here> 😄
16:46 mohamexiety[d]: what do you use it for?
16:47 mohamexiety[d]: (just curiosity since iirc the only user I heard of for HIC was ffmpeg)
16:48 sonicadvance1[d]: I personally don't have a use for it. I just noticed SilkSong's intro movies were playing at sub-1FPS on Snapdragon and it was due to HIC causing atomics on uncached memory.
16:49 mohamexiety[d]: oh _damn_
16:50 mohamexiety[d]: we don't do that at least
16:50 sonicadvance1[d]: well, do any of them use GPR loads or stores rather than SSE/AVX? 🙂
16:51 sonicadvance1[d]: If so, welcome to atomics when emulating.
16:53 mohamexiety[d]: hm. we just use copy_nonoverlapping() from rust which _I suspect_ the compiler optimizes to either AVX or SSE. reason for this is I got identical throughput when I changed it to use SSE. didn't actually try AVX directly because I noticed that this wasn't going anywhere
16:54 mhenning[d]: I'd expect it to be a memcpy which typically does have simd fast paths
16:55 mohamexiety[d]: yeah copy_nonoverlapping() is basically rust memcpy. just the arguments are backwards :KEKW:
16:55 mhenning[d]: ah, well I can never remember the argument order for memcpy anyway
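For reference, the argument-order difference being joked about: C's `memcpy(dest, src, n)` takes the destination first and a byte count, while Rust's `std::ptr::copy_nonoverlapping(src, dst, count)` takes the source first and a count of `T` elements. `copy_into` is a hypothetical wrapper for illustration; whether the copy becomes a `memcpy` call or inline SIMD is up to the compiler.

```rust
use std::ptr;

/// Hypothetical helper: copies `src` into `dst` with `ptr::copy_nonoverlapping`.
/// Argument order is (source, destination, count), the reverse of C's
/// memcpy(dest, src, n), and `count` is in elements of T rather than bytes.
fn copy_into<T: Copy>(src: &[T], dst: &mut [T]) {
    assert_eq!(src.len(), dst.len(), "length mismatch");
    // SAFETY: both slices are valid and aligned for src.len() elements, and
    // `dst` is a unique &mut borrow, so it cannot overlap `src`.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len()) }
}

fn main() {
    let src: [u32; 4] = [1, 2, 3, 4];
    let mut dst = [0u32; 4];
    copy_into(&src, &mut dst); // count = 4 elements, i.e. 16 bytes
    assert_eq!(dst, src);

    // Safe slice equivalent, with the destination as the receiver:
    let mut dst2 = [0u32; 4];
    dst2.copy_from_slice(&src);
    assert_eq!(dst2, src);
}
```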
16:56 mohamexiety[d]: kind of interesting though, why do GPR loads/stores lead to atomics when emulating while SSE/AVX register loads/stores don't?
16:58 sonicadvance1[d]: mohamexiety[d]: FEX has varying levels of emulating x86 TSO semantics, default configuration is that GPR loadstores use atomics, while string operations, and vectors don't because the cost is too high on ARM platforms.
16:58 sonicadvance1[d]: Is NVIDIA tiled format big enough that regular memcpy calls can be used? That's kind of wacky.
17:01 mohamexiety[d]: yeah. the way it goes is you have high level tiles ("blocks") that are composed of lower level tiles ("GOBs") and then each GOB is 512B and comprised of sectors (32B each, arranged in either 2x 16B or 4x 8B layout). we memcpy sectors
17:01 mohamexiety[d]: see https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/nil/tiling.rs?ref_type=heads#L176 and https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/nil/tiling.rs?ref_type=heads#L25
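A sketch of the GOB-level swizzle, assuming a 64-byte x 8-row GOB built from 32-byte sectors in the 2x 16B arrangement (16 bytes wide, 2 rows tall). The formula is the commonly cited one for this layout and is not taken from NIL's `tiling.rs`, which also handles the 4x 8B arrangement and block-level tiling; `gob_offset` and `linear_to_gob` are hypothetical names. It shows why sector-granular memcpy works: each 16-byte sector row is contiguous in both the linear and the tiled layout.

```rust
// Assumed GOB geometry: 64 bytes wide x 8 rows = 512 bytes, sixteen
// 32-byte sectors, each sector 16 bytes wide x 2 rows.
const GOB_WIDTH_B: usize = 64;
const GOB_HEIGHT: usize = 8;
const GOB_SIZE_B: usize = GOB_WIDTH_B * GOB_HEIGHT; // 512

/// Byte offset inside a GOB of the byte at column `x` (bytes), row `y`.
fn gob_offset(x: usize, y: usize) -> usize {
    assert!(x < GOB_WIDTH_B && y < GOB_HEIGHT);
    (x / 32) * 256             // left or right 32-byte half of the GOB
        + (y / 2) * 64         // which pair of rows
        + ((x % 32) / 16) * 32 // which sector within that row pair
        + (y % 2) * 16         // row within the sector
        + (x % 16)             // byte within the 16-byte sector row
}

/// Linear (pitch = 64 bytes) -> one tiled GOB, copying one 16-byte sector
/// row at a time, since each sector row is contiguous in both layouts.
fn linear_to_gob(linear: &[u8; GOB_SIZE_B]) -> [u8; GOB_SIZE_B] {
    let mut gob = [0u8; GOB_SIZE_B];
    for y in 0..GOB_HEIGHT {
        for x in (0..GOB_WIDTH_B).step_by(16) {
            let src = y * GOB_WIDTH_B + x;
            let dst = gob_offset(x, y);
            gob[dst..dst + 16].copy_from_slice(&linear[src..src + 16]);
        }
    }
    gob
}

fn main() {
    // Every (x, y) maps to a distinct offset: the swizzle is a permutation.
    let mut seen = [false; GOB_SIZE_B];
    for y in 0..GOB_HEIGHT {
        for x in 0..GOB_WIDTH_B {
            let o = gob_offset(x, y);
            assert!(!seen[o]);
            seen[o] = true;
        }
    }
    assert!(seen.iter().all(|&s| s));

    let mut linear = [0u8; GOB_SIZE_B];
    for (i, b) in linear.iter_mut().enumerate() {
        *b = (i % 251) as u8;
    }
    let gob = linear_to_gob(&linear);
    assert_eq!(gob[gob_offset(17, 3)], linear[3 * GOB_WIDTH_B + 17]);
}
```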
17:01 sonicadvance1[d]: I've forgotten all my knowledge of the T210 tiled format back when I was investigating it 😄
17:02 mohamexiety[d]: heh
17:06 sonicadvance1[d]: I'll see if I can build today and look at the final assembly. Probably the easiest way to check.
17:19 cubanismo[d]: x512[m]: Yeah, they should be independent to the best of my knowledge. You could go trace through the code as well as I could there, but there's nothing tying them together at the HW level.
17:21 cubanismo[d]: The SSE streaming load/store stuff has looser ordering on x86, right?
17:22 cubanismo[d]: As in you can translate it without strict ordering on ARM and still be theoretically functionally equivalent?
17:22 mohamexiety[d]: sonicadvance1[d]: thanks for the explanation btw! didn't think of TSO at all
17:32 sonicadvance1[d]: cubanismo[d]: Yea, we translate those to non-tso even when the vector tso config option is enabled.
17:33 cubanismo[d]: I assume that's the path you'd expect to hit on memcpy most of the time?
17:33 sonicadvance1[d]: On regular memcpy yea, or the string operations when ERB(?) is set in cpuid.
17:34 sonicadvance1[d]: ERMSB
17:35 sonicadvance1[d]: The ARM GPUs tend to have quite a bit more compact swizzling so memcpy doesn’t work well.
17:43 sonicadvance1[d]: Turnip, I think, deals with at most 1024-bit sectors, within which all the elements get swizzled. Which maps well to ARM zip instructions, doesn't map super well to most x86 😄
19:10 steel01[d]: gfxstrand[d]: So what's the current status of the tegra support? Is it waiting on something external? Or just priorities compared to other stuff?
19:59 gfxstrand[d]: priorities compared to other stuff