00:20fdobridge_: <gfxstrand> Okay, I've got fp64 fdiv working
00:20fdobridge_: <gfxstrand> It was a bug in fp64 comparisons with zero.
00:26fdobridge_: <gfxstrand> @dadschoorse I've updated my fp64 MR if you feel like reviewing the NIR bits again
04:23fdobridge_: <prop_energy_ball> mmm latest linux-git black screens on nouveau for me
04:24fdobridge_: <prop_energy_ball> guess I should try the rc
04:52fdobridge_: <prop_energy_ball> Also doesn't work =(
06:11fdobridge_: <!DodoNVK (she) 🇱🇹> With or without GSP?
06:19fdobridge_: <prop_energy_ball> I did not enable it by default, so I am guessing without?
06:20fdobridge_: <prop_energy_ball> It's on a preproduction 2060 Super though (although it should be functionally identical to a regular 2060 Super aside from the shroud saying 2070 on it)
06:20fdobridge_: <prop_energy_ball> Maybe that forces on GSP?
06:20fdobridge_: <prop_energy_ball> Not sure what gens do that
06:21fdobridge_: <prop_energy_ball> It works on 6.6 just fine anyway
06:21fdobridge_: <!DodoNVK (she) 🇱🇹> It's only forced on Ada
06:21fdobridge_: <!DodoNVK (she) 🇱🇹> Does the regular Linux package work?
06:21fdobridge_: <prop_energy_ball> yes
06:22fdobridge_: <prop_energy_ball> latest git and 6.7-rc do not though
06:23fdobridge_: <!DodoNVK (she) 🇱🇹> It looks like you have a regression then :cursedgears:
06:23fdobridge_: <!DodoNVK (she) 🇱🇹> Can you check dmesg?
06:46fdobridge_: <prop_energy_ball> I'll have to see if I can get some ssh to the machine
06:46fdobridge_: <prop_energy_ball> or serial
06:46fdobridge_: <prop_energy_ball> it's a VM so should be ez
06:46fdobridge_: <!DodoNVK (she) 🇱🇹> You are passing through the NVIDIA GPU to a Linux VM, right? 🐸
06:48fdobridge_: <prop_energy_ball> yes
06:50fdobridge_: <!DodoNVK (she) 🇱🇹> That feels like a niche use case to me (but if display output worked before then that's weird)
06:51fdobridge_: <prop_energy_ball> that's not niche at all, and probably not related to the problem
06:52fdobridge_: <!DodoNVK (she) 🇱🇹> Does the graphical output in virt-manager show a black screen too?
06:53fdobridge_: <!DodoNVK (she) 🇱🇹> If not then that's definitely a nouveau regression (there was a lot of display rework stuff on v6.7)
07:03fdobridge_: <prop_energy_ball> I don't think I have that set up, but can look
07:10fdobridge_: <prop_energy_ball> Given how many NV GPUs are used for AI and mining in VMs these days... I wonder if they outweigh consumer units :(
08:47fdobridge_: <prop_energy_ball> Seems like it works on my other display
08:47fdobridge_: <prop_energy_ball> My Asus PG32UQX doesn't work tho
08:57fdobridge_: <prop_energy_ball> mmm no gsp
08:58fdobridge_: <prop_energy_ball> Lots and lots of bad stuff in dmesg
08:58fdobridge_: <prop_energy_ball> https://cdn.discordapp.com/attachments/1034184951790305330/1186593366947922000/message.txt?ex=6593d030&is=65815b30&hm=c86d47c0e3f91e93b222d799e92a68a0b76cb27bb2adc726ae737d875d834a12&
08:59fdobridge_: <prop_energy_ball> ```
08:59fdobridge_: <prop_energy_ball> [ 2.753298] jt: obj type 1
08:59fdobridge_: <prop_energy_ball> [ 2.753303] jt: obj len 0
08:59fdobridge_: <prop_energy_ball> ```
09:01fdobridge_: <!DodoNVK (she) 🇱🇹> I think those warnings might be triggered by a buggy ACPI implementation
09:01fdobridge_: <prop_energy_ball> Wonder if that is related to VM at least
09:01fdobridge_: <prop_energy_ball> No issues there with AMD passthru or NV with prop driver tho
09:01fdobridge_: <prop_energy_ball> or on windows passthru
09:01fdobridge_: <!DodoNVK (she) 🇱🇹> The fifo error is definitely not normal though
09:02fdobridge_: <prop_energy_ball> -22 is EINVAL right
09:03fdobridge_: <!DodoNVK (she) 🇱🇹> I think so
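
For reference, a minimal sketch of turning a negative kernel-style return code such as the `-22` above into an errno name. The `decode_kernel_errno` helper is hypothetical (it is not part of nouveau), and the lookup uses the host's errno table, so the mapping is only meaningful on a system whose numbering matches the kernel that produced the log.

```python
import errno
import os

def decode_kernel_errno(ret):
    """Map a negative kernel-style return value (e.g. -22) to (name, message).

    Hypothetical helper for reading dmesg output; assumes the host's errno
    numbering matches the kernel that logged the value.
    """
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

name, message = decode_kernel_errno(-22)
print(name, "-", message)
```
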
09:06fdobridge_: <prop_energy_ball> guess a `gsp->intr[i].nonstall == ~0` then, whatever that implies 😅
09:07fdobridge_: <prop_energy_ball> Plasma Wayland runs awful but Plasma X11 seems fine
09:08fdobridge_: <!DodoNVK (she) 🇱🇹> Dave will definitely love debugging this /s
09:09fdobridge_: <prop_energy_ball> https://cdn.discordapp.com/attachments/1034184951790305330/1186596147901513738/cover2.png?ex=6593d2c7&is=65815dc7&hm=48c38bc69f91b13b168684a81d598ad7625eec661e1e0dbd43faae708afb7d96&
09:17fdobridge_: <prop_energy_ball> ty for the package btw, it was useful reference
09:17fdobridge_: <!DodoNVK (she) 🇱🇹> Anyway it seems the GSP firmware is being loaded here (can you disable it?)
09:20fdobridge_: <prop_energy_ball> Yeah can do, I manually enabled it.
09:21fdobridge_: <prop_energy_ball> Seems like enabling gsp hoses everything so I go back to llvmpipe.
09:54fdobridge_: <esdrastarsis> Strange, Nouveau is calling a function for Ampere GPUs (ga100_fifo_nonstall_ctor) on a Turing GPU
09:54fdobridge_: <prop_energy_ball> This is a preproduction GPU, so there is a definite chance of weirdness. I re-flashed it with a stock VBIOS tho
09:55fdobridge_: <prop_energy_ball> The original bios used to hang and bring the system down all the time 🐸
09:59fdobridge_: <!DodoNVK (she) 🇱🇹> That's actually fine I think
10:12fdobridge_: <marysaka> might be worth checking what firmware it tried to select
10:12fdobridge_: <marysaka> maybe it's trying to load an Ampere one
10:19fdobridge_: <prop_energy_ball> How can I check that?
10:22fdobridge_: <prop_energy_ball> also my first MR :D 🎉
10:22fdobridge_: <prop_energy_ball> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26752/
10:23fdobridge_: <prop_energy_ball> hooking up VK_KHR_present_wait + VK_KHR_present_id
10:24fdobridge_: <esdrastarsis> There is a function (nvkm_debug) in r535_gsp_load_fw that prints the firmware that is loaded, but I don't remember which kernel parameter makes it print to the kernel log
10:29fdobridge_: <!DodoNVK (she) 🇱🇹> So does DXVK and ~~D12VK~~ vkd3d-proton use it too? 🤔
10:31fdobridge_: <esdrastarsis> @prop_energy_ball I think your issue is known: https://gitlab.freedesktop.org/drm/nouveau/-/issues/283
10:32fdobridge_: <prop_energy_ball> yes
10:38fdobridge_: <!DodoNVK (she) 🇱🇹> Enjoy the 3 issue mentions then :happy_gears:
10:43fdobridge_: <esdrastarsis> It would be nice to also add the sparse residency MR to the DXVK tracker issue
10:44fdobridge_: <esdrastarsis> oh, and in vkd3d-proton issue too
10:45fdobridge_: <!DodoNVK (she) 🇱🇹> I can't seem to find it
10:48fdobridge_: <esdrastarsis> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26719
10:48fdobridge_: <!DodoNVK (she) 🇱🇹> It has no NVK label 🍩
10:50fdobridge_: <dadschoorse> ofc, the label marker ignores draft MRs
10:55fdobridge_: <marysaka> hmm I think it reports the arch a bit before trying to init? I can't remember :vReiAgony:
10:55fdobridge_: <prop_energy_ball> fixd
11:26fdobridge_: <!DodoNVK (she) 🇱🇹> :nouveau:🇫🇷
11:26fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1186630748405510154/Screenshot_20231219_132639.png?ex=6593f300&is=65817e00&hm=75d904893eb4d23de410921e412cf985a669a3ebd139efe12240050581919258&
11:45fdobridge_: <!DodoNVK (she) 🇱🇹> Anyway that patch is now in vulkan-nouveau-git :triangle_nvk:
13:17fdobridge_: <leopard1907> inb4 NVK runs more D3D12 games than ANV 🐸
13:42fdobridge_: <huntercz122> i could boot cyberpunk 2077 already
13:42fdobridge_: <huntercz122> but nothing showed up
13:42fdobridge_: <huntercz122> just black screen
13:42fdobridge_: <huntercz122> but nvk did something
13:47fdobridge_: <!DodoNVK (she) 🇱🇹> Can you check the Wine/Proton logs and dmesg?
13:49fdobridge_: <!DodoNVK (she) 🇱🇹> If dmesg shows nothing interesting then send the Wine/Proton logs
13:55fdobridge_: <huntercz122> after work
13:58fdobridge_: <dadschoorse> wouldn't be surprising, easier hw and more modern backend compiler
14:00fdobridge_: <leopard1907> Yes, Anholt complained about whole compiler being in a very bad shape in one of the issue reports. 🐸:frog_gears:
14:58fdobridge_: <gfxstrand> Yeah, the Intel backends are the very definition of production code. They're not good but they've been hammered until they work well enough.
15:03fdobridge_: <gfxstrand> Today's goals:
15:03fdobridge_: <gfxstrand> 1. Make @dadschoorse happy with my fp64 MR and land it.
15:03fdobridge_: <gfxstrand> 2. Get the "NVK holiday update" blog post out the door.
15:04fdobridge_: <samantas5855> 3. Fermi support
15:05fdobridge_: <!DodoNVK (she) 🇱🇹> 4. Working gamescope
15:05fdobridge_: <Sid> 5. world domination
15:16fdobridge_: <!DodoNVK (she) 🇱🇹> 6. NVK mascot :triangle_nvk:
16:47f_: Fermi in NVK?
16:53fdobridge_: <gfxstrand> That'll teach me to post my ToDo list. 😂
16:53fdobridge_: <gfxstrand> First on the ToDo list, apparently, was to fix my laptop keyboard. 🤦🏻♀️ l
16:54karolherbst: f_: probably never gonna happen unless somebody is really really bored and cares about GPUs we get 0 perf on anyway
16:54f_: Pretty much what I thought :P
16:54fdobridge_: <!DodoNVK (she) 🇱🇹> Time to infodump various feature suggestions then 😅
16:54karolherbst: reclocking on fermi is kinda the first step here :')
16:55f_: Isn't that WIP?
16:56f_: + "NVK" -> VK -> Vulkan?
16:56f_: (not sure about that one ^)
16:57fdobridge_: <!DodoNVK (she) 🇱🇹> 7. Get Cyberpunk 2077 working on NVK
16:57fdobridge_: <!DodoNVK (she) 🇱🇹> 8. GL_NV_mesh_shader support
16:58fdobridge_: <gfxstrand> No!
16:58fdobridge_: <gfxstrand> What's actually missing for Cyberpunk?
16:58f_: 9. Profit for a while
16:58fdobridge_: <!DodoNVK (she) 🇱🇹> 10. NVIDIA driver compatibility mode for shaders (so that all of my Minecraft shader packs work OOTB)
16:58fdobridge_: <!DodoNVK (she) 🇱🇹> Not sure (all I know is that Hunter got a black screen with it)
17:01karolherbst: f_: yeah, but vulkan is mostly interesting for gaming, and gaming with a super slow GPU ain't fun
17:01f_: Fermi doesn't support vulkan to begin with AFAIK.
17:01fdobridge_: <!DodoNVK (she) 🇱🇹> I'm getting MFC flashbacks (because Wine developers have some phobia/allergy of MFC which leads to none of the mfc* libraries being present in Wine at all; not even as a stub)
17:02karolherbst: f_: yeah... though there aren't really strong reasons for having something good enough (tm)
17:02karolherbst: like it's good enough for using zink on top of it
17:02f_: Now..aren't some of them (e.g. quadro) supposed to be used for VFX? :^)
17:05f_:suddenly starting blender
17:05f_: Ah, it crashes?
17:05fdobridge_: <!DodoNVK (she) 🇱🇹> But more realistically I guess if the proprietary Microsoft library works completely fine for the MFC stuff on Wine then no one bothers to actually implement it (I'm still surprised no one has even put a 100% stub of it on Wine though; that definitely happened for much more niche libraries)
17:05f_: with a bunch of `nouveau:` logs..
17:06f_: (might be a regression?)
17:08karolherbst: possibly
17:08karolherbst: but I'm on PTO, so I ain't gonna do anything about it until next year 🙃
17:09fdobridge_: <gfxstrand> Ah, yes. The classic "black screen" bug. I've seen that one a few times. 😂
17:13fdobridge_: <!DodoNVK (she) 🇱🇹> I'm still waiting for the logs so I can better understand what's going on
17:17fdobridge_: <gfxstrand> The Phoronix forums do not disappoint. 😂
17:17fdobridge_: <gfxstrand>
17:17fdobridge_: <gfxstrand> https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-nvidia-linux-nouveau/1429159-for-at-least-one-game-mesa-s-nvk-driver-can-outperform-nvidia-s-proprietary-driver/page2
17:17fdobridge_: <gfxstrand>
17:17fdobridge_: <gfxstrand> It took them less than a page to start debating whether or not *A Hat in Time* was running on OpenGL when running on NVIDIA. WTF did they get that from?!? I have no idea but that's where it went. 😂
17:21fdobridge_: <!DodoNVK (she) 🇱🇹> Hunter is there 🕵️♀️
17:29fdobridge_: <loothelion (Liam Middlebrook)> The cat in comment #8 on that thread was cute.
17:50f_: @DodoNVK you're waiting for *my* logs?
17:50f_: karolherbst: haha
17:54fdobridge_: <!DodoNVK (she) 🇱🇹> f_: I'm talking about another person called Hunter
17:54f_: Oh ok.
17:57fdobridge_: <huntercz122> ```
17:57fdobridge_: <huntercz122> [ 677.716217] nouveau 0000:01:00.0: GameThread[29705]: job timeout, channel 16 killed!
17:57fdobridge_: <huntercz122> [ 677.716249] [drm:nouveau_job_submit [nouveau]] *ERROR* Trying to push to a killed entity
17:57fdobridge_: <huntercz122> [ 740.494570] nouveau 0000:01:00.0: gsp: mmu fault queued
17:57fdobridge_: <huntercz122> [ 740.494577] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:16 type:31 scope:1 part:233
17:57fdobridge_: <huntercz122> [ 740.494582] nouveau 0000:01:00.0: fifo:001001:0002:0010:[GameThread[29705]] errored - disabling channel
17:57fdobridge_: <huntercz122> [ 740.494586] nouveau 0000:01:00.0: GameThread[29705]: channel 16 killed!
17:57fdobridge_: <huntercz122> [ 746.708880] nouveau 0000:01:00.0: gsp:msg fn:4111 len:0x28/0x8 res:0x0 resp:0x0
17:57fdobridge_: <huntercz122> ```
17:57fdobridge_: <huntercz122> https://cdn.discordapp.com/attachments/1034184951790305330/1186729029613654026/steam-1091500.log?ex=65944e89&is=6581d989&hm=4055cef30a5199b4b20740fe8fb662ded077e798634e18a4d9979f8e29b641ef&
17:57fdobridge_: <karolherbst🐧🦀> yeah.. that's just the GPU crashing
17:57fdobridge_: <karolherbst🐧🦀> or rather..
17:57fdobridge_: <karolherbst🐧🦀> stuck?
17:58fdobridge_: <Sid> happens to me on any app using dxvk
17:58fdobridge_: <karolherbst🐧🦀> with prime that is, right?
17:58fdobridge_: <Sid> yes
17:59fdobridge_: <karolherbst🐧🦀> I wonder if kernel or userspace bug...
17:59fdobridge_: <Sid> hunter and I have the same cpu/gpus
17:59fdobridge_: <karolherbst🐧🦀> I see
17:59fdobridge_: <Sid> just different laptops
18:00fdobridge_: <Sid> while I can't tell if it's a kernel or userspace bug, I can say with full confidence it only happens if GSP is enabled
18:02fdobridge_: <karolherbst🐧🦀> yeah.. could be a GSP limitation.. I have a laptop with ada here and haven't actually tested displaying anything
18:02fdobridge_: <karolherbst🐧🦀> something for next year to look into, but maybe @airlied has time for it?
18:02fdobridge_: <butterflies> f_: NVIDIA ended up shipping D3D12 on Fermi towards the end for some reason
18:02fdobridge_: <karolherbst🐧🦀> wasn't there even this magical driver version with vulkan on fermi?
18:03fdobridge_: <butterflies> don't think anything was public there?
18:03fdobridge_: <karolherbst🐧🦀> I think it was
18:03fdobridge_: <butterflies> might or might not have existed I guess
18:03fdobridge_: <karolherbst🐧🦀> they just removed vulkan support pretty quickly
18:04fdobridge_: <butterflies> ok I see it on vulkan.gpuinfo
18:04fdobridge_: <butterflies> it was there indeed...
18:04fdobridge_: <karolherbst🐧🦀> yeah..
18:04fdobridge_: <karolherbst🐧🦀> it was just not compliant afaik
18:04fdobridge_: <!DodoNVK (she) 🇱🇹> nouveau iceberg when? 🐸
18:05f_: @butterflies oh lol
18:05f_: @karolherbst hm :P
18:05fdobridge_: <butterflies> ```"Had to register to post here, since I was the one that uploaded that entry.
18:06fdobridge_: <butterflies>
18:06fdobridge_: <butterflies> It’s not a compromised database, that entry is uploaded directly - unaltered - from the utility that they had for uploading them.
18:06fdobridge_: <butterflies>
18:06fdobridge_: <butterflies> Notice that my GTX 570 reports not a single image format supported, all the Vulkan applications also fail to initialize the display. Most likely Nvidia didn’t make a whitelist for enabling the Vulkan system.
18:06fdobridge_: <butterflies>
18:06fdobridge_: <butterflies> http://i.imgur.com/OB3X2jF.png 17"
18:06fdobridge_: <butterflies>
18:06fdobridge_: <butterflies> It’s not fake but it is non functional anyway on Fermi.
18:06fdobridge_: <butterflies>
18:06fdobridge_: <butterflies> ```
18:06fdobridge_: <karolherbst🐧🦀> yeah...
18:06f_: Yeah I was pretty sure Fermi didn't have Vulkan.
18:07fdobridge_: <butterflies> https://on-demand.gputechconf.com/gtc/2016/events/vulkanday/Vulkan_Overview.pdf
18:07fdobridge_: <butterflies> slide 55
18:07fdobridge_: <karolherbst🐧🦀> images work differently on fermi, but in theory it should more or less work
18:07fdobridge_: <karolherbst🐧🦀> you just won't ever hit compliance
18:07fdobridge_: <karolherbst🐧🦀> probably
18:07fdobridge_: <butterflies> https://cdn.discordapp.com/attachments/1034184951790305330/1186731617385971773/image.png?ex=659450f1&is=6581dbf1&hm=94680b1cc86f397b959abe2d06eb4464af7f8eb21d94977105f7dc76fcfdb2c9&
18:08f_: Not like quadro Fermi is supposed to be used for gaming anyway
18:10fdobridge_: <!DodoNVK (she) 🇱🇹> Did your system freeze?
18:11fdobridge_: <huntercz122> nope
18:11fdobridge_: <!DodoNVK (she) 🇱🇹> So I guess the PRIME freeze issue isn't relevant here
18:13fdobridge_: <huntercz122> without GSP it spits more stuff with stacktrace to dmesg, also it repeats the same stuff for some time until the game gives up and gives me crash report dialog
18:14fdobridge_: <!DodoNVK (she) 🇱🇹> What stuff?
18:15fdobridge_: <huntercz122> https://cdn.discordapp.com/attachments/1034184951790305330/1186733531813118132/message.txt?ex=659452ba&is=6581ddba&hm=19f65b670e79d912f4f0af53b55455678dd46f32d2b9d228e5277f28a125d0d8&
18:17fdobridge_: <huntercz122> i booted to 6.6 by accident oops
18:24fdobridge_: <huntercz122> on 6.7-rc6 without GSP
18:24fdobridge_: <huntercz122> ```
18:24fdobridge_: <huntercz122> [ 193.237597] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 0000003fddc00000 engine 0f [ce0] client 20 [HUB/HSCE0] reason 02 [PTE] on channel 2 [017fe41000 GameThread[1914]]
18:24fdobridge_: <huntercz122> [ 193.237608] nouveau 0000:01:00.0: fifo:000000:0002:[GameThread[1914]] rc scheduled
18:24fdobridge_: <huntercz122> [ 193.237610] nouveau 0000:01:00.0: fifo:000000: rc scheduled
18:24fdobridge_: <huntercz122> [ 193.237615] nouveau 0000:01:00.0: fifo:000000:0002:0002:[GameThread[1914]] errored - disabling channel
18:24fdobridge_: <huntercz122> [ 193.237620] nouveau 0000:01:00.0: GameThread[1914]: channel 2 killed!
18:24fdobridge_: <huntercz122> [ 193.237886] nouveau 0000:01:00.0: fifo: fault 01 [VIRT_WRITE] at 0000003fddc00000 engine 0f [ce0] client 20 [HUB/HSCE0] reason 02 [PTE] on channel 2 [017fe41000 GameThread[1914]]
18:24fdobridge_: <huntercz122> [ 193.237922] nouveau 0000:01:00.0: GameThread[1914]: error fencing pushbuf: -19
18:24fdobridge_: <huntercz122> [ 218.319599] nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
18:24fdobridge_: <huntercz122> ```
18:24fdobridge_: <!DodoNVK (she) 🇱🇹> That looks more interesting
18:28fdobridge_: <!DodoNVK (she) 🇱🇹> Can you set `NVK_DEBUG=zero_memory,push_sync` variable?
18:29fdobridge_: <huntercz122> oh it froze even without GSP
18:29fdobridge_: <huntercz122> oke
18:30fdobridge_: <!DodoNVK (she) 🇱🇹> I assume you know the Steam variable trick
18:34fdobridge_: <huntercz122> `Whoa! Cyberpunk 2077 has flatlined.`
18:34fdobridge_: <huntercz122> same crash
18:34fdobridge_: <huntercz122> even addresses are same
18:34fdobridge_: <!DodoNVK (she) 🇱🇹> Send the Proton log too
18:37fdobridge_: <huntercz122> https://cdn.discordapp.com/attachments/1034184951790305330/1186738976216862820/steam-1091500.log?ex=659457cc&is=6581e2cc&hm=2dcb8f80ab8308f245b05b39bb730fbde3d1e306c62dbd63325725314cced223&
18:39fdobridge_: <huntercz122> this happens on all vulkan games after Cyberpunk crashes, so unrelated ig
20:44fdobridge_: <esdrastarsis> It's a d3d12 game, so descriptors are missing I think
20:49fdobridge_: <gfxstrand> That's believable.
20:49fdobridge_: <gfxstrand> Did they ever publish a Vulkan-native version? They released it for Stadia so one exists.
20:50fdobridge_: <!DodoNVK (she) 🇱🇹> `188.285:0368:0390:warn:vkd3d-proton:d3d12_device_validate_shader_meta: Attempting to use 16-bit operations in shader c16d9cdff15a6e3c, but this is not supported.` 🤔
20:52fdobridge_: <esdrastarsis> No
20:52fdobridge_: <!DodoNVK (she) 🇱🇹> Here's where NVK just dies :triangle_nvk:
20:52fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1186773194535882872/message.txt?ex=659477aa&is=658202aa&hm=e40afc692b08231034bd3c5c8227f8b8116f13890a75c14ef3e3f1bc7e5c32a4&
20:54fdobridge_: <gfxstrand> Uh... That'll cause problems...
20:54fdobridge_: <gfxstrand> I mean, int16 is fine but fp16 is going to be a bad time.
20:54fdobridge_: <esdrastarsis> But Red Dead Redemption 2 has it, the game was released on Stadia and the Vulkan version is available on PC
20:55fdobridge_: <gfxstrand> Yeah, I think I knew about that one.
20:57fdobridge_: <!DodoNVK (she) 🇱🇹> This is what is needed :ferris:
20:57fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1186774256231657522/message.txt?ex=659478a7&is=658203a7&hm=89385f2d51ca6969f16d5232de0742625cb6e750a27d0c660074d5d0a51f3b2e&
20:59fdobridge_: <gfxstrand> Yeah, we don't have `shaderFloat16` yet
21:00fdobridge_: <gfxstrand> @marysaka is going to work on it at some point but I think she got lost in NIR trying to get that last mesh test passing. 😂
21:01fdobridge_: <marysaka> https://tenor.com/view/kaguya-love-is-war-nervous-tea-coffee-lover-gif-13510467
21:02fdobridge_: <marysaka> Talking about that how would you handle finding the "deepest" block and going back in reverse, do I need to calculate that manually or is there a fancy way :aki_thonk:
21:02fdobridge_: <!DodoNVK (she) 🇱🇹> Is FP16 harder to support than FP64?
21:02fdobridge_: <marysaka> (but yeah I kind of ordered my thoughts and logic today for that NIR pass)
21:03fdobridge_: <gfxstrand> Much, unfortunately.
21:03fdobridge_: <gfxstrand> I mean, it's not *that* bad but it's definitely more annoying.
21:04fdobridge_: <gfxstrand> Because the hardware doesn't have fp16, it has f16vec2 so everything is a vec2 everywhere.
21:04fdobridge_: <gfxstrand> NIR has the passes to handle it but everything gets kinda fiddly.
21:05fdobridge_: <gfxstrand> There's no fancy way.
21:05fdobridge_: <marysaka> That's what I thought oki
21:09fdobridge_: <karolherbst🐧🦀> at least there is `HMNMX2`
21:10fdobridge_: <karolherbst🐧🦀> well...
21:10fdobridge_: <karolherbst🐧🦀> ampere+ 🥲
21:10fdobridge_: <gfxstrand> And then there's the `DMNMX` which they deleted on Volta. 🙃
21:10fdobridge_: <karolherbst🐧🦀> yeah..
21:10fdobridge_: <karolherbst🐧🦀> they kinda like to remove/add `MNMX`
21:11fdobridge_: <gfxstrand> Well, yeah...
21:40fdobridge_: <dadschoorse> does nv have 16bit tex/image instructions?
21:41fdobridge_: <karolherbst🐧🦀> in which sense?
21:41fdobridge_: <karolherbst🐧🦀> the ISA only knows 32 bit registers
21:41fdobridge_: <dadschoorse> f16vec4 = texture(sampler, coord);
21:41fdobridge_: <karolherbst🐧🦀> ahh yeah
21:41fdobridge_: <karolherbst🐧🦀> it can do that
21:42fdobridge_: <karolherbst🐧🦀> it's packed as a u32vec2 tho
21:42fdobridge_: <dadschoorse> nice, another user for the first NIR pass that I wrote 🐸
21:42fdobridge_: <karolherbst🐧🦀> even supports two rounding modes: `.RN` and `.RZ`
21:43fdobridge_: <karolherbst🐧🦀> the hardware writes 0 to unused components
21:43fdobridge_: <karolherbst🐧🦀> (naturally)
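
A rough sketch of the "f16vec4 packed as a u32vec2" layout mentioned above, using Python's `struct` half-float format. The helper names are hypothetical, and placing `x` in the low 16 bits of the first word is an assumption made for illustration; the log does not say how the hardware orders the halves.

```python
import struct

def pack_f16vec4(x, y, z, w):
    """Pack four floats as IEEE half floats into two 32-bit words.

    Assumed layout (not confirmed by the log): word 0 holds x in its low
    16 bits and y in its high 16 bits, word 1 holds z and w the same way.
    """
    return struct.unpack("<II", struct.pack("<eeee", x, y, z, w))

def unpack_f16vec4(word0, word1):
    """Inverse of pack_f16vec4: split two 32-bit words back into four halves."""
    return struct.unpack("<eeee", struct.pack("<II", word0, word1))

# A two-component result leaves the unused components as zero, matching the
# "writes 0 to unused components" remark.
word0, word1 = pack_f16vec4(1.0, 0.5, 0.0, 0.0)
print(hex(word0), hex(word1))
```

This also shows why a 32-bit-only ISA can still carry fp16 data: each register simply holds two halves at once.
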
21:44fdobridge_: <dadschoorse> also nice, another backend that needs better NIR vectorization 🐸
21:44fdobridge_: <karolherbst🐧🦀> I think the problem with vectorization is, that you'll never reach the point where you don't need a better one 😄
21:45fdobridge_: <karolherbst🐧🦀> the other thing I'm wondering is, if we have a plan in regards to bf16...
21:45fdobridge_: <dadschoorse> yeah but the current one often never vectorizes anything for fully scalar input like DXIL
21:45fdobridge_: <dadschoorse> Do we need a plan?
21:46fdobridge_: <karolherbst🐧🦀> for CL I kinda do long-term I think...
21:46fdobridge_: <karolherbst🐧🦀> not sure if there is any extension yet
21:47fdobridge_: <dadschoorse> does anyone have HW for any bf16 except dot2acc?
21:47fdobridge_: <karolherbst🐧🦀> nvidia does
21:47fdobridge_: <karolherbst🐧🦀> all fp16 ops can also do it in bf16
21:48fdobridge_: <karolherbst🐧🦀> mhh..
21:48fdobridge_: <karolherbst🐧🦀> maybe except `HSETP`
21:48fdobridge_: <karolherbst🐧🦀> but `HFMA2` supports it
21:48fdobridge_: <gfxstrand> Intel does
21:48fdobridge_: <dadschoorse> ah, so amd is the exception
21:49fdobridge_: <karolherbst🐧🦀> though bf16 `HSETP` can easily be done with the `FP32` one :ferrisUpsideDown:
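
One way to read the remark about doing a bf16 compare with the FP32 one: bfloat16 is the top 16 bits of an IEEE fp32 value, so widening is a 16-bit shift and the ordinary fp32 comparison gives the right answer. The sketch below is an illustration under that reading, with hypothetical helper names; it says nothing about which instructions the hardware would actually use.

```python
import struct

def bf16_bits_to_f32(bits):
    """Widen a 16-bit bfloat16 pattern to the float it represents (exact)."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]

def f32_to_bf16_bits(value):
    """Truncate a float to bfloat16 by keeping the top 16 bits (no rounding)."""
    return struct.unpack("<I", struct.pack("<f", value))[0] >> 16

def bf16_less_than(a_bits, b_bits):
    """Compare two bfloat16 patterns by widening and using the fp32 compare."""
    return bf16_bits_to_f32(a_bits) < bf16_bits_to_f32(b_bits)

a = f32_to_bf16_bits(-2.0)
b = f32_to_bf16_bits(1.0)
print(hex(a), hex(b), bf16_less_than(a, b))
```
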
21:50fdobridge_: <karolherbst🐧🦀> all the conversion ops also can do `bf16`
21:50fdobridge_: <karolherbst🐧🦀> for weird reasons
21:50fdobridge_: <karolherbst🐧🦀> (as output)
21:50fdobridge_: <dadschoorse> can nvidia freely swizzle within the f16 vec2? and are neg/abs/sat per instruction or per component?
21:51fdobridge_: <karolherbst🐧🦀> it can't
21:51fdobridge_: <karolherbst🐧🦀> the ISA is 32 bit, period
21:51fdobridge_: <karolherbst🐧🦀> some instructions can select vector components on registers
21:51fdobridge_: <karolherbst🐧🦀> but that's the exception
21:51fdobridge_: <gfxstrand> Yeah, I thought some of them could say which half they source from.
21:51fdobridge_: <karolherbst🐧🦀> yeah
21:51fdobridge_: <dadschoorse> yeah that's what I meant
21:52fdobridge_: <karolherbst🐧🦀> well..
21:52fdobridge_: <karolherbst🐧🦀> only the half instructions can
21:52fdobridge_: <karolherbst🐧🦀> so.. 4?
21:52fdobridge_: <gfxstrand> Yeah...
21:52fdobridge_: <gfxstrand> It's pretty limited
21:52fdobridge_: <karolherbst🐧🦀> it's more a feature of the instruction than of the ISA
21:52fdobridge_: <gfxstrand> add/mul/fma
21:52fdobridge_: <karolherbst🐧🦀> which is one instruction 😄
21:52fdobridge_: <gfxstrand> hehe
21:53fdobridge_: <karolherbst🐧🦀> `F2F` can do something like that
21:53fdobridge_: <karolherbst🐧🦀> as well
21:53fdobridge_: <gfxstrand> Yeah, f2f has a half thing
21:53fdobridge_: <karolherbst🐧🦀> anyway.. the point was, it's an overstatement to say "freely"
21:53fdobridge_: <gfxstrand> But it doesn't mask destinations so vec2 f2f16 is still three instructions: 2 F2F and one PRMT to combine them.
21:53fdobridge_: <dadschoorse> so it's actually worse than RDNA3 for fp16 🐸
21:54fdobridge_: <karolherbst🐧🦀> there is `F2FP`
21:54fdobridge_: <gfxstrand> Ooh, nice. So that one's not as bad as I thought
21:54fdobridge_: <karolherbst🐧🦀> it still writes the full reg 😄
21:54fdobridge_: <karolherbst🐧🦀> but
21:55fdobridge_: <karolherbst🐧🦀> it has an additional "initial result value" input
21:55fdobridge_: <karolherbst🐧🦀> so you do a = f2fp b a
21:55fdobridge_: <dadschoorse> although amd decided to be annoying and gave us per component neg and no abs at all instead of per instruction neg/abs
21:55fdobridge_: <gfxstrand> funky
21:55fdobridge_: <karolherbst🐧🦀> and then say if you merge the result or not
21:55fdobridge_: <gfxstrand> Makes sense, though. And, honestly, that's nicer for SSA anyway.
21:55fdobridge_: <karolherbst🐧🦀> yeah
21:55fdobridge_: <gfxstrand> That's... a choice.
21:56fdobridge_: <karolherbst🐧🦀> pain
21:57fdobridge_: <gfxstrand> I mean, it may be nicer if your goal is to vectorize things.
21:57fdobridge_: <gfxstrand> Not that NIR knows what to do with that.
21:57fdobridge_: <karolherbst🐧🦀> let's see how that's done on nv..
21:57fdobridge_: <dadschoorse> especially since it's purely a microcode issue, the ALU supports abs just fine because scalar fp16 has abs and can be dual issued (if the ~~stars~~ register caches align, compared to vec2 fp16, which is always dual issued)
21:58fdobridge_: <karolherbst🐧🦀> yeah.. the mod applies to the entire vec2
21:58fdobridge_: <gfxstrand> That's what I would expect
21:58fdobridge_: <karolherbst🐧🦀> but you have all swizzle modes
21:58fdobridge_: <gfxstrand> Yup
21:58fdobridge_: <gfxstrand> So it all maps to NIR quite nicely
21:58fdobridge_: <karolherbst🐧🦀> well
21:59fdobridge_: <karolherbst🐧🦀> except one
21:59fdobridge_: <gfxstrand> ?
21:59fdobridge_: <karolherbst🐧🦀> reverse
21:59fdobridge_: <gfxstrand> WDYM?
21:59fdobridge_: <karolherbst🐧🦀> no `.yx`
21:59fdobridge_: <gfxstrand> 🤦🏻♀️
21:59fdobridge_: <dadschoorse> what do they do with the last bit combination?
21:59fdobridge_: <karolherbst🐧🦀> nothing apparently
21:59fdobridge_: <gfxstrand> That's gonna be pain...
22:00fdobridge_: <gfxstrand> Hopefully it doesn't come up often.
22:00fdobridge_: <karolherbst🐧🦀> I don't know the encoding, so it could be entirely cursed
22:00fdobridge_: <karolherbst🐧🦀> like.. a flag for all three sources together
22:00fdobridge_: <karolherbst🐧🦀> or something
22:00fdobridge_: <karolherbst🐧🦀> yeah..
22:03fdobridge_: <karolherbst🐧🦀> ohh.. an imm16v2 format exists, funky
22:03fdobridge_: <karolherbst🐧🦀> but kinda makes sense
22:04fdobridge_: <karolherbst🐧🦀> `F2FP` is cursed..
22:04fdobridge_: <karolherbst🐧🦀> it has some weird rounding magic going on
22:04fdobridge_: <karolherbst🐧🦀> stochastic rounding
22:05fdobridge_: <karolherbst🐧🦀> and you can specify a threshold
22:05fdobridge_: <dadschoorse> amd also has mixed precision fma, I guess that's another advantage over nv?
22:05fdobridge_: <karolherbst🐧🦀> mixed precision fma?
22:05fdobridge_: <dadschoorse> dest/every source can be either 32bit or 16bit
22:06fdobridge_: <karolherbst🐧🦀> ah yeah.. nvidia doesn't do that
22:06fdobridge_: <karolherbst🐧🦀> you have specific instructions for each width
22:06fdobridge_: <karolherbst🐧🦀> and there is no 16/8 bit alu besides fp16/bf16
22:07fdobridge_: <karolherbst🐧🦀> well..
22:07fdobridge_: <karolherbst🐧🦀> except for `IDP4`
22:07fdobridge_: <karolherbst🐧🦀> *`IDP4A`
22:08fdobridge_: <dadschoorse> amd has basically everything for int16, and even packed min/max/add/mul/mad.
22:09fdobridge_: <dadschoorse> though some three operand integer instructions are 32bit only
22:09fdobridge_: <karolherbst🐧🦀> I mean.. those things work the same on 16 bit alu 😄 (more or less)
22:09fdobridge_: <dadschoorse> like xor_add or lshl_add
22:09fdobridge_: <karolherbst🐧🦀> so you just use the 32 bit ops and are done with it
22:10fdobridge_: <karolherbst🐧🦀> I think nvidia just decided to keep their 32 bit only thing going, for simplicity reasons
22:10fdobridge_: <dadschoorse> amd's int16 mul is 8x as fast as full range int32
22:11fdobridge_: <dadschoorse> 2x as fast as int24 mul (yeah, that's a thing...)
22:11fdobridge_: <karolherbst🐧🦀> how fast is `IMAD` compared to MOV?
22:12fdobridge_: <dadschoorse> amd has no full precision imad, only int24/int16 mul with 32bit add
22:12fdobridge_: <karolherbst🐧🦀> heh
22:12fdobridge_: <karolherbst🐧🦀> okay
22:12fdobridge_: <karolherbst🐧🦀> how fast is `IMUL` compared to `MOV` then?
22:12fdobridge_: <dadschoorse> 1/4, and 1/8 on paper since rdna3 because dual issue memes
22:13fdobridge_: <karolherbst🐧🦀> okay..
22:13fdobridge_: <karolherbst🐧🦀> so nvidia's `IMAD` is as fast as AMD's int16 mul then?
22:13fdobridge_: <karolherbst🐧🦀> 🙂
22:13fdobridge_: <karolherbst🐧🦀> for context: nvidia uses `IMAD` instead of `MOV` very often because `MOV` is apparently annoying
22:14fdobridge_: <karolherbst🐧🦀> and they have more alu bits to speed things up
22:14fdobridge_: <karolherbst🐧🦀> so they often interleave `IMAD` and `MOV`
22:14fdobridge_: <dadschoorse> half as fast, unless it has 2 IMAD ALUs
22:14fdobridge_: <karolherbst🐧🦀> `IMAD` is fixed latency
22:14fdobridge_: <karolherbst🐧🦀> so it has enough alus to keep the shader going
22:15fdobridge_: <dadschoorse> doesn't nvidia have 2 issue ports for fp32 since ampere tho? and only one can be used for int?
22:15fdobridge_: <karolherbst🐧🦀> don't think so
22:16fdobridge_: <karolherbst🐧🦀> at least not looking at how they use `IMAD` for a lot of things
22:16fdobridge_: <karolherbst🐧🦀> but that part is also kinda magic
22:16fdobridge_: <dadschoorse> umad?
22:17fdobridge_: <karolherbst🐧🦀> umad?
22:17fdobridge_: <!DodoNVK (she) 🇱🇹> umad?
22:17fdobridge_: <esdrastarsis> umad?
22:17fdobridge_: <mohamexiety> IMAD
22:18fdobridge_: <!DodoNVK (she) 🇱🇹> YCBCRMAD