IRC Logs of #nouveau on irc.freenode.net for 2023-12-24

00:01 fdobridge_: <redsheep> Looks like I will have to wait to try to figure out juggling the drivers later, the mirrors for the extra/nvidia arch package are all returning 404
00:33 fdobridge_: <redsheep> Nevermind, I just had to fix my mirrors to install it, and go back to Linux 6.6 to boot it. Testing zink on it now
01:18 fdobridge_: <redsheep> Stuck again, what is the trick to getting zink to load on top of the nvidia driver? This still appears to be loading the prop driver, and not zink:
01:18 fdobridge_: <redsheep>
01:18 fdobridge_: <redsheep> ```VK_DRIVER_FILES=/usr/share/vulkan/icd.d/nvidia_icd.json MESA_LOADER_DRIVER_OVERRIDE=zink prismlauncher```
01:30 fdobridge_: <redsheep> At least that launches, but if I load with:
01:30 fdobridge_: <redsheep>
01:30 fdobridge_: <redsheep> ```VK_DRIVER_FILES=/usr/share/vulkan/icd.d/nvidia_icd.json __GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink GALLIUM_DRIVER=zink prismlauncher```
01:30 fdobridge_: <redsheep>
01:30 fdobridge_: <redsheep> Then that crashes with ```ERROR: ICD associated with VkPhysicalDevice does not support GetPhysicalDeviceCalibrateableTimeDomainsKHR```
01:30 fdobridge_: <samantas5855> I think I did
01:30 fdobridge_: <samantas5855> __GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink GALLIUM_DRIVER=zink
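A minimal Python sketch of the environment used in the commands above, for launching from a script instead of a shell. `zink_env` is a hypothetical helper. It only reproduces the variables quoted in this log and does nothing about the `GetPhysicalDeviceCalibrateableTimeDomainsKHR` error.

```python
import os

def zink_env(base=None, icd_json=None):
    """Return a copy of `base` (default: the current environment) with the
    zink-related overrides quoted in this log applied."""
    env = dict(os.environ if base is None else base)
    env.update({
        "__GLX_VENDOR_LIBRARY_NAME": "mesa",    # libglvnd: use Mesa's GLX vendor library, not NVIDIA's
        "MESA_LOADER_DRIVER_OVERRIDE": "zink",  # Mesa loader: load zink instead of the autodetected driver
        "GALLIUM_DRIVER": "zink",               # Gallium driver selection, where Mesa consults it
    })
    if icd_json is not None:
        env["VK_DRIVER_FILES"] = icd_json       # Vulkan loader: only use this ICD manifest
    return env
```

Usage, mirroring the second command: `subprocess.run(["prismlauncher"], env=zink_env(icd_json="/usr/share/vulkan/icd.d/nvidia_icd.json"))`.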
01:32 fdobridge_: <redsheep> If I skip the ICD path to match yours then I get:
01:32 fdobridge_: <redsheep>
01:32 fdobridge_: <redsheep> ```DRM kernel driver 'nvidia-drm' in use. NVK requires nouveau.
01:32 fdobridge_: <redsheep> ERROR: ICD associated with VkPhysicalDevice does not support GetPhysicalDeviceCalibrateableTimeDomainsKHR```
01:33 fdobridge_: <redsheep> It doesn't mention NVK when* specifying the ICD path (edited)
07:49 fdobridge_: <Sid> hm, so it's a bug in GSP DRM handling overall
07:50 fdobridge_: <Sid> and not a gen/setup specific one (edited)
07:50 fdobridge_: <Sid> interesting
07:50 fdobridge_: <Sid> (referring to DEVICE_LOST)
07:59 fdobridge_: <Sid> ok, I have a question I'm hoping someone can answer for me:
07:59 fdobridge_: <Sid> when NVK calls drmCommandWriteRead() in its push_submit, what exactly does the drmCommandIndex refer to and how does the kernel interpret and handle it?
09:59 fdobridge_: <redsheep> It's possible there are two different device lost bugs which is why I was trying to test if zink could blow up the prop driver like Karol suggested, but given I played so long error-free prior to the kernel patches I do think they're likely the reason I got that error.
09:59 fdobridge_: <Sid> ..oh?
09:59 fdobridge_: <Sid> in that case I think I know which patch introduces the error
10:01 fdobridge_: <Sid> give me ~30 mins to verify
10:01 fdobridge_: <Sid> in the meanwhile
10:02 fdobridge_: <Sid> could you try building the patchset without `0010-nouveau-push-event-block-allowing-out-of-the-fence-context.patch` as well? since I need that commit to not run into system-wide hangs
10:02 fdobridge_: <Sid> I have a feeling that's where the device lost error is being introduced
10:03 fdobridge_: <Sid> I'll try building current rc with *only* that commit on top, you could try everything except that commit
10:03 fdobridge_: <redsheep> Sounds good
10:03 fdobridge_: <Sid> if I run into device lost and you don't, we have our culprit :D
10:08 fdobridge_: <redsheep> Before I can effectively test that I need to figure out disabling nvidia's driver so I can properly boot the right kernel without uninstalling and reinstalling it all the time
10:10 fdobridge_: <Sid> do you use systemd-boot?
10:10 fdobridge_: <redsheep> Yes
10:10 fdobridge_: <!DodoNVK (she) 🇱🇹> I have this in my kernel options: `module_blacklist=nvidia,nvidia-modeset,nvidia-drm,nvidia-uvm` (edited)
10:10 fdobridge_: <Sid> ```
10:10 fdobridge_: <Sid> [sidpr@strogg entries]$ pwd
10:10 fdobridge_: <Sid> /boot/loader/entries
10:10 fdobridge_: <Sid> [sidpr@strogg entries]$ cat caconv.conf
10:11 fdobridge_: <Sid> title Arch Linux
10:11 fdobridge_: <Sid> linux /vmlinuz-linux-caco
10:11 fdobridge_: <Sid> initrd /intel-ucode.img
10:11 fdobridge_: <Sid> initrd /initramfs-linux-caco.img
10:11 fdobridge_: <Sid> options rw rootfstype=bcachefs root=/dev/nvme0n1:/dev/nvme1n1p3:/dev/sda i915.enable_fbc=1 i915.lvds_downclock=1 i915.enable_guc=2 i915.enable_psr=1 pcie_aspm=force pcie_aspm.policy=performance drm.vblankoffdelay=1 ahci.mobile_lpm_policy=1 module_blacklist=nouveau
10:11 fdobridge_: <Sid> [sidpr@strogg entries]$ cat cacogsp.conf
10:11 fdobridge_: <Sid> title Arch Linux (With GSP)
10:11 fdobridge_: <Sid> linux /vmlinuz-linux-caco
10:11 fdobridge_: <Sid> initrd /intel-ucode.img
10:11 fdobridge_: <Sid> initrd /initramfs-linux-caco.img
10:11 fdobridge_: <Sid> options rw rootfstype=bcachefs root=/dev/nvme0n1:/dev/nvme1n1p3:/dev/sda i915.enable_fbc=1 i915.lvds_downclock=1 i915.enable_guc=2 i915.enable_psr=1 pcie_aspm=force pcie_aspm.policy=performance drm.vblankoffdelay=1 ahci.mobile_lpm_policy=1 module_blacklist=nvidia nouveau.config=NvGspRm=1 intel_idle.max_cstate=1
10:11 fdobridge_: <Sid> ```
10:11 fdobridge_: <Sid> is what's working for me on a prime setup
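A small Python sketch that renders systemd-boot entries in the same key/value format as the two files above. The helper names are hypothetical, and `gsp_variant` only applies the blacklist/GSP differences between the two entries (the machine-specific `intel_idle.max_cstate=1` is left out). Writing the result to `/boot/loader/entries/<name>.conf` is left to you.

```python
def render_loader_entry(title, linux, initrds, options):
    """Render a systemd-boot entry as text: one 'key value' pair per line."""
    lines = [f"title {title}", f"linux {linux}"]
    lines += [f"initrd {img}" for img in initrds]  # initrd may be repeated (microcode first)
    lines.append("options " + " ".join(options))
    return "\n".join(lines) + "\n"

def gsp_variant(options):
    """Swap 'blacklist nouveau' for 'blacklist nvidia' and enable GSP firmware,
    as in the cacogsp.conf entry above."""
    out = [o for o in options if o != "module_blacklist=nouveau"]
    return out + ["module_blacklist=nvidia", "nouveau.config=NvGspRm=1"]
```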
10:11 fdobridge_: <!DodoNVK (she) 🇱🇹> That's a lot of i915 stuff 🟦
10:11 fdobridge_: <Sid> you only need to blacklist nvidia, since loading nvidia loads everything else
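To illustrate why blacklisting `nvidia` alone is enough: assuming the usual dependency chain (`nvidia-modeset` and `nvidia-uvm` need `nvidia`, `nvidia-drm` needs `nvidia-modeset`), every other NVIDIA module is blocked once `nvidia` cannot load. The dependency map below is an assumption for illustration, not read from `modinfo`. The helper names are hypothetical.

```python
def blacklist_from_cmdline(cmdline):
    """Module names from the module_blacklist= kernel parameter (comma-separated).
    If the parameter is repeated, this sketch assumes the last value wins."""
    names = set()
    for token in cmdline.split():
        if token.startswith("module_blacklist="):
            names = {n for n in token.split("=", 1)[1].split(",") if n}
    return names

# Assumed dependency map for illustration (module -> modules it needs).
NVIDIA_DEPS = {
    "nvidia": set(),
    "nvidia-modeset": {"nvidia"},
    "nvidia-drm": {"nvidia-modeset"},
    "nvidia-uvm": {"nvidia"},
}

def blocked_modules(blacklist, deps):
    """Blacklisted modules plus everything that (transitively) depends on one."""
    blocked = set(blacklist)
    changed = True
    while changed:
        changed = False
        for mod, needs in deps.items():
            if mod not in blocked and needs & blocked:
                blocked.add(mod)
                changed = True
    return blocked
```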
10:11 fdobridge_: <Sid> power saving memes :frog_gears:
10:12 fdobridge_: <redsheep> If I don't have those directories I assume I need to create them to get to /boot/loader/entries/?
10:12 fdobridge_: <Sid> /boot is your ESP partition
10:13 fdobridge_: <!DodoNVK (she) 🇱🇹> I also had to create a fake nvidia-utils.conf file in /etc/modprobe.d
10:13 fdobridge_: <Sid> yeah, you'll have to check the system to see if nouveau is being blacklisted anywhere and disable that
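A Python sketch of the check Sid describes, assuming config is read from the usual `modprobe.d` directories with the precedence listed in the code (verify against `modprobe.d(5)` for your kmod version). It also models why a fake `/etc/modprobe.d/nvidia-utils.conf` helps: a file in `/etc` shadows a same-named file in `/usr/lib`, presumably where the nvidia-utils package puts its `blacklist nouveau` line. Function names are hypothetical, and file contents are passed in as strings.

```python
from pathlib import PurePosixPath

# Assumed precedence, highest first: a file in an earlier directory shadows
# a same-named file in a later one.
MODPROBE_DIRS = ["/etc/modprobe.d", "/run/modprobe.d", "/usr/local/lib/modprobe.d",
                 "/usr/lib/modprobe.d", "/lib/modprobe.d"]

def effective_files(paths):
    """For each file name, keep only the copy from the highest-precedence directory."""
    chosen = {}
    ordered = sorted(paths, key=lambda p: MODPROBE_DIRS.index(str(PurePosixPath(p).parent)))
    for path in ordered:
        chosen.setdefault(PurePosixPath(path).name, path)
    return chosen

def blacklist_sources(files, module="nouveau"):
    """Effective config paths containing 'blacklist <module>'. `files` maps path -> text."""
    hits = []
    for _name, path in sorted(effective_files(files).items()):
        for line in files[path].splitlines():
            words = line.split("#", 1)[0].split()  # drop comments, then tokenize
            if words[:2] == ["blacklist", module]:
                hits.append(path)
                break
    return hits
```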
10:14 fdobridge_: <Sid> ..or actually
10:15 fdobridge_: <Sid> if you use X11 you should be able to just use optimus-manager
10:15 fdobridge_: <Sid> no need to reboot for switching then, you'll just have to log out and back into your session again
10:16 fdobridge_: <redsheep> Does that work even if the igpu is disabled and/or amd?
10:16 fdobridge_: <Sid> https://github.com/Askannz/optimus-manager/wiki/A-guide--to-power-management-options#configuration-2--pci-power-control
10:16 fdobridge_: <Sid> it should? I'm not sure
10:16 fdobridge_: <Sid> it doesn't unload the nvidia module consistently for me is all I can tell you
10:16 fdobridge_: <redsheep> Optimus is the branding for nvidia+intel so 🤷
10:17 fdobridge_: <redsheep> I'll try it
10:17 fdobridge_: <Sid> readme says it works with nvidia+amd as well
10:19 fdobridge_: <redsheep> ... I'm reviewing the readme and I am questioning my life choices, I am just going to uninstall the nvidia driver lol
10:19 fdobridge_: <Sid> heh
10:19 fdobridge_: <Sid> I feel like multiple boot options is the best way to do it
10:20 fdobridge_: <Sid> I'm not entirely sure what the boot options will look like if you use UKI though
10:20 fdobridge_: <Sid> or the config for it, rather
10:20 fdobridge_: <redsheep> That's valid but my linux install basically exists just to test nvk
10:20 fdobridge_: <Sid> :o
10:21 fdobridge_: <Sid> this machine is my daily driver 😅
10:21 fdobridge_: <redsheep> Honestly, I am really not very familiar with linux as a whole, I am just here in hopes that some day I will want to use it. I see the end times coming for me using windows
10:22 fdobridge_: <Sid> that's fair, I get that
10:22 fdobridge_: <Sid> I too have a fair few friends unhappy with the direction windows is taking, planning on jumping ship when win10 goes EOL
10:26 fdobridge_: <redsheep> Yeah I did daily drive linux for 8 months and the nvidia driver situation was the final straw that drove me away at the time, it just broke everything.
10:28 fdobridge_: <Sid> I've been using linux for the past 3 years on this laptop and haven't really had too many driver-related troubles, but I suppose that's because of the iGPU doing the heavy lifting
10:28 fdobridge_: <Sid> I don't even have a mux switch 😅
10:30 fdobridge_: <redsheep> My personal system exists for tweaking and overclocking so I always have really recent hardware and wacky configs. When testing wayland especially it was blowing everything up. Anyway, I am building the kernel now with patch 10 removed
10:39 fdobridge_: <redsheep> ...Something blew up my session mid-build. Seems to be an Nvidia driver bug. Glad to be deleting it.
10:40 fdobridge_: <Sid> :(
10:45 fdobridge_: <redsheep> Oh it's building against rc7 now
10:47 fdobridge_: <Sid> that shouldn't be a problem, there haven't been any nouveau related commits in the rc
10:53 fdobridge_: <Sid> device_lost achieved 💪
10:54 fdobridge_: <redsheep> I haven't had enough time yet to be sure it won't happen, but I have been rapidly saving and exiting over and over to try to break it the same way it did for me earlier, and I've had no issue
10:54 fdobridge_: <Sid> you could try running my api trace
10:55 fdobridge_: <Sid> or, well, replay them (edited)
10:55 fdobridge_: <redsheep> Yeah trying to find your old message with that now
10:56 fdobridge_: <Sid> http://cloud.sidonthe.net/d/492adfc31dde49ffbd86/
10:56 fdobridge_: <redsheep> Here it was
10:56 fdobridge_: <Sid> go for no-push-sync one
10:57 fdobridge_: <Sid> install the apitrace package, then run `apitrace replay /path/to/downloaded/trace`
10:57 fdobridge_: <Sid> might have to do some wine shenanigans to get it to run iirc
10:58 fdobridge_: <redsheep> Are you able to download back your own trace? I am hitting a security error
10:58 fdobridge_: <Sid> @asdqueerfromeu knows how to replay traces better than I do 🐸
10:58 fdobridge_: <Sid> ...ah
10:58 fdobridge_: <Sid> hang on, gimme 5 mins
10:58 fdobridge_: <!DodoNVK (she) 🇱🇹> https://cdn.discordapp.com/attachments/1034184951790305330/1188435634642235513/image.png?ex=659a83ef&is=65880eef&hm=d760b9e3b9f26b446aceb4ad79302ec34dfdcbde9984094a2654a35e079ffdab&
10:59 fdobridge_: <redsheep> Yeah firefox won't even let me bypass it
11:01 fdobridge_: <Sid> am on it
11:01 fdobridge_: <redsheep> Should I download bottles and run the windows version of apitrace, or what? I assume this is a dx11 trace?
11:01 fdobridge_: <Sid> domain name change shenanigans
11:04 fdobridge_: <Sid> there we go, dl fixed
11:04 fdobridge_: <Sid> uh, hm... @asdqueerfromeu?
11:05 fdobridge_: <!DodoNVK (she) 🇱🇹> How did you make the trace?
11:05 fdobridge_: <redsheep> Their support matrix says linux is supported but doesn't clearly define that dx11 in particular is supported on linux
11:05 fdobridge_: <Sid> followed dxvk wiki guide
11:05 fdobridge_: <Sid> but didn't disable dxvk
11:06 fdobridge_: <!DodoNVK (she) 🇱🇹> That means it's a Direct3D trace
11:06 fdobridge_: <Sid> mhm
11:06 fdobridge_: <Sid> I'm guessing you need a wine prefix with d3dretrace.exe and dxvk installed
11:07 fdobridge_: <!DodoNVK (she) 🇱🇹> So the person definitely needs the Windows version of apitrace
11:07 fdobridge_: <!DodoNVK (she) 🇱🇹> With DXVK
11:07 fdobridge_: <Sid> ah
11:07 fdobridge_: <Sid> https://github.com/apitrace/apitrace
11:08 fdobridge_: <Sid> 64 bit application was traced
11:09 fdobridge_: <!DodoNVK (she) 🇱🇹> 64-bit apitrace version should always work for replaying
11:09 fdobridge_: <Sid> from the readme: `On 64bits Linux and Windows platforms you'll need apitrace binaries that match the architecture (32bits or 64bits) of the application being traced.`
11:10 fdobridge_: <Sid> oh, wait
11:10 fdobridge_: <Sid> me dumb
11:10 fdobridge_: <Sid> right, makes sense
11:11 fdobridge_: <Sid> got win-apitrace going with lutris
11:12 fdobridge_: <Sid> https://cdn.discordapp.com/attachments/1034184951790305330/1188439000176926751/image.png?ex=659a8711&is=65881211&hm=dde3d17056b7bd0ff77171b4f51e7fc96101b73b58446cd6af00dbf0a47607da&
11:12 fdobridge_: <redsheep> I am just building a bottle right now with dxvk to launch the 64 bit apitrace
11:12 fdobridge_: <Sid> fair :)
11:37 fdobridge_: <redsheep> I like bottles a whole lot less than I did an hour ago. Am I reading the logs right where it seems to suggest that you need the executable for the application you're replaying?
11:37 fdobridge_: <Sid> nope
11:37 fdobridge_: <Sid> only need the trace
11:38 fdobridge_: <Sid> iirc
11:39 fdobridge_: <redsheep> That's what I thought, I was thrown by it listing the directory containing the executable, but I see now the listed directory is from your computer, not mine
11:39 fdobridge_: <Sid> ah
11:42 fdobridge_: <Sid> let me try running the trace on the proprietary driver to see what happens in the meanwhile
11:52 fdobridge_: <redsheep> I decided bottles is the worst and replicated what you did with lutris instead, so far though that retrace shows a black screen
12:04 fdobridge_: <redsheep> Is that what you expected @tiredchiku? I get the feeling we have not yet gotten to the bottom of this, while lutris doesn't show device lost anywhere in terminal output it does seem something bad is happening in dmesg each time I replay that trace...
12:04 fdobridge_: <redsheep>
12:04 fdobridge_: <redsheep> ```[ 3774.323994] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:120 type:13 scope:1 part:233
12:04 fdobridge_: <redsheep> [ 3774.324000] nouveau 0000:01:00.0: fifo:c00000:000f:0078:[d3dretrace.exe[19294]] errored - disabling channel
12:04 fdobridge_: <redsheep> [ 3774.324005] nouveau 0000:01:00.0: d3dretrace.exe[19294]: channel 120 killed!
12:04 fdobridge_: <redsheep> [ 3859.181029] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:120 type:13 scope:1 part:233
12:04 fdobridge_: <redsheep> [ 3859.181036] nouveau 0000:01:00.0: fifo:c00000:000f:0078:[d3dretrace.exe[20191]] errored - disabling channel
12:04 fdobridge_: <redsheep> [ 3859.181042] nouveau 0000:01:00.0: d3dretrace.exe[20191]: channel 120 killed!
12:04 fdobridge_: <redsheep> [ 4154.290036] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:88 type:13 scope:1 part:233
12:04 fdobridge_: <redsheep> [ 4154.290042] nouveau 0000:01:00.0: fifo:c00000:000b:0058:[d3dretrace.exe[21059]] errored - disabling channel
12:04 fdobridge_: <redsheep> [ 4154.290049] nouveau 0000:01:00.0: d3dretrace.exe[21059]: channel 88 killed!```
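To compare runs like these, a small Python sketch that pulls the killed-channel events and the GSP `rc` reports out of `dmesg` text. The regexes are written only against the line shapes quoted above and may need adjusting for other kernel versions. The function names are hypothetical.

```python
import re

# e.g. "[ 3774.324005] nouveau 0000:01:00.0: d3dretrace.exe[19294]: channel 120 killed!"
KILLED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+(?P<pci>\S+):\s+"
    r"(?P<proc>[^\s\[]+)\[(?P<pid>\d+)\]:\s+channel\s+(?P<chid>\d+)\s+killed!"
)
# e.g. "[ 3774.323994] nouveau 0000:01:00.0: gsp: rc engn:00000001 chid:120 type:13 ..."
RC_RE = re.compile(
    r"gsp: rc engn:(?P<engn>[0-9a-f]+) chid:(?P<chid>\d+) type:(?P<type>\d+)"
)

def parse_killed_channels(dmesg_text):
    """Return (timestamp, process, pid, chid) for every 'channel N killed!' line."""
    events = []
    for line in dmesg_text.splitlines():
        m = KILLED_RE.search(line)
        if m:
            events.append((float(m["ts"]), m["proc"], int(m["pid"]), int(m["chid"])))
    return events

def rc_types_by_chid(dmesg_text):
    """Map channel id -> set of 'type' codes from the GSP rc reports for it."""
    out = {}
    for line in dmesg_text.splitlines():
        m = RC_RE.search(line)
        if m:
            out.setdefault(int(m["chid"]), set()).add(int(m["type"]))
    return out
```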
12:06 fdobridge_: <Sid> makes sense, since when I captured the trace the game never went past a black screen
12:07 fdobridge_: <Sid> I can capture a trace again on the proprietary driver with the game loading until the main menu and we could try running that
12:07 fdobridge_: <redsheep> That would probably be better, I am building an unpatched rc7 kernel now to see if that stops the dmesg errors
12:08 fdobridge_: <Sid> on it :)
12:12 fdobridge_: <Sid> uploading
12:22 fdobridge_: <redsheep> Ok so that trace yields channel killed even without any of the 11 patches, so maybe that is just the expected result for a broken trace?
12:33 fdobridge_: <redsheep> I've tried downloading the new trace 4 times now, it's like the download times out. Maybe upload it in parts? I will have to test all this more later
12:45 fdobridge_: <Sid> can't split it into parts, but I'll try compressing it
12:49 fdobridge_: <Sid> hm, that didn't do much
12:51 fdobridge_: <Sid> @redsheep here: https://drive.google.com/file/d/1b4QMaQo2nAXIfPcZur98NfuRndlS_zt5/view (edited)
13:52 fdobridge_: <Sid> framerate is scuffed on the replay but it does work on proprietary drivers
13:59 fdobridge_: <Sid> and on nvk it hits a device lost
14:00 fdobridge_: <Sid> I'll try the other scenario in an nvidia-only environment in a bit
14:53 fdobridge_: <Sid> ok, so it's not a patchset regression because I had a device lost in nv-only env even on unpatched 6.7-rc6
15:25 fdobridge_: <!DodoNVK (she) 🇱🇹> So the patchset didn't introduce DEVICE_LOST?
15:25 fdobridge_: <!DodoNVK (she) 🇱🇹> I wonder if there's some NVK regression
16:26 fdobridge_: <Sid> nope, wasn't the patchset
18:38 fdobridge_: <Sid> if I have time tomorrow I'll try an older version of mesa and bisect it
18:38 fdobridge_: <Sid> how far back should I check from though
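On the question of how far back to start: bisection only needs one known-good and one known-bad point, and each test halves the remaining range, so a span of about 1000 commits takes around 10 builds. A minimal sketch of the search, assuming the commits are ordered oldest to newest and that the bug, once introduced, stays present. `git bisect` automates the same idea. `first_bad` is a hypothetical helper.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits are ordered oldest -> newest, commits[0] is known good,
    commits[-1] is known bad, and badness is monotonic (once bad, stays bad).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):  # one build + test run per iteration
            hi = mid
        else:
            lo = mid
    return commits[hi]
```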
20:56 fdobridge_: <redsheep> I wonder if we're looking at multiple different bugs. I still put a ton of time into zink+NVK before the patch set and only crashed after applying them, but what you're facing sounds more like device lost was just being prevented by something else breaking first