01:00 awilcox: using https://foxkit.us/linux/endian-gfx-v1.patch I was able to get as far as https://bpa.st/6EDCI, which makes me think that it's possible the patching system may have some issues. but what I really think would be cleaner would be to make the nvfw_* struct cast functions somehow return structs that are byte-swapped instead of doing the byte-swapping at every access (it would also
01:00 awilcox: be far less maintenance burden)
06:05 awilcox: I've just had a different idea that will require even less maintenance burden - zero
06:38 awilcox: okay, with https://bpa.st/WYUMM applied to nouveau.ko, and running https://bpa.st/TDVMY on all the firmware binaries except *_inst.bin (which have odd sizes and so I'm not sure of the format) and putting the results in /lib/firmware/nvidia_be, I ended up with https://bpa.st/KVLUU
06:45 awilcox: is there any documentation on the format of the fecs/gpccs firmware binaries? I'm assuming that since the card is in BE MMU mode, it's doing unconditional swapping, so it probably needs to be swapped as well (which is why it is timing out), but the files are both of odd sizes so I don't know the stride to use to swap them.
06:48 awilcox: hmm. maybe if I change the MMU to LE when it is uploading the _inst fw and change it right back..
06:53 awilcox: that doesn't seem immediately possible without a lot of bad special-casing in nvkm_acr_lsfw_load_bl_sig_net so what about another idea - this has padding up to 256 bytes, so I could just add 00s to the end of the files that aren't mod4 and word-swap them like the others
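A minimal sketch of the pad-then-swap idea from the message above, assuming "word-swap" means reversing the byte order inside each 32-bit word. The helper name `pad_and_swap32` is hypothetical, this is not the script at https://bpa.st/TDVMY, and whether the hardware actually wants the `_inst` images swapped this way is still an open question in the discussion.

```python
import struct


def pad_and_swap32(blob: bytes) -> bytes:
    """Zero-pad to a multiple of 4 bytes, then byte-swap every 32-bit word.

    Assumption: the firmware image is a flat stream of 32-bit words and
    trailing zero bytes are harmless (the upload is padded anyway).
    """
    # Append 0x00 bytes until the length is divisible by 4.
    padded = blob + b"\x00" * (-len(blob) % 4)
    count = len(padded) // 4
    # Read the words as little-endian and write them back as big-endian,
    # which reverses the four bytes of every word.
    words = struct.unpack(f"<{count}I", padded)
    return struct.pack(f">{count}I", *words)


if __name__ == "__main__":
    sample = bytes([0x01, 0x02, 0x03, 0x04, 0x05])  # odd-sized input
    print(pad_and_swap32(sample).hex(" "))
```

Applying the function twice to an input whose length is already a multiple of 4 returns the original bytes, which gives a cheap sanity check on converted files.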
07:14 airlied: awilcox: not sure the card has a BE MMU mode
07:15 awilcox: it does, or at least it responds like it does; PMC_BOOT_1 is set to 0x01000001, and the card info is read correctly. in contrast, an RTX 3060 does not allow PMC_BOOT_1 to be set, and ignoring that gives very corrupted card info
07:16 awilcox: this is a GT 1030 "Pascal" GP108/NV138
09:06 airlied[d]: Maybe they kept the swappers in until volta then, no idea what they apply to
09:28 awilcox: I'm quite motivated to find out, and hope to be able to upstream any work that I can polish up and make actually work in a way that is easy to maintain and does not impact usage or performance on LE systems - but any pointers offered would be deeply appreciated, as right now I'm just shooting in the dark based on my past experiences with other hardware (non-GPU)
09:46 airlied[d]: So the fw is loaded via pio ops no idea if it needs to be swapped on write there
09:46 airlied[d]: Don't think Pascal does DMA fw loads
11:48 fabcal: may I ask within this very channel for some technical support in relation to the Xwayland/nouveau driver?
11:49 DodoGTA: fabcal: I think so
11:52 fabcal: DodoGTA: I am running Debian 12 (bookworm) as the VirtualBox host. While running my configured VM, each VM crashes every time for the following reason: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: Xwayland: ../nouveau/pushbuf.c:730: nouveau_pushbuf_data: Assertion `kref' failed.
11:52 fabcal: what am I doing wrong?
11:53 DodoGTA: fabcal: That assertion means you should check dmesg
11:54 fabcal: DodoGTA: dmesg reports: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 0: /usr/bin/Xwayland (0x55b3a64e1000+0x178fe4) [0x55b3a6659fe4]
11:54 fabcal: I did get some of those error messages
11:57 fabcal: if I start my X Window session using Gnome Xorg (X11) (instead of Xwayland) the VirtualBox VM runs completely fine: no crashes at all
11:57 DodoGTA: fabcal: I mean you should run `sudo dmesg`
11:59 fabcal: DodoGTA: I did indeed: the error messages above relate to the command "sudo dmesg"
11:59 DodoGTA: fabcal: I only see one message though
12:00 fabcal: here you go:
12:00 fabcal: Oct 29 17:29:11 lenovo-p3-debian firefox-esr[5636]: gdk_wayland_window_set_dbus_properties_libgtk_only: assertion 'GDK_IS_WAYLAND_WINDOW (window)' failed
12:00 fabcal: Oct 29 17:29:39 lenovo-p3-debian firefox-esr.desktop[5636]: [Parent 5636, Main Thread] WARNING: gdk_wayland_window_set_dbus_properties_libgtk_only: assertion 'GDK_IS_WAYLAND_WINDOW (window)' failed: 'glib warning', file ./toolkit/xre/nsSigHandlers.cpp:187
12:00 fabcal: Oct 29 17:29:39 lenovo-p3-debian firefox-esr[5636]: gdk_wayland_window_set_dbus_properties_libgtk_only: assertion 'GDK_IS_WAYLAND_WINDOW (window)' failed
12:00 fabcal: Oct 29 17:33:29 lenovo-p3-debian kernel: nouveau 0000:01:00.0: gr: TRAP ch 2 [01ff8f4000 Xwayland[5313]]
12:00 fabcal: Oct 29 17:33:29 lenovo-p3-debian kernel: nouveau 0000:01:00.0: fifo: fault 00 [VIRT_READ] at 0000000000561000 engine 40 [gr] client 02 [GPC1/T1_2] reason 02 [PTE] on channel 2 [01ff8f4000 Xwayland[5313]]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian kernel: nouveau 0000:01:00.0: Xwayland[5313]: channel 2 killed!
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: Xwayland: ../nouveau/pushbuf.c:730: nouveau_pushbuf_data: Assertion `kref' failed.
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 0: /usr/bin/Xwayland (0x55b3a64e1000+0x178fe4) [0x55b3a6659fe4]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 1: /usr/bin/Xwayland (0x55b3a64e1000+0x17c9d9) [0x55b3a665d9d9]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 18: /usr/bin/Xwayland (0x55b3a64e1000+0x5ce02) [0x55b3a653de02]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 19: /usr/bin/Xwayland (0x55b3a64e1000+0x5d05b) [0x55b3a653e05b]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 20: /usr/bin/Xwayland (0x55b3a64e1000+0x4f6ff) [0x55b3a65306ff]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 21: /usr/bin/Xwayland (0x55b3a64e1000+0xfbc6d) [0x55b3a65dcc6d]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 22: /usr/bin/Xwayland (0x55b3a64e1000+0x115773) [0x55b3a65f6773]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 23: /usr/bin/Xwayland (0x55b3a64e1000+0xa9af4) [0x55b3a658aaf4]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 24: /usr/bin/Xwayland (0x55b3a64e1000+0xada8c) [0x55b3a658ea8c]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[5313]: (EE) 27: /usr/bin/Xwayland (0x55b3a64e1000+0x33fe1) [0x55b3a6514fe1]
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[2903]: Connection to xwayland lost
12:01 fabcal: Oct 29 17:33:29 lenovo-p3-debian gnome-shell[2903]: X Wayland crashed; attempting to recover
12:01 fabcal: Oct 29 17:39:20 lenovo-p3-debian firefox-esr.desktop[5636]: [Parent 5636, Main Thread] WARNING: gdk_wayland_window_set_dbus_properties_libgtk_only: assertion 'GDK_IS_WAYLAND_WINDOW (window)' failed: 'glib warning', file ./toolkit/xre/nsSigHandlers.cpp:187
12:01 fabcal: Oct 29 17:39:20 lenovo-p3-debian firefox-esr[5636]: gdk_wayland_window_set_dbus_properties_libgtk_only: assertion 'GDK_IS_WAYLAND_WINDOW (window)' failed
12:01 fabcal: Oct 29 17:46:22 lenovo-p3-debian firefox-esr.desktop[5636]: [Parent 5636, Main Thread] WARNING: gdk_wayland_window_set_dbus_properties_libgtk_only: assertion 'GDK_IS_WAYLAND_WINDOW (window)' failed: 'glib warning', file ./toolkit/xre/nsSigHandlers.cpp:187
12:05 fabcal: Core was generated by `/usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth /run/user/1000/.mu'.
12:05 fabcal: Program terminated with signal SIGABRT, Aborted.
12:05 fabcal: #0 0x00007fac2f4a9e3c in ?? ()
12:05 fabcal: [Current thread is 1 (LWP 5313)]
12:05 fabcal: (gdb) backtrace
12:05 fabcal: #0 0x00007fac2f4a9e3c in ?? ()
12:05 fabcal: #1 0x00007ffd2a8781f0 in ?? ()
12:05 fabcal: #2 0x58812c3bba82d900 in ?? ()
12:05 fabcal: #3 0x0000000000000006 in ?? ()
12:05 fabcal: #4 0x00007fac2ee6b8c0 in ?? ()
12:05 fabcal: #5 0x0000000000000006 in ?? ()
12:05 fabcal: #6 0x00000000000002da in ?? ()
12:05 fabcal: #7 0x00007fac2ec2a0cc in ?? ()
12:05 fabcal: #8 0x00007fac2f45afb2 in ?? ()
12:05 fabcal: #9 0x00007fac2f5f2e70 in ?? ()
12:05 fabcal: #10 0x00007fac2f445472 in ?? ()
12:05 fabcal: #11 0x0000000000000020 in ?? ()
12:05 fabcal: #12 0x0000000000000000 in ?? ()
12:05 fabcal: (gdb)
12:09 DodoGTA: fabcal: I finally see the actual nouveau fault (maybe Karol knows more about this?)
12:09 fabcal: DodoGTA: what am I doing wrong? Any idea?
12:59 dwlsalmeida[d]: dwlsalmeida[d]: gfxstrand[d] sorry for pinging, but would you have any idea what is going on here? I've compared the pushbufs between us and the blob and the issue is not there, so maybe something in NIL ?
13:15 gfxstrand[d]: dwlsalmeida[d]: That's scrambled enough that it might be a tiled vs. linear difference. The other thing that springs to mind is that maybe chroma is one of the interleaved UV formats and we're screwing that up.
13:16 dwlsalmeida[d]: gfxstrand[d]: But the luma plane is perfect
13:16 dwlsalmeida[d]: This is NV12 btw, so chroma is interleaved
13:17 gfxstrand[d]: Good. Luma being perfect is good.
13:17 gfxstrand[d]: That doesn't mean the chroma is right, though.
13:19 gfxstrand[d]: We're probably screwing something up with interleaved formats. I think the sampler code for them works. We pass all the YCbCr tests for them.
13:20 gfxstrand[d]: But there is likely weirdness somewhere. And, annoyingly, image descriptors don't nicely show up in the command stream.
13:24 dwlsalmeida[d]: I can try to debug if you give me pointers on what and where
13:24 gfxstrand[d]: Like, I wouldn't be surprised if you have to multiply the width by 2 somewhere.
13:24 dwlsalmeida[d]: FYI: if we compare the chroma plane as 16x16 blocks, half of them match
13:26 gfxstrand[d]: That smells like tiling
13:26 gfxstrand[d]: Or maybe a stride gone wrong.
13:26 gfxstrand[d]: What size is the image?
13:29 dwlsalmeida[d]: 64x64
13:29 gfxstrand[d]: Ugh... That's pretty hard to screw up
13:30 gfxstrand[d]: Still, I wouldn't be surprised if the tiling factors are off somehow
13:37 gfxstrand[d]: Are they two separate planes? What are the formats and dimensions of the two planes?
13:40 dwlsalmeida[d]: I've been "debugging" this for two days
13:40 dwlsalmeida[d]: two
13:40 dwlsalmeida[d]: days
13:40 dwlsalmeida[d]: Nicolas just reminded me I had forgotten to click the button in YUView that changes from I420 to NV12
13:41 dwlsalmeida[d]: I've been "debugging" a perfectly OK result
13:41 dwlsalmeida[d]: 🤦‍♂️
13:42 gfxstrand[d]: 😭
13:42 dwlsalmeida[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1301904252675035217/Screenshot_2024-11-01_at_10.42.14.png?ex=67262c52&is=6724dad2&hm=ed243ddc91fb21fd1df3704040f9bdad952f0f8ee395de6f89e6908f3c64868c&
13:42 dwlsalmeida[d]: airlied[d]: finally, a 100% valid result using NVK
13:43 gfxstrand[d]: That happens. It sucks but I've totally thrown away a day or two debugging the wrong thing before.
13:47 dwlsalmeida[d]: not only have I been "debugging" this with the wrong YUView settings, but I have been comparing this to what ffmpeg decodes to
13:47 dwlsalmeida[d]: except...
13:47 dwlsalmeida[d]: I also managed to fuck this up 😄
13:47 dwlsalmeida[d]: and I instructed ffmpeg to generate an I420 file
13:47 dwlsalmeida[d]: so **of course** only the luma plane matches
13:47 dwlsalmeida[d]: 😭
13:51 gfxstrand[d]: Live and learn. You got a correct frame out. 💜 That's the real take-away.
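Why only the luma plane matched can be sketched as follows. I420 and NV12 are both 4:2:0 layouts with an identical Y plane, but I420 stores U and V as two separate planes while NV12 stores a single interleaved UV plane. The function name `i420_to_nv12` is made up for illustration, and even frame dimensions with tightly packed rows (no stride padding) are assumed.

```python
def i420_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Repack a tightly packed I420 frame as NV12.

    Assumption: width and height are even and rows carry no padding.
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)  # size of one chroma plane
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("frame size does not match a packed 4:2:0 frame")

    y = frame[:y_size]                      # luma: identical in both formats
    u = frame[y_size:y_size + c_size]       # I420: full U plane ...
    v = frame[y_size + c_size:]             # ... followed by full V plane

    uv = bytearray(2 * c_size)
    uv[0::2] = u                            # NV12: U and V samples alternate
    uv[1::2] = v
    return y + bytes(uv)


if __name__ == "__main__":
    w, h = 64, 64
    # Synthetic frame: constant luma, U plane all 0x10, V plane all 0xF0.
    i420 = bytes([0x80]) * (w * h) + bytes([0x10]) * 1024 + bytes([0xF0]) * 1024
    nv12 = i420_to_nv12(i420, w, h)
    print("luma identical:  ", nv12[: w * h] == i420[: w * h])
    print("chroma identical:", nv12[w * h:] == i420[w * h:])
```

Comparing an NV12 dump byte for byte against an I420 reference therefore reports a correct luma plane and a mismatching chroma region even when the decode itself is right.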
14:05 dwlsalmeida[d]: gfxstrand[d]: I've been confused with this GOB thing btw:
14:05 dwlsalmeida[d]: NvU32 tileFormat : 2 ; // 0: TBL; 1: KBL;
14:05 dwlsalmeida[d]: NvU32 gob_height : 3 ; // Set GOB height, 0: GOB_2, 1: GOB_4, 2: GOB_8, 3: GOB_16, 4: GOB_32 (NVDEC3 onwards)
14:06 dwlsalmeida[d]: what is working is tileFormat==0 (i.e. TBL, whatever that is), and gob_height==0, i.e.: GOB_2
14:06 dwlsalmeida[d]: I noticed that in NIL, we have gob height == 8
14:07 dwlsalmeida[d]: changing this parameter doesn't seem to change anything though, at least not for this particular input
14:19 gfxstrand[d]: The comments in the NV headers here are confusing. It's not GOB height (8 pixels) but tile height in GOBs. That's given to you by `1 << tiling.y_log2`
14:20 gfxstrand[d]: And apparently 1 GOB isn't an option for the video hardware.
14:20 gfxstrand[d]: So we'll need an assert
14:27 gfxstrand[d]: And probably assert that `z_log2` and `x_log2` are both zero. We can probably work around `x_log2` in software but we'll get to that when we get to that. For now, just assert.
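A rough sketch of the checks described above, written in Python for illustration only (NIL itself is Rust). The function name `nvdec_gob_height_field` is hypothetical. The field encoding (0: GOB_2 up to 4: GOB_32) is taken from the header comment quoted earlier, and the reading that the field holds the tile height in GOBs follows gfxstrand's explanation rather than any official documentation.

```python
# Tile height in GOBs -> value of the gob_height field, per the quoted
# header comment. A tile that is 1 GOB high has no encoding.
GOB_HEIGHT_CODES = {2: 0, 4: 1, 8: 2, 16: 3, 32: 4}


def nvdec_gob_height_field(x_log2: int, y_log2: int, z_log2: int) -> int:
    """Translate tiling parameters into the video gob_height field.

    Assumption: only y tiling is supported for now, so x_log2 and z_log2
    must both be zero, as suggested in the discussion.
    """
    if x_log2 != 0 or z_log2 != 0:
        raise ValueError("x_log2 and z_log2 must both be zero")

    tile_height_gobs = 1 << y_log2
    if tile_height_gobs not in GOB_HEIGHT_CODES:
        raise ValueError(f"tile height of {tile_height_gobs} GOB(s) is not encodable")
    return GOB_HEIGHT_CODES[tile_height_gobs]


if __name__ == "__main__":
    for y_log2 in range(1, 6):
        print(y_log2, nvdec_gob_height_field(0, y_log2, 0))
```

Under this reading, the working setting of gob_height==0 corresponds to tiles that are 2 GOBs high, that is, `y_log2 == 1`.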
14:30 asdqueerfromeu[d]: dwlsalmeida[d]: Now it's as good as Proton without shader caches 📼
14:35 avhe[d]: dwlsalmeida[d]: fwiw tileFormat is unconditionally set to TBL on tegra for all codecs, but i've only seen KBL on discrete cards (though i reversed whether there is a conditional on it)
14:36 avhe[d]: which leads me to believe TBL means tegra block linear (no idea what the K in KBL stands for though)
14:37 dwlsalmeida[d]: gfxstrand[d]: where can I read more about GOBs? There’s some scant documentation in drm_fourcc.h
14:37 dwlsalmeida[d]: And they seem to build this 3d thing which made me stop understanding it completely
14:38 dwlsalmeida[d]: It says they can be “stacked vertically” uhm, what?
14:38 gfxstrand[d]: avhe[d]: Kepler? Fermi was kinda switchable but Kepler hard codes 8-high GOBs
14:39 dwlsalmeida[d]: Also in NIL, what is this array_len parameter? Width height and depth are pretty obvious
14:39 avhe[d]: gfxstrand[d]: that would make sense. do later dgpu gens use the same tiling scheme as kepler?
14:40 gfxstrand[d]: No. The internal GOB layout changed at Turing. But the height is still 8.
14:41 avhe[d]: uhm, not sure whether my TBL hypothesis is correct then
14:42 asdqueerfromeu[d]: asdqueerfromeu[d]: (This is a reference to Proton's default video placeholder when a video can't be decoded or can't be found in Valve's servers)
14:54 gfxstrand[d]: avhe[d]: I'm not sure. The Tegra docs and modifiers make it look like there may also be a Tegra vs. Discrete difference as well. That makes sense, though, since Tegra uses DDR rather than GDDR and that has a very different performance profile.
15:05 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1301925178615267338/mpv-shot0052.jpg?ex=67263fcf&is=6724ee4f&hm=29ab0db62c1a7d1f6a09a6288ce58f32a10d31f580416dd011bbd3b0f5c56d8f&
15:05 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1301925178854608976/mpv-shot0053.jpg?ex=67263fcf&is=6724ee4f&hm=82e09e006d696e20a8bb572bc719f1620214453ae18c2b5615d110719a9d4ccb&
15:05 avhe[d]: here's what using KBL for daniel's sample on my nintendo switch looks like (correct output on the right for reference)
15:07 avhe[d]: but it looks like jpg screenshots weren't the best idea 😅
17:31 mohamexiety[d]: dwlsalmeida[d]: `nil/copy.rs` has some stuff in a big comment but not sure it'd be useful for video stuff
18:37 airlied[d]: dwlsalmeida[d]: nice! glad you figured it out, I think I've only ever used nv12 things, if the buffer sizings work, CTS might be interesting