00:02 unlord: ahh ok
00:04 unlord: maybe I need some kernel config
00:05 RSpliet: well, other than not load any kms driver (nouveau, nvidia_drm...)
00:06 RSpliet: and X.org needs access to the device. No idea how nv does that... whether it means running X.org as root or having a /dev thing to open
00:06 RSpliet: imirkin: you know this old stuff a lot better than I do :-P
00:07 unlord: https://paste.debian.net/hidden/013dfafa/
00:07 unlord: it looks like it is trying to load NOUVEAU twice
00:08 RSpliet: and modesetting. Got an xorg.conf?
00:08 unlord: I don't
00:09 RSpliet: You probably need one to use nv
00:10 unlord: let me see if I can find the minimal settings
00:11 RSpliet: Sorry, I personally would have gone down the "get a nouveau kernel module" route rather than the "use nv" route myself. I'm not sure I'm super helpful on the matter... it's been 15 years since I played with nv
00:12 RSpliet: And back in those days, X.org always ran as root and had access to /dev/mem to do stuff. KMS took security forward a step, I have no idea how xf86-video-nv (a "UMS" - user-space mode setting - driver) was adapted to the new reality
00:13 unlord: I think I figured it out
00:13 unlord: https://paste.debian.net/1188184/
00:13 unlord: I can also rebuild my kernel I guess
00:14 imirkin: unlord: yep. either way works.
00:14 imirkin: one gets you modesetting done by the kernel
00:14 imirkin: one gets you modesetting by poking /dev/mem
00:14 imirkin: RSpliet: it wasn't - does the same thing it always has
00:15 imirkin: mwk: well, it'd be a super-manual operation, i figure, in a hwtest-like framework
00:16 imirkin: unlord: the xf86-video-nouveau driver doesn't offer a ton of accel on nv4/nv5 either. it's just not a great GPU :)
00:16 unlord: imirkin: I loved my TNT2 back in 1999
00:16 imirkin: i get it
00:16 imirkin: i have one plugged in.
00:16 imirkin: but
00:16 imirkin: it's still not that great, esp for even slightly modern things
00:16 unlord: but yeah, this is just so I can test AV1 decoding on this 20 year old computer
00:17 RSpliet: and measure its performance in SPF? :-D
00:17 unlord: SPF?
00:17 unlord: it should be fine
00:17 RSpliet: Seconds per frame. It's a silly joke ;-)
00:17 unlord: RSpliet: I work on encoders, I know this
00:19 unlord: I hope nv has accelerated blitting at least
00:19 unlord: I don't know how well mpv will work with dri
00:20 imirkin: unlord: hopefully it uses Xv
00:21 imirkin: which i think might actually use the overlay functionality
00:21 imirkin: i.e. YUV without conversion
00:21 unlord: imirkin: hardware YUV conversion?
00:22 imirkin: unlord: well, scanout of YUV directly
00:22 imirkin: you could call it conversion, sure
00:22 RSpliet: I recall my XP 2800+ (which is only 18yo) barely keeping up with 720p youtube videos, and giving up on 1080p. I forgot that 20 years old is still this millennium
00:22 unlord: even better
00:22 imirkin: note that you won't get that with nouveau, only with nv
00:22 imirkin: (with nouveau those overlay planes are supported, but not hooked up)
00:22 unlord: imirkin: nice, I am using nv :)
00:23 imirkin: but i dunno if mpv dumped the "old" output things
00:23 imirkin: if it did, you're sunk
00:23 unlord: I know some people, I'll ask
00:23 imirkin: probably prints it
00:23 imirkin: and/or there's a -vo xv
00:23 imirkin: or not. if not, then fail
00:23 imirkin: definitely was a thing in mplayer
00:24 unlord: RSpliet: there is no SIMD acceleration for SSE in dav1d, only SSE2
00:38 unlord: there is a USE=xv on mpv
00:57 unlord: huh, my windows have drop shadows
01:06 unlord: and I got a weak crossfade changing the background
01:06 unlord: so I need to disable this compositor
01:21 RSpliet: unlord: ah, this is you https://github.com/videolan/dav1d/commit/bfbee8607b0fbc57ce8d52a0b7e14f253f3f8df4
01:21 RSpliet: That explains the use-case. Good luck!
01:22 unlord: ya found me
01:22 unlord: RSpliet: https://people.videolan.org/~unlord/parvus.png
01:22 imirkin: but what about 3dnow!?!
01:22 unlord: just waiting for mpv to finish compiling
01:23 unlord: imirkin: I have some MMX code for dav1d
01:23 imirkin: hehe
01:23 unlord: not sure I'll write it all, but I wrote one function :)
01:23 imirkin: fprintf(stderr, "get a better cpu\n"); exit(1);
01:23 imirkin: =]
01:24 unlord: imirkin: this is one of the fastest "retro" machines I own
01:24 imirkin: p3 was pretty awesome stuff.
01:24 unlord: My favorite right now is this AMD 386DX 40MHz
01:24 imirkin: ooh, DX! double the rate!
01:24 unlord: I have a VLB motherboard for it
01:25 imirkin: and a nice cirrus logic video card?
01:25 imirkin: or, even better, trident
01:26 unlord: I do own one of those but the fastest video card I have is the W32i ET4000 from Tseng Labs
01:28 unlord: I think I might try nouveau on this setup, the nv driver seems really slow
01:29 imirkin: i see more recompiling in your future...
01:29 unlord: imirkin: I already built the driver
01:30 RSpliet: Hopefully you won't have to do that on the actual Pentium 3?
01:30 unlord: this install is all self hosted
01:31 unlord: if you want to cringe
01:31 unlord: Sat Mar 6 18:43:46 2021 >>> sys-devel/clang-11.0.0
01:31 unlord: merge time: 2 days, 5 hours, 45 minutes and 29 seconds.
01:32 imirkin: that's how long it takes me to build chromium
01:32 imirkin: (ok, not quite, but it does take like 24h)
01:33 unlord: Mon Jul 20 14:00:09 2020 >>> www-client/chromium-84.0.4147.89
01:33 unlord: merge time: 1 hour, 24 minutes and 32 seconds.
01:33 unlord: different computer, but still
01:33 imirkin: one that's 24x faster than mine, apparently :)
01:33 imirkin: (i7-920 here)
01:33 imirkin: there's massive swapping involved too
01:35 unlord: imirkin: https://people.videolan.org/~unlord/vulpes.png
01:35 imirkin: makes sense. i have 5 cores. and 6GB of ram.
01:35 imirkin: 4 cores*
01:35 unlord: I recently doubled the memory on this desktop
01:36 imirkin: (and each core is quite a bit slower too, i suspect)
01:36 imirkin: normally i update every 10y or so ... this comp is from 2010, so i'm definitely due
01:36 imirkin: but it also works totally fine, so...
01:36 unlord: video coding is a real bear
04:14 unlord: https://people.videolan.org/~unlord/parvus-mpv.png
04:32 imirkin: unlord: is this with nouveau?
04:33 imirkin: pastebin xorg log
04:33 imirkin: unlord: https://cgit.freedesktop.org/nouveau/xf86-video-nouveau/tree/src/nv04_xv_blit.c
04:34 unlord: imirkin: https://paste.debian.net/hidden/57293d52/
04:34 unlord: this is nv
04:38 unlord: any thoughts?
04:39 damo22: unlord: did you compile the graphics stack with debugging symbols?
04:39 unlord: damo22: I have splitdebug on gentoo
04:39 damo22: you could run it through gdb
04:39 damo22: and find a backtrace where it segfaults
04:40 unlord: sure, I can do that
04:40 unlord: what about the VAAPI null error
04:45 damo22: DRI2 failed to authenticate, sounds like a permissions error
04:45 unlord: https://people.videolan.org/~unlord/parvus-mpv-gdb.png
04:46 damo22: you probably need to type "bt full" into gdb and pastebin the output because it will likely not fit on a screen
04:47 unlord: yes
04:49 damo22: looks like it has MMX/SSE instructions baked in, but your processor does not support that
04:49 unlord: my processor does support MMX and SSE
04:50 unlord: it does not support SSE2
04:50 damo22: hmm ok
04:50 damo22: u sure PIII has SSE?
04:50 HdkR: movq is SSE2
04:51 unlord: $ cpuid2cpuflags
04:51 unlord: CPU_FLAGS_X86: mmx mmxext sse
04:51 HdkR: That's an SSE2 instruction and your CPU doesn't support it :)
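The crash above comes from SSE2 code running on a CPU that only reports mmx, mmxext and sse. A minimal sketch of a runtime check that avoids this, assuming a GCC or Clang build: `__builtin_cpu_supports` is a real compiler builtin on x86 targets, while `cpu_has_sse2`, `pick_copy_path` and the `copy_path` enum are hypothetical names used only for illustration, not anything from dav1d or mpv.

```c
#include <assert.h>

/* Hypothetical set of code paths a library could choose between. */
enum copy_path { COPY_PATH_C = 0, COPY_PATH_SSE2 = 1 };

/* Returns 1 if the running CPU reports SSE2, 0 otherwise.
 * __builtin_cpu_supports is a GCC/Clang builtin for x86 targets. */
static int cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0; /* not an x86 build, so no SSE2 path applies */
#endif
}

/* Select the SSE2 path only when the CPU actually has it;
 * otherwise fall back to plain C. */
static enum copy_path pick_copy_path(int has_sse2)
{
    assert(has_sse2 == 0 || has_sse2 == 1);
    return has_sse2 ? COPY_PATH_SSE2 : COPY_PATH_C;
}
```

On the Pentium 3 discussed here, `pick_copy_path(cpu_has_sse2())` would be expected to select the plain C path, since cpuid2cpuflags lists no sse2. A runtime check only helps if the compiler itself is not also told to emit SSE2 for ordinary code.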
04:51 damo22: ok so i wasnt too far off
04:52 unlord: yeah, I have a patch I need to upstream
04:52 unlord: this is using platform libdav1d
04:53 damo22: why did gentoo generate SSE2 instructions if your machine does not support it?
04:53 unlord: because upstream is busted
04:54 unlord: https://code.videolan.org/videolan/dav1d/-/blob/master/meson.build#L255
04:55 damo22: yup
04:55 damo22: oh youre on x86, you must have sse2
04:56 damo22: :P
04:56 HdkR: It's not a bad minspec for new software
04:57 damo22: the optimal flags should be detected based on machine i guess
04:57 damo22: or gentoo has some flag magic
04:57 imirkin: unlord: weird, looks like the xv ext is there
04:58 imirkin: unlord: try running 'xvinfo'?
04:58 imirkin: to see what adapters are present
04:58 imirkin: and make sure that mpv isn't using opengl
04:58 imirkin: since that won't work
04:59 imirkin: don't worry about the DRI2 stuff - it's just plain not supported in your setup.
05:00 damo22: the other question is why you are running a computer that could belong in a museum
05:05 damo22: im all for saving old pcs, but don't you reach a point where the heat it produces is not worth the computation it does?
05:08 imirkin: he said he has a AMD-clone 386DX, so ... the p3 is not the one you should be worried about :)
05:08 damo22: i think Hurd will run on that one day
05:08 imirkin: linux dropped 386
05:12 damo22: hurd might be >= 486
05:16 imirkin: pmoreau: i split up that "store" commit in the nv50 branch. would be nice to know whether the "separate store" one is actually required, or if it was just the predication one
05:38 unlord: damo22: see how I found a bug in dav1d, that is actually something I will fix
05:38 unlord: when we developed opus, we actually got it working in 16-bit DOS, this found a bunch of bugs
05:56 unlord: imirkin: here is xvinfo -> https://paste.debian.net/hidden/c80846d6/
05:58 imirkin: unlord: that's unfortunate
05:58 imirkin: maybe -nv doesn't support xv after all?
05:58 imirkin: -nouveau definitely should though
05:58 unlord: well, I can upgrade to that
06:01 imirkin: unlord: hm, very odd. reading the -nv code, looks like you should *at least* have had the blit video adapter
06:01 imirkin: looks like it didn't support the pre-nv10 overlay
06:01 unlord: am I enabling nv incorrectly?
06:01 imirkin: it seemed fine, and your xorg log seemed fine
06:01 imirkin: which is why i'm surprised
06:02 imirkin: oh nevermind
06:02 imirkin: hmmmm
06:03 imirkin: hard to tell. would have to double-check what bitsPerPixel is
06:04 unlord: can I do that from the command line?
06:05 imirkin: i meant in the driver code
06:05 imirkin: i just looked it up though - i really think you should see a xv adapter
06:05 imirkin: but xvinfo says no
06:05 imirkin: which is weird.
06:05 unlord: well, I can see how xf86-video-nv is built
06:05 imirkin: it's not about that
06:05 imirkin: [3562182.526] (==) NV(0): Depth 24, (--) framebuffer bpp 32
06:05 imirkin: \
06:05 imirkin: so bpp is not 8
06:06 unlord: nope
06:06 unlord: it looks just like that screenshot
06:06 imirkin: so it's fine. based on the code, you should have a xv adapter
06:07 imirkin: unlord: https://cgit.freedesktop.org/xorg/driver/xf86-video-nv/tree/src/nv_video.c#n286
06:07 imirkin: the first condition is false (it's NV_ARCH_04)
06:08 imirkin: oooh, i wonder if NoAccel is true for you
06:08 imirkin: hmmm
06:08 unlord: I don't know
06:08 imirkin: will rtfs some more
06:09 imirkin: unlord: aha
06:09 imirkin: [3562182.671] (II) LoadModule: "xaa"
06:09 imirkin: [3562182.671] (WW) Warning, couldn't open module xaa
06:09 imirkin: [3562182.671] (EE) NV: Failed to load module "xaa" (module does not exist, 0)
06:09 imirkin: [3562182.671] (II) NV(0): Falling back to shadowfb
06:09 imirkin: this in turn causes it to disable accel
06:09 imirkin: and thus no xv adapter
06:09 unlord: well, how do we fix that
06:09 unlord: I want some accel :)
06:10 imirkin: try nouveau :p
06:10 imirkin: you're not going to get the overlay adapter anyways with -nv
06:10 imirkin: (i forgot it only did it for nv10+)
06:10 unlord: OK
06:14 unlord: I have xf86-video-nouveau installed
06:14 unlord: do I need to modify the kernel?
06:14 imirkin: you need the nouveau kernel module yea
06:17 unlord: do you recommend building it into the kernel, or as a module?
06:17 imirkin: more common to do module
06:17 imirkin: no problem with it being built-in though
06:20 unlord: I usually build everything in, if I do a module do I still need to reboot?
06:20 imirkin: no
06:20 imirkin: (assuming the new module is compatible with running kernel, of course)
06:20 unlord: yep, should be
06:20 imirkin: although dunno
06:20 imirkin: running -nv could put the board in a weird state
06:20 imirkin: so it might not init properly with nouveau
06:21 imirkin: but it might
06:21 imirkin: anyways, i'm off - good luck!
06:21 unlord: I'll cleanly exit X
06:21 unlord: thanks
07:38 unlord: so rebuilding dav1d without SSE2 fixed playback via mpv
07:45 HdkR: Makes sense
07:46 unlord: I'm on hour 2 of building nouveau module
07:47 ccr: you need more steam for your steam engine
07:55 damo22: pedal harder
07:56 damo22: unlord: why dont you emulate older cpus
07:56 damo22: then you can build it as fast as your emulator / hots
07:56 damo22: host*
07:58 unlord: I think there is a way to distcc a kernel
08:07 unlord: http://paste.debian.net/hidden/57e8af2e/
08:07 unlord: this is with nouveau loaded
08:08 damo22: $ qemu-system-i386 --cpu help
08:08 unlord: this is on real hardware
08:08 damo22: i know but imagine what you can do on vm
08:08 damo22: pass through the pci gpu to the vm and subset the cpu flags
08:09 unlord: http://paste.debian.net/hidden/d95ba5f9/
08:09 unlord: damo22: right, so the point of testing a software decoder on actual hardware is that you can say something about the performance of that software decoder
08:09 unlord: I'm aware that computers got faster since 2001
08:10 unlord: I'm getting a stack trace when loading nouveau
08:10 damo22: ok
08:12 damo22: unlord: lspci -nn | grep VGA
08:12 unlord: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation NV5 [Riva TNT2 / TNT2 Pro] [10de:0028] (rev 15)
08:14 damo22: -19 is ENODEV ?
08:14 unlord: modprobe -r and reloading makes this repeatable at least
08:15 unlord: this computer has onboard video as well
08:16 damo22: [3587702.982109] [TTM] Zone kernel: Available graphics memory: 186384 KiB how much ram does the machine have?
08:16 unlord: MemTotal: 372768 kB
08:20 damo22: -12 ENOMEM
08:20 damo22: it failed to probe because it was out of memory?
08:22 unlord: hmm
08:22 unlord: there is plenty of memory on this computer
08:23 unlord: MiB Mem : 364.0 total, 92.8 free, 116.5 used, 154.7 buff/cache
08:23 unlord: MiB Swap: 1312.0 total, 1291.7 free, 20.3 used. 217.0 avail Mem
08:24 damo22: [3587702.993750] modprobe: page allocation failure: order:7, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
08:24 unlord: yeah, I saw that
08:25 damo22: im not sure what the mems_allowed is for
08:26 unlord: I'm not sure where that is coming from
08:28 damo22: cat /proc/filesystems | grep cpuset
08:28 damo22: nodev cpuset ?
08:28 unlord: nodev cpuset
08:29 damo22: cpuset.mems List of memory nodes on which processes in this cpuset are allowed to allocate memory
08:30 unlord: I have nodev cpuset on my other computers
08:30 unlord: I guess they don't use nouveau
08:32 damo22: you dont have enough memory in the kernel
08:32 unlord: how is that possible?
08:33 damo22: i believe the kernel has its own split of memory
08:35 unlord: is that something that can be configured?
08:36 damo22: not sure, im looking
08:40 damo22: https://tldp.org/HOWTO/KernelAnalysis-HOWTO-7.html
08:47 unlord: not sure what that means
13:54 unlord: I wonder if this is an issue with newer kernels: https://forums.developer.nvidia.com/t/455-23-04-page-allocation-failure-in-kernel-module-at-random-points/155250
15:07 unlord: Should I try the nvidia-drivers ?
15:07 unlord: 71.86.15 apparently has support for TNT2
15:09 ccr: you'll need a really ancient kernel for that probably
15:10 unlord: ccr: yes
15:10 unlord: I think I'll debug nouveau more
15:10 unlord: > Note that Gentoo does not provide the 71.86.xx versions. If the system has a card that needs these drivers then it is recommended to use the nouveau driver.
16:07 unlord: will nouveau let me play 3D accelerated games on the TNT2?
16:07 unlord: like Quake 3
16:27 RSpliet: in theory... but I think imirkin mentioned that the amount of GL implemented in nouveau for the TNT2 is pitiful
16:28 RSpliet: I don't think it has the capability of supporting anything beyond GL1.2, unsure how much of GL1.2 was "emulated" in the driver back then.
16:29 unlord: well, for video decoding all I really need is to avoid the software YUV conversion
16:57 imirkin: urgh
16:57 imirkin: something needs to be a vmalloc
16:57 imirkin: or kvmalloc or something
16:57 imirkin: let me check what's going on
16:58 imirkin: order 7 is pretty high, so not unexpected that loading would fail on a long-running system
16:58 imirkin: i guess on average this stuff loads at boot when memory is less fragmented
16:59 imirkin: lol
16:59 imirkin: struct nv04_display {
16:59 imirkin: struct nv04_mode_state mode_reg;
16:59 imirkin: struct nv04_mode_state saved_reg;
16:59 imirkin: uint32_t saved_vga_font[4][16384];
16:59 imirkin: yea
16:59 imirkin: probably don't want that kmalloc'd
17:00 imirkin: the nv04_mode_state is also pretty giant
17:00 unlord: this is NV05
17:00 imirkin: all pre-nv50 uses nv04_display struct
17:01 imirkin: unlord: can you apply a small patch?
17:01 unlord: of course
17:01 imirkin: ok, gimme a few
17:04 imirkin: unlord: https://termbin.com/w65j -- apply this in the drivers/gpu dir
17:04 imirkin: er wait
17:04 imirkin: crap
17:04 imirkin: try this one: https://termbin.com/826n
17:05 unlord: OK
17:05 imirkin: not compile-tested, but pretty obvious, i hope
17:06 unlord: do a lot of people use nouveau?
17:06 imirkin: no clue
17:06 imirkin: at least 2
17:06 imirkin: but it's definitely rare to load nouveau not-at-boot
17:06 imirkin: which is why you're hitting this problem
17:06 unlord: isn't the point of a non-compiled kernel module that you can reload it?
17:07 karolherbst: nope
17:07 karolherbst: that's a benefit, but not the main point
17:08 karolherbst: main point is more like you don't need to load code for 50k devices you don't have plugged in, so your system can boot faster and the kernel consumes less memory
17:10 karolherbst: and even though you can unload modules, sometimes userspace will prevent you from doing so by having fds open and stuff
17:10 unlord: karolherbst: I regularly have systems online for over 500 days
17:10 unlord: so I'm familiar with tracking these things down
17:10 karolherbst: sounds like a security nightmare, how do you deal with kernel updates?
17:11 unlord: I run extremely narrow APIs
17:11 karolherbst: doesn't protect you from bugs in the TCP stack
17:11 unlord: I don't lose a lot of sleep over those
17:12 karolherbst: yeah well... any of them can own your system, so don't see why those should be unimportant ;)
17:12 unlord: imirkin: this file also exists: ./drivers/gpu/drm/nouveau/dispnv50/disp.c
17:12 karolherbst: anyway, my point was that saying you have your system online for 500 days sounds more like a bad thing than anything you should be proud of
17:12 karolherbst: unless you livepatch
17:13 karolherbst: then it would be impressive
17:13 unlord: oh, that is 50 not 05
17:13 karolherbst: ahh.. yeah, 50 sounds more sane than 500 :)
17:14 karolherbst: but I was thinking about doing kernel live patching, but that's super annoying to actually get started and everything :/
17:15 bencoh: I wonder if livepatch techs are mature and practical enough to allow you to reach 500 days of uptime with a fully up-to-date stable system
17:16 karolherbst: bencoh: well.. always depends on the patch I think? I am not sure how good livepatching deals with changing structs etc...
17:16 unlord: bencoh: I assume that at 500 days most high availability places are thinking of replacing something anyway
17:16 bencoh: I feel like some of those kernel bugs might just be impossible to live patch
17:16 bencoh: karolherbst: exactly
17:16 karolherbst: ohh, sure
17:16 karolherbst: but for me it's like being too lazy to reboot every week
17:16 karolherbst: so if I can stretch that a bit that would be good already
17:16 bencoh: I mean, right, fixing this single tcp hash function missing some bracket between two ifs might be easy
17:16 bencoh: but ...
17:17 karolherbst: would be cool if distributions could integrate it in a way where the package manager can say: this kernel update won't require a reboot
17:17 karolherbst: and then it's just best effort
17:18 karolherbst: I see people being too lazy to reboot and that it might make people's computers safer on average
17:18 imirkin: unlord: i also have high-uptime systems. i'm not suggesting this isn't a thing we should fix. i'm explaining why it wasn't caught.
17:18 unlord: karolherbst: when heartbleed came out, my openssl was too old to be vulnerable
17:18 imirkin: unlord: i'll check nv50_display, but i think that's a much more modest struct
17:18 unlord: imirkin: cool, I am glad I helped find a bug (if this fixes it)
17:18 karolherbst: yeah, but heartbleed was also a stupid bug :/
17:19 imirkin: the nv04 one appears to have the universe in it
17:19 imirkin: in duplicate no less :)
17:23 imirkin: unlord: i just glanced at the nv50 stuff, and it seems perfectly reasonable in size. if you do run into such issues, please let me know
17:23 imirkin: the nv04 struct is giant, which is not reasonable to feed to kmalloc
17:24 imirkin: kmalloc is primarily designed for sub-page allocation packing
17:24 imirkin: this allocation is probably 64K+
17:24 unlord: imirkin: I'll let you know in a minute if it works
17:24 unlord: I just rebuilt the module
17:26 imirkin: order 7 = 512kb. so yeah, i was way off. finding a contiguous 1/2mb in a long-running system with limited RAM is unrealistic.
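The order arithmetic above can be checked with a short sketch. It assumes 4 KiB pages (the usual i386 page size, not stated in the log) and only looks at the `saved_vga_font` member quoted earlier. `alloc_order` is a hypothetical helper written for illustration in the spirit of the kernel's `get_order()`, not the kernel function itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest order such that (page_size << order) >= size.
 * A page allocator hands out 2^order contiguous pages. */
static unsigned alloc_order(size_t size, size_t page_size)
{
    unsigned order = 0;
    assert(size > 0 && page_size > 0);
    while ((page_size << order) < size)
        order++;
    return order;
}

/* Size of just the saved_vga_font[4][16384] array of uint32_t. */
static size_t saved_vga_font_bytes(void)
{
    return sizeof(uint32_t[4][16384]);
}
```

The font array alone is 4 * 16384 * 4 = 262144 bytes (256 KiB), which is exactly order 6 with 4 KiB pages. Any additional member, such as the two `nv04_mode_state` copies, pushes the struct past 256 KiB and rounds the request up to order 7, i.e. 128 contiguous pages or 512 KiB, matching the `order:7` in the allocation failure.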
17:28 unlord: imirkin: seems to have fixed it, https://paste.debian.net/1188274/
17:28 imirkin: woohoo, i'll make a proper patch
17:29 unlord: lets see if X comes up
17:29 imirkin: make sure you have xf86-video-nouveau 1.0.17
17:29 imirkin: i fixed some stuff for nv04/nv05 specifically
17:29 imirkin: although they might not affect you
17:30 imirkin: esp this guy can matter: https://cgit.freedesktop.org/nouveau/xf86-video-nouveau/commit/?id=ef89b3c5ca9b2569ca61a9452d13a93edc832810
17:30 imirkin: if you have modern software trying to use present
17:32 unlord: [ebuild R ] x11-drivers/xf86-video-nouveau-1.0.16::gentoo 0 KiB
17:34 imirkin: i think it's still masked
17:34 imirkin: i only pushed the release semi-recently =/
17:35 unlord: well lightdm came up
17:35 unlord: but MATE did not
17:36 unlord: now it did :)
17:36 imirkin: there's another bugfix which enables a 1920-wide monitor
17:36 imirkin: on nv4/nv5
17:36 unlord: oh cool
17:37 imirkin: we never enable tiling, but were trying to compute tile-aligned sizes
17:37 imirkin: (and naturally the thing doesn't support 2048)
17:39 unlord: lets see if I can play video any better
17:39 imirkin: xvinfo should report an adapter now i think
17:40 unlord: yep, many
17:41 imirkin: unlord: if you want me to stick a reported-by, pm me your name + email
17:48 unlord: so I do get trails when I drag the window around
17:48 unlord: but to be fair, I also get this on my x3970 with a 1080
17:50 imirkin: like left-over trails, or ephemeral ones?
17:50 unlord: left over
17:50 unlord: they go away when the system catches up
17:51 unlord: I guess I mean ephemeral
17:51 imirkin: interesting
17:51 unlord: I think this is a product of MATE more than anything
17:51 imirkin: yeah, dunno, perhaps nouveau is doing something wrong, or perhaps MATE is doing something wrong
17:51 imirkin: i dunno how MATE works ... if they have a full-screen compositor, then it's the compositor's fault
17:51 unlord: someone is probably doing something wrong, it might be me
17:52 imirkin: which if you don't have accel will be ... slow
17:52 imirkin: and nouveau is just faithfully rendering the images it is given
17:54 unlord: I ask for software rendering
17:54 unlord: which disables the drop shadows and other compositor things that are slow
17:55 imirkin: well, still implies it's doing software rendering :)
17:55 imirkin: as opposed to using X acceleration
17:55 unlord: this is something I was going to look into eventually
17:55 unlord: the video playback should pass through though I think
17:56 imirkin: if it's using xvideo, i THINK so
17:56 imirkin: not 100% sure
17:56 imirkin: i'd recommend using something like xfce
17:56 imirkin: or other "simple" window managers
17:56 imirkin: i personally use WindowMaker
17:57 ccr: WindowMaker \o/
17:59 ccr: switched to wmaker in 1998 or so from fvwm2 and have never wanted to even try anything else.
18:00 unlord: I like GNOME2 alright, MATE does most of what I want
18:00 imirkin: i switched around that time from afterstep :)
18:00 imirkin: afterstep released some new version which worked totally differently than previous ones
18:00 ccr: heh
18:10 unlord: hmm
18:10 unlord: enabling the software compositor crashed MATE and now I cannot get in
18:11 unlord: so I need to find the way to change that property
18:11 imirkin: or use windowmaker ;)
18:11 imirkin: or xfce or whatever
18:16 unlord: hmm, now I cannot remove the nouveau driver
18:16 imirkin: that's expected
18:16 imirkin: unbinding it is slightly tricky
18:17 imirkin: hold on, let me remember how
18:17 imirkin: unlord: you have to do
18:17 imirkin: echo 0 > /sys/class/vtconsole/vtcon1/bind
18:17 imirkin: first
18:17 imirkin: BUT
18:17 imirkin: that will kill your console
18:17 imirkin: so ... yeah. beware.
18:17 imirkin: after that you can rmmod nouveau
18:17 karolherbst: who needs a console anyway
18:17 imirkin: actually on pre-nv50 hw it might not kill the console
18:17 unlord: that worked
18:17 imirkin: karolherbst: someone who's rmmod'ing nouveau.
18:18 unlord: I
18:18 unlord: I have to go back to the nv driver to uncheck this box for software compositing
18:18 imirkin: lol
18:18 imirkin: unlord: what's the crash btw?
18:18 imirkin: is it in the X server, or in MATE?
18:20 unlord: the X server would not come up
18:20 unlord: do you want me to reproduce this?
18:24 imirkin: meh
18:24 imirkin: up to you
18:24 imirkin: depending on the thing, 1.0.17 might fix it
18:24 imirkin: but if it's an issue in MATE, then it won't
18:25 unlord: Well, it isn't in MATE since using nv doesn't have this problem
18:26 unlord: I am seeing random pixels stay on
18:27 unlord: https://people.videolan.org/~unlord/parvus-mpv-nouveau.png
18:28 imirkin: not sure what i'm looking for in that screenshot
18:28 unlord: the pixels
18:28 HdkR: Random green pixel in the terminal
18:28 unlord: and a yellow one
18:28 HdkR: Sticky pixies
18:28 imirkin: oh
18:28 imirkin: weird.
18:29 imirkin: don't think i've seen anything like that before
18:29 imirkin: did it only start happening with xv?
18:29 imirkin: could be that the accel messes something up which ends up writing over random other data? dunno
18:30 RSpliet: I had to zoom in and out to verify it was the screenshot rather than my monitor :')
18:30 unlord: https://paste.debian.net/1188288/
18:31 imirkin: right, that's expected
18:31 imirkin: i just mean did it only start happening when you were playing the video?
18:31 unlord: imirkin: no
18:31 unlord: imirkin: does this hardware support 10-bit video?
18:31 imirkin: it's in the screenshot, so it's not just some display thing
18:31 imirkin: lol, no
18:31 imirkin: you can see it's being converted from yuv420p10 to yuv420 in the mpv log
18:32 unlord: yeah, I think this is where the performance loss is
18:32 imirkin: hard to believe, but i don't think 10bpc color was a big concern in 1999 ;)
18:32 ccr: AV1 on a Pentium 3?
18:32 unlord: imirkin: what does xvinfo say?
18:32 unlord: ccr: yep
18:32 imirkin: that it supports the formats it lists
18:33 RSpliet: Interestingly... they're clusters of two pixels
18:33 imirkin: basically YV12, UYVY, YUYV, and i think another YV12-ish format. plus rgb.
18:33 RSpliet: just zoom in around (325,860)
18:34 imirkin: i forget the diff between I420 and YV12
18:34 imirkin: but it's something dumb
18:35 imirkin: maybe ordering of U and V planes?
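The guess above is right: I420 and YV12 are both planar 8-bit 4:2:0 layouts and differ only in which chroma plane follows the luma plane. A minimal sketch of the plane offsets, assuming a tightly packed buffer with even dimensions and no row padding (real Xv images may use padded pitches). `plane_offsets` and `planes_420` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets of the three planes inside one contiguous buffer. */
struct plane_offsets { size_t y, u, v; };

/* I420 stores Y, then U, then V. YV12 stores Y, then V, then U.
 * Pass yv12 != 0 to get the YV12 ordering. */
static struct plane_offsets planes_420(size_t w, size_t h, int yv12)
{
    size_t luma = w * h;               /* one byte per pixel */
    size_t chroma = (w / 2) * (h / 2); /* subsampled 2x in both axes */
    struct plane_offsets o;

    assert(w % 2 == 0 && h % 2 == 0);
    o.y = 0;
    if (yv12) {
        o.v = luma;
        o.u = luma + chroma;
    } else {
        o.u = luma;
        o.v = luma + chroma;
    }
    return o;
}
```

Swapping the U and V pointers is therefore enough to treat one format as the other.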
18:36 unlord: ccr: this 224p file decodes fine on the command line
18:39 unlord: Decoded 2000/2000 frames (100.0%) - 35.99/30.00 fps (1.20x)
18:40 imirkin: actually nouveau doesn't support 10-bpc yuv formats anywhere
18:40 imirkin: mostly because like the hw decoders only support it on later hw
18:40 imirkin: and i did the enablement for some of the older hw
18:40 imirkin: and it was death, so ... i'm dead inside now
18:41 unlord: OK, so I can do a 12-bit encode or an 8-bit encode
18:41 imirkin: not that i ever even got it to work properly =/
18:41 imirkin: stupid h264
18:42 unlord: but first, looking at Xorg.0.log I'm in a 16bpp video mode, https://paste.debian.net/1188291/
18:42 imirkin: you can adjust that if you want
18:42 imirkin: but you don't have infinity vram
18:42 unlord: nv defaulted to 24bpp
18:42 imirkin: i think nouveau has vram-based defaults
18:42 unlord: I have 16MB, which should be plenty
18:43 imirkin: and 32mb and under gets 16bpp
18:43 imirkin: but you can force it
18:43 karolherbst: ehhh 16 bit color depth *runs away with nightmares*
18:43 unlord: imirkin: how do I force it?
18:43 imirkin: mmmmm
18:43 imirkin: simplest and quickest is to add -depth 24 to the X command
18:43 imirkin: or you can put something in the xorg.conf
18:44 unlord: the second is easier with my setup
18:44 imirkin: but i don't remember how :)
18:44 karolherbst: DefaultDepth on the screen?
18:45 imirkin: sounds right
18:45 karolherbst: yeah "specifies which color depth the server should use by default. The -depth command line option can be used to override this. If neither is specified, the default depth is driver-specific, but in most cases is 8."
18:45 imirkin: hehe
18:45 imirkin: that might be slightly out of date :)
18:46 imirkin: sounds accurate for 1980
18:46 karolherbst: I am sure the default in the xorg server is still 8
18:46 karolherbst: but modesetting will probably default to 24
18:46 imirkin: it says ddx-specific
18:46 imirkin: i don't think most ddx's default to 8
18:46 karolherbst: ohhh
18:46 karolherbst: right
18:46 karolherbst: yeah... probably
18:46 karolherbst: I think I hit old docs anyway
18:47 karolherbst: X11R6.8.0
18:47 imirkin: that's pretty new
18:47 karolherbst: really?
18:47 imirkin: only like 10y old
18:47 imirkin: or 15
18:47 karolherbst: :D
18:47 karolherbst: the newest docs say the same though
18:47 karolherbst: not that it matters
18:47 karolherbst: one can update it, but nobody will see it ever, sppp..
18:48 karolherbst: *soo
18:51 unlord: yeah, so a default screen worked
18:54 unlord: dragging the terminal around is still painfully slow
18:54 imirkin: can you try xfce or another 'simple' environment?
18:55 unlord: am I missing acceleration? https://paste.debian.net/1188296/
18:55 karolherbst: but uhm... the display runs at 60hz and everything, right?
18:55 unlord: imirkin: sure, I can try anything
18:55 imirkin: unlord: no, that all seems normal
18:56 imirkin: unfortunately the nv4/nv5 backend doesn't support a ton of accel
18:56 imirkin: but it does get you some
18:57 unlord: there is some mate-tweak package that lets you fine tune the compositor
18:57 imirkin: what you want is to disable it entirely, which might not be a thing
18:58 unlord: it looks like it is
18:58 unlord: https://askubuntu.com/questions/756005/does-mate-desktop-environment-use-graphics-acceleration
18:59 karolherbst: xfce can do compositing without any fancy graphics, so that's really worth a try
18:59 unlord: Sure, let me start building it
18:59 imirkin: it has a xrender backend iirc
18:59 karolherbst: yeah
18:59 imirkin: though like i said, nv4 backend is kinda shit
18:59 karolherbst: xfce is really nice for low end systems
18:59 imirkin: only supports a couple operations
19:00 karolherbst: yeah... dunno what was the slowest I tried xfce4 on
19:00 karolherbst: but it was slow
19:00 imirkin: but copy (aka blit) is one of them
19:00 unlord: ooh, mate-tweak let me remove the compositor
19:00 unlord: and now my desktop is fast
19:00 imirkin: woo!
19:00 unlord: gentoo of course removed this package
19:00 unlord: but that's okay, I have the power of git
19:00 karolherbst: or just a 5 line ebuild
19:00 karolherbst: *write
19:03 unlord: karolherbst: apparently this is something I did
19:03 unlord: the package was under /usr/local/portage/x11-misc/mate-tweak
19:03 karolherbst: ahh
19:03 unlord: probably got it from this overlay https://gpo.zugaina.org/x11-misc/mate-tweak
19:03 karolherbst: probably
19:04 unlord: I can still install xfce, but first I want to remove the compositor
19:21 unlord: I'll try xfce now
19:22 unlord: even with 8-bit video and disabled compositor, it is still dropping frames
19:24 imirkin: /* The blitter does not handle YV12 natively */
19:24 imirkin: so it makes a YUYV / UYVY surface out of it first =/
19:25 imirkin: NVCopyData420
19:25 imirkin: which i apparently didn't give the SSE treatment to
19:25 imirkin: not that it would help if i had
19:26 imirkin: https://cgit.freedesktop.org/nouveau/xf86-video-nouveau/tree/src/nouveau_xv.c#n564
19:26 unlord: why wouldn't SSE help?
19:26 imirkin: coz you don't got it
19:26 unlord: Pentium 3 absolutely has SSE
19:26 imirkin: i meant SSE2 :)
19:26 unlord: # cpuid2cpuflags
19:26 unlord: CPU_FLAGS_X86: mmx mmxext sse
19:27 imirkin: i dunno - maybe sse has useful things for this too
19:27 imirkin: i haven't checked
19:27 imirkin: patches welcome
19:27 imirkin: you can look at how i did it for NVCopyNV12ColorPlanes
19:27 imirkin: which is a few lines below
19:27 unlord: this is inside nouveau?
19:27 imirkin: it's inside the DDX
19:27 ccr: iirc SSE is floating point stuff, SSE2 is like "MMX" but with SSE XMM registers
19:27 imirkin: (xf86-video-nouveau)
19:28 ccr: e.g. integer operations. not sure if SSE can help
19:28 imirkin: intel used to have this awesome site which let you look for their stupid intrinsics
19:28 imirkin: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
19:28 unlord: I think for display purposes we're probably fine in floats
19:28 imirkin: unlord: this is repacking planar YUV into YUYV
19:29 imirkin: SSE adds pshufw
19:29 imirkin: but ... not pshuf8
19:29 imirkin: er, pshufb
19:30 unlord: imirkin: hmm
19:30 imirkin: actually that wouldn't even help here
19:30 imirkin: what would help ... hmm
19:31 unlord: let me go see what pixman does
19:32 imirkin: the NV12 thing is for the texture method, which is not available on the earlier hardware
19:32 imirkin: the blitter needs YUYV (or UYVY, same idea)
19:33 imirkin: even the hw overlay on nv4/nv5 could only do YUYV / UYVY
19:33 unlord: imirkin: if I run this decoder in perf will this show up, or is it all in the kernel
19:33 imirkin: well, even if it were in kernel, perf would pick it up
19:33 imirkin: but this is in the ddx, i.e. a bit of software run by X
19:33 imirkin: (device-dependent X)
19:34 imirkin: based on the link it should be apparent that it's in xf86-video-nouveau...
19:34 unlord: yeah, I saw that
19:34 unlord: let me do some profiling
19:34 unlord: imirkin: even with the compositor off, dragging a window around MATE is delayed
19:34 unlord: I'll try xfce, but don't think that is the issue
19:37 imirkin: looks like _m_punpcklbw + _m_punpckhbw from MMX could help
19:54 unlord: imirkin: swscale has code for this
19:54 imirkin: should be possible to interleave U and V first, and then interleave that with the Y data
19:56 imirkin: does x86_64 provide MMX?
19:57 unlord: x86_64 has SSE3 I believe
19:57 imirkin: x86_64 is guaranteed to have SSE2
19:57 unlord: these instruction sets are backwards compatible
19:57 imirkin: which also has those ops
19:58 imirkin: but mmx is 64-bit etc
19:58 imirkin: for SSE2, i'd use _mm_unpackhi_epi8 and so on
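
A rough sketch of the two-step interleave described above (U with V first, then the result with Y), assuming SSE2 is available and falling back to a plain loop otherwise. The function name `pack_row_yuyv` and its parameters are made up for illustration and are not taken from xf86-video-nouveau; on an MMX-only CPU the same shape would presumably use `_mm_unpacklo_pi8` / `_mm_unpackhi_pi8` on 8-byte registers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: pack one row of planar Y, U, V into Y0 U0 Y1 V0 ...
 * width is in pixels and must be even; u and v each hold width/2 samples. */
static inline void pack_row_yuyv(uint8_t *dst, const uint8_t *y,
                                 const uint8_t *u, const uint8_t *v,
                                 size_t width)
{
    size_t x = 0;

    assert(width % 2 == 0);
#if defined(__SSE2__)
    /* 16 pixels per iteration: 16 luma bytes, 8 U bytes, 8 V bytes. */
    for (; x + 16 <= width; x += 16) {
        __m128i yy = _mm_loadu_si128((const __m128i *)(y + x));
        __m128i uu = _mm_loadl_epi64((const __m128i *)(u + x / 2));
        __m128i vv = _mm_loadl_epi64((const __m128i *)(v + x / 2));
        /* Step 1: U0 V0 U1 V1 ... U7 V7 */
        __m128i uv = _mm_unpacklo_epi8(uu, vv);
        /* Step 2: interleave luma with the UV pairs, low then high half. */
        _mm_storeu_si128((__m128i *)(dst + 2 * x),
                         _mm_unpacklo_epi8(yy, uv));
        _mm_storeu_si128((__m128i *)(dst + 2 * x + 16),
                         _mm_unpackhi_epi8(yy, uv));
    }
#endif
    /* Scalar tail, and the whole row when SSE2 is not compiled in. */
    for (; x < width; x += 2) {
        dst[2 * x + 0] = y[x];
        dst[2 * x + 1] = u[x / 2];
        dst[2 * x + 2] = y[x + 1];
        dst[2 * x + 3] = v[x / 2];
    }
}
```

Swapping the `u` and `v` arguments gives YVYU ordering instead. This only repacks bytes; it does no chroma filtering between rows.
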
19:58 unlord: imirkin: I'd recommend at least reading what swscale does
19:58 unlord: since this is the code that is in ffmpeg
19:58 imirkin: i'm sure it's very fancy
19:58 imirkin: and i don't need fancy
19:59 unlord: I have no idea about that, but I know it is correct
19:59 imirkin: the other code i pointed to is as fast as memcpy, at least on my cpu
19:59 unlord: and it can't be too fancy, this is probably a handful of instructions
20:01 imirkin: unlord: i only see YUYV_TO_Y/U functions
20:01 imirkin: not the other way
20:02 unlord: imirkin: which function are you looking for?
20:02 imirkin: planar yuv -> yuyv
20:03 imirkin: interesting, they only do mmx if arch X86_32
20:03 unlord: interesting
20:03 unlord: https://drmdb.emersion.fr/snapshots/5d7537bef8a8
20:04 imirkin: i know. i added it.
20:04 imirkin: both the overlay function into the kernel, and that particular snapshot into emersion's db ;)
20:04 unlord: oh cool
20:04 imirkin: however the ddx does not take advantage of hw overlays
20:04 imirkin: i started on it some time ago
20:05 imirkin: but ... well, didn't finish.
20:06 unlord: can I add my card to the db?
20:06 unlord: is it helpful at all?
20:06 imirkin: not sure - up to emersion i suppose
20:06 imirkin: he hangs out here, can opine
20:06 imirkin: you have the same board as me, right?
20:06 unlord: I assume so, this one is half height :)
20:06 imirkin: 09:00.0 VGA compatible controller [0300]: NVIDIA Corporation NV5 [Riva TNT2 Model 64 / Model 64 Pro] [10de:002d] (rev 15)
20:07 imirkin: except yours is AGP
20:07 emersion: if it's a different board or newer kernel, it would be helpful
20:07 emersion: :)
20:07 unlord: 21:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)
20:07 imirkin: oh. that's a different board :p
20:07 unlord: wait, wrong computer
20:07 linkmauve: :D
20:07 ccr: :P
20:07 imirkin: slight functionality increase over the TNT2
20:08 ccr: invokes the "the corporate needs you to find the differences between these two pictures" -meme
20:08 imirkin: that's from The Office, right?
20:09 imirkin: where creed becomes manager?
20:09 ccr: I think so, I've only run into it due to internet-cultural osmosis
20:09 unlord: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation NV5 [Riva TNT2 / TNT2 Pro] [10de:0028] (rev 15)
20:09 unlord: I guess these are different
20:11 imirkin: ooh, the Pro version!
20:12 imirkin: nothing but the best for unlord
20:12 unlord: imirkin: I just meant 10de:0028 v 10de:002d
20:12 imirkin: i know
20:12 imirkin: but the pci db says that's the TNT2 Pro
20:13 unlord: yours has Pro in the name too
20:13 imirkin: oh lol
20:13 imirkin: didn't notice that
20:13 imirkin: i guess we're both just rolling in it then
20:13 imirkin: with our fancy graphics cards
20:14 imirkin: https://www.youtube.com/watch?v=qpMvS1Q1sos
20:14 unlord: how do I send this json to you?
20:14 unlord: emersion: ^^
20:15 emersion: unlord: drm_info -j | curl -X POST -d @- https://drmdb.emersion.fr/submit
20:16 unlord: emersion: done!
20:16 emersion: eh, the command on the website is borked
20:16 emersion: unlord: thanks!
20:18 unlord: okay, so this has the same YUYV planes
20:20 imirkin: nv4/nv5 have same capabilities.
20:20 imirkin: i added support for it in 3.x kernels at some point
20:20 imirkin: so it's been around for a while now
20:20 imirkin: nv10-nv40 support NV12/NV21 as well
20:22 imirkin: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/dispnv04/overlay.c
20:23 imirkin: actually even some nv1x's and nv20 don't support NV12/NV21 ... NV17/NV18 had it though, which makes sense since they also included a hw mpeg2 decoder
20:26 imirkin: unlord: anyways, i'll try to accelerate that other function in xf86-video-nouveau
20:26 unlord: OK
20:26 imirkin: not sure when i'll get to it
20:27 unlord: I am testing the swscale conversions right now
20:27 imirkin: probably not today though
20:27 imirkin: if you can make mpv produce yuyv for the Xv overlay that will also fix it
20:27 imirkin: (i.e. YUY2 or UYVY formats)
20:27 unlord: I'm just testing this with ffmpeg
20:30 unlord: ffmpeg -v verbose -i lgr-floppy-224p30-svtav1-25k-0-opus-9k-60ms.mkv -pix_fmt uyvy422 -f null - <-- 1.33x realtime
20:31 unlord: ffmpeg -v verbose -i lgr-floppy-224p30-svtav1-25k-0-opus-9k-60ms.mkv -f null - <-- 1.82x realtime
20:31 unlord: so swscale is converting fast enough for this
20:33 imirkin: ok
20:33 imirkin: i just don't see the functions
20:33 imirkin: in swscale
20:34 unlord: let me find them
20:37 imirkin: pmoreau: friendly reminder to test the newly-split patches, whether the double-store is required or not
20:38 unlord: ffmpeg -v verbose -i lgr-floppy-224p30-svtav1-25k-0-opus-9k-60ms.mkv -pix_fmt bgr0 -f null - <-- 1.55x realtime
20:40 imirkin: so like ... ffmpeg does this for a living
20:40 imirkin: we're not really in a position to compete with that
20:40 unlord: imirkin: ffmpeg is open source, you just take the same code :)
20:41 imirkin: it's a big unnecessary dep
20:41 imirkin: i'm going to try to make it faster
20:41 imirkin: which i think will end up being comparable
20:41 imirkin: right now there's zero vectorization of this logic
20:41 imirkin: it's basically being done in the dumbest way possible
20:41 imirkin: but i dunno if that's necessarily the limiting factor
20:42 unlord: no no, you don't add the dependency, you just take the SIMD kernel
20:42 imirkin: ok. send links ;)
20:42 imirkin: also all nouveau stuff is MIT licensed
20:42 imirkin: (or BSD?)
20:42 imirkin: (is there a difference?)
20:43 unlord: https://github.com/FFmpeg/FFmpeg/tree/master/libswscale
20:43 unlord: there is a difference between BSD and MIT
20:43 imirkin: yeah, i looked in there but couldn't find it
20:43 unlord: ffmpeg is LGPL, so you'll want to write your own anyway
21:02 unlord: so how do I get my console back?
21:03 unlord: I cannot do ctrl-alt-f1 to get a console
21:04 imirkin: erm
21:04 imirkin: that should work
21:05 imirkin: does X work?
21:05 unlord: you had me do this earlier to unload the driver, echo 0 > /sys/class/vtconsole/vtcon1/bind
21:05 imirkin: but you re-loaded after that right?
21:05 unlord: yes, I reloaded the driver
21:05 imirkin: that should get the console back
21:07 unlord: ahh, restarting X fixed it
21:09 karolherbst: ahh nice.. reclocking is totally busted on the jetson nano :/ uff
21:17 imirkin: unlord: sometimes you can get it back by just switching between vt's
21:18 unlord: I tried that, but restarting X seemed to fix it
21:18 imirkin: weird
21:19 unlord: the conversion optimizations you are making would be part of the kernel, so I'd need to recompile that module again?
21:21 imirkin: no
21:21 imirkin: all userspace
21:21 imirkin: xf86-video-nouveau
21:21 unlord: oh right?
21:22 imirkin: as you saw, the code is in xf86-video-nouveau
21:22 imirkin: this is loaded by X
21:22 imirkin: i'm looking at MMX... looks like it has no way to interact with memory directly??
21:22 unlord: you just load registers
21:23 imirkin: yeah
21:23 unlord: this is the same with all SIMD
21:23 imirkin: sse2 has _mm_loadu_si128
21:23 unlord: oh heh
21:23 unlord: I don't know all the intrinsics, I know the actual op codes
21:24 imirkin: movdqu
21:24 unlord: right
21:24 imirkin: takes a memory argument
21:24 unlord: [edx]
21:24 imirkin: right
21:24 unlord: or some such, I guess you'd pass a pointer
21:24 imirkin: no such thing in mmx
21:25 imirkin: you can load edx into a mmX reg
21:25 imirkin: there's even a movq, dunno how that works
21:25 imirkin: it says "r64" arg, but how does that work on 32-bit?
21:27 karolherbst: probably not available
21:27 imirkin: mmx came out on top of pentium(1), i doubt they added a ton of stuff to it when x86_64 became a thing
21:28 unlord: https://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_RGB_YUV.pdf
21:28 unlord: top of page 5
21:28 karolherbst: "These are available when using REX.R and 64-bit mode." :)
21:28 imirkin: ah.
21:28 unlord: 1 movq mm1, [eax] ;load G2R2B1G1R1B0G0
21:29 unlord: so there must be intrinsics for that
21:29 imirkin: oh huh
21:29 imirkin: i don't see them on the website
21:29 imirkin: let's check the header file
21:30 karolherbst: the entire x86 ISA is annoying and with 64 bit they just said: hey.. all the hardware has more regs _anyway_ so just add a bunch of regs to the ISA
21:31 imirkin: unlord: hm, well i don't see it
21:32 ccr: x86 ISA is like a garbage dump. old stuff rotting under newer stuff, that's been piled over it.
21:32 imirkin: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34256
21:32 imirkin: looks like it might get generated
21:33 imirkin: probably has to be aligned memory though?
21:33 unlord: imirkin: I don't think so
21:33 imirkin: it does on sse2 unless you use the *u intrinsics
21:34 unlord: right there is movdqa
21:39 HdkR: Why looking at mmx?
21:41 HdkR: SSE2 is effectively MMX but with 128bit registers
21:41 HdkR: Plus the preq added functionality that misses some holes
21:44 imirkin: HdkR: unlord is on a p3
21:44 imirkin: and the code in question is only going to be useful for pre-shader GPUs
21:45 HdkR: ah, writing some SIMD code using MMX
21:46 imirkin: oh crap
21:46 imirkin: stupid inlining logic
21:46 imirkin: all my code got optimized out
21:46 imirkin: coz i'm using fixed inputs in the testing setup
21:47 imirkin: thar ya go
21:52 imirkin: unlord: https://paste.debian.net/1188334/
21:52 imirkin: not the worst-looking code i guess
21:52 imirkin: doesn't quite work, but that's fixable with a bit of thought though
21:53 unlord: imirkin: can you do -M intel
21:54 imirkin: unlord: https://paste.debian.net/1188335/
21:55 unlord: I don't know the structure you are pointing into
21:55 imirkin: i have separate inputs
21:56 imirkin: y, u, and v
21:56 imirkin: coming in as separate planes
21:56 imirkin: unfortunately they're also on the stack
21:56 imirkin: hence the confusion
21:56 imirkin: let me change it so they're malloc'd
21:57 unlord: movq mm1,QWORD PTR [ebp-0x7c]
21:57 unlord: I didn't think that did what you thought it did
21:57 imirkin: i
21:57 imirkin: i'm just letting gcc do its thing :)
21:58 imirkin: https://paste.debian.net/1188338/
21:58 imirkin: with y, u, v, and dest off the stack
21:59 unlord: maybe put the C code in there too?
22:00 imirkin: https://paste.debian.net/1188340/
22:00 imirkin: (it's wrong though, i'm working out how to get it to do the thing i want)
22:00 unlord: yeah, this is better
22:01 imirkin: the naive thing i did is working out byte-swapped from the result i want
22:01 imirkin: so i'm banging randomly on the keyboard
22:02 imirkin: oh hm, no, i'm just fucked. need to add another op of some kind
22:02 imirkin: let's see what ya got, intel
22:04 imirkin: nothing useful =/
22:07 imirkin: i want to swap pairs of bytes in the m64 reg.
22:07 imirkin: or alternatively bswap32() each 32-bit value
22:09 imirkin: weird.
22:09 imirkin: hm
22:10 ccr: isn't there left/right shift? something like val -> shift left -> mask -> tmp ; shift right -> mask etc
22:10 ccr: could be too expensive instruction wise tho, dunno
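
The shift-and-mask idea above, sketched in plain C on a 64-bit value. The helper name `swap_byte_pairs64` is hypothetical. With MMX registers the equivalent would presumably be 16-bit lane shifts (`_mm_slli_pi16`, `_mm_srli_pi16`) combined with `_mm_or_si64`, in which case no explicit mask is needed because the shifts do not cross lane boundaries.

```c
#include <stdint.h>

/* Hypothetical helper: swap the two bytes inside every 16-bit lane of v. */
static inline uint64_t swap_byte_pairs64(uint64_t v)
{
    const uint64_t low_bytes = 0x00ff00ff00ff00ffULL;

    /* Move each low byte up, each high byte down, then merge. */
    return ((v & low_bytes) << 8) | ((v >> 8) & low_bytes);
}
```

For example, `swap_byte_pairs64(0x0011223344556677ULL)` yields `0x1100332255447766ULL`. This covers only the "swap pairs of bytes" variant, not the per-32-bit `bswap32()` alternative.
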
22:11 karolherbst: doesn't get why this stuff is written in assembly anyway
22:12 unlord: karolherbst: because we need it to be very fast
22:12 karolherbst: ehh.. that's why you don't write it in assembly
22:12 unlord: what do you write it in?
22:12 karolherbst: C with proper intrinsics
22:12 imirkin: fpga!
22:12 ccr: javascript?!
22:12 karolherbst: you can do all the SSE stuff directly in C as well
22:13 unlord: karolherbst: it is not as fast
22:14 karolherbst: wait.. do we still don't have proper compilers? damn
22:14 karolherbst: anyway, I am sure the compiler won't be worse than this assembly code there as long as you use the intrinsics
22:14 unlord: karolherbst: it's not even that we don't have good compilers, it is that you have more context when you write the assembly
22:14 unlord: karolherbst: again, you are wrong
22:15 unlord: there is some overhead, but it is not crazy, you might get 2x faster handwriting if you pipeline the instructions properly
22:15 karolherbst: the compiler can do this as well....
22:16 karolherbst: sure, sometimes compilers do dumb shit and everything
22:16 karolherbst: but the stuff there is fairly trivial
22:16 karolherbst: and even the kernel gets rid of assembly code if there is no fundamental reason to keep it that way
22:16 karolherbst: because.. the C code is generally faster
22:16 karolherbst: just saying...
22:17 unlord: C code is generally faster than what?
22:17 karolherbst: the hand written assembly code the kernel had
22:17 karolherbst: they removed a bunch of that
22:17 karolherbst: the C code was faster
22:17 karolherbst: and easier to maintain of course
22:17 unlord: right
22:18 karolherbst: the main reason to use assembly for media stuff is that you don't want to rely on auto vectorization, because auto vectorization sucks big time
22:18 karolherbst: but you can use "C extensions" adding a bunch of those
22:19 unlord: intrinsics should be pretty good for this problem
22:20 karolherbst: yep
22:21 unlord: mpv -vo=drm <file> from the console gives me this: https://people.videolan.org/~unlord/lgr-mpv-drm.jpg
22:21 imirkin: unlord: can you find out what it does?
22:22 imirkin: e.g. get a log
22:22 unlord: imirkin: can I find out what which does?
22:22 imirkin: mpv
22:22 imirkin: mpv -vo drm
22:22 unlord: you see the output right?
22:22 imirkin: yes.
22:22 imirkin: i assume that was not the source material?
22:22 unlord: no
22:22 imirkin: can you get a log from mpv?
22:22 unlord: I'll look
22:23 imirkin: looks like wrong color format
22:23 imirkin: but it's not completely wrong, since you e.g. see the C:\>LAZY thing, presumably an element of the source
22:24 unlord: it is an element
22:24 unlord: source here: https://themayhaks.com/lgr-floppy-av1/lgr-floppy-224p30-svtav1-25k-0-opus-9k-60ms.mkv
22:24 imirkin: maybe it's assuming that something works, and we don't check sufficiently hard that it's not there
22:25 imirkin: Cannot find codec matching selected -vo and video format 0x31305641.
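
The hex value in that message reads as a little-endian FourCC. A small sketch for decoding such values, assuming the usual convention that the first character sits in the lowest byte; `fourcc_to_str` is a made-up helper name, not an mpv or libdrm function.

```c
#include <stdint.h>

/* Hypothetical helper: unpack a little-endian FourCC into a C string. */
static inline void fourcc_to_str(uint32_t fourcc, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((fourcc >> (8 * i)) & 0xff);
    out[4] = '\0';
}
```

Applied to `0x31305641` this gives the string `AV01`.
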
22:26 unlord: imirkin: https://paste.debian.net/hidden/57f5b025/
22:27 imirkin: hmmmm
22:28 imirkin: not completely clear what it's trying to do
22:28 imirkin: looks like it's converting to bgra
22:28 unlord: yeah, and upsampling
22:28 unlord: I'm going to see if I can disable that
22:28 imirkin: well
22:28 imirkin: the primary plane can't take yuv
22:28 imirkin: and the overlay plane can't take bgra
22:29 imirkin: let me glance at their drm backend
22:30 imirkin: unlord: it might be trying to convert to bgr10x2 instead of bgr8?
22:30 imirkin: i guess i dunno. https://github.com/mpv-player/mpv/blob/master/video/out/vo_drm.c#L584
22:30 unlord: there is a lot of black
22:31 imirkin: looks like it doesn't try to do any funny business with overlays/etc
22:32 imirkin: do you see any errors in dmesg?
22:32 imirkin: although it's a moving image, right?
22:32 imirkin: oh wait....
22:33 imirkin: hm, no
22:33 unlord: it is a video, yes
22:33 imirkin: it does do drmAddFB2()
22:33 imirkin: i mean, when you play it, the image moves
22:33 imirkin: it's not stuck on a single frame?
22:33 imirkin: i do think we support xrgb8 though
22:34 unlord: yes
22:34 unlord: when I play it, I get frames, it is pretty jerky though
22:34 imirkin: the only thing i can think of is that something else is getting messed up
22:34 imirkin: i.e. the images are bogus
22:34 imirkin: due to some of the conversion that's happening
22:35 imirkin: i know that's a cop-out explanation
22:35 unlord: and yes, we do have xrgb8888
22:35 imirkin: but you can test it all out with the 'modetest' tool
22:35 unlord: imirkin: this is the same mpv I was using to play the file under MATE
22:36 imirkin: but it wasn't doing all the same conversions
22:36 imirkin: like are you sure that yuv -> bgra thing works as expected?
22:36 unlord: https://people.videolan.org/~unlord/parvus-mpv-nouveau.png
22:36 unlord: I am not sure, no
22:38 imirkin: it's an interesting point that there's a ton of blue in that pic
22:38 imirkin: and almost no other colors
22:38 unlord: there are some primaries
22:38 unlord: red, green blue
22:38 imirkin: yeah
22:38 imirkin: i dunno what would cause this
22:38 imirkin: if mpv thing were smart, it could make use of the overlay function
22:39 imirkin: which would actually give you free scaling too
22:39 unlord: the mpv guys are pretty smart
22:39 imirkin: :p
22:39 imirkin: you know what i mean.
22:39 unlord: imirkin: to rule it out, I'm going to build mpv from scm
22:40 imirkin: unlord: try it on your desktop
22:40 imirkin: should be the same path
22:40 unlord: with drm?
22:40 imirkin: ya
22:40 unlord: I'll try it with my laptop :)
22:51 unlord: imirkin: just tried on my laptop with mpv -vo=drm <file> and it worked
22:51 unlord: same version of mpv
22:52 imirkin: grr
22:53 unlord: using mpv from git on the P3 is still broken the same way
22:53 imirkin: 32-bit vs 64-bit? i dunno man
22:54 imirkin: like ... showing buffers really should work
22:54 imirkin: which is all vo_drm is doing
22:56 unlord: can you try this on your TNT2 ?
22:58 imirkin: with difficulty
22:59 imirkin: it's not hooked up to anything
23:03 imirkin: and the monitor i have which can do vga is in a box presently
23:04 unlord: it would be good to rule out my hardware being broken
23:04 unlord: I just did everything under xfce4, and I'm getting the same window tearing
23:07 imirkin: lol, i'm just an idiot
23:07 imirkin: it was getting byteswapped because it was taking the BE path in the code
23:08 unlord: https://people.videolan.org/~unlord/parvus-xfce4-term.jpg
23:09 imirkin: whoa
23:09 imirkin: i like it
23:09 imirkin: i've never seen anything like it
23:11 imirkin: unlord: https://paste.debian.net/1188351/
23:11 imirkin: seems pretty reasonable, right?
23:11 imirkin: should i do the stores immediately after doing the unpacks?
23:13 imirkin: (obviously needs to be adapted to looping over the whole image, but that's later)
23:14 unlord: what is the end format?
23:15 imirkin: yuyv
23:15 unlord: Y0V0Y1U0 Y2V1Y3U1 Y4V2Y5U2 Y6V3Y7U3 ?
23:16 imirkin: yes, that sounds right
23:16 unlord: __m64 mm_uv1 = _mm_unpacklo_pi8(*mm_v, *mm_u); <-- are these swapped then?
23:16 imirkin: i thought they were. but it matches what the current function does when i do this
23:16 imirkin: i guess it's yvyu
23:16 unlord: which should it be?
23:16 imirkin: whatever the current code does
23:17 imirkin: i dumped the current code, and am matching what it produces
23:18 unlord: your card has two output formats: https://drmdb.emersion.fr/snapshots/5d7537bef8a8
23:18 unlord: YUYV and UYVY
23:18 imirkin: that's not completely relevant
23:18 imirkin: since this is not being used
23:18 imirkin: but more importantly, perhaps i've just named "u" and "v" backwards
23:19 unlord: sheesh
23:19 unlord: only you would know, the function header is missing :)
23:19 imirkin: what matters is that it produces the same data as the existing function
23:19 imirkin: * @param src1 source buffer of luma
23:19 imirkin: * @param src2 source buffer of chroma1
23:19 imirkin: * @param src3 source buffer of chroma2
23:19 imirkin: i'm sure that clears things up.
23:19 unlord: it is what I assumed
23:19 imirkin: it has to do with wtf YV12 is
23:19 imirkin: which is the source for this
23:20 imirkin: that's probably how YV12 differs from I420
23:20 imirkin: I420 = Y, U, V, and YV12 = Y, V, U
23:20 imirkin: or something idiotic like that
23:20 imirkin: YV12 is exactly like I420, but the order of the U and V planes is reversed.
23:20 imirkin: from https://wiki.videolan.org/YUV
23:24 unlord: > This format requires 4x8+8+8=48 bits per 4 pixels, so its depth is 12 bits per pixel.
23:24 unlord: nobody I know would call that 12-bit, but whatever :)
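
To make the plane-order difference and the 12-bits-per-pixel arithmetic concrete, here is a sketch that computes plane offsets for a tightly packed 4:2:0 buffer. It assumes even dimensions and no row padding, which real Xv images may not satisfy since pitches can be padded; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout description: byte offsets of each plane. */
struct yuv420_layout {
    size_t y, u, v; /* plane start offsets */
    size_t size;    /* total buffer size in bytes */
};

/* I420 stores Y, U, V; YV12 stores Y, V, U. Everything else is the same. */
static inline struct yuv420_layout yuv420_layout(size_t w, size_t h,
                                                 int is_yv12)
{
    struct yuv420_layout l;
    size_t luma, chroma;

    assert(w % 2 == 0 && h % 2 == 0);
    luma = w * h;               /* one byte per pixel */
    chroma = (w / 2) * (h / 2); /* one byte per 2x2 block, per plane */

    l.y = 0;
    if (is_yv12) {
        l.v = luma;
        l.u = luma + chroma;
    } else {
        l.u = luma;
        l.v = luma + chroma;
    }
    l.size = luma + 2 * chroma; /* w*h*3/2 bytes, i.e. 12 bits per pixel */
    return l;
}
```

For a 4x4 image the total is 24 bytes for 16 pixels, which is where the "12 bits per pixel" figure comes from even though every sample is 8 bits.
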
23:27 imirkin: i did run into someone who was trying to say that NV12 was a 12bpc format
23:27 imirkin: and i was like ... no.
23:27 imirkin: er, 12bpp
23:28 imirkin: (and who thought that the NV12 name came from that)
23:37 imirkin: oh fk
23:37 imirkin: that function actually does math as well
23:37 imirkin: you can't just do straight 4:2:0 -> 4:2:2 conversion, duh
23:37 imirkin: the math it's doing seems wrong though...
23:38 imirkin: it's trying to smooth things out
23:38 imirkin: i guess that makes some sense
23:38 imirkin: i.e. for odd rows, V = (V_above + V_below) / 2
23:42 imirkin: and there's no easy way to do this with mmx ... i can add, but then it'll overflow
23:43 linkmauve: It probably depends on where the chroma sample is, some formats put it in the middle of the four pixels, some at the top left.
23:45 imirkin: i guess i can unpack together with a 0, do the math, and then repack
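
A scalar sketch of the widen, add, halve, narrow approach for averaging two chroma rows. The function name is hypothetical, and whether the existing driver code truncates or rounds is not stated in the discussion, so the truncating shift here is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = (above[i] + below[i]) / 2 without overflow. */
static inline void average_chroma_rows(uint8_t *dst, const uint8_t *above,
                                       const uint8_t *below, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Widen to 16 bits first: 200 + 100 would wrap to 44 in 8 bits. */
        uint16_t sum = (uint16_t)above[i] + (uint16_t)below[i];
        dst[i] = (uint8_t)(sum >> 1); /* truncating average */
    }
}
```

A vector version would follow the same steps: unpack each 8-byte group against zero to get 16-bit lanes, add, shift right by one, and pack back down. The SSE `pavgb` instruction (`_mm_avg_pu8`) averages bytes directly but rounds up, so it would not be bit-identical to a truncating average.
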