08:54Yuri6037: Hi, I don't know if it is the right place but I'm trying to get video transcoding working using VAAPI and an old GeForce 8400 GS (chipset NV98). According to the nouveau website it should be supported with VP3 and should encode H264 properly. It turns out that no matter what I try with ffmpeg, I'm always prompted with "Failed to initialize VAAPI connection: -1
08:54Yuri6037: (unknown libva error)."
08:56RSpliet: Yuri6037: off the top of my head I don't think NV98 will encode any H264, just decode
08:58Yuri6037: The current command I'm using for testing is just "ffmpeg -vaapi_device /dev/dri/renderD128" and this already fails with error -1
08:58Yuri6037: I'm not even trying to encode or decode, just trying to initialize a VAAPI connection. But this already fails with error -1
08:59Yuri6037: Also the kernel driver shows weird errors about a missing BIOS and also shows "failed to create encoder 0/1/0: -19"
09:00RSpliet: Did you install the video decoder firmwares?
09:01Yuri6037: in /lib/firmware/nouveau a lot of nvXXX_fucYYY
09:01Yuri6037: even nvXX_fucYYY and even some vuc
09:01RSpliet: nv98_vp, nv98_ppp and a few similar
09:02RSpliet: the vuc-ones may be relevant too, they do mention vp3.
09:02Yuri6037: Just one thing: I could not follow the tutorial exactly, as the nvidia driver it uses is incorrect for my GPU and my system. I have an amd64 kernel so I downloaded nvidia for amd64
09:02Yuri6037: Yes they mention vp3 and even vp4
09:03RSpliet: this is nouveau, not NVIDIA's official driver. they're two separate things. First choose which of the two you're trying to get to work
09:03Yuri6037: I mean for firmware extraction
09:03RSpliet: Nouveau it is
09:03RSpliet: Please paste your full dmesg onto some paste website and share the link here
09:03RSpliet: Don't grep
09:04RSpliet: Second: please take a look at the output of vainfo
09:04Yuri6037: vainfo does not exist on my system I tried all I could and there are no packages for vainfo
09:05Yuri6037: I'm running CentOS 8 and finding packages there is very complicated. Most sites require registration but this server is local only
09:08Yuri6037: Here is the pastebin for the entire dmesg log: https://pastebin.com/mqMZqhpZ
09:09Yuri6037: And also yes the motherboard is an old one with 1 weird USB port that randomly cuts current and a power block that is unable to power USB3 PCI extension in addition to 4 HDDs + 1 SSD
09:10Yuri6037: At least when no usb is connected it runs fine and has good ethernet so I use it to make a home server
09:10RSpliet: Logs look fine for nouveau. Good!
09:10Yuri6037: So why would VAAPI initialization fail then?
09:13Yuri6037: I can try to install vainfo for CentOS 7, sometimes COS7 packages work
09:14RSpliet: See if you can get libva-utils from CentOS EPEL
09:14RSpliet: Odd it's not in CentOS 7 EPEL...
09:14Yuri6037: The thing is it has been removed in COS8
09:15RSpliet: happen to know why?
09:15Yuri6037: I don't know why but a lot of drivers and useful tools related to hardware acceleration are all removed from COS8, forcing system admins to stay on 7. Except that I had issues with COS7 so I needed COS8
09:16RSpliet: Right. Are you tied to VAAPI, or can you use VDPAU too?
09:16Yuri6037: Jellyfin server only runs VAAPI
09:16Yuri6037: it uses ffmpeg which actually wants VAAPI not VDPAU
09:16RSpliet: and vaapi-vdpau-driver is installed?
09:18Yuri6037: I have one match, libva-vdpau-driver, and it's not installed
09:19RSpliet: don't know whether that's useful tbh, sounds like it could translate vaapi to vdpau in case "native" support doesn't work
09:19RSpliet: Anyway, back to your original problem. having vainfo would be useful, worst-comes-to-worst just compile it yourself
09:19RSpliet: libva-utils probably doesn't have a gazillion dependencies
09:20Yuri6037: well it does need X, and all kinds of tools are kind of hard to install on a headless server
09:20Yuri6037: It's SSH and Cockpit only
09:21RSpliet: You can try doing it on COPR instead. Bit more work, but shouldn't be too troublesome if you grab an rpmspec from Fedora or CentOS7 and just tick the "build for CentOS 8" box
09:21Yuri6037: I will try downloading the one from COS7
09:22RSpliet: If you're lucky, karolherbst can help you with COPR stuff a bit. I've used it before, it's not hard but finicky
09:25Yuri6037: I have failed dependencies: libva.so.1, libva-drm.so.1, libva-x11.so.1 and also libva.so.1(VA_API_0.33.0)
09:27Yuri6037: Is this the correct link for compiling: https://github.com/intel/libva-utils
09:31Yuri6037: As expected same problem libva-drm is missing
09:32RSpliet: That's upstream, sounds about right.
09:32RSpliet: I think questions like this might be better directed at the CentOS people. We don't really do distro-support here (although you're lucky that three nouveau devs work for Red Hat ;-))
09:32RSpliet: And I just happen to be running Fedora for 10+ years
09:33tagr: karolherbst: so this Mesa issue is thoroughly confusing
09:33tagr: karolherbst: I do have a patch which implements the detiling blit workaround and that seems to work, at least partially, for something like kmscube with modifiers support disabled
09:34Yuri6037: I like CentOS a lot for servers as it's lightweight and runs better than Ubuntu Server however I hate the removal of like thousands of important packages
09:34tagr: karolherbst: unfortunately it doesn't work with X (< 1.20.99) because apparently glamor doesn't use the same code paths
09:35tagr: karolherbst: so basically for the detiling blit I rely on gallium's ->flush_resource() getting called at some point on the framebuffer, which works pretty much everywhere (kmscube, Weston, ...) except glamor
09:35tagr: but glamor is really the only case where this is problematic
09:36tagr: I'll have to keep looking, but I find it strange that there's apparently no way for glamor to flush resources and I'm wondering if something is just missing
09:36Yuri6037: I cannot install it there are no packages for libva-drm
09:37Yuri6037: I need a vainfo that does not require libva-drm
09:43karolherbst: tagr: could be a glamor bug as well
09:47tagr: karolherbst: yeah, I'm wondering if perhaps glamor always assumes that buffers will be local to the GPU and therefore they don't need to be flushed
09:47Yuri6037: ok vainfo does not build on my system
09:47tagr: karolherbst: I'll look into that, but I'm starting to really dislike the de-tiling blit workaround because early experiments indicate that performance suffers severely, slashing fps in something like kmscube to about 1/3
09:48Yuri6037: VAConfigAttribPredictionDirection undeclared
09:48RSpliet: Yuri6037: libva-drm is probably required to get VAAPI working on nouveau
09:48karolherbst: tagr: I think glamor tries very hard to not flush as this can hurt perf a lot
09:48karolherbst: X doesn't have a concept of "this frame is done now"
09:48RSpliet: Sounds like the kind of package you'd need.
09:48karolherbst: so you don't know when to flush for the entire frame
09:48karolherbst: so either you flush after every 2D operation (goodbye perf) or you try to not flush at all
09:48Yuri6037: I installed libva-devel and it found libva-drm, which is weird, but ./configure stopped with an error
09:48karolherbst: at least that's my understanding of glamor
09:49RSpliet: (drm -> Direct Rendering Manager, Linux' OSS graphics subsystem. Not to be confused with Digital Rights Management)
09:49Yuri6037: Oh ok
09:49tagr: karolherbst: couldn't we at least flush before presentation? at least that should be a known point in time
09:49karolherbst: not sure
09:49karolherbst: I never really looked into glamor's code, so maybe that should rather be discussed with devs involved
09:49Yuri6037: but this does not fix the undeclared identifiers problem
09:49karolherbst: I don't even know if other SoCs have the same issue or not
09:50Yuri6037: I'm sure vainfo does not support libva 2.5
09:50RSpliet: Yuri6037: that sounds like a version mismatch indeed
09:50Yuri6037: and COS 8 only has libva2.5 no ways to get 1.X
09:51RSpliet: Grab the v2.5 branch
09:54Yuri6037: Finally got it built but things are not better: I get a nice big init error, there is no nouveau driver apparently
09:54Yuri6037: libva info: VA-API version 1.5.0 libva info: va_getDriverName() returns 0 libva info: Trying to open /usr/lib64/dri/nouveau_drv_video.so libva info: va_openDriver() returns -1 vaInitialize failed with error code -1 (unknown libva error),exit
09:55Yuri6037: Here as pastebin: https://pastebin.com/h0Chn22U
09:55tagr: karolherbst: one thing that I don't quite understand yet is why X seems to work fine when the force-linear patch is reverted
09:56Yuri6037: So it's like if nouveau was not installed
09:56tagr: karolherbst: I think that would require for Nouveau to end up allocating a pitch-linear buffer in that case because otherwise Tegra DRM wouldn't be able to properly display it
09:57karolherbst: yeah.. might be
09:57karolherbst: what's interesting is, that nouveau doesn't always allocate a linear buffer when the scanout flag is set
09:57tagr: karolherbst: but if that's the case, then I wonder if perhaps we're just doing something wrong in the force-linear case because that also ends up allocating a pitch-linear buffer that we can then scan out
09:58Yuri6037: Interesting, when going to the advertised path: nouveau_drv_video.so does indeed not exist but there's instead a nvidia_drv_video.so, why?
09:58tagr: so I think perhaps the reason why glxgears doesn't work in the force-linear case is because we end up allocating some weird type of render/depth buffer combination that cause the errors whereas without that patch it ends up allocating a variant that works fine and which is still linear
09:59karolherbst: tagr: so what I figured out the difference was that tegra sets the linear modifier when bind & scanout, nvc0 only when bind & PIPE_BIND_LINEAR
09:59karolherbst: I don't know how much that matters though
10:01karolherbst: it might also just be that nouveau "just works by chance" and it just works out in the end
10:01karolherbst: imirkin also mentioned that it looks a bit strange that we don't always do it for scanout as well
10:03Yuri6037: RSpliet: Do you know why there's no nouveau_drv_video.so?
10:10tagr: karolherbst: well, I think it makes a lot of sense to not do pitch-linear for scanout if you don't have to
10:10tagr: karolherbst: my understanding is that rendering to a pitch-linear buffer is pretty bad for performance
10:11tagr: karolherbst: and if your scanout engine can do any of the block-linear formats that you can render to, then there's no reason to do pitch-linear
10:11tagr: karolherbst: the DRI3 loader has code to set __DRI_IMAGE_USE_LINEAR when you're rendering on a different GPU than you're scanning out
10:13tagr: if you don't have framebuffer modifiers, then that's exactly the case where you need it because you have to assume that at least everyone will understand pitch-linear
10:15tagr: so I suppose one more thing I could try would be to instead of allocating with only DRM_FORMAT_MOD_LINEAR, do a regular resource_create() but with bind |= PIPE_BIND_LINEAR;
10:17karolherbst: mhhh, that might work actually
10:19tagr: yeah... it should avoid the need to do a de-tiling blit as well
10:20tagr: not fully optimal because it'd still be rendering to a pitch-linear buffer, but I'd expect "render to pitch-linear" to still be more efficient than "render to block-linear & blit to pitch-linear"
10:20tagr: but it should be the best we can do under the circumstances
10:20tagr: the best-case is still going to be full framebuffer modifiers support
10:22AndrewR: Yuri6037, probably because mesa was not compiled with va state tracker ?
10:22karolherbst: pmoreau: "../meson.build:1461:4: ERROR: Problem encountered: SPIRV-LLVM-Translator requirements on a minimum LLVM version (>= 0.0) are not compatible with existing requirements (>= 3.9.0)." :p
10:23karolherbst: pmoreau: do you have pending local changes? otherwise I'll fix and push
10:24Yuri6037: ok so in fact I think I'm screwed then mesa comes with COS8... Which would mean the COS8 team made all they could so that you cannot use VAAPI nice...
10:24Yuri6037: making graphics cards useless
10:25AndrewR: Yuri6037, they probably do this due to stability/API issues ....
10:26Yuri6037: then explain why COS7 had it?
10:26AndrewR: Yuri6037, you probably can try to edit spec and rebuild mesa ....
10:27karolherbst: tagr: I wouldn't say that render to pitch linear would be more efficient if you render a lot but only blit once, but oh well... the future is modifier aware anyway and if you care about perf you can probably just enable modifiers
10:27AndrewR: Yuri6037, they tried it, and get drowned in bug reports ?
10:27Yuri6037: yeah and then it's going to need other libraries and even maybe recompile the kernel, I'm not sure if I want to spend months trying to rebuild all the libraries that come with COS8
10:28AndrewR: Yuri6037, I have Slackware install with similar card now, some VA-api applications work (old mplayer va-api), and CinelerraGG (with ffmpeg git) doesn't work (it works with vdpau, with hack in mesa and artefacts ..)
10:29AndrewR: Yuri6037, yeah, graphics stuff can be quite bleeding-edge ...
10:42tagr: karolherbst: huh... interesting
10:42tagr: looks like glxgears ends up allocating its render buffer using PIPE_BIND_SCANOUT | PIPE_BIND_SHARED
10:42tagr: even though it's not actually a buffer that's scanned out directly but rather composited onto the screen
10:45karolherbst: pmoreau: too late, now I will fix all the review :p
10:56karolherbst: pmoreau: ehh.. that one patch is just overly complicated...
11:12karolherbst: pmoreau: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/c77aadc1d86a766e477a02248656ff859b6b20ca
11:12karolherbst: what do you think?
11:12karolherbst: that block kind of seemed pointless
11:13karolherbst: tagr: I guess that's for legacy reasons :p
11:20tagr: karolherbst: this seems to fix the glxgears crash for me: http://ix.io/2uwq
11:20karolherbst: pmoreau: pushed it :) this is now super clean and quite simple
11:21karolherbst: what you were doing was already good, but the one block checking llvm req vs llvm req wasn't needed at all
11:21karolherbst: it would have just caused an error when eg radv moved to llvm 10
11:21AndrewR: karolherbst, I think I saw something in git history about multiple LLVM versions installed on same system ....
11:21karolherbst: without actually breaking anything
11:22karolherbst: AndrewR: sure, but the block was broken as it was a _req_ check, not a dependency check
11:22karolherbst: so if radv would have requested 10, meson would error
11:22karolherbst: no matter what was installed
11:22AndrewR: karolherbst, ok ... (meson still 'new' for me)
11:24karolherbst: and in case you have multiple versions of llvm installed, you still have just one llvm-config which gets picked up
11:24karolherbst: meson can't do any magic here
11:48karolherbst: tagr: heh... but I doubt we could do that..
11:48karolherbst: or maybe we can? dunno :p
11:49tagr: karolherbst: can you give the above patch a try? that fixes all the use-cases for me
11:49tagr: karolherbst: it's entirely possible that we can't do that for some reason
11:49tagr: but if we can't, I don't think we can reasonably fix this problem
11:50tagr: that's because glxgears ends up allocating its buffers via the dri3 loader and that in turn unconditionally passes USE_SHARE | USE_SCANOUT
11:50tagr: and I don't think that's actually correct, because as far as I know, none of the glx buffers actually are directly scanned out
11:50tagr: isn't it the job of the X server to composite them onto the screen?
11:51karolherbst: you don't have to composite
11:52tagr: how would you do it otherwise? X is always going to have to display the root window and applications on top, no?
11:53karolherbst: I am not 100% on how all of this works but I'd assume there are good reasons those flags got added, but maybe not? I am just very reluctant messing around with the loader
11:53karolherbst: maybe daniels knows?
11:54tagr: okay, I'll reply with my findings to the thread on dri-devel mailing list
11:54tagr: karolherbst: thanks!
11:55karolherbst: I will check why those flags got added
11:55daniels: I don't do DRI3 :P
11:56daniels: I can't tell you why it was added, but I can guess that fullscreen performance was the most important thing, so it sets USE_SCANOUT in the hope it'll be that
11:56karolherbst: so a fullscreen application can directly scanout in X?
11:57karolherbst: applications can disable compositing and stuff
11:57karolherbst: tagr: mind checking if glxgears -fullscreen works?
11:57karolherbst: or is behaving differently?
12:07tagr: karolherbst: hm... I can't seem to get glxgears to run completely standalone
12:08tagr: karolherbst: if I do xinit glxgears -fullscreen it always starts an xterm along with it
12:08tagr: and it doesn't actually run in fullscreen mode
12:08karolherbst: maybe just run "Xorg" and do DISPLAY=:0 glxgears -fullscreen in another session?
12:10tagr: karolherbst: yes, that works
12:10tagr: doesn't run exactly fast, but works
12:11karolherbst: does it run faster with those flags added back?
12:11karolherbst: or I guess it will just crash then
12:11karolherbst: let me play around with it a little
12:11tagr: yeah, it'll just crash in that case
12:13tagr: I can't see anything in the git log as to why this was added, it's always been there in the dri3 loader helpers, as far as I can tell
12:16karolherbst: ehh... mhh
12:16karolherbst: right.. reclocking is broken on the nano :/
12:17karolherbst: or volting or whatever.. :D
12:17tagr: yeah... I think there's support for reclocking on the Jetson TK1 but not on anything newer
12:17karolherbst: let's see how high I can go without breaking
12:17karolherbst: it kind of works
12:17karolherbst: but.. not that great
12:18karolherbst: probably doesn't help that the volting stuff is just broken on my SKU :)
12:18tagr: karolherbst: recent kernels do support EMC frequency scaling, which should remove the memory as bottleneck
12:19karolherbst: I got around 33 fps at FHD at least
12:19karolherbst: tagr: but I also have this error "Tegra210: unknown SKU 0x8f" and I think it messes up the speedo selection
12:19karolherbst: also "tegra-i2c 546c0000.i2c: missing controller clock" mhhh
12:20karolherbst: oh well
12:21karolherbst: btw.. HDMI is also a bit broken :(
12:21karolherbst: sometimes it just doesn't work
12:21tagr: glxgears at 2160p runs at ~12 fps for me
12:22karolherbst: I doubt glxgears would be able to bottleneck on the gpu speed anyway
12:22tagr: at 1080p I can get it to 60 fps (probably more) with 1.6 GHz memory clock
12:23karolherbst: how can I increase the memory clock?
12:23tagr: how does HDMI "not work"? I haven't seen any issues with it since a very long time
12:23tagr: karolherbst: you can't =P
12:23karolherbst: just stays black until I modeset again
12:23karolherbst: tagr: :/
12:23tagr: karolherbst: the joys of a closed source firmware and long release cycles
12:24karolherbst: mhh.. yeah.. reclocking is busted :/
12:24karolherbst: the higher clock does increase perf a little though
12:24karolherbst: so the FECS is dying :/
12:24tagr: karolherbst: so basically there's a kernel driver to do the memory frequency scaling, but it requires tables to be passed to the kernel via DTB from the firmware
12:24karolherbst: maybe that's also because of high framerate...
12:24karolherbst: sounds annoying
12:25tagr: and of course our downstream firmware did this in a completely non-standard way, so I've had to add that, but it's proving a bit difficult to get any traction to integrate that into a release
12:25tagr: what with Tegra210 being a very old chip by now...
12:26karolherbst: mhh.. I could set up the engine idle counters and see if it affects anything..
12:26karolherbst: but I think this fullscreen perf thing could be a concern
12:28tagr: agreed, especially since the chip supports up to 2160p
12:29tagr: I don't think memory bandwidth is always the issue, I think we also need to address reclocking
12:29tagr: I vaguely recall that it might have worked at some point, but I might be confusing with Jetson TK1
12:30tagr: I do recall running things like ioquake, dhewm3 and SeriousEngine at fairly okay framerates back when I originally merged this Mesa driver
12:30tagr: but that could've been Jetson TK1, not sure
12:34karolherbst: I think it worked
12:34karolherbst: I used to run at like the third highest clock
12:34karolherbst: but there were always random issues I think?
12:35tagr: hmm... yeah, highest clock rate doesn't work for me
12:35tagr: there's something called DVFS that we don't do pretty much at all upstream
12:35karolherbst: but that doesn't really cause issues
12:35karolherbst: or not.. directly
12:36tagr: that's "dynamic voltage and frequency scaling" and the idea is to scale the voltage of different rails depending on the frequency of the components powered by those rails
12:36karolherbst: ehh, you mean the basic volting thing?
12:36karolherbst: right, that is required :p
12:36karolherbst: and we do it on non tegra gpus
12:36tagr: if I switch to the highest rate, it starts erroring out pretty badly with things like this: [ 6896.090299] nouveau 57000000.gpu: bus: MMIO read of 00000000 FAULT at 100c80 [ TIMEOUT ]
12:37tagr: I think that's basically a GPU hang
12:37tagr: I think on Tegra the DVFS thing works differently from how it's done on the dGPU
12:38RSpliet: tagr: Main difference is not having to worry about the DRAM clock I believe?
12:40tagr: RSpliet: yeah, since there's no dedicated video memory, the DRAM clock is basically the EMC clock and we handle that separately
12:41tagr: more centrally because it needs to take into account things like display, USB, SATA, PCIe, ...
12:41tagr: but the other big difference is that there are no on-GPU knobs to control the voltage for the GPU, that's all done in a separate IC (usually some type of I2C- or PWM-controlled regulator)
12:43RSpliet: Ah, yeah. I always interpreted that as "less work for me", but I guess nouveau would have to interact with the pinctl driver or whatever handles that these days
12:43RSpliet: if it has specific requests
12:43tagr: we do wire up that regulator in DT, and Nouveau also seems to be using it, but I don't know if it's perhaps missing some other information (perhaps some sort of frequency -> voltage mapping) to know what to set
12:44tagr: nvkm/subdev/volt/gk20a.c has that implementation
12:44RSpliet: Yeah, possibly. I think the upstream tegra driver had that info hard-coded
12:45RSpliet: There's no pesky OEM in the middle :-P
12:45tagr: karolherbst: running at pstate 5 gives me roughly 50 fps at 2160p with glxgears -fullscreen, so that's pretty okay
12:45karolherbst: tagr: I am actually more concerned about high end systems where this could be an issue :D
12:45karolherbst: but yeah...
12:46karolherbst: better to discuss that upstream and see what people think
12:47tagr: karolherbst: from what I can tell, all the code paths are actually the same with or without the patch
12:48tagr: oh... hang on... I see these: [ 572.705917] [drm:drm_internal_framebuffer_create [drm]] could not create framebuffer
12:49tagr: need to look into that, looks like it's trying to create a framebuffer, for each frame, which means it's likely trying to page-flip rather than composite
12:54tagr: ah... oh boy...
13:18tagr: hm... so that framebuffer creation failure seems to be because we're not properly exporting the resources and then the modesetting driver can't page-flip to it, which I think will cause it to composite on top of the root window
13:19tagr: but this also means that for some cases, we do indeed end up with needing __DRI_IMAGE_USE_SCANOUT
13:19tagr: but we don't really have a way of distinguishing between the two cases
14:43karolherbst: somebody mind taking a look at the nouveau patches in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6367 ?
14:44karolherbst: pmoreau maybe?
14:44karolherbst: with that a bunch of stuff works now ;)
14:45karolherbst: tagr: yeah.. I honestly don't know what to do here. I think requiring modifier aware clients _might_ be a solution, but today it just breaks too much :/
14:45karolherbst: but I also don't really see a good way of fixing it
14:45karolherbst: the shadow buffer thing might be an idea, but probably also very annoying
14:46karolherbst: but I think if we try to render into tiled buffers as much as possible and only blit to linear in the end that's probably preferred over always rendering into linear