09:31jb0IEUQ5aTx: whether booting from live media or from a freshly installed Ubuntu disk, we wanted to know whether the default video/graphics driver loaded on machines with NVIDIA graphics cards is the free-software nouveau driver or the proprietary nvidia driver?
11:40karolherbst: jb0IEUQ5aTx: no clue, that's not up to us to decide
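The question above can at least be answered empirically on any given machine. A minimal sketch, assuming a Linux system where loaded kernel modules are listed in /proc/modules with the module name in the first column; the function name `active_nvidia_driver` is hypothetical and is not part of any distribution tooling:

```python
from typing import Optional


def active_nvidia_driver(proc_modules_text: str) -> Optional[str]:
    """Return 'nvidia', 'nouveau', or None, given the text of /proc/modules.

    Assumption: the proprietary kernel module is named 'nvidia' and the
    free-software one is named 'nouveau'.
    """
    # The first whitespace-separated field of each line is the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    for name in ("nvidia", "nouveau"):
        if name in loaded:
            return name
    return None
```

On a running system the function would be fed the contents of /proc/modules, for example `active_nvidia_driver(open("/proc/modules").read())`. It only reports what is loaded right now, not what a given Ubuntu image would pick by default.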
11:42jb0IEUQ5aTx: ok, if we needed to use ffmpeg to do hardware-accelerated encoding using the free-software nouveau driver versus proprietary-nvidia/nvenc, what is the hardware-accelerated encoder to use (instead of *_nvenc)?
11:47karolherbst: nouveau doesn't support hw accelerated video encoding
11:50jb0IEUQ5aTx: is that because Nvidia does not provide documentation about their GPUs - making it much harder for the nouveau developers?
11:50RSpliet: Plus bigger fish to fry, plus a tiny team
11:51RSpliet: Oh and lack of redistributable firmware for the hw encoder
11:52karolherbst: and more important things to work on than reverse engineering the hardware encoder sadly
11:56jb0IEUQ5aTx: unbelievable that Nvidia has not changed their posture on this issue for over a decade now...
11:56karolherbst: they have, but just a little
11:56karolherbst: video encoding is just not important enough so there were other things to focus on
11:57karolherbst: they do release firmware for enabling hardware acceleration in general (for OpenGL, CL, etc...) or do provide basic documentation now: https://github.com/NVIDIA/open-gpu-doc
11:58karolherbst: so.. maybe in the future it will be better? who knows
11:59karolherbst: ohh actually, there is some nvdec docs: https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/nvdec_drv.h
11:59karolherbst: but without the firmware....
12:06jb0IEUQ5aTx: right, but they even shortchange their proprietary driver users - by allowing only up to 2 (or 3) simultaneous hardware-accelerated encodings (apparently, unless you spend thousands for their higher-end cards): https://github.com/keylase/nvidia-patch
12:27karolherbst: jb0IEUQ5aTx: it's actually a hw limitation
12:28karolherbst: so the only GPUs with 7 instead of 3 or 1 nvenc engines are gp100 and gv100
12:28HdkR: that github repo is fun since it removes some of the limitations. Depending on the hardware's capabilities of course
12:28karolherbst: not sure if the engines can do multiple jobs in parallel though
12:29karolherbst: but as far as we know only gv100 and gp100 have more than 3 nvenc engines
12:30HdkR: That's the issue yea. Parallel encodes are software limited for whatever reason :|
12:32HdkR: Limit on number of encodes doesn't make much sense since it's just a product of how hard you're pushing the encoders. I can only assume it is to reduce consumer confusion
12:32HdkR: But 3 encodes at 8k != 3 encodes at 240p :P
12:32karolherbst: right sure..
12:33karolherbst: but encoding 240p is also super fast anyway
12:33karolherbst: it might be that you will be able to fill in idle time with multiple encodings though
12:33karolherbst: like e.g. streaming
12:34HdkR: Which is likely why Quadro line gives you an unrestricted amount of encodes. Up to the customer to find the upper limit
12:34jb0IEUQ5aTx: right, all encodes are not the same (especially given the resolution, bitrate, etc.)... so, it does not make sense to artificially-limit the number...
12:34jb0IEUQ5aTx: According to https://github.com/Livepool-io/transcoder/issues/11 and https://www.youtube.com/watch?v=0fxu7zbhmrs , applying the said patch supposedly allows for at least a slightly increased number of hardware-accelerated encoding sessions. (The same issue applies to nvenc on both GNU/Linux and Windows)... can't be sure, as we haven't tried it out...
12:34karolherbst: the question is rather, if you do multiple encodings, is it actually faster overall or not.. but again.. for real time streaming it's a different problem in the first place
12:36karolherbst: the issue is mainly that only those expensive GPUs have more than 3 engines, so hard to tell if it's a limitation on GeForce GPUs or they thought that they limit to the amount of available engines
12:37karolherbst: but it seems like you are really only able to fill in idle time with this
12:38HdkR: Quadro parts will end up just evenly scaling the encode sessions over the hardware blocks as far as I'm aware
12:38karolherbst: yeah.. I'd assume the same
12:38karolherbst: still better for some use cases though
12:38karolherbst: the most silly limitation we found was power consumption reporting only for quadro cards :D
12:38karolherbst: that has like no hw reason
12:38karolherbst: *had
12:38HdkR: Definitely. If you actually need a wackload of encoders then Quadro still makes sense
12:39jb0IEUQ5aTx: well, we did use 2 simultaneous 8192x4320 5fps encodings using hevc_nvenc, and it had a significant performance improvement taking up a total of only 10% of the CPU - even one 8192x4320 5fps software encoding using libx265 would take up 80% of the CPU (and all of the CPU threads).
12:39jb0IEUQ5aTx: the problem is adding a third simultaneous 1280x720 encoding using a webcamera would fail with this error: "OpenEncodeSessionEx failed: out of memory" even though the "nvidia-smi" command-line utility showed that GPU memory was not really the issue...
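To put the sizes of these encodes in proportion, here is a rough sketch comparing raw pixel throughput. It uses the 8192x4320 at 5 fps and 1280x720 figures from the messages above; the 5 fps rate for the webcam stream is an assumption, and raw pixels per second is only a crude proxy for encoder load (bitrate, preset, and codec all matter too):

```python
def pixel_rate(width: int, height: int, fps: float) -> float:
    """Raw pixels per second an encoder has to consume for one stream."""
    return width * height * fps


# Figures taken from the chat; the webcam frame rate is assumed.
eight_k = pixel_rate(8192, 4320, 5)   # one 8192x4320 stream at 5 fps
webcam = pixel_rate(1280, 720, 5)     # one 1280x720 stream at an assumed 5 fps

# How many 720p streams equal one of the 8K streams, by pixel count alone.
ratio = eight_k / webcam

print(f"8K stream:  {eight_k:,.0f} px/s")
print(f"720p stream: {webcam:,.0f} px/s")
print(f"ratio: {ratio:.1f}x")
```

By this crude measure the third stream is a small fraction of either 8K stream, which fits the observation that the failure looks like a session limit rather than the encoder running out of throughput.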
12:39karolherbst: jb0IEUQ5aTx: yeah.. well... 8k encoding needs some memory
12:39karolherbst: jb0IEUQ5aTx: on wha GPU btw?
12:39karolherbst: *what
12:40HdkR: Hopefully Turing or better now. NVENC on the newest chips is great :D
12:40karolherbst: it's very fast.. yes
12:40karolherbst: I noticed it myself
12:40jb0IEUQ5aTx: Quadro P1000 - it's one of the lower-end workstation cards...
12:40karolherbst: jb0IEUQ5aTx: yeah soo.. the p1000 is a gp107 which has 3 nvenc engines
12:41HdkR: GP100 also only has 3 nvenc engines, but has unlimited concurrent sessions ;)
12:41karolherbst: so doing two rather than just one encode, should give you a significant improvement yes
12:41karolherbst: HdkR: nope, it has 7
12:42HdkR: Does it? Nvidia's official documentation only claims 3
12:42karolherbst: at least from a driver pov
12:42karolherbst: HdkR: skeggsb_ set the limit to 7
12:42HdkR: huh
12:42karolherbst: maybe it's 3 in like real hw and 7 from a driver pob
12:43karolherbst: *pov
12:43karolherbst: who knows
12:43HdkR: Could be
12:43HdkR: https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new Is supposed to be the documentation on these
12:44karolherbst: HdkR: there is no gp100 on that list?
12:44karolherbst: ohhh
12:44karolherbst: tabs...
12:44karolherbst: HdkR: weird.. gv100 should also have 7...
12:44HdkR: I guess it doesn't take into account that the newer engines are just more capable
12:45karolherbst: HdkR: odd is the Tesla M10 with 4
12:45karolherbst: as gm107 usually only has 1....
12:45karolherbst: ohhh
12:45karolherbst: 4 chips...
12:45karolherbst: maybe nvidia knows something we don't...
12:46karolherbst: we also claim 3 for gm200 where nvidia claims 2
12:46HdkR: Was M10 the GPU that was abused for GRID? Could explain why it had more independent encoders
12:48HdkR: Old GPUs, hard to remember details about :P
12:48karolherbst: HdkR: no, the M10 is just 4 GPUs
12:48HdkR: oh hah, right.
12:53HdkR: Nvidia's market segmentation strategy is just a bit grating for consumers that know the hardware is the same. I don't think that will ever change :)
13:21jb0IEUQ5aTx: According to https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new#collapseOne , P1000 is supposed to support 3 concurrent sessions, but can not seem to achieve that... only 2 8K 5fps using "-c:v hevc_nvenc -preset fast -tune ull -zerolatency 1" for now... even if the third encoding is as low as 720P 5fps...
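For reference, the encoder options quoted above can be assembled into a full ffmpeg invocation. This is a minimal sketch that only builds the argument list and does not launch anything; the input and output paths and the helper name `hevc_nvenc_args` are hypothetical, and using `-r` to set the output frame rate is an assumption about how the 5 fps was obtained:

```python
import shlex
from typing import List


def hevc_nvenc_args(src: str, dst: str, fps: int = 5) -> List[str]:
    """Build an ffmpeg argument list using the options quoted in the chat.

    src and dst are placeholder paths; the encoder options are copied
    verbatim from the message above.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-r", str(fps),          # assumed way of getting the 5 fps output
        "-c:v", "hevc_nvenc",
        "-preset", "fast",
        "-tune", "ull",
        "-zerolatency", "1",
        dst,
    ]


# Print a shell-quoted command line for inspection.
args = hevc_nvenc_args("input.mkv", "output.mkv")
print(shlex.join(args))
```

Running three such commands at once would be one way to reproduce the session-limit behaviour described; whether the third succeeds depends on the card and driver.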
13:22jb0IEUQ5aTx: does anyone have any experience using Radeon/AMDGPU drivers for hardware-accelerated video encoding - if so, how does it compare to using nvenc (both in terms of quality and also concurrent sessions)?
13:25HdkR: Internet claims latest generation of nvenc wipes the floor with AMD encoder
13:25HdkR: in terms of quality
13:25HdkR: no idea about concurrent sessions. I don't do video encoding
13:26jb0IEUQ5aTx: where did u see the comparison?
13:30HdkR: Bunch of random news posts and reviewer videos