10:43 ahuillet[d]: dwlsalmeida[d]: if I am looking at the right thing, this is a bitfield
10:45 ahuillet[d]: one of these is "MB_SYNTAX" (your guess is as good as mine), the other one I need to figure out, but overall I don't think it tells you all that much. what video format are you decoding, is that H264?
10:46 ahuillet[d]: if so - this seems to happen for a VLD error. your guess is as good as mine, again. I hope this helps. I can probably answer more specific questions.
10:48 ahuillet[d]: glancing at this code, which I don't know, this isn't inconsistent with the notion that your buffer of input data is too small.
10:51 notthatclippy[d]: ahuillet[d]: Is it not MISSING_SLICE?
10:51 notthatclippy[d]: (assuming it's 0x40)
10:52 ahuillet[d]: I assumed it was 40 not 0x40
10:52 ahuillet[d]: which is consistent with what I saw - bits 3 and 5 *are* set together in the path I looked at and commented on.
10:55 notthatclippy[d]: Fair enough. FWIW, 0x40 is MISSING_SLICE and it happens if you run out of data without hitting the EndOfPicture marker.
11:34 ahuillet[d]: Maybe 0x40 is the right thing, I have no idea! :) Seems probable too.
13:30 dwlsalmeida[d]: it's decimal 40
13:31 dwlsalmeida[d]: ahuillet[d]: that's h.264 yes, I was hoping NVIDIA would have a header file with some constants describing this field
13:32 ahuillet[d]: well, we have a header with names that don't say all that much
13:32 ahuillet[d]: the two bits are "MB_SYNTAX" and... I forgot, "EC_DONE" I think. can you guess anything from these names?
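For onlookers, the arithmetic being debated can be sketched like this; the bit-to-name mapping is an assumption pieced together from the messages above (bits 3 and 5 for MB_SYNTAX/EC_DONE, bit 6 for MISSING_SLICE), not anything from an NVIDIA header:

```python
# Hypothetical decoder for the nvdec error status bitfield. Positions are
# guesses from this conversation: decimal 40 == 0x28 == bits 3 and 5 set
# together, while 0x40 would be bit 6 (MISSING_SLICE per notthatclippy).
ERROR_BITS = {
    3: "MB_SYNTAX",      # assumption
    5: "EC_DONE",        # assumption
    6: "MISSING_SLICE",  # 0x40
}

def decode_error(status: int) -> list[str]:
    """Return the names of the set bits, unknown ones as BIT_<n>."""
    return [ERROR_BITS.get(b, f"BIT_{b}") for b in range(32) if status & (1 << b)]

# decode_error(40)   -> ['MB_SYNTAX', 'EC_DONE']   (decimal 40, i.e. 0x28)
# decode_error(0x40) -> ['MISSING_SLICE']
```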
13:47 ahuillet[d]: I don't know this part of the GPU or driver at all, from a quick glance it seems that we set these errors when we get certain interrupts, in particular for VLD errors such as having too small a buffer.
13:56 notthatclippy[d]: This is all set by the nvdec microcode and is handed back in the status structure as part of a job submission.
13:57 notthatclippy[d]: The ucode is embedded in gsp.bin and is loaded onto the nvdec core automatically as part of the boot process.
13:57 notthatclippy[d]: (Well.. "the ucode" - there's probably a dozen different versions of nvdec ucodes, but GSP knows which one to load and the others are discarded)
14:06 notthatclippy[d]: Which GPU is this on, btw?
14:35 notthatclippy[d]: Looking more into this, looks like you're out of luck with regards to what you can get here. Decoding got stuck for whatever reason and was killed by a watchdog. No way to get the actual reason, at least not without specialized NV-internal hardware.
14:37 karolherbst[d]: "specialized NV-internal hardware" I'm interested, please tell us more 😄
14:39 notthatclippy[d]: I was being stupid vague for the purpose of onlookers, but really just debug fused boards that have different decryption keys and can load debuggable ucode.
14:39 HdkR: :eyes:
14:39 karolherbst[d]: ohh, right
14:39 karolherbst[d]: yeah, that stuff is known to exist
14:40 notthatclippy[d]: It's standard industry practice AFAIK.
14:40 karolherbst[d]: pre GSP you can clearly see in the header that there are two signatures 🙃
14:40 karolherbst[d]: yeah
14:40 karolherbst[d]: it makes total sense
14:41 notthatclippy[d]: You can still see it in the GSP payload. It might be compressed so a simple mem search won't work, but it's not encrypted at that level.
14:41 karolherbst[d]: I'm hopeful that long-term we might even get our hands on it if nvidia is willing to help out even more, lol
14:41 notthatclippy[d]: Even I don't have one!
14:42 karolherbst[d]: they are that rare?
14:43 karolherbst[d]: oh well...
14:46 notthatclippy[d]: One thing that could possibly give some insight into what the issue is would be reading the nvdec's mbox0/mbox1 registers when the error happens. Unfortunately, that IRQ is routed to GSP, so you'd have to try it with GSP off to get your IRQ delivered to nouveau, and then print it there. It's a lot of effort, and the most it would tell you is whether it was a "buffer empty"/"ran out of data" type
14:46 notthatclippy[d]: of error, or a "bad data" type, but not much finer grain.
14:47 karolherbst[d]: yeah.. and I think GSP itself even has some logging functionality we could use in theory, no?
14:48 notthatclippy[d]: Yes, but no. I had a few other ideas here, but all the code is treating this as an app error (bad API use or bad data) and thus all the logging is "INFO" which means it's compiled out on all release builds. Both on gsp.bin and also on the old style blobby nvidia.ko
14:50 notthatclippy[d]: I can't think of anything we (NV) can do reasonably quickly/cheaply to figure this out, sorry. And I don't think we can go up our mgmt chain for approval on a special gsp.bin drop or anything for this.
14:51 karolherbst[d]: yeah... fair
14:55 notthatclippy[d]: Overall fastest would probably be for someone to fetch Daniel's tree and reproduce internally, then use a custom GSP build to decode. But that's too much work for anyone to do on a whim at the expense of actual staffed work we're doing, so it again needs to go through the proper channels.
14:56 notthatclippy[d]: Sorry. I hoped it'd be something we can do quickly and save you some time, we're always glad to help on those.
15:04 dwlsalmeida[d]: notthatclippy[d]: I am using a RTX2060, thanks for all the help
15:05 dwlsalmeida[d]: I mean, as I said, I managed to fix that by telling the GPU that there is more data than there actually is in the buffer
15:05 dwlsalmeida[d]: in other words, this particular issue is fixable, I will probably get some more insight by dumping the buffer's contents
15:06 dwlsalmeida[d]: when I had asked skeggsb9778 initially, my hope was to get a header file with some more `#defines` with the error codes, but as ahuillet[d] said above, this does not exist
15:16 notthatclippy[d]: Publishing these is a formal process, so we can't just give you a header. But as we said, in this case, the errors are just a couple of bits that can be set individually. There's no 40+ entry error enum or anything like that.
15:19 avhe[d]: so if the result field in the status structure is a bitfield, what do those values in the class header correspond to? <https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/clc9b0.h#L407-L598>
15:19 avhe[d]: or maybe they're not possible to encounter?
15:19 karolherbst[d]: dwlsalmeida[d]: maybe the size needs to be properly aligned?
15:20 dwlsalmeida[d]: karolherbst[d]: it already is
15:22 dwlsalmeida[d]: interestingly, it's only a few frames which are affected, and the errors are not really human-visible
15:22 dwlsalmeida[d]: we're talking 10 bad macroblocks in a 1080p video
15:22 karolherbst[d]: mhhh
15:23 avhe[d]: there's no alignment requirement for the bitstream data that i know of, however on tegra they append an end-of-stream sequence to the bitstream, but on discrete that EOS is written in the metadata structure, along with a bit to signal this
15:23 dwlsalmeida[d]: hmmm, wait
15:23 dwlsalmeida[d]: I just disregarded this entirely
15:24 dwlsalmeida[d]: I didn't know that EOS field was actually being read by nvdec
15:24 avhe[d]: (this is only applicable to codecs from the MPEG family, vp8/9 and presumably av1 don't do this)
15:24 avhe[d]: dwlsalmeida[d]: ¯\_(ツ)_/¯
15:24 avhe[d]: your guess as to what the sequence is used for is as good as mine
15:24 karolherbst[d]: well.. better to add it just in case
15:25 dwlsalmeida[d]: let's switch the `explicit EOS flag` thing to 1, and copy the value being passed by the blob in the `eos` field, see what happens
15:25 avhe[d]: you can try, but then you need to set this <https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/nvdec_drv.h#L877> to 0, and add 16 to the bitstream size
15:26 avhe[d]: <https://github.com/averne/FFmpeg/blob/master/libavcodec/nvtegra_h264.c#L465>
15:26 dwlsalmeida[d]: this last one is a 404
15:27 avhe[d]: yeah i fixed it
15:27 dwlsalmeida[d]: hey this might actually be the solution,
15:27 dwlsalmeida[d]: I currently set this to 0
15:27 dwlsalmeida[d]: but this last part is missing: `+ sizeof(bitstream_end_sequence);`
15:27 avhe[d]: that might actually be a problem indeed
15:28 dwlsalmeida[d]: which tracks, because once you add a value to stream_len, then it works
15:28 dwlsalmeida[d]: (only that I am adding a random value, 0x100, not `sizeof(bitstream_end_sequence)`)
15:29 karolherbst[d]: looks like two repeating 32 bit ints 🙃
15:29 karolherbst[d]: uhm...
15:29 karolherbst[d]: 64
15:30 avhe[d]: yeah, they all have that format
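A rough sketch of the fix being discussed, assuming the 16-byte end sequence is just the codec's 4-byte end-of-sequence startcode repeated (which is what the repeating 32-bit pattern above suggests); treat the layout as illustrative, not as NVIDIA's definition:

```python
# Illustrative only: append a 16-byte end-of-stream sequence and grow the
# submitted stream length by the same 16 bytes, rather than an arbitrary
# 0x100. The padding scheme (startcode repeated to fill 16 bytes) is an
# assumption based on the dump described above.
EOS_SEQUENCE_LEN = 16  # "add 16 to the bitstream size"

def append_eos(bitstream: bytes, startcode: bytes) -> tuple[bytes, int]:
    reps = EOS_SEQUENCE_LEN // len(startcode)
    padded = bitstream + startcode * reps
    stream_len = len(bitstream) + EOS_SEQUENCE_LEN  # what nvdec should be told
    return padded, stream_len
```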
15:30 dwlsalmeida[d]: avhe[d]: hey btw I think the value you computed for this was slightly off:
15:30 dwlsalmeida[d]: `history_size = FFALIGN(width_in_mbs * 0x200 + 0x1100, 0x200);`
15:30 dwlsalmeida[d]: if you change that to `align(width_in_mbs * 0x300, 0x200)`, it seems to match with the blob
15:30 dwlsalmeida[d]: where did you take this 0x1100 from ?
15:31 avhe[d]: tegra code i reversed
15:31 avhe[d]: i can double check the decomp this evening
15:32 notthatclippy[d]: avhe[d]: IIUC these are (some of) the possible values that you'd find in mbox0 after the error IRQ.
15:35 notthatclippy[d]: (by 'some of' I don't mean that the published data is incomplete. It's more that there's gaps in there that may be used by the ucodes to pass additional specific errors in an unstable way. In this case you'd get 0x27 which is just arbitrarily picked by SW)
15:38 notthatclippy[d]: BPF should be able to read the GPU's BAR0, right? Maybe you could poll these regs on the side? Or patch up nouveau to do it/expose them to userspace?
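A minimal userspace sketch of the polling idea, assuming the BAR is exposed through sysfs; the nvdec mbox0/mbox1 register offsets are not public, so the offset is a placeholder you would have to supply yourself:

```python
import mmap
import os
import struct

def read_reg32(resource_path: str, offset: int) -> int:
    """Read a 32-bit little-endian value from a mapped PCI BAR, e.g.
    /sys/bus/pci/devices/0000:01:00.0/resource0. The path and the
    register offset are placeholders, not known-good values."""
    fd = os.open(resource_path, os.O_RDONLY)
    try:
        with mmap.mmap(fd, 0, prot=mmap.PROT_READ) as bar:
            return struct.unpack_from("<I", bar, offset)[0]
    finally:
        os.close(fd)
```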
15:42 avhe[d]: karolherbst[d]: as far as i can tell these are codec-specific end-of-sequence startcodes
15:42 avhe[d]: <https://github.com/averne/FFmpeg/blob/nvtegra/libavcodec/mpeg12.h#L28>
15:42 avhe[d]: <https://github.com/averne/FFmpeg/blob/nvtegra/libavcodec/mpeg4videodec.c#L3524>
15:42 avhe[d]: <https://github.com/averne/FFmpeg/blob/nvtegra/libavcodec/vc1_common.h#L35>
15:43 karolherbst[d]: ohh huh
15:43 avhe[d]: i couldn't find the h264 code (maybe ffmpeg just doesn't have it listed) but it tracks for the others so i'm going by that
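For reference, the startcodes behind those three links (from the MPEG-1/2, MPEG-4 and VC-1 bitstream specs); the h264 value is a guess based on the end_of_stream NAL unit type, since avhe couldn't find it listed either:

```python
SEQ_END_CODE       = 0x000001B7  # MPEG-1/2 sequence_end_code
MPEG4_VOS_END_CODE = 0x000001B1  # MPEG-4 visual_object_sequence_end_code
VC1_CODE_ENDOFSEQ  = 0x0000010A  # VC-1 end-of-sequence
H264_EOS_GUESS     = 0x0000010B  # assumption: startcode + NAL type 11 (end of stream)

def startcode_bytes(code: int) -> bytes:
    """Big-endian byte form, as the code would appear in the bitstream."""
    return code.to_bytes(4, "big")
```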
15:44 avhe[d]: notthatclippy[d]: interesting, thanks for the info. i can't say i've ever seen one of these error codes but i also never tried poking at very low-level nvdec stuff
18:17 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1315744091996422214/image.png?ex=675885ac&is=6757342c&hm=540db9373d3aa7e1290e83a4382b91ad91093c2d623a41e25a57d991e25d6f48&
18:17 avhe[d]: dwlsalmeida[d]:
18:17 avhe[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1315744160049008651/image.png?ex=675885bd&is=6757343d&hm=8c90b6162db0541eaac4b5af09802439b1ba5b10c6e1fae56ba8da32d86b5dbb&
18:17 avhe[d]: and my decomp is equivalent
19:01 dwlsalmeida[d]: You decompiled the driver?
19:10 tiredchiku[d]: fwiw you can get into trouble for reverse engineering/decompiling the proprietary driver
19:11 avhe[d]: dwlsalmeida[d]: the userland component, yes
19:45 tiredchiku[d]: mesa adopts a clean room policy afaik
21:50 airlied[d]: There are differences between Tegra and Geforce also, no idea what they are but I was warned about it
21:55 HdkR: host1x support woo?
22:15 marysaka[d]: would be nice to get host1x support in nouveau on the kernel side for syncpoints at least
22:15 marysaka[d]: Maybe next time I pull my TX1 devkit to mess with it I will give that another shot
22:21 marysaka[d]: In any case, if my memory is right there is some upstream nvdec kernel driver for tegra (at least X1, X2 and Xavier), would also be cool to see how to integrate with that 😄
22:27 airlied[d]: avhe[d]: did you say you had code to parse that h265 value? I wonder if the vulkan api provides it somehow, it seems unlikely nvidia would have left an obvious screw-up like that
22:30 avhe[d]: yeah but since my driver sits in ffmpeg land i have easy access to the bitstream. in fact i just patched ffmpeg's hevc code to avoid parsing slice headers twice
22:31 avhe[d]: i have no idea how they make it work in their vulkan driver
22:32 avhe[d]: however i've seen codepaths in the tegra driver that use the HEVC_PARSER "codec" <https://github.com/NVIDIA/open-gpu-doc/blob/master/classes/video/clc9b0.h#L51>, presumably for encrypted content, that seemed to calculate that value