00:23dwlsalmeida[d]: -exec p *status
00:23dwlsalmeida[d]: $3 = {
00:23dwlsalmeida[d]: mbs_correctly_decoded = 300,
00:23dwlsalmeida[d]: mbs_in_error = 0,
00:23dwlsalmeida[d]: reserved = 8470,
00:23dwlsalmeida[d]: error_status = 0,
00:23dwlsalmeida[d]: ^ `struct _nvdec_status_s`
00:24dwlsalmeida[d]: this `reserved` field is apparently the number of cycles it took to decode the last request
00:24dwlsalmeida[d]: as I expected, error is 0
00:24dwlsalmeida[d]: :/
01:58asdqueerfromeu[d]: avhe[d]: So could NVDEC encode PS2-compatible video? 🧓
01:59asdqueerfromeu[d]: *NVENC
14:37asdqueerfromeu[d]: I just broke zink + NVK :cursedgears:
14:40asdqueerfromeu[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1303730685768372235/message.txt?ex=672cd151&is=672b7fd1&hm=8d3a77a6152d25ee82cd855f1a17e5ca49665d6ddbfd018a1a1ddb48fb47c014&
14:55dwlsalmeida[d]: asdqueerfromeu[d]: debugging time? 😄
14:55asdqueerfromeu[d]: dwlsalmeida[d]: I'm not sure I can do much because the compositor froze
14:59dwlsalmeida[d]: skeggsb9778: hey, if you have the time, I'd like to take you up on your offer. It's getting hard to figure out what's going on because `_nvdec_status_s` is a bit useless. I keep getting the channel killed even though I can get some frames decoded, and maybe that's what I should tackle next
15:01dwlsalmeida[d]: If you need any help, I volunteer, e.g. to help polish some decoder tool used to parse the dump or something like that
16:52airlied[d]: They can't release the info to parse the dumps to anyone
17:29dwlsalmeida[d]: Oh they work for NVIDIA? I hadn’t noticed
17:29dwlsalmeida[d]: I thought that was some open source tool
21:11skeggsb9778[d]: dwlsalmeida[d]: sure. if you apply Timur's series (https://patchwork.freedesktop.org/series/140736/) and send me the logrm file from debugfs i can take a look and see if there's anything helpful in there
21:33mhenning[d]: karolherbst[d]: Are there any weird restrictions for LEA.HI? I'm working on using it, but I'm getting really weird results in my hardware test
21:34mhenning[d]: Specifically, I can get different results from the same input
21:35mhenning[d]: So now I'm wondering if .HI needs .X to be valid, or if it has a weird latency, or if the hardware test runner is broken, or what
21:35karolherbst[d]: yeah.. it's funky
21:35karolherbst[d]: but it doesn't need .X
21:36karolherbst[d]: LEA.HI has three sources + the constant
21:36mhenning[d]: Yeah, I'm encoding three sources
21:37karolherbst[d]: the third source is the upper bits of a 64-bit value (src2, src0)
21:37karolherbst[d]: if src0 is negated, so will src2
21:39mhenning[d]: nvdisasm says I'm encoding `LEA.HI R0, R0, R1, R2, 0x0 ;` If I set R0 = 0, R1 = 0, R2 = 0, then I sometimes get 0 as a result and sometimes 1
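A minimal Python sketch of LEA.HI as karolherbst describes it above may help when checking expected values by hand. It assumes the result is the upper 32 bits of the shifted 64-bit value (src2, src0), plus src1, truncated to 32 bits. The name `lea_hi` and the exact truncation behaviour are illustrative assumptions, not taken from NVIDIA documentation. Under this model, all-zero inputs can only produce 0, so an occasional 1 is not explained by the instruction semantics.

```python
MASK32 = 0xFFFFFFFF

def lea_hi(src0, src1, src2, shift):
    """Assumed model of LEA.HI (a sketch, not a hardware spec)."""
    # Build the 64-bit value (src2, src0): src2 is the upper half.
    wide = ((src2 & MASK32) << 32) | (src0 & MASK32)
    # Shift left, then keep the upper 32 bits of the 64-bit result.
    high = ((wide << shift) >> 32) & MASK32
    # Add the second source and wrap to 32 bits.
    return (high + (src1 & MASK32)) & MASK32

# The case from the chat: R0 = R1 = R2 = 0 with a shift of 0.
print(lea_hi(0, 0, 0, 0))
```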
21:39karolherbst[d]: btw, the ISCADD is a special LEA variant, in case you want to check codegen
21:39mhenning[d]: Oh, interesting. I'll look at that
21:40karolherbst[d]: ISCADD is LEA with src2 == 0, .LO and the input predicate being false
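As a rough illustration of that relationship, the sketch below models the `.LO` form of LEA as a 32-bit shift-and-add and defines ISCADD on top of it. The function names `lea_lo` and `iscadd` are hypothetical, and the model ignores predicates and source modifiers entirely.

```python
MASK32 = 0xFFFFFFFF

def lea_lo(src0, src1, shift):
    """Assumed model of LEA (.LO): (src0 << shift) + src1, wrapped to 32 bits."""
    return (((src0 & MASK32) << shift) + (src1 & MASK32)) & MASK32

def iscadd(a, b, shift):
    """ISCADD modelled as the LEA .LO form with no high source involved."""
    return lea_lo(a, b, shift)

# Typical use: scale an index by 4 and add a base offset.
print(iscadd(3, 100, 2))
```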
21:40mhenning[d]: Oh, okay. The version without `.HI` is working how I expect it to
21:40karolherbst[d]: mhenning[d]: mhhhh
21:41karolherbst[d]: what if you disable scoreboarding?
21:41karolherbst[d]: but that would be odd...
21:41mhenning[d]: Yeah, I've tried with NAK_DEBUG=serial
21:41karolherbst[d]: let me see...
21:41mhenning[d]: and also tried treating it as variable latency
21:42karolherbst[d]: ohh yeah, let me check that first 😄
21:42karolherbst[d]: it's a plain alu instruction
21:43karolherbst[d]: mhh.. why do you get 1...
21:45karolherbst[d]: `.X` basically just enables the input predicate...
21:46karolherbst[d]: try with .X? though that might happen if you set the input predicate automatically 🙃
21:46mhenning[d]: Yeah, I think .X also changes the source modifiers from integer negate to bitwise not, like the add.x variants
21:47karolherbst[d]: indeed
21:47mhenning[d]: But yeah, I guess I'll try .HI.X
21:47karolherbst[d]: but yeah.. it's fixed latency...
21:48mhenning[d]: Do you know what that latency is?
21:48karolherbst[d]: below what nak uses
21:48karolherbst[d]: but .HI instructions are a bit weird...
21:49karolherbst[d]: but with serial you shouldn't run into any issues, sooo...
21:49mhenning[d]: Yeah
21:50karolherbst[d]: .HI could simply be broken, because I don't even know why you'd use it
21:51karolherbst[d]: apparently for some weird shift + add things, but...
21:52mhenning[d]: Yeah, I mean .HI.X does make sense to me for 64-bit LEA
21:52karolherbst[d]: yeap
21:52karolherbst[d]: &yeah
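To show why .HI.X fits a 64-bit LEA, here is a hedged Python sketch that splits `(a << shift) + b` into a low half (plain LEA producing a carry) and a high half (LEA.HI consuming that carry, which is the role assumed here for `.X`). The function `lea64` and the carry handling are assumptions for illustration; they are not confirmed hardware behaviour.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def lea64(a, b, shift):
    """Sketch: 64-bit (a << shift) + b built from two 32-bit steps."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Low half, modelled as LEA: shift, add, and record the carry out.
    lo_sum = ((a_lo << shift) & MASK32) + b_lo
    lo, carry = lo_sum & MASK32, lo_sum >> 32

    # High half, modelled as LEA.HI.X: upper bits of the shifted
    # (a_hi, a_lo) pair, plus b_hi, plus the carry from the low half.
    wide = (a_hi << 32) | a_lo
    high_bits = ((wide << shift) >> 32) & MASK32
    hi = (high_bits + b_hi + carry) & MASK32

    return (hi << 32) | lo

# Compare against a direct 64-bit computation.
a, b, s = 0x1FFFFFFFF, 1, 0
print(hex(lea64(a, b, s)), hex(((a << s) + b) & MASK64))
```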
21:53karolherbst[d]: you can use LEA.HI to implement rotate, but...
21:53karolherbst[d]: I think it's fair to assume it's broken 🙃
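The rotate trick can be sketched in Python under the same assumed model of LEA.HI (upper 32 bits of the shifted (src2, src0) pair, plus src1). Feeding the same value as both halves and adding zero gives a rotate left. The names `lea_hi` and `rotl32` are illustrative only.

```python
MASK32 = 0xFFFFFFFF

def lea_hi(src0, src1, src2, shift):
    """Assumed model of LEA.HI (a sketch, not a hardware spec)."""
    wide = ((src2 & MASK32) << 32) | (src0 & MASK32)
    return ((((wide << shift) >> 32) & MASK32) + (src1 & MASK32)) & MASK32

def rotl32(x, shift):
    """Rotate left by using x as both halves of the 64-bit pair."""
    return lea_hi(x, 0, x, shift)

print(hex(rotl32(0x12345678, 8)))
```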
21:54karolherbst[d]: I've seen nvidia using LEA.HI.X so...
21:54mhenning[d]: Yeah, I'll start hacking on .X
21:54karolherbst[d]: planning to make use of alyssa's MR adding lea?
21:55mhenning[d]: Oh, I hadn't actually seen that yet
21:57karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964
21:57karolherbst[d]: their lea is pretty similar, just that it gives you a 64-bit result with a 32-bit add and a constant shift
21:59karolherbst[d]: in terms of ssbo address calculation it's useful to know if the base address can't overflow
21:59karolherbst[d]: soo... could even do it in one instruction even on nvidia
22:00karolherbst[d]: but I think that relies on buffer alignments
22:24mhenning[d]: karolherbst[d]: Ugh, okay I fixed it. My test program was broken so I wasn't feeding in the inputs I thought I was
22:25karolherbst[d]: oh wow