00:23 dwlsalmeida[d]: -exec p *status
00:23 dwlsalmeida[d]: $3 = {
00:23 dwlsalmeida[d]: mbs_correctly_decoded = 300,
00:23 dwlsalmeida[d]: mbs_in_error = 0,
00:23 dwlsalmeida[d]: reserved = 8470,
00:23 dwlsalmeida[d]: error_status = 0,
00:23 dwlsalmeida[d]: ^ `struct _nvdec_status_s`
00:24 dwlsalmeida[d]: this `reserved` field is apparently the number of cycles it took to decode the last request
00:24 dwlsalmeida[d]: as I expected, error is 0
00:24 dwlsalmeida[d]: :/
01:58 asdqueerfromeu[d]: avhe[d]: So could NVDEC encode PS2-compatible video? 🧓
01:59 asdqueerfromeu[d]: *NVENC
14:37 asdqueerfromeu[d]: I just broke zink + NVK :cursedgears:
14:40 asdqueerfromeu[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1303730685768372235/message.txt?ex=672cd151&is=672b7fd1&hm=8d3a77a6152d25ee82cd855f1a17e5ca49665d6ddbfd018a1a1ddb48fb47c014&
14:55 dwlsalmeida[d]: asdqueerfromeu[d]: debugging time? 😄
14:55 asdqueerfromeu[d]: dwlsalmeida[d]: I'm not sure I can do much because the compositor froze
14:59 dwlsalmeida[d]: skeggsb9778: hey, if you have the time, I'd like to take you up on your offer. It's getting hard to figure out what's going on because `_nvdec_status_s` is a bit useless. I keep getting the channel killed even though I can get some frames decoded, and maybe that's what I should tackle next
15:01 dwlsalmeida[d]: If you need any help, I volunteer, e.g. to help polish some decoder tool used to parse the dump, or something like that
16:52 airlied[d]: They can't release the info to parse the dumps to anyone
17:29 dwlsalmeida[d]: Oh they work for NVIDIA? I hadn’t noticed
17:29 dwlsalmeida[d]: I thought that was some open source tool
21:11 skeggsb9778[d]: dwlsalmeida[d]: sure. if you apply Timur's series (https://patchwork.freedesktop.org/series/140736/) and send me the logrm file from debugfs i can take a look and see if there's anything helpful in there
21:33 mhenning[d]: karolherbst[d]: Are there any weird restrictions for LEA.HI? I'm working on using it, but I'm getting really weird results in my hardware test
21:34 mhenning[d]: Specifically, I can get different results from the same input
21:35 mhenning[d]: So now I'm wondering if .HI needs .X to be valid, or if it has a weird latency, or if the hardware test runner is broken, or what
21:35 karolherbst[d]: yeah.. it's funky
21:35 karolherbst[d]: but it doesn't need .X
21:36 karolherbst[d]: LEA.HI has three sources + the constant
21:36 mhenning[d]: Yeah, I'm encoding three sources
21:37 karolherbst[d]: the third source is the upper half of a 64-bit value formed as (src2, src0)
21:37 karolherbst[d]: if src0 is negated, so is src2
21:39 mhenning[d]: nvdisasm says I'm encoding `LEA.HI R0, R0, R1, R2, 0x0 ;` If I set R0 = 0, R1 = 0, R2 = 0, then I sometimes get 0 as a result and sometimes 1
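A small Python model of LEA.HI, following karolherbst's description: src2 and src0 form a 64-bit value (src2 as the high word), that value is shifted left by the immediate, and the upper 32 bits are added to src1. The bit layout and shift handling are assumptions drawn from the conversation, not from NVIDIA documentation. Under this model the result depends only on the inputs, so identical inputs should always give identical outputs; for `LEA.HI R0, R0, R1, R2, 0x0` it reduces to R1 + R2.

```python
MASK32 = 0xFFFF_FFFF

def lea_hi(src0: int, src1: int, src2: int, shift: int) -> int:
    """Assumed LEA.HI semantics: src1 + upper 32 bits of ((src2:src0) << shift)."""
    # Build the 64-bit value with src2 as the high word and src0 as the low word.
    wide = ((src2 & MASK32) << 32) | (src0 & MASK32)
    # Shift left, then keep bits 32..63 of the shifted value.
    hi = ((wide << shift) >> 32) & MASK32
    # Add the other source with 32-bit wraparound.
    return (src1 + hi) & MASK32

# LEA.HI R0, R0, R1, R2, 0x0 with R0 = R1 = R2 = 0
print(lea_hi(0, 0, 0, 0))  # 0
```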
21:39 karolherbst[d]: btw, the ISCADD is a special LEA variant, in case you want to check codegen
21:39 mhenning[d]: Oh, interesting. I'll look at that
21:40 karolherbst[d]: ISCADD is LEA with src2 == 0, .LO and the input predicate being false
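ISCADD as a special case of LEA can be sketched the same way. This models the .LO form as (src0 << shift) + src1 plus an optional carry-in, returning the carry-out too. In this model src2 never affects the low 32 bits, which fits ISCADD fixing it to 0. Treating the input predicate as a carry-in is an assumption based on the ".X enables the input predicate" remark in this log.

```python
MASK32 = 0xFFFF_FFFF

def lea_lo(src0: int, src1: int, shift: int, carry_in: bool = False) -> tuple[int, bool]:
    """Assumed LEA (.LO) model: low 32 bits of (src0 << shift) + src1 + carry_in, plus carry-out."""
    full = (((src0 & MASK32) << shift) & MASK32) + (src1 & MASK32) + int(carry_in)
    return full & MASK32, full > MASK32

def iscadd(src0: int, src1: int, shift: int) -> int:
    """ISCADD as LEA.LO with the input predicate (carry-in) forced to false."""
    result, _carry = lea_lo(src0, src1, shift, carry_in=False)
    return result

print(iscadd(3, 10, 2))  # 22
```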
21:40 mhenning[d]: Oh, okay. The version without `.HI` is working how I expect it to
21:40 karolherbst[d]: mhenning[d]: mhhhh
21:41 karolherbst[d]: what if you disable scoreboarding?
21:41 karolherbst[d]: but that would be odd...
21:41 mhenning[d]: Yeah, I've tried with NAK_DEBUG=serial
21:41 karolherbst[d]: let me see...
21:41 mhenning[d]: and also tried treating it as variable latency
21:42 karolherbst[d]: ohh yeah, let me check that first 😄
21:42 karolherbst[d]: it's a plain alu instruction
21:43 karolherbst[d]: mhh.. why do you get 1...
21:45 karolherbst[d]: `.X` basically just enables the input predicate...
21:46 karolherbst[d]: try with .X? though that might happen if you set the input predicate automatically 🙃
21:46 mhenning[d]: Yeah, I think .X also changes the source modifiers from integer negate to bitwise not, like the add.x variants
21:47 karolherbst[d]: indeed
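One plausible reason for the .X variants switching from integer negate to bitwise NOT is how a 64-bit negate splits across two 32-bit words: the low word is negated as ~lo + 1, and the high word only needs ~hi plus the carry out of the low word. The sketch below is generic two-word arithmetic illustrating that idea; it does not describe the actual hardware encoding.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def neg64_split(lo: int, hi: int) -> tuple[int, int]:
    """Negate a 64-bit value held as two 32-bit words."""
    # Low word: integer negate, i.e. bitwise NOT plus 1; it carries out only when lo == 0.
    lo_sum = (~lo & MASK32) + 1
    new_lo = lo_sum & MASK32
    carry = lo_sum > MASK32
    # High word: bitwise NOT plus the carry from the low word (the ".X"-style step).
    new_hi = ((~hi & MASK32) + int(carry)) & MASK32
    return new_lo, new_hi

def neg64_matches(x: int) -> bool:
    """Compare the split negate against a direct 64-bit two's-complement negate."""
    lo, hi = x & MASK32, (x >> 32) & MASK32
    new_lo, new_hi = neg64_split(lo, hi)
    return ((new_hi << 32) | new_lo) == (-x & MASK64)

print(neg64_matches(1 << 32))  # True
```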
21:47 mhenning[d]: But yeah, I guess I'll try .HI.X
21:47 karolherbst[d]: but yeah.. it's fixed latency...
21:48 mhenning[d]: Do you know what that latency is?
21:48 karolherbst[d]: below what nak uses
21:48 karolherbst[d]: but .HI instructions are a bit weird...
21:49 karolherbst[d]: but with serial you shouldn't run into any issues, sooo...
21:49 mhenning[d]: Yeah
21:50 karolherbst[d]: .HI could simply be broken, because I don't even know why you'd use it
21:51 karolherbst[d]: apparently for some weird shift + add things, but...
21:52 mhenning[d]: Yeah, I mean .HI.X does make sense to me for 64-bit LEA
21:52 karolherbst[d]: yeap
21:52 karolherbst[d]: yeah
21:53 karolherbst[d]: you can use LEA.HI to implement rotate, but...
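Under the same assumed LEA.HI model, passing the same register as both src0 and src2 (with src1 = 0) makes the upper half of (x:x) << s equal to x rotated left by s, which is how a single LEA.HI could implement a rotate. The model and the 0..31 shift range are assumptions.

```python
MASK32 = 0xFFFF_FFFF

def lea_hi(src0: int, src1: int, src2: int, shift: int) -> int:
    """Assumed LEA.HI semantics: src1 + upper 32 bits of ((src2:src0) << shift)."""
    wide = ((src2 & MASK32) << 32) | (src0 & MASK32)
    return (src1 + (((wide << shift) >> 32) & MASK32)) & MASK32

def rotl32(x: int, shift: int) -> int:
    """Rotate left by shift (0..31) with one LEA.HI-style step: src0 = src2 = x, src1 = 0."""
    return lea_hi(x, 0, x, shift)

print(hex(rotl32(0x8000_0001, 1)))  # 0x3
```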
21:53 karolherbst[d]: I think it's fair to assume it's broken 🙃
21:54 karolherbst[d]: I've seen nvidia using LEA.HI.X so...
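A sketch of the pattern that makes .HI.X useful for a 64-bit LEA, base + (index << shift): a low-word LEA produces a carry, and a LEA.HI.X on the high words consumes it. The step split and carry-through-predicate behavior are assumptions based on this discussion, not a verified SASS sequence.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def lea64(base: int, index: int, shift: int) -> int:
    """base + (index << shift) as two 32-bit LEA-style steps (shift in 0..31)."""
    base_lo, base_hi = base & MASK32, (base >> 32) & MASK32
    idx_lo, idx_hi = index & MASK32, (index >> 32) & MASK32

    # Step 1, LEA (.LO): low word of the shifted index plus base_lo, producing a carry.
    lo_full = ((idx_lo << shift) & MASK32) + base_lo
    lo = lo_full & MASK32
    carry = lo_full > MASK32

    # Step 2, LEA.HI.X: upper 32 bits of (idx_hi:idx_lo) << shift, plus base_hi, plus carry-in.
    wide = (idx_hi << 32) | idx_lo
    hi = (((wide << shift) >> 32) + base_hi + int(carry)) & MASK32

    return (hi << 32) | lo

print(hex(lea64(0x0000_0001_FFFF_FFF0, 5, 2)))  # 0x200000004
```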
21:54 mhenning[d]: Yeah, I'll start hacking on .X
21:54 karolherbst[d]: planning to make use of alyssa's MR adding lea?
21:55 mhenning[d]: Oh, I hadn't actually seen that yet
21:57 karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964
21:57 karolherbst[d]: their lea is pretty similar, just that it gives you a 64-bit result with a 32-bit add and a constant shift
21:59 karolherbst[d]: in terms of ssbo address calculation it's useful to know if the base address can't overflow
21:59 karolherbst[d]: soo... could even do it in one instruction even on nvidia
22:00 karolherbst[d]: but I think that relies on buffer alignments
22:24 mhenning[d]: karolherbst[d]: Ugh, okay I fixed it. My test program was broken so I wasn't feeding in the inputs I thought I was
22:25 karolherbst[d]: oh wow