00:19leftmostcat[d]: Addressed conflicts and the bulk of the review.
00:22leftmostcat[d]: Thank you for taking a look.
00:27gfxstrand[d]: snowycoder[d]: go ahead and make a draft MR for images whenever. (Or do you already have one?) I'll push my SM20 patch to it.
01:28airlied[d]: looks like depth needs new alignment or different GOB requirements
01:37gfxstrand[d]: That's not terribly surprising but annoying
01:38gfxstrand[d]: gfxstrand[d]: snowycoder[d] Or you can just grab the top patch off my nak/sm20 branch
02:00airlied[d]: probably gonna need blackwell class headers, looks like some new methods with a w/h in them, I wonder have they separated Z/S
02:01airlied[d]: but a 1280x720 RT/Z are getting set to 1280x768
02:01airlied[d]: so something needs aligning upwards
02:02gfxstrand[d]: Separate Z/S would make sense
02:03airlied[d]: [0x0000001f] HDR 8000054e subch 0 IMMD
02:03airlied[d]: mthd 1538 NVC797_SET_ZT_SELECT
02:03airlied[d]: .TARGET_COUNT = (0x0)
02:03airlied[d]: [0x00000020] HDR 8001054e subch 0 IMMD
02:03airlied[d]: mthd 1538 NVC797_SET_ZT_SELECT
02:03airlied[d]: .TARGET_COUNT = (0x1)
02:03airlied[d]: [0x00000021] HDR 2003048a subch 0 NINC
02:03airlied[d]: mthd 1228 NVC797_SET_ZT_SIZE_A
02:03airlied[d]: .WIDTH = (0x500)
02:03airlied[d]: mthd 122c NVC797_SET_ZT_SIZE_B
02:03airlied[d]: .HEIGHT = (0x300)
02:03airlied[d]: mthd 1230 NVC797_SET_ZT_SIZE_C
02:03airlied[d]: .THIRD_DIMENSION = (0x1)
02:03airlied[d]: .CONTROL = THIRD_DIMENSION_DEFINES_ARRAY_SIZE
02:03airlied[d]: [0x00000025] HDR 200203c0 subch 0 NINC
02:03airlied[d]: mthd 0f00 unknown method
02:03airlied[d]: .VALUE = 0xde
02:03airlied[d]: mthd 0f04 unknown method
02:03airlied[d]: .VALUE = 0xf9e80000
02:03airlied[d]: [0x00000028] HDR 200203c2 subch 0 NINC
02:03airlied[d]: mthd 0f08 unknown method
02:03airlied[d]: .VALUE = 0x40
02:03airlied[d]: mthd 0f0c unknown method
02:03airlied[d]: .VALUE = 0x3c000
02:03airlied[d]: [0x0000002b] HDR 20020483 subch 0 NINC
02:03airlied[d]: mthd 120c unknown method
02:03airlied[d]: .VALUE = 0x500
02:03airlied[d]: mthd 1210 unknown method
02:03airlied[d]: .VALUE = 0x300
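A rough sketch of how the `HDR` words in the dump above map to the printed method offsets. The bit layout is an assumption, not something stated in the channel: type in the top three bits, count (or immediate data for `IMMD`) in bits 28:16, subchannel in bits 15:13, and the method offset stored shifted right by 2 in the low 12 bits. `decode_header` is a hypothetical helper, not a Mesa function. The layout does reproduce every `mthd` value shown in the dump.

```python
# Assumed header layout (inferred from the dump, not confirmed in the log):
#   bits 31:29  type (1 prints as NINC, 4 prints as IMMD in this dump)
#   bits 28:16  method count, or the immediate value for IMMD
#   bits 15:13  subchannel
#   bits 11:0   method offset >> 2
TYPE_NAMES = {1: "NINC", 4: "IMMD"}

def decode_header(hdr):
    """Split a 32-bit pushbuf header into its assumed fields."""
    type_bits = hdr >> 29
    return {
        "type": TYPE_NAMES.get(type_bits, "type %d" % type_bits),
        "count": (hdr >> 16) & 0x1FFF,
        "subch": (hdr >> 13) & 0x7,
        "mthd": (hdr & 0xFFF) << 2,
    }

for hdr in (0x8001054E, 0x2003048A, 0x200203C0, 0x200203C2, 0x20020483):
    f = decode_header(hdr)
    print("%08x %s subch %d count %d mthd %04x"
          % (hdr, f["type"], f["subch"], f["count"], f["mthd"]))
```

Under that assumption, `20020483` decodes to two methods starting at `120c`, which matches the two unknown methods carrying `0x500` and `0x300`.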
02:03airlied[d]: those new methods seem stencil-y
02:04airlied[d]: and also sixteen Y GOBS
02:07gfxstrand[d]: Where are you seeing tall GOBs?
02:08airlied[d]: [0x0000000a] HDR 20060202 subch 0 NINC
02:08airlied[d]: mthd 0808 NVC797_SET_COLOR_TARGET_WIDTH(0)
02:08airlied[d]: .V = (0x500)
02:08airlied[d]: mthd 080c NVC797_SET_COLOR_TARGET_HEIGHT(0)
02:08airlied[d]: .V = (0x300)
02:08airlied[d]: mthd 0810 NVC797_SET_COLOR_TARGET_FORMAT(0)
02:08airlied[d]: .V = A8R8G8B8
02:08airlied[d]: mthd 0814 NVC797_SET_COLOR_TARGET_MEMORY(0)
02:08airlied[d]: .BLOCK_WIDTH = ONE_GOB
02:08airlied[d]: .BLOCK_HEIGHT = SIXTEEN_GOBS
02:08airlied[d]: .BLOCK_DEPTH = ONE_GOB
02:08airlied[d]: .LAYOUT = BLOCKLINEAR
02:08airlied[d]: .THIRD_DIMENSION_CONTROL = THIRD_DIMENSION_DEFINES_ARRAY_SIZE
02:08airlied[d]: mthd 0818 NVC797_SET_COLOR_TARGET_THIRD_DIMENSION(0)
02:08airlied[d]: .V = (0x1)
02:08airlied[d]: mthd 081c NVC797_SET_COLOR_TARGET_ARRAY_PITCH(0)
02:08airlied[d]: .V = (0xf0000)
02:08airlied[d]: [0x0000001b] HDR 200303fa subch 0 NINC
02:08airlied[d]: mthd 0fe8 NVC797_SET_ZT_FORMAT
02:08airlied[d]: .V = ZF32_X24S8
02:08airlied[d]: mthd 0fec NVC797_SET_ZT_BLOCK_SIZE
02:08airlied[d]: .WIDTH = ONE_GOB
02:08airlied[d]: .HEIGHT = SIXTEEN_GOBS
02:08airlied[d]: .DEPTH = ONE_GOB
02:08airlied[d]: mthd 0ff0 NVC797_SET_ZT_ARRAY_PITCH
02:08airlied[d]: .V = (0xf0000)
02:09gfxstrand[d]: Oh, tall blocks. That's fine.
02:10gfxstrand[d]: We can do up to 32 already
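One possible reading of the 1280x720 to 1280x768 rounding mentioned earlier is that the height is padded to a whole number of blocks. The sketch below assumes a GOB is 8 rows tall, which is not stated in the log, and uses the `SIXTEEN_GOBS` block height from the dump. `padded_height` is a hypothetical helper for illustration only.

```python
# Assumption: one GOB covers 8 rows, so a SIXTEEN_GOBS block is 128 rows tall.
GOB_HEIGHT_ROWS = 8

def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def padded_height(height, block_height_gobs):
    """Height after padding to a whole number of blocks."""
    return align_up(height, GOB_HEIGHT_ROWS * block_height_gobs)

print(hex(padded_height(720, 16)))  # 0x300, i.e. 768
```

If that assumption holds, the `0x300` heights in both dumps are simply 720 rounded up to the 128-row block height.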
02:11gfxstrand[d]: airlied[d]: That very much looks like combined stencil to me
02:13airlied[d]: okay so no idea what those new methods would be?
02:13gfxstrand[d]: Maybe they added 3D ZT support and/or LOD support somehow?
02:14airlied[d]: since 0x3c000 pitch looks like a single-byte 1280x768
02:16airlied[d]: and 0xf0000 looks like 4 byte 1280x768 assuming pitch might be shifted another bit
02:16airlied[d]: not shifted another bit same shift
02:16airlied[d]: << 2
02:16gfxstrand[d]: So the 0x500 and 0x300 are clearly dimensions
02:17gfxstrand[d]: The first one is an address
02:17airlied[d]: and 0x3c000 would be an ARRAY_PITCH
02:17airlied[d]: for a 1 byte, just not sure what the 0x40 might be
02:17gfxstrand[d]: 40 is maybe a block size? 1x16x1?
02:18gfxstrand[d]: 1<<4 is 16
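A small sketch of that guess, assuming the unknown `0x40` value packs log2 width, height, and depth into consecutive 4-bit fields, the way the `WIDTH`/`HEIGHT`/`DEPTH` fields of `SET_ZT_BLOCK_SIZE` are printed above. The field layout is an assumption, and `decode_block_size` is a hypothetical name.

```python
def decode_block_size(value):
    """Return (width, height, depth) in GOBs, assuming log2 fields of 4 bits each."""
    return tuple(1 << ((value >> shift) & 0xF) for shift in (0, 4, 8))

print(decode_block_size(0x40))  # (1, 16, 1)
```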
02:19airlied[d]: I might hack the demo to try a D24, it's just the Sascha triangle
02:19gfxstrand[d]: airlied[d]: Probably? I'm not sure
02:23gfxstrand[d]: if 0x500 and 0x300 are width and height, 0x3c000 is too small to be an array pitch
02:23airlied[d]: not for S8?
02:23gfxstrand[d]: 0x500 x 0x300 is 0xf0000
02:23airlied[d]: yup and array pitches are >> 2
02:23gfxstrand[d]: Oh, right
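The arithmetic behind that exchange, as a quick check. It assumes, as airlied says, that the array pitch field holds the byte size shifted right by 2, and it uses the padded 0x500 x 0x300 dimensions from the dumps. `array_pitch_field` is a hypothetical helper.

```python
def array_pitch_field(width, height, bytes_per_pixel):
    """Value written to an ARRAY_PITCH-style field: layer size in bytes >> 2."""
    return (width * height * bytes_per_pixel) >> 2

print(hex(0x500 * 0x300))                       # 0xf0000 pixels
print(hex(array_pitch_field(0x500, 0x300, 1)))  # 0x3c000, one byte per pixel
print(hex(array_pitch_field(0x500, 0x300, 4)))  # 0xf0000, four bytes per pixel
```

So `0x3c000` is consistent with a one-byte-per-pixel surface such as S8, and `0xf0000` with a four-byte one.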
02:24airlied[d]: okay s8z24 looks exactly the same
02:25airlied[d]: and if I pick Z32 it doesn't emit the second set
02:27gfxstrand[d]: Well, that's looking a lot like we have separate stencil now.
02:27airlied[d]: okay got sascha triangle rendering with D32
02:29airlied[d]: now the question is if hopper or blackwell 🙂
02:37airlied[d]: I suppose I could abuse stencil_copy_temp 🙂
02:53mangodev[d]: airlied[d]: is this on a branch, git for everyone, or git for sm32?
03:04mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1371684381776805959/image.png?ex=68240821&is=6822b6a1&hm=b623088cecbdf7b4f8119590b39106e626f18d72bf89d0e76bdffd84477d0d66&
03:04mangodev[d]: also
03:04mangodev[d]: is this concerning, or is the compiler just being prudent?
03:09airlied[d]: that was blackwell from gfxstrand[d] branch
03:14gfxstrand[d]: airlied[d]: Or... not?
03:14gfxstrand[d]: We should do the same thing we do on all the other drivers. Put stencil in plane[1]
03:16airlied[d]: yeah not gonna do that, will use plane[1]
03:17mangodev[d]: airlied[d]: phew, i was hoping i wouldn't regret rebuilding my drivers today
03:17gfxstrand[d]: mangodev[d]: Uh... maybe?
03:18mangodev[d]: gfxstrand[d]: happens on 64 bit build too, so it's not just a 32 bit driver thing
03:18mangodev[d]: wait oh no
03:18mangodev[d]: why does it say it's the integer limit
03:18mangodev[d]: how did i not see that part
03:18mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1371688122391330898/image.png?ex=68240b9d&is=6822ba1d&hm=c8b52c6c149a1314f69e09ead6b7dc5930402636d483594174de706be33e4442&
03:21airlied[d]: it's using a uint32 to reference an array without ever validating it, and the compiler doesn't seem to think it can prove it
04:09airlied[d]: https://gitlab.freedesktop.org/airlied/mesa/-/commit/408eeece4e95b50388d41201476df3dfdad84ecd started into it, now to debug it into working 🙂
04:10airlied[d]: seems to trash my mobile gb203 pretty well playing with this
04:56gfxstrand[d]: mangodev[d]: What GCC/clang version are you using? It looks like it's detecting a 2 somewhere. That might be a real bug.
04:59gfxstrand[d]: Okay, I see what it's doing. It's because `load_pred()` is calling `load_ureg()` with `imm_idx = -1`. It's fine, though, because it knows a priori that that one is always a register. Maybe we should assert that?
05:00gfxstrand[d]: Maybe add an `assert(inst->pred <= MME_TU104_REG_R23)` right before the `LOAD_REG`?
05:10mangodev[d]: gfxstrand[d]: whatever's arch latest rn, my system is fully up-to-date
05:11mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1371716345850494986/image.png?ex=682425e6&is=6822d466&hm=a2e4b4eb83b0fbc78fb785b510469f3ecdba39a2db7cd3ade3c21075b60ae9fa&
05:11mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1371716346345164840/image.png?ex=682425e6&is=6822d466&hm=0133ad2cbebbcdce1969f8c77347373fc3cb699b786d8d4c14ac01272f651045&
05:18gfxstrand[d]: Are you building with GCC or clang?
05:25gfxstrand[d]: gfxstrand[d]: In any case, give that assert a try.
05:28snowycoder[d]: gfxstrand[d]: I'm not a fan myself of long macros, but I need a lot of NIR flags and using a static assert manually for each one is not great
05:28gfxstrand[d]: snowycoder[d]: I left a more concrete suggestion on the MR itself
05:32snowycoder[d]: gfxstrand[d]: I don't have one yet, I'm cleaning up the patches a bit.
05:32snowycoder[d]: For now I'm working here: https://gitlab.freedesktop.org/SnowyCoder/mesa/-/tree/nak_suops?ref_type=heads
05:43mangodev[d]: gfxstrand[d]: iirc gcc with mold
05:43gfxstrand[d]: snowycoder[d]: No worries. Feel free to pluck the top patch off my nak/sm32 branch if you want to land both at the same time.
05:44gfxstrand[d]: mangodev[d]: Weird. I'm on the same GCC version and I don't think I saw that warning.
05:44mangodev[d]: tbf my system has been kinda jank lately
05:44mangodev[d]: idk why
05:44mangodev[d]: kde has weird artifacting issues
05:45mangodev[d]: and discord has been segfaulting a lot more in the past week or so
05:45mangodev[d]: sometimes even twice in a row
05:45mangodev[d]: actually did so a couple hours ago, seems to do it more when other programs are using the gpu
05:46mangodev[d]: the higher the gpu usage, the more discord soft-segfaults
05:46mangodev[d]: firefox may be similar because it crashes more when i'm in a discord call and watching someone's screenshare (more repaints = more gpu activity?)
09:00gfxstrand[d]: Okay, I think I've convinced myself that SLM encoding is very different on Blackwell
09:01gfxstrand[d]: There are certainly similarities but it's not the same fields as we had before
09:15mohamexiety[d]: SLM?
10:11airlied[d]: gfxstrand[d]: btw when you said you were getting NULL was that just an RC msg with 0 as fault addr?
10:11airlied[d]: Those aren't always faults, have to look up what the RC error is
13:56gfxstrand[d]: Yeah
13:57gfxstrand[d]: It's the same message you get if you set the shared memory window wrong
15:35phomes_[d]: Can I get a review/merge of 34168? (I got the executable name wrong when I added the setting because my launch command was calling the X4.exe directly rather than the script that steam otherwise uses)
16:55gfxstrand[d]: mhenning[d]: beat me to it
17:04mhenning[d]: gfxstrand[d]: if you're sad you missed the chance to review that one, there's plenty of other stuff you can review 😛 like https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34865 for example
17:05gfxstrand[d]: mhenning[d]: Yeah, I've been reading that one. It's a nice improvement. I've just not had the chance to read it all the way through. I'll review tonight. Thanks!
17:51snowycoder[d]: gfxstrand[d]: Fixed it and opened another MR (sorry) for two small bug fixes
18:17gfxstrand[d]: Okay. I'll look tonight
18:20snowycoder[d]: gfxstrand[d]: Do you have a custom branch for nv-shader-tools for sm20?
18:20snowycoder[d]: I'd need sm20's imadsp too.
18:20snowycoder[d]: (did you write all the encoder without it??)
18:22gfxstrand[d]: Yeah, I think I do. But I can type up imadsp after a bit. Looks like I'm getting quite the list for this evening. 🤓
18:24snowycoder[d]: don't worry I can do it, nv-shader-tools would only help make sure it's correct