08:10 airlied[d]: okay nearly have 580 working on Ada
09:58 notthatclippy[d]: airlied[d]: Sorry, just for my own understanding, why?
09:59 airlied[d]: Getting gb20b spark going on nouveau, but needs newer fw so have to do that first, might go to 595 though
10:00 airlied[d]: 580 is throwing some weird rpc response to a bunch of msgs, not near it now but it was like 0xff100022 or something
10:00 notthatclippy[d]: Would it make sense to target the same firmware as what Nova will pick up next?
10:01 airlied[d]: Probably but you kinda have to go through the iterations, so everything doesn't just break. I'm not even sure I'll upstream it
10:02 airlied[d]: Just have a piece of hw that can't boot upstream and have to scratch the itch
10:04 notthatclippy[d]: ACK. For 0xff1xxxxx stuff, check rpc_headers.h, it has a few special status values that can happen that aren't NV_STATUS.
10:05 notthatclippy[d]: Not that particular one though. <https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/inc/kernel/vgpu/rpc_headers.h#L133-L148>
10:07 airlied[d]: NV_VGPU_MSG_RESULT_RPC_INVALID_MESSAGE_FORMAT
10:07 airlied[d]: Is what I'm seeing
10:07 airlied[d]: What might cause that to get generated for msgs that were fine on 570
10:09 airlied[d]: NV0073_CTRL_SYSTEM_GET_NUM_HEADS_PARAMS
10:09 airlied[d]: Is hitting it for example, seems to be hit with some short msgs
10:29 notthatclippy[d]: It's probably a length check against sizeof() the top level struct.
10:30 notthatclippy[d]: You can work around it probably by just doing `.length += 64` or something, but let me see what it actually is.
10:34 notthatclippy[d]: Any chance it could be a 32bit overflow on `sizeof(rpc_message_header_v) + sizeof(rpc_gsp_rm_alloc_v03_00) + rpc_params->paramsSize`? (`rpc_gsp_rm_alloc_v03_00 *rpc_params`)
10:35 notthatclippy[d]: That's a new thing that was added between 570 and 580 that returns that error for RmControl RPCs.
10:36 airlied[d]: Unlikely an overflow, these are like 4 byte fields
10:36 airlied[d]: Maybe some 4 byte fields end up as 8?
10:38 notthatclippy[d]: 570 to 580 relevant diff:
10:38 notthatclippy[d]: ```diff
10:38 notthatclippy[d]:      const NvU32 fixed_param_size = sizeof(rpc_message_header_v) + sizeof(*rpc_params);
10:38 notthatclippy[d]: -    const NvU32 full_param_size = fixed_param_size + rpc_params->paramsSize;
10:38 notthatclippy[d]: +    NvU32 full_param_size;
10:38 notthatclippy[d]: +    if (!portSafeAddU32(fixed_param_size, rpc_params->paramsSize, &full_param_size))
10:38 notthatclippy[d]: +    {
10:38 notthatclippy[d]: +        rpcStatus = NV_VGPU_MSG_RESULT_RPC_INVALID_MESSAGE_FORMAT;
10:38 notthatclippy[d]: +        goto done;
10:38 notthatclippy[d]: +    }
10:38 notthatclippy[d]:      if (!VALID_MSG_LEN(*rpc_params))
10:38 notthatclippy[d]:      {
10:38 notthatclippy[d]:          rpcStatus = NV_VGPU_MSG_RESULT_RPC_INVALID_MESSAGE_FORMAT;
10:38 notthatclippy[d]:          goto done;
10:38 notthatclippy[d]:      }
10:38 notthatclippy[d]: ```
10:38 notthatclippy[d]: The change is literally an overflow guard, and should be a NOP if there's no overflow
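
A minimal sketch of why the guard in the diff should be a no-op for small messages. `safe_add_u32` is a hypothetical stand-in written for illustration, not the real `portSafeAddU32` implementation, and the sizes in the usage note are made-up values. The unchecked sum only differs from the checked one when the 32-bit addition wraps around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for portSafeAddU32: stores a + b in *out and
 * returns true, or returns false if the sum would not fit in 32 bits. */
static bool safe_add_u32(uint32_t a, uint32_t b, uint32_t *out)
{
    assert(out != NULL);
    if (a > UINT32_MAX - b)
        return false;          /* would wrap around */
    *out = a + b;
    return true;
}

/* The pre-580 style computation: silently wraps modulo 2^32. */
static uint32_t unchecked_add_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(a + b);
}
```

With a fixed part of, say, 64 bytes and a 12-byte payload, both versions give 76, so the guard cannot be what rejects a short message. Only a `paramsSize` close to `UINT32_MAX` would make `safe_add_u32` return false.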
10:50 mohamexiety[d]: airlied[d]: not sure how bad the kernel fw upgrade work will be but it's probably worth upstreaming because while spark itself might not have much nvk interest due to the high price/pro target market, there will be client based laptop chips based on the same chip. it's also the least janky tegra we can get for nvk tegra stuff (though for dev stuff out of tree is fine)
17:44 karolherbst[d]: I'd still need rb for the nak commit here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40541
17:49 siggi_: so, I've captured mmiotraces for nouveau with and without NvForcePost=1
17:50 siggi_: ... I should say for modprobe nouveau with and without the parameter set
17:51 siggi_: and to my surprise, with the parameter set, I get ~5800 *fewer* writes?
17:52 siggi_: in any case, is there an easy way to diff mmiotraces?
18:04 siggi_: Hmm, so this isn't entirely consistent, so maybe it's just chance
18:08 siggi_: interesting to see (as a noob) that 'modprobe nouveau && modprobe -r nouveau' does not yield a consistent number of io reads/writes, though I guess it could be timing and interrupts?
18:25 siggi_: k, looks like I was probably holding it (somewhat) wrong - if I give the trace a few seconds to settle after unloading, the count is much closer
18:26 siggi_: and typically more reads/writes on the force post trace
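
A rough sketch of counting reads and writes in an mmiotrace log, assuming the usual text format where read events start with `R ` and write events with `W ` and other lines (header, `MAP`, `MARK`, and so on) are ignored. `count_mmio_events` and `struct mmio_counts` are names invented for this example. Comparing the counts from two traces gives the kind of coarse comparison described above; it is not a real diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative result type: number of read and write events seen. */
struct mmio_counts {
    size_t reads;
    size_t writes;
};

/* Walk a NUL-terminated buffer of mmiotrace text line by line and
 * count lines beginning with "R " (read) or "W " (write). */
static struct mmio_counts count_mmio_events(const char *log)
{
    struct mmio_counts c = {0, 0};
    const char *line = log;

    assert(log != NULL);
    while (line != NULL && *line != '\0') {
        if (strncmp(line, "R ", 2) == 0)
            c.reads++;
        else if (strncmp(line, "W ", 2) == 0)
            c.writes++;

        /* Advance to the start of the next line, if any. */
        const char *nl = strchr(line, '\n');
        line = nl ? nl + 1 : NULL;
    }
    return c;
}
```

Since the counts were only comparable once the trace had settled after unload, a comparison like this is more meaningful when both traces are captured with the same delay before stopping.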
20:17 airlied[d]: ah the ctrl message added some more fields
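
A sketch of how a parameter struct that gained fields can trip a size check on the receiving side. Everything here is hypothetical: the two layouts are not the real `NV0073_CTRL_SYSTEM_GET_NUM_HEADS_PARAMS` definitions, and the "at least `sizeof`" rule is an assumption based on the length-check guess earlier in the log, not the firmware's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout the sender was built against (older headers). */
struct num_heads_params_old {
    uint32_t sub_device_instance;
    uint32_t flags;
    uint32_t num_heads;
};

/* Hypothetical layout the receiver expects (newer headers): the same
 * fields plus extra ones appended at the end. */
struct num_heads_params_new {
    uint32_t sub_device_instance;
    uint32_t flags;
    uint32_t num_heads;
    uint32_t extra[2];
};

/* Assumed receiver-side rule: the payload must be at least as large as
 * the struct the receiver was compiled with. */
static bool params_size_ok(size_t sent_size, size_t expected_size)
{
    assert(expected_size > 0);
    return sent_size >= expected_size;
}
```

Under that assumption, a sender still using the old layout passes `sizeof(struct num_heads_params_old)` and is rejected, while padding the length or switching to the new layout passes. That would be consistent with only some short messages failing after the firmware update.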