18:34 fdobridge: <a​irlied> @gfxstrand @karolherbst🐧 so with GSP enabled, the context gets killed when we submit the initial context draw state pushbuf at least on turing
18:34 fdobridge: <k​arolherbst🐧🦀> what does it complain about?
18:34 fdobridge: <a​irlied> GSP says you are dead
18:35 fdobridge: <a​irlied> I know nothing more
18:35 fdobridge: <k​arolherbst🐧🦀> well.. then I can't really help here either
18:35 fdobridge: <a​irlied> the wonders of firmware 😛
18:35 fdobridge: <k​arolherbst🐧🦀> though
18:35 fdobridge: <k​arolherbst🐧🦀> I suspect it might be related to the channel binding
18:35 fdobridge: <k​arolherbst🐧🦀> but...
18:35 fdobridge: <a​irlied> GL works fine
18:35 fdobridge: <k​arolherbst🐧🦀> in any case
18:35 fdobridge: <k​arolherbst🐧🦀> we need errors
18:35 fdobridge: <k​arolherbst🐧🦀> if we don't get errors, we can't use GSP
18:36 fdobridge: <k​arolherbst🐧🦀> @airlied you could dump the buffer and see if there is anything obviously wrong
18:36 fdobridge: <k​arolherbst🐧🦀> but...
18:37 fdobridge: <k​arolherbst🐧🦀> I don't like guessing and I don't think we are doing anything crazy in nvk here
18:37 fdobridge: <a​irlied> I assume at some point we can process better errors, I'd hope so
18:37 fdobridge: <k​arolherbst🐧🦀> we should make that a requirement
18:37 fdobridge: <k​arolherbst🐧🦀> I'm mostly done guessing what nvidia's firmware is up to
18:54 fdobridge: <a​irlied> appears to be sw.cls init
19:04 fdobridge: <a​irlied> ah nvc0 has if (dev->chipset < 0x140) {
19:12 fdobridge: <a​irlied> okay 189 works around it for now for me
19:21 fdobridge: <g​fxstrand> That's believable
19:23 fdobridge: <a​irlied> back to having broken sync now, which is what I wanted to dig into 😛
19:25 fdobridge: <k​arolherbst🐧🦀> ahh yeah.. that would do it
19:25 fdobridge: <k​arolherbst🐧🦀> ohh.. we should ask nvidia if they can give us doc on all those fancy sw methods
19:44 fdobridge: <g​fxstrand> Yeah, that'd be nice if they have SW methods
19:44 fdobridge: <g​fxstrand> Particularly, we need to make sure that we don't break the helper pixels fix
19:44 fdobridge: <a​irlied> @gfxstrand do you see any PBDMA errors in your dmesg at all?
19:45 fdobridge: <a​irlied> I've seen those and I think it's caused by that fix, so we might need to reconsider how it works anyways, I've no idea what GSP does here
19:45 fdobridge: <g​fxstrand> Yup, piles
19:45 fdobridge: <a​irlied> yeah try commenting out the sw.cls and see do they go away
19:46 fdobridge: <k​arolherbst🐧🦀> you remember this magic thing nvidia did where they allow userspace to set certain registers?
19:46 fdobridge: <k​arolherbst🐧🦀> via macros and stuff
19:46 fdobridge: <k​arolherbst🐧🦀> I suspect we need to do the same thing to be compatible with GSP
19:46 fdobridge: <g​fxstrand> Probably
19:46 fdobridge: <g​fxstrand> We just need to know how that magic works
19:46 fdobridge: <k​arolherbst🐧🦀> and nouveau probably has to set up a buffer with bit masks of those regs (edited)
19:46 fdobridge: <k​arolherbst🐧🦀> I checked how they did it in nvgpu
19:47 fdobridge: <k​arolherbst🐧🦀> I should check how they do it in their open driver
19:47 fdobridge: <g​fxstrand> Yeah
19:47 fdobridge: <k​arolherbst🐧🦀> anyway, if anybody throws me a branch with all the GSP stuff I can look into it, as I was planning to upstream those bits anyway
19:47 fdobridge: <g​fxstrand> The other option is if we can just have nouveau.ko do some stuff to the context at context creation.
19:47 fdobridge: <a​irlied> I don't think you need GSP though, I think we already generate PBDMA errors
19:47 fdobridge: <a​irlied> we just don't blow away the context as aggressively
19:48 fdobridge: <k​arolherbst🐧🦀> that's not the problem
19:48 fdobridge: <k​arolherbst🐧🦀> the core issue is, there are registers we have to mess with from userspace
19:48 fdobridge: <a​irlied> it's not like GSP is seeing the pushbuf here, it's just reporting the hw error
19:48 fdobridge: <k​arolherbst🐧🦀> right
19:48 fdobridge: <k​arolherbst🐧🦀> but that will cause regressions in nvk if we don't do it
19:49 fdobridge: <k​arolherbst🐧🦀> so we have to figure out how to do the same thing with GSP
19:49 fdobridge: <a​irlied> yes well we should also figure out how to do it without GSP
19:49 fdobridge: <k​arolherbst🐧🦀> there is a magic context switch mmio register with a bit which enables/disables memory loads in helper invocations
19:49 fdobridge: <k​arolherbst🐧🦀> it's disabled by default
19:49 fdobridge: <k​arolherbst🐧🦀> we have to enable it
19:49 fdobridge: <k​arolherbst🐧🦀> we already do it without GSP 🙂
19:49 fdobridge: <k​arolherbst🐧🦀> but before we upstream it, we should see how it works with GSP so it looks the same
19:49 fdobridge: <a​irlied> no we don't do it
19:50 fdobridge: <a​irlied> we do it, but it seems to blow up in places
19:50 fdobridge: <k​arolherbst🐧🦀> .....
19:50 fdobridge: <a​irlied> hence all those PBDMA errors
19:50 fdobridge: <k​arolherbst🐧🦀> please understand what I'm writing
19:51 fdobridge: <k​arolherbst🐧🦀> I have a patch to do it via the sw stuff
19:51 fdobridge: <k​arolherbst🐧🦀> that isn't upstream yet
19:51 fdobridge: <k​arolherbst🐧🦀> before upstreaming it, we should see what GSP is doing
19:51 fdobridge: <k​arolherbst🐧🦀> and implement it the same way prior to GSP
19:52 fdobridge: <a​irlied> I've no idea though how to work out what the GSP interface is for it
19:52 fdobridge: <a​irlied> did nvidia ever drop any useful hints on how their userspace programs the workaround?
19:53 fdobridge: <k​arolherbst🐧🦀> via macros
19:53 fdobridge: <k​arolherbst🐧🦀> they interrupt the firmware
19:54 fdobridge: <g​fxstrand> I had a dump of all their macros at one point
19:54 fdobridge: <k​arolherbst🐧🦀> or rather.. use a doorbell or something. Anyway, nvgpu is setting up a buffer
19:54 fdobridge: <k​arolherbst🐧🦀> and each bit represents a mmio reg
19:54 fdobridge: <k​arolherbst🐧🦀> and the buffer decides what userspace can mess with
19:54 fdobridge: <k​arolherbst🐧🦀> and when bootstrapping the firmware, they pass that buffer along
19:54 fdobridge: <k​arolherbst🐧🦀> and there are like 20? slots for doing random interactions with the firmware afaik (edited)
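[A minimal sketch of the "each bit represents a mmio reg" idea described above. Everything here is hypothetical: the struct, the function names, the window size, and the choice of one bit per 32-bit register (bit index = register offset / 4) are assumptions for illustration, not nvgpu's or GSP's actual layout.]

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: cover a 512 KiB register window, one bit per 32-bit register. */
#define ACCESS_MAP_BYTES (0x80000u / 4u / 8u)

struct access_map {
    uint8_t bits[ACCESS_MAP_BYTES];
};

/* Start with every register denied. */
static void access_map_init(struct access_map *m)
{
    memset(m->bits, 0, sizeof(m->bits));
}

/* Mark one register offset as writable from userspace.
 * Returns false if the offset falls outside the window. */
static bool access_map_allow(struct access_map *m, uint32_t reg)
{
    uint32_t bit = reg >> 2;          /* assumed: one bit per 4-byte register */

    if ((bit >> 3) >= ACCESS_MAP_BYTES)
        return false;
    m->bits[bit >> 3] |= (uint8_t)(1u << (bit & 7u));
    return true;
}

/* Check whether a register offset has been whitelisted. */
static bool access_map_allowed(const struct access_map *m, uint32_t reg)
{
    uint32_t bit = reg >> 2;

    if ((bit >> 3) >= ACCESS_MAP_BYTES)
        return false;
    return (m->bits[bit >> 3] >> (bit & 7u)) & 1u;
}
```

[In this sketch the kernel would fill the map once and hand the buffer to the firmware at bootstrap, as described in the messages above; the firmware then consults it before honouring a userspace register write.]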
21:35 fdobridge: <a​irlied> @karolherbst🐧 any memories or ptrs where in nvgpu to look?
21:39 fdobridge: <a​irlied> also how did we work out the sw class fix? I don't see that in my email
21:45 fdobridge: <k​arolherbst🐧🦀> ehh.. let me see..
21:45 fdobridge: <k​arolherbst🐧🦀> @airlied search for "Global memory loads in helper invocations"
21:47 fdobridge: <k​arolherbst🐧🦀> the nvgpu stuff is gr_init_get_access_map
21:47 fdobridge: <k​arolherbst🐧🦀> and stuff using that
21:48 fdobridge: <k​arolherbst🐧🦀> or rather `get_access_map`
21:48 fdobridge: <a​irlied> my copy of that thread ends before anyone mentions a sw method
21:48 fdobridge: <k​arolherbst🐧🦀> ohh
21:48 fdobridge: <k​arolherbst🐧🦀> sw method is a software thing
21:48 fdobridge: <k​arolherbst🐧🦀> nouveau implements it
21:48 fdobridge: <k​arolherbst🐧🦀> there is a patch somewhere..
21:48 fdobridge: <k​arolherbst🐧🦀> sw methods are basically interrupts on the kernel, and then the kernel handles the method from the push buffer
21:49 fdobridge: <k​arolherbst🐧🦀> which is nice, because the mmio access is switched to the correct context then
21:50 fdobridge: <a​irlied> oh so we just wire 604 up somewhere on the kernel side and we do the register write there?
21:50 fdobridge: <k​arolherbst🐧🦀> yeah
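[A minimal sketch of the software-method idea discussed above: the kernel traps a method from the push buffer and performs the register write itself. The context struct, the register bit, and the meaning given to method 0x0604 are all hypothetical stand-ins; this is not nouveau's actual handler.]

```c
#include <stdbool.h>
#include <stdint.h>

/* The "604" mentioned in the chat; its exact semantics are assumed here. */
#define SW_MTHD_HELPER_LOADS      0x0604u
/* Hypothetical enable bit in a hypothetical per-context register. */
#define FAKE_HELPER_LOAD_ENABLE   (1u << 0)

/* Stand-in for per-context register state; a real driver would do an
 * MMIO write while the correct context is resident. */
struct fake_ctx {
    uint32_t helper_reg;
};

/* Handle one software method. Returns true if the method was recognised,
 * false so the caller can report an unhandled method. */
static bool handle_sw_method(struct fake_ctx *ctx, uint32_t mthd, uint32_t data)
{
    switch (mthd) {
    case SW_MTHD_HELPER_LOADS:
        if (data)
            ctx->helper_reg |= FAKE_HELPER_LOAD_ENABLE;
        else
            ctx->helper_reg &= ~FAKE_HELPER_LOAD_ENABLE;
        return true;
    default:
        return false;
    }
}
```

[The cost of this approach is the kernel round trip per method, which is the motivation given later in the chat for preferring an access-map style mechanism.]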
21:51 fdobridge: <a​irlied> okay I'm failing to figure out the kernel side of that
21:51 fdobridge: <k​arolherbst🐧🦀> https://gitlab.freedesktop.org/drm/nouveau/-/commit/bfe2b42ca7de5793e4b3847ca13ef305465a9492
21:52 fdobridge: <k​arolherbst🐧🦀> or rather
21:52 fdobridge: <k​arolherbst🐧🦀> https://gitlab.freedesktop.org/drm/nouveau/-/commits/topic/vulkan/
21:52 fdobridge: <k​arolherbst🐧🦀> need all of it
21:52 fdobridge: <k​arolherbst🐧🦀> well.. the two top commits
21:53 fdobridge: <k​arolherbst🐧🦀> the method nvidia uses for this kind of stuff obviously doesn't involve kernel roundtrips, so that's why I'd like to figure it out and do the same thing instead
23:08 fdobridge: <a​irlied> okay there seems to be some sort of priv access map you can attach to a context in the kernel, then it lets those registers be programmed
23:10 fdobridge: <k​arolherbst🐧🦀> yeah
23:10 fdobridge: <k​arolherbst🐧🦀> I suspect GSP has the same thing
23:10 fdobridge: <k​arolherbst🐧🦀> most likely even configured the exact same way
23:11 fdobridge: <k​arolherbst🐧🦀> the annoying part will be to figure out if the gr firmware we got pre GSP even has it
23:11 fdobridge: <a​irlied> yeah I'm trying to find the interfaces for it
23:11 fdobridge: <k​arolherbst🐧🦀> worst case, we do SW pre GSP
23:12 fdobridge: <k​arolherbst🐧🦀> sounds like it
23:12 fdobridge: <a​irlied> NV2080_CTRL_GPU_PROMOTE_CTX_BUFFER_ID_PRIV_ACCESS_MAP seems to be the other side of it
23:13 fdobridge: <k​arolherbst🐧🦀> actually
23:13 fdobridge: <k​arolherbst🐧🦀> let's do the pre GSP stuff via SW
23:14 fdobridge: <k​arolherbst🐧🦀> we'll need to use it for perf counters anyway
23:14 fdobridge: <k​arolherbst🐧🦀> @airlied is there a property in the new nouveau UAPI to tell if we are on GSP or not?
23:14 fdobridge: <k​arolherbst🐧🦀> I suspect we'll want to have it for stuff like this
23:14 fdobridge: <k​arolherbst🐧🦀> could be part of the device info stuff tho
23:16 fdobridge: <a​irlied> so far the new uapi is only va/sparse stuff, any other new uapi should be separate
23:17 fdobridge: <a​irlied> no point needlessly tying things together
23:17 fdobridge: <a​irlied> if we need a GSP property it should go with the GSP patches
23:18 fdobridge: <a​irlied> though the SW method availability should possibly be its own flag somewhere
23:18 fdobridge: <a​irlied> I foresee more uAPI changes for GSP, but very separate to the uAPI changes for vma
23:20 fdobridge: <a​irlied> it does seem like skeggsb code for gsp gr setup does a bit of this already so it might be easy to add there
23:55 fdobridge: <🌺​ ¿butterflies? 🌸> What are the odds of getting PMU fw from NV.....