01:06karolherbst: DodoGTA: what do you mean that the usage returned by glxinfo seems broken?
02:06fdobridge: <karolherbst🐧🦀> ah yeah.. some of that is weird on ampere compared to turing :/
02:06fdobridge: <karolherbst🐧🦀> LTS is one of the different things
02:22fdobridge: <airlied> yeah but I've played with the qmd to no avail
02:25fdobridge: <karolherbst🐧🦀> what's the local size set?
02:27fdobridge: <karolherbst🐧🦀> make sure it's like aligned and stuff
02:31fdobridge: <karolherbst🐧🦀> mhhh.. there is `#define NVC6C0_QMDV03_00_SHADER_LOCAL_MEMORY_LOW_SIZE MW(759:736)`
02:31fdobridge: <karolherbst🐧🦀> maybe the QMD version needs to be set?
02:31fdobridge: <karolherbst🐧🦀> in v3 a lot of stuff seems to have been moved around
02:32fdobridge: <karolherbst🐧🦀> @airlied in opengl we do set the QMD version to 2.2 for volta-ampere
02:32fdobridge: <karolherbst🐧🦀> might want to try that
02:32fdobridge: <karolherbst🐧🦀> though I think ampere supports 2.3, 2.4 and 3.0
02:33fdobridge: <karolherbst🐧🦀> `QMD_VERSION` + `QMD_MAJOR_VERSION` inside the QMD
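[Editor's sketch] The `MW(759:736)` notation above names a bit range inside the QMD when it is viewed as an array of 32-bit words: bits 736 to 759 are the low 24 bits of word 23. A minimal sketch of writing and reading such a field follows. `qmd_set_field` and `qmd_get_field` are hypothetical helpers written for this note, not the macros the Mesa or NVIDIA headers provide.

```c
#include <stdint.h>

/* Hypothetical helper (not the real header macros): write `value` into
 * bits [hi:lo] of a QMD stored as an array of 32-bit words.
 * Works bit by bit, so fields that cross a word boundary are handled.
 * Assumes the field is at most 32 bits wide. */
static void qmd_set_field(uint32_t *qmd, unsigned hi, unsigned lo,
                          uint32_t value)
{
    for (unsigned bit = lo; bit <= hi; bit++) {
        uint32_t mask = 1u << (bit % 32);

        if ((value >> (bit - lo)) & 1u)
            qmd[bit / 32] |= mask;
        else
            qmd[bit / 32] &= ~mask;
    }
}

/* Hypothetical helper: read bits [hi:lo] back out of the QMD words. */
static uint32_t qmd_get_field(const uint32_t *qmd, unsigned hi, unsigned lo)
{
    uint32_t value = 0;

    for (unsigned bit = lo; bit <= hi; bit++) {
        if (qmd[bit / 32] & (1u << (bit % 32)))
            value |= 1u << (bit - lo);
    }
    return value;
}
```

With these helpers, `qmd_set_field(qmd, 759, 736, size)` would touch only word 23, which is one way to check by hand that a dumped QMD carries the local size that was intended.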
02:36fdobridge: <airlied> yeah did a bit of 2 and 3 no major difference
02:36fdobridge: <karolherbst🐧🦀> did you verify that the local size being set is actually sane?
02:37fdobridge: <airlied> b3e13d2b841648440a1c51c3fd965693ac396bd9 was where I added v3 to try and fix it
02:37fdobridge: <airlied> yeah even made things larger randomly
02:38fdobridge: <karolherbst🐧🦀> it's not about the size
02:38fdobridge: <karolherbst🐧🦀> it's about the value
02:38fdobridge: <karolherbst🐧🦀> also. you don't set the local size
02:39fdobridge: <karolherbst🐧🦀> at least not in that patch
02:40fdobridge: <karolherbst🐧🦀> ehh wait a second..
02:40fdobridge: <airlied> the template macros do it
02:41fdobridge: <karolherbst🐧🦀> ahh, fair
02:41fdobridge: <airlied> slm_size
02:44fdobridge: <karolherbst🐧🦀> well.. dunno then, might want to compare the code to the opengl driver. Maybe some setup stuff is missing
02:45fdobridge: <karolherbst🐧🦀> but the problem isn't really the size itself
02:45fdobridge: <airlied> my other plan was to create a GL test the same, just hadn't gotten to it yet
02:45fdobridge: <karolherbst🐧🦀> because SKED doesn't know if the size is enough or too small
02:46fdobridge: <karolherbst🐧🦀> it's probably something like it's not allowed to be non 0 in this case
02:46fdobridge: <karolherbst🐧🦀> and maybe something with binding the actual buffer is wrong
02:47fdobridge: <karolherbst🐧🦀> do you get that error if you set the local size to 0?
02:47fdobridge: <airlied> probably the only value I didn't try 😛
02:47fdobridge: <karolherbst🐧🦀> well.. did you also try something ridiculously small like 0x10 or 0x100?
02:48fdobridge: <karolherbst🐧🦀> but anyway... 0 might work and the test fails, which would indicate that something with setting up the local buffer state is broken
02:51fdobridge: <airlied> I should have written down the test, have to go find it again
02:53fdobridge: <karolherbst🐧🦀> anyway.. my bet is that the local memory buffer is set up incorrectly 😄 Might need more alignment or it's just bound differently on ampere? dunno...
02:53fdobridge: <karolherbst🐧🦀> ehh wait...
02:53fdobridge: <karolherbst🐧🦀> does ampere generally work or not at all?
02:54fdobridge: <airlied> passes most of the CTS
02:54fdobridge: <karolherbst🐧🦀> okay..
02:54fdobridge: <karolherbst🐧🦀> so I guess we already bind the copy class then 😄
02:56fdobridge: <karolherbst🐧🦀> @airlied sooo... mhh... you are launching the compute shaders differently than we do in OpenGL
02:57fdobridge: <karolherbst🐧🦀> in GL we execute the command twice
02:57fdobridge: <karolherbst🐧🦀> once with `NVC6C0_SEND_SIGNALING_PCAS2_B_PCAS_ACTION_INVALIDATE`, then with `NVC6C0_SEND_SIGNALING_PCAS2_B_PCAS_ACTION_SCHEDULE`
02:57fdobridge: <karolherbst🐧🦀> not sure if that makes a difference
02:58fdobridge: <karolherbst🐧🦀> besides that I don't really see anything changed in the gl driver for ampere
02:59fdobridge: <karolherbst🐧🦀> so maybe that's it, or we just respected some alignment already
02:59fdobridge: <airlied> yeah I didn't see any ampere specifics
02:59fdobridge: <karolherbst🐧🦀> I'd try out doing this two step launch
02:59fdobridge: <karolherbst🐧🦀> I'm sure Ben had a good reason for that
02:59fdobridge: <karolherbst🐧🦀> anyway, gotta sleep
03:01fdobridge: <airlied> going 0 gives an out of range error
03:15fdobridge: <airlied> moving to a 2 step inval/sched didn't change it either
03:21fdobridge: <airlied> dEQP-VK.pipeline.monolithic.spec_constant.compute.expression.array_size is an example test
05:31fdobridge: <![NVK Whacker] Echo (she) 🇱🇹> karolherbst: The memory usage stays basically identical no matter what I do (including loading a heavy game)
06:46fdobridge: <airlied> okay I think I found the last tess blocker bug
06:47fdobridge: <airlied> doh found one bit of it 😛
07:05fdobridge: <airlied> maybe enough for someone else to figure out how to finalise it
07:55fdobridge: <gouz> Thanks @airlied, I will try it out when I get home
09:53fdobridge: <karolherbst🐧🦀> sure, so it's passed SKED
09:54fdobridge: <karolherbst🐧🦀> so something is wrong with setting a local size at all
09:55fdobridge: <karolherbst🐧🦀> but as I said above: the size itself isn't validated here, it's just making sure it's within certain constraints like alignment or 0 if no LTS buffer is bound
09:55fdobridge: <karolherbst🐧🦀> and the alignment is 0x10
09:56fdobridge: <karolherbst🐧🦀> anyway, out of range error is progress over the SKED error as this happens later. SKED does not know the range the local memory buffer is accessed, it's just validating if the QMD is sound before scheduling the job to the SM
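[Editor's sketch] The constraints described in the last few messages (the local size must be aligned to 0x10, and must be 0 when no local memory buffer is bound) can be written as a small pre-launch sanity check. This only restates what the chat says and is not known hardware behaviour. The names `LOCAL_SIZE_ALIGN`, `align_local_size` and `local_size_plausible` are made up for this note.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment quoted in the chat; treated as an assumption here. */
#define LOCAL_SIZE_ALIGN 0x10u

/* Round a requested local memory size up to the next aligned value. */
static uint32_t align_local_size(uint32_t size)
{
    return (size + LOCAL_SIZE_ALIGN - 1u) & ~(LOCAL_SIZE_ALIGN - 1u);
}

/* Hypothetical check mirroring the constraints discussed above:
 * without a bound local memory buffer the size has to be 0,
 * otherwise it only has to respect the alignment. The check says
 * nothing about whether the size is large enough for the shader. */
static bool local_size_plausible(uint32_t size, bool lts_buffer_bound)
{
    if (!lts_buffer_bound)
        return size == 0;
    return (size % LOCAL_SIZE_ALIGN) == 0;
}
```

A check like this catches only the kind of mistake the QMD validation is described as rejecting. An access beyond the buffer would show up later, as the out of range error did.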