09:23RSpliet: karolherbst: https://core.ac.uk/download/pdf/162915021.pdf :-P
09:25RSpliet: This was measured with Nouveau. I think the consensus was that within a generation, context switching latency is broadly stable because the DRAM bandwidth grows proportionately with the # GPCs and SMs
10:34karolherbst: RSpliet: that's on tesla, right?
10:34karolherbst: I read GT210 instead of GT710
10:35karolherbst: ahh different table
10:35karolherbst: yeah anyway.. single- to double-digit µs
10:36RSpliet: Yep. There's a chance that NVIDIA's "fully preemptive" context switching takes longer. Never measured, because I never cared, and we can't make the firmware do this stuff anymore
10:37karolherbst: I am mostly wondering what that looks like on modern GPUs
10:37karolherbst: in the past you had like 2-3 GPU contexts active and it didn't matter, but today, where you have a two-digit number of contexts live, it might
10:38RSpliet: It is hard to say. DRAM's gotten a lot faster. I think the FW wasn't very effective at utilising that BW; I hope they've improved the copy mechanisms.
10:38karolherbst: could do on the fly compression
10:38karolherbst: or other tricks
10:39RSpliet: Better bursting would be the first target.
10:40RSpliet: Not even sure if DRAM is the bottleneck at all. I think it was rather the latency of reading a list of registers and sticking their values on a FIFO.
10:40RSpliet: More SMs -> More GPCCS -> more parallelism, which is why it scaled
10:40karolherbst: the thing is, you can't just "read out the context", at least if you look at it from an MMIO perspective
10:40karolherbst: not sure if the firmware has magic access to stuff
10:45RSpliet: Either way I don't think it's something that nouveau can influence much beyond clocking the cores. FECS/GPCCS clock speeds mattered from what I recall
10:45RSpliet: As did DRAM clock speeds to some degree
10:45karolherbst: yeah, makes sense
16:27ilgaz: firefox 102 finally shipped for opensuse tw and I have a vaapi issue with nvidia 9400