17:47karolherbst: RSpliet: do you have good numbers on context switching speed on nvidia hardware?
18:38ajax: how big is a hardware context anyway
19:00karolherbst: some MB
19:01ajax: well then
19:02ajax: probably roughly the cost of retiring that many megabytes all the way out to (board) ram
19:03karolherbst: at least somewhere in this area
19:03karolherbst: the only number I know of is that on fermi it's <25 us
19:05ajax: let's say 4MB and (picking randomly) 16GB/s GDDR4, that would be 4096 switches/sec or 0.244ms per switch
19:05karolherbst: 16 GB/s is a bit slow, don't you think? :P
19:06karolherbst: also context size scales with GPU size
19:06karolherbst: anyway.. I could see context switching time being in the one- or two-digit us range, but...
19:07ajax: GDDR6 is in the 768GB/sec neighborhood? so it'd be 48x faster which is... 5us?
19:07ajax: again guessing that 4MB is all you need to retire
19:07karolherbst: yeah.... not sure
19:07ajax: if it's 5x the size then that gives you your 25us
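
The back-of-the-envelope estimate above can be reproduced with a short sketch. It assumes, as ajax does, that a context switch costs roughly the time to write the whole context out to board RAM at peak bandwidth, and it treats MB and GB as binary units (which is what makes the 4096/sec figure come out exactly). The function name `switch_time_us` is made up for illustration; the 4MB and 20MB sizes are the guesses from the discussion, not measured values.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def switch_time_us(context_bytes, bandwidth_bytes_per_s):
    """Microseconds needed to write context_bytes at the given bandwidth.

    Assumption: the switch is purely bandwidth-bound; latency, scheduling
    and draining in-flight work are ignored.
    """
    return context_bytes / bandwidth_bytes_per_s * 1e6


# 4MB context at 16GB/s (the "picked randomly" GDDR4 figure)
slow = switch_time_us(4 * MIB, 16 * GIB)

# Same context at 768GB/s (the GDDR6 ballpark), 48x the bandwidth
fast = switch_time_us(4 * MIB, 768 * GIB)

# A context 5x the size at 768GB/s
big = switch_time_us(20 * MIB, 768 * GIB)

print(f"{slow:.1f} us, {fast:.2f} us, {big:.1f} us")
```

Under these assumptions the three cases come out at about 244us, 5us and 25us, matching the numbers quoted in the chat.
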
19:07karolherbst: we already have 18 cbs at 64k each
19:08karolherbst: then the register file per GPC
19:08karolherbst: although it could wait until all threads settle
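
For a sense of scale, the constant-buffer part of the context mentioned above can be totted up the same way. This is only a sketch: it assumes "cbs" means constant buffers, that "64k" means 64 KiB, and that all 18 are saved in full on a switch. The register file contribution is left out because no size for it is given in the discussion.

```python
KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumption: 18 constant buffers of 64 KiB each, all saved on a switch
cb_count = 18
cb_size = 64 * KIB
cb_total = cb_count * cb_size

# Time to write them out at the 768GB/s ballpark used earlier
cb_time_us = cb_total / (768 * GIB) * 1e6

print(f"{cb_total / MIB:.3f} MiB, {cb_time_us:.2f} us")
```

That is a little over 1 MiB, so under these assumptions the constant buffers alone would account for roughly a quarter of the 4MB guess.
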