17:47 karolherbst: RSpliet: do you have good numbers on context switching speed on nvidia hardware?
18:38 ajax: how big is a hardware context anyway
19:00 karolherbst: some MB
19:01 ajax: well then
19:02 ajax: probably roughly the cost of retiring that many megabytes all the way out to (board) ram
19:03 karolherbst: maybe
19:03 karolherbst: at least somewhere in this area
19:03 karolherbst: the only number I know of is that on fermi it's <25 us
19:05 ajax: let's say 4MB and (picking randomly) 16GB/s GDDR4, that would be 4096 switches/sec or 0.244ms each
19:05 karolherbst: 16 GB/s is a bit slow, don't you think? :P
19:06 karolherbst: also context size scales with GPU size
19:06 karolherbst: anyway.. I could see context switching speed being in the one- or two-digit us range, but...
19:07 ajax: GDDR6 is in the 768GB/sec neighborhood? so it'd be 48x faster which is... 5us?
19:07 ajax: again guessing that 4MB is all you need to retire
19:07 karolherbst: yeah.... not sure
19:07 ajax: if it's 5x the size then that gives you your 25us
19:07 karolherbst: we already have 18 cbs at 64k each
19:08 karolherbst: then the register file per GPC
19:08 karolherbst: although it could wait until all threads settle
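
A rough sketch of the back-of-envelope arithmetic from the exchange above, for anyone who wants to plug in other numbers. It assumes, as ajax does, that the switch cost is dominated by writing the whole context out to board RAM at full bandwidth, and it uses binary units (which is what yields the 4096/sec figure). The function name and the 20 MiB case (5x the 4MB guess) are illustrative, not measured values.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

def retire_time_us(context_bytes, bandwidth_bytes_per_s):
    """Microseconds needed to write context_bytes out at the given bandwidth.

    Assumption: the transfer runs at full bandwidth and nothing else
    (draining threads, latency, reloading the next context) is counted.
    """
    return context_bytes / bandwidth_bytes_per_s * 1e6

# ajax's guesses: a 4MB context at 16GB/s, then at 768GB/s
slow = retire_time_us(4 * MIB, 16 * GIB)
fast = retire_time_us(4 * MIB, 768 * GIB)

# 5x the context size at 768GB/s lands near the <25 us fermi figure
big = retire_time_us(5 * 4 * MIB, 768 * GIB)

# the 18 cbs at 64k each mentioned by karolherbst
cb_bytes = 18 * 64 * 1024

print(f"4 MiB @ 16 GiB/s:   {slow:.1f} us")
print(f"4 MiB @ 768 GiB/s:  {fast:.2f} us")
print(f"20 MiB @ 768 GiB/s: {big:.2f} us")
print(f"18 cbs x 64 KiB:    {cb_bytes / MIB:.3f} MiB")
```

Under these assumptions the cbs alone come to a bit over 1 MiB, so the 4MB guess leaves under 3 MiB for everything else, including the per-GPC register files.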