00:37 gfxstrand[d]: That's fine
00:56 gfxstrand[d]: I think fp64 is slightly broken on Volta. I'm seeing some odd failures. Makes sense, I guess, since I think Volta might be one of the weird ones where it's kinda fixed-latency.
01:10 mohamexiety[d]: yeah, Volta is all GV100, so it has the full FP64
01:43 gfxstrand[d]: Yeah, but dependency tokens should still work. 🤷🏻‍♀️
03:05 mhenning[d]: gfxstrand[d]: I thought they were explicitly ignored for fp64 on fixed-latency chips?
03:11 mhenning[d]: snowycoder[d]: I'm not really sure what you're asking, but the GPUs require us to schedule instructions for them and to track, e.g., which variable-latency instructions wait on others
11:42 snowycoder[d]: mhenning[d]: Ah, so it's Fermi that has a hardware scoreboard.
11:42 snowycoder[d]: The scheduling data for each instruction (sched) has a 0x1f bitmask for the delay, but what are the higher 3 bits used for?
11:42 snowycoder[d]: (They're set in `SchedDataCalculator::setDelay`, but it's not really clear what the encoding is)
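A minimal sketch of the field split snowycoder describes, assuming only what is stated here: each instruction gets one scheduling byte whose low 5 bits (mask 0x1f) hold the delay, leaving 3 upper bits. The helper name `split_sched_byte` is hypothetical, and the sketch deliberately does not assign meanings to the upper 3 bits; those are what the envytools notes linked below cover.

```python
# Hypothetical helper: split one per-instruction scheduling byte into the
# low 5-bit delay field (mask 0x1f) and the remaining upper 3 bits.
# The meaning of the upper bits is NOT decoded here (assumption: unknown).
DELAY_MASK = 0x1F


def split_sched_byte(sched: int) -> tuple[int, int]:
    if not 0 <= sched <= 0xFF:
        raise ValueError("scheduling data is one byte per instruction")
    delay = sched & DELAY_MASK   # bits 0..4: delay
    upper = (sched >> 5) & 0x7   # bits 5..7: left undecoded
    return delay, upper


print(split_sched_byte(0x2F))  # (15, 1)
```

Masking and shifting separately keeps the delay field independent of whatever the upper bits end up meaning.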
15:47 mhenning[d]: snowycoder[d]: Some of that is explained here: https://envytools.readthedocs.io/en/latest/hw/graph/fermi/cuda/isa.html#notes-about-scheduling-data-and-dual-issue-on-gk104
15:52 mhenning[d]: This paper also talks about some details https://www.researchgate.net/publication/313020335_Understanding_the_GPU_Microarchitecture_to_Achieve_Bare-Metal_Performance_Tuning