03:23 fdobridge: <m​henning> I was playing with nvcc for ptx -> sass last night, and I couldn't immediately find a way to just enable the scheduling but disable other optimizations (like dce and copy prop)
03:23 fdobridge: <m​henning> so I'd be a little surprised if there's a way to make it go unscheduled sass -> scheduled sass
03:29 fdobridge: <m​henning> I was able to hack together some stuff for getting the issue latency from the compiler output by trying to carefully craft an input file but it was getting a little frustrating.
03:33 fdobridge: <m​henning> We'd be trying to get latency numbers for use in our compiler, which in particular means we need broad opcode coverage, while most of the papers that I've seen only cover a handful of important opcodes
03:34 fdobridge: <m​henning> The paper I linked described one plausible way of doing it while running on real hardware: "if A has fixed latency, we choose a B that consumes A’s output. We decrease A’s stall cycles in its control word, till A’s result consumed by B is incorrect. The last stall value producing correct results is A’s latency;"
03:36 fdobridge: <m​henning> or we can try feeding inputs to the cuda compiler and try to figure out the information from whatever code it generates
03:44 fdobridge: <m​ohamexiety> ah I see, understood. I first thought you meant that we could do it using our infrastructure rather than NVIDIA's
06:47 fdobridge: <m​henning> If we wanted to do the test by observing hardware, we don't need to use nvidia's compiler/driver
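The stall-count search quoted from the paper above could be sketched roughly as follows. This is only an illustration, not code from NAK or the paper: `result_is_correct` is a hypothetical callback that would emit A with the given stall count, follow it with a consumer B, run the pair on hardware, and check B's output. `fake_hardware` and the `max_stall=15` bound are assumptions made so the sketch runs on its own.

```python
def find_fixed_latency(result_is_correct, max_stall=15):
    """Return the last (smallest) stall count that still gives a correct result.

    result_is_correct(stall) is a hypothetical callback: it would run
    producer A with `stall` cycles in its control word, then consumer B,
    and report whether B saw A's real output.
    Returns None if even max_stall produces a wrong result.
    """
    last_good = None
    # Walk the stall count downwards until B starts reading stale data.
    for stall in range(max_stall, -1, -1):
        if not result_is_correct(stall):
            break
        last_good = stall
    return last_good


# Stand-in for real hardware (assumption): pretend A's result is
# available to B after 6 cycles.
def fake_hardware(stall, true_latency=6):
    return stall >= true_latency


if __name__ == "__main__":
    print(find_fixed_latency(fake_hardware))
```

On real hardware the callback would have to be repeated per opcode (and probably per operand type) to get the broad coverage mentioned above, and it only applies to fixed-latency instructions.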
16:00 fdobridge: <g​fxstrand> Pass: 402118, Fail: 1530, Crash: 1679, Warn: 3, Skip: 3195085, Timeout: 2, Flake: 466, Duration: 2:06:21
18:14 fdobridge: <m​henning> @gfxstrand What subset of cts are you running?
20:27 fdobridge: <g​fxstrand> Everything but
20:27 fdobridge: <g​fxstrand> ```
20:27 fdobridge: <g​fxstrand> dEQP-VK.api.object_management.max.*
20:27 fdobridge: <g​fxstrand> dEQP-VK.graphicsfuzz.*
20:27 fdobridge: <g​fxstrand> dEQP-VK.image.swapchain_mutable..*
20:27 fdobridge: <g​fxstrand> dEQP-VK.wsi..*
20:27 fdobridge: <g​fxstrand> ```
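For readers unfamiliar with skip lists like the one above, here is a minimal sketch of how such patterns select tests, assuming they are treated as regular expressions that must match the whole test name (that matching rule is an assumption here, not something stated in the log). The test names in the usage lines are illustrative only.

```python
import re

# Patterns copied from the list above; treated as regexes (assumption).
SKIP_PATTERNS = [
    r"dEQP-VK.api.object_management.max.*",
    r"dEQP-VK.graphicsfuzz.*",
    r"dEQP-VK.image.swapchain_mutable..*",
    r"dEQP-VK.wsi..*",
]
SKIP_RES = [re.compile(p) for p in SKIP_PATTERNS]


def is_skipped(test_name):
    """True if any skip pattern matches the full test name."""
    return any(r.fullmatch(test_name) for r in SKIP_RES)


if __name__ == "__main__":
    # Illustrative names, not taken from the log.
    for name in ("dEQP-VK.wsi.xlib.surface.create",
                 "dEQP-VK.api.info.instance.physical_devices"):
        print(name, "skip" if is_skipped(name) else "run")
```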
23:24 fdobridge: <g​fxstrand> Found my Z/S MSAA resolve bug... We weren't plumbing through writes_depth to the API set-up code.
23:24 fdobridge: <g​fxstrand> That was frustrating...
23:24 fdobridge: <g​fxstrand> Of course I assumed a NAK fail would be the compiler.... I assumed wrong. 🤦🏻‍♀️
23:42 fdobridge: <a​irlied> nice