03:23fdobridge: <mhenning> I was playing with nvcc for ptx -> sass last night, and I couldn't immediately find a way to just enable the scheduling but disable other optimizations (like dce and copy prop)
03:23fdobridge: <mhenning> so I'd be a little surprised if there's a way to make it go unscheduled sass -> scheduled sass
03:29fdobridge: <mhenning> I was able to hack together some stuff for getting the issue latency from the compiler output by trying to carefully craft an input file but it was getting a little frustrating.
03:33fdobridge: <mhenning> We'd be trying to get latency numbers for use in our compiler, which in particular means we need broad opcode coverage, while most of the papers that I've seen only cover a handful of important opcodes
03:34fdobridge: <mhenning> The paper I linked described one plausible way of doing it while running on real hardware: "if A has fixed latency, we choose a B that consumes A’s output. We decrease A’s stall cycles in its control word, till A’s result consumed by B is incorrect. The last stall value producing correct results is A’s latency;"
03:36fdobridge: <mhenning> or we can try feeding inputs to the cuda compiler and try to figure out the information from whatever code it generates
03:44fdobridge: <mohamexiety> ah I see, understood. I first thought you meant that we could do it using our infrastructure rather than NVIDIA's
06:47fdobridge: <mhenning> If we wanted to do the test by observing hardware, we don't need to use nvidia's compiler/driver
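[editor's note] The hardware procedure quoted from the paper above (lower A's stall count until B reads a wrong result, then keep the last stall value that was still correct) can be sketched roughly as below. This is only an illustration under stated assumptions: `result_is_correct` is a hypothetical callback that would assemble the A/B pair with the given stall count, run it on the GPU and compare the output; `make_fake_hardware` is a stand-in so the sketch runs without a GPU; and the default `max_stall` of 15 assumes a 4-bit stall field in the control word. None of these names come from NAK or from the paper.

```python
from typing import Callable, Optional


def find_fixed_latency(
    result_is_correct: Callable[[int], bool],
    max_stall: int = 15,  # assumption: 4-bit stall field in the control word
) -> Optional[int]:
    """Return the smallest stall count for which B still sees A's result.

    Walks the stall count downward from max_stall, as in the quoted
    procedure, and stops at the first value that produces a wrong result.
    Returns None if even max_stall is not enough (A is probably not a
    fixed-latency instruction, or its latency exceeds max_stall).
    """
    last_good = None
    for stall in range(max_stall, -1, -1):
        if not result_is_correct(stall):
            break
        last_good = stall
    return last_good


def make_fake_hardware(true_latency: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for a real GPU run: B reads the correct value
    only when A was given at least true_latency stall cycles."""
    return lambda stall: stall >= true_latency


if __name__ == "__main__":
    # Simulated instruction with a latency of 6 cycles.
    print(find_fixed_latency(make_fake_hardware(6)))
```

On real hardware a single wrong result may not show up on every run, so each stall value would likely need to be tried several times before it is counted as correct.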
16:00fdobridge: <gfxstrand> Pass: 402118, Fail: 1530, Crash: 1679, Warn: 3, Skip: 3195085, Timeout: 2, Flake: 466, Duration: 2:06:21
18:14fdobridge: <mhenning> @gfxstrand What subset of cts are you running?
20:27fdobridge: <gfxstrand> Everything but
20:27fdobridge: <gfxstrand> ```
20:27fdobridge: <gfxstrand> dEQP-VK.api.object_management.max.*
20:27fdobridge: <gfxstrand> dEQP-VK.graphicsfuzz.*
20:27fdobridge: <gfxstrand> dEQP-VK.image.swapchain_mutable..*
20:27fdobridge: <gfxstrand> dEQP-VK.wsi..*
20:27fdobridge: <gfxstrand> ```
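[editor's note] The trailing `..*` in the list above suggests the entries are regular expressions rather than shell globs. Assuming that, and assuming each pattern is matched from the start of the test name (both are guesses about the runner, not something stated in the log), the filtering could be sketched like this. The test names in the example are illustrative only.

```python
import re

# Patterns copied verbatim from the skip list above.
SKIP_PATTERNS = [
    r"dEQP-VK.api.object_management.max.*",
    r"dEQP-VK.graphicsfuzz.*",
    r"dEQP-VK.image.swapchain_mutable..*",
    r"dEQP-VK.wsi..*",
]


def is_skipped(test_name: str, patterns=SKIP_PATTERNS) -> bool:
    """True if any pattern matches the test name from its start.

    Assumption: patterns are regexes anchored at the beginning of the
    name; the actual runner may anchor differently.
    """
    return any(re.match(p, test_name) for p in patterns)


if __name__ == "__main__":
    for name in (
        "dEQP-VK.wsi.xlib.surface.create",   # illustrative name
        "dEQP-VK.api.smoke.triangle",        # illustrative name
    ):
        print(name, "skipped" if is_skipped(name) else "run")
```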
23:24fdobridge: <gfxstrand> Found my Z/S MSAA resolve bug... We weren't plumbing through writes_depth to the API set-up code.
23:24fdobridge: <gfxstrand> That was frustrating...
23:24fdobridge: <gfxstrand> Of course I assumed a NAK fail would be the compiler.... I assumed wrong. 🤦🏻‍♀️
23:42fdobridge: <airlied> nice