16:04 karolherbst: so I kinda want to get into more vulkan perf profiling from a driver perspective, and I'm curious what people here are using to e.g. identify which shader the GPU spends the most time in, or which one gets used the most. Atm on NVK/nouveau we don't really have great ways to get metrics on what's going on, and I think our timestamps aren't the most reliable ones either.
16:08 zmike: renderdoc timing can give you some idea
16:15 zmike: though if you don't have reliable timestamps that won't help
16:16 zmike: maybe ask AI ?
16:20 danylo: karolherbst: Having reliable timestamps is a good idea. Then integrating u_trace + perfetto would give you a nice overview of game rendering. You can add which shaders a draw call is using to the draw call's tracepoint, and then you would e.g. be able to focus on the high-impact ones.
16:21 danylo: Integrating u_trace is definitely a good idea
16:21 karolherbst: zmike: well that's why I'd at least look at shaders that are run a lot, given that improving those might also actually help with perf in case something very obvious stands out
16:21 danylo: Then a good idea is to expose perf counters via a pps producer, to correlate them in the perfetto view
16:22 karolherbst: danylo: okay, sounds good! I think we already have somebody working on it for nvk
16:22 karolherbst: well...
16:22 karolherbst: we don't have docs on the perf counters 🙃
16:22 karolherbst: and apparently nvidia changes what those counters mean every generation
16:22 danylo: Shouldn't be too hard to reverse them for at least one or two gens?
16:22 danylo: At least the most useful ones
16:22 danylo: may help a lot
16:22 karolherbst: yeah... we do have that for some gens in the old gl driver
16:23 karolherbst: but it's like ~100 counters
16:24 karolherbst: but yeah.. it's on my todo list, along with figuring out a good way to automate reverse engineering them with nsight..
16:25 danylo: karolherbst: One of the interesting things I've done for turnip is that I got a gfxr trace that is replayable on both the prop driver and turnip, then instrumented both drivers' command streams to fetch perf counters for every draw call, then compared them. This allowed me to quickly find a few rather bad cases, though it was a bit less useful than I expected
16:25 danylo: since it's really hard to compare counters.
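
The per-draw-call comparison danylo describes can be pictured with a small sketch. It assumes both replays produce one counter dictionary per draw call in the same order; the function name, the counter name `cycles`, and all the numbers are hypothetical and do not come from turnip, NVK, or any real tool.

```python
from typing import Dict, List, Tuple

Counters = Dict[str, int]


def rank_draws_by_ratio(
    reference: List[Counters],
    candidate: List[Counters],
    counter: str,
) -> List[Tuple[int, float]]:
    """Rank draw calls by candidate/reference ratio for one counter.

    Both lists are assumed to hold one entry per draw call, in replay order.
    Draws where the reference value is zero are skipped, since no ratio
    can be computed for them.
    """
    if len(reference) != len(candidate):
        raise ValueError("both replays must contain the same number of draws")

    ranked = []
    for index, (ref, cand) in enumerate(zip(reference, candidate)):
        ref_value = ref.get(counter, 0)
        if ref_value == 0:
            continue
        ranked.append((index, cand.get(counter, 0) / ref_value))

    # Worst offenders (largest ratio) first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical per-draw counter dumps from the two drivers.
prop_driver = [{"cycles": 100}, {"cycles": 200}, {"cycles": 0}]
open_driver = [{"cycles": 150}, {"cycles": 800}, {"cycles": 50}]
worst_first = rank_draws_by_ratio(prop_driver, open_driver, "cycles")
```

A single ratio like this only flags where to look. As the next message says, counters from two different drivers are hard to compare directly, so a large ratio is a hint rather than a diagnosis.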
16:26 karolherbst: yeah...
16:26 karolherbst: though I think for nvk we still have a couple of low hanging fruits, I just want to know which ones matter the most 🙃
16:27 karolherbst: but good news is that I've been doing some compiler opts that really helped a lot, so kinda hoped there is a simple way to figure out what shaders to look at and continue improving things there
16:28 zmike: counters and perfetto definitely good
16:30 karolherbst: for compute I know that aliasing shared memory will help us to improve occupancy (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33914)
16:30 alyssa: that MR scares me
16:30 alyssa: it's obviously the right thing to do but it scares me
16:31 karolherbst: yeah.. but I have shaders that could go from 4 to 5 parallel workgroups :')
16:31 karolherbst: and probably others I'm not aware of
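
The jump from 4 to 5 parallel workgroups can be illustrated with simple occupancy arithmetic, assuming shared memory is the only limiting resource. The chat gives no hardware figures, so the 100 KiB per-SM budget, the per-workgroup sizes, and the `hw_limit` cap below are all hypothetical.

```python
def max_resident_workgroups(
    shared_per_sm: int,
    shared_per_workgroup: int,
    hw_limit: int = 32,
) -> int:
    """Workgroups that fit on one SM when only shared memory is limiting.

    hw_limit is a hypothetical cap on resident workgroups per SM.
    """
    if shared_per_workgroup <= 0:
        return hw_limit
    return min(hw_limit, shared_per_sm // shared_per_workgroup)


KIB = 1024
SHARED_PER_SM = 100 * KIB  # hypothetical shared memory budget per SM

# Hypothetical shader: 24 KiB of shared memory without aliasing,
# 20 KiB once variables with disjoint lifetimes share storage.
before = max_resident_workgroups(SHARED_PER_SM, 24 * KIB)  # 4
after = max_resident_workgroups(SHARED_PER_SM, 20 * KIB)   # 5
```

Because the division is floored, a small saving in per-workgroup shared memory can be enough to cross the threshold for one more resident workgroup.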
16:32 karolherbst: I'm kinda planning to look into it next week or so 🙃
16:33 alyssa: don't spook me
16:33 alyssa: it's not halloween yet
16:33 karolherbst: it's going to happen
16:34 karolherbst: *throws the ffma split MR as a distraction*
16:37 alyssa: /o\
18:55 anholt: valentine: been seeing some timeouts that feel like the network issue again https://gitlab.freedesktop.org/anholt/mesa/-/jobs/98244959
18:55 anholt: needs to add a download bandwidth graph
23:06 anholt: valentine: I suppose my other question would be whether the local cache is big enough to fit all these traces, or are we hitting fd.o mid-job?