01:27karolherbst: is the relabeling label broken? :'(
17:56osvel: dschuermann Hey, I dropped an MR for what we discussed here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/40364. Would appreciate a review if you have the time :)
19:31dj-death: is there a nir helper to find the most common successor between 2 nir_blocks?
22:06ammen99: (I was told my earlier messages did not make it through, so I'm resending; apologies if they did come through)
22:06ammen99: Hi all, I have been profiling my Wayland compositor (Wayfire) to figure out why CPU usage is much higher with its Vulkan backend than with its GLES2 backend. With valgrind, I found that about 30% of the execution time is spent clearing memory, which happens when we reset a command buffer.
22:06ammen99: So we call vkBeginCommandBuffer, and in the Intel driver (Intel iGPU + Vulkan) that ends up calling reset_cmd_buffer in anv_cmd_buffer.c, which calls anv_cmd_state_init(), which in turn memsets the command buffer state.
22:06ammen99: Is this expected, and are there any ways to reduce the time spent memsetting memory? The command buffers are reused as often as possible.
22:22HdkR: ammen99: Use command buffer pooling and don't rerecord command buffers when you don't need to.
22:23HdkR: Not helpful, I'm aware
22:26HdkR: By design the recording is fairly heavy, while dispatching/submitting a million times is relatively lightweight on the CPU.
22:28HdkR: But also make sure you're actually measuring what you think the bottleneck is. valgrind instead of `perf top` is a bit of an odd choice for profiling.
22:32ammen99: HdkR: thanks for the ideas. Unfortunately I don't think we can avoid re-recording command buffers every frame, except in very specific cases. I could try perf top as well; I've used valgrind / callgrind in the past and found it useful, but maybe perf would be better.
22:35Company: ammen99: you could try sysprof
22:36HdkR: The worst case is if you're not using a multi-frame pool and you're trying to reset a command buffer while it is still in flight, so you're literally just waiting for it to be consumed by the GPU :P
22:37HdkR: automatic orphaning of resources in Vulkan isn't really a thing.
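A minimal sketch of the multi-frame pool pattern HdkR describes, assuming three frames in flight. To keep it runnable without a GPU, the Vulkan objects are modelled with plain structs; the names `frame_slot`, `frame_ring`, `ring_acquire` and `gpu_complete` are hypothetical. In real code each slot would own its own VkCommandPool and VkFence, and the comments mark where vkWaitForFences, vkResetFences and vkResetCommandPool would go. The point is that a slot is only reset once its fence says the GPU is done with it, so recording frame N never stalls on frame N-1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one slot stands in for a per-frame VkCommandPool plus
 * the VkFence that is signaled when the GPU has consumed its work. */
#define FRAMES_IN_FLIGHT 3

struct frame_slot {
    bool fence_signaled; /* stands in for vkGetFenceStatus(...) == VK_SUCCESS */
    unsigned resets;     /* how many times this slot's pool has been reset */
};

struct frame_ring {
    struct frame_slot slots[FRAMES_IN_FLIGHT];
    size_t next; /* index of the slot the next frame will record into */
};

static void ring_init(struct frame_ring *r)
{
    for (size_t i = 0; i < FRAMES_IN_FLIGHT; i++) {
        /* Like creating the fence with VK_FENCE_CREATE_SIGNALED_BIT, so the
         * first use of each slot does not wait. */
        r->slots[i].fence_signaled = true;
        r->slots[i].resets = 0;
    }
    r->next = 0;
}

/* Returns the slot to record and submit the next frame from, or NULL if the
 * GPU still owns it. A real renderer would block in vkWaitForFences here
 * instead of returning NULL. For brevity this also marks the slot as
 * submitted, i.e. it folds "record + vkQueueSubmit with this fence" in. */
static struct frame_slot *ring_acquire(struct frame_ring *r)
{
    struct frame_slot *s = &r->slots[r->next];
    if (!s->fence_signaled)
        return NULL;
    s->resets++;               /* vkResetFences + vkResetCommandPool */
    s->fence_signaled = false; /* submitted; pending until the GPU finishes */
    r->next = (r->next + 1) % FRAMES_IN_FLIGHT;
    return s;
}

/* Simulates the GPU finishing the work submitted from one slot. */
static void gpu_complete(struct frame_slot *s)
{
    assert(!s->fence_signaled); /* completing an idle slot would be a bug */
    s->fence_signaled = true;
}
```

With three slots, three frames can be in flight at once; the fourth acquire only succeeds after the GPU has finished the first frame, which is exactly the wait HdkR warns about when there is only a single pool.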
22:37soreau: so there's glthread, but is there a vkthread? Or is that up to the Vulkan consumer?
22:37Company: up to the consumer; Vulkan is just a thin shim translating from the GPU to a common API
22:38soreau: This is what I thought, thanks Company
22:39Company: you can see that while profiling - the Vulkan parts of the stack traces are very shallow, you're either in your app or in the kernel
22:39Company: in general
22:39soreau: I see
22:42HdkR: This is why you can do half a million API calls per frame and still hit 60FPS :P
22:57Company: I should note that I also think it's unlikely that memsetting command buffers is the issue, unless you're recording millions or billions of commands per frame
22:58Company: and it's way more likely that it's something else going wrong
22:59Company: because command buffers are tiny compared to images and copying/memsetting those is not a problem
23:00zmike: valgrind tends to weight memset more heavily than other profiling tools
23:00HdkR: Yea, I think Valgrind overvalues memory accesses in general
23:01HdkR: memset is one of the fastest things a CPU can do :P
23:22karolherbst: also... why would one do performance profiling with valgrind?
23:22zmike: wow, hater
23:23karolherbst: my feelings are somewhere between confusion and curiosity