21:04 superkuh: Hi. I am doing an amdgpu compute thing using vulkan that takes most of my vram. When another process uses lots of vram and kicks the first out, the ram is never reloaded by the first process after the second finishes. When I did the same compute on the same setup with opencl the first process would automatically reload. https://github.com/ggerganov/llama.cpp/issues/5380
21:05 superkuh: Does anyone know of some driver or kernel or vulkan parameter for amdgpu I might pass to restore the opencl-like behavior where if it's kicked out of vram it auto-reloads?
21:07 superkuh: I'm on Debian 11 if it matters, more details in the issue/bug report.
22:05 agd5f: superkuh, depends what flags the vulkan driver used for the allocations and the amount of memory in play. If the UMD specifies VRAM only for the allocations, the KMD will try to migrate it back to VRAM on command execution, but it throttles after a certain amount of data to avoid transferring too much data. Otherwise the overhead of transferring all the data back to VRAM may be higher than just doing the work on the buffers in system memory.
22:05 agd5f: Alternatively, if the UMD specifies VRAM | GTT, that tells the KMD that the UMD doesn't care where the buffer ends up, so it won't bother moving it unnecessarily.
22:15 superkuh: I think I understand. What does GTT mean in this context? In my situation there's a constant process using 5.5GB of 8GB of vram (6.2GB/8GB in practice with OS/other things/etc). Then another transient, once or twice per day, ~1 minute process that uses ~5GB of vram.
22:17 superkuh: The first process is usually not "active". It's just waiting and runs a few dozen times per hour. It's an IRC bot LLM. The second is an image analysis neural network.
22:17 superkuh: Er, that is to say the first process is always running, but not always actively doing operations on what it has in vram.
22:18 superkuh: I suppose this is all too vague for any real help. I will attempt to look at my vulkan userland for possible flags, etc.
22:19 airlied: I think radv has a policy of marking a lot of things as VRAM|GTT
22:40 agd5f: superkuh, GTT is system memory
22:40 superkuh: Thanks.
22:41 agd5f: superkuh, the KMD transfer limit is like 100M or something like that so that's probably the issue
22:42 agd5f: for most workloads the overhead of copying several GB of data back and forth from VRAM every time there is a command buffer is not worth it
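
One way to see whether the first process's buffers were left in GTT (system memory) after the second process exits is to compare the amdgpu memory counters in sysfs before and after. The following is a minimal sketch, not something from the discussion: it assumes the GPU is `card0` and that the kernel exposes the `mem_info_*` files under its device directory. The function names are made up for illustration.

```python
from pathlib import Path

# Assumption: the GPU of interest is card0; adjust on multi-GPU systems.
DEVICE_DIR = Path("/sys/class/drm/card0/device")

COUNTERS = (
    "mem_info_vram_used",
    "mem_info_vram_total",
    "mem_info_gtt_used",
    "mem_info_gtt_total",
)


def read_bytes(device_dir, name):
    """Return the byte count stored in device_dir/name, or None if unreadable."""
    try:
        return int((Path(device_dir) / name).read_text().strip())
    except (OSError, ValueError):
        return None


def memory_report(device_dir=DEVICE_DIR):
    """Collect the available VRAM/GTT counters, converted to whole MiB."""
    report = {}
    for name in COUNTERS:
        value = read_bytes(device_dir, name)
        if value is not None:
            report[name] = value // (1024 * 1024)
    return report


if __name__ == "__main__":
    for name, mib in memory_report().items():
        print(f"{name}: {mib} MiB")
```

If `mem_info_gtt_used` stays high and `mem_info_vram_used` stays low once the transient process is gone, that would be consistent with the buffers not having been migrated back.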
22:45 superkuh: Hm. So maybe setting "parm: moverate:Maximum buffer migration rate in MB/s. (32, 64, etc., -1=auto, 0=1=disabled) (int)"
22:46 superkuh: (from $ modinfo amdgpu )
22:49 agd5f: superkuh, yeah, I think so
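
For anyone wanting to try the `moverate` parameter, a rough sketch of checking the current value and producing a modprobe options line follows. It assumes the standard `/sys/module/<module>/parameters/` layout and that the file is readable (it may require root). The value 64 is just one of the examples from the modinfo text, and whether changing it restores the reload behavior is untested here. The helper names are hypothetical.

```python
from pathlib import Path

# Standard location for loaded-module parameters; reading may need root.
PARAM_DIR = Path("/sys/module/amdgpu/parameters")


def read_param(name, param_dir=PARAM_DIR):
    """Return the current value of a module parameter as text, or None."""
    try:
        return (Path(param_dir) / name).read_text().strip()
    except OSError:
        return None


def modprobe_line(value):
    """Format an options line suitable for a file under /etc/modprobe.d/."""
    return f"options amdgpu moverate={int(value)}"


if __name__ == "__main__":
    print("current moverate:", read_param("moverate"))
    # Example value taken from the modinfo description (32, 64, etc.).
    print(modprobe_line(64))
```

The printed options line would go into a file such as `/etc/modprobe.d/amdgpu.conf` (file name is an assumption), taking effect the next time the module loads. The same setting can also be passed on the kernel command line as `amdgpu.moverate=64`.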
22:50 superkuh: I wonder why that didn't affect opencl.
22:50 superkuh: Not that I've tried it yet. I'll have to reboot.
22:51 superkuh: With opencl it nearly instantly reloads itself back into vram after being pushed out.
22:51 superkuh: That's the same amdgpu.
22:52 agd5f: superkuh, using ROCm OpenCL?
22:52 superkuh: clBLAST.
22:52 superkuh: ROCm dropped support for the RX 580 a couple years ago. :(