IRC Logs of #radeon on irc.freenode.net for 2024-07-12

05:57 colingpu: Regarding gfx8+:
05:57 colingpu: 1) When memory is HOST_VISIBLE & DEVICE_LOCAL, does that mean it is VRAM mapped into the CPU's address space (the CPU accesses it over the PCIe bus, with no DMA involved)?
05:57 colingpu: Or
05:57 colingpu: 2) When memory is HOST_VISIBLE & DEVICE_LOCAL, does that mean it is system (host) memory mapped into the GPU's address space (the GPU accesses system memory over the PCIe bus, with no DMA involved)?
05:57 colingpu: 3) When memory is HOST_VISIBLE only, does that mean it is system (host) memory mapped into the CPU's address space (the CPU accesses system memory directly, and DMA copies the data to VRAM)?
05:57 colingpu: 4) In theory, let's say I don't have dedicated VRAM (I have an AMD Embedded series GPU). Can I use system memory for all BO allocations and map a fixed 1G region of system memory to the GPU? Will it work? Can the display controller access system memory for the framebuffer?
06:06 airlied: colingpu: 1 is VRAM mapped via PCIe
06:07 airlied: 3 is system memory, and it isn't DMAed
06:07 airlied: the GPU accesses system memory directly over PCIe
06:08 airlied: I think APU can scan directly out of system RAM, but it's recommended to use the "VRAM" as it has a faster path
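
The distinction discussed above (HOST_VISIBLE & DEVICE_LOCAL versus HOST_VISIBLE only) is normally made by picking a memory type index from the list the driver reports. The following is a minimal sketch of that selection, not radv or application code from this channel. The flag values are the VkMemoryPropertyFlagBits values from the Vulkan specification; `struct mem_type` and `find_memory_type` are hypothetical stand-ins so the example builds without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values match VkMemoryPropertyFlagBits in the Vulkan specification. */
#define MEM_DEVICE_LOCAL  0x1u
#define MEM_HOST_VISIBLE  0x2u
#define MEM_HOST_COHERENT 0x4u
#define MEM_HOST_CACHED   0x8u

/* Hypothetical stand-in for VkMemoryType. */
struct mem_type {
    uint32_t property_flags;
    uint32_t heap_index;
};

/*
 * Return the index of the first memory type that is allowed by type_bits
 * (as in VkMemoryRequirements::memoryTypeBits) and has every flag in
 * `required` set, or -1 if there is none.
 */
int find_memory_type(const struct mem_type *types, uint32_t count,
                     uint32_t type_bits, uint32_t required)
{
    assert(count <= 32);
    for (uint32_t i = 0; i < count; i++) {
        if (!(type_bits & (1u << i)))
            continue; /* resource cannot live in this type */
        if ((types[i].property_flags & required) == required)
            return (int)i;
    }
    return -1;
}
```

Requesting `MEM_DEVICE_LOCAL | MEM_HOST_VISIBLE` corresponds to case 1 (CPU-mapped VRAM); requesting `MEM_HOST_VISIBLE` alone may return either a system-memory type or a CPU-mapped VRAM type, depending on the order in which the driver lists them.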
06:26 colingpu: I have a texture that needs to be updated every cycle. Which approach performs best?
06:26 colingpu: 1.) Allocate a staging buffer in system memory (cached and non-coherent) and transfer it to a DEVICE_LOCAL (VRAM) buffer. I think the transfer stage will use DMA to copy the buffer from system memory to VRAM.
06:26 colingpu: 2.) Allocate a HOST_VISIBLE & DEVICE_LOCAL buffer (in VRAM, cached and non-coherent) and use this buffer directly on the GPU. I think we first write to the CPU cache and then use vkFlushMappedMemoryRanges to copy the data to VRAM via the PCIe bus.
06:30 airlied: colingpu: it depends a lot; if you can use the transfer queue, the first might be better, as you could run the transfer in parallel with the GPU doing rendering
06:30 airlied: or use async compute to upload it
06:31 airlied: otherwise 2 is probably more efficient as long as you write it using write combining
06:31 airlied: though you might need to take tiling into account, in which case 1 might win again
06:31 airlied: but also VK_EXT_host_image_copy is a thing
06:31 airlied: not sure how radv support for it is
06:35 hakzsam: no support
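
For option 2 with non-coherent memory, the flush mentioned in the question has an alignment rule in the Vulkan specification: the range offset must be a multiple of nonCoherentAtomSize, and the size must be a multiple of it or reach the end of the allocation. The sketch below only shows that arithmetic; `struct flush_range` and `align_flush_range` are hypothetical names, and the result would be copied into a VkMappedMemoryRange before calling vkFlushMappedMemoryRanges.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical holder for the offset/size fields of VkMappedMemoryRange. */
struct flush_range {
    uint64_t offset;
    uint64_t size;
};

/*
 * Widen [offset, offset + size) so that it starts on an atom boundary and
 * ends either on an atom boundary or at the end of the allocation.
 * `atom` stands for VkPhysicalDeviceLimits::nonCoherentAtomSize.
 */
struct flush_range align_flush_range(uint64_t offset, uint64_t size,
                                     uint64_t atom, uint64_t alloc_size)
{
    assert(atom > 0 && offset + size <= alloc_size);

    uint64_t begin = offset - offset % atom; /* round start down */
    uint64_t end = offset + size;
    uint64_t rem = end % atom;
    if (rem != 0)
        end += atom - rem;                   /* round end up */
    if (end > alloc_size)
        end = alloc_size;                    /* clamp to the allocation */

    return (struct flush_range){ begin, end - begin };
}
```

Flushing slightly more than was written is harmless; the rounding only ensures the range is valid for the device's atom size.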
09:20 vedranm: johnny0: OK, same here
13:12 agd5f: All APUs from the past 8-10 years should have the same bandwidth to cached and uncached memory (minus the overhead of the snoop for cached)
14:15 MrCooper: agd5f: when I measured some years ago, VRAM/carveout was faster than system RAM with write-combining, possibly due to page table related overhead. Maybe that gap has been closed or at least narrowed in the meantime though
17:52 Ristovski: hakzsam: hmm, I got VK_EXT_host_image_copy on gfx90c/renoir