06:43colingpu: Hello, i want to create the world's simplest 2d triangle drawing code using only libdrm_amdgpu and libdrm. Could you help?
06:45colingpu: Hello, i want to create the world's simplest single-file C code drawing a 2d triangle by using only libdrm_amdgpu and libdrm for a gfx8 based gpu. Could you help?
06:52airlied: colingpu: it won't be simple and there is no such thing as a 2d triangle
06:52colingpu: I am currently inspecting amdvlk/pal project. I enabled logging of drm calls. Do you recommend mesa-amd or amdvlk for understanding the code?
06:52airlied: I'd use radv vulkan driver as the closest layer to the kernel
06:53airlied: but a single triangle draw isn't any more trivial than drawing most of a scene
06:53colingpu: I created a vulkan code drawing a 2d triangle. There are lots of AmdgpuBoCpuMap/AmdgpuBoCpuUnmap calls. I assume these are not needed for simple app.
06:53airlied: do you have a shader compiler?
06:55colingpu: I am passing compiled spir-v to the app. Is there a way to compile the shader using amdllpc and set VkShaderModuleCreateInfo.pCode?
06:55airlied: don't know, never used amdllpc
06:56airlied: spir-v needs a backend compiler to make it into gpu assembly
06:56airlied: so you'd either have to link your "simple" C code to llpc or load some shader binaries you compiled offline
06:56colingpu: It would be nice if i could set pCode directly to native gpu assembly. Amdvlk is using the llvm compiler. I can dump it and upload.
06:57colingpu: I checked that mesa-radv is much simpler than amdvlk.
06:58airlied: no you can't load prebuilt binary easily with vulkan
06:58colingpu: I will try to inspect mesa-radv to learn how it draws.
06:59airlied: the last place I've seen something similar was when we had r600_demo for reverse engineering the old r600 gpus
06:59airlied: I don't think anyone has written anything like it for newer gpus
06:59airlied: but it's neither a single C file nor trivial to write
07:00colingpu: I want to create a custom gpu driver for a baremetal environment to display 2d content. I just need a simple single thread driver.
07:00airlied: just use vulkan
07:00airlied: displaying 2d content isn't special
07:01airlied: and it doesn't make it any easier
07:01colingpu: OS is not linux, it is zephyr and resource is limited. I need to create a lightweight driver.
07:03airlied: colingpu: but you have a kernel driver?
07:03colingpu: Yes, currently i am trying to use only amdgpu kernel driver/libdrm. After that stage, i will also write a simple kernel driver.
07:04colingpu: I am planning to log all the mmio read/writes and replay on rtos.
07:04airlied: colingpu: do you have a timeline to do this work? I'd reckon it's about 2-3 years of work to do that
07:05airlied: like seriously you are underestimating this effort by probably 2-3 orders of magnitude
07:05airlied: the kernel driver has to load a bunch of firmwares into the card and do a lot of work to make it useable
07:06colingpu: I saw that some reverse engineers are creating a simple 2d drawing driver in 1 month. I thought i can record all the mmio using iotrace commands and replay it.
07:06colingpu: Everything is static. Memory and addresses are static.
07:06airlied: not sure who or what you've seen, but it's nothing like that simple for at least amdgpus
07:07airlied: replaying won't work
07:07airlied: like the code is all there in the kernel, you could try and figure out which bits you need, but it's quite a lot, esp if you want to turn on the display as well
07:11airlied: you could do the userspace bits as a small experiment, by ripping chunks out of radv, but the kernel driver is a different problem space entirely
07:11colingpu: Are mmio commands not deterministic? For example, if i log all the mmio commands in a linux environment multiple times, will i see any difference when i compare the logs? I am assuming kernel memory randomization is off
07:12airlied: a lot of the programming isn't mmio based
07:12airlied: there's a lot of ring buffers and dma based transfers
07:14colingpu: I saw that creating a buffer is easy using libdrm_amdgpu. AmdgpuBoAlloc->
07:14colingpu: AmdgpuBoVaOpRaw
07:14colingpu: ->AmdgpuBoCpuMap
07:14airlied: I'm talking about the kernel driver
07:14airlied: none of the userspace stuff is really mmio based at all
07:15airlied: it creates command buffers in memory that are passed to the kernel driver, which submits them to a ring buffer that the gpu consumes them from
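A minimal sketch of the producer/consumer idea airlied describes, simulated entirely on the CPU. It assumes nothing about real amdgpu rings: `struct ring`, `ring_submit` and `ring_consume` are hypothetical names, the ring size is arbitrary, and the "GPU" is just a function that reads dwords back out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DWORDS 8u /* power of two, so wrap-around is a simple mask */

struct ring {
    uint32_t buf[RING_DWORDS];
    uint32_t wptr; /* producer (driver) position, free-running */
    uint32_t rptr; /* consumer (GPU stand-in) position, free-running */
};

/* Producer side: copy a packet into the ring. Returns 0 if it does not fit. */
static int ring_submit(struct ring *r, const uint32_t *pkt, uint32_t n)
{
    assert(r != NULL && pkt != NULL && n <= RING_DWORDS);
    uint32_t used = r->wptr - r->rptr; /* unsigned arithmetic handles wrap */
    if (RING_DWORDS - used < n)
        return 0;
    for (uint32_t i = 0; i < n; i++)
        r->buf[(r->wptr + i) & (RING_DWORDS - 1)] = pkt[i];
    r->wptr += n; /* a real driver would now tell the hardware about the new write pointer */
    return 1;
}

/* Consumer side: stand-in for the GPU fetching one dword. Returns 0 if the ring is empty. */
static int ring_consume(struct ring *r, uint32_t *out)
{
    assert(r != NULL && out != NULL);
    if (r->rptr == r->wptr)
        return 0;
    *out = r->buf[r->rptr & (RING_DWORDS - 1)];
    r->rptr++;
    return 1;
}
```

The point of the sketch is that the work is described by memory contents plus a pointer update, not by a sequence of register writes, which is why an mmio trace alone does not capture it.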
07:16colingpu: I have not inspected the kernel driver fully. But i saw that initialization of the IPs is easy. Every IP has a callback and mmio registers to set to specific values, and uploading firmware is something like i2c.
07:18airlied: the display setup code is a little bit more complex than that
07:19colingpu: Can i use system memory for all the BOs using CPU_GTT? When we use GTT, do i need the dma stuff?
07:19airlied: yes you have to set up page tables on the gpu (kernel driver does that as well)
07:22colingpu: I saw that there are lots of linux kernel files setting up the display controller, such as dce110 and dcn. Why is it so hard to just set up an hdmi display controller?
07:22airlied: because it isn't just a hdmi display controller
07:23airlied: it's a display controller for hdmi/displayport and a full HDR pipeline
07:24colingpu: I just need a simple static functionality with HDMI 1920x1080 RGBA with predefined connector id, no-hdr
07:25airlied: and the hw needs a lot of describing to achieve that simplicity
07:25airlied: same as wanting a simple 2D triangle, requires setting up a complete 3D pipeline
07:26colingpu: I think i should also inspect and log a dumb buffer example without the 3d pipeline, to understand each part in isolation.
07:27airlied: like you could just sw render to a buffer and display it
07:27airlied: without using the 3D engine at all
07:27airlied: that might save a bunch of effort
07:30airlied: you could just use kms apis for that at least
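A rough sketch of the "sw render to a buffer" half of that suggestion: filling a triangle into a plain array of 32-bit pixels with edge functions, no GPU involved. The names `edge` and `fill_triangle` are illustrative, the pixel layout is assumed to be one `uint32_t` per pixel with no row padding, and getting the buffer onto a screen (for example through a KMS dumb buffer) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Twice the signed area of (a, b, p); the sign says which side of edge a->b the point p is on. */
static int64_t edge(int ax, int ay, int bx, int by, int px, int py)
{
    return (int64_t)(bx - ax) * (py - ay) - (int64_t)(by - ay) * (px - ax);
}

/* Fill a triangle into a width x height buffer. Returns the number of pixels written.
 * Edges are treated as inclusive; a real rasterizer would apply a fill rule. */
static unsigned fill_triangle(uint32_t *fb, int width, int height,
                              int x0, int y0, int x1, int y1, int x2, int y2,
                              uint32_t color)
{
    assert(fb != 0 && width > 0 && height > 0);
    unsigned written = 0;
    int64_t area = edge(x0, y0, x1, y1, x2, y2);
    if (area == 0)
        return 0; /* degenerate triangle */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int64_t w0 = edge(x1, y1, x2, y2, x, y);
            int64_t w1 = edge(x2, y2, x0, y0, x, y);
            int64_t w2 = edge(x0, y0, x1, y1, x, y);
            /* Accept either winding order by matching the sign of the total area. */
            int inside = area > 0 ? (w0 >= 0 && w1 >= 0 && w2 >= 0)
                                  : (w0 <= 0 && w1 <= 0 && w2 <= 0);
            if (inside) {
                fb[y * width + x] = color;
                written++;
            }
        }
    }
    return written;
}
```

This scans every pixel rather than a bounding box to keep the sketch short; for a fixed 1920x1080 target one would at least clamp the loops to the triangle's extent.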
07:30colingpu: So, you state that the hardest IP in amd gpus is display core.
07:32airlied: probably power management, but a lot of that is controlled by fw now, and then display core
07:36colingpu: The debug version of amdvlk.so is 1 GB. Loading an application in gdb takes 30 seconds. Mesa-radv is so lightweight
07:37colingpu: Amdvlk is using llvm compiler. Compiling and linking also takes a lot.
07:45airlied: yeah I wouldn't look at amdvlk as an example at all
08:46colingpu: I noticed vkQueueSubmit is calling amdgpu_cs_submit_raw2. But, vkQueuePresentKHR is not calling amdgpu_cs_submit_raw2.
08:47colingpu: In mesa-radv.
08:47colingpu: amdvlk calls in vkQueueSubmit and vkQueuePresentKHR
08:52MrCooper: the RADV behaviour is what I'd expect, vkQueuePresentKHR normally shouldn't require any GPU work of its own in the app
09:01colingpu: Most of the drm_amdgpu calls are under the winsys folder. Does winsys mean os layer or windowing system?
09:08colingpu: vkQueueSubmit(x, 1, &y, 0); has no fence but bool has_user_fence = radv_amdgpu_cs_has_user_fence(request); returning true. Is it expected?
12:00pixelcluster: colingpu: the "winsys" is the os/drm abstraction layer, yes
12:01pixelcluster: and yes, has_user_fence returning true is also expected
12:03pixelcluster: colingpu: also, catching up on the conversation, using spirv for shaders is probably going to pull in a ton more complexity you need to worry about
12:04pixelcluster: the spirv->isa compiler is inevitably going to assume certain things about the driver when generating code to, for example, load descriptors - you need to mirror these things 1:1 in your app if you want to use an existing compiler
12:04colingpu: I have started to create a test function replaying all the drm/drm_amdgpu calls logged from mesa-radv. I tried using amdvlk. radv is much simpler.
12:05pixelcluster: just repeating syscalls isn't going to work
12:05colingpu: Yes, Only issue is memory locations and exported sync objects/surface.
12:05pixelcluster: no, there are a ton more issues
12:06pixelcluster: this only has a chance to work as long as your app replays exactly the recorded things 1:1
12:07pixelcluster: (and even then, radv relies on writing to mapped memory for a ton of things like uploading compiled shaders, and tracing syscalls won't capture that)
12:08colingpu: I think I should also trace memcpy calls.
12:09pixelcluster: but even if you find a way to replay mapped memory writes 1:1 as well, as soon as you want to change literally anything (like, for example, whether to draw 1 triangle or 2), you have to generate a different command stream, and thus you have to write a ton of code to generate command streams
12:10pixelcluster: radv only memcpy's the final generated command stream to GPU memory, so tracing would fail you there (if it hasn't already failed you way before)
12:12pixelcluster: essentially what I'm trying to say is that the amount of boilerplate you have to write to use graphics hardware at all is HUGE, you essentially have to write half the driver before you can even start executing a vertex shader
12:13colingpu: The shader and command stream will not change. The application will stay the same. Only the vertex data will change.
12:15colingpu: I assume that we can draw the whole 2d scene using only a single draw command.
12:18pixelcluster: colingpu: but will the vertex count change, for example? or the resolution of the image you're rendering to?
12:20colingpu: No, the resolution is also the same. It draws directly to the display. No windowing system. Yes, the vertex count will change. I am planning to draw a scene using 10000 vertices on pc, record the function calls and replay them in the baremetal rtos system.
12:21pixelcluster: well, there you have it, changing the vertex count also requires changing the command stream
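To illustrate pixelcluster's point, here is a hedged sketch of how a non-indexed draw is encoded as a type-3 PM4 packet. The opcode and field values are as I recall them from Mesa's `sid.h` and should be checked against it; `pkt3_header` and `emit_draw_auto` are hypothetical helper names, and a real draw is preceded by a large amount of state setup that is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PKT3_DRAW_INDEX_AUTO  0x2Du /* opcode, assumed from Mesa's sid.h */
#define DI_SRC_SEL_AUTO_INDEX 2u    /* draw initiator: auto-generated indices */

/* Type-3 header: packet type in bits 31:30, (body dwords - 1) in bits 29:16, opcode in bits 15:8. */
static uint32_t pkt3_header(uint32_t opcode, uint32_t body_dwords)
{
    assert(body_dwords >= 1 && body_dwords <= 0x4000);
    return (3u << 30) | (((body_dwords - 1) & 0x3FFFu) << 16) | ((opcode & 0xFFu) << 8);
}

/* Write a draw packet into cs and return the number of dwords emitted. */
static size_t emit_draw_auto(uint32_t *cs, uint32_t vertex_count)
{
    assert(cs != NULL);
    cs[0] = pkt3_header(PKT3_DRAW_INDEX_AUTO, 2);
    cs[1] = vertex_count; /* the vertex count is baked into the command stream */
    cs[2] = DI_SRC_SEL_AUTO_INDEX;
    return 3;
}
```

Because the count lives in the packet body, a recorded stream for 10000 vertices cannot be replayed unchanged for a different count; something has to regenerate or patch that dword.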
12:22pixelcluster: why not just port radv or something like that, it'd be way less effort than trying to rewrite it (badly)
12:25colingpu: They said it can take 1 year to port radv. Initially i will try to create a simple and quick demo. After minimal example, i assume more developers can be involved.
12:27colingpu: Today, i moved to mesa-radv to understand the driver. I spent 2 weeks on amdvlk. Radv is more intuitive.
12:27pixelcluster: who said that?
12:27pixelcluster: that might be true, but writing your own will take even longer
12:29colingpu: I thought that because amdvlk was created by AMD, it is much better documented.
12:30colingpu: Yes, i am planning to port radv as a single developer. Also, i have not ported any gpu driver before. I assume it will take 1 year.
12:34colingpu: Is there a way to log all the drm/drm_amdgpu calls in radv? There was a #define switch in amdvlk that enabled dumping all calls to a file.
12:34pixelcluster: no
12:35colingpu: I think i will set breakpoints on all drm calls.
12:35pixelcluster: you'd need to do strace or similar, but again, just dumping traces won't get you anywhere
12:35pixelcluster: what hw are you doing this for anyway? zephyr looks like it's intended to be used by microcontrollers, and slapping a gpu onto a microcontroller doesn't make much sense to me
12:37pixelcluster: but anyway, if you have a libdrm interface in zephyr, what is there really to port in radv? why shouldn't it be able to just continue with libdrm
12:37colingpu: zephyr can run on raspberry-pi 4. raspberry-pi 4 has pcie controller.
12:37pixelcluster: but you don't really put a whole gpu in a raspi do you
12:38pixelcluster: i have my doubts about the pi's pci power delivery being sufficient for a gpu
12:40daniels: not even remotely close - you'd need an external PSU
12:40colingpu: There is a guy. He can run almost all gpus using raspi. https://pipci.jeffgeerling.com
12:40colingpu: Yes, i use external PSU.
12:40daniels: tbh you'd be much better off just using virtualisation to have a Linux system to run your GPU for you
12:41daniels: as the others have said, even though you only want 0.1% of the functionality to draw a low-complexity scene, you still need to have 60% of the driver stack
12:41daniels: there's no way to shortcut the complexity of the hardware and 'just' do something small - you still need to address the full complexity of the hardware to do anything at all
12:42colingpu: True.
16:13agd5f: colingpu, you can't use the 3D engine directly via MMIO. It needs to go through the ring buffers
16:15agd5f: colingpu, there are a bunch of simple tests in IGT for radeon gpus