IRC Logs of #dri-devel on irc.freenode.net for 2023-12-16

09:38 soreau: if I wanted to convert a cpu rendering algorithm to gpu (using compute or opencl) with the intent to use the result as a texture, what would be a good approach so that it performs well?
09:39 soreau: is there a way to keep the buffer on the gpu without having to copy it from vram to ram and back (to a texture)?
09:43 tnt: soreau: Yeah, you can use opencl or gl-compute to render/write to a gl texture directly.
09:44 tnt: Or you can also use Vulkan compute to do the same to a Vulkan texture if you want to use Vulkan.
09:44 soreau: this would be an egl gles2 type of thing
09:45 tnt: not sure if OpenGL ES2.x has compute shaders already.
09:45 tnt: I think that's a 3.x thing.
09:46 tnt: What's the target hardware ?
09:46 soreau: there's no real target, it would be a client, basically
09:47 tnt: Is requiring ES3.x viable ? Should be an easy requirement by today's standard.
09:48 soreau: yes, it is
09:48 soreau: do you happen to know if there's any code that does something similar to this that I can look at?
09:49 tnt: Just googling "Compute Shader" should yield plenty.
09:49 soreau: specifically, the 'staying on the gpu' part
09:49 soreau: I know how to setup and write the shader mostly
09:50 tnt: There is nothing special to do. You allocate the texture like you would normally then give it as argument to the compute shader, then you can use it to texture stuff.
09:51 soreau: well, writing the code is a bit special :)
09:51 tnt: The driver would have no reason to copy it out of VRAM so it won't.
09:52 soreau: so alloc the texture, pass it to the program, scribble on it, wait, and render
09:55 soreau: does compute require using a specific render node?
09:55 tnt: No need to wait, AFAIR the driver will handle the dependency.
09:55 tnt: Not sure what you mean ?
09:55 soreau: well, I assume you're talking only about compute
09:56 soreau: I mean like having to open /dev/dri/renderD128/9
09:57 tnt: Huh, you don't do any of that yourself, mesa does that for you, the GL driver handles that.
09:57 soreau: ok so there's no context switching then?
09:57 tnt: you just get a GL context ... and then use GL.
09:59 ishitatsuyuki: yeah I think allocations in GL just go to VRAM most of the time
10:00 ishitatsuyuki: in Vulkan you get to control it, but as a general principle CPU-side RAM will only be used for staging buffers for CPU->GPU uploads
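A minimal sketch of the flow tnt describes (allocate the texture, hand it to a compute shader, then sample it), assuming OpenGL ES 3.1. The GL calls appear only as comments because no context is created here, and the shader body and the helper name `dispatch_groups` are illustrative, not taken from the discussion. One detail worth knowing: writes made with `imageStore` are not automatically visible to later texture fetches, so GL/GLES require `glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT)` between the dispatch and the draw. That call orders GPU work; it is not a CPU wait.

```python
# Sketch only: OpenGL ES 3.1 assumed, no GL context is created here.
# Host-side order of calls (shown as comments):
#   glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, w, h)                     # allocate once
#   glBindImageTexture(0, tex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA8)  # image unit 0
#   glUseProgram(compute_prog)
#   gx, gy = dispatch_groups(w, h)
#   glDispatchCompute(gx, gy, 1)
#   glMemoryBarrier(GL_TEXTURE_FETCH_BARRIER_BIT)  # orders GPU work, no CPU stall
#   ...then bind `tex` and sample it with texture() in the normal draw.

LOCAL_X, LOCAL_Y = 8, 8  # must match local_size_x/y in the shader below

COMPUTE_SHADER_SRC = """#version 310 es
layout(local_size_x = 8, local_size_y = 8) in;
layout(rgba8, binding = 0) uniform writeonly highp image2D uOut;
void main() {
    ivec2 p = ivec2(gl_GlobalInvocationID.xy);
    ivec2 size = imageSize(uOut);
    if (p.x >= size.x || p.y >= size.y) return;  // last workgroups may overhang
    imageStore(uOut, p, vec4(vec2(p) / vec2(size), 0.0, 1.0));  // placeholder gradient
}
"""


def dispatch_groups(width, height, local_x=LOCAL_X, local_y=LOCAL_Y):
    """Workgroup counts covering a width x height image (ceiling division)."""
    return (-(-width // local_x), -(-height // local_y))


if __name__ == "__main__":
    print(dispatch_groups(1920, 1080))          # (240, 135)
    print(dispatch_groups(1920, 1080, 16, 16))  # (120, 68)
```

The bounds check in the shader is what makes the ceiling division safe: 1080 is not a multiple of 16, so the last row of 16x16 workgroups overhangs the image.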
10:02 soreau: the other question I have, is: are read operations any faster in compute than 'regular' glsl?
10:04 ishitatsuyuki: i suppose you mean graphics / fragment shader instead of 'regular'
10:04 ishitatsuyuki: at the end of the day it depends on your read pattern and therefore cache hit rate
10:05 ishitatsuyuki: images and buffers also have different layouts, the former is usually configured to be tiled (e.g. space filling curve order) while the latter is linear (the order you would have if you wrote buf[w][h] in C)
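To make the tiled vs linear distinction concrete, here is a small sketch comparing a row-major (linear) offset with a Morton (Z-order) offset. Real GPU tiling layouts are vendor-specific and usually not plain Morton order; this only illustrates why a 2D neighbourhood stays closer together in memory under a space-filling-curve layout. The function names are hypothetical.

```python
# Illustration only: Morton order stands in for "some space-filling curve";
# actual hardware tiling modes differ per vendor and generation.


def linear_index(x, y, width):
    """Row-major offset: consecutive x are adjacent, rows are `width` apart."""
    return y * width + x


def morton_index(x, y):
    """Interleave the bits of x (even bits) and y (odd bits), 16 bits each."""
    idx = 0
    for bit in range(16):
        idx |= ((x >> bit) & 1) << (2 * bit)
        idx |= ((y >> bit) & 1) << (2 * bit + 1)
    return idx


if __name__ == "__main__":
    w = 1920
    # Vertical neighbours: far apart in linear order, close in Morton order.
    print(linear_index(0, 1, w) - linear_index(0, 0, w))  # 1920
    print(morton_index(0, 1) - morton_index(0, 0))        # 2
```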
10:05 soreau: I just know that texture2D() is a killer if used heavily
10:05 ishitatsuyuki: not really
10:05 ishitatsuyuki: also you should account for the fact that dGPUs have like 10x the raw power of CPUs
10:06 ishitatsuyuki: so even if the GPU code is not optimized to the extreme it can still be fairly useful
10:06 soreau: well that's great and all, but IME writing blur algorithms in glsl, texture2D() can get expensive
10:06 ishitatsuyuki: well that's the same on CPU as well
10:06 ishitatsuyuki: you should do separable convolution for blurs to begin with
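A sketch of why separable convolution pays off, assuming a kernel that is the outer product of a 1D kernel `k` (true for Gaussian and box kernels): a horizontal pass followed by a vertical pass gives the same result as the full 2D convolution, while sampling 2(2r+1) texels per pixel instead of (2r+1)^2. Plain Python on nested lists with clamp-to-edge sampling; the function names are hypothetical.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))


def conv2d_full(img, k):
    """Direct 2D convolution with kernel K[j][i] = k[j] * k[i]: (2r+1)^2 taps per pixel."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(-r, r + 1):
                for i in range(-r, r + 1):
                    acc += k[j + r] * k[i + r] * img[clamp(y + j, 0, h - 1)][clamp(x + i, 0, w - 1)]
            out[y][x] = acc
    return out


def conv2d_separable(img, k):
    """Same result as conv2d_full using two 1D passes: 2*(2r+1) taps per pixel."""
    h, w, r = len(img), len(img[0]), len(k) // 2
    # Horizontal pass along each row.
    tmp = [[sum(k[i + r] * img[y][clamp(x + i, 0, w - 1)] for i in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass over the horizontally blurred rows.
    return [[sum(k[j + r] * tmp[clamp(y + j, 0, h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    k = [0.25, 0.5, 0.25]  # small binomial kernel approximating a Gaussian
    img = [[float((x * 7 + y * 3) % 5) for x in range(6)] for y in range(5)]
    a, b = conv2d_full(img, k), conv2d_separable(img, k)
    print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)))
```

For a radius of 10 that is 42 taps per pixel instead of 441, which is why the two-pass form is the usual starting point for GPU blurs.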
10:07 soreau: yes we do
10:07 soreau: I'm more interested in compute shaders, not cpu rendering (it already works on cpu slowly enough)
10:07 ishitatsuyuki: anything can become bottleneck if spammed too much
10:08 soreau: and of course the bigger the size, yea
10:08 ishitatsuyuki: in case of blurs, seek efficient approximations
10:08 ishitatsuyuki: kawase blur, multi pass box blur, IIR filters, etc.
10:08 soreau: I think we've exhausted those avenues
10:08 ishitatsuyuki: IIR is a bit annoying to do on GPUs
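A sketch of the "multi pass box blur" idea in 1D (for 2D, run it along rows and then columns, as in a separable blur). Each pass uses a running sum, so per-pixel cost does not grow with the radius, and repeating the pass (three times is a common choice) yields a bell-shaped response that approximates a Gaussian. The running sum is inherently sequential, so treat this as a CPU-side reference rather than a drop-in shader; Kawase blur and IIR filters are not shown. Function names are hypothetical.

```python
def box_blur_1d(src, r):
    """One box-blur pass of radius r using a sliding-window sum, clamp-to-edge."""
    n = len(src)

    def at(i):
        return src[max(0, min(n - 1, i))]

    window = sum(at(i) for i in range(-r, r + 1))  # covers [x - r, x + r] for x = 0
    out = []
    for x in range(n):
        out.append(window / (2 * r + 1))
        window += at(x + r + 1) - at(x - r)  # slide the window one sample right
    return out


def multi_pass_box_blur_1d(src, r, passes=3):
    """Repeated box blurs; the combined kernel tends toward a Gaussian shape."""
    for _ in range(passes):
        src = box_blur_1d(src, r)
    return src


if __name__ == "__main__":
    impulse = [0.0] * 21
    impulse[10] = 1.0
    print([round(v, 3) for v in multi_pass_box_blur_1d(impulse, 2)])
```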
10:09 soreau: but if you'd like to have a look, we'd always like to optimize further :P
10:09 Company: soreau: your assumptions are all wrong
10:09 Company: as a rule of thumb
10:09 soreau: Company: no, you're wrong :D
10:09 Company: nothing you think you know about GPUs matches reality
10:10 soreau: Company: bored again?
10:10 Company: you'll figure that out once you start learning about GPUs
10:11 Company: "I just know that texture2D() is a killer if used heavily" or such are things you picked up reading something somewhere, but you likely fail to remember the context it was said in
10:12 soreau: well, it depends on the size of the texture really
10:12 soreau: and I'm talking about blurring HD resolutions with glsl
10:12 soreau: 1920x1080+
10:13 soreau: Company: also this is nothing I read on reddit, this is IME
10:13 soreau: I don't waste time reading garbage
10:14 soreau: I usually waste more time writing garbage, lol
10:14 soreau: but net producer, so..
10:24 Company: the crappy GTK blur filter does only 150fps on 4k
10:25 Company: and it seems to depend on blur radius, somebody should fix that
10:27 soreau: anyway, we use a lot of tricks for blur, such as (ab)using blits to degrade the texture upfront and blur a smaller area than the actual size, then scale back up; multi-pass options; different shaders including kawase, box and Gaussian (even a bokeh effect); probably everything except IIR filters
10:29 soreau: and there's always room for improvement
10:33 soreau: also blurring in a client tk is a bit different than doing it in the compositor, dealing with multiple surfaces, stacking wm, etc.
10:38 soreau: scissoring damage regions too, and the list goes on
11:09 kode54: I just know
11:09 kode54: turning off the blur filter causes the compositor's gpu usage to drop to ~10%
11:09 kode54: while turning it on increases it to about 30%
11:15 Company: blurring is generally not something you should do if you want to go fast
11:16 Company: it's something you do if you want to put screenshots of your neofetch output on /r/unixporn
11:18 soreau: It all depends on how they want to spend their resources. Disabled by default, it's there if they want it.
14:16 ishitatsuyuki: what kind of gpu are we talking about here for perf numbers? gaming dgpus? laptop igpus?
14:53 Company: ishitatsuyuki: that was a Radeon RX 6500 XT
14:53 ishitatsuyuki: ok
16:33 Company: pq, emersion: re the dmabuf format discussion - an example I just ran into is DRM_FORMAT_RGB565 - does that map to VK_FORMAT_R5G6B5_UNORM_PACK16 or VK_FORMAT_B5G6R5_UNORM_PACK16?
16:34 Company: because drm formats and VK formats do their packing slightly differently
16:35 Company: (this one exists in Mesa, so I can look it up)
16:35 emersion: Company: use pixfmtdb to find out
16:37 Company: oh, that's neat
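A sketch of the packing difference Company mentions, based on our reading of `drm_fourcc.h` (DRM formats are little endian, and packed formats list components from the most significant bit) and the Vulkan spec (`_PACKn` formats also list components from the most significant bit of the packed word, while non-packed formats such as `VK_FORMAT_B8G8R8A8_UNORM` list them in memory byte order). Under that reading, `DRM_FORMAT_RGB565` lines up with `VK_FORMAT_R5G6B5_UNORM_PACK16`, and `DRM_FORMAT_XRGB8888` lines up byte-for-byte with a B8G8R8A8 Vulkan format; pixfmtdb remains the place to confirm. The helper functions are hypothetical and assume a little-endian host.

```python
import struct

# Assumption: little-endian host for the Vulkan-side helpers.


def drm_rgb565_bytes(r, g, b):
    """DRM_FORMAT_RGB565: [15:0] R:G:B 5:6:5, little endian."""
    return struct.pack("<H", (r << 11) | (g << 5) | b)


def vk_r5g6b5_unorm_pack16_bytes(r, g, b):
    """VK_FORMAT_R5G6B5_UNORM_PACK16: R in bits 11..15, G in 5..10, B in 0..4."""
    return struct.pack("<H", (r << 11) | (g << 5) | b)


def drm_xrgb8888_bytes(r, g, b):
    """DRM_FORMAT_XRGB8888: [31:0] x:R:G:B 8:8:8:8, little endian (x set to 0xFF)."""
    return struct.pack("<I", (0xFF << 24) | (r << 16) | (g << 8) | b)


def vk_b8g8r8a8_bytes(r, g, b, a=0xFF):
    """VK_FORMAT_B8G8R8A8_*: components named in memory byte order."""
    return bytes([b, g, r, a])


if __name__ == "__main__":
    # Packed 16-bit formats: both conventions name components from the MSB down.
    print(drm_rgb565_bytes(31, 0, 0).hex(), vk_r5g6b5_unorm_pack16_bytes(31, 0, 0).hex())  # 00f8 00f8
    # 8-bit-per-channel formats: DRM "XRGB" lands in memory as B, G, R, X.
    print(drm_xrgb8888_bytes(1, 2, 3).hex(), vk_b8g8r8a8_bytes(1, 2, 3).hex())  # 030201ff 030201ff
```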