00:38 DemiMarie: Does virglrenderer have a security policy?
09:32 ishitatsuyuki: surely it does, it's probably the part with the largest attack surface inside an otherwise secure VM
18:01 Plagman: karolherbst: trying to run https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet against rusticl radeonsi, i get really low performance/occupancy and an error about a cl program build failure - do you know if it's expected to work?
18:02 Plagman: it's running part of it on the gpu, but then runs on a single cpu thread for a long while and gives me ~10fps
18:02 Plagman: compared to 40 on CPU
21:01 mareko: Company: we provide a driver installer that installs Mesa on older distros of RHEL, SLES, and Ubuntu
21:10 Company: does that installer build its own llvm and Rust?
21:30 karolherbst: Plagman: is it any better with ROCm? Could also be just bad code, which is often the case with those AI/ML libs
21:30 Plagman: rocm instantly hangs my system so not really - i didn't get it working with it so far
21:31 karolherbst: :') sounds like rusticl is already better then
21:31 Plagman: bad code is likely, looking at a trace i see the actual gpu compute time is maybe 11ms total
21:31 Plagman: and all the other time might be spent in a slow readback
21:31 karolherbst: yeah.. the libs I've looked into often busy waited on the CPU or other crazy things
21:31 Plagman: seems like the opencl stuff was written for intel gpus primarily
21:31 karolherbst: ahh..
21:31 karolherbst: yeah, intel has kinda the best CL stack atm
21:31 Plagman: is there a way to force caching for allocations in rusticl somehow?
21:32 karolherbst: you mean like keeping a copy of the data in RAM?
21:32 Plagman: i'm still trying to trace the slow memmove()s
21:32 Plagman: more like the cache coherent flag on the mapping
21:32 karolherbst: ahh
21:33 karolherbst: if there is a special gallium flag I should set, that could help, but usually those things are kinda up to the driver otherwise
21:33 karolherbst: have you tried using zink?
21:33 Plagman: i tried, yeah - it runs one frame and then times out the gpu
21:33 karolherbst: mhh
21:33 karolherbst: are you using main or some release?
21:33 Plagman: it doesn't run directly on my host display because i'm using the amdgpu ddx so i had to point it to a gamescope display
21:34 Plagman: mesa is 24.0.5
21:35 karolherbst: do you have any more info on that program build failure btw? Not sure if "RUSTICL_DEBUG=program" already works on 24.0 or when I've added it, but often it's also helpful to run a build with asserts enabled to see why zink or other drivers are unhappy about things
21:36 Plagman: it seems non-fatal, so i'm guessing it's a test build to see if -cl-no-subgroup-ifp is supported
21:36 karolherbst: ahh
21:37 karolherbst: maybe I should handle this flag then, if clang doesn't like it
21:37 airlied: do we know what is ifp there?
21:37 karolherbst: independent forward progress
21:37 karolherbst: it's related to "CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS"
21:37 Plagman: fwiw trying to run that sample is real easy
21:38 karolherbst: yeah.. I could take a look tomorrow and see if it runs any better on iris or so
21:38 Plagman: from the face_detection_yunet of the repo above it's just `RUSTICL_ENABLE=radeonsi python demo.py`
21:38 Plagman: or building the cpp version works too
21:38 Plagman: the rest seems to just be working from arch packages of opencv/etc
21:39 airlied: ah CL_intel_subgroups related
21:39 karolherbst: yeah..
21:39 karolherbst: it's optional in CL 3.0 and I have no idea if I'm in the mood of wiring it up in gallium if nothing really needs it
21:39 karolherbst: so I'd just do nothing with that flag and prevent a compilation error
21:40 airlied: I wonder why intel added it, they must have some hw that can't do ifp
21:41 karolherbst: might be...
21:41 karolherbst: maybe I should ask Ben
21:41 karolherbst: uhh.. Ben Ashbaugh
21:41 karolherbst: Ben usually knows those things
21:42 Plagman: uhhh
21:42 Plagman: i'm guessing that finding that i'm in clCreateKernel() after breaking into the program once it's been running for a while is like.. bad
21:42 Plagman: right?
21:43 Plagman: like that's a pipeline build equivalent or whatever?
21:43 karolherbst: shouldn't matter
21:43 karolherbst: nah... clCreateKernel is run after all the shaders have been built
21:43 Plagman: ah ok
21:43 karolherbst: CL is a bit weird
21:43 Plagman: i know next to nothing about it so i thought it was a shader build
21:43 karolherbst: clCreateKernel is like.. creating an execution environment for a compiled entry pointer
21:43 karolherbst: *point
21:44 karolherbst: clBuildProgram and clLinkProgram will generate the binaries in rusticl. a "cl_kernel" is more like a thing holding the kernel input parameters (like function parameters) and is the interface to launch code
21:45 Plagman: yeah ok
21:48 Plagman: i forgot to mention i had to edit the sample to use CL, a one-liner edit: [cv.dnn.DNN_BACKEND_OPENCV, cv.dnn.DNN_TARGET_OPENCL],
21:52 karolherbst: mhh.. might be untested
21:52 Plagman: it's cool that it works!
21:53 karolherbst: yeah.. that's the whole idea
21:53 Plagman: clvk looks very similar as well, so app-side being dumb seems likely
21:53 karolherbst: annoying that zink doesn't work then. I should take a look tomorrow then why zink breaks down
21:54 karolherbst: Plagman: what is it using by default? CPU?
21:54 Plagman: yeah, seems like
21:55 Plagman: if the current compute runtime is any indication, maybe it'd be ~90fps on my 6900XT
21:55 Plagman: vs. the 40FPS it gets on my monster CPU going wide on all cores
21:55 karolherbst: mhh..
21:56 karolherbst: I'd hope for a bigger difference tbh
21:57 karolherbst: Plagman: how much is the GPU idle when running that stuff?
21:57 Plagman: it is a big cpu to be fair
21:57 Plagman: it's like 10ms of compute on the gpu then idle, usage doesn't go above 10% in umr
21:58 Plagman: that's how i'm theorizing 90fps if it wasn't idling
21:58 karolherbst: oof
21:59 karolherbst: I wonder if the main CPU thread is just busy all the time?
22:00 karolherbst: but anyway... could also be just terrible offloading
22:00 karolherbst: but I do wonder how much CPU time is spent in rusticl vs on whatever the lib is doing
22:00 karolherbst: and if I could optimize it a bit
22:00 Plagman: it doesn't seem like it, seems like either waiting or very slow memcpys
22:00 karolherbst: mmhhh...
22:00 karolherbst: what would be the fastest way to copy VRAM to host memory?
22:01 karolherbst: maybe I should just optimize that part
22:01 karolherbst: maybe there is some magic gallium flag I have to set
22:01 Plagman: i know that in vulkan if you map vram without the cache coherent flags you can spend a looong time in memcpy on discrete
22:02 karolherbst: mhhh, I see
22:02 airlied: should be easily spottable in perf top
22:02 Plagman: we were wondering if maybe that's what's happening here - but maybe it's the app deciding the flags and not the driver, not sure
22:02 karolherbst: CL doesn't really have any flags like that
22:02 karolherbst: but the way CL memory maps work is well... super annoying
22:03 karolherbst: I have some very suboptimal paths in that area
22:03 karolherbst: but it's mostly a limitation of how the CL API works and what gallium provides
22:04 Plagman: perf has a memmove in rusticl queue t near the top
22:04 karolherbst: what's the callchain?
22:06 karolherbst: but it kinda feels like something very suboptimal is happening
22:08 mareko: Company: it doesn't build any userspace components, but it includes LLVM binaries, I don't think it contains any binaries built from rust
22:08 Plagman: https://gist.github.com/Plagman/934273eae0e34a17af8375e2895052e0
22:08 Plagman: i can't tell if it's the same one as perf shows or not
22:09 Company: makes sense
22:09 karolherbst: Plagman: probably not
22:10 karolherbst: memmove is kinda very generic, would have to check which of the callers have high CPU usage
22:10 karolherbst: that one just points to a constructor basically
22:16 Plagman: perf isn't picking up debuginfod symbols so i'm not sure
22:17 karolherbst: okay... maybe I'll figure something out then, not sure when I'll find some time for it to dig deeper
23:56 DemiMarie: ishitatsuyuki: I hope so too.