00:38DemiMarie: Does virglrenderer have a security policy?
09:32ishitatsuyuki: surely it does, it's probably the part with the largest attack surface inside an otherwise secure VM
18:01Plagman: karolherbst: trying to run https://github.com/opencv/opencv_zoo/tree/main/models/face_detection_yunet against rusticl radeonsi, i get really low performance/occupancy and an error about a cl program build failure - do you know if it's expected to work?
18:02Plagman: it's running part of it on the gpu, but then runs on a single cpu thread for a long while and gives me ~10fps
18:02Plagman: compared to 40 on CPU
21:01mareko: Company: we provide a driver installer that installs Mesa on older distros of RHEL, SLES, and Ubuntu
21:10Company: does that installer build its own llvm and Rust?
21:30karolherbst: Plagman: is it any better with ROCm? Could also be just bad code, which is often the case with those AI/ML libs
21:30Plagman: rocm instantly hangs my system so not really - i didn't get it working with it so far
21:31karolherbst: :') sounds like rusticl is already better then
21:31Plagman: bad code is likely, looking at a trace i see the actual gpu compute time is maybe 11ms total
21:31Plagman: and all the other time might be spent in a slow readback
21:31karolherbst: yeah.. the libs I've looked into often busy waited on the CPU or other crazy things
21:31Plagman: seems like the opencl stuff was written for intel gpus primarily
21:31karolherbst: ahh..
21:31karolherbst: yeah, intel has kinda the best CL stack atm
21:31Plagman: is there a way to force caching for allocations in rusticl somehow?
21:32karolherbst: you mean like keeping a copy of the data in RAM?
21:32Plagman: i'm still trying to trace the slow memmove()s
21:32Plagman: more like the cache coherent flag on the mapping
21:32karolherbst: ahh
21:33karolherbst: if there is a special gallium flag I should set, that could help, but usually those things are kinda up to the driver otherwise
21:33karolherbst: have you tried using zink?
21:33Plagman: i tried, yeah - it runs one frame and then times out the gpu
21:33karolherbst: mhh
21:33karolherbst: are you using main or some release?
21:33Plagman: it doesn't run directly on my host display because i'm using the amdgpu ddx so i had to point it to a gamescope display
21:34Plagman: mesa is 24.0.5
21:35karolherbst: do you have any more info on that program build failure btw? Not sure if "RUSTICL_DEBUG=program" already works on 24.0 or when I've added it, but often it's also helpful to run a build with asserts enabled to see why zink or other drivers are unhappy about things
21:36Plagman: it seems non-fatal, so i'm guessing it's a test build to see if -cl-no-subgroup-ifp is supported
21:36karolherbst: ahh
21:37karolherbst: maybe I should handle this flag then, if clang doesn't like it
21:37airlied: do we know what is ifp there?
21:37karolherbst: independent forward progress
21:37karolherbst: it's related to "CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS"
21:37Plagman: fwiw trying to run that sample is real easy
21:38karolherbst: yeah.. I could take a look tomorrow and see if it runs any better on iris or so
21:38Plagman: from the face_detection_yunet of the repo above it's just `RUSTICL_ENABLE=radeonsi python demo.py`
21:38Plagman: or building the cpp version works too
21:38Plagman: the rest seems to just be working from arch packages of opencv/etc
21:39airlied: ah CL_intel_subgroups related
21:39karolherbst: yeah..
21:39karolherbst: it's optional in CL 3.0 and I have no idea if I'm in the mood of wiring it up in gallium if nothing really needs it
21:39karolherbst: so I'd just do nothing with that flag and prevent a compilation error
21:40airlied: I wonder why intel added it, they must have some hw that can't do ifp
21:41karolherbst: might be...
21:41karolherbst: maybe I should ask Ben
21:41karolherbst: uhh.. Ben Ashbaugh
21:41karolherbst: Ben usually knows those things
21:42Plagman: uhhh
21:42Plagman: i'm guessing that finding i'm in clCreateKernel() when breaking into the program once it's been running for a while is like.. bad
21:42Plagman: right?
21:43Plagman: like that's a pipeline build equivalent or whatever?
21:43karolherbst: shouldn't matter
21:43karolherbst: nah... clCreateKernel is run after all the shaders have been built
21:43Plagman: ah ok
21:43karolherbst: CL is a bit weird
21:43Plagman: i know next to nothing about it so i thought it was a shader build
21:43karolherbst: clCreateKernel is like.. creating an execution environment for a compiled entry point
21:44karolherbst: clBuildProgram and clLinkProgram will generate the binaries in rusticl. a "cl_kernel" is more like a thing holding the kernel input parameters (like function parameters) and is the interface to launch code
21:45Plagman: yeah ok
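The split karolherbst describes (programs are built first, a kernel object is then created for one entry point and only holds its arguments) can be sketched with pyopencl. This is a minimal illustration and not code from the conversation: the kernel source, the `scale` entry point, and the `run_scale_demo` helper are made up for the example, and it assumes pyopencl, numpy, and a working OpenCL device are available.

```python
def run_scale_demo():
    """Hypothetical helper: build a program, then create and launch a kernel."""
    # Imported lazily so that defining this function does not by itself
    # require an OpenCL installation.
    import numpy as np
    import pyopencl as cl

    source = """
    __kernel void scale(__global float *data, const float factor) {
        size_t i = get_global_id(0);
        data[i] = data[i] * factor;
    }
    """

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)

    # Compilation step: wraps clCreateProgramWithSource + clBuildProgram.
    program = cl.Program(ctx, source).build()

    # Wraps clCreateKernel: per the discussion above, this is not a shader
    # build, just an object tied to the already built "scale" entry point.
    kernel = cl.Kernel(program, "scale")

    host = np.arange(8, dtype=np.float32)
    mf = cl.mem_flags
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host)

    # The kernel object stores the arguments, much like function parameters.
    kernel.set_arg(0, buf)
    kernel.set_arg(1, np.float32(2.0))

    # Launch through the kernel object, then read the result back.
    cl.enqueue_nd_range_kernel(queue, kernel, host.shape, None)
    cl.enqueue_copy(queue, host, buf)
    queue.finish()
    return host
```

Calling `run_scale_demo()` on a machine with an OpenCL implementation should return the input array with every element doubled; which device it picks depends on the local setup.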
21:48Plagman: i forgot to mention i had to edit the sample to use CL, a one-liner edit: [cv.dnn.DNN_BACKEND_OPENCV, cv.dnn.DNN_TARGET_OPENCL],
21:52karolherbst: mhh.. might be untested
21:52Plagman: it's cool that it works!
21:53karolherbst: yeah.. that's the whole idea
21:53Plagman: clvk looks very similar as well, so app-side being dumb seems likely
21:53karolherbst: annoying that zink doesn't work then. I should take a look tomorrow then why zink breaks down
21:54karolherbst: Plagman: what is it using by default? CPU?
21:54Plagman: yeah, seems like
21:55Plagman: if the current compute runtime is any indication, maybe it'd be ~90fps on my 6900XT
21:55Plagman: vs. the 40FPS it gets on my monster CPU going wide on all cores
21:55karolherbst: mhh..
21:56karolherbst: I'd hope for a bigger difference tbh
21:57karolherbst: Plagman: how much is the GPU idle when running that stuff?
21:57Plagman: it is a big cpu to be fair
21:57Plagman: it's like 10ms of compute on the gpu then idle, usage doesn't go above 10% in umr
21:58Plagman: that's how i'm theorizing 90fps if it wasn't idling
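The 90fps estimate follows from simple arithmetic on the numbers quoted in the log (about 11ms of GPU compute per frame, about 10fps observed). A small sketch, assuming the GPU work is the only per-frame cost that cannot be removed; the variable names are made up for the example:

```python
# Numbers quoted in the conversation above.
gpu_compute_ms = 11.0   # GPU compute time per frame seen in the trace
observed_fps = 10.0     # frame rate seen with rusticl on radeonsi

# Wall-clock time per frame at the observed rate.
observed_frame_ms = 1000.0 / observed_fps

# Fraction of each frame during which the GPU is actually busy.
gpu_busy_fraction = gpu_compute_ms / observed_frame_ms

# Upper bound on the frame rate if the GPU never sat idle.
ceiling_fps = 1000.0 / gpu_compute_ms

print(f"frame time: {observed_frame_ms:.0f} ms")
print(f"GPU busy:   {gpu_busy_fraction:.0%}")
print(f"ceiling:    {ceiling_fps:.0f} fps")
```

A busy fraction of roughly 11% is in line with the roughly 10% usage reported from umr, and the ceiling comes out at about 91fps.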
21:58karolherbst: oof
21:59karolherbst: I wonder if the main CPU thread is just busy all the time?
22:00karolherbst: but anyway... could also be just terrible offloading
22:00karolherbst: but I do wonder how much CPU time is spent in rusticl vs on whatever the lib is doing
22:00karolherbst: and if I could optimize it a bit
22:00Plagman: it doesn't seem like it, seems like either waiting or very slow memcpys
22:00karolherbst: mmhhh...
22:00karolherbst: what would be the fastest way to copy VRAM to host memory?
22:01karolherbst: maybe I should just optimize that part
22:01karolherbst: maybe there is some magic gallium flag I have to set
22:01Plagman: i know that in vulkan if you map vram without the cache coherent flags you can spend a looong time in memcpy on discrete
22:02karolherbst: mhhh, I see
22:02airlied: should be easily spottable in perf top
22:02Plagman: we were wondering if maybe that's what's happening here - but maybe it's the app deciding the flags and not the driver, not sure
22:02karolherbst: CL doesn't really have any flags like that
22:02karolherbst: but the way CL memory maps work is well... super annoying
22:03karolherbst: I have some very suboptimal paths in that area
22:03karolherbst: but it's mostly a limitation of how the CL API works and what gallium provides
22:04Plagman: perf has a memmove in rusticl queue t near the top
22:04karolherbst: what's the callchain?
22:06karolherbst: but it kinda feels like something very suboptimal is happening
22:08mareko: Company: it doesn't build any userspace components, but it includes LLVM binaries, I don't think it contains any binaries built from rust
22:08Plagman: https://gist.github.com/Plagman/934273eae0e34a17af8375e2895052e0
22:08Plagman: i can't tell if it's the same one as perf shows or not
22:09Company: makes sense
22:09karolherbst: Plagman: probably not
22:10karolherbst: memmove is kinda very generic, would have to check which of the callers have high CPU usage
22:10karolherbst: that one just points to a constructor basically
22:16Plagman: perf isn't picking up debuginfod symbols so i'm not sure
22:17karolherbst: okay... maybe I'll figure something out then, not sure when I'll find some time for it to dig deeper
23:56DemiMarie: ishitatsuyuki: I hope so too.