00:09cphealy: With Android, HWComposer abstracts out how buffer scaling is done such that even when SoC scaling HW is implemented via a separate IP core that requires a mem2mem operation, it's just handled somewhere below hwcomposer and the application never needs to know about it. With the DRM/KMS framework on the other hand, my understanding is that it is not permissible to have a DRM/KMS display controller driver
00:09cphealy: utilize a mem2mem based scaler IP core for scaling and that the only time HW plane scaling is supported via the DRM/KMS API is when scaling can be done "on-the-fly" in the display controller. Is this correct?
02:11alyssa: Could someone with Intel CI privileges test !6064? Specifically iris+clover is implicated
02:11alyssa: Thank you :)
04:58jekstrand: alyssa: We don't have iris+clover in CI
05:00jekstrand: alyssa: But the iris bit looks good to me so feel free to merge as far as I'm concerned. I like the change.
05:01imirkin: cphealy: drm only supports scanout-engine-based scaling (or otherwise transparent) -- not some sort of "active" scaler.
08:43emersion: cphealy: yea this sounds correct i think. maybe this is relevant https://blog.ffwll.ch/2018/08/no-2d-in-drm.html
08:47pq: cphealy, mem2mem operations take time: you have to wait for the IP to take your work and then wait for it to finish. KMS API more or less implicitly promises that it does not add such delays on its own, so that userspace can have predictable timings.
10:20Sumera[m]: danvet: I read that a vblanking interval is the best time to switch source buffer without causing tears, what is a scenario where source buffer needs to be changed? What other framebuffer modifications are more suitable for being made during this time?
10:24emersion: you should use double-buffering, so that you never need to do front framebuffer modifications at all
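A minimal sketch of the double-buffering idea emersion mentions, as plain Python with no real DRM calls. The class and method names are hypothetical, and `flip()` only stands in for a page flip that a real KMS client would ask the kernel to apply at vblank.

```python
class DoubleBuffer:
    """Two buffers: one being shown (front), one being drawn into (back)."""

    def __init__(self, size):
        self.buffers = [bytearray(size), bytearray(size)]
        self.front = 0  # index of the buffer currently being scanned out

    @property
    def back(self):
        # All drawing goes here, so the visible buffer is never modified.
        return self.buffers[1 - self.front]

    def flip(self):
        # Stand-in for a page flip: swap roles and return the new front buffer.
        self.front = 1 - self.front
        return self.buffers[self.front]


db = DoubleBuffer(4)
db.back[0] = 255          # render into the hidden buffer
visible = db.flip()       # the finished frame becomes visible in one step
```

Because the swap is the only change the display sees, there is no partially drawn front buffer to tear.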
13:42emersion: who is responsible for libdrm releases? danvet do you know?
13:44emersion: ah, RELEASING says everybody is the release manager
13:45emersion: i'd like a new release with updated kernel headers
14:02danvet: emersion, do it
14:02danvet: all you need is commit rights
14:02emersion: ok, cool
14:02danvet: the release doc should be complete wrt all the steps
14:02emersion: are there any shell access requirements to upload the tarballs?
14:03danvet: maybe, not sure
14:03danvet: maybe ask on #freedesktop
14:04danvet: I'm also not sure we still do tarballs tbh
14:14cwabbott: jekstrand: ping on https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7989... this is the rework that you asked for
14:30tzimmermann: what's the purpose of drm_bind_agp, drm_unbind_agp, etc ? they don't do anything
16:14Rodrigo_: what's the preferred way to update regions of a GL buffer synchronized with rendering? I'm currently using BufferSubData, but it seems to tank performance on mesa
16:14Rodrigo_: would an ARB_buffer_storage/AMD_pinned_memory buffer + Copy(Named)BufferSubData fall in a better path?
16:14imirkin: Rodrigo_: the preferred way would be not to try to synchronize with rendering
16:16Rodrigo_: ugh... I'll try tracking if it has been modified from the GPU, and if it isn't, fall to a persistent stream buffer on mesa
16:16Rodrigo_: sometimes it has to be synchronized to 3D rendering due to the guest application using CbData
16:16imirkin: basically if you sync with rendering, performance tanks
16:17imirkin: using glBuffer(Sub)Data on a constbuf should not have undue perf implications
16:17imirkin: i guess it can depend on the driver
16:17imirkin: (and the hardware)
16:18Rodrigo_: on NV's blob I use BufferSubData/ProgramBufferParametersIuivNV, which seems to be the fastest path for cbuf streaming
16:18imirkin: yeah, nvidia has "magic" which enables multiple "simultaneous" values in a constbuf
16:19imirkin: i.e. draw; update constbuf; draw; update constbuf; draw -- the writes don't stomp over themselves, no waiting needed
16:19imirkin: however for other hardware it's better to just put the data in different place in e.g. a persistent buffer
16:19imirkin: uniform buffer object
16:20imirkin: aka constbuf depending on one's terminology
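A rough sketch of the "put the data in a different place" approach imirkin describes: each constbuf update gets a fresh, aligned offset in one large buffer instead of overwriting data the GPU may still be reading. This is plain Python offset bookkeeping only. `StreamRing` and its methods are hypothetical names, and the 256-byte alignment is an assumption (a real GL client would query `GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT`).

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


class StreamRing:
    """Hands out non-overlapping offsets in a fixed-size streaming buffer."""

    def __init__(self, capacity, alignment=256):
        self.capacity = capacity
        self.alignment = alignment
        self.head = 0  # first byte not yet handed out

    def allocate(self, size):
        """Return (offset, wrapped). wrapped=True means old data is being reused."""
        if size > self.capacity:
            raise ValueError("allocation larger than the ring")
        offset = align_up(self.head, self.alignment)
        wrapped = False
        if offset + size > self.capacity:
            # Out of space: start over. The caller must make sure the GPU is
            # done with the old contents (e.g. by waiting on a fence) first.
            offset = 0
            wrapped = True
        self.head = offset + size
        return offset, wrapped


ring = StreamRing(capacity=1024)
first = ring.allocate(100)   # bind the UBO range at this offset for draw 1
second = ring.allocate(100)  # a different range for draw 2, no stomping
```

The only point where waiting may be needed is the wrap-around, rather than on every update.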
16:20Rodrigo_: I mean this one https://github.com/envytools/envytools/blob/master/rnndb/graph/gf100_3d.xml#L1233
16:20imirkin: this has the "magic" properties i was mentioning
16:21Rodrigo_: yeah, that fast path is "documented" on a gdc presentation
16:22imirkin: and on top of nvidia hardware, updating the constbuf doesn't cause a sync event
16:22imirkin: however with other hardware, i'm not sure - depends
16:22imirkin: i assume this is for switch emulation?
16:22Rodrigo_: yes, we have to emulate that exact behavior :P
16:22Rodrigo_: most of the time it's just writing gl_Base*, so it gets ignored
16:23Rodrigo_: I'll try using a classic stream buffer on mesa as long as the data hasn't been written from the GPU
16:24Rodrigo_: if it has, I guess there's no other option than BufferSubData
16:24imirkin: so wait, why can't you just use regular GL features to not worry about any of this?
16:24Rodrigo_: BufferSubData is being slow on radeonsi
16:25imirkin: no, i mean
16:25imirkin: writing (with a ssbo) to a buffer which is used as a ubo
16:25imirkin: should Just Work (tm)
16:25imirkin: ah right, but your issues are when the application uploads to cbdata? i see.
16:26Rodrigo_: yeah, atm I'm doing everything in-place
16:26Rodrigo_: except for cbufs smaller than 4096 bytes (on NV)
16:28Rodrigo_: on Vulkan I'm writing to a HOST_VISIBLE | HOST_COHERENT buffer + vkCmdCopyBuffer, that seems to run fine-ish everywhere except on Intel's blob
16:33dschuermann: is there a guarantee that nir_opt_algebraic_late matches all lowering patterns in one pass? e.g. lower_negate
16:41Rodrigo_: thanks, rebooting to write a GL stream buffer
18:22ajax: is there something like shadertoy but for compute shaders?
18:25imirkin: harder to make that visual
18:27ajax: interpreting the output as x8r8g8b8 would be fine for my purposes
18:27imirkin: "the output"?
18:28imirkin: i think you just want shadertoy ... the frag shader is basically a compute shader which does that.
18:31robclark: if you don't mind it not being portable to anything other than a6xx, there is computerator :-P
18:31ajax: let me make this a bit more concrete. lets say i wanted to do, like dxt1 compression on the gpu. the natural size of a work unit is the 4x4 block of pixels. the visualization of the output would be the expansion back to rgb for that tile.
18:31ajax: i could certainly do that in a frag shader but it'd be 16x slower than necessary
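To illustrate the 4x4 work unit ajax describes, here is a small CPU-side sketch of expanding one DXT1 (BC1) block back to RGB. It assumes the usual layout of two little-endian RGB565 endpoints followed by sixteen 2-bit indices. The function names are illustrative, and hardware decoders may round the interpolated colours slightly differently.

```python
import struct


def rgb565_to_rgb888(c):
    """Expand a packed RGB565 value to an (r, g, b) tuple by bit replication."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def decode_dxt1_block(block):
    """Decode one 8-byte DXT1 block into a 4x4 grid of (r, g, b) tuples."""
    c0, c1, indices = struct.unpack("<HHI", block)
    p0, p1 = rgb565_to_rgb888(c0), rgb565_to_rgb888(c1)
    if c0 > c1:
        # Four-colour mode: two interpolated colours between the endpoints.
        p2 = tuple((2 * a + b) // 3 for a, b in zip(p0, p1))
        p3 = tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))
    else:
        # Three-colour mode: midpoint plus a transparent/black entry.
        p2 = tuple((a + b) // 2 for a, b in zip(p0, p1))
        p3 = (0, 0, 0)
    palette = (p0, p1, p2, p3)
    # Indices are stored row-major, 2 bits per pixel, lowest bits first.
    return [
        [palette[(indices >> (2 * (4 * y + x))) & 3] for x in range(4)]
        for y in range(4)
    ]


# Red and blue endpoints; only the top-left pixel selects the second endpoint.
tile = decode_dxt1_block(struct.pack("<HHI", 0xF800, 0x001F, 0x00000001))
```

One invocation produces all 16 pixels of a tile, which is why a per-block compute dispatch fits this better than a per-pixel fragment shader.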
18:31HdkR: Also, WebGL2.0 compute isn't enabled by default in most browsers
18:32ajax: i'm fine with this being a native app
18:33ajax: i just don't feel like writing the app myself
18:33imirkin: ajax: go big with ASTC!
18:33ajax: i figured that'd be step two, yeah
18:33HdkR: A GL compute test app would take about an hour to whip together. I recommend it :P
18:33anholt: cool, valgrind is just not getting backtraces for me now? that's great.
18:34ajax: HdkR: you flatter me if you think that's how fast i can code
18:34imirkin: HdkR: yeah, look at how many years it took him to get GLX to where it is today :)
18:34robclark: ajax: https://github.com/elima/gpu-playground/blob/master/render-nodes-minimal/main.c might be useful?
18:35imirkin: ajax: jk obviously ;)
18:35FreeFull: https://docs.mesa3d.org/envvars.html doesn't list DRI_PRIME. Which piece of documentation lists it? (and other DRI env variables)
18:35ajax: no offense taken, i know my track record, and it's mostly the delete key
18:36imirkin: in all seriousness, deleting code is so much harder than adding code
18:36HdkR: Just add a bunch of code first, then delete it :D
18:37ajax: robclark: that looks promising, thanks!
18:37imirkin: HdkR: yeah, i try to do that, but somehow bits get left over
18:37HdkR: `178,872++, 67,895--` I need to work harder on my numbers
18:38HdkR: At least double those deletions
18:38robclark: ajax: there is also https://github.com/robclark/gpu-playground/commits/hacks .. where I added some things/hacks.. which may or may not be useful
18:38ajax: tsk. my net contribution to mesa is positive LoC.
18:39HdkR: `320,513++, 199,769--` Dolphin pretty big number moves as well though :D
18:39imirkin: (what's FEX?)
18:39alyssa: Ooo ooo do me! :p
18:39HdkR: alyssa: But you'll win
18:39HdkR: imirkin: x86/x86-64 userspace Linux emulator for running inside of an AArch64 process. wee
18:39alyssa: how are we generating these stats?
18:39imirkin: HdkR: like qemu-softmmu?
18:40HdkR: imirkin: But fast
18:40ajax: alyssa: i was measuring with 'git log --author ajax | diffstat -s'
18:40ajax: er, -p in the obvious place there
18:41ajax: +41654 -21443
18:41airlied: ajax: https://godotengine.org/article/betsy-gpu-texture-compressor
18:41ajax: xserver though, +52546 -420606
18:41airlied: might be of orthogonal interest :-P
18:41HdkR: whoa, big numbers
18:41alyssa: 732 files changed, 103301 insertions(+), 60913 deletions(-)
18:42ajax: airlied: ooh yay code i can just steal instead
18:42bnieuwenhuizen: ajax: that command doesn't quite work for me "shows 0 on mesa when git log shows lots of stuff"
18:42HdkR: need the -p on `git log` bit
18:43ajax: bnieuwenhuizen: did you add the -p between log and --author?
18:43bnieuwenhuizen: no :P
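For anyone wanting the same totals without diffstat, here is a hedged alternative. It is not the `git log -p --author ... | diffstat -s` pipeline from the chat; it sums the output of `git log --numstat` instead. The helper name `sum_numstat` is made up for this sketch, and the sample text below is invented, not real repository data.

```python
def sum_numstat(text):
    """Sum insertions and deletions from `git log --numstat` output."""
    added = deleted = 0
    for line in text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # blank separator lines between commits
        a, d, _path = parts
        if a == "-" or d == "-":
            continue  # binary files are reported as "-" and have no line counts
        added += int(a)
        deleted += int(d)
    return added, deleted


# Invented sample in numstat format: "<added>\t<deleted>\t<path>".
sample = "10\t2\tsrc/a.c\n\n-\t-\timg.png\n5\t7\tsrc/b.c\n"
totals = sum_numstat(sample)
```

To run it against a real checkout, the text could come from something like `subprocess.run(["git", "log", "--author=ajax", "--numstat", "--pretty=format:"], capture_output=True, text=True).stdout`.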
18:43HdkR: alyssa: I'm surprised it is that low after writing multiple compilers. Need to glue a bunch of LLVM in there to bulk the numbers or something :P
18:44alyssa: HdkR: lol
18:44alyssa: HdkR: Hey, question, why th are you opped here?:p
18:44airlied: 3295 files changed, 289148 insertions(+), 138318 deletions(-) need to delete more
18:44bnieuwenhuizen: hmm, only "47401 insertions(+), 14702 deletions(-)". I need Dave to write my commits more often :)
18:45airlied: need to keep crocus small, and delete i965 :-P
18:45HdkR: alyssa: Mostly to ensure that if anyone needs a target to angrily yell at; I'm always available.
18:45anholt: feeling pretty good about my mesa stats, if they're to be trusted at all. 6346 files changed, 472407 insertions(+), 1458729 deletions(-)
18:46bnieuwenhuizen: how does it count moving files around btw?
18:46zackr: ajax: what do you need a gpu texture compressor for?
18:46FreeFull: Where is the DRI_PRIME environment variable actually documented?
18:47ajax: zackr: thinking it'd be a clever way to handle something like rapid screen updates in rdp without worrying about h264 licensing (long, exasperated sigh)
18:48ajax: while animating, compress before downloading to the cpu to minimize both pcie traffic and network bandwidth
18:48bnieuwenhuizen: I don't think they're generally usable for realtime compression though?
18:48airlied: yeah generally they are too slow for online compression afaik
18:48ajax: well that's the question, isn't it. given the use case here it's not like the quality needs to be awesome, i'll be ending with a full-bitrate frame when the screen idles again
18:49imirkin: ajax: sorta like MJPEG?
18:49bnieuwenhuizen: maybe try vp8/vp9 encoding instead?
18:49bnieuwenhuizen: (yes I know HW availability is worse)
18:50ajax: how common is vp8/9 decode on the client side? (honest question)
18:51ajax: also, if we really are talking about rdp specifically, i _know_ the viewer is going to support nscodec and remotefx
18:51dschuermann: is there a particular reason for this MR being on hold? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8411
18:51ajax: if i can do an adequate job of either one in realtime and not need the cpu to do it...
18:51bnieuwenhuizen: vp9 somewhat newish, like since 2017 or so most new HW that gets released should probably have it?
18:52bnieuwenhuizen: I think vp8 never got widespread decode support across older HW
18:52pendingchaos: dschuermann: freedreno CI looks broken again
18:52dcbaker:was about to say that
18:52agd5f: ajax, Raven and newer APUs, navi and newer dGPUs
18:52agd5f: for VP9
18:53ajax: that could be plausible then
18:54bnieuwenhuizen: oh oops, navi is quite late then (2019)
18:54bnieuwenhuizen: though machines with dGPU could probably CPU decode
18:55ajax: there's some element here of wanting to implement well whatever the existing installed base of clients already supports. if i'm trying to cram vp9 into rdp that's sort of pointless if the windows client can't do it
18:55robclark: dschuermann, pendingchaos: rebase, we disabled those jobs.. something took our CI farm offline, and we are still trying to arrange to get physical access to office so we can rescue it
18:56bnieuwenhuizen: marge should already rebase stuff?
18:56robclark: right, but if you are triggering CI manually...
18:56dschuermann: robclark: thx!
18:56bnieuwenhuizen: ah sure
19:03zackr: ajax: that's going to be a hard project if you can't control the clients. if you're bound by existing clients then avc420 and avc444 streams with some legal trickery, akin to dllopening external lib at runtime, would be your best bet. if you're only doing the server though then implementing progressive codec compression (which iirc is patent bound on the client side) would work. that's in general easier, the core of that would
19:03zackr: be gpu differ (you basically send image diffs to the clients), but then existing clients and your server would use gpus to reduce the load
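A CPU-side sketch of the image-diff idea zackr outlines: compare the previous and current frame tile by tile and only report the tiles that changed. A real implementation would run this on the GPU. The function name, the 16-pixel tile size, and the 4-bytes-per-pixel layout are all assumptions made for illustration.

```python
def changed_tiles(prev, curr, width, height, tile=16, bpp=4):
    """Return (x, y) origins of tiles whose pixels differ between two frames."""
    stride = width * bpp  # bytes per row, assuming tightly packed rows
    dirty = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                start = y * stride + tx * bpp
                end = y * stride + min(tx + tile, width) * bpp
                if prev[start:end] != curr[start:end]:
                    dirty.append((tx, ty))
                    break  # one differing row is enough to mark the tile
    return dirty


# 8x8 frame with 4x4 tiles; change a single pixel at (5, 2).
prev_frame = bytes(8 * 8 * 4)
curr_frame = bytearray(prev_frame)
curr_frame[2 * 8 * 4 + 5 * 4] = 255
dirty = changed_tiles(prev_frame, bytes(curr_frame), 8, 8, tile=4)
```

Only the dirty tiles would then need to be encoded and sent, which is where the bandwidth saving comes from.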
19:07ajax: zackr: i mean obviously the sane thing to do is to use the hardware encoder rather than cs if there is one. and i imagine most people in practice would do that if they can because no end user actually cares about those patents.
19:07ajax: but if my employer for some silly ass reason thinks they can't do that, then i'm bound by what i know the client will support. and i know rdp clients have at least two credible codecs they will support, that i can ship.
19:08ajax: i can compress that on the cpu side, or on the gpu side. doing it on the gpu side means i'm doing the heavy work in the right memory domain.
19:09ajax: is this turd polishing, yes, of course it is. i'm using unix too, so i'm kind of starting from a bad place anyway...
19:10ajax: *shrug* all of this is exploratory anyway, i just wanted to know what tools and prior art was out there.
19:11zackr: we have "blast", because we're really good with names
19:11zackr: and that's basically h264 or jpeg/png with gpu image diffs. but we control the clients and the server
19:12ajax: and i would happily interoperate with it if i could
19:12airlied: zackr: and the horizontal and the vertical?
19:12ajax: apple's i can't believe it's not vnc has some clever custom i can't believe it's not jpeg they use to make mac-mac connections go faster than mac-non-mac
19:13airlied:wonders if people get outer limits references
19:13ajax: they _may_ have put the opencl kernels for that in plaintext in the client and server binaries.
19:13zackr: airlied: haha
19:13ajax: not sure if it counts as a trade secret if you can find it with strings(1)
20:48zmike: @ci any objections if I land !8321 since it seems to be triggering with increasing frequency lately?
22:44alyssa: Is dEQP-GLES3.functional.shaders.builtin_functions.precision.refract.lowp_fragment.vec4 super slow for everyone? (maybe just arm?)
22:46alyssa: (CPU side bottleneck computing references, looks like)
22:49anholt: it's pretty slow, yeah
22:49anholt: deqp job logs should have the cpu usage top hits for various drivers
22:49alyssa: ah, true
22:50alyssa: Just wanted to make sure I didn't muck something up on my end