01:47 airlied: gfxstrand: could you glance at the nir patch in 25536?
01:54 Company: excellent, using a UBO for push constants became magically fast when I stopped using MapBuffer() for the VBO and used BufferSubData() there, too
01:54 Company: I clearly have no idea what I'm doing
01:55 Company: but things went from 600 to 1600 fps, so I'm at least moving in the right direction
06:01 lina: anholt: I think you want another lina maybe? ^^
06:01 emersion: there can only be one!
06:05 Daanct12: *insert the spiderman pointing at each other meme*
06:08 kode54: sorry about my departure, I'll try to be a little more mature about things
06:09 kode54: quick question about modifiers, since I want to resolve that
06:09 kode54: are they vendor specific identifiers for buffer formats?
06:09 kode54: like, can you ask the driver or device what the format for a specific modifier is?
06:15 emersion: a modifier can be used with multiple formats
06:15 emersion: IOW: for a fixed modifier, there is no single format
06:15 emersion: the driver may support the modifier for multiple formats
06:16 emersion: or do you not mean "pixel format FourCC" by "format"?
06:17 kode54: I guess I might have meant that
06:17 kode54: I wasn't sure how unique modifiers were for formats
06:17 kode54: or how much of a description is associated with one
06:18 emersion: you might be interested in https://dri.freedesktop.org/docs/drm/userspace-api/dma-buf-alloc-exchange.html
06:19 kode54: I'll look at that
06:52 kode54: Thanks, that clears everything up
07:26 kode54: again, thanks, it didn’t occur to me that modifiers literally meant that: the unique descriptor for how a given fourcc is laid out in memory
07:27 kode54: Like, something could be linear, or a specific form of tiled, or something else
07:27 kode54: Neat that soreau is working on modifier support for old AMD gpus
07:28 kode54: Assuming they support more than some basic layouts
07:28 kode54: Though it might help if each fourcc they support is at least paired with the explicit layout the devices expect
07:29 kode54: If they don’t support more than one layout for a given format
07:30 kode54: Incidentally, I sold my Arc gpu and plan to acquire a 6750 XT
08:16 pq: kode54, no worries. When I told you that, it wasn't just the last comment; I had been reading you here for a couple of days, and some of your gitlab comments too. Happy to see you now.
09:47 karolherbst: ohh.. gcc and clang will have a __element_count__ attribute to bounds-check VLAs at runtime
09:47 karolherbst: we might want to do something like what we have on the kernel side for it now
09:47 karolherbst: in case somebody is bored
09:54 mareko: gerddie6: why does this fail, and if it fails because of the flake, why? the flake is listed in the flakes file: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/49890490
10:13 mareko: gerddie: ^^
10:23 zzag: anholt: sometimes kwin crashes when destroying an EGLImageKHR. libepoxy crashes with "No provider of eglDestroyImageKHR found. Requires one of:". Do you know what could potentially cause it?
10:24 zzag: not a really reproducible crash, but it's weird
10:24 zzag: eglDestroyImageKHR should not require a current opengl context
12:07 pepp: digetx, robclark: I don't get why virtio_gpu_do_fence_wait exits early if dma_fence_match_context returns true
12:07 pepp: does it assume that previous jobs will always complete before the next ones?
12:30 robclark: pepp: if they are on the same timeline/fence-context/ring_idx then they should complete in order
12:36 pepp: robclark: hm ok. I have one ring_idx per hw queue type.. I guess I should have one ring_idx per hw queue
12:38 digetx: that's correct, you need to have a fence context per queue
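[editor's note] The in-order assumption above can be made concrete with a toy model (a sketch with made-up names, `Fence` and `needs_wait` — not the actual virtio-gpu or dma_fence API): fences on one context signal in submission order, so a wait can only be skipped when the pending fence lives on the same context, which is why one context per queue *type* rather than per queue breaks things.

```python
# Toy model of the virtio_gpu_do_fence_wait shortcut discussed above.
# Illustrative only: Fence and needs_wait are hypothetical names.

class Fence:
    def __init__(self, context, seqno):
        self.context = context  # one fence context per HW queue (ring_idx)
        self.seqno = seqno      # monotonically increasing within a context

def needs_wait(pending, about_to_submit_on):
    """Fences on one context complete in submission order, so a job only
    has to wait on `pending` if it was emitted on a *different* context."""
    return pending.context != about_to_submit_on

# Two jobs on the same queue: ordering is implicit, the wait can be skipped.
assert not needs_wait(Fence(context=7, seqno=1), about_to_submit_on=7)
# A fence from another queue gives no ordering guarantee: must wait.
assert needs_wait(Fence(context=7, seqno=1), about_to_submit_on=3)
```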
14:03 karolherbst: does anybody here know how to deal with permission issues on render nodes passed into systemd-nspawn containers?
14:31 psykose: do you get an EPERM on /dev/dri/render*?
14:31 psykose: i assume you need `DeviceAllow=/dev/dri/renderXXX rwm` in some dropin config
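[editor's note] A sketch of the dropin psykose describes (the container name and render node number are placeholders for whatever your setup uses):

```ini
# /etc/systemd/system/systemd-nspawn@mycontainer.service.d/render-node.conf
[Service]
# rwm = allow read, write and mknod on the device node
DeviceAllow=/dev/dri/renderD128 rwm
```

The node also has to be visible inside the container in the first place, e.g. via `Bind=/dev/dri/renderD128` in the `[Files]` section of the container's .nspawn file.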
14:33 karolherbst: yeah.. that's what I was missing
14:34 karolherbst: gfxstrand: I need to be able to spill constant spec operations to an initializer function... any ideas, or should I just start hacking on it once I find time.... I already see it being quite painful, because spirv_to_nir is entirely not designed for that...
14:37 karolherbst: pocl seems to be doing the same in case you have crossworkgroup variables referencing other crossworkgroup variables :')
14:38 Lynne: dj-death: any ideas when descriptor buffer for anv may get merged?
14:42 dj-death: Lynne: when someone reviews it
14:45 Lynne: you don't have the rule where if it's sitting long enough without a review it gets merged?
14:45 dj-death: I thought that was just for piglit/crucible ;)
15:05 dj-death: Lynne: I've just been made aware of a HW thing, so I need to fix the existing DG2 direct descriptors & update the MR, so there is that too
15:12 Lynne: it's pretty much the last major implementation not to have them yet; even on windows all vendors implemented it
15:12 zmike: it's been implemented for a long time, but one does not simply merge code to anv
15:15 dj-death: do people merge unreviewed stuff in radv?
15:16 zmike: we have weekly slapfights to argue over who has to review things
15:17 psykose: if you bring a fish to the slapfight you're guaranteed to win
16:05 eric_engestrom: zmike: yes, I keep forgetting to create the 23.3 milestone
16:05 eric_engestrom: thanks for the reminder :)
16:05 eric_engestrom: created now: https://gitlab.freedesktop.org/mesa/mesa/-/milestones/44
16:06 eric_engestrom: reminder for everyone: the mesa 23.3 branchpoint is happening on Oct 25, the Wednesday _after_ XDC
16:07 eric_engestrom: see https://docs.mesa3d.org/release-calendar for all the dates
16:08 eric_engestrom: if you have any issues that need to be fixed before 23.3.0 final can be released, please add them to the milestone above :)
16:33 anholt: lina: yeah, whoops, meant linyaa about the updated prm repo.
16:45 mareko: zmike: are you really arguing about that?
16:46 zmike: what?
16:46 zmike: oh
16:46 zmike: no
16:46 zmike: it's a very structured discussion about areas of expertise
16:47 zmike: sometimes we let bnieuwenhuizen do live code readings
16:47 mareko: ooohh
16:49 bnieuwenhuizen: Just trying to make sure stuff doesn't fall through the cracks
16:52 mareko: sounds fun
16:54 mareko: robclark: a306 has 9 new flakes: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/49910197
16:58 anholt: mareko: those are all in known flakes.
16:58 anholt: the job failure is in the post-processing stage, because of https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24738
16:59 anholt: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25527 is up, but that MR should probably be reverted until then.
17:08 mareko: so that's why all flake files are ignored
17:30 linyaa: anholt: what about an updated prm repo?
17:30 linyaa: i didn't know people were still using it. but if anyone is, then i'll resume keeping it up-to-date.
17:31 linyaa: do you think it makes sense to move it to gitlab? any opinion?
17:32 linyaa: maybe there are ip issues with hosting it on gitlab. i'm unsure.
17:43 linyaa: anholt: i'm updating them right now...
18:09 anholt: linyaa: thanks! it's been nice to have.
18:10 Company: is there a way to create a dmabuf from userspace? I want to test that my EGL code does the right thing, and for that I need to feed it dmabufs from somewhere
18:12 anholt: Company: for some of our testing, we export a dmabuf from GL in another context. but dmabuf testing overall is hard and not done well.
18:13 Company: hrm
18:13 Company: my next question would have been about YUV
18:13 Company: and I don't think exporting YUV works without first getting a YUV buffer, and with GL that requires importing a dmabuf...
18:14 anholt: welcome to the "not done well" part!
18:15 Company: more stuff to do!
18:15 anholt: piglit generates some bespoke untiled yuv dmabufs using gbm. which frequently fail because they're 2x2 and aligned wrong for import requirements, because it can't know those.
18:15 anholt: stride-aligned
18:15 Company: yeah
18:16 Company: do 4x4 instead, or 8x8!
18:16 zmike: check weston tests?
18:16 Company: I had hoped there was some way with simpledrm or whatever to create cpu-backed dmabufs
18:16 anholt: let's be real, I've thought about just cranking it up to like 256 wide with the same pixel data so that we could dodge this.
18:19 airlied: Company: udmabuf?
18:19 anholt: Company: oh, that would be lovely. vgem/vkms are the things that *should* be doable for this. except dmabuf import doesn't know about managing cache coherency, so you get mismatches between how vgem is mapping buffers (cpu cached, woo, why not) vs how your gpu wants to access them (data should have been write-combined).
18:20 anholt: so things only really work out when you're generating them from a real device on the system that's doing the same caching plan.
18:21 Company: airlied: is that part of the kernel?
18:22 airlied: yeah, but all i have is that word
18:22 * Company trying to figure out if it's worth trying to make CI work with it, or if it's just good enough for custom testing
18:22 Company: google says https://github.com/ikwzm/udmabuf
18:23 airlied: it may or may not be useful for you
18:23 Company: well, /dev/udmabuf exists here
18:23 Company: so it's definitely worth looking at
18:24 airlied: the github thing is another thing
18:31 Company: airlied: I assume https://github.com/torvalds/linux/blob/master/tools/testing/selftests/drivers/dma-buf/udmabuf.c is a good starting point?
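[editor's note] That selftest boils down to roughly the following (a hedged sketch, not a definitive implementation: the ioctl layout is taken from include/uapi/linux/udmabuf.h as I recall it, it needs Linux with CONFIG_UDMABUF and Python 3.9+, and it falls back gracefully when /dev/udmabuf is absent):

```python
import fcntl
import os
import struct

# From include/uapi/linux/udmabuf.h (double-check against your headers):
#   struct udmabuf_create { __u32 memfd; __u32 flags;
#                           __u64 offset; __u64 size; };
#   #define UDMABUF_CREATE _IOW('u', 0x42, struct udmabuf_create)
_CREATE_FMT = "IIQQ"
UDMABUF_CREATE = ((1 << 30) | (struct.calcsize(_CREATE_FMT) << 16)
                  | (ord("u") << 8) | 0x42)
F_SEAL_SHRINK = 0x0002  # udmabuf insists the memfd cannot shrink under it

def create_udmabuf(size):
    """Return a dmabuf fd backed by anonymous memory, or None when
    /dev/udmabuf is unavailable (no CONFIG_UDMABUF, no permission, ...).
    `size` must be page-aligned."""
    try:
        dev = os.open("/dev/udmabuf", os.O_RDWR)
    except OSError:
        return None
    try:
        memfd = os.memfd_create("udmabuf-demo", os.MFD_ALLOW_SEALING)
        try:
            os.ftruncate(memfd, size)
            fcntl.fcntl(memfd, fcntl.F_ADD_SEALS, F_SEAL_SHRINK)
            # Mutable buffer so fcntl.ioctl() returns the new dmabuf fd.
            arg = bytearray(struct.pack(_CREATE_FMT, memfd, 0, 0, size))
            return fcntl.ioctl(dev, UDMABUF_CREATE, arg)
        finally:
            os.close(memfd)
    except OSError:
        return None
    finally:
        os.close(dev)

if __name__ == "__main__":
    print("dmabuf fd:", create_udmabuf(4096))
```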
18:38 airlied: seems like it
18:57 linyaa: anholt: tgl is up now at https://kiwitree.net/~lina/intel-gfx-docs/prm/
18:58 linyaa: rkl should be another 5min
18:58 robclark: I suppose we could get around to finishing https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23214 to allow gbm to allocate/export yuv surfaces
18:58 anholt: the git appears to be down?
18:59 linyaa: anholt: git clone is broken right now. i took it down for maintenance.
19:00 linyaa: lemme push it somewhere else for now so you can clone, brb
19:17 linyaa: anholt: i created https://gitlab.freedesktop.org/linyaa/intel-gfx-docs.git
19:17 linyaa: wait a few minutes for the initial push. it's fat.
19:17 linyaa: s/push/push to complete/
19:17 anholt: thanks!
19:18 linyaa: do you want DG1 too today? RKL i'm doing now.
19:19 linyaa: basically, i'm going to punt on the discrete cards until tomorrow, unless you need them now.
19:47 anholt: linyaa: something mtl-ish was what I was looking at that made me go try to git pull
19:49 linyaa: mtl is xe-ish, according to intel_device_info.c, but i don't understand how much xe-ish
19:49 linyaa: because dg is also xe-ish, i'll push those after lunch. leaving for a 30min lunch now.
20:35 airlied: gfxstrand: the copy prop pass also hits the cast deref in 25536
21:49 alyssa: karolherbst: https://rosenzweig.io/0001-nir-opt_algebraic-Optimize-LLVM-booleans.patch
21:49 alyssa: llvm moment
21:49 karolherbst: ....
21:50 karolherbst: I mean... `iand` is probably the cheaper operation here, so it kinda makes sense
21:53 alyssa: karolherbst: my kernel got nir:
21:53 alyssa: ine(umax(iand(b2i(x), 1), iand(b2i(y), 1)), 0)
21:53 alyssa: truly inventive.
21:54 alyssa: with that patch it becomes
21:54 karolherbst: keep in mind that we generally compile with -O0
21:54 alyssa: ior(x, y)
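[editor's note] Mesa's algebraic rewrites live in nir_opt_algebraic.py as Python search/replace tuples, so a rule for this pattern would look roughly like the sketch below (an illustration, not necessarily what the linked patch actually adds), together with a quick truth-table check that the rewrite is sound:

```python
# Sketch of a nir_opt_algebraic.py-style rule; the real patch may differ.
# 'a@1'/'b@1' constrain the matched sources to 1-bit booleans.
optimizations = [
    (('ine', ('umax', ('iand', ('b2i', 'a@1'), 1),
                      ('iand', ('b2i', 'b@1'), 1)), 0),
     ('ior', 'a', 'b')),
]

def lhs(x, y):
    """Plain-Python reading of the pattern LLVM emitted."""
    b2i = lambda v: 1 if v else 0
    return max(b2i(x) & 1, b2i(y) & 1) != 0

# The pattern really is just a logical OR of the two booleans.
for x in (False, True):
    for y in (False, True):
        assert lhs(x, y) == (x or y)
```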
21:54 karolherbst: but yeah..
21:54 karolherbst: it's all madness :D
21:54 alyssa: big pile of .cl's coming soon
21:55 karolherbst: oof
21:55 alyssa: to a mesa repo near you
21:55 karolherbst: btw.. we still have to fix memcpy :')
21:55 alyssa: yes..
21:55 karolherbst: though... we aren't so silly as to copy megabytes of data
21:57 linyaa: anholt: now, all the public prms are up. no mtl docs yet, but there are discrete alchemist/acm/arctic-sound docs, and both are 12.5. though i don't know how similar the ISA and arch are between mtl vs acm.
21:57 anholt: thanks!
21:59 linyaa: alyssa: now that it's oh-so-easy to write shaders, will you volunteer for writing hw-generic vulkan video encode shaders? ;-)
22:00 karolherbst: yooo
22:01 karolherbst: that would be a fun project, I just don't know how viable it is to do emulated video encoding/decoding via the vulkan APIs
22:01 karolherbst: but should be fine.. I guess
22:07 mattst88: the intel vaapi media driver has tons of encode shaders shipped as blobs -- presumably they're essentially compute shaders?
22:07 karolherbst: not necessarily
22:08 karolherbst: but probably?
22:08 karolherbst: in any case... it would be kinda cool if we could have a sw video encoding/decoding lib inside mesa
22:08 karolherbst: for various codecs
22:09 linkmauve: karolherbst, depend on ffmpeg maybe instead?
22:09 Lynne: take the dirtiest, most convoluted ISA and overall architecture you can think of, and you still wouldn't be halfway to what implementing h264 from scratch would involve
22:09 karolherbst: linkmauve: no
22:09 karolherbst: also
22:09 karolherbst: ffmpeg doesn't have that
22:10 karolherbst: or at least, not what we'd need
22:11 karolherbst: they don't even have CUDA based encoders/decoders, let alone CL based ones. For anything accelerated they use prop APIs
22:11 karolherbst: it's super sad
22:11 karolherbst: but ffmpeg won't help us here
22:12 airlied: and gpus suck at decoding videos
22:12 karolherbst: can't be worse than vp9 on the CPU
22:12 Lynne: yeah, you wouldn't really gain much from this type of decoder (what we call hybrid)
22:12 airlied: karolherbst: can't it? :-P
22:12 karolherbst: ehh.. wait.. I meant the vp9 encoder
22:13 Lynne: most of the overhead of any well-written decoder implementation is in token parsing for entropy decoding, which GPUs cannot do in parallel
22:13 karolherbst: yeah.. I'm less worried about decoders, I'm more worried about encoding
22:13 karolherbst: atm, e.g. gnome only supports vp9 for screen casting, which, as you can guess, is a horrible idea
22:13 airlied: hopefully it grows av1
22:14 karolherbst: it doesn't fix the problem of users not having GPUs supporting av1
22:14 karolherbst: maybe in 15 years we can rely on it
22:14 karolherbst: we still deal with h.264 videos today
22:14 karolherbst: and recording
22:14 karolherbst: because that's what people actually use
22:15 linkmauve: karolherbst, AV1 is encodable in real time on not-that-performant CPUs nowadays.
22:15 karolherbst: does it use <5% of the CPU?
22:15 karolherbst: if not, it's useless
22:15 linkmauve: Of course not.
22:15 karolherbst: yeah, so it's useless for users
22:15 karolherbst: for screencasting purposes
22:16 linkmauve: Users do screencast using CPU encoders all the time.
22:16 karolherbst: yeah.. and on gnome it's super broken
22:16 karolherbst: h.264 would be fine, but it also doesn't push your CPU to its limits
22:18 karolherbst: sorry, but relying on CPU encoding for av1 won't cut it
22:18 Lynne: encoding is easier, you can definitely implement a meh encoder on the GPU
22:19 karolherbst: yeah.. as long as your CPU isn't at a load where everything becomes laggy or the encoder can't keep up, that would be fine
22:19 karolherbst: the issue is, you can totally encode vp9 on the CPU, but you get like 5 fps on average
22:19 karolherbst: maybe av1 is simpler on the cpu?
22:19 karolherbst: but the overall problem remains: high load on the CPU
22:19 karolherbst: and power efficiency tanks
22:21 Lynne: av1 is definitely not simpler than vp9
22:21 karolherbst: yeah.... the only usable CPU encoder I've experienced is h.264
22:22 linkmauve: AV1 encoders are much more efficient than libvpx though.
22:22 karolherbst: everything else is too heavy
22:22 karolherbst: yeah.. but libvpx is beyond terrible in terms of speed
22:25 linkmauve: I can do 90 fps at 1080p using SvtAv1EncApp on my i5-8350U laptop, not even tweaking the toggles too much.
22:26 linkmauve: Just asking for one pass, since that’s what screencasting does.
22:26 linkmauve: 106 fps now.
22:26 karolherbst: I'm getting 9fps on my i7-10850H
22:26 karolherbst: for 1080p content
22:27 Lynne: if you turn on realtime mode you should get 100+
22:29 Lynne: ffmpeg -i <input> -c:v libvpx-vp9 -quality realtime -speed 16 -b:v 2M -f null - gets me over 600fps here
22:30 karolherbst: okay.. then it's just not very efficient but potentially usable
22:33 karolherbst: Lynne: interesting.... maybe it was just used wrongly in the end, but I'm getting like 100 fps here with proper content
22:33 karolherbst: and 1080p is still kinda low, some laptops come with 2560 or something
22:33 karolherbst: and then it becomes tight again
22:34 karolherbst: and what if you are also in a video call and have to decode as well
22:34 karolherbst: but anyway... the way gnome uses vp9 for encoding was terrible the last time I tried
22:34 karolherbst: maybe that was also on aarch64....
22:35 linkmauve: ffmpeg still limits libaom’s cpu-used setting to 0..8? :/
22:35 linkmauve: That really should get fixed someday.
22:35 karolherbst: yeah...
22:36 karolherbst: but uhh... it's still a terrible user experience if you actually want to do something while recording :D
22:36 karolherbst: and screencasting comes with extra overhead
22:36 linkmauve: karolherbst, on ARM you still don’t have a proper VP9 encoding uAPI in the kernel.
22:36 karolherbst: anyway.. I kinda want to see it working on "all" laptops with a great user experience before I'm convinced here
22:37 linkmauve: So acceleration would be moot anyway, even if the hardware did support it.
22:37 karolherbst: linkmauve: ? how does that matter for cpu encoding?
22:37 linkmauve: karolherbst, ah, I thought you meant VP9 as it has hardware encoding, as opposed to AV1 which doesn’t yet on most ARM SoCs.
22:37 karolherbst: ah yeah, fair, but the point was, if your CPU/GPU doesn't have the fixed function hardware, you could also probably use the GPU and get better power efficiency than encoding/decoding on the CPU
22:37 karolherbst: ahhh no
22:38 karolherbst: hardware is fine
22:38 karolherbst: the problem is just that not everybody has it
22:38 karolherbst: and you need to find a solution in case you don't have hardware encoding
22:38 karolherbst: and decoding
22:38 karolherbst: burning through your battery ain't fun if you could get it cheaper
22:38 Kayden: linyaa: yeah, alchemist docs are up at https://www.x.org/docs/intel/ACM/ - MTL is really similar. some differences with MOCS/PAT/modifiers/etc, but using those docs should be mostly right for either platform
22:38 linkmauve: karolherbst, I doubt your GPU will do it cheaper than your CPU.
22:39 karolherbst: why not?
22:39 karolherbst: I'm not necessarily talking about faster
22:39 karolherbst: only with less power
22:39 linkmauve: That’s also what I’m talking about.
22:39 linkmauve: But I don’t have numbers right now.
22:39 karolherbst: yeah.. probably also because of the lack of a generic GPU impl
22:40 linkmauve: Maybe check what Lynne mentioned, “hybrid encoders”.
22:40 karolherbst: not sure if that means e.g. a CL based impl
22:40 karolherbst: sometimes encoding/decoding is shader-assisted on top of fixed function hardware
22:44 karolherbst: anyway.. could be a fun project
22:45 bcheng: if I'm not mistaken x264 has some cl accelerated routines in it, but I think they didn't get great results out of it
22:45 karolherbst: there are proprietary products using CL for accelerated h.264/h.265 to cover all features on all hardware
22:46 karolherbst: and they are faster than doing it on the CPU in general
22:46 karolherbst: just in the open source world we don't have anything useful
22:46 karolherbst: I think there are also CUDA based ones, but that's kinda pointless to have
23:00 lumag: airlied, mripard, mlankhorst, could you please review and ack https://patchwork.freedesktop.org/patch/556295/?series=123412&rev=1 for merging through msm-next?