IRC Logs of #dri-devel on irc.freenode.net for 2024-05-07

00:02 Company: I have no idea what modifier is used actually - but I think libva doesn't return LINEAR
00:03 Company: NV12:0x100000000000002
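The modifier Company quotes can be decoded by hand. DRM format modifiers pack a vendor ID into the top 8 bits and a vendor-specific value into the low 56 bits, which is how fourcc_mod_code() in drm_fourcc.h builds them. The sketch below re-implements that split locally instead of including the kernel header. The reading of vendor 0x01 / value 2 as Intel Y-tiling comes from DRM_FORMAT_MOD_VENDOR_INTEL and I915_FORMAT_MOD_Y_TILED, so it is worth double-checking against your copy of the header.

```c
#include <assert.h>
#include <stdint.h>

/* Local re-implementation of the fourcc_mod_code() packing from
 * drm_fourcc.h: vendor ID in bits 63..56, vendor value in bits 55..0. */
#define MOD_VENDOR_SHIFT 56
#define MOD_VALUE_MASK   0x00ffffffffffffffULL

#define MOD_LINEAR        0ULL  /* DRM_FORMAT_MOD_LINEAR */
#define MOD_VENDOR_INTEL  0x01  /* DRM_FORMAT_MOD_VENDOR_INTEL */
#define INTEL_MOD_Y_TILED 2ULL  /* value used by I915_FORMAT_MOD_Y_TILED */

static uint8_t mod_vendor(uint64_t modifier)
{
    return (uint8_t)(modifier >> MOD_VENDOR_SHIFT);
}

static uint64_t mod_value(uint64_t modifier)
{
    return modifier & MOD_VALUE_MASK;
}

static const char *describe_modifier(uint64_t modifier)
{
    if (modifier == MOD_LINEAR)
        return "LINEAR";
    if (mod_vendor(modifier) == MOD_VENDOR_INTEL &&
        mod_value(modifier) == INTEL_MOD_Y_TILED)
        return "INTEL_Y_TILED";
    return "other (see drm_fourcc.h)";
}
```

Decoded this way, 0x100000000000002 is vendor 0x01 with value 2, which fits the suspicion that libva is not returning LINEAR (LINEAR is modifier 0).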
00:07 kode54: what does the PL in PL DOORBELL stand for?
00:07 kode54: since google is entirely useless and also wants to sell me video doorbells
00:08 robclark: alyssa: I think for all platforms.. it is in common .gitlab-ci stuff.. and AFAICT we aren't skipping wsi tests (premerge vk is fractional but we have full run for nightly)
00:16 karolherbst: alyssa: ever thought about implementing query_memory_info? I want drivers to do that in order to get rid of more compute pipe caps :D
00:23 alyssa: karolherbst: it's in asahi/mesa now
00:23 alyssa: i think
00:24 alyssa: robclark: Hmm ok
00:26 karolherbst: ahh cool
00:28 robclark: seems like query_memory_info could be a common helper for at least all the arm uma type gpu's
00:29 alyssa: yeah, probably
00:29 karolherbst: well
00:29 karolherbst: except the avail ones
00:29 alyssa: I think I just copypasted freedreno
00:30 karolherbst: would you have to use os_get_available_system_memory for the avail memory?
00:30 karolherbst: *wouldn't
00:31 karolherbst: I hate how parsing `/proc/meminfo` is apparently the way of doing that
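On a UMA GPU, "total" and "available" device memory are really system memory figures, and Mesa's os_get_available_system_memory gets the latter by reading /proc/meminfo. Below is a minimal sketch of that parsing, assuming the usual `Key:   value kB` line format. struct uma_memory_info and all function names are hypothetical, and they only loosely mirror the total/available fields of gallium's pipe_memory_info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for a GPU that shares system RAM. */
struct uma_memory_info {
    uint64_t total_kb;
    uint64_t avail_kb;
};

/* Find a "Key:   value kB" line in /proc/meminfo-formatted text. */
static bool meminfo_lookup(const char *text, const char *key, uint64_t *out_kb)
{
    size_t key_len = strlen(key);

    for (const char *line = text; line && *line;) {
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            unsigned long long kb;
            if (sscanf(line + key_len + 1, "%llu", &kb) != 1)
                return false;
            *out_kb = kb;
            return true;
        }
        line = strchr(line, '\n');   /* advance to the next line, if any */
        if (line)
            line++;
    }
    return false;
}

static bool parse_uma_memory_info(const char *text, struct uma_memory_info *info)
{
    assert(text && info);
    return meminfo_lookup(text, "MemTotal", &info->total_kb) &&
           meminfo_lookup(text, "MemAvailable", &info->avail_kb);
}

/* Read live values; returns false if /proc/meminfo cannot be read. */
static bool read_uma_memory_info(struct uma_memory_info *info)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return false;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_uma_memory_info(buf, info);
}
```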
00:31 alyssa: probably in theory
00:31 alyssa: in practice i just wanted to fix steam's system info screen ;P
00:31 karolherbst: :D
00:31 karolherbst: I don't need avail anyway in rusticl
00:32 robclark: oh, I guess we have the silly thing where we limit some devices to 4G gpu va because of silly fw $reasons... but it wouldn't be too hard to impl a w/a for that (just need to make sure certain buffers don't straddle 4G)
00:32 karolherbst: I just do that in the frontend
00:32 karolherbst: ohh wait
00:32 karolherbst: you mean something else
00:32 karolherbst: gallium doesn't support single allocations bigger than 2G anyway
00:33 karolherbst: but that's not even covered with that interface
00:35 robclark: it's really about driver internal allocations.. some versions of fw fail at doing 64b math (sqe is basically 32b risc core thingy) which could cause problems if vsc draw stream straddles
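The workaround robclark describes comes down to a range check: a buffer straddles 4 GiB when its first and last byte differ in the upper 32 bits of the GPU virtual address. Here is a sketch, with a hypothetical helper that simply moves an offending allocation up to the next 4 GiB boundary. A real allocator might instead retry or reserve a dedicated range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB4 (1ULL << 32)

/* True if [iova, iova + size) crosses a 4 GiB boundary, i.e. the first and
 * last byte differ in the upper 32 bits of the address. */
static bool straddles_4g(uint64_t iova, uint64_t size)
{
    assert(size > 0);
    return (iova >> 32) != ((iova + size - 1) >> 32);
}

/* Hypothetical fixup: move a straddling allocation up so it starts at the
 * next 4 GiB boundary. Only meaningful for buffers of at most 4 GiB. */
static uint64_t avoid_4g_straddle(uint64_t iova, uint64_t size)
{
    assert(size <= GIB4);
    if (!straddles_4g(iova, size))
        return iova;
    return (iova + GIB4) & ~(GIB4 - 1);
}
```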
00:35 karolherbst: oh no
00:36 karolherbst: that reminds me, I need to get my hands on some dev board so I can enable freedreno in rusticl and drop all that nir support code in clover :D
00:37 robclark: there are laptops too.. and I hear some nice ones will come out in the next month or so (but even the current x13s is nice)
00:37 karolherbst: mhhh
00:37 karolherbst: I don't need another laptop though :D
00:38 kode54: what is PL domain doorbell
00:38 kode54: or do I need to consult the kernel documentation offline
00:38 robclark: I've been using x13s as daily driver for upstream work for a year or so now... only drawback is now I don't have a convenient x86 laptop to build freedreno x86 for steam+fex :-P
00:39 karolherbst: but I think somebody promised me hardware :D
00:39 karolherbst: not quite sure
00:39 karolherbst: maybe it wasn't for freedreno
00:39 karolherbst: right..
00:39 karolherbst: but like.. I have a macbook air, which is a lot faster than the x13s :D
00:40 kode54: there is only one mention of "PL domain" in the entire kernel documentation folder, and it doesn't tell me what it means
00:40 kode54: I guess nobody who even uses the terminology knows what it means
00:42 karolherbst: but anyway... I just don't want to have a laptop taking away even more space, but a dev board is generally much better to deal with, especially due to easier serial console access
00:42 robclark: x13s is faster than the kbl i7 xps13 that it replaced.. but not as fast as the macs.. the upcoming things should give the macs a run for their money.. but yeah, serial console is nice
00:43 kode54: I love opaque acronyms
00:44 karolherbst: being passive-aggressive about it won't help you get an answer either
00:46 alyssa: robclark: ugh, yeah, I have been x86-free for years but tempted to get an x86 build machine just to deal with fex. Ironic, isn't it?
00:47 robclark: heh, yeah
00:47 karolherbst: just use qemu-user
00:47 karolherbst: *hides*
00:47 alyssa: sounds.. unpleasant
00:47 karolherbst: it is
00:47 alyssa: it is $current_year, why does cross compiling still suck
00:48 karolherbst: we have containers now, which made it suck less at least
00:48 robclark: admittedly fex is like the one case where I kinda think multiarch distro would make things easier
00:49 karolherbst: it's a fine idea in theory, but sadly we have none which is actually proper multiarch
00:50 karolherbst: but like the issue on macbooks specifically with fex and steam is the page size mess anyway, so that's kinda a huge problem anyway
00:50 karolherbst: so even if we get to proper multiarch distro, now we have to figure out using different page sizes :')
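The page-size problem stems from x86 software assuming 4 KiB pages, while Asahi kernels on Apple Silicon typically run with 16 KiB pages. A 4 KiB-aligned guest mapping can only be mirrored directly on the host if it is also aligned to the host page size. That is one reason to run the x86 side in a VM with its own 4 KiB-page kernel, which this sketch assumes is roughly what the microVM approach does. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can a guest mapping [guest_addr, guest_addr + guest_len) be expressed
 * directly in host pages? Both ends must be host-page aligned. */
static bool host_can_map_directly(uint64_t guest_addr, uint64_t guest_len,
                                  uint64_t host_page_size)
{
    /* page sizes are powers of two */
    assert(host_page_size && (host_page_size & (host_page_size - 1)) == 0);
    uint64_t mask = host_page_size - 1;
    return (guest_addr & mask) == 0 && (guest_len & mask) == 0;
}
```

For example, a 4 KiB mapping at 0x11000 is fine for a 4 KiB-page host but not for a 16 KiB-page one.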
00:51 robclark: thankfully that is a problem I have to deal with ;-)
00:51 robclark: *not
00:54 alyssa: karolherbst: page size is a solved problem now
00:54 karolherbst: oh really?
00:54 karolherbst: you mean the micro vm thing or something else?
00:54 alyssa: yes
00:54 alyssa: I merged the mesa patches last week
00:55 alyssa: (into asahi/mesa I mean)
00:55 karolherbst: interesting
00:55 alyssa: robclark: ...so it is a problem you deal with ;P
00:55 alyssa: or a solution rather
00:55 alyssa: ^^
00:56 karolherbst: given the micro VM solution didn't work for me (because full screen windows were busted and it overall was a giant mess) I'm delighted to know that this is a thing of the past and I can just use fex directly :D
00:56 alyssa: the microvm is the solution i mean
00:57 karolherbst: oh no
00:57 alyssa: it's trending in the direction of less of a mess :)
00:57 alyssa: current iteration shares your root filesystem, no container needed
00:57 karolherbst: ahh
00:58 karolherbst: though I think I mostly just had issues with window/cursor positions, so some games I couldn't put into window mode, because fullscreen was just a broken mess
00:58 karolherbst: maybe I shall try again the next time I feel like playing games while traveling
01:04 robclark: alyssa: qc things work fine with 4k pages, so not-my-problem ;-)
01:09 alyssa: (;
01:10 kode54: sorry for being passive aggressive there
01:11 kode54: it's also entirely possible I was asking someone to break an NDA there, if they knew but could not answer, so that's fair to dodge
04:08 mdnavare_: airlied: Ping
04:09 mdnavare_: airlied: Hi Dave, quick question, looking at your pull request here : https://lore.kernel.org/dri-devel/170439541204.3148.17028465187686419462.pr-tracker-bot@kernel.org/, is this supposed to be 6.7 final instead of 6.8? I see 6.7 final part 2 after this
04:12 airlied: mdnavare_: yes that was the last 6.7 one
04:14 mdnavare_: so that's the 6.7 final?
04:14 mdnavare_: airlied: and then there is 6.7 final part 2 which would be the last 6.7 right?
04:20 airlied: mdnavare_: yeah that seems right
04:20 airlied: https://patchwork.freedesktop.org/project/dri-devel/patches/?submitter=71&state=&q=pull&archive=&delegate=
04:30 mdnavare_: Okay great thanks a lot for confirming airlied !
07:29 MrCooper: kode54: PL stands for "placement", AMDGPU_PL_ correspond to TTM_PL_
07:29 kode54: sorry I was so persistent about that
07:30 MrCooper: a doorbell is a HW mechanism involving special memory pages which can be mapped directly into user space, and then memory access triggers certain HW operations
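MrCooper's answer in concrete form: the "PL" values are TTM memory placements, and amdgpu defines its private placements, including the doorbell one, as offsets from TTM_PL_PRIV. The values below are illustrative copies based on ttm_placement.h and amdgpu_ttm.h as of roughly Linux 6.8. They are not includes of the real headers, which use #defines and may number things differently in other trees.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative copies, not the real kernel headers. */
enum ttm_pl {                     /* cf. include/drm/ttm/ttm_placement.h */
    TTM_PL_SYSTEM = 0,
    TTM_PL_TT     = 1,
    TTM_PL_VRAM   = 2,
    TTM_PL_PRIV   = 3,            /* first driver-private placement */
};

enum amdgpu_pl {                  /* cf. amdgpu/amdgpu_ttm.h */
    AMDGPU_PL_GDS      = TTM_PL_PRIV + 0,
    AMDGPU_PL_GWS      = TTM_PL_PRIV + 1,
    AMDGPU_PL_OA       = TTM_PL_PRIV + 2,
    AMDGPU_PL_PREEMPT  = TTM_PL_PRIV + 3,
    AMDGPU_PL_DOORBELL = TTM_PL_PRIV + 4,
};

/* Map a placement number back to a readable name. */
static const char *placement_name(int pl)
{
    static const char *const names[] = {
        "SYSTEM", "TT", "VRAM",                      /* TTM core */
        "GDS", "GWS", "OA", "PREEMPT", "DOORBELL",   /* amdgpu private */
    };
    if (pl < 0 || (size_t)pl >= sizeof(names) / sizeof(names[0]))
        return "unknown";
    return names[pl];
}
```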
07:30 kode54: I should read the DRM docs before I ask anything further
07:30 kode54: I know what a doorbell is vaguely
07:30 kode54: thanks for the explainer though
07:31 kode54: I recall seeing them being implemented in the Xe kernel driver
07:31 kode54: didn't stick around in Arc land long enough to see the Xe driver come to fruition though
07:32 kode54: I tested that kernel patch, it seems to have fixed at least the stability issues I had with Resizable BAR disabled
07:32 kode54: but then I promptly re-enabled it
10:39 karolherbst: zmike: mhh, so I'm hitting an issue with zink, but I'm not quite sure whose fault it is yet. The tldr is that I buffer_subdata cb0 before each launch_grid. What happens if I call buffer_subdata on the same buffer each time? Can I expect the content to be properly synchronized or should I use a different cb0 buffer for each launch_grid until I wait on the relevant fences?
10:40 karolherbst: (or use a different method of making sure the buffer isn't in use anymore)
10:43 pq: dmabuf vs. epoll is getting interesting indeed: https://lists.freedesktop.org/archives/dri-devel/2024-May/452678.html O_o
10:50 zamundaaa[m]: pq: oof
10:57 zamundaaa[m]: maybe I misunderstand the issue, but isn't this "just" a kernel bug that would have to be fixed in the kernel? Application code can't always know if different application code uses epoll on a file, right?
10:57 pq: that's what I'd say
10:58 pq: but apparently that would take significant work, and possibly reduce epoll's performance
11:00 pq: I hope that "eager beaver" Linus mentioned will materialize to fix things
11:10 ratheralone: Zink or rather all API backbends should be written different, the compiler generates inefficient code, I am forking and splitting out from your terrorists, though you have not done those magical bits it's even better that way, I have the data and metadata format for hw agnostic code compressed format. Life is beautiful I do not know why you chose abuse. I got lots of work done in 20 years time, I branch off all my businesses
11:10 ratheralone: around this work, dmabuf is so and so, I already said it should be taught to deal with offloading compute. There are two regimes the source code is compressed into container and compiled into binary or can be harvested and generated from hw agnostic hash and compiled still with llvm. The other data is all structures that is compiled with llvm too they have direct generator to ansi c text or any language always. It's a decomp
11:10 ratheralone: ler kit called triton, that worries me , works with fuzzed binaries but some countries have sw license honoring turned into court practice. But in general I am stable programmer today already, it's just all my systems are intruded into and full of spyware. They treat dwfreed and all other terrorist scammers on those channels, you are too stupid bullies to avoid to get manhandled. My personal phone is magical Xiaomi a Chinese
11:10 ratheralone: model where Chinese deal with ROMs and security and a conflict with you is handled around their work heroics etc. so far they manage their phones sw extremely well.
11:10 ratheralone: But in general us based iphones and macs are good too.
11:25 ratheralone: Overall dma controllers i.e dmac's yes have three modes polling, interrupt, and dma channel like true dma mode. I am short of time to land security fixes there, but those are not very efficient against military based systems in the end anyhow.
11:58 jkhsjdhjs: hey, I'm running arch linux and since I upgraded to linux 6.8.9 all applications that in any way use radeonsi crash very frequently, here is an excerpt of coredumpctl: https://bpa.st/raw/PVWQ I first thought that it must be a memory issue, so I ran memtest for 10 hours without errors. then I looked into the traces and because radeonsi appeared in all of them, downgraded mesa from
11:58 jkhsjdhjs: 24.0.6 to 24.0.5 at first, with no change, applications were still crashing. finally I noticed that I updated linux on the fourth of may, which is when the crashes started occuring. since I downgraded to 6.8.8, I no longer experience any crashes. because radeonsi appears in all these traces, maybe you're already aware of such a kernel regression and can point me to the
11:58 jkhsjdhjs: corresponding mailing list thread? thanks in advance
12:01 karolherbst: jkhsjdhjs: is this on a laptop?
12:02 jkhsjdhjs: nope, on a PC, 5800X3D / 6700XT
12:02 karolherbst: anyway "SIGBUS" basically means that the kernel couldn't access the memory in question, which might mean that something is fishy on the kernel side
12:03 jkhsjdhjs: yep, I also think so, as it is fine with linux 6.8.8, but not with 6.8.9
12:07 zmike: karolherbst: if it's the same buffer and you're doing subdata at the same offset then you need to barrier
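The hazard zmike describes follows from deferred execution. buffer_subdata writes on the CPU right away, but a compute launch reads cb0 whenever the GPU gets to it. The toy model below is plain C, not gallium or Vulkan code, and all names are hypothetical. It records launches and only reads the argument buffer at flush time. Rewriting the same offset makes every queued launch see the last value, while giving each launch its own offset keeps them independent.

```c
#include <assert.h>
#include <stdint.h>

#define CB0_WORDS    64
#define MAX_LAUNCHES 16

/* Toy model: a launch records which cb0 word it will read; the read only
 * happens when the "GPU" runs the queue. */
struct toy_queue {
    uint32_t cb0[CB0_WORDS];             /* shared argument buffer */
    unsigned read_offset[MAX_LAUNCHES];
    unsigned n;
};

static void launch(struct toy_queue *q, unsigned offset)
{
    assert(offset < CB0_WORDS && q->n < MAX_LAUNCHES);
    q->read_offset[q->n++] = offset;
}

/* Execute all queued launches; results[i] is the value launch i saw. */
static void flush(struct toy_queue *q, uint32_t *results)
{
    for (unsigned i = 0; i < q->n; i++)
        results[i] = q->cb0[q->read_offset[i]];
    q->n = 0;
}
```

On real hardware the first launch may or may not have run before the second write, depending on timing. That is why this tends to show up as intermittent flakes rather than a consistent failure.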
12:08 karolherbst: zmike: I see
12:08 karolherbst: there is no way to do it in a non blocking way?
12:09 zmike: what do you mean by "non blocking" here
12:09 karolherbst: like not having to barrier at all
12:09 karolherbst: I really just want to launch a compute shader, update part of the buffer and then launch the same state again
12:10 karolherbst: like.. I'm updating a couple of bytes
12:10 zmike: needs a barrier
12:10 karolherbst: I know that e.g. nvidia gpus can do such updates from within their command buffers
12:10 zmike: there is no way around that
12:10 karolherbst: I'm wondering if there is a vulkan api for it
12:10 zmike: yes, nvidia is special
12:10 karolherbst: ahh
12:10 karolherbst: sad
12:10 zmike: if you clobber the whole buffer again you don't need a barrier because it will invalidate
12:11 karolherbst: I wonder what would be the better approach then besides slicing the buffer and barrier when I end up at the beginning
12:11 zmike: you need to overwrite the whole buffer
12:11 zmike: or use a new one
12:11 karolherbst: overwriting the whole buffer would block again, wouldn't it?
12:12 zmike: no because zink would internally discard it and create a new buffer
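Overwriting the whole buffer avoids the barrier because the driver can give the resource fresh backing storage. The old storage is then released once in-flight work drops it, with no wait required. This "renaming" technique (often called discard or orphaning) is sketched below as a toy refcount model. The names are hypothetical and do not reflect zink's internals.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy backing storage: one reference for the resource, plus one per piece
 * of queued GPU work that will read it. */
struct storage {
    unsigned refcount;
    unsigned char data[256];
};

struct resource {
    struct storage *cur;
};

static struct storage *storage_new(void)
{
    struct storage *s = calloc(1, sizeof(*s));
    assert(s);
    s->refcount = 1;
    return s;
}

static void storage_unref(struct storage *s)
{
    assert(s->refcount > 0);
    if (--s->refcount == 0)
        free(s);
}

/* Queued GPU work takes a reference to the storage it will read. */
static struct storage *gpu_use(struct resource *r)
{
    r->cur->refcount++;
    return r->cur;
}

/* Whole-buffer write: if the GPU still holds the storage, swap in fresh
 * storage instead of waiting; the old copy lives until the GPU drops it. */
static void write_whole(struct resource *r, const void *src, size_t len)
{
    assert(len <= sizeof(r->cur->data));
    if (r->cur->refcount > 1) {
        struct storage *old = r->cur;
        r->cur = storage_new();
        storage_unref(old);
    }
    memcpy(r->cur->data, src, len);
}
```

A partial update could not use this trick without first copying the untouched bytes into the new storage. That is why the options are to overwrite the whole buffer, invalidate it, or use a new one.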
12:12 karolherbst: ahh
12:12 karolherbst: mhh
12:12 zmike: also stop using the term block for this
12:12 karolherbst: that sounds too driver specific to me
12:12 zmike: it's confusing and not what you mean
12:12 zmike: most drivers will do this
12:12 zmike: but you can manually trigger the effect with invalidate_resouce
12:13 karolherbst: invalidate_resouce?
12:13 zmike: it's a gallium hook
12:13 zmike: or just keep a stream uploader and use that
12:14 zmike: and upload -> bind -> upload -> bind
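The upload -> bind pattern works because every upload lands in a region that has never been handed out before. The toy model below is loosely based on how gallium's u_upload_mgr behaves as I understand it: it suballocates linearly and starts a new buffer instead of wrapping. The types and functions are hypothetical and are not the u_upload_mgr API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define UPLOAD_BUF_SIZE 4096

struct upload_buf {
    struct upload_buf *prev;             /* older, retired buffers */
    unsigned char data[UPLOAD_BUF_SIZE];
};

struct stream_uploader {
    struct upload_buf *buf;              /* current buffer, NULL at start */
    unsigned offset;                     /* first free byte in buf */
};

struct upload_slice {
    struct upload_buf *buf;              /* what would be bound as cb0 */
    unsigned offset;                     /* ...at this byte offset */
};

/* Copy size bytes into a fresh slice; alignment must be a power of two. */
static struct upload_slice upload(struct stream_uploader *u, const void *src,
                                  unsigned size, unsigned alignment)
{
    assert(alignment && (alignment & (alignment - 1)) == 0);
    assert(size <= UPLOAD_BUF_SIZE);

    unsigned off = (u->offset + alignment - 1) & ~(alignment - 1);
    if (!u->buf || off + size > UPLOAD_BUF_SIZE) {
        /* Never wrap: start a new buffer. The old one stays on a list here;
         * a driver would drop it once submitted work releases it. */
        struct upload_buf *nb = calloc(1, sizeof(*nb));
        assert(nb);
        nb->prev = u->buf;
        u->buf = nb;
        off = 0;
    }
    memcpy(u->buf->data + off, src, size);
    u->offset = off + size;
    return (struct upload_slice){ u->buf, off };
}

/* Free everything; stands in for "all fences have signalled". */
static void uploader_destroy(struct stream_uploader *u)
{
    while (u->buf) {
        struct upload_buf *prev = u->buf->prev;
        free(u->buf);
        u->buf = prev;
    }
    u->offset = 0;
}
```

Each returned slice (buffer plus offset) is what would then be bound as constant buffer 0 for the next launch. Because a full buffer is retired rather than reused, no slice is overwritten while queued work might still read it. The cost is allocating a new backing buffer whenever the current one fills up.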
12:14 karolherbst: I don't find that hook name
12:14 karolherbst: ohh wait
12:14 karolherbst: missing r
12:16 karolherbst: right... don't really know what's the best approach here in terms of lowering CPU overhead
12:17 zmike: how big is the data
12:17 karolherbst: 4k at most
12:18 karolherbst: or any other arbitrary number
12:18 zmike: I mean how big is the buffer in total
12:18 karolherbst: it's the buffer holding all inputs for a CL kernel (like think of it as a buffer holding all function arguments)
12:18 karolherbst: mhh
12:18 zmike: like could it be multiple megabytes?
12:18 karolherbst: no
12:18 karolherbst: I think I cap it at 64k
12:19 karolherbst: I query PIPE_SHADER_CAP_MAX_CONST_BUFFER0_SIZE and cap it
12:19 karolherbst: 4k it seems
12:19 karolherbst: anyway
12:19 karolherbst: it's small
12:20 zmike: I mean it seems like just use a stream uploader
12:20 zmike: if you have issues with that and cpu overhead then you can try punting it to a thread with PIPE_MAP_UNSYNCHRONIZED for your subdata
12:21 zmike: i.e., manually create buffer + subdata all in thread
12:22 karolherbst: I see
12:23 karolherbst: the reason I moved to a buffer was to take PIPE_CAP_PREFER_REAL_BUFFER_IN_CONSTBUF0 into account, but maybe for those small ones I really shouldn't
12:23 karolherbst: but in the end it really just matters if a method harms GPU throughput or not when launching compute stuff back to back
12:24 zmike: if you don't honor that cap then you're just moving that work to the driver
12:24 karolherbst: (and updating the content)
12:24 zmike: and the driver is doing exactly what I described in its thread
12:24 karolherbst: ohh you mean I should use the stream uploader directly?
12:24 zmike: yes, as a first attempt that would be simplest
12:24 zmike: upload -> bind -> upload -> bind -> ...
12:24 karolherbst: right
12:25 karolherbst: yeah, maybe that's the best idea here actually
12:25 karolherbst: haven't considered it before
12:25 karolherbst: but anyway, this issue I'm hitting explains some of the flakes I was seeing with zink :)
12:26 zmike: seems so
12:26 zmike: I'm surprised you don't get flakes with radeonsi or any other gpu that has real vram
12:27 karolherbst: they could wait on the buffer
12:27 zmike: depends on the map flags you're using with subdata I think?
12:28 karolherbst: could be
12:28 karolherbst: or maybe I'm just lucky
12:29 zmike: I was lucky for a while
12:29 karolherbst: then you started zink?
12:29 karolherbst: wait
12:29 karolherbst: did you even start it
12:29 karolherbst: I'm bad with history
12:30 zmike: I didn't
12:30 zmike: but it's easy to be lucky when you only use intel igpus
12:30 zmike:salutes
12:30 karolherbst: fair :D
12:30 karolherbst: yeah.. I usually test native gallium + zink on CPU/Intel/AMD and sometimes nvidia if I swap the GPU
12:32 zmike: my experience is that stuff with nvidia "just works" unless it involves stencil or rebar
12:57 pepp: jkhsjdhjs: sounds similar to https://gitlab.freedesktop.org/drm/amd/-/issues/3343 - there's a kernel patch linked from one of the comment that you can try
13:00 kisak: I've been seeing a fair bit of broad spectrum splatter caused by #3343 over here.
13:00 kisak: much more than is landing on the proper bug report.
13:04 jkhsjdhjs: pepp: thanks for linking me that! yes, sounds like it's exactly the same issue. the issue also mentions that 6.8.9 is affected, which I can confirm. I'm currently on 6.9.0-rc7 and it seems to be fine so far
13:07 jkhsjdhjs: nevermind, there we go again
13:08 zmike: DavidHeidelberg: okay I ran on icl just now in xwayland
13:08 zmike: Passed: 28/29 (96.6%)
13:08 zmike: Failed: 0/29 (0.0%)
13:08 zmike: Not supported: 1/29 (3.4%)
13:08 zmike: only the android one is skipped
18:53 DemiMarie: pq: writing an exploit would be a good way to get it fixed.
19:12 DemiMarie: Nevermind, Linus fixed it.
19:27 DavidHeidelberg: zmike: man... I don't get it
19:30 zmike: does it pass ci?
19:31 DavidHeidelberg: zmike: our CI: yes; Intel CI: no; worked for me; now it doesn't work for me; but it works for you......
19:32 zmike: imo merge it and let people figure things out after
19:32 zmike: having something actively broken is great motivation
19:33 zmike: maybe we'll get some users with issues that provide better debugging scenarios
19:49 DavidHeidelberg: zmike: seriously, I would want that __ failing job __ in our MesaCI
19:49 DavidHeidelberg: also I think it would make Tapali sad to merge it when their CI fails.
19:50 DavidHeidelberg: I wish Intel CI would be more transparent
19:55 zmike: Intel CI is great, but at some point there's a limit to what can be done if their issues can't be reproduced
19:55 zmike: tbh it sounds like someone is just fucking up CTS config/build/env
19:55 zmike: there's so many variables there
19:57 DavidHeidelberg: zmike: but the problem is NOW it fails for me too.. the only idea I have is that the confs are not sorted the way the extension says they should be
19:57 DavidHeidelberg: (but ofc only the sort test fails for me, everything else is fine)
19:58 zmike: idk what to say since I can't repro on numerous machines and setups
19:58 zmike: merging makes the code more readily available and enables other people to test
19:58 zmike: if the problem can't be solved for the next release then it can be disabled with a backport
20:00 cwabbott: has anyone seen a problem where weston-rdp becomes a slide show only when "--renderer gl" is enabled?
20:01 DavidHeidelberg: cwabbott: I guess it uses swrast/llvmpipe?
20:01 cwabbott: DavidHeidelberg: nope, it uses the GPU
20:01 zmike: ManMower: ^
20:01 cwabbott: specifically "GL renderer: zink Vulkan 1.3(Turnip Adreno (TM) 750 (MESA_TURNIP))"
20:02 cwabbott: without "--renderer gl" it's fine
20:04 cwabbott: with "--renderer gl" refreshing is super-slow and typing on the keyboard is impossible, it sometimes (50% of the time) repeats a character dozens of times (like if debounce wasn't working)
20:04 cwabbott: I'll type "cd" in the terminal and get "cddddddddddddddddddd"
20:04 DavidHeidelberg: zmike: drop some msg into the MR, I guess if Tapali would ack it, we can try
20:04 DavidHeidelberg: while it's in mesa-git it shouldn't be a problem, only if this doesn't get figured out before hitting the release
20:05 DavidHeidelberg: eventually we could even merge it and revert it before release if needed
20:10 ManMower: cwabbott: is it really display that's slow, or is input just being bursty? I've seen the mouse jump around but things like weston-simple-egl run smoothly at the same time. :/
20:11 ManMower: that key repeat thing will happen if weston doesn't get back to its main loop fast enough to catch the key break event before repeat starts. :/
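The "cddddddd" symptom can be put in numbers. In Wayland, key repeat is normally generated by the client from the delay and rate in wl_keyboard.repeat_info, so the client keeps repeating until the release event actually reaches it. Below is a toy model with hypothetical delay and rate values. The function name is made up.

```c
#include <assert.h>

/* Repeats generated before a release delivered release_seen_ms after the
 * press: the first repeat fires at delay_ms, then one every 1000/rate_hz ms. */
static unsigned repeats_before_release(unsigned release_seen_ms,
                                       unsigned delay_ms, unsigned rate_hz)
{
    assert(rate_hz > 0 && rate_hz <= 1000);
    if (release_seen_ms < delay_ms)
        return 0;
    unsigned interval_ms = 1000 / rate_hz;
    return 1 + (release_seen_ms - delay_ms) / interval_ms;
}
```

With a 400 ms delay and a 40 Hz rate, a release that arrives 850 ms after the press (instead of promptly) already yields 19 repeated characters.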
20:12 cwabbott: ManMower: how would I run any commands to test anything with the key repeat thing?
20:12 ManMower: ssh in from another host. :(
20:12 cwabbott: is there a command to run something in weston from ssh?
20:13 ManMower: there's a readpixels to get the frame buffer, I have a hunch that's being ridiculously slow for you. I have no idea why though.
20:13 ManMower: you'd just set your WAYLAND_DISPLAY env var properly (probably to "wayland-1")
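Running a client in weston from an ssh session works because the client only needs to find the compositor's socket. The sketch below shows that lookup as I understand libwayland-client performs it: it uses WAYLAND_DISPLAY (defaulting to "wayland-0"), resolved relative to XDG_RUNTIME_DIR unless it is an absolute path. The function is hypothetical and takes the values as parameters rather than calling getenv(), so it can be tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Resolve the compositor socket path into out; false if it cannot be. */
static bool wayland_socket_path(const char *wayland_display,
                                const char *xdg_runtime_dir,
                                char *out, size_t out_len)
{
    const char *name = wayland_display ? wayland_display : "wayland-0";

    if (name[0] == '/') {                /* absolute path: use as-is */
        if (strlen(name) + 1 > out_len)
            return false;
        snprintf(out, out_len, "%s", name);
        return true;
    }
    if (!xdg_runtime_dir)                /* relative name, nowhere to look */
        return false;
    if (strlen(xdg_runtime_dir) + 1 + strlen(name) + 1 > out_len)
        return false;
    snprintf(out, out_len, "%s/%s", xdg_runtime_dir, name);
    return true;
}
```

So from ssh, exporting WAYLAND_DISPLAY=wayland-1 only helps if XDG_RUNTIME_DIR also points at the runtime directory of the user running weston (typically /run/user/<uid>).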
20:14 ManMower: weston's log is probably helpfully telling you that your cpu is too slow as well.
20:14 cwabbott: I don't see anything suspicious like that in the logs
20:16 ManMower: does "top" indicate that it's cpu bound?
20:16 cwabbott: I do see the cpu increase when I move the mouse, but only to like 20-30% on a few cpus
20:18 ManMower: would you mind filing an issue at https://gitlab.freedesktop.org/wayland/weston/-/issues ?
20:19 ManMower: I think 20-30% when moving the mouse is pretty harsh. that should only be updating mouse cursor sized damage regions
20:19 cwabbott: yeah, although this is with hdk8650 which isn't exactly easy to get
20:20 ManMower: can you run 'perf top' and see what's burning cycles?
20:20 cwabbott: gotta run, I'll look into it more tomorrow
20:21 cwabbott: but running sascha willems vulkan gears is pretty fast (330ish fps)
20:23 ManMower: you could try a different gbm-format in weston.ini...
20:29 zmike: DavidHeidelberg: commented
20:30 DavidHeidelberg: zmike: thank you a lot (mainly for the testing!)
20:40 karolherbst: soo.. let's test this stream uploader thing. If that doesn't fix all the flakes I'm disappointed
21:56 Sachiel: gfxstrand: try bribing marge with something before re-assigning