IRC Logs of #dri-devel on irc.freenode.net for 2023-12-04

05:14 CADIndie: Hello, I am the project manager for QuestCraft. We are having some difficulty with Freedreno on the Quest 3. The Quest 3 runs the Adreno740v3 and has graphical corruption when any of the Quest 3 menus are visible (including the guardian). Is there a known fix, a method to find a fix, or information that would be helpful to know?
05:15 HdkR: CADIndie: Probably the biggest issue is that Adreno 740 is very new and it is only just coming up. There are known missing features and bugs
05:16 HdkR: Probably better to rely on the proprietary Adreno driver on that device until A740 is more stable in Freedreno/Turnip
05:19 CADIndie: Is there any way we would be able to help fix/diagnose the issue? Right now we don't have the ability to use the proprietary driver as it's missing a great deal of features we require as well as compat with Zink (which is almost detrimental).
05:22 airlied: CADIndie: I'd file issues with gfxreconstruct traces if you can get them
05:22 airlied: or just reproducer methods to start with
05:24 CADIndie: Thank you, we will try utilizing gfxreconstruct for this issue and will follow up soon.
05:30 HdkR: Also might be good to raise the alarm in QCom's graphics forum that people want their driver to support the features that Zink needs. Hopefully get them to increase priority for this on their end
05:32 HdkR: https://developer.qualcomm.com/forums/software/adreno-gpu-sdk They used to respond to this forum. But I haven't tried in a decade myself
05:33 airlied: they don't usually do many updates once they ship the hw, you kinda get stuck on whatever they release with
05:35 HdkR: Yea, but good to talk about needs early so next year they implement at least one extension
05:36 CADIndie: We'll make sure to let them know, hopefully things start going in the right direction with them.
05:37 HdkR: There are multiple people wanting Zink with their blob, so hopefully more people yelling makes it actually happen
06:00 JoshuaAshton: MrCooper: Sorry, I mean the dmabuf was dup'ed
08:03 MrCooper: JoshuaAshton: thanks for the clarification, I guess some kind of incorrect reference count handling is likely then
08:08 dolphin: agd5f: I could have sworn amdgpu had a shader memory transfer option in the kernel in the past, has it been dropped or am I just remembering incorrectly?
09:39 danylo: CADIndie pmed you, tldr: open an issue with how to repro and api traces, a740 should render things correctly, the diff with prop driver is in perf. And we appreciate any real world usage of our driver =)
10:46 jfalempe: javierm, I'm looking to disable the fbcon output (kernel logs only) when drm_panic runs. But fbcon doesn't use "register_console()" so I don't see how it gets the printk() output.
11:28 karolherbst: did we ever support GL_KHR_shader_subgroup_shuffle in GL?
11:29 karolherbst: mhh we also don't support GL_KHR_shader_subgroup
11:29 karolherbst: nvm then
12:34 mripard: sima: any opinion on https://lore.kernel.org/all/20231128-kms-hdmi-connector-state-v4-5-c7602158306e@kernel.org/ ?
12:34 mripard: it seems like it's the most trivial patch in the series, but also the most controversial :)
12:58 alyssa: fifth time's the charm, right? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26440
12:59 karolherbst: pain
12:59 alyssa: 🥖
13:00 karolherbst: mhh
13:00 karolherbst: maybe the runner is down?
13:00 alyssa: different flakes :clown:
13:01 karolherbst: oh indeed
13:07 lumag: mripard, emersion: I have prepared reverts, could you please take a look? If that looks fine, I'll push it to drm-misc-next. https://gitlab.freedesktop.org/lumag/msm/-/compare/drm-misc-next...revert-solid-planes?from_project_id=9879&straight=false
13:12 mripard: they look fine, but there's no reason to bypass the usual rules there
13:12 mripard: so please send the patches, and wait for an r-b or a-b before merging it
13:13 lumag: mripard, ack
13:13 lumag: this is even better from my pov
13:28 mupuf: alyssa: oh boy...
13:29 mupuf: daniels: seems like the little job prioritization hack isn't sufficient and Marge jobs don't get picked up fast enough
13:30 mupuf: would be good if someone from collabora could have a look at the generic runner eric_engestrom wrote, so that it could get deployed as soon as possible
13:33 daniels: it's nothing to do with that, it's that some of our network services had a terribly rough weekend, and a lot of the runners ended up taking themselves offline
13:33 daniels: they're all back now
13:33 lumag: mripard, excuse me, just verifying so as not to make more of a mess than I already did :-D With Heikki's final r-b, is it fine to merge https://patchwork.freedesktop.org/series/122584/ through drm-misc?
13:35 daniels: I'm fine with using the script - last I remember we were waiting to see from the AMD PoV how the experiment was going - but it wouldn't really change anything for us apart from changing the no-runners-available failure mode from 'the last thing you see is 'waiting for job to start' in the log' to 'job never gets picked up'
13:37 mupuf: daniels: great!
13:38 daniels: mupuf: does 'great' mean 'the script works perfectly and you should use it ASAP', or?
13:38 mupuf: lol, no, it was for the "they are back now"
13:39 mupuf: As for the new runner, it would mean you could set gitlab job timeouts and it would actually guarantee that Marge would always be the next job picked up, unlike now
13:40 mripard: lumag: didn't pinchartl have some comments about it?
13:40 mupuf: right now, if I queue 10 jobs for the same runner, 9 of them will be run before any Marge job can start
13:40 mupuf: that's not ok
13:42 lumag: mripard:
13:42 lumag: <pinchartl> lumag: I'm not a big fan of it as I still feel there's something we should do better. but as I don't have a better solution to propose, I have no objection
13:42 lumag: <pinchartl> so, please go ahead, you can merge the series :-)
13:43 daniels: mupuf: hm? we overcommit our gitlab-side runners wrt lava, so the jobs get queued at a higher priority inside LAVA, which uses strict prio ordering
13:44 daniels: mupuf: so if you fire 10 jobs at it, of which 9 are non-Marge, the Marge one will get picked first
13:44 mupuf: daniels: you said that the overcommit was 2, right?
13:44 daniels: I think we bumped that a fair bit higher
13:44 mripard: lumag: then yeah, go ahead
13:46 lumag: ack, thanks
13:47 mupuf: daniels: You sure about that? Looking at gitlab's admin runner page, 3 runners I checked only have one runner
13:47 daniels: we have one runner instance registered which has a wide concurrent
13:48 mupuf: I see, and I assume you went a little crazy, like 16?
13:48 daniels: yeah
13:48 mupuf: so, then the only issue is the gitlab timeout
13:49 daniels: if there are any that are missing then those should be fixed regardless, yeah
13:50 mupuf: ok :)
13:51 daniels: I'm on other stuff for the next couple/few days, but I can give you a dump of every job's timeout if that's useful
13:57 mupuf: daniels: no need, I just misattributed the issue here ;)
14:50 javierm: jfalempe: you mean to disable at runtime and not at build time?
14:51 jfalempe: javierm, Yes
14:52 jfalempe: javierm, in fact from my test, fbcon doesn't write over the drm panic screen, so I'm not sure it's really needed.
14:53 javierm: jfalempe: interesting. I don't know tbh
14:54 javierm: tzimmermann: robher: my opinion is that "of/platform: Disable sysfb if a simple-framebuffer node is found" can wait for v6.8, I wouldn't target the v6.7-rc cycle, especially since it's already -rc5
14:54 javierm: tzimmermann: robher: WDYT ?
14:55 jfalempe: javierm, I think the panic backtrace is written before the panic callback is called. And then even if I add some printk at the end of drm panic, they are not written to the framebuffer.
15:10 javierm: jfalempe: maybe you can make fbcon_unbind() an exported symbol and use it ?
15:22 jfalempe: javierm, fbcon_unbind() takes the console_lock() so I can't use it directly from a panic handler. But I can do something similar.
15:34 sima: jfalempe, javierm imo if we have the drm panic handler, we need to nuke the fbcon one preemptively
15:34 sima: fbcon_unbind from panic context is a definite no-go imo :-)
15:35 sima: or anything like that really
15:36 sima: something like an fb_info->flags NO_PANIC or so maybe that's set at driver registration
15:36 sima: eventually we need to push that into the vc layer too, so that it can be handled in the actual console functions
15:37 sima: but that needs ogness' panic rework to get much further first
15:37 sima: mripard, perfect bikeshed :-)
15:37 sima: mripard, want me to reply? generally I'm with vsyrjala and jani, _init functions shouldn't fail and at most should WARN_ON if it's abused
15:38 sima: unfortunately C doesn't let us encode requirements all that well into function signatures
15:38 sima: connector_init is kinda meh because the xarray can fail to allocate memory, but in practice it cannot, so drivers are kinda ok in outright ignoring that error value
15:39 sima: and so I don't think we should use the fact it's not void as an excuse to add more error cases, there should be none
15:39 sima: (GFP_KERNEL never fails for small allocs like 1-2 xarray branches, the kernel allocator will retry forever)
15:41 javierm: sima, jfalempe: I have to admit that I suggested fbcon_unbind() without thinking about the locking situation :)
15:42 sima: javierm, in general panic is not the place where you want to make any big scary calls, irrespective of locking
15:42 sima: so if something must not happen during panic, we need to arrange for that upfront
15:42 javierm: sima: got it
15:42 javierm: jfalempe: at first it can be a build time option though, depends on !VT
15:43 sima: like ideally on a system with drm panic we'd completely disable the vt console
15:43 sima: or at least the panic part
15:43 jfalempe: javierm, the current Kconfig is !FRAMEBUFFER_CONSOLE
15:43 sima: so pure thread based printing, but that needs ogness' work
15:43 jfalempe: but I want to remove that, so that it's easier to enable for distributions.
15:44 javierm: sima: but even without that, and even if it is very slow, it would be better than nothing, right?
15:44 javierm: jfalempe: I understand the goal but from the discussion it seems that won't be trivial
15:44 jfalempe: sima, I'm not able to make fbcon write over the panic screen, so I think even doing nothing works.
15:45 sima: javierm, oh it could be a fallback still
15:45 sima: so in static struct console vt_console_driver replace the ->write with ->write_threaded (or whatever it's going to be called)
15:45 sima: on a system with drm panic handler
15:45 sima: otherwise we need the legacy ->write panic horror show
15:46 sima: jfalempe, we've mostly disabled a lot of it already because too dangerous for drm drivers
15:46 sima: jfalempe, if you have a shadow buffer it won't do anything, since that won't run
15:46 sima: or damage handling in general
15:46 jfalempe: sima, ah ok, that can explain why it works.
15:50 jfalempe: javierm, sima, maybe we can merge drm_panic with !FRAMEBUFFER_CONSOLE as a first step, and then patch vt_console_driver, and remove this condition with confidence ?
15:50 javierm: jfalempe: that's what I was suggesting, yes
15:50 sima: yeah that sounds ok to get this going
15:51 javierm: jfalempe: because you mentioned distros enabling for testing but they couldn't really do it without having a user-space KMS based VT, a wayland compositor in the initramfs, etc
15:51 sima: maybe put the reason why into the Kconfig text
15:51 javierm: there's still a lot of missing pieces in the distros to make this feature useful IMO
15:51 sima: javierm, well also quite a few loose ends on the kernel side, but need to get going somehow somewhere
15:52 jfalempe: javierm, yes I think the drm_panic can still be useful without the KMS based VT.
15:52 javierm: sima: agreed. Hence my suggestion to make it depend on !VT (although !FRAMEBUFFER_CONSOLE probably makes more sense)
15:53 jfalempe: I didn't get a review on drm_panic v5, I will send a v6 with just a rebase and warning fixes, and see if it gets more eyes.
15:54 javierm: jfalempe: I think what I meant is that it wouldn't add too much value if one already has a fbcon/VT
15:54 sima: javierm, otoh we could just limit to finishing the CONFIG_VT=n case and ignore all the remaining warts :-)
15:55 javierm: sima: exactly :)
15:55 sima: although I guess ogness would still want to make the entire vt layer use a vt_console_lock to untangle this mess from the core side
15:56 jfalempe: javierm, I still think drm_panic looks more professional, and with a qrcode, it can also help distributions track panics.
15:58 javierm: jfalempe: that's true too
16:19 agd5f: dolphin, the radeon driver at one point had an option to use shaders for kernel paging IIRC, but I think that all got migrated to CP DMA and SDMA.
20:46 karolherbst: impressive marge backlog....
21:04 Company: marcan: your toot's "Gnome prefers zink over asahi" - is that a GTK or a gnome-shell thing?