IRC Logs of #dri-devel on irc.freenode.net for 2024-12-09

00:32 karenw: "ERROR: [../src/amd/vulkan/radv_physical_device.c:1969] Code 0 : Could not open device /dev/dri/renderD129: Invalid argument (VK_ERROR_INCOMPATIBLE_DRIVER)" Oh boy I love updating my drivers only to lose vulkan support. *sigh*
00:35 karenw: This laptop is ultra cursed anyway (integrated gpu using radeon, high-perf gpu using amdgpu) so I'm not too surprised. Time to debug and find out why.
00:46 KarenTheDorf: Welp, mesa mainline doesn't even boot correctly, most things segfault at launch, including plasma. No more hardware acceleration for me for a while I guess.
01:05 DavidHeidelberg: "lawyers" hanging around, please review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32447 it's not rocket science, but it should clarify licensing to the distributions and people using the Mesa ! (corporate important stuff! :D )
01:06 DavidHeidelberg: it follows the Linux kernel conventions, which seem to be pretty clean and nice.
01:12 soreau: karenw: do either of the gpu's have modifier support?
01:12 karenw: soreau: How would I check? (I am currently downgraded to stock ubuntu)
01:12 karenw: They are ~10 years old though
01:13 soreau: so probably not. you might be being bit by https://gitlab.freedesktop.org/mesa/mesa/-/issues/12253
01:15 karenw: Sounds like it. I don't know what the "Incompatible driver" issue was, unless I mistakenly had changed mesa version and not rebooted.
01:15 karenw: But I assume that's the bug that causes my entire DE to crash when building from mainline mesa
01:16 soreau: well there's a fix toward the bottom with an MR
01:16 dviola: I was bitten by that too (virtio-gpu/virgl on QEMU)
01:16 soreau: might be worth a shot to try that branch
01:20 karenw: Will give it a shot
01:41 karenw: What are modifiers and why are they apparently such a pain recently?
01:45 clever: [Sun Dec 8 15:19:30 2024] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=105323402, emitted seq=105323404
01:45 clever: https://gist.github.com/cleverca22/24d64c92865e6b38e13b7a4c85a3425f
01:46 clever: ive got a recurring problem on my main desktop, where a gpu action has a timeout, so the driver hard resets the entire gpu
01:46 clever: [Sun Dec 8 15:19:31 2024] [drm] VRAM is lost due to GPU reset!
01:46 clever: and during the reset, all state is lost, and Xorg doesnt recover, requiring a kill -9 of X to get the system usable again
01:47 clever: 1: can X be improved to recover without a restart? 2: can the gpu failures be diagnosed further and resolved?
01:48 clever: https://i.imgur.com/pN8wZuV.png the GPU also randomly does this for a single render job
01:48 clever: for things like xterm that dont repaint often, it will persist until the next repaint
01:49 karenw: Hmm, I wonder if that's what I get. I don't lose kwin, but occasionally random applications will get a "This context is innocent" message and die.
01:49 clever: for video/games, its repainting at 60hz, so its just a flash
01:50 clever: i already went into the kernel source before, and raised the timeout massively, it didnt help, it just made it take longer for the kernel to complain
01:51 clever: once the gpu is reset and X hangs, chvt also hangs, so you cant get back to text mode until X is killed
01:59 vignesh: lumag: Yes, there is a plan to uprev IGT (See https://lore.kernel.org/dri-devel/20241128042025.611659-1-vignesh.raman@collabora.com/T/#ma297524f33b43e24b42163c9976f7de86bd17d59)
02:05 clever: karenw: and poof goes the gpu once more, got an X bt this time
02:10 clever: karenw: https://gist.github.com/cleverca22/b0005adb348f31244ea0513c482d015f the X bt
02:10 clever: it looks like a gpu function gave an error, so X ran abort()
02:11 clever: and abort() then leads to gpu de-init, which hangs
02:11 clever: possibly because its abort()'ing with a lock held?
02:54 clever: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
02:54 clever: karenw: and with further digging, i found that X logs to stderr, and digging revealed a second log file (lightdm) that has the stderr
08:23 austriancoder: daniels: I was told that you might be working on arm32 mesa container thing that is needed to use ci-tron for arm32 devices. Is there any eta?
09:51 Kayden: is there a reasonable way to get a disassembly of .dxbc files?
09:52 Kayden: trying to see if an app is doing silly things in its shaders or if vkd3d-proton is translating things in ways I'd prefer it didn't
09:52 Kayden: managed to get those with VKD3D_DUMP_SHADER_PATH
10:10 glehmann: Kayden: I think https://github.com/doitsujin/dxvk-tests/blob/main/shader/dxbc_disasm.cpp would work, not sure if there is an alternative that doesn't require wine
10:27 Kayden: thanks!
10:47 zamundaaa[m]: <clever> "1: can X be improved to recover..." <- Not really to the first, difficult to say for the latter.
10:48 zamundaaa[m]: You could use kwin_wayland instead of Xorg though, which does survive GPU resets. Xwayland still crashes, but it's less bad
13:53 jani: airlied: sima: hey, we'll need a backmerge of -rc2 to drm-next because of cdd30ebb1b9f ("module: Convert symbol namespace to string literal")
13:54 jani: drm-tip doesn't build because we got that via drm-fixes, but e.g. xe tree has old style namespace naming
13:56 jani: there's https://lore.kernel.org/r/20241209121717.2abe8026@canb.auug.org.au and https://lore.kernel.org/r/Z1BawrcFMsj0ByLk@sirena.org.uk maybe more
13:56 jani: demarchi: thellstrom: rodrigovivi: fyi ^
14:00 rodrigovivi: jani: ack... then I follow with a backmerge of drm-next into our -next ones right after...
14:00 jani: rodrigovivi: right. up to you whether you want to fix the namespace in the merge commit to ensure it all builds
14:02 jani: rodrigovivi: http://paste.debian.net/1339019/
14:03 Konstant: Hello! Newbie question about RADV. Am I understanding correctly that the BVH format is hardwired into the GPU? I'm researching space partitioning algorithms for fun and it would be interesting to try out different tricks if the format is not hardwired. I found the BVH encode compute kernel in the repository, but no traversal, so it looks like it is indeed hardwired. Of course, I could just try different things with compute shaders and the classic pipeline, but it would be interesting to try to inject a custom BVH format into raytracing acceleration structures if that is possible. (sorry if it is a duplicate, it's my first time using IRC and I'm receiving "unable to send" errors)
14:03 rodrigovivi: jani: thank you!
14:08 sima: rodrigovivi, so should I roll drm-next to -rc2 so you can do the backmerge and sort things out for xe?
14:08 rodrigovivi: sima: yes, please
14:17 sima: rodrigovivi, done
14:23 glehmann: Konstant: if you want to use the ray tracing hw acceleration (mainly image_bvh64_intersect_ray), the bvh format can't be changed
14:23 Konstant: glehmann: Thanks for the answer.
14:52 sima: tzimmermann, mlankhorst I guess drm-misc-next also needs a backmerge and fixup for the module namespace stuff, see the mail from sfr
14:52 sima: https://lore.kernel.org/dri-devel/20241209121717.2abe8026@canb.auug.org.au/ this one
14:52 sima: airlied, ^^
14:53 tzimmermann: sima, i've seen another one of those in xe_vsec.c
14:54 tzimmermann: easy to fix
14:54 sima: tzimmermann, yeah, rodrigovivi is already handling that one with a backmerge to xe
14:54 sima: or is that one also in drm-misc?
14:55 tzimmermann: sima, i'm not on duty, but I'll check for the backmerge tomorrow
14:55 tzimmermann: sima, i've seen it in -tip
14:55 rodrigovivi: I'm on it
14:56 sima: tzimmermann, yeah tip is a bit busted rn until this is all sorted
14:58 sima: imre, for the hotplugged connector discussion, I think unconditionally adding to the connector list in drm_connector_register, and then still using that same list as we do now for delayed registration if the overall drm_device isn't registered yet, should work?
14:58 sima: at least I couldn't come up with a corner case where it fails
15:18 mlankhorst: sima: Yeah I just noticed the buildfails
15:19 mlankhorst: was also hit by another failure in relocate_kernel
15:22 sima: mlankhorst, you'll do the backmerge or is it on mripard (who seems not around)?
15:24 mlankhorst: I'll take a look
15:27 mlankhorst: hit the compile failure too
15:29 mlankhorst: Do I include the build failure fix in the backmerge or as separate commit?
15:31 sima: mlankhorst, depends if you want to give sfr credit, imo either is fine as long as you push the fix together with the merge to make sure no one sneaks in
15:31 sima: or just credit sfr with a Link: to the mail
15:32 sima: if you do it in the merge as a fixup
15:32 sima: mlankhorst, see also further up for the sha1 that broke things, courtesy of jani
15:32 sima: that needs to be in the merge commit as the reason
15:36 mlankhorst: Yeah I'll finish compile testing first, then push. :)
15:37 mlankhorst: I had noticed the same issue on Xe with xe_vsec, and relocate_kernel for an unrelated failure that's config-dependent, one of those releases..
15:37 mlankhorst: rodrigovivi: Can you also test it works without VSEC symbol?
15:39 rodrigovivi: I have the merge commit ready here, just the build is taking longer than I'd like... pushing soon
15:39 rodrigovivi: just in time... build passed... pushing it right now
15:40 mlankhorst: Pushing 1 merges and 0 non-merge commits. Merges should only be pushed by maintainers. Are you sure? (y/N) y
15:40 mlankhorst: one wins!
15:42 sima: mlankhorst, you typoed sfr's name :-/
15:42 mlankhorst: oops
15:43 mlankhorst: I have had his name wrong in my mind for years then!
15:43 sima: mlankhorst, tzimmermann -tip should work again I hope
15:44 tzimmermann: thanks
15:44 mlankhorst: https://lore.kernel.org/all/20241208235332.479460-1-dlemoal@kernel.org/T/ you wish :-)
15:47 sima: yeah but that seems hard to hit and not a drm one
15:47 mlankhorst: True, I was hitting that one though
16:12 imre: sima, I think detecting/adding MST connectors should work already before registering the device to be visible by userspace.
16:13 imre: there seems to be a check in drm_connector_register() to handle this
16:14 imre: the !connector->dev->registered and registration_state != DRM_CONNECTOR_INITIALIZING
16:24 sima: imre, yeah that's what I mean, leave that check and all the logic around it unchanged
16:24 sima: and for dynamic/mst connectors move the list_add from connector_init to the top of connector_register, without any additional conditions
16:24 imre: ah
16:25 sima: the delayed registration we still need, because that can only be done when we have the sysfs/debugfs files or it will blow up
16:25 imre: wouldn't the logical place for list_add be after connector->registration_state = DRM_CONNECTOR_REGISTERED; in drm_connector_register() ?
16:25 sima: but internally within the driver the connector list is all the registration we have, so that should always happen
16:25 imre: I thought that's what you meant
16:26 imre: at least wrt. GETRESOURCES and MST ..
16:26 sima: nah, before, even outside of the mutex I think
16:27 sima: there's about 3 cases
16:27 sima: 1. non-dynamic connectors: nothing changes
16:28 sima: 2. dynamic connector, registered after drm_dev_register: no delayed registration, drm_connector_register called from connector/mst code does it all
16:29 sima: 3. dynamic connectors at driver load before drm_dev_register: this is tricky, but drm_connector_register called from connector/mst code needs to add the connector to the connector_list, but then the delayed register call from drm_dev_register will do everything else, since we still need to delay that part due to debugfs/sysfs nesting
16:29 sima: imre, that also means that drm_connector_register_all needs to call a version of _register which doesn't do the list-add, or things go boom
16:30 imre: so list_add on top of drm_connector_register() should happen only if the connector is not already on the list, right?
16:30 sima: yeah
16:30 imre: ok
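A minimal sketch of the three cases above, modelled in Python rather than kernel C so it runs standalone. Every name here (Device, Connector, static_connector_init, dynamic_connector_register, dev_register, userspace_files) is hypothetical and only loosely mirrors drm_connector_register(), the connector list and the delayed registration run from drm_dev_register(). It assumes, as proposed, that dynamic (MST) connectors do the list_add once at the top of registration and that the delayed path uses a variant without the list_add.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class RegState(Enum):
    INITIALIZING = auto()
    REGISTERED = auto()


@dataclass(eq=False)
class Device:
    registered: bool = False  # stands in for drm_device->registered
    connector_list: list = field(default_factory=list)


@dataclass(eq=False)  # identity comparison, like list membership in the kernel
class Connector:
    dev: Device
    name: str
    state: RegState = RegState.INITIALIZING
    userspace_files: bool = False  # sysfs/debugfs + ->late_register() done


def _register_internal(conn: Connector) -> None:
    """Registration work without list_add; also used by the delayed path."""
    if not conn.dev.registered:
        return  # deferred until the device is registered (sysfs/debugfs nesting)
    if conn.state is not RegState.INITIALIZING:
        return  # already registered, nothing to do
    conn.userspace_files = True
    conn.state = RegState.REGISTERED


def static_connector_init(conn: Connector) -> None:
    """Case 1, non-dynamic connectors: list_add at init time, unchanged."""
    conn.dev.connector_list.append(conn)


def dynamic_connector_register(conn: Connector) -> None:
    """Cases 2 and 3, dynamic (MST) connectors: list_add first, only if the
    connector is not already on the list, then the usual registration."""
    if conn not in conn.dev.connector_list:
        conn.dev.connector_list.append(conn)
    _register_internal(conn)


def dev_register(dev: Device) -> None:
    """Stand-in for drm_dev_register(): mark the device registered, then run
    the delayed registration for every connector already on the list."""
    dev.registered = True
    for conn in list(dev.connector_list):
        _register_internal(conn)
```

In this model an MST connector registered before dev_register() lands on the list but stays INITIALIZING until dev_register() runs (case 3), one registered afterwards is fully registered immediately (case 2), and calling dynamic_connector_register() twice does not add it twice.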
16:31 sima: because there's also drivers that have non dynamic connectors but still call drm_connector_register, because we haven't finished the todo I linked
16:31 sima: that's why I suggested to key this off of the connector type being mst
16:31 sima: so that it's consistent
16:31 sima: otherwise if we both list_add in connector_init (non-dynamic version) and connector_register we have another boom
16:32 sima: or you do dynamic versions of both _init_ and _register
16:32 sima: or you do the handful of patches needed to nuke all the few remaining surplus drm_connector_register calls
16:33 sima: git grep says 9 drivers left with those
16:33 sima: if I'm correct that mst is really the only thing we hotplug (since we don't hotplug bridges yet)
16:37 imre: okay
16:38 sima: imre, hm a sanity check we should probably add is WARN_ON(dev->registered) in the non-dynamic connector_init
16:38 imre: this is what I also suggested in the email thread, except for an internal drm_connector_register() for the deferred registration
16:38 sima: because that would be a clear bug
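The suggested WARN_ON(dev->registered) sanity check, sketched as a hypothetical Python stand-in where warnings.warn plays the role of WARN_ON in a non-dynamic connector init. Device and static_connector_init are illustrative names, not DRM API.

```python
import warnings


class Device:
    """Hypothetical stand-in for a drm_device."""

    def __init__(self) -> None:
        self.registered = False
        self.connector_list = []


def static_connector_init(dev: Device, conn: str) -> None:
    """Non-dynamic connector init. Such connectors must exist before the
    device is registered, so initializing one afterwards is a driver bug:
    warn the way WARN_ON(dev->registered) would, but keep going."""
    if dev.registered:
        warnings.warn("non-dynamic connector initialized after device registration")
    dev.connector_list.append(conn)
```

Like WARN_ON, this only flags the bug and does not refuse the init.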
16:38 imre: so I think this could be done now
16:38 imre: but one more thing
16:39 sima: hm yeah maybe I misread then a bit what you were aiming for, it looked like you missed the GETRESOURCES case for case 3 above
16:39 imre: I guess you agree that users should see the connector only after it's fully inited
16:39 imre: that is sysfs/debugfs is already added for it
16:39 imre: and late_register is called for it
16:39 imre: ?
16:39 imre: atm GETRESOURCES would not guarantee that
16:39 sima: hm
16:39 imre: as you pointed out, yes
16:40 imre: but it's an existing issue
16:40 sima: yeah I guess that's a really good reason to do the list_add at the end of connector_register
16:40 imre: but we can't atm
16:40 sima: but still needs to happen unconditionally because of case 2
16:40 sima: hm why?
16:40 imre: the best would be a 3rd connector list
16:40 imre: just for registration
16:41 imre: if deferred registration happens via the current connector list
16:41 sima: oh case 2 where the mst gets added before drm_dev_register ...
16:41 imre: then list_add needs to happen on top of drm_connector_register()
16:41 imre: yes
16:41 sima: well, that is a much bigger issue
16:42 sima: because it also impacts the non-dynamic connectors, and we cannot hotplug those a notch later because that could confuse userspace
16:42 sima: like ideally the entire mess would get added atomically, but that's not how this works unfortunately
16:42 imre: how are non-dynamic connectors impacted?
16:42 sima: imre, so for this I'd just add a patch which adds a comment to drm_dev_register explaining how we're kinda screwed
16:43 sima: they have the same race right under drm_dev_register
16:43 imre: right
16:44 imre: so for instance GETRESOURCES sees a non-dynamic connector before its sysfs/debugfs is added
16:44 sima: yeah
16:44 sima: we just hope and pray the system doesn't boot fast enough :-/
16:45 imre: what about checking registration state in GETRESOURCES
16:46 sima: hm right it's not just debugfs/sysfs, it's also GETRESOURCES and GETCONNECTOR because we have the drm_mode_object_register in there
16:46 imre: GETCONNECTOR checks mode_config.object_idr
16:46 imre: so that's ok?
16:47 sima: imre, wrong way to close the race, we want to make sure GETCONNECTOR immediately works right when you can open the drmfd
16:47 sima: not delay the visibility in GETRESOURCES, that only works for case 3, not 1
16:48 sima: so I guess for drm_dev_register we'd need to split drm_connector_register_all into 2 phases
16:48 sima: phase run before drm_minor_register: adds to list and registers mode object
16:48 sima: phase run afterwards: calls ->late_register and the sysfs/debugfs stuff
16:48 sima: I frankly would just record this as a FIXME somewhere and focus on the more minimal fix for case 3
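The two-phase split of drm_connector_register_all() inside drm_dev_register() can be sketched as an ordering model. This is a hypothetical Python stand-in, not DRM code: phase1, phase2 and dev_register_two_phase are invented names, and drm_minor_register is only recorded as a string marking the point where the device node becomes openable. It assumes, per the discussion, that ->late_register() and the sysfs/debugfs entries need the registered minor to nest under.

```python
def phase1(name: str, log: list) -> None:
    # Internal registration: connector list + mode object, so GETRESOURCES
    # and GETCONNECTOR already see the connector once the drm fd can be opened.
    log.append(f"{name}: list_add + mode object register")


def phase2(name: str, log: list) -> None:
    # Userspace-facing registration that nests under the registered minor.
    log.append(f"{name}: ->late_register + sysfs/debugfs")


def dev_register_two_phase(connectors: list) -> list:
    """Hypothetical ordering for drm_dev_register() with connector
    registration split around the minor registration."""
    log = []
    for name in connectors:
        phase1(name, log)
    log.append("drm_minor_register")  # device node becomes openable here
    for name in connectors:
        phase2(name, log)
    return log
```

With connectors ["eDP-1", "DP-1"], every list_add/mode object entry precedes drm_minor_register and every ->late_register entry follows it, which is the ordering proposed above.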
16:49 imre: ok
16:49 sima: since this issue is about as old as the delayed connector registration
16:50 sima: which is like over 8 years
16:50 sima: and I haven't heard anyone yell thus far
23:24 pinchartl: tomba: your V4M series seems ready, feel free to push through drm-misc