IRC Logs of #dri-devel on irc.freenode.net for 2023-08-03

00:05 mareko: zmike: clip/cull indexing
00:05 zmike: ah
00:05 zmike: yes, well, there's tricks to work around it
00:05 mareko: we have lots of piglit failures for 64-bit IO already
00:05 zmike: just seems inconsistent that they're needed
00:05 zmike: 64bit io is cancer
00:07 HdkR: How soon until smooth interpolation is allowed for 64-bit IO
00:07 mareko: never
00:08 mareko: also it's possible today
00:08 HdkR: Just do the interpolation manually using the exposed barycentrics? :)
00:09 mareko: there is more to it, you also need explicit loads with a vertex index in FS
00:10 mareko: and disable the vector subtract for P1 and P2 inputs
00:10 HdkR: So you're saying we need a new extension to let the driver do it all for you right? :)
00:11 mareko: no, Vulkan can do it already, hakzsam implemented custom interpolation not so long ago
00:13 zmike: mareko: what's left with your linker thingy? do you plan to merge it in this release cycle?
00:14 mareko: I could if I don't implement the remaining stuff
00:14 zmike: ah
12:11 dliviu: hello, drm-misc-next maintainership question: I have a patch that did not Cc dri-devel, only sima and airlied directly. Noticed when trying to apply with dim. What is the protocol? Should I ask to resend?
12:27 alyssa: jenatali: windows runners were down yesterday (IDK if they still are) so couldn't test windows with nir 2.0 MR, you might want to give that a smoke test
12:28 jenatali: alyssa: Pretty sure I clicked play on the Windows jobs on at least one pipeline
12:28 alyssa: :+1:
12:51 jenatali: Ugh, the D3D jobs depend on clang-format now?
12:51 jenatali: That's a huge pain
12:52 jenatali: I need to undo that, there's no good reason to trigger a Linux container just to run Windows jobs and it means I can't run just the jobs I want with one click from the UI anymore
12:52 daniels: jenatali: fwiw, .gitlab-ci/bin/ci_run_n_monitor.py --target dozen-deqp is what you want
12:53 jenatali: Yeah but as I've said, Windows doesn't have a way of storing tokens for that script which makes it a huge hassle every time I want to use it
12:53 jenatali: Which I guess I could try to fix that instead, but also I don't think there's value in depending on clang-format for the Windows jobs
12:54 alyssa:regrets putting clang-format in CI
12:54 jenatali: Unless there was a separate clang-format job that ran on the Windows runner, which I also don't think is valuable
13:36 javierm: sima: sorry, I missed your message before because I was on PTO, but I see that emersion already answered
13:41 hwentlan_: sima, emersion, yes, I'm aware
13:51 MrCooper: emersion sima hwentlan_: there's https://gitlab.freedesktop.org/drm/amd/-/issues/1740 but it was timeout-closed by Mario
13:51 emersion: ah yes that one
13:51 hwentlan_: yeah, that ticket is still valid
13:52 emersion: i really don't like that auto-close policy
13:52 emersion: i spent a lot of time collecting info about bugs and then it all goes to the void
13:52 hwentlan_: but the reason most things takes long is because they go to our DML (display mode lib) for bandwidth computations which is massive and therefore slow
13:53 hwentlan_: it's not an easy problem to solve
13:53 emersion: yeah…
13:53 sima: yeah 2ms for 5 test_only atomic calls is a bit much ...
13:54 sima: well it just seems to be one really that's bad
13:54 sima: *really bad
13:55 emersion: yeah, really depends what you test for
13:56 hwentlan_: is this mostly about enabling planes?
13:57 hwentlan_: one could probably pre-compute certain common scenarios and cache them, but it wouldn't be pretty and would be sub-optimal since you can't pre-compute everything
14:00 sima: hwentlan_, yeah a bit of grepping in the attached dmesg and it's only plane changes nothing else
14:01 sima: also only one crtc
14:02 sima: so should be a substantial subset of the overall atomic state and I have no idea how you managed to burn down over a ms on computing stuff with that
14:02 MrCooper: https://gitlab.freedesktop.org/drm/amd/-/issues/2186 seems like higher priority though; can make moving the mouse cursor very painful on my gaming rig
14:02 sima: but iirc amdgpu dc is pretty aggressive in escalating to "grab all states, recompute everything"
14:04 sima: the 0.1ms is more in line with "atomic ioctl is just not very fast" stuff I think
14:06 sima: for that I'd expect that profiling is needed to make sure you knock out the right inefficiencies, since they're absolutely everywhere
14:07 sima: maybe with vkms with some planes and just pushing through test_only commits as fast as possible
14:10 robclark: mort_: hmm, well apq8016 has legacy cursor.. and is using disp/mdp instead of disp/dpu.. could be something related to one of those two things? Maybe try sw cursor to rule out something cursor related?
14:15 hwentlan_: MrCooper, I agree
14:29 mort_: robclark: I don't know if I'm configuring X correctly .. but I tried to enable swcursor, and it doesn't seem to have stopped the leak
14:29 mort_: I created a /usr/share/X11/xorg.conf.d/20-swcursor.conf with: Section "Device" \n\t Identifier "Card0" \n\t Option "SWCursor" "true" \n EndSection, which should turn on software cursor from what I can tell
14:32 robclark: if you enable some drm.debug traces, you should see mouse cursor (while nothing else is changing on screen) generate drm traces if hw cursor is used, but not otherwise.. or otherwise maybe check /proc/interrupts (hw cursor updates will enable vblank irq)
14:32 MrCooper: I'd check the Xorg log file for whether the option is actually taking effect
14:32 robclark: ahh, yeah, that might be easier
14:37 mort_: the log does contain '(**) modeset(0): Option "SWcursor" "true"' so yeah, that's taking effect
14:42 robclark: mort_: ok.. then in theory you would see this with kmscube? That might be an easier/simpler thing to debug.. plus it can use either legacy pageflip like xf86-video-modesetting and atomic ioctl
14:45 mort_: robclark: kmscube says "failed to set mode: permission denied", or, if I run it with -A, it says "failed to commit: permission denied"
14:46 mort_: that's running without either a window manager or a compositor, if that matters
14:46 robclark: kill xorg first, if you haven't
14:46 mort_: ohh I assumed this was an X application
14:46 robclark: nope
14:46 javierm: mort_: no it's just a kms app. https://gitlab.freedesktop.org/daniels/kms-quads is another nice KMS test app
14:47 mort_: it's leaking both by default and with -A
14:47 robclark: hmm, ok
14:50 mort_: as a hack, could I increment the crtc refcount in the alloc and then just plain kfree it in atomic_state_default_release, or is it meant to still be used after the release
14:53 robclark: it is needed after the release if someone is waiting on one of the completions
14:59 robclark: looking at the kmemleak hexdumps.. it is leaking a single reference (the kref starts at the 9th byte)
14:59 mort_: I have also found with my prints that the leaked commits end up with a refcount of 1
15:00 mort_: robclark: I am happy to help debug this if you want, and help debugging it would be appreciated. However, until and unless we figure it out, I will work around it by having a pool of allocated commit objects which I cycle through and re-use, so I don't think I will work on this myself outside of getting information for you
15:01 sima: ... catching up: are we leaking struct drm_atomic_state?
15:02 mort_: yeah
15:02 robclark: sima: https://p.mort.coffee/REk fwiw
15:02 mort_: with the msm drm driver, on my hardware (but not robclark's), on recent kernels (both 6.1.34 and 6.4.7 are tested), a drm_crtc_commit structure is leaked from drm_atomic_helper_setup_commit every flip
15:03 sima: hm yeah that's leaking like a sieve :-/
15:03 robclark: I'm not seeing anything else leaked, so it isn't like we are leaking crtc state or something
15:03 mort_: yeah
15:03 sima: well anything in there is still referenced I assume
15:04 robclark: could be mdp5 vs dpu.. but driver itself doesn't touch the commit obj
15:04 sima: it's a bit of work, but might be worth it to wire up ref_tracker.h support
15:05 sima: ofc the thing is gloriously undocumented :-(
15:06 mort_: sima: see https://p.mort.coffee/DVX, I manually traced inc/dec :p
15:08 dliviu: asking again: is it OK to submit a patch in drm-misc-next that doesn't have a link to patchwork because dri-devel was never Cc-ed?
15:08 sima: oh it's struct drm_crtc_commit
15:13 mripard: we had a similar issue for vc4 at some point, but I can't retell how we fixed it up https://github.com/raspberrypi/linux/issues/4474
15:14 mripard: hopefully it will help :)
15:15 mripard: looking at the offending patch, I think we were doing a get in setup_commit and one in destroy_state
15:15 mripard: but we were also doing a get in duplicate_state
15:15 mripard: so the refcounting was off
15:30 sima: mort_, robclark I think I got it
15:32 mort_: sima: ooo, interesting, do tell
15:32 sima: https://paste.debian.net/hidden/8a28a3f8/
15:32 sima: broken since years it seems
15:33 sima: 2017 is when we added refcount for the plane_state->commit pointer
15:33 sima: mort_, testing would be much appreciated, if it checks out I'll bake it into a proper patch
15:34 sima: took me a while to load all the stuff into my brain again, since for a simple flip we have like 5 references floating around ...
15:36 mort_: fwiw, this patch also "fixes" (or, well, works around) it: https://p.mort.coffee/P4n.diff, I'm running that right now and it's not leaking yet graphics is working
15:36 mort_: but I will check out that patch, it's a much more proper fix
15:38 sima: mort_, yeah that's just very dangerous duct-tape
15:38 mort_: indeed
15:39 sima: I'm pretty sure I've found it, because reconstructing the refcount leak with your log pointed me at exactly the code that was buggy
15:39 sima: mdp5_plane_destroy_state wasn't dropping the refcount for plane_state->commit like it should
15:39 sima: and that's been broken since 21a01abbe32a roughly
15:40 mort_: the fix looks very logical
15:49 robclark: sima: oh, good catch, looks like mdp5 missed conversion as more stuff got added to plane state
15:53 daniels: emersion: so apparently I don't know how to work git-send-email anymore, but thanks for the necromancy https://lists.freedesktop.org/archives/dri-devel/2023-August/417333.html (cc sima, anyone else interested in dmabuf/uapi)
15:53 emersion: ohhhhhhhhh
15:53 emersion: <3
15:53 emersion: i've linked that doc patch *so many times*
15:53 daniels: I'd like to say two years is a new low, but it's probably not :\
15:54 sima: emersion, you'll do the honors of applying it?
15:55 emersion: i'll read it up and, yeah, apply it
15:55 javierm: daniels: the glossary section is very useful!
15:55 emersion: ah the missing cc's
15:56 daniels: javierm: you can thank pq for that! I just copy & pasted and added like two things
15:56 javierm: daniels: cool, thanks pq :)
15:57 daniels: emersion: yeah, turns out if you use git-format-patch so you can write revision notes, then use git-send-email to send those, the Cc in your cover letter only applies to the 0/n, not to the individual patches
15:57 daniels: it also turns out that it won't cc everyone on the thread if you didn't bother telling it to
15:57 emersion: git send-email --annotate
15:58 emersion: in DRM also we put the CC list in the commit message
15:58 javierm: daniels: I've moved to patman almost a decade ago and never looked back
15:58 emersion: git send-email will add everybody automatically
15:59 emersion: but yeah, should really write a git-send-email alternative which doesn't suck
15:59 sima: apparently it's called b4 or something
16:00 javierm: emersion, sima: did you ever try patman? https://u-boot.readthedocs.io/en/latest/develop/patman.html
16:00 _jannau__: another one? patman and b4 are already muh better
16:00 emersion: b4 seemed really tied to the kernel workflow last time i looked
16:01 sima: emersion, I think the only thing we might want to add in a follow-up is that for non-linear format everyone needs to use the stride computation like drm has for that format/modifier combo, or things break
16:01 sima: that's one thing that has lead to really long threads in the past
16:01 emersion: i just want a stateful tool which remembers version/cc/etc depending on the branch i'm in
16:01 emersion: ah, yeah, good point
16:03 daniels: javierm: I've moved to GitLab
16:04 daniels: emersion: from what I've seen of patman, it's the closest thing indeed to that
16:04 daniels: emersion: TIL --annotate!
16:04 _jannau__: b4 depends on email based workflow since it keeps the metadata in an empty commit
16:04 sima: s/impliied/implied/
16:04 sima: emersion, ^^
16:05 emersion: i'll need to check these tools out again
16:05 javierm: daniels: I'm part of the patman evangelism strike force :P
16:05 javierm: having all the patches metadata in the commit messages and just run patman is amazing. And you even have a --dry-run option
16:06 daniels: yeah, patman is definitely the right thing for email workflows
16:07 sima: emersion, a-b: me on both, just finished reading
16:07 emersion: cool
16:08 sima: Care and attention should be taken to ensure that
16:08 sima: + zero as a default uninitialized value signals no modifier.
16:08 sima: ^^ maybe add ", and must not be accidentally mixed up with DRM_FORMAT_MOD_LINEAR, which equals zero"
16:09 sima: but fine either way
16:09 emersion: hm, zero does not mean "no modifier" then?
16:10 emersion: also zero is not "uninitialized"
16:13 sima: I guess it depends, some interfaces guarantee that no modifier is signalled with MOD_INVALID
16:14 sima: some have an out-of-band flag (like addfb2/getfb2)
16:14 emersion: yeah
16:14 sima: I thought the note is just to make sure that you don't accidentally mix things up since it's confusing
16:17 ayaka: I watched XDC 2022 | Explicit Synchronization for Linux Display Servers
16:17 sima: maybe we should clarify this in a follow up
16:19 daniels: that's ... not what I meant to write
16:19 ayaka: I think the sync point is only used for notified the event like a frame is rendered(by GPU), it can't be used to notify partly render is done?
16:19 daniels: missing some words I think
16:20 daniels: something like 'Care and attention should be taken to ensure that zero (as a default uninitialized value or boolean comparison) is **not** confused with no modifier.'
16:20 daniels: total sense inversion ffs
16:20 ayaka: besides why we use IN_FENCE_FD? should we send the framebuffer to an atomic request only after the userspace have been notified?
16:20 emersion: i'd use some other word than "uninitialized" i think
16:20 daniels: ('don't be confused! do the opposite of what you should!')
16:20 daniels: emersion: just 'default' or?
16:21 emersion: "uninitialized" in C means that the contents can be anything
16:21 daniels: ayaka: in gfx, fences are only ever used to signal full completion of a frame. there would be no point in giving a Wayland server a fence which signaled when half the frame was complete
16:22 ayaka: daniels, then why amd is trying to implement async page flip
16:23 daniels: ayaka: they're totally different things
16:23 emersion: daniels, do you mean this? https://en.cppreference.com/w/c/language/initialization#Empty_initialization
16:23 ayaka: async page flip is scan out the rest of part with a new frame
16:23 daniels: emersion: I guess I meant 'explicitly initialised but to all-zero as a default value that's totally safe as a "nothing here" sentinel for everything except FDs and modifiers'
16:24 daniels: emersion: but that seems overly wordy
16:24 daniels: ayaka: yes, which is different to beginning to scan out something before anything's been rendered to it
16:24 emersion: "zero as a default value when omitted"?
16:24 ayaka: then why I can't flush the top part with a new frame for example I want to display a large graphics while its bottom is not decode yet
16:25 emersion: or just "as a default value"?
16:25 emersion: but yeah, i'm nitpicking here :P
16:27 daniels: emersion: yeah, I think just 'as a default value' is probably easiest?
16:27 emersion: wfm
16:27 daniels: or 'default or initial'
16:27 daniels: wanna just fix up locally or should I resend?
16:27 emersion: yeah, initial sounds good to me
16:27 emersion: i can fix up locally
16:27 daniels: ayaka: you can if you want, it's just that very few people want that
16:28 daniels: emersion: thanks!
16:29 ayaka: daniels, here is the case, a platform is designed for 4K decoding and display. But for the 8K display, its performance may not be enough. So you can't wait a frame to be finished to display
16:30 daniels: sure, in that case don't bother fencing, just display whatever's around at the time, and then the user will see half and half
16:30 ayaka: you must start to scan it out for example its half is done
16:30 daniels: ok, so then use fences to fence when half is done, or a third is done, or whatever
16:31 daniels: that's something you'd have to put in your driver, but I don't imagine 'signal fence when you've done half the work' would get accepted as uAPI upstream, so it's just downstream hacks in which case you can do whatever you want to
16:33 ayaka: well, that is what opencl could do, I am thinking about allow create multiple fences in video4linux because I didn't find any video4linux driver uses the fence
16:34 ayaka: its stateless decoder could support decode a slice but it has been hold the buffer until the whole frame is done
16:38 ayaka: also I didn't get why the kms driver itself would consume IN_FENCE, not the userspace should wait until the notification then submit the commit
16:44 ayaka: if the fence is not come that commit would not be submit, if the next commit came, it is the previous commit would be discard?
16:47 daniels: no, you can't queue multiple submits
16:52 emersion: hm, sorry, no time to finish this up tonight
16:53 daniels: emersion: I'm pretty sure it can survive another day :P
16:54 ayaka: daniels, yes, I forget I should wait for the event or the out fence
16:55 ayaka: but why we could let the kernel wait for fence not in the userspace?
16:57 daniels: because it avoids spurious wakeups and unnecessary queues with relatively deep/complex chains of operations
17:04 ayaka: daniels, I think I need some document to understand how this fence (in and out) work with egl
17:05 ayaka: from the kmscube, it would create an in fence for each cube frame and an out for each scan out
17:05 daniels: in-fences are waited for before rendering begins; out-fences are signaled when rendering completes
17:06 daniels: if you search around, there are a few presentations on how explicit synchronisation works
17:07 ayaka: I could understand what in fence and out fence are used for from drm doc
17:08 ayaka: but I just wondering why we can't resue those fence, we just need the out fence to make gpu to unlock the front buffer
17:08 ayaka: while in fence for display to wait the gpu completed its front buffer
17:09 daniels: how would you reuse fences? fences refer to one specific point in time
17:09 daniels: the in-fence kmscube passes to KMS, refers to the completion of exactly one drawing command made by the GPU
17:09 ayaka: because creating fence is a cost
17:09 daniels: it doesn't change in time to be 'the completion of whatever the latest rendering is'
17:10 ayaka: I am thinking the long time pending kmssink in gstreamer
17:10 daniels: yes, so in its render callback it would have to accept a dma-fence with each frame
17:11 ayaka: we don't need a in fence here but could we re-use out fence here
17:11 ayaka: we just create two out fence here
17:15 daniels: no, because fences always refer to one specific point in time
17:16 sima: robclark, ah right, the vma tracking locking fun was in the context of rpm I think ...
17:17 sima: mort_, have a testing verdict on the patch already?
17:18 ayaka: daniels, then should we drop the only poll drm event then switch to out fence way. Creating something and destroying something frequently doesn't sound a good idea
17:19 robclark: vma lock was completely uninvolved with that since isn't used for reclaim.. only to give userspace an error if it tried to change the VA of a in-use vma... the rpm/qos hell is still there
17:19 robclark: since that is about the obj resv / reclaim vs rpm/qos locking
17:20 ayaka: s/only/old/
17:20 daniels: ayaka: this is something that happens once every 16ms and is a very lightweight operation. I have no idea why you think it's a bottleneck.
17:27 zmike: eric_engestrom: any idea what's going on with https://gitlab.freedesktop.org/zmike/mesa/-/jobs/46637627
17:27 zmike: I retried a couple times and it seems broken
17:31 eric_engestrom: zmike: I've seen a bunch of 500 & 503 on the gitlab registry this afternoon, I'm guessing this is why it's failing
17:31 eric_engestrom: I don't know any more than that though, ask the admins on #freedesktop
17:32 ayaka: in 60fps mode it is true
17:32 eric_engestrom: also, it might be related to the windows runners being overwhelmed, possibly someone doing something heavy
17:37 daniels: looking at that job log, it's trying to pull x86_64-test_base and failing
17:37 daniels: looking at the x86_64-test_base job log, the job claimed to succeed, but the container push failed https://gitlab.freedesktop.org/zmike/mesa/-/jobs/46634148
17:37 daniels: so, retry that, then retry the other one
17:37 zmike: k
17:42 ayaka: daniels, now I am thinking the software fence is not fast enough, I am thinking anyone offer a hardware fence. Besides, ping-pong and page flip could be not enough, we had better prepare a queue that display could scan them out in order in such high fps case
17:43 ayaka: cpu is not that real time for such task
17:44 daniels: I haven't seen a system with a combination of such a high frame rate and such a low-end CPU that it couldn't service the IRQs quickly enough to do pageflips
17:44 daniels: but if that's a problem you're seeing, that's something you'll be solving I guess
17:45 ayaka: the cpu is not that low end it is quad arm a55 cores
17:47 ayaka: but cpu has more work to do, also we don't use irq but message box now
17:49 daniels: A55s can do pageflips
17:50 ayaka: as I said, cpu is occupied by the audio
17:50 daniels: well, if you manage to get yourself into a situation where you can't schedule enough time for one ioctl every 16ms (or 8ms or whatever), then that sounds quite bad, but also not something upstream's going to design for
17:50 daniels: tbh it sounds like a case of trying to optimise problems which don't exist; you're much better off doing actual real-world measurements before guessing
17:51 ayaka: Currently, it didn't run drm. But we have to use a queue(msg box) here to implement that refresh rate
17:53 ayaka: decoder could work exactly that speed in 4k mode, while not much for us to wait. Also we have to count the cost for REE and TEE context switching
17:56 ayaka: the point is whether the upstream method should do such thing frequently. Or why would we invent poll() that yield the thread
18:58 mort_: sima: I just installed a kernel with the patch, and so far it looks good! kmalloc-256 has been at 1380K for a while now, nothing else looks suspicious
18:58 mort_: I'll run a few memory logging tools I have overnight to see if I encounter anything weird but it's definitely not leaking a commit per flip anymore
19:05 sima: mort_, can I have some email for reported/tested-by credits?
19:18 mort_: sima: the best one to use is probably dorum@noisolation.com
19:26 mort_: hardware is a snapdragon 410 if that's relevant
19:45 fee1dead: Hi all. I'm having some trouble with a Touring card and the new NVK vulkan driver. My small vulkan triangle example is crashing Sway. I know it is experimental and not even merged, so maybe I should just wait? Wanted to know if it is a normal outcome or worth reporting and also asking for help if it is the latter. Thanks in advance!
20:09 airlied: fee1dead: probably at the just wait stage
20:30 fee1dead: Will do.
20:45 sima: robclark, just sent out the mdp5 leak fix, I'm assuming you'll apply somewhere?
20:46 robclark: yeah.. it shows up in freedreno patchworks so one of us will grab it
20:56 airlied: 90/92 sessions passed, conformance test FAILED so close, gl4.6 llvmpipe run, one ballot test in two configs
20:56 alyssa: Anyone know of test coverage for perspective correct interpolateAtOffset?
20:57 pendingchaos: the vulkan coverage probably isn't very good?
20:57 pendingchaos: IIRC RADV's is at least somewhat incorrect
20:58 alyssa: I'm not seeing anything in either gles or vk cts
20:59 alyssa: I also don't have a VK driver that's able to run the vk tests regardless, so really hoping for GL CTS or piglit coverage :p
20:59 alyssa: unfortunately the gles cts tests all pass even if I do perspective-incorrect interpolation
20:59 alyssa: Oh I know how to test this, I can just forcibly lower all interpolation and then see what breaks
21:16 zmike: piglit?
21:22 alyssa: worth a shot but I don't see coverage
21:23 zmike: I know there's interpolateatoffset coverage
21:24 alyssa: yes, but nothing that specifically tests for perspective correction
21:24 alyssa: i.e. with gl_Position.w set to anything other than 1
21:25 zmike: shader_runner to the rescue
22:41 glennk: alyssa, i vaguely remember the ulp error between affine and true perspective being less for the affine than trying to correct using a float division when doing this on evergreen
22:42 alyssa: hum?
22:52 glennk: see tgsi_interp_egcm in r600_shader.c
22:53 glennk: it uses affine interpolation for interpolateAtOffset