IRC Logs of #dri-devel on irc.freenode.net for 2023-01-25

00:26 airlied: Lynne: they should, I'll dig a bit more
00:42 DavidHeidelberg[m]: any chance that lavapipe reaches VK 1.3 soon?
00:43 airlied: isn't it 1.3 already?
00:43 DavidHeidelberg[m]: looking at code it looks like that
00:43 DavidHeidelberg[m]: but for some reason https://mesamatrix.net/ reports 1.1
00:44 airlied: apiVersion = 4206803 (1.3.211)
00:44 zmike: that's because the extension check there is wrong
00:44 airlied: yeah the ext check doesn't really match how vulkan works
00:44 zmike: and it's trying to require all extensions to determine version
00:44 airlied: where you can be vulkan 1.3 and not support all the exts
00:45 zmike: the version is whatever the driver says it is
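
The apiVersion value airlied quotes is a packed integer, so the reported version can be checked by unpacking it. A minimal sketch, assuming the standard Vulkan bit layout (3-bit variant, 7-bit major, 10-bit minor, 12-bit patch); the function name is illustrative, not from any driver:

```python
def decode_vk_version(version):
    """Unpack a Vulkan apiVersion integer into (variant, major, minor, patch)."""
    variant = version >> 29            # top 3 bits
    major = (version >> 22) & 0x7F     # next 7 bits
    minor = (version >> 12) & 0x3FF    # next 10 bits
    patch = version & 0xFFF            # low 12 bits
    return variant, major, minor, patch


# Value quoted in the log above
print(decode_vk_version(4206803))
```

For 4206803 this gives variant 0 and version 1.3.211, matching what the driver reports regardless of which extensions are present.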
01:32 airlied: Lynne: this might help https://paste.centos.org/view/5d53abae or at least get a bit further
03:01 Lynne: airlied: yeah, this actually works
03:10 airlied: okay I'll push it to the radv repo
03:57 Lynne: VK_ERROR_INVALID_EXTERNAL_HANDLE
03:57 Lynne: hmm
03:57 Lynne: the ptr is very minImportedHostPointerAlignment aligned
04:00 airlied: Lynne: what driver gives that for what call?
04:07 Lynne: found the issue, my length was not aligned
04:08 Lynne: works now, except for the very first slice
04:08 Lynne: pushed to my repo
04:09 Lynne: the rest of the slices get decoded fine, but the first one's not
04:11 Lynne: hmm, maybe I should try aligning in the negative direction too
04:16 Lynne: nope, aligning to the previous alignment point and adding offset to the slice offsets is even worse
04:17 Lynne: if you want to test this variant, this is the diff to apply to my repo - https://0x0.st/oFXB.diff
04:19 Lynne: err, hold on, I forgot to remove the offset
04:20 Lynne: it works! positive alignment works!
04:23 Lynne: still feels a bit like a hack though, negative alignment should work, combined with srcBufferOffset
04:29 airlied: Lynne: is the api specced as int or uint?
04:31 Lynne: uint, I meant the alignment direction (negative == align to previous, positive == align to next)
04:31 Lynne: all offsets are always positive
04:31 airlied: ah cool was wondering
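
The two alignment directions Lynne describes ("negative" = align to previous, "positive" = align to next) can be sketched as follows. This assumes a power-of-two alignment; the 4096 value and the sample address are made up for illustration, the real requirement comes from minImportedHostPointerAlignment:

```python
def align_down(value, alignment):
    """Round value down to the previous multiple of a power-of-two alignment."""
    return value & ~(alignment - 1)


def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


ALIGNMENT = 4096   # assumed value, for illustration only
ptr = 0x1600       # hypothetical unaligned address

prev_point = align_down(ptr, ALIGNMENT)
next_point = align_up(ptr, ALIGNMENT)

# When aligning to the previous point, the data no longer starts at the
# beginning of the imported range, so this delta has to be added to every
# offset into it (the "adding offset to the slice offsets" step).
offset_into_range = ptr - prev_point

print(hex(prev_point), hex(next_point), offset_into_range)
```

Both results are non-negative, which is consistent with the API offsets being unsigned.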
04:32 Lynne: okay, https://0x0.st/oFXQ.diff is what ought to be working
04:32 Lynne: err, actually not, I forgot about the bitstream buffer offset
04:33 Lynne: *bitstream buffer offset alignment requirement special administrative investigative bureau of nomenclature
04:34 Lynne: so the version in the repo is fine, I'll test on anv
04:37 airlied: the repo version I have is giving the handler errors
04:37 airlied: handle errors
04:37 Lynne: last commit == 4439fe09?
04:37 airlied: 53f586bce203
04:37 Lynne: yeah, that's the old one, pull again
04:38 Lynne: anv kinda works, but after a few tens of frames pulls a VK_ERROR_DEVICE_LOST while submitting
04:39 airlied: you pushed it? I'm not seeing it
04:39 airlied: I think I should reduce the anv value from 4k first
04:40 Lynne: I just pushed it again just in case
04:40 Lynne: I am getting VK_ERROR_INVALID_EXTERNAL_HANDLE errors, was testing on a small number of frames so I didn't see it at first
04:45 airlied: Lynne: this might help anv https://paste.centos.org/view/4f4ee7ac
04:50 Lynne: gets a bit further, first few frames are gray, still does a device lost
04:50 Lynne: srcBufferOffset is 0 though, so not sure why it helps
05:04 airlied: Lynne: it's the kernel throwing that error
05:05 airlied: so something is wrong with the ptr you are trying to host map
05:11 Lynne: hmm, it's definitely aligned
05:18 Lynne: gpu reset while just printing values, it's not fully working yet
05:18 airlied: Lynne: ah worked it out, you don't actually allocate all the memory do you
05:18 airlied: the fails are because the ptr + size hasn't got pages backing all of it
05:20 Lynne: as in the size isn't enough?
05:21 Lynne: av_fast_realloc allocs to powers of two bytes only, so it usually leaves a large amount of padding, but maybe it isn't enough
05:26 Lynne: nope, tried aligning all slice allocs to the page size, still gpu crash
05:26 Lynne: either way, it shouldn't really matter, should it?
05:26 Lynne: we're not actually accessing any data past the last slice (if the file isn't corrupt)
05:31 airlied: Lynne: you can't userptr an amount of unmapped memory
05:31 airlied: it has to be allocated and have pages in it
05:32 airlied: to answer the size is too much I think
05:35 airlied: host ptr 0x7f28a416c600 0x7f28a416d000 20864 24576
05:35 airlied: so you align from the first to the second, but there isn't 24576 bytes of memory mapped after that point
05:36 Lynne: I aligned the size in av_fast_realloc too, so there's always enough memory, it didn't help
05:39 airlied: I think you'd have to overallocate by the maximum delta
05:39 airlied: so probably an extra page
05:42 airlied: so above you allocate 20864@0x7f28a416c600 but try to map 24576@0x7f28a416d000
05:46 Lynne: I'll try it
05:46 Lynne:braces for another hard reset
05:47 airlied: Lynne: https://paste.centos.org/view/dc42187c that hack seems to fix it
05:51 Lynne: I think I get it, the bitstream size align may push the size beyond what we've allocated
05:51 Lynne: after alignment
05:51 airlied: yup
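
The mismatch airlied points out can be reproduced from the numbers in his debug line. A rough sketch, assuming a 4096-byte alignment (which is consistent with those numbers) and that both the pointer and the size are rounded up; variable names are illustrative:

```python
ALIGNMENT = 4096  # assumed; matches the values quoted in the log


def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


# Values from the "host ptr" line above
alloc_ptr = 0x7F28A416C600
alloc_size = 20864

map_ptr = align_up(alloc_ptr, ALIGNMENT)    # 0x7f28a416d000
map_size = align_up(alloc_size, ALIGNMENT)  # 24576

alloc_end = alloc_ptr + alloc_size
map_end = map_ptr + map_size

# Bytes the import would cover beyond the end of the allocation.
overshoot = map_end - alloc_end
print(hex(map_ptr), map_size, overshoot)
```

Any positive overshoot means the imported range extends past memory that was actually allocated, so the allocation needs at least that much extra padding before the import can succeed.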
05:55 Lynne: tried aligning vp->slice_size, which is the unaligned value, but that didn't work
05:55 airlied: my hack also cleans things up on anv
05:55 airlied: with my anv patch
05:59 Lynne: nope, your hack didn't fix the gpu crash on radv
05:59 Lynne:wants a separate coreboot machine to avoid staring at firmware logos
05:59 airlied: ah getting a VM fault
06:00 airlied: I'm guessing then something is undersized in some case then
06:00 airlied: and the bitstream reader is going off the end of allocated memory
06:35 Lynne: vlVaSyncSurface has started segfaulting for me for about a week or two now, I'll take this as a sign to finish my vulkan work instead
08:18 jani: tzimmermann: many thanks!
08:38 MrCooper: tzimmermann: did something go wrong with the drm-misc-next pull request? It seems to touch files all over the tree, PR e-mail is almost 1 MB
08:49 javierm: MrCooper: maybe due to all the legacy dri1 drivers removal?
08:49 MrCooper: don't think so
08:49 MrCooper: all over the tree as in outside of the usual DRM paths
08:50 MrCooper: e.g. net/
09:07 jannau: MrCooper: 6f84981772535 merges drm/drm-next based on v6.2-rc2 while drm-misc-fixes was still on v6.1-rc6
09:08 jannau: I guess the diffstat is from drm-misc-next-2023-01-19 to drm-misc-next-2023-01-24 which includes the v6.1-rc6 to v6.2-rc2 changes
09:11 javierm: jannau: yeah, but the diffstat of commit b8f55f24bc82 ("Merge tag 'drm-misc-next-2023-01-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next") only shows change in drivers/drm/
09:12 javierm: MrCooper: ^
09:13 MrCooper: so the PR e-mail is just misleading?
09:18 jannau: javierm: I assume this is for drm-misc-next-2023-01-24 (I don't see mail for that on https://lore.kernel.org/dri-devel/). the pull request email for drm-misc-next-2023-01-19 looks as expected: https://lore.kernel.org/dri-devel/Y8kDk5YX7Yz3eRhM@linux-uq9g/
09:19 javierm: jannau: yeah, I also didn't find an email for drm-misc-next-2023-01-24 but the diffstat of that merge also looks correct
09:20 javierm: so I think it is what you said, the tag being merged using a different base
09:25 MrCooper: I'm inclined not to approve this for the dri-devel mailing list
09:30 jannau: that happens regularly when there's a merge see https://lore.kernel.org/dri-devel/YrQeAAVvQ6jxu2dl@linux-uq9g/ for example
09:33 jannau: it is kind of useless, drm-misc-next-X..drm-misc-next-Y ^drm-next-Z (with drm-next-Z merged between drm-misc-next-X and drm-misc-next-Y) would be much more useful
09:33 javierm: one way to avoid that would be to back merge whatever tag is based drm-next before sending the drm-misc-next PR
09:33 javierm: but unsure if that would be more useful since git merge would do the right thing anyways
09:36 MrCooper: that was over half a year ago, and sparked a similar discussion back then; I'd say avoiding the confusion would be worth it
10:15 frieder: Are there any DRM bridge driver maintainers around?
10:15 frieder: If yes, could someone Ack or even pick up this patchset: https://patchwork.kernel.org/project/dri-devel/patch/20221212182923.29155-4-jagan@amarulasolutions.com/?
10:16 frieder: Or rather: https://patchwork.kernel.org/project/dri-devel/cover/20221212182923.29155-1-jagan@amarulasolutions.com/
10:20 daniels: pinchartl: ^
10:39 danvet: jagan should probably get commit rights too ...?
10:46 emersion: zzag: fyi https://gitlab.freedesktop.org/drm/amd/-/issues/2365
10:56 emersion: hm, apparently this is a more general issue
11:34 zamundaaa[m]: Yeah we got a bug report about such an issue relatively soon after changing KWin to set max bpc to 16. I changed it to use max bpc = buffer bpc (so 10 in most cases), haven't had any complaints about that yet
11:43 MrCooper: zamundaaa: there's no direct connection between those two things
11:44 MrCooper: the final output of the colour pipeline of current GPUs has at least 12 bpc AFAIK, regardless of the framebuffer format
11:45 zamundaaa[m]: This is about port bandwidth limits though, not GPU bandwidth
11:47 MrCooper: yeah, so what kwin does will artificially cap the colour accuracy on the link in some cases
11:51 emersion: MrCooper: is there any point in 16bps on the link, if the source buffer is 8bpc?
11:51 emersion: bpc*
11:52 MrCooper: it's like asking "is there a point in having apples, when there are oranges?" ;)
11:52 emersion: hm, i'm not following... i know these are unrelated, but it's a pipeline
11:53 MrCooper: bpc at the end of the pipeline has nothing to do with bpc of the source buffer
11:53 emersion: if you send bad quality data into the pipe, it's no big deal if the pipe is not high quality either?
11:53 MrCooper: even 8 bpp pseudocolour could result in 16 bpc at the end of the pipeline
11:54 emersion: a bit like: my network adapter is 100Mbps, no need for a CAT6 Ethernet cable
11:55 emersion: right, but the pipeline can't fix missing information in the source
11:55 MrCooper: it can lose information though
11:55 emersion: so your point is: if source buffer is 8bpc, and the link is 8bpc, then there can be lost info?
11:56 MrCooper: it can result in different input values resulting in the same output values, depending on the transformations in between
11:56 zamundaaa[m]: <MrCooper> "yeah, so what kwin does will..." <- I mean, yes, but there's not really anything better that can be done when compositors don't get any information about the restrictions on connectors
11:56 MrCooper: zamundaaa: it's a driver issue, needs to be addressed there
11:56 emersion: this seems very theoretical to me
11:57 daniels: right, but in the absence of transformations it doesn't help; there is a benefit in choosing lower bpc if it lets you choose a lower link rate, because then you can run your clocks slower, or use fewer lanes which might free up more for e.g. USB
11:57 emersion: yes, it's a driver issue, but my users an angry and the driver issue is hard to fix
11:57 emersion: are*
11:58 MrCooper: daniels: the driver is supposed to lower bpc automatically as needed
11:58 emersion: yes, but no driver does this today, because it's Difficult
11:58 emersion: for DP-MST
11:59 MrCooper: the property is "max bpc", not "effective bpc"
11:59 daniels: but 'I can lower my clock rate' or 'I can free up some lanes for less bad USB' aren't 'needed', they're just nice-to-haves
11:59 emersion: i think i'll go with the same fix as kwin for now
11:59 zamundaaa[m]: daniels: if the output doesn't work without it, it's absolutely needed and not just nice to have
12:00 emersion: 4k@30Hz is not "doesn't work", but not too far :P
12:00 daniels: zamundaaa[m]: yes
12:01 emersion: what daniels is saying is that even if 16bpc or 10bpc works, there might be use-cases where it's better to pick a lower one
12:01 emersion: for power savings, for instance
12:01 emersion: afaiu
12:02 zamundaaa[m]: That makes sense
12:02 daniels: yeah, or because you're doing alt-mode over type-C, and if you don't use all the lanes then you can get more USB bandwidth
12:02 emersion: right
12:03 zamundaaa[m]: So ideally compositors should also read the EDID and check what bit depth the display actually does, and limit it to that. Or does the kernel already do that?
12:03 emersion: i'm not sure that would help
12:04 emersion: for me the tl;dr is that compositors really need a knob to select between quality and power savings/room for other usage
12:04 emersion: so it sounds reasonable to me to tie max bpc to "is high bit depth on or off"
12:05 MrCooper: zamundaaa: it's not that simple either, due to things like temporal dithering or DSC
13:17 Ristovski: It is possible to measure the effective input-to-change latency of a game for example, by mounting a photodiode on the screen and then using some external hw to fire an event (preferably something that changes the brightness of the game, like some menu, instantly) and measuring the time it takes for the brightness to change
13:17 Ristovski: Would it be also possible if one simply cared about the pure software side?
13:20 Ristovski: GLX/EGL_EXT_buffer_age is not quite what would be useful. I assume it would require getting the pixel contents and a timestamp of frame (but "higher up" in the chain, since then you are at the mercy of the compositor etc)
13:22 Ristovski: Imagine a simple app that changes from black to white when you send it any input. In order to detect the change, ideally you would be able to "probe" various points in the pipeline (so from the backbuffer all the way to screen output) and do some naive pixel test on the contents. Is there something like that in mesa/drm that could help?
13:29 zamundaaa[m]: You can definitely measure the output latency of your compositor with an app by using presentation-time, no buffer age or anything like that required
13:36 MrCooper: emersion zamundaaa: if link bpc is capped to framebuffer bpc, any non-identity gamma transformation cannot be represented on the link directly and losslessly
13:40 emersion: yes, but better have this bug than being capped at 30Hz
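
MrCooper's point about distinct inputs collapsing to the same output can be illustrated numerically. A toy sketch, assuming a plain power-law gamma of 2.2 applied to 8-bit framebuffer values and rounding to the link depth; real hardware pipelines (LUTs, dithering, DSC) are more involved:

```python
def distinct_outputs(in_bits, out_bits, gamma=2.2):
    """Count how many distinct link values survive after applying a gamma
    curve to every possible input value and quantizing to out_bits."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    outputs = {
        round(((i / in_max) ** gamma) * out_max)
        for i in range(in_max + 1)
    }
    return len(outputs)


for link_bpc in (8, 10, 12, 16):
    print(link_bpc, distinct_outputs(8, link_bpc))

# With an identity transform nothing is lost, which matches daniels' remark
# that extra link depth does not help in the absence of transformations.
print("identity:", distinct_outputs(8, 8, gamma=1.0))
```

With the link capped at 8 bpc, fewer than 256 distinct values come out of the 256 possible inputs; a deeper link preserves more of them.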
13:45 karolherbst: anyone mind if I bump the meson req in CI? https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19778 if not, I'll probably merge in a few hours
14:35 alyssa: I totally know how to use nir_shader_compiler_options
14:35 alyssa: don't worry i'm a professional
14:35 zmike: same
14:35 alyssa: what's the reference NIR compiler these days
14:36 alyssa: NAK is rust, ACO is C++, Intel's is nightmare fodder
14:36 alyssa: who does that leave as the winner
14:36 alyssa: ir3?
14:36 zmike: "compiler" = ?
14:37 alyssa: real hardware
14:37 zmike: you mean backend
14:37 alyssa: yes
14:37 alyssa: i totally know what that word means
14:37 alyssa: i'm a professsional remember
14:37 zmike: I do remember
14:49 cwabbott: ir3 is weird because it started life making a lot of "my first compiler" mistakes but has been almost completely rewritten by various people over time
14:49 cwabbott: (no disrespect meant, everyone's gotta start somewhere)
14:50 cwabbott: there are definitely still a few remnants of the "my first compiler" stage leftover
14:54 alyssa: yeah, that's fair
14:55 alyssa: DAG-based IR?
14:58 cwabbott: no, it wasn't DAG-based
14:59 alyssa: * Adreno's comparisons produce a 1 for true and 0 for false, in either 16 or
14:59 alyssa: * 32-bit registers. We use NIR's 1-bit integers to represent bools, and
14:59 alyssa: * trust that we will only see and/or/xor on those 1-bit values, so we can
14:59 alyssa: * safely store NIR i1s in a 32-bit reg while always containing either a 1 or
15:00 alyssa: .............uh
15:00 alyssa: inot?
15:00 alyssa: (-:
15:01 cwabbott: that's what it used to do, but inserting inot and then removing it later is harder than just not inserting it
15:01 alyssa: weee
15:02 cwabbott: it's basically lowering bools to 32-bit in instruction selection, but with a 0/1 representation instead of the default 0/~0
15:02 alyssa: yeah, I see that
15:02 alyssa: comment is just for wrong inot, I know because I had that bug in Bifrost
15:02 cwabbott: oh, right
15:03 alyssa: (inot is handled correctly by a special case in ir3)
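
The inot problem alyssa mentions comes down to the boolean representation. A small sketch emulating 32-bit registers; the XOR-with-1 variant is just one way to negate a 0/1 boolean and is not claimed to be what the ir3 special case does:

```python
MASK32 = 0xFFFFFFFF


def inot32(x):
    """Bitwise NOT on a 32-bit register value."""
    return ~x & MASK32


# 0/~0 representation: bitwise NOT is also a valid boolean NOT.
FALSE_ALL, TRUE_ALL = 0x00000000, 0xFFFFFFFF
print(hex(inot32(TRUE_ALL)), hex(inot32(FALSE_ALL)))

# 0/1 representation: bitwise NOT of "true" is neither 0 nor 1.
print(hex(inot32(1)))


def bool_not_01(x):
    """Boolean NOT that stays within the 0/1 representation (assumed fix)."""
    return x ^ 1


print(bool_not_01(1), bool_not_01(0))
```

and/or/xor keep 0/1 values within {0, 1}, which is why the quoted comment only trusts those operations; inot is the one that breaks the invariant.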
15:03 cwabbott: anyway, the biggest problem is that in the beginning it tried to map operands in the IR as closely to the assembly as possible
15:05 cwabbott: sources and destinations had the same ir3_register type, implicit operands weren't inserted at all, and read-modify-write type things reused the same operands as source and destination
15:05 alyssa: oh, interesting, yeah
15:05 alyssa:is guilty of some of the same
15:07 cwabbott: and as a result there was a wrapper to iterate over "real" sources
15:07 alyssa: ah, yeah, I remember seeing you clean that up with the SSA stuff
15:07 cwabbott: also, no support for multiple destinations (ughhhh)
15:08 alyssa: this is an annoying reminder I never finished either of my SSA RAs, I should probably do that at some point
16:16 jekstrand: jenatali, daniels: So when are we adding XBox to CI? 😅
16:17 jenatali: Good question
16:17 jenatali: Sounds like the FNA folks might've had a plan for it?
16:17 daniels: reminds me, I should check if it's actually possible to buy them in the UK yet
16:17 jenatali: Probably post-merge CI though
16:17 daniels: although my console being old is, it turns out, not the limiting factor in my CoD performance
16:18 jekstrand:didn't expect that to have a serious answer.
16:19 daniels: I do my best to meet your limited expectations
17:38 eric_engestrom: jekstrand: there was already a discussion about that on the initial MR (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19022#note_1585646)
17:38 eric_engestrom: the open question was on the license for using GDKX in a CI
17:43 jekstrand: eric_engestrom: Neat!
18:06 enunes: daniels: DavidHeidelberg[m]: looks like download speed is particularly bad today, it doesn't seem to be something bad on my runner as in a quick test I get my full connection download speed, I wonder if there's anything to do (other than cache proxy)?
18:14 DavidHeidelberg[m]: enunes: for traces we have the etag checking (so caching traces on NFS), but that's the only idea I have right now
18:19 enunes: the jobs on my runners don't use the traces, what is really slow are kernel/rootfs which today are hitting <1MB/s
19:05 alyssa: I'm ok with xbox in post-merge but not in pre-merge
19:09 DavidHeidelberg[m]: robclark: Hey! From the latest report it seems that perf for traces dropped around 3-18% https://gitlab.freedesktop.org/mesa/mesa/-/issues/7144#note_1739532
19:12 robclark: DavidHeidelberg[m]: if it isn't something unrelated like kernel uprev, it could be https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20708
19:13 DavidHeidelberg[m]: robclark: Emma mentioned perfetto enablement
19:14 robclark: it's possible that has some overhead too.. but also don't really care ;-)
19:15 robclark: I guess you should have a list of candidate MRs based on which day?
19:16 robclark: anyways, conservative lrz can be disabled via driconf (and therefore env var) so that would be an easy way to rule out it being related
19:16 anholt: the commit range didn't cover lrz.
19:16 anholt: unfortunately the stats thing doesn't decide where the regression happened, so you don't get a list of candidate mrs
19:17 anholt: you have to click the link and zoom in on the graph and log commit messages where you see the regression happen.
19:17 robclark: ok, then not conservative-lrz.. if the range did include perfetto enable, that seems plausible.. but also not a thing to care about since it is apple vs orange thing
19:18 robclark:amused that android play protect thinks android cts is a sketchy application that it should protect me from
19:19 anholt: robclark: crashes so many devices, must be some sort of attack!
19:19 robclark: yup, lol
19:23 Venemo: jekstrand, alyssa can I merge 15838 or do you have further comments or suggestions about it? also, can you guys please give an answer to dschuermann_ 's RFC here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838#note_1738359
19:48 alyssa: deferring to dschuermann
19:48 alyssa: if he's ok with it I probably am too
19:52 anholt: ajax_: started checking how hard it would be to run deqp-vk.wsi in ci, to catch that regression. despite being a build that I think should include the tests, just enabling xvfb is not getting tests run from what I can see.
19:54 airlied: would wsi not need dri3?
19:54 airlied: i suppose lvp might work in its putimage path
19:54 anholt: that's what I was thinking
19:55 daniels: weston-headless will give you dri3
19:55 daniels: at least as long as you pass --xwayland
19:55 anholt: looks like locally I can pass 76, fail 1 with lvp against xwayland with -j 1.
19:56 jekstrand: Venemo: Done both
19:56 anholt: weston on ci says command not found. yay for the stupid separate vk container :(
19:58 anholt: sticking an xvfb-run in locally gets the same results. and xvfb is eating some cpu, so I think the testing may even be valid
19:58 daniels: anholt: f
19:58 daniels: anholt: I've got somewhere in my scribbled to-do notes to check if having separate containers actually still makes sense or is just enormous annoyance for marginal gains at this point
19:59 anholt: daniels: scribble another note in there I came up with today: x11_glx and x11_egl_glx deqp targets are I think exactly the same.
20:00 anholt: you do need to be using default or x11_*_glx targets in order for vulkan to get x11 wsi. and we probably want to be using default so we can do both wayland and x11 in one build for vk. but then my question is what impact that has on winsys testing for EGL and GL/GLES.
20:01 anholt: then I imagine trying to trace through the layer cake of GL platform init in deqp, and then well what do you know it's lunchtime already.
20:03 daniels: yeah, I've just read through that and wow I cannot tell you how much it's after 8pm
20:10 zmike: anyone know how to do a 32bit build with cmake
20:11 zmike: nevermind I think I remembered
20:11 daniels: I'm sorry to hear that
20:12 zmike: me too
20:13 Ristovski: thanks for ruining my otherwise perfect night by mentioning c*ake
20:16 ayaka_: https://lore.kernel.org/dri-devel/B7EA66D1-1454-4612-BA68-59D4875506AD@synaptics.com/#r
20:16 ayaka_: any remaining problem about this MR? I didn't see any further review that I need to complete
20:47 airlied: zmike: I think I shove -m32 in the cflags/cxxflags for vk-gl-cts
20:47 airlied: -m32 -mfpmath=sse -msse
20:48 zmike: yea something like that indeed
20:48 airlied: it's in the vulkan cts docs
20:48 zmike: hm
20:48 zmike: shame I'm not building cts then, but will try to remember for next time
20:48 airlied: external/vulkancts/README.md
20:48 airlied: some things like llvm have an option
20:49 airlied: but yeah if you are building anything outside llvm/cts with cmake, then all I have are thoughts and prayers
20:49 zmike: I prayed in just the right way to make it happen, but I'm not happy about it
23:08 Lynne: airlied: figured out what was not aligned?