IRC Logs of #dri-devel on irc.freenode.net for 2023-01-20

00:36 alyssa: HdkR: Boo :p
00:37 HdkR: :)
01:04 mareko: zmike: I have a system using zink that runs Unigine Heaven and the window R and B channels are swapped, have you seen this before?
01:06 zmike: mareko: sounds like maybe the driconf override for dual src blend channel swapping isn't active?
01:06 zmike: although I think that maybe makes things black and not swapped 🤔
01:17 mareko: zmike: can it be caused by a vulkan driver not supporting the expected channel order?
01:18 zmike: wouldn't that just mean it doesn't support the right formats?
01:19 zmike: I'd expect some kind of error to be printed
01:23 zmike: mareko: I'd suggest trying validation on it (ZINK_DEBUG=validation if it's installed)
01:23 zmike: will probably give some sort of hint
01:31 alyssa: zmike: i've been thinking about how i can improve the world
01:32 alyssa: and i think i figured out what the world really, really needs
01:32 alyssa: a blog post about descriptor sets and a new pipe CAP
01:32 zmike:dies
02:26 mareko: the world needs more trains
02:32 alyssa: agreed
02:33 alyssa: especially high speed rail
02:33 alyssa: but also rail in general
03:31 Lynne: airlied: tested, your anv patchset works on skylake
03:36 Lynne: 3% FASTER THAN VAAPI!1!
03:38 Lynne: 7% for 4k
03:43 airlied: \o/
03:45 Lynne: 3% for both a typical yarr 1080p release and 1440p
03:45 Lynne: and it could be slightly faster still if the sps/pps param buffer wasn't recreated per-frame in the drivers
04:04 airlied: Lynne: actually on anv I don't think there is really anything I can see to shave off that would help there
04:05 airlied: the one cpu saver I dropped for now was prefollowing all the reference slot pNexts into an on stack array
04:05 airlied: that might be worth a little
04:07 Lynne: oh, intel needs the parameters sent before each frame?
04:09 airlied: yes, it's stateless
04:10 Lynne: fair enough, wonder where we get the extra percent
04:13 airlied: Lynne: you testing against media-driver or intel-vaapi-driver?
04:16 Lynne: iHD_drv_video.so, so the not-i965 one?
04:16 airlied: ah yeah that's media-driver
04:18 Lynne: i965's performance is basically on-par with vulkan though :/
04:19 Lynne: (so not only is media-driver very incomplete, it's also slower, what were intel thinking?)
07:01 Lynne: airlied: VK_QUERY_TYPE_RESULT_STATUS_ONLY_KHR doesn't work on anv btw
07:01 Lynne: even after forcing it, queries always return VK_NOT_READY
07:31 airlied: Lynne: okay I should recheck the code I just wrote, I ran CTS against it, but I didn't dig in
07:39 Lynne: queryResultStatusSupport == 0
07:40 Lynne: so the cts probably didn't pick it up
08:32 bbrezillon: airlied: I'll have a look. Thanks for the heads-up!
08:54 bbrezillon: jekstrand: sure, I get the point of having a single resv for the whole VM instead of adding job fences to all BOs, just didn't see what it was synchronizing. I mean, if user is expected to pass all fences explicitly to the exec ioctl, it should also make sure access to internal BOs is properly serialized. Now, if the shrinker kicks in and wants to reclaim memory, we probably want
08:54 bbrezillon: to wait for in-flight jobs to land before reclaiming mem, which I guess is what danvet was referring to.
09:52 Lynne: airlied: could you look at !20805? it touches some code radv dec is using
09:53 javierm: tzimmermann: I tried to figure out yesterday the issue with the ssd130x driver but still couldn't
09:53 javierm: tzimmermann: while looking at that though I noticed a problem with the DRM fbdev emulation layer
09:53 tzimmermann: ok?
09:53 javierm: tzimmermann: drm_fb_helper_generic_probe() sets fbi->fbdefio = &drm_fbdev_defio if shadow fb is used
09:54 javierm: tzimmermann: but that means that several DRM drivers will share the same struct fb_deferred_io
09:54 tzimmermann: ok. is that a problem?
09:55 javierm: tzimmermann: it is, because drivers/video/fbdev/core/fb_defio.c then will use the same mutex fbdefio->lock for all the drivers
09:55 tzimmermann: ah! there's the pagelist
09:56 javierm: which means that for example if the driver is removed and fb_deferred_io_cleanup() destroys the mutex, the other drivers won't be able to use it anymore
09:56 tzimmermann: yeah, i guess we should make this per-device then
09:56 javierm: tzimmermann: correct
09:56 tzimmermann: that's a good observation.
09:56 javierm: tzimmermann: I'll write a patch when I have some time, probably later today
09:56 tzimmermann: i have a number of fbdev patches for next week. i can fix it afterwards
09:57 tzimmermann: well, if you want to type up something, that's also ok
09:57 tzimmermann: your choice
09:57 javierm: tzimmermann: I also have a couple of cleanups / trivial fixes that I noticed while looking at the code. Will post them later too
09:57 javierm: tzimmermann: yeah, it should be trivial. I'm just working on something else today
09:58 tzimmermann: maybe simply put a defio structure into drm_fb_helper for now
10:00 javierm: tzimmermann: hmm, that could work too. And set it to that if !fbi->fbdefio
10:01 javierm: it will be wasted memory if the driver sets its own deferred I/O handler or if not used, but meh
10:01 tzimmermann: javierm, in a meeting now. sorry
10:01 javierm: tzimmermann: no worries. Later!
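
The shared-structure problem discussed above can be illustrated with a small toy model. This is only a sketch in Python, not kernel code: `DeferredIO` and `FbDevice` are hypothetical stand-ins for `struct fb_deferred_io` and a framebuffer device, setting the lock to `None` stands in for `fb_deferred_io_cleanup()` destroying the mutex, and the delay value is arbitrary.

```python
import threading


class DeferredIO:
    """Hypothetical stand-in for struct fb_deferred_io: a delay plus a lock."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self.lock = threading.Lock()  # stands in for fbdefio->lock


class FbDevice:
    """Hypothetical framebuffer device that points at a DeferredIO."""

    def __init__(self, name, defio):
        self.name = name
        self.defio = defio

    def flush(self):
        # A deferred-I/O worker has to take the lock before touching pages.
        if self.defio.lock is None:
            raise RuntimeError(f"{self.name}: lock was destroyed")
        with self.defio.lock:
            return f"{self.name}: flushed"

    def cleanup(self):
        # Models cleanup destroying the mutex inside the structure.
        self.defio.lock = None


# Shared case: one static structure handed to every device.
shared = DeferredIO(delay_ms=50)
fb0, fb1 = FbDevice("fb0", shared), FbDevice("fb1", shared)
fb0.cleanup()
try:
    fb1.flush()
    shared_ok = True
except RuntimeError:
    shared_ok = False

# Per-device case: each device owns its own structure.
fb2, fb3 = FbDevice("fb2", DeferredIO(50)), FbDevice("fb3", DeferredIO(50))
fb2.cleanup()
per_device_result = fb3.flush()
```

In the shared case, removing one device breaks the other; with a structure per device, cleanup only affects its owner, which is the direction the suggestion of putting a defio structure into drm_fb_helper takes.
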
11:27 MrCooper: jenatali: so the d3d12 driver isn't used for WSL, since you say it can ignore implicit sync?
11:30 tzimmermann: javierm, don't worry about the memory impact for now. i have more patches to rework fbdev and that will be an opportunity to sort this out
11:38 javierm: tzimmermann: Ok, I've the following: https://paste.centos.org/view/raw/0f3073ed
11:39 javierm: will test later and post
11:42 tzimmermann: javierm, i'd remove drm_fbdev_defio from the code and init the fbdefio field directly. it's just two values after all
11:42 tzimmermann: up to you
11:45 javierm: tzimmermann: right, no need to have that defined as a static variable indeed
11:48 javierm: tzimmermann: https://paste.centos.org/view/raw/d502cd89 then
11:50 tzimmermann: looks good to me. CONFIG_DRM_FBDEV_EMULATION will auto-select FB_DEFERRED_IO
11:50 tzimmermann: so the guard might not be needed
11:51 javierm: tzimmermann: yeah, I noticed and that's why I didn't add a guard in the code but wondered about the struct since it is in a public header
11:51 tzimmermann: well, up to you
11:51 javierm: since I guess that drivers could use it and not CONFIG_DRM_FBDEV_EMULATION ?
11:52 javierm: tzimmermann: but I guess in those cases the drivers would define their own fbdefio anyways?
11:52 tzimmermann: javierm, i don't think that would happen
11:53 javierm: tzimmermann: always happy to drop ifdefery if not needed :)
11:53 tzimmermann: drm_fb_helper.h is used by a few drivers
11:53 tzimmermann: they already leave unused fields blank
11:53 javierm: tzimmermann: Ok
11:53 tzimmermann: such as buffer
11:54 tzimmermann: anyway, if you post this, you can add my r-b already
11:54 javierm: tzimmermann: cool, I'll add it. Dropped the guard also
12:28 javierm: tzimmermann: I think that I should push patch #1 to drm-misc-next and patches #2 and #3 to drm-misc-fixes ?
12:28 javierm: only patch #3 is really important though
12:28 tzimmermann: javierm, i think so
12:29 javierm: tzimmermann: Ok. I'll do it on Monday then if I don't get more feedback
12:30 jenatali: MrCooper: We use WDDM in WSL. Which, now that you mention it, could have app compat problems that I hadn't thought about...
14:49 MrCooper: jenatali: for a specific example, Xwayland draws to FBOs only, and calls glFlush to make sure the Wayland compositor can see the result
14:50 jenatali: MrCooper: Except that we're swrast and therefore XWayland needs to copy the contents to shmem to send it to the Wayland compositor
14:51 MrCooper: no HW acceleration in X apps then?
14:51 jenatali: There is HW accel, just it comes back to RAM before sending to XWayland too
14:52 MrCooper: just inefficient then :P
14:52 jenatali: Eventually I'd like to get an explicit sync protocol in place for removing that readback but it hasn't been a priority for me yet
14:52 jenatali: Yep!
14:53 jenatali: Depending on the workload HW accel can be slower than llvmpipe due to those readbacks, but most are at least marginally if not significantly faster
14:54 jenatali: (and then there's a third round trip for Wayland composed content to come to Windows via remote desktop)
15:34 glynnc: How to get libGL to speak GLX to XWin.exe? Every time I update the client system, this breaks
15:40 jenatali: glynnc: That sounds like a recipe for disaster. Why are you trying to do that?
16:11 tomba: A question about drm-misc-next. I have a few v4l2 patches that my DRM patches depend on. Those have been acked by v4l2 maintainers, and I have a -rc1 based branch for the v4l2 changes (in case it needs to be merged also to the media tree). Is it fine to just do a merge of that v4l2 branch to the drm-misc-next, and then apply the DRM patches on top with dim?
16:12 pinchartl: (Mauro has explicitly acked the patches for merge through the drm tree)
16:49 Venemo: on NIR intrinsics which have a component offset (for example load_input), is this offset meant in units of the dest bit size, or in 32-bit units?
16:53 Venemo: jekstrand, cwabbott maybe? ^^
17:04 jekstrand: Venemo: 32-bit units
17:04 Venemo: jekstrand: good to know, thanks
17:36 jekstrand: dcbaker: Ok, I got my proc macro working with cobbled together syn, etc. branches and your meson branch.
17:37 jekstrand: dcbaker: It meant I had to add includes manually again for bindgen instead of using dependencies (only in meson 1.0.0).
17:37 jekstrand: dcbaker: It's good enough to survive for now.
17:37 jekstrand: dcbaker: We are going to need something more proper when the time comes to actually merge into mesa/main
17:38 jekstrand: dcbaker: If you want a branch that actually uses this stuff to play around with, it's nak/main on my gitlab.
17:39 dcbaker: jekstrand: i was planning to rebase that branch today, but I’m home with a sick kiddo, so it’ll have to wait until next week :/
17:39 dcbaker: Glad you got at least a little unblocked
17:39 jekstrand: dcbaker: That's ok. It'll let me drop one hack patch.
17:39 jekstrand: dcbaker: Yeah, I'm unblocked enough that I can go write more Rust code :)
17:39 dcbaker: Lol. Nice
17:40 jekstrand: I'm not planning to pull in any other crates besides the 4 that are needed for proc macros.
17:40 jekstrand: So as long as we get something sorted out in the next couple months, I'm not too worried.
17:40 jekstrand: Worst case, we can fall back to something like what gstreamer does temporarily.
17:41 jekstrand: Or we could keep those wrap repos on GitHub. (I don't really like that option.)
17:55 airlied: Lynne: yeah those lifetimes in the vulkan driver are in the apps hands
17:56 Lynne: cool
17:57 Lynne: the wait gets skipped?
17:57 airlied: you wait on semaphores before destroying resources
17:59 Lynne: we already do that, but we do that from multiple threads, and the patch only has a single semaphore
18:04 alyssa: jekstrand: Git...Hub?
18:05 HdkR: It's like grubhub, for ordering git repos to be delivered to your door
18:06 alyssa: Oh, ok
18:23 jenatali: I don't suppose anybody already has a nir pass for sinking load_vulkan_descriptor to avoid bcsel/phi on the result?
18:23 jenatali: Apparently DXIL doesn't like any ops on resource handles, so I'd have to convert bcsel to if/else for operating on different resources...
18:24 Venemo: jenatali: you could add this as an option to nir_opt_sink
18:24 jenatali: Lemme take a peek
18:25 jenatali: Venemo: Despite having "sink" in the name, it seems like a pretty different set of ops compared to what the pass is doing
18:26 Venemo: what do you mean?
18:27 jenatali: It's more than just moving an instruction from one place to another, it's changing logic
18:27 Venemo: what logic does it change?
18:28 jenatali: I.e. converting res_a = a; res_b = b; res = test ? res_a : res_b; use_res(res); into if (test) { use_res(res_a); } else { use_res(res_b); }
18:28 jenatali: ^^ is what I need to do, not what opt_sink currently does. opt_sink preserves block indices so it definitely can't rewrite bcsel to if/else
18:29 Venemo: opt_sink does not create new CF, it just sinks instructions
18:30 jenatali: Right. So what I need seems like it should be a different pass :)
18:31 Venemo: so basically you want to convert bcsel to if-else, and then sink
18:32 alyssa: you're going to want a vendor pass for that, I expect.
18:32 alyssa: My gut feeling is that VK capable hw should not need that
18:32 jenatali: Any control flow or conditions that are done on the result of load_vulkan_descriptor, instead need to be done on the inputs to it
18:33 jenatali: alyssa: Yeah, if I wanted to rewrite significant portions of our binding model and limit the set of drivers we can layer on, I could avoid the need for this pass
18:33 pepp: jekstrand: could you take a quick look at the first commit of !20728? (it's a fix for dma-buf.h)
18:37 alyssa: jenatali: OOI what hw needs that
18:37 jenatali: None that I know of. It's just a stupid limitation of DXIL
18:38 alyssa: can you fix dxil?
18:38 jenatali: I've been working on it :P
18:39 alyssa: :-D
18:39 jenatali: High level, in the original D3D binding model, you declare arrays of resources in the shader. You can index between those resources as long as they're in the same array. If they're not, you have to if/else them
18:39 alyssa: heaps?
18:40 jenatali: One of the newer shader models (6.6) added a new thing where you don't declare resources, you just pass an index into a descriptor heap/pool and describe how you want to interpret the descriptor. With that, you wouldn't need to do the if/else
18:41 jenatali: Either way, there's still a DXIL instruction that returns a "handle" which is pretty identical to load_vulkan_descriptor, except that DXIL says the only thing you can do with that handle is pass it to a resource access intrinsic like load/store
18:41 jenatali: So any phi/bcsel stuff needs to be done on the inputs to the handle create intrinsic instead of the result
18:42 jenatali: This is I think the last thing I need for VK1.1
18:44 jenatali: It's a simple enough transformation, was just wondering if someone else had already done it 🤷
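
The rewrite described above, moving a select from the result of the handle-creating instruction to its input, can be sketched on a toy expression tree. This is a minimal illustration in Python, not a NIR pass: the tuple-based IR and the op names `load_desc`, `bcsel` and `use` are hypothetical, and the sketch only covers the case where both sides of the select are descriptor loads whose inputs can themselves be selected. It does not handle the if/else conversion needed when the resources live in different arrays.

```python
def sink_descriptor_loads(expr):
    """Rewrite bcsel(c, load_desc(x), load_desc(y)) as load_desc(bcsel(c, x, y)).

    Expressions are tuples of the form (op, *operands); leaves are strings.
    """
    if not isinstance(expr, tuple):
        return expr  # leaf value, nothing to rewrite

    op, *args = expr
    # Rewrite operands first so nested selects are handled bottom-up.
    args = [sink_descriptor_loads(a) for a in args]

    if op == "bcsel":
        cond, then_v, else_v = args
        both_loads = all(
            isinstance(v, tuple) and v[0] == "load_desc"
            for v in (then_v, else_v)
        )
        if both_loads:
            # Select between the inputs, then create a single handle.
            return ("load_desc", ("bcsel", cond, then_v[1], else_v[1]))

    return (op, *args)


before = ("use", ("bcsel", "test", ("load_desc", "idx_a"), ("load_desc", "idx_b")))
after = sink_descriptor_loads(before)
```

After the rewrite, the only consumer of the handle is the access itself, which matches the stated DXIL restriction that a handle may only be passed to a resource access intrinsic.
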
19:14 jekstrand: pepp: done
20:16 DavidHeidelberg[m]: robclark: anholt does the proxy work well for a630? I just tried re-running a630-traces twice and both times "Downloading file SirYouAreBeingHunted/sir-f750-v2.trace took 32s.", 144M from cache should go faster I think, right?
20:16 DavidHeidelberg[m]: Over 50% utilized gigabit network it should be max. 5 seconds imho
21:04 robclark: DavidHeidelberg[m]: not sure if it is gigabit (or even if the poor little NUC w/ usb-eth adapter could saturate it if it was)
21:05 DavidHeidelberg[m]: well, usb3 could still get around 600-700Mbit; usb2 400Mbit and even in this scenario < 5s
21:06 robclark: it *is* nfsroot, so you are downloading it to rootfs, writing it back to the nuc ;-)
21:06 DavidHeidelberg[m]: ok, usb2 ~ more like 300Mbit, but still < 10 s for sure
21:07 robclark: so, like downloading things isn't the best plan
21:07 robclark: you are sending it multiple times over the network
21:07 DavidHeidelberg[m]: right, ignore me :P
21:07 DavidHeidelberg[m]: I forgot the nfs part
21:07 robclark: and the nuc is serving all the a630's over a single usb-eth
21:07 robclark: heheh
21:08 DavidHeidelberg[m]: sometimes I like to live in my imagination, in an ideal world :)
21:11 DavidHeidelberg[m]: ok, we need the md5 piglit MR so we don't have to remove them anymore from NFS (so no new download) and just check the MD5.
21:12 DavidHeidelberg[m]: No better option I guess.
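
The back-of-the-envelope numbers in this exchange can be checked with a few lines of Python. This is a rough sketch that assumes "144M" means 144 * 10^6 bytes and ignores protocol overhead and the extra NFS transfers mentioned above; the function name is made up for illustration.

```python
def transfer_seconds(size_mbytes, link_mbit_per_s):
    """Ideal transfer time for a file of size_mbytes over a link of the given rate."""
    return size_mbytes * 8 / link_mbit_per_s


SIZE_MB = 144        # trace size quoted in the log
OBSERVED_S = 32      # download time quoted in the log

half_gigabit_s = transfer_seconds(SIZE_MB, 500)   # gigabit link at 50% utilization
usb2_s = transfer_seconds(SIZE_MB, 300)           # ~300 Mbit/s USB2 estimate
effective_mbit = SIZE_MB * 8 / OBSERVED_S         # rate implied by the 32 s download

print(f"50% gigabit: {half_gigabit_s:.2f} s")
print(f"USB2 estimate: {usb2_s:.2f} s")
print(f"observed effective rate: {effective_mbit:.0f} Mbit/s")
```

Under these assumptions the ideal times are a few seconds, while the observed 32 s corresponds to roughly 36 Mbit/s, consistent with the data crossing the same USB Ethernet link more than once.
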
21:27 graphitemaster: What possible undefined behavior am I hitting here. There's a bunch of random artifacts caused on various GPUs and drivers with this simplex noise. Strange out of place pixels that I cannot explain. The right is much worse, but the left side also has some artifacts https://www.shadertoy.com/view/mlXSz8
21:27 graphitemaster: I have a reduced test case here https://gist.github.com/graphitemaster/07f91e483de9b4458ca08f83ded64229
21:27 graphitemaster: Not sure if it's a compiler bug or the code is just faulty.
21:28 graphitemaster: The inconsistencies are problematic though and don't seem to make much sense.
21:40 robclark: DavidHeidelberg[m]: yeah, having them already exist in the nfsroot would help
22:02 DavidHeidelberg[m]: Plan to update mold from 1.9.0 to 1.10.0, please scream if you have any problems with mesa X 1.10.0 :) (it should boost linking performance by 10%) https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20819
22:03 DavidHeidelberg[m]: (don't want to say Mold is magical, since around 10% was lost with version 1.9.0, so this is more like getting back to previous perf)
22:08 DavidHeidelberg[m]: Rant: two magical commit names in the Mold project. "Simplify" and "Refactor"
22:12 psykose: moldy commit messages
22:20 jenatali: Is the Vulkan spec lying to me? It says that clearing a sscaled/uscaled format is supposed to use floats, but the CTS very clearly does it using ints
22:28 jekstrand: jenatali: Uh... That sounds like quite the edge case.
22:28 jekstrand: Why are you allowing them to create images with uscaled and sscaled formats to begin with?
22:28 jenatali: Because maintenance1 says I have to
22:28 jekstrand: No it doesn't
22:28 jenatali: Oh, wait, no you're totally right this was a bug somewhere else
22:29 jenatali: We have to support them for vertex elements and I think that was tripping up image support
22:29 jekstrand: jenatali: Incidentally, this just came up last week:
22:29 jekstrand: https://github.com/KhronosGroup/Vulkan-Docs/issues/1223#issuecomment-1379078493
22:33 jenatali: Yeah that looks better. Thanks for making me double-check what was going on
22:34 jekstrand: :)
23:22 jenatali: And it looks like that's most of my VK1.1 fails, huzzah