IRC Logs of #dri-devel on irc.freenode.net for 2024-02-10

00:33 KungFuJesus: this is with the current version of rbdoom-3-bfg, btw, I was hoping that I could eke out a playable framerate on the pi5 despite the posted 8fps benchmarks early on
00:33 KungFuJesus: but since nvrhi support had been added, it seems to want a ton of vulkan extensions that it may or may not be making use of currently
00:34 KungFuJesus: one of which is the compressedBC textures, others are tessellation shaders and that dualsourceblend feature
00:34 KungFuJesus: not requesting that makes the fps HUD not show up but it gets in menu, haven't managed to get in game though
00:36 KungFuJesus: also a bunch of ray tracing stuff it seems to want, but again, maybe I can patch that out. I think part of the problem is the dev really wants to try to introduce whiz bang render features into the game, which is cool, but probably not suited for a meager v3d gpu
02:00 DemiMarie: Company: how big of a gap do you think there is between what LLVMpipe can do right now, and what a modern multi-core CPU with its wide SIMD units is actually capable of?
02:04 DemiMarie: Intuitively, I expect that one of two things will hold:
02:04 DemiMarie: 1. You are limited by CPU memory bandwidth, in which case the amount of power consumed will be high enough that even with a GPU users will complain about battery life on mobile.
02:06 DemiMarie: 2. You are limited by SIMD throughput, which is the case where a GPU would really help.
02:06 DemiMarie: Is this a reasonable assumption?
02:07 DemiMarie: This assumes that GTK wants to be nice with other applications and the user’s battery, which I presume is a decent assumption.
02:08 DemiMarie: Company: do you know of anybody in the thin client case who might be willing to sponsor some devs?
02:09 DemiMarie: (without either me or my company being involved)
02:40 Company: DemiMarie: llvmpipe will use all the cores you have if you have the right shaders
02:41 Company: like, I have an 8 core / 16 thread AMD and they're all pegged
02:46 Company: as for the limit, yes, you kinda have that correct - the bottleneck is (depending on use case and hardware involved) either memory bandwidth between cpu/gpu (which affects discrete GPUs more than llvmpipe of course), SIMD instructions, or plain memory bandwidth if the shaders are simple
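The memory-bandwidth versus SIMD-throughput split discussed above is essentially a roofline estimate. A minimal sketch, assuming made-up hardware numbers: the peak figures and the `attainable_gflops` and `bottleneck` helpers are illustrative only and are not measurements of llvmpipe or of any real CPU.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline estimate: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def bottleneck(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> str:
    """Name the limiting resource for a shader with the given arithmetic intensity."""
    memory_bound = bandwidth_gbs * flops_per_byte < peak_gflops
    return "memory bandwidth" if memory_bound else "SIMD throughput"


# Hypothetical CPU: 500 GFLOP/s of peak SIMD throughput, 50 GB/s of memory bandwidth.
PEAK_GFLOPS, BANDWIDTH_GBS = 500.0, 50.0

# A simple shader does little math per byte touched; a heavy one does a lot.
simple_shader = bottleneck(PEAK_GFLOPS, BANDWIDTH_GBS, flops_per_byte=0.5)
heavy_shader = bottleneck(PEAK_GFLOPS, BANDWIDTH_GBS, flops_per_byte=40.0)
```

With these assumed numbers the simple shader lands on the bandwidth side and the heavy one on the SIMD side, which matches the "simple shaders are bandwidth-bound" remark.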
02:47 Company: GPUs in the last 10 years got a lot faster in instructions, so if you write a shader that a modern desktop GPU can easily handle, a 10 year old Intel integrated GPU will bottleneck on it
02:48 Company: another issue is that there's the code on the CPU that generates the instructions for the GPU
02:48 Company: GPUs have gotten a lot faster relative to CPUs, so on old hardware generating the instructions is pretty much never the bottleneck for GTK
02:49 Company: and on modern discrete GPUs, generating the instructions is almost always the bottleneck
02:50 Company: that's why my rpi4 gets 30fps with like 30% CPU usage and a pegged GPU and my desktop gets 2500fps with 100% CPU and like 30-50% GPU
03:00 DemiMarie: Company: 2500fps is a trivial case IMO.
03:01 Company: what do you mean?
03:02 DemiMarie: It means your system is running an easy enough workload that it can go that fast
03:02 Company: it's the same workload
03:02 Company: my Radeon 6500 is just more than 100x faster than the rpi gpu
03:03 DemiMarie: “trivial” might not be the right word; what I meant is that your workload is very easy for your system to run, so you will bottleneck on things that would not be bottlenecks at realistic frame rates.
03:03 DemiMarie: or at least may bottleneck
03:04 Company: the point is that a different less powerful system will bottleneck on the same workload
03:04 Company: because gpus easily differ by a factor of 100x between systems
03:04 Company: while cpus are usually less than 10x
03:05 DemiMarie: So what I am wondering is how much llvmpipe and lavapipe are leaving on the floor, compared to how well they could be doing
03:07 Company: I don't know - I think there's quite a bit you can gain if you wanted to - but why would you optimize it for a 64 cpu system if such a system always has a gpu that runs circles around it
03:07 DemiMarie: As people here probably already know by now, I would much rather GPUs be designed for preemptive multitasking and process isolation, just like CPUs are. Sadly, they are not.
03:07 DemiMarie: Company: It only has a GPU if it is a client system.
03:07 Company: it's not like anyone runs llvmpipe on the Oak Ridge supercomputer
03:07 DemiMarie: Those systems the thin clients connect to probably have no GPUs
03:07 Company: yeah, but those thin clients also don't have much server CPU
03:07 DemiMarie: Because GPUs and cloud environments do not go well together at all.
03:08 Company: the thin client is a thin client because it's cheap, not because it wants to use a datacenter to render a display
03:08 DemiMarie: What do you mean by “there’s quite a bit you can gain if you wanted to”? Are you referring to scaling to large numbers of CPUs, or using a given number of CPUs more efficiently?
03:11 DemiMarie: Company: so the VDI world is actually a bit more complex than that.
03:13 Company: I mean making llvmpipe and GTK generate code that's better suited for CPU execution
03:13 DemiMarie: Are we talking a factor of 1.5, a factor of 10, or somewhere in between?
03:16 Company: depends on what you mean - llvmpipe (where I have no idea), GTK's old renderer, GTK's new renderer, GTK's drawing in general, or a combination
03:17 DemiMarie: Something that is not going to be abandoned
03:17 DemiMarie: So the new renderer and drawing in general
03:18 Company: the new renderer is about 5x slower on llvmpipe than the old one atm
03:18 DemiMarie: how much of that could be recovered?
03:18 Company: everything probably
03:18 DemiMarie: how much work would be involved?
03:19 Company: no idea, I didn't try yet
03:19 DemiMarie: if you could try that would be great
03:19 Company: like, it needs an optimization stage that makes it not do AA when everything is integers
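The "skip AA when everything is integers" idea can be sketched as a pixel-alignment check. This is only an illustration, not GTK's code: the `Rect` type and both helpers are hypothetical, and the sketch assumes axis-aligned rectangles with a uniform scale factor and no rotation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def is_pixel_aligned(rect: Rect, scale: float = 1.0) -> bool:
    """True when all four edges land on whole device pixels after scaling."""
    edges = (rect.x, rect.y, rect.x + rect.width, rect.y + rect.height)
    return all(float(edge * scale).is_integer() for edge in edges)


def needs_antialiasing(rect: Rect, scale: float = 1.0) -> bool:
    """Edges on pixel boundaries cover pixels fully, so AA would change nothing."""
    return not is_pixel_aligned(rect, scale)
```

A rectangle at x=10.5 needs AA at scale 1 but not at scale 2, where its edges fall on whole device pixels again.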
03:19 alyssa: Company: OOI, why Vulkan if you're never CPU bottlenecked when running at 144fps?
03:20 DemiMarie: Company: that would help GPU perf too, right?
03:20 Company: alyssa: because Vulkan is better with the weird things that GTK does, like dmabuf import/export and Wayland integration
03:20 alyssa: ah, sure, yeah
03:20 DemiMarie: Company: there are many old GPUs that will _never_ get support for Vulkan, so using Vulkan means those fall back to lavapipe
03:21 Company: alyssa: and because I learned Vulkan before GL and I like Vulkan more
03:21 DemiMarie: and notably Asahi doesn’t have Vulkan support yet IIUC
03:21 alyssa: no argument there
03:22 Company: alyssa: there's also the question about using compute for various things, though with GL extensions you can do that with GL, too - but it's almost always easier for me to write the code in Vulkan first and then add the GL codepaths later
03:22 alyssa: ESO-only VK seems legitimately nice to work with tbf
03:22 Company: ESO?
03:22 alyssa: EXT_shader_object
03:23 alyssa: no pipelines, no render passes
03:23 Company: I haven't looked at that at all yet
03:23 Company: I started this renderer's basics in 2017 when that wasn't a thing and then went from there
03:25 Company: fwiw, Vulkan is noticeably faster than GL with the new renderer - like the 2500 fps are Vulkan, new GL gets around 1600 (or 1800? don't remember)
03:25 alyssa: Very curious
03:25 alyssa: on AMD?
03:25 Company: I suspect it's some driver overhead because we're CPU bottlenecked
03:26 alyssa: (radeonsi vs radv?)
03:26 Company: but it could also be that I'm doing things wrong
03:26 Company: yeah, 6500XT
03:26 alyssa: Fascinating
03:26 Company: but it's faster on my Intel Tigerlake, too
03:26 alyssa: yeah it's just that AMD is the reference hardware ;P
03:26 Company: 950 vs 650 or something
03:28 Company: my Tigerlake is the laptop where CPU and GPU are both near 100% for this dumb benchmark I use
03:28 Company: one thing that both AMD and Intel get "wrong" sometimes is that for some shaders, GL wins
03:29 Company: I suspect SPIRV uses different optimizations than GLSL and I end up with worse gpu instructions
03:30 Company: what makes me think that is that I turned off optimizations in glslc and things got noticeably faster
03:30 Company: but I didn't look any further into it when that happened
03:31 Company: DemiMarie: the thing with Vulkan is that we can make it do complicated things fast for the best GPUs out there - and that can be a lot of fun
03:32 Company: DemiMarie: an example would be vector rendering openstreetmap and getting smooth 120fps zoom in/out
03:32 Company: can still fall back to texture tiles for slower hardware
03:35 Company: there's lots of stupid ideas, like zoom-out/zoom-in in text documents instead of scrolling
03:35 Company: but if you zoom out a 500MB document, that's a lot of text
03:36 Company: it's just that nobody ever tries that, because toolkit renderers are crap
03:39 Company: EXT_shader_object is only supported by lavapipe so far - so nothing I need to look at for a while
03:39 alyssa: radv just landed support
03:39 alyssa: there are open MRs for nvk and turnip
03:40 Company: I was about to say, if nvk can't even do it I'll lose faith
03:41 Company: I guess it's not a big benefit if you need fallback code with renderpasses for everything else anyway
03:45 alyssa: Company: nvk can do anything if you have Faith
06:16 DemiMarie: Company: I would expect zooming a huge document to re-render the text as needed.
06:29 Company: yeah
06:30 Company: and now imagine you zoom out to a text size of like 3px
06:31 Company: that's like 360 rows of text you need to render
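The 360-row figure works out if the viewport is 1080 pixels tall and each row is as tall as the 3px text size. Both are assumptions, since the log does not state the display size. As arithmetic, with a hypothetical helper name:

```python
def visible_text_rows(viewport_height_px: int, line_height_px: int) -> int:
    """Number of whole text rows that fit in the viewport."""
    return viewport_height_px // line_height_px


# Assumed 1080px-tall viewport, rows exactly as tall as the 3px text size.
rows = visible_text_rows(1080, 3)
```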
06:56 Company: DemiMarie: btw, 2 low-hanging fruit examples of things where software rendering could improve a lot:
06:57 Company: 1. damage region support isn't implemented, ie eglSwapBuffersWithDamage() doesn't work (on Wayland, no idea about X11)
06:57 Company: which means the compositor will blit the full buffer even if there's only a spinner rotating in a corner
06:57 Company: 2. there's no dmabuf implementation for software rendering
06:58 Company: which means everyone has to implement their own codepath for shm which means all the optimizations for dmabufs don't apply
06:59 Company: and there's no reason why such a thing shouldn't exist
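To put a rough number on the first point, the sketch below compares the bytes a compositor copies for a full-buffer blit against a blit restricted to the damaged rectangle. The buffer size, spinner size, 4-bytes-per-pixel format and helper names are all assumptions chosen for illustration, and the helper assumes the damage rectangles do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def bytes_copied(rects: list[Rect], bytes_per_pixel: int = 4) -> int:
    """Bytes touched when blitting the given non-overlapping rectangles."""
    return sum(r.width * r.height for r in rects) * bytes_per_pixel


# Hypothetical 1920x1080 window with a 32x32 spinner in the bottom-right corner.
full_buffer = [Rect(0, 0, 1920, 1080)]
spinner_damage = [Rect(1872, 1032, 32, 32)]

full_cost = bytes_copied(full_buffer)
damage_cost = bytes_copied(spinner_damage)
```

Under these assumptions the full blit moves about 8.3 MB per frame and the damage-only blit about 4 KB, a ratio of roughly 2000x.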
10:15 riteo: hiii!
10:18 riteo: Over at Godot we do a bunch of weird things to try and default to a chunky, dedicated device. During the GLX days we forked a bunch of times, initialized some contexts and detected the vendor for some priority calculation. Now, with EGL, it looks like we can use EGL_EXT_explicit_device, but I have a bunch of questions I can't find an answer for.
10:21 riteo: First of all, can we trust the device list order to be consistent? The spec does not say anything there so that might be a no, but as mesa does not support the device ID extension, I can't really see a good way to let us select a card based on our (or the user's) preference.
10:24 riteo: Second, why are SW renderers (or at least, LLVMPipe) excluded from the explicit device logic? Is it because it's not backed by a physical device?
10:25 riteo: (to be clear, by "explicit device logic" I mean the fact that it ignores them when creating a platform display with an EGL_DEVICE_EXT attribute)
10:26 riteo: uh right, one last clarification: the GLX approach relied on forking _and_ setting DRI_PRIME, which BTW I think also excludes LLVMpipe
11:40 apinheiro: KungFuJesus, mairacanal pointed me to the questions you were asking about v3dv
11:40 apinheiro: about dual src blend, as alyssa said, it is doable
11:41 apinheiro: it was asked by other emulators/ports too btw:
11:41 apinheiro: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10402
11:41 apinheiro: we don't have a clear idea of when we would work on it though
11:41 apinheiro: about tessellation shaders
11:41 apinheiro: although it was also requested by other ports
11:41 apinheiro: it is clearly not on our TODO list
11:42 apinheiro: technically the hw would be able to support it, but it would require a really big amount of work to get it done
11:44 apinheiro: about compressedBC
11:44 apinheiro: v3dv supports BC1, BC2 and BC3
11:44 apinheiro: but we can't enable that feature unless we support all (BC1 - BC7)
11:45 apinheiro: having said that, last time I tested the game you mentioned, RBDOOM-3-BFG, before the big revamp
11:45 apinheiro: it used a BC texture that we supported
11:46 apinheiro: so it worked fine if you skipped that check
11:46 apinheiro: I had an early plan to write a patch so it checked for formats supported instead of that feature, but didn't have time
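The difference between checking the feature bit and checking individual formats can be modelled as set tests. This is a sketch only: in real Vulkan code the feature is `textureCompressionBC` in `VkPhysicalDeviceFeatures` and the per-format query is `vkGetPhysicalDeviceFormatProperties`, whereas the sets and helper names below are hypothetical stand-ins, and which BC formats the game actually needs is an assumption.

```python
# Stand-in for per-format queries on a driver that supports only BC1 to BC3.
SUPPORTED_FORMATS = {"BC1", "BC2", "BC3"}
ALL_BC_FORMATS = {f"BC{i}" for i in range(1, 8)}


def feature_flag_available(supported: set[str]) -> bool:
    """The all-or-nothing feature can only be advertised if every BC format works."""
    return ALL_BC_FORMATS <= supported


def formats_usable(required: set[str], supported: set[str]) -> bool:
    """Per-format check: only the formats the application really uses must work."""
    return required <= supported


# Assumed for illustration: the application only loads BC1 and BC3 textures.
required_by_app = {"BC1", "BC3"}
```

With these inputs the feature-flag check fails while the per-format check passes, which is the situation described above where the game ran once the check was skipped.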
11:47 apinheiro: btw, at that time, before that big revamp, we were able to get ~8fps, or 11fps using a gfxreconstruct trace on the rpi4
11:48 apinheiro: it is surprising that now after the big refactoring on the game, rpi5 gets 8
11:48 apinheiro: although I guess that as part of that they added more effects and so on
11:49 apinheiro: btw, earlier, when I talked about the "big revamp" I meant the big revamp on RBDOOM-3-BFG
11:50 apinheiro: ah yeah, you already mentioned it: when they moved to use nvrhi
12:27 daniels: riteo: device order is absolutely not stable
12:27 daniels: and yeah, swrast just isn’t a device as such
12:42 alyssa: apinheiro: OOI what's hard about tess if you have hw?
13:16 riteo: daniels: thanks for the clarification!
13:16 riteo: I suppose that we can't implement a suitable improvement to the DRI_PRIME solution without the device ID, all right
13:17 daniels: riteo: sorry it’s maybe not what you wanted to hear
13:17 riteo: nah, it's perfectly fine
13:17 riteo: it was an actual thanks
13:18 riteo: this stuff is not really well documented
13:18 riteo: anyways, we already have the DRI_PRIME hack, which is awful but will do fine for now I think
13:18 riteo: so it's not really an issue
13:49 daniels: why do you need DRI_PRIME when you have explicit devices?
14:21 riteo: daniels: because we fork
14:22 riteo: I don't have much experience with this stuff, but it looks like opening a lot of contexts (for glGetString) is a bit problematic as drivers might crash and there's some memory increase due to static allocations
14:23 riteo: or at least this is what I got told
14:24 riteo: BTW this definitely applied in the past, as the whole DRI_PRIME thing uses forking and pipes exactly for this reason, as per the comments
14:24 riteo: Maybe things improved in EGL land?
14:28 riteo: FTR, the aforementioned hack was merged in early 2019, here: https://github.com/godotengine/godot/pull/25391
14:28 riteo: and the comments say this: `// Fork so the driver initialization can crash without taking down the engine.`
14:29 HdkR: Nice
14:30 riteo: HdkR: what's nice
14:32 daniels: i mean, drivers absolutely should not be crashing
14:32 HdkR: Using fork to avoid driver crashing :)
14:32 daniels: else GL would be kind of unusable
14:33 riteo: I think that the idea is to not crash for a device that we're not actually using
14:33 riteo: maybe a bad iGPU
14:34 riteo: HdkR: oh cool! TBC it's not my approach, you have to thank hpvb for that
14:34 HdkR: Ideal world is to poke at this with fork+execve to ensure sanity in thread state but I love the courage to use raw fork
14:35 riteo: HdkR: this stuff is done extremely early so there should be very little in the way, if I'm understanding properly what you mean
14:35 HdkR: Yea, the earlier the better there :)
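The fork-and-pipe pattern being discussed can be sketched as follows. This is not Godot's code: `probe_in_child` and the probe callables are hypothetical, and the sketch uses raw `fork` on a POSIX system rather than the fork+execve variant suggested above.

```python
import os


def probe_in_child(probe):
    """Run probe() in a forked child; return its string result, or None if the child died."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report the result through the pipe, then exit without running
        # the parent's cleanup handlers.
        os.close(read_fd)
        try:
            os.write(write_fd, probe().encode())
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: read until the child closes its end (normally or by crashing).
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    _, status = os.waitpid(pid, 0)
    exited_cleanly = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return data.decode() if exited_cleanly else None
```

A probe that aborts or raises only takes down the child, so the caller sees `None` and can move on to the next device. Calling this very early, before any threads exist, keeps the raw `fork` reasonably safe, which is the point made above.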
14:36 riteo: daniels: BTW I think that this might also be related to the funky nvidia drivers
14:36 riteo: again, I know extremely little so don't be afraid to correct me
14:36 riteo: I'm trying to learn as I go
14:37 riteo: so, are you proposing to just go enumerate each device, start a context and evaluate what device to choose all in one go?
14:37 riteo: BTW I can also ask hpvb to come here if that might be useful, as they definitely know their stuff better
14:40 riteo: I let them know about this discussion, so that they can join if they want
14:50 riteo: I'd still be interested to know more personally, though
14:55 KungFuJesus: apinheiro: are you saying you can get in game on an rpi5 with it?
14:55 KungFuJesus: or did you misinterpret what I said to mean I got in game? I didn't get past the menus
14:56 KungFuJesus: even with the latest version of mesa, it just brings down the window manager
14:57 KungFuJesus: it could be the reliance on the rt features but my bet is the tessellation shaders
14:57 KungFuJesus: on the distro distributed mesa, I got a kernel stack dumped to dmesg, though I didn't save it
14:59 KungFuJesus: but yeah, perhaps the best direction I can take is forking from the last commit pre nvrhi and trying to go from there
15:36 mairacanal: KungFuJesus, any chance you could provide us the dmesg log?
15:57 KungFuJesus: possibly, I can hit it over SSH but the display stack kind of requires a hard reboot. Let me try using the distro's mesa again (rpi os). I also don't recall offhand if it requires the default 16k page kernel or not, that may have played a bigger role
15:57 cheako: Where should I ask about the steamdeck getting hdmi@60hz on a tv? I have lots of details but keeping this message short.
16:03 KungFuJesus: mairacanal: trying to reproduce now, I'll send a pastebin if I manage to
16:05 KungFuJesus: yeah it seems like it's related to the default 16k page kernel
16:07 KungFuJesus: yep, here we go: https://pastebin.com/9fGYeSSz
16:07 KungFuJesus: so, with a standard page size kernel (probably 4k) the display manager just crashes and restarts
16:08 KungFuJesus: but with a 16k one, I get this, requires a hard reboot to get the display device working again
16:16 apinheiro: > <KungFuJesus> apinheiro: are you saying you can get in game on an rpi5 with it?
16:16 apinheiro: KungFuJesus, most of my replies were about the features missing
16:17 apinheiro: and then I mentioned that before that game ported their engine to nvrhi, I got it working on the rpi4
16:17 apinheiro: I haven't tested the game in some years
16:18 apinheiro: for example, before the port to nvrhi, the game didn't require tessellation shaders
16:18 KungFuJesus: ah gotcha, so at some point they injected some nvrhi features that prevented the nvrhi version from working, but at one point it did
16:25 mairacanal: KungFuJesus, did you compile this kernel yourself?
16:34 KungFuJesus: nope, stock rpi os kernel
16:34 mairacanal: and also, could you give me the steps to reproduce this issue? it can be in the #videocore room to avoid spamming here
16:39 KungFuJesus: sure
16:41 KungFuJesus: hmm, can't join that channel without ssl for some reason
16:41 uhaabergen: I am requesting the clearance of banlist and silent list, just to avoid bad traffic towards me in every day life again and again, and from all the channels, gpu related mintlinux debian and llvm, and favor against is that i do not enter the channels, with more than 50 lines a month of non insulting content , is that agreement fine to you?
16:58 uhaabergen: And the point behind that decision is: There is nothing that irritates my eyes anymore in current state of science or compute science, these things that move or irritate me are in the era of wars but those are not the channels to discuss those. So i lack proper reasons and motivation to troll or insult anyhow or even spend my free time here, cause i haven't got much either.
17:03 uhaabergen: From time to time i get those underground people coming to me extorting some code or bitcoin solutions and it's fair to say i do just not respect such extertion in response to also excess humiliations and such, cause i only had some ideas, i doubt i am superior to you also btw.
17:14 uhaabergen: the thing is i chose to deal with respectful and nice behaving people which you wanted not to offer here officially, so this gone i am not making agreements with underground extorting persons at all with a long time strategy already placed there, and it has not gone through any changes or change of plans. But i am not sure if i have all the solutions or am i ill, i tend to think i do have them yes, but it's just common sense if you want
17:14 uhaabergen: something to work, you gotta behave or pay for the job, at least either one of those, with a terror kind behavior i do not carry out negotiations at all anyways.
17:15 Ermine: Who is op here?
17:21 zmike: sadly still not enough people to be truly effective
17:26 uhaabergen: IF you want to know about me, i am clearly not someone you think i am, god father or underground boss, i was a player, now i am a bit smaller player, cause i have not got the leg strengths for years in a row, but all my games with fair salary are taped and are official, i do not deal in underground world , it's your field or someones elses, i do not do that.
17:31 uhaabergen: which is to say, i was never active in the gangster boss and bosses of mob bosses world, even during my lifetimes big 90s conflict, i was then only a table tennis athlete similarly but not yet injured only then .
17:33 uhaabergen: and estonian nation has nothing to do with me either, i have many friends here, but estonians are not russians either, they are mixed with all other nations today yes, but language is different and bases come from central europe as many times said, europes big migrations, those that originally fought the land here, were some clans of huns likely
18:18 riteo: what are you talking about
18:58 DemiMarie: Company: would it be okay if software-rendered dmabufs are pinned system memory and provide an unmap notification? The reason is that I want to be able to export the underlying memory using Xen grant tables.
19:10 Company: DemiMarie: that's outside of my expertise
19:10 Company: I'm looking at it from a client perspective
21:53 daniels: riteo: crashing on enumeration or initialisation is obviously unacceptable, even if it is a driver without suitable hardware present. if you do see that happening please file it and escalate as hard as you can
21:54 daniels: I don’t believe Mesa ever does it, and if proprietary drivers do then that is certainly fixable - forking to work around it is a joke tbh
21:54 daniels: (if you need help escalating within NV, just ask here)
22:54 riteo: daniels: I see, thanks a lot for explaining this further; this is precious info and your contact is extremely valuable too :D
22:54 riteo: I'll let hpvb know, as they're the author of the original forking thing