00:01gfxstrand[d]: Uh... does explicit sync work without atomic?
00:19prop_energy_ball[d]: Well, good compositors buffer latch on the CPU
00:19prop_energy_ball[d]: so yes
07:24jfalempe: Hi, can someone with nvidia tiling knowledge take a look at https://patchwork.freedesktop.org/patch/595262/?series=133963&rev=1 ?
07:25jfalempe: I don't know much about nouveau, nor the nvidia hardware, so it might not be the right way to do that.
08:45ahuillet[d]: aren't there helper functions already to avoid rewriting this?
09:07karolherbst[d]: normally we don't manually detile anything, because that's usually up to the GPU to do
09:09karolherbst[d]: jfalempe: you could also copy the entire content from linear to tiled via the 2D acceleration the GPU provides
09:10karolherbst[d]: but I guess the idea is also to use as little functionality as possible so you don't risk running into other issues
09:51jfalempe: karolherbst: yes, in a panic situation you try to do it directly and simply.
09:51jfalempe: also you can't sleep, or use queues/IRQs, so it might be hard to use the 2D accel.
09:58jfalempe: At the same time, it's tricky to get the tiling parameters that the hw is using, so if it's not too complex to use 2D, or even to disable tiling on the current framebuffer, that would make things a bit easier.
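For context, a minimal sketch of the kind of CPU-side address computation such a panic path needs, assuming the 64-byte x 8-row GOB layout publicly documented for NVIDIA block-linear surfaces (e.g. in the Tegra TRM). The helper name, the intra-GOB swizzle, and the parameterization are illustrative assumptions, not taken from the patch under review:

```c
#include <stdint.h>

/*
 * Illustrative sketch: byte offset of byte column x_bytes, row y in a
 * block-linear surface built from 64B x 8-row GOBs (512 bytes each),
 * with blocks that are one GOB wide and block_height_gobs GOBs tall.
 * The intra-GOB swizzle follows the publicly documented Tegra layout;
 * real hardware generations may differ.
 */
static uint32_t blocklinear_offset(uint32_t x_bytes, uint32_t y,
                                   uint32_t pitch_bytes,
                                   uint32_t block_height_gobs)
{
    const uint32_t gob_w = 64, gob_h = 8, gob_size = 512;
    uint32_t gobs_per_row = pitch_bytes / gob_w;   /* assumes aligned pitch */
    uint32_t block_h = gob_h * block_height_gobs;

    uint32_t block_x = x_bytes / gob_w;
    uint32_t block_y = y / block_h;
    uint32_t gob_y   = (y % block_h) / gob_h;

    /* GOBs are stacked vertically inside a block; blocks are row-major. */
    uint32_t gob_index = (block_y * gobs_per_row + block_x) *
                         block_height_gobs + gob_y;

    /* Swizzled byte offset inside the 512B GOB. */
    uint32_t xg = x_bytes % gob_w;
    uint32_t yg = y % gob_h;
    uint32_t in_gob = (xg / 32) * 256 + (yg / 2) * 64 +
                      ((xg % 32) / 16) * 32 + (yg % 2) * 16 + (xg % 16);

    return gob_index * gob_size + in_gob;
}
```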
10:20karolherbst: jfalempe: the tiling generally depends on the size of the texture
10:22karolherbst: anyway... I kinda want to see a proper library declaring tiling formats anyway (so one might be able to use tiled instead of linear for prime use cases and tile/de-tile in shaders), and we could also use that in the kernel to generate helpers like that, maybe
10:22karolherbst: using 2D acceleration on the nvidia GPU still requires you to have a proper GPU context allocated and stuff, but I don't know if the thing you are working on relies on that anyway
10:22jfalempe: each piece of hardware has its own tiling format, so you probably can't do much in common.
10:23karolherbst: that's the idea here
10:23karolherbst: you just declare the format
10:23karolherbst: then you can generate tiled -> linear or linear -> tiled code
10:23karolherbst: and use it inside shaders or wherever you want to use it
10:24karolherbst: and then intel GPUs could deal with nvidia tiled formats e.g. by just running a special shader or add a preprocessing stage inside shaders directly or whatever
10:24karolherbst: or might even be able to generate nvidia tiled -> intel tiled directly
10:24karolherbst: but for this to exist, we need something to declare those tiled formats in the first place
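As a rough illustration of that idea, a declarative description of a tiling scheme could look something like the sketch below. Every name and field here is hypothetical; the point is only that a generator (or a shader compiler pass) could consume such data and emit linear<->tiled address code instead of each driver hand-writing it:

```c
#include <stdint.h>

/*
 * Hypothetical, illustrative only: describe a tiling scheme as data so
 * that tile/de-tile address code can be generated from it (for CPU
 * helpers or shader lowering) instead of being hand-written per driver.
 */
#define TILE_BIT_FROM_X(n)  { .axis = 'x', .bit = (n) }
#define TILE_BIT_FROM_Y(n)  { .axis = 'y', .bit = (n) }

struct tile_bit_source {
    char axis;      /* 'x' or 'y' */
    uint8_t bit;    /* which bit of that coordinate feeds this address bit */
};

struct tile_format_desc {
    const char *name;
    uint32_t tile_width_bytes;   /* footprint of the smallest tile unit */
    uint32_t tile_height_rows;
    uint32_t num_addr_bits;      /* number of swizzled low address bits */
    struct tile_bit_source addr_bit[16]; /* addr_bit[i] = source of bit i */
};

/*
 * Example instance matching the 64B x 8-row GOB interleave sketched
 * earlier: address bits 0..8 are built from x0..x5 and y0..y2.
 */
static const struct tile_format_desc example_gob_64x8 = {
    .name = "example-64x8-gob",
    .tile_width_bytes = 64,
    .tile_height_rows = 8,
    .num_addr_bits = 9,
    .addr_bit = {
        TILE_BIT_FROM_X(0), TILE_BIT_FROM_X(1), TILE_BIT_FROM_X(2),
        TILE_BIT_FROM_X(3), TILE_BIT_FROM_Y(0), TILE_BIT_FROM_X(4),
        TILE_BIT_FROM_Y(1), TILE_BIT_FROM_Y(2), TILE_BIT_FROM_X(5),
    },
};
```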
10:25jfalempe: for the panic code, I'm not sure we can run shaders? Everything is done on the CPU currently.
10:26airlied[d]: Can the BAR mapping do some details?
10:26airlied[d]: Detile?
10:26karolherbst: jfalempe: nah, that's just an idea of mine for use in vk/gl
10:27karolherbst: the point is just that we don't have a place where those tiling formats are declared
10:27karolherbst: and I think we'd be able to use it for other purposes as well if that existed
10:28jfalempe: I tried to get some tiling info from the fb->modifier, but the documentation is not clear about that.
10:28karolherbst: yeah, and as I said, it doesn't exist yet properly in a way that's machine readable anyway
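For reference, the NVIDIA block-linear modifiers do pack a few parameters. Below is a minimal sketch of pulling out the fields most relevant to de-tiling, assuming the bit packing of DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() as documented in include/uapi/drm/drm_fourcc.h; the struct and helper are illustrative, not an existing API:

```c
#include <stdint.h>
#include <drm_fourcc.h>   /* uapi header, via libdrm or the kernel tree */

/*
 * Sketch: pull the de-tiling-relevant fields back out of an NVIDIA
 * block-linear modifier.  Bit positions follow the packing of
 * DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() in drm_fourcc.h; the helper
 * and struct are illustrative.
 */
struct nv_blocklinear_params {
    uint32_t log2_block_height_gobs; /* 'h': block height in GOBs = 1 << h */
    uint32_t page_kind;              /* 'k' */
    uint32_t gob_kind_gen;           /* 'g' */
    uint32_t sector_layout;          /* 's' */
    uint32_t compression;            /* 'c' */
};

static int nv_decode_modifier(uint64_t modifier,
                              struct nv_blocklinear_params *p)
{
    /* vendor lives in the top byte; bit 4 marks the BLOCK_LINEAR_2D encoding */
    if ((modifier >> 56) != DRM_FORMAT_MOD_VENDOR_NVIDIA ||
        !(modifier & 0x10))
        return -1;

    p->log2_block_height_gobs = modifier & 0xf;
    p->page_kind              = (modifier >> 12) & 0xff;
    p->gob_kind_gen           = (modifier >> 20) & 0x3;
    p->sector_layout          = (modifier >> 22) & 0x1;
    p->compression            = (modifier >> 23) & 0x7;
    return 0;
}
```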
10:31jfalempe: airlied: I think the proprietary driver keeps the uefi framebuffer available, so it can revert to that later. (that's why it can co-exist with efifb when switching VT).
10:32karolherbst: so if, while working on this, you come up with this "tiling format in some structured file format" -> "some tool" -> "generate code" pipeline, that could also help with other things.
10:32karolherbst: jfalempe: mhhh... maybe it's a good idea to make drivers do that in general
10:34karolherbst: would just have to reserve some physical memory and not use it for anything else
10:34jfalempe: But it has other drawbacks, like you may need to change the modesetting, which can be tricky in a panic situation.
10:34karolherbst: fair
10:35karolherbst: though I wonder if that actually matters
10:35jfalempe: and if you swap monitor at runtime after boot, you're a bit screwed.
10:35karolherbst: it's just physical memory after all, and you can reserve more than the firmware used
10:35karolherbst: like.. just reserve enough for 8k for good measure and leave that area alone
10:36karolherbst: though I wonder if the firmware framebuffer then would just cut off the content or whatever would happen then
10:36karolherbst: ah yes...
10:37karolherbst: anyway, I don't think that having the CPU "draw" something into the currently bound framebuffers is a bad idea. I just think that we might want to have a proper way of declaring those tiled formats long-term
10:37karolherbst: and be able to generate code for all kinds of purposes out of it
10:39jfalempe: but I couldn't even find any documentation about what the format is for nvidia.
10:40jfalempe: I just drew it on a GTX 1650, and tried things until it drew something readable on screen
10:40karolherbst: right... gfxstrand[d] might know, but yeah.. we should document the format somewhere
10:42jfalempe: also I don't know if the format is defined in the hw, or in the firmware (that means if you update the firmware, you may end up with a different tiling). Probably a bit of both.
10:43karolherbst: nah, that should be in hardware
10:53mohamexiety[d]: karolherbst: https://gitlab.freedesktop.org/mohamexiety/mesa/-/commit/01b183ec2769cc913941e8222c367157d92a4198#eb11a8743094c9debf12758c48121b85321d39fa_193_197 this is really old, I haven't pushed my changes since then but there's a little bit of documentation here based on stuff I gathered from poking around + the Orin manual
10:54mohamexiety[d]: I didn't change much in that comment block beyond block width being 64B rather than 32B (the 32B was a typo)
10:55mohamexiety[d]: er, woops, didn't notice it was irc. karolherbst: ^
11:01karolherbst[d]: mohamexiety[d]: no need to do that anymore 😄
11:01karolherbst[d]: the bridge translates pings and replies accordingly
11:01mohamexiety[d]: ohhh nice then
11:01airlied[d]: So formats with modifiers should be pretty well defined, some of the device specific ones on older hw have all sorts of craziness
11:02airlied[d]: Esp Intel had one that changed between single and dual channel ram
11:02karolherbst[d]: airlied[d]: why does it make sense
11:04karolherbst[d]: later you tell me that also cachelines and stuff worked differently depending on single or dual channel and I'd be like: "mhh yes, makes perfect sense"
11:08airlied[d]: https://fossd.anu.edu.au/linux/v2.6.38-rc5/source/drivers/gpu/drm/i915/i915_gem_tiling.c#L39
11:08airlied[d]: Ah the cursed comment
11:13karolherbst[d]: oh wow
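For anyone not clicking through: the comment describes bit-6 swizzling, where address bit 6 is XORed with higher address bits depending on the memory configuration (bit 9 on some setups, bits 9 and 10 on the dual-channel interleaved ones). A minimal sketch of the 9/10 case, assuming that description; illustrative, not the driver's actual helper:

```c
#include <stdint.h>

/*
 * Sketch of i915 "bit 6 swizzling" as described in the linked
 * i915_gem_tiling.c comment: on some old dual-channel configurations the
 * memory controller XORs address bit 6 with bits 9 and 10, so CPU code
 * touching tiled buffers has to apply the same transform to its offsets.
 */
static uint32_t swizzle_bit6_9_10(uint32_t offset)
{
    uint32_t bit9  = (offset >> 9)  & 1;
    uint32_t bit10 = (offset >> 10) & 1;

    return offset ^ ((bit9 ^ bit10) << 6);
}
```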
11:20ahuillet[d]: airlied[d]: I believe it may have been the case a decade ago, but not any longer. Not fully certain.
11:21ahuillet[d]: have we not released the internal utility library for blocklinear stuff? I could look into that, though the code style will probably not be appropriate for the kernel
11:22karolherbst[d]: as long as it's open source licensed, we can just restyle it anyway or something. But I think generally just having documentation on it might also be helpful.
11:23ahuillet[d]: I don't know if there /is/ documentation other than the reference implementation. I'll see what I can find (next week - on vacation currently)
11:23ahuillet[d]: surely openRM has something?
11:23karolherbst[d]: why would it?
11:24karolherbst[d]: there aren't really many situations where you actually need to know it
11:24ahuillet[d]: it never needs to detile anything?
11:24karolherbst[d]: besides the CPU drawing directly into the buffer
11:24karolherbst[d]: yeah
11:24karolherbst[d]: GPU drivers just draw the frames themselves usually
11:24karolherbst[d]: so you use 2D + the tiling config
11:25karolherbst[d]: at least that's also what Nouveau does
11:25ahuillet[d]: and RM doesn't draw anyway
11:25ahuillet[d]: I'll check what we have/can release. I'd still be surprised if there was nothing.
12:20zmike[d]: has anyone started looking at https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VK_KHR_dynamic_rendering_local_read.html
12:21zmike[d]: I guess I'll make a ticket
13:16tiredchiku[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1253337741937213500/kdelog.log?ex=66757d3b&is=66742bbb&hm=b012d7d98ff255183cc816bb45a585132df35b8aa8917ec7e2938d36eee743f7&
13:16tiredchiku[d]: plasma wayland seems to be failing on intel + nouveau prime setups with zink+nvk enabled (NOUVEAU_USE_ZINK=1 in the environment)
13:17tiredchiku[d]: x11 starts up fine, wayland also starts up fine if NOUVEAU_USE_ZINK is unset
14:23redsheep[d]: tiredchiku[d]: This sounds identical to the testing I did yesterday but I don't see the same stuff in the log so hard to be certain
14:23gfxstrand[d]: karolherbst: I've thought about trying to do something like this but I've never been able to convince myself it's enough of a win to bother. In a prime setup, most of your cost is PCIe and that's not really affected by tiling. Once you get it on system RAM, the only real benefit to tiled vs. linear is that sampling from tiled is a little more efficient. Unless you're REALLY pushing both GPUs hard,
14:23gfxstrand[d]: the compositing GPU probably has extra time anyway so meh. Meanwhile, in order to get it into Intel tiling, you've replaced the really efficient DMA engine with a compute shader which probably isn't going to be as efficient for the dGPU (the one actually under load) or have good access patterns across PCIe (the actually scarce resource). I just can't see it being fast.
14:24gfxstrand[d]: Otherwise, I would have written it years ago. 😅
14:24karolherbst[d]: fair enough 😄
14:25karolherbst[d]: though in the case of nvidia, it would allow us to drop this workaround in regards to linear depth buffers
14:26karolherbst[d]: potentially
14:27redsheep[d]: The comment about Intel tiling being channel dependent blows my mind... Why would that interleaving not just be abstracted away inside the memory controller?
14:27karolherbst[d]: it's about optimizing cache hits
14:31gfxstrand[d]: karolherbst[d]: Only in cases where we're guaranteed we know what the destination is and...
14:32gfxstrand[d]: redsheep[d]: That's exactly why it's only in older hardware. The memory controller got smarter on broadwell.
16:09gfxstrand[d]: What was truly cursed was really ancient Intel where they swizzled bit 14. For those of you reading along at home, a page is 4096B, i.e. 12 bits of address so swizzling wasn't just based on where you were within the page but based on where that page lived in *physical* memory.
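A short sketch of why a swizzle bit above 11 is so painful, per the point above: a 4096-byte page only covers address bits 0..11, so the extra XOR source has to come from the physical address of the backing page. The bit number and helper are illustrative assumptions:

```c
#include <stdint.h>

/*
 * Illustrative only: when the swizzle depends on an address bit above
 * bit 11, the offset within the 4 KiB page is not enough; the fixup
 * also needs the physical address of the page the object is backed by
 * (e.g. obtained via page_to_phys() in the kernel).
 */
static uint32_t swizzle_with_physical_bit(uint32_t offset_in_page,
                                          uint64_t page_phys,
                                          unsigned int high_bit)
{
    uint32_t bit9 = (offset_in_page >> 9) & 1;
    uint32_t high = (uint32_t)(page_phys >> high_bit) & 1;

    return offset_in_page ^ ((bit9 ^ high) << 6);
}
```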
16:10karolherbst[d]: how much can you hate your driver devs
16:11gfxstrand[d]: To be fair, this was back in the day where taking a 128MB or 256MB GPU carve-out at boot was pretty common.
16:13mohamexiety[d]: every time I hear about anything Intel HW related I feel they were following a list of the most cursed and out there ideas possible
16:13mohamexiety[d]: even Alchemist/Xe still has bits and pieces that make you go _why_
16:13karolherbst[d]: gfxstrand[d]: okay sure, but ...
16:15redsheep[d]: gfxstrand[d]: Is that not still a thing? I'm pretty sure on amd the memory you tell your bios to use for iGPU is not accessible to the cpu
16:16karolherbst[d]: yeah, but you can allocate things on top
16:16redsheep[d]: And you can tell it to use like 4 GB
16:16karolherbst[d]: that wasn't that common earlier
16:16karolherbst[d]: redsheep[d]: does it make any sense to use that much though? 😄
16:17redsheep[d]: If you're using your iGPU for gaming, yeah I think it can
16:17karolherbst[d]: I wonder why...
16:18redsheep[d]: Dunno, wonder how this will be handled on strix halo. 40 CUs is a whole lot to have on an APU
16:22redsheep[d]: Maybe they have removed the overhead or fixed whatever issue now, but I remember it being pretty common advice as of a few years ago to give it more allocation for games
16:24gfxstrand[d]: redsheep[d]: Not since like 15 years
16:24gfxstrand[d]: But it was a thing back in the i915/i815 days
16:25mohamexiety[d]: redsheep[d]: nah. you can set it to the lowest and it works fine
16:25mohamexiety[d]: it doesn't actually restrict the iGPU to the carve-out region
16:26mohamexiety[d]: from my understanding it's just a legacy thing for the boot process nowadays
16:26redsheep[d]: Ok, good to know. I thought it was something about games misbehaving but that's probably all fixed up
16:44gfxstrand[d]: redsheep[d]: Nah, except for a few embedded parts (like old raspberry pi), no one uses carve-outs anymore. The most they'll reserve is maybe 1-2 4K displays worth for emergency situations.
16:44gfxstrand[d]: Just about everything is allocated from the system memory pool
16:46redsheep[d]: Can somebody with some anv or radv hardware try to replicate my chromium/discord issue to get zmike his answer?
16:46redsheep[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11160#note_2461524
16:48redsheep[d]: It should really not be difficult to replicate. Launch discord with zink, put the system under load (I use Minecraft), then move your mouse over the server icons for a bit
16:48redsheep[d]: For that matter, can anyone replicate with NVK so we know it's not just me?
16:58asdqueerfromeu[d]: gfxstrand[d]: My laptop still has a 512 MB carveout
17:00gfxstrand[d]: But does it really?
17:00gfxstrand[d]: Like, it might report something but that's very different from actually partitioning physical ram
17:06asdqueerfromeu[d]: gfxstrand[d]: `[ 0.101554] Memory: 15674396K/16170372K available (18432K kernel code, 2164K rwdata, 13224K rodata, 3412K init, 3624K bss, 495716K reserved, 0K cma-reserved)`
18:41gfxstrand[d]: Yeah, the kernel is reserving some memory for *something*. That's very different from the old bios-separated GPU memory carve-outs
18:57karolherbst[d]: well.. even modern firmware has an option to pre-allocate RAM to dedicate it to the iGPU
18:57karolherbst[d]: I just don't know if anything actually makes use of it or not
19:15gfxstrand[d]: Not on Intel
19:15gfxstrand[d]: I'm pretty sure not on AMD
19:16gfxstrand[d]: Everything goes through GPU page tables, even the context and firmware memory.
19:17karolherbst[d]: all I know is, that I have this option in my uefi firmware and it's reducing the amount of RAM available to the OS. But if it's just wasted memory, then I wonder why that option is there in the first place
19:20mohamexiety[d]: I remember reading something somewhere that it's needed for legacy reasons for the BIOS?
19:21mohamexiety[d]: it's not needed for UEFI, but it is for CSM, is what I understood
19:21mohamexiety[d]: could be wrong though
19:42dadschoorse[d]: I think for amd apus the carveout can still perform slightly better than gtt
19:43dadschoorse[d]: something, something, iommu, but don't quote me on that
19:44pixelcluster[d]: iommu? I thought I heard something about TLB flushes not needing to happen on the CPU, but maybe I dreamt that up
19:45dadschoorse[d]: there was some iommu virtualization related bios setting that made the difference larger, I think
20:27HdkR: iommu address translation can have noticeable performance impacts if the iommu can't quite handle enough transactions from all the clients on the bus
20:28HdkR: Sad but true reality
20:40airlied[d]: yes on amd integrated the carveout has better perf due to some bus optimisations
21:51gfxstrand[d]: interesting
22:24prop_energy_ball[d]: dadschoorse[d]: Yes, iommu stuff. We disable iommu on Deck for that reason
22:25mohamexiety[d]: wait if it performs better, wouldn't you want it enabled?
22:35airlied[d]: they do have carveout, just no iommu
23:50karolherbst[d]: airlied[d]: I suspected the same on Intel, hence why I always maxed out the carveout 😄 Though on intel systems it's generally like 256 or 512MB and that's almost nothing
23:51airlied[d]: The Intel driver never takes advantage of it
23:51airlied[d]: Often the bios carveout is actually just the gtt size
23:52airlied[d]: There were some bioses that would steal 1GB or so of RAM. But I think that stopped at some point
23:53karolherbst[d]: yeah, fair
23:53karolherbst[d]: so then the question is: does a bigger gtt size even matter on intel