04:09 Armada: Found a potential nullptr dereference in the gallium nouveau driver on older GPUs: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c#L1494
04:09 Armada: not every GPU has a copy engine so screen->copy can be NULL which would cause a null pointer deref
04:10 airlied: Armada: indeed, what GPU have you seen it on?
04:13 airlied: ah fermi would be the only one
04:13 Armada: I haven't reproduced it on an actual system, I was just overriding the supported classes manually and if you try and use the NVC0_M2MF_CLASS class you'll get a crash because it's older than NVE4_P2MF_CLASS
04:15 airlied: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30421 should fix it
04:20 Armada: airlied: thank you for creating the MR :)
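A minimal sketch of the kind of guard Armada's report calls for, assuming a screen object whose copy-engine pointer may legitimately be NULL. The `demo_*` types and functions are hypothetical stand-ins, not the real nouveau structures, and this is not necessarily what the linked MR does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real nouveau types. */
struct demo_copy_engine {
   int oclass;
};

struct demo_screen {
   struct demo_copy_engine *copy; /* NULL when the GPU exposes no copy engine */
};

/* Returns true only when there is a copy engine that is safe to touch. */
static bool demo_screen_has_copy(const struct demo_screen *screen)
{
   return screen != NULL && screen->copy != NULL;
}

/* Reads the copy engine class, or returns `fallback` when there is none,
 * instead of dereferencing a NULL pointer. */
static int demo_copy_class_or(const struct demo_screen *screen, int fallback)
{
   if (!demo_screen_has_copy(screen))
      return fallback;
   return screen->copy->oclass;
}
```

Callers would check `demo_screen_has_copy()` once and pick a different path when it returns false.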
09:53 karolherbst[d]: airlied[d]: mind reviewing https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30374 ? The original bug is about a memory corruption somewhere, so this is just dealing with the fallout from that, might not even matter, but... realloc is a pain regardless
10:09 airlied[d]: Seems unlikely realloc will fail but yeah is fine to handle it
10:13 karolherbst[d]: yeah...
10:13 karolherbst[d]: from the bug: `kernel: __vm_enough_memory: pid: 2310, comm: gnome-shell, bytes: 136553893888 not enough memory for the allocation`
10:13 clangcat[d]: airlied[d]: Well yea memory allocation almost never fails
10:13 clangcat[d]: But yea it can
10:14 karolherbst[d]: it definitely ran out of VM space 😄
10:14 karolherbst[d]: things memory corruption can lead to
10:14 clangcat[d]: karolherbst[d]: Yea I mean it almost never happens but it can so it's not a bad idea just to handle it.
10:14 karolherbst[d]: some also disable over commitment
10:14 karolherbst[d]: but well
10:15 karolherbst[d]: handling here means it's failing to render and causes other issues
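For reference, the usual way to handle a failing `realloc` is to keep the result in a temporary so the old buffer survives. This is a sketch with a hypothetical `demo_array` type, not the code from the MR under review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array; the names are illustrative only. */
struct demo_array {
   int *data;
   size_t capacity; /* number of elements allocated */
};

/* Grows `arr` to hold at least `new_capacity` elements.
 * On failure the old buffer and capacity are left untouched, so the
 * caller can report the error without leaking or losing data. */
static bool demo_array_reserve(struct demo_array *arr, size_t new_capacity)
{
   if (new_capacity <= arr->capacity)
      return true;

   /* Reject element counts whose byte size would overflow size_t. */
   if (new_capacity > SIZE_MAX / sizeof(*arr->data))
      return false;

   /* Keep the result in a temporary: assigning straight to arr->data
    * would drop the only pointer to the old buffer if realloc fails. */
   int *grown = realloc(arr->data, new_capacity * sizeof(*arr->data));
   if (grown == NULL)
      return false;

   arr->data = grown;
   arr->capacity = new_capacity;
   return true;
}
```

As karolherbst notes, returning false here only moves the problem: the caller still has to decide what a failed render means.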
10:15 HdkR: What is that byte count? 127-ish GB?
10:15 karolherbst[d]: something like that
10:15 HdkR: Why does gnome-shell need that much :D
10:15 karolherbst[d]: because of nouveau
10:15 clangcat[d]: karolherbst[d]: But how does that work?
10:16 HdkR: I'm also curious, how does that work for 32-bit?
10:16 clangcat[d]: Like why does them using nouveau mean they need so much
10:16 karolherbst[d]: I think the code ended up doing `sizeof(some_struct) * corrupted_memory * 2` and `corrupted_memory` was `0xfe59f850`
10:16 HdkR: ah
10:16 karolherbst[d]: in a loop
10:17 clangcat[d]: karolherbst[d]: Cursed
10:17 karolherbst[d]: yeah..
10:17 karolherbst[d]: some linked list corruption or something
10:17 karolherbst[d]: though I don't see why that would fail, but all bets are off if you operate on random memory anyway
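The size computation karolherbst describes can be sketched with overflow checks. The 16-byte element size below is an assumption for illustration only (the real struct is not named in the chat). With it and a 64-bit `size_t`, `16 * 0xfe59f850 * 2` is 136553892352 bytes, about 127.18 GiB, which is close to but not identical to the logged 136553893888. With a 32-bit `size_t` the product does not fit at all, so an unchecked multiply would wrap around instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Multiplies a * b into *out. Returns false (leaving *out alone) if the
 * product would not fit in size_t. */
static bool demo_mul_size(size_t a, size_t b, size_t *out)
{
   if (a != 0 && b > SIZE_MAX / a)
      return false;
   *out = a * b;
   return true;
}

/* Computes elem_size * count * 2, the growth pattern mentioned in the
 * chat, and refuses instead of wrapping when the result overflows. */
static bool demo_grow_bytes(size_t elem_size, size_t count, size_t *out)
{
   size_t doubled;
   return demo_mul_size(count, 2, &doubled) &&
          demo_mul_size(elem_size, doubled, out);
}
```

An overflow check alone would not have caught the corrupted count on a 64-bit system, since the product still fits in `size_t` there; it only protects against wraparound.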
10:20 clangcat[d]: I mean 127gb alloc probably should fail
10:29 karolherbst[d]: depends
10:32 karolherbst[d]: but yeah, I guess most `malloc`/`realloc` impls will allocate memory in such a way it needs to be backed by RAM
10:56 asdqueerfromeu[d]: karolherbst[d]: Including GNU Libc? 🦬
10:56 karolherbst[d]: I have no idea
10:57 karolherbst[d]: I've checked `godbolt` and it fails around 8GiB, but no idea what it uses, might also disable over-commitment, so nobody causes a mess with swap
11:00 karolherbst[d]: apparently on my system malloc fails around 128GiB
11:00 karolherbst[d]: which roughly is my RAM + swap
11:01 karolherbst[d]: so I guess glibc does check how much memory you have available and at least errors when you try to allocate more than that
11:34 HdkR: karolherbst[d]: This shouldn't be glibc checking. It's mmap failing due to overcommitment limits. If you had more swap then it would be allowed.
11:35 karolherbst[d]: HdkR: it depends on what flags you pass to mmap
11:35 HdkR: Yea, glibc won't be passing in MAP_NORESERVE
11:35 karolherbst[d]: that's not the relevant flag even
11:35 karolherbst[d]: you can create an anon mapping just fine
11:36 HdkR: Isn't it? That's the one I use in FEX to allocate 256TB without getting ENOMEM or ENOSPC
11:36 karolherbst[d]: it just blows up your vm space until you use the mapped memory
11:36 clangcat[d]: karolherbst[d]: Well I know you can alloc more ram than the system has with swap. Along with that `malloc` and `realloc` often times will also allow you to allocate more than you have. As I remember a video from Jacob something showing most malloc won't actually show the memory as used til the program uses it.
11:37 clangcat[d]: https://www.youtube.com/watch?v=Fq9chEBQMFE
11:37 clangcat[d]: Found it
11:37 karolherbst[d]: HdkR: `MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE` should do the trick as well
11:37 clangcat[d]: :3
11:37 karolherbst[d]: or maybe that only works with `PROT_NONE`? But I think that's irrelevant
11:37 HdkR: As far as I'm aware, without NORESERVE, mmap will limit how large a singular mapped buffer is allowed from the overcommitment rules
11:38 HdkR: well, "buffer" is a tracked VMA region in the kernel at that level
11:39 clangcat[d]: HdkR: But yea it's probably mmap failing cause my example is 1GB at a time.
11:39 clangcat[d]: I imagine 127GB in one go just makes the OS say no
11:40 clangcat[d]: AND
11:40 clangcat[d]: I imagine GNOME/Nouveau is actually using that block of memory
11:40 karolherbst[d]: HdkR: ohh, looks like it fails when you use `PROT_WRITE`
11:41 clangcat[d]: uHHH I BROKE MY KEYBOARD
11:41 clangcat[d]: CAPSLOCK IS STUC
11:41 karolherbst[d]: clangcat[d]: :blobcatnotlikethis:
11:41 clangcat[d]: I PRESS IT AND THE LIGHT STAYS ON
11:41 clangcat[d]: WHAT xd
11:42 HdkR: Ah yes, because it doesn't need to swap read-only zero pages :P
11:42 karolherbst[d]: but yeah.. seems like it's mmap failing
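The experiment being discussed (same size, different `prot` and flags) can be reproduced with a small helper. This is a sketch assuming Linux and glibc; `demo_try_map` is a made-up name.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Tries to create a private anonymous mapping of `len` bytes and unmaps
 * it again right away. Returns 0 on success or the errno set by mmap. */
static int demo_try_map(size_t len, int prot, int extra_flags)
{
   void *p = mmap(NULL, len, prot,
                  MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
   if (p == MAP_FAILED)
      return errno;
   munmap(p, len);
   return 0;
}
```

Calling it with a length well above RAM + swap, once with `PROT_NONE`, once with `PROT_READ | PROT_WRITE`, and once with `PROT_READ | PROT_WRITE` plus `MAP_NORESERVE`, shows which combinations the kernel refuses. The outcome depends on the `vm.overcommit_memory` setting, the amount of RAM and swap, and, per HdkR's testing, the kernel version.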
11:44 HdkR: Also with some testing, looks like behaviour has changed around these overcommitment rules without NORESERVE. Somewhere between 4.18 and 6.2 it has changed on me
11:48 clangcat[d]: karolherbst[d]: Welp that was strange
11:49 clangcat[d]: had to power off computer and it still didn't turn off so had to hold the power button till it worked itself out XD
11:52 karolherbst[d]: HdkR: annoying :ferrisUpsideDown:
11:54 HdkR: Maybe I'll just assume it was a bug in old kernels and it was fixed. That'll make me feel better :P
11:55 clangcat[d]: Okay apparently it's an overheating issue thanks dell. There are better ways to convey this.
12:19 karolherbst[d]: gfxstrand[d]: you already implement the `UUID` stuff properly for nvk, didn't you?
12:27 karolherbst[d]: I might just copy the thing to nouveau, because apparently the android emulator uses that without checking if the driver even supports it
12:27 karolherbst[d]: (or expect those to fail)
12:27 karolherbst[d]: and mesa simply crashes
12:36 asdqueerfromeu[d]: karolherbst[d]: NVK has a more specific `deviceUUID` than RADV in my case
12:40 karolherbst[d]: isn't it just the pci address?
12:40 karolherbst[d]: ehh pci vendor + device id
12:40 karolherbst[d]: mhhhhhhh
12:40 karolherbst[d]: mhhhhhhhhhhhh
12:41 karolherbst[d]: I don't think that's the correct thing to do?
12:42 karolherbst[d]: though maybe it's fine?
12:45 karolherbst[d]: yeah.. I think using the vendor + device ID is something a driver shouldn't do, because it might be the same across multiple devices
12:46 marysaka[d]: yeah and it probably needs to be unique compared to the proprietary driver if I'm reading the spec right
12:46 karolherbst[d]: though it's fine to do for pipelineCacheUUID
12:46 karolherbst[d]: marysaka[d]: if we guarantee compatibility it might be fine
12:46 karolherbst[d]: `deviceUUID and/or driverUUID must be used to determine whether a particular external object can be shared between driver components, where such a restriction exists as defined in the compatibility table for the particular object type`
12:47 marysaka[d]: main issue is finding a way to have the id being immutable across reboot... so maybe the PCI bus info should come into play here?
12:47 karolherbst[d]: so if it's the same with nvidia and nvk, and an application can assume it's shareable and it works, it's all fine
12:47 karolherbst[d]: marysaka[d]: yeah, I think that's encouraged here
12:47 karolherbst[d]: `While VkPhysicalDeviceIDProperties::deviceUUID is specified to remain consistent across driver versions and system reboots, it is not intended to be usable as a serializable persistent identifier for a device. It may change when a device is physically added to, removed from, or moved to a different connector in a system while that system is powered down`
12:48 marysaka[d]: For mobile platforms, I think it's fine as the GPU isn't going to be present twice anyway
12:48 karolherbst[d]: right
12:48 marysaka[d]: (I do the same as NVK on panvk atm)
12:48 karolherbst[d]: I just want this to be figured out, before adding anything to the gl driver
12:49 karolherbst[d]: I'm also making use of this in rusticl to check if the GL and CL device is the same thing
13:12 ahuillet[d]: karolherbst[d]: Curious/offtopic, but how do you reconcile this with the mess that the Pentium unique ID thing was?
13:12 karolherbst[d]: ahuillet[d]: by making it a best effort thing
13:13 ahuillet[d]: I mean the privacy mess
13:13 karolherbst[d]: well.. dunno, but llvmpipe/lavapipe return `llvmpipeUUID`
13:14 karolherbst[d]: and `mesa` as the device_uuid
13:17 karolherbst[d]: soooo.. do we have some games which are very GPU (not CPU) bottle-necked with nvk?
13:21 gfxstrand[d]: karolherbst[d]: yup
13:22 asdqueerfromeu[d]: karolherbst[d]: That would be easier to know by fixing issue 336
13:29 gfxstrand[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1267836591028699248/Screenshot_20240730-0828302.png?ex=66aa3c56&is=66a8ead6&hm=b1f01fd6c23cbdd02c39bc98464c574a18fb587dc8209f8ed362d67559292b2e&
13:29 gfxstrand[d]: From a YouTube comment:
13:30 gfxstrand[d]: karolherbst[d]: do you know anything about those cards?
13:31 karolherbst[d]: nope
13:31 pixelcluster[d]: "aliexpress modded cards"? sounds like those scams where they flash a new vbios onto a 1050 Ti or whatever to make it say "3060"
13:31 triang3l[d]: karolherbst[d]: > an application can assume it's shareable and it works
13:31 triang3l[d]: Haha texture tiling goes txetuer tliin g
13:31 gfxstrand[d]: karolherbst[d]: Uh... Most of them? I've been benchmarking with The Witness and that's like 10% CPU
13:31 karolherbst[d]: pixelcluster[d]: yeah....
13:31 karolherbst[d]: gfxstrand[d]: see the discussion above, I have my doubts about the deviceUUID
13:32 karolherbst[d]: like atm, if you plug in the same GPU four times, all devices have the same deviceUUID
13:32 karolherbst[d]: and I'm not sure if that's according to the spirit of those UUIDs
13:33 karolherbst[d]: gfxstrand[d]: okay, I just want to know what's really impacted by shader performance, but maybe I'll just use zink and pixmark_piano, because there even a tiny change is measurable
13:33 karolherbst[d]: I just want something where I get reliable results when I improve the instruction wait counts 😄
13:34 ahuillet[d]: pixelcluster[d]: my reaction too
13:34 gfxstrand[d]: Yeah, pixmark is probably the thing to use.
13:35 gfxstrand[d]: pixelcluster[d]: If that's what they are, then they might work, they'll just be advertised as what they actually are.
13:36 ahuillet[d]: unless it's a Fermi which I think I've heard of
13:36 pixelcluster[d]: I mean in the end I don't think it'll be much different from the windows situation
13:36 karolherbst[d]: yeah.. nouveau doesn't care about the PCI id or whatever
13:36 ahuillet[d]: because it was easier to lie back then
13:36 karolherbst[d]: it's just using the chipset
13:36 karolherbst[d]: but that might also be modded, I have no idea :ferrisUpsideDown:
13:36 pixelcluster[d]: "it might work, but it also might not and who knows how reliable bug reports from these GPUs are"
13:36 ahuillet[d]: but if it's an "M" variant they mean mobile so what are they actually talking about. 🤷
13:36 gfxstrand[d]: Like, no, I'm not going to go out of my way to read vbios just to make scammers happy. But if you got conned, we'll probably still boot the card. 🤷🏻‍♀️
13:36 karolherbst[d]: I mean...
13:36 karolherbst[d]: it would be interesting to get one of those GPUs and check it out
13:37 karolherbst[d]: but I also don't know if "nouveau now supports fake nvidia GPU" is the headline I want to see
13:37 gfxstrand[d]: Yeah, same
13:38 gfxstrand[d]: karolherbst[d]: Hrm... That's a good point. I mean, stuff should be shareable but yeah maybe it's not a UUID at that point.
13:38 mohamexiety[d]: pixelcluster[d]: back during the mining craze, since it was really hard to get cards, some chinese OEMs were clever and started taking mobile GPU dies and putting them on a regular dGPU PCB and selling them
13:38 karolherbst[d]: gfxstrand[d]: I think radv uses the PCI address
13:39 tiredchiku[d]: mohamexiety[d]: was about to say the same thing
13:39 mohamexiety[d]: so e.g. you buy a RTX 3060M 6GB, which is a mobile exclusive GPU, and put it normally in the PCIe slot
13:39 karolherbst[d]: gfxstrand[d]: see `ac_compute_device_uuid`
13:39 mohamexiety[d]: I am not sure why the NVIDIA driver doesn't support these though.. as far as the driver is concerned, it should just be a laptop GPU so it should work 😮
13:39 gfxstrand[d]: If they did it properly, such a card should work.
13:39 mohamexiety[d]: yep
13:39 karolherbst[d]: maybe it should be PCI address + vendor/device ID?
13:40 karolherbst[d]: something something
13:40 karolherbst[d]: anyway, if you change nvk there, I'd just copy it to the gl driver
13:40 gfxstrand[d]: karolherbst[d]: Yeah, that makes sense to me. Wanna write a patch?
13:40 karolherbst[d]: yeah, I can do it, need to fix a gl driver bug anyway
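A sketch of the "PCI address + vendor/device ID" idea floated above, assuming the 16-byte UUID is simply packed from those fields. The struct, the field layout and the `demo_*` names are assumptions for illustration and do not describe what nvk, radv's `ac_compute_device_uuid` or the proprietary driver actually do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UUID_SIZE 16 /* same length as VK_UUID_SIZE */

/* Hypothetical description of what a GPU is and where it sits. */
struct demo_pci_info {
   uint16_t vendor_id;
   uint16_t device_id;
   uint32_t domain;
   uint8_t bus;
   uint8_t dev;
   uint8_t func;
};

/* Packs the IDs and the PCI address into a fixed byte layout, so two
 * identical cards in different slots end up with different UUIDs while
 * the value stays stable across reboots as long as the card stays put. */
static void demo_device_uuid(const struct demo_pci_info *info,
                             uint8_t uuid[DEMO_UUID_SIZE])
{
   memset(uuid, 0, DEMO_UUID_SIZE);
   uuid[0] = (uint8_t)(info->vendor_id & 0xff);
   uuid[1] = (uint8_t)(info->vendor_id >> 8);
   uuid[2] = (uint8_t)(info->device_id & 0xff);
   uuid[3] = (uint8_t)(info->device_id >> 8);
   uuid[4] = (uint8_t)(info->domain & 0xff);
   uuid[5] = (uint8_t)((info->domain >> 8) & 0xff);
   uuid[6] = (uint8_t)((info->domain >> 16) & 0xff);
   uuid[7] = (uint8_t)((info->domain >> 24) & 0xff);
   uuid[8] = info->bus;
   uuid[9] = info->dev;
   uuid[10] = info->func;
   /* Bytes 11..15 stay zero; they could hold a GL-vs-VK or UAPI bit. */
}
```

Writing the bytes out explicitly keeps the result independent of host endianness, which matters if two drivers are supposed to compare the values.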
13:40 ahuillet[d]: mohamexiety[d]: on Linux yes, on Windows isn't there something like an INF file that lists what is allowed?
13:41 gfxstrand[d]: karolherbst[d]: IDK that I trust GL/VK sharing without Zink, BTW. We make different tiling choices.
13:41 ahuillet[d]: karolherbst[d]: would it make sense to copy what the blob returns? I can look it up if it's not public
13:42 karolherbst[d]: ahuillet[d]: ahh maybe
13:42 karolherbst[d]: at least it would make sense to know what they are doing 😄
13:42 karolherbst[d]: so we can do the same or something different
13:42 mohamexiety[d]: ahuillet[d]: oh! maybe that's why then. I didn't know
13:42 gfxstrand[d]: ahuillet[d]: I see no reason why not. They're also required to check driverUUID and those won't match so it should be fine.
13:42 mohamexiety[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1267839805912387625/image.png?ex=66aa3f55&is=66a8edd5&hm=9304efd8efe170a52965c2cfab6827f52888aac3295bd1aca34973f81bb801b0&
13:42 mohamexiety[d]: here's an example of what one looks like btw. really old ad from aliexpress
13:42 ahuillet[d]: lol at "M", that looks like a "M" alright
13:43 tiredchiku[d]: 😅
13:43 tiredchiku[d]: have seen one of those irl too
13:43 tiredchiku[d]: just a few days ago at that
13:44 karolherbst[d]: gfxstrand[d]: mhhh... we can also add a bit for vk vs gl
13:44 karolherbst[d]: or vm_bind vs old UAPI
13:48 karolherbst[d]: anyway.. for GL sharing in rusticl I rely on the deviceUUID to be the same, because there is some "gimme the same device for this context" thing going on and I rely on the UUID to be sane and unique
14:05 triang3l[d]: Where does Nvidia store the actual compression metadata? I wonder how feasible sharing of compressed render target is
14:06 karolherbst[d]: in the page tables afaik
14:06 triang3l[d]: on AMD I think there's too much that's driver-dependent here, how you place the metadata planes in the BO, more specifically
14:06 karolherbst[d]: but doesn't really matter, because modifiers should reflect on that as well
14:06 karolherbst[d]: I think
14:08 triang3l[d]: (unless you require that all clients use something like a common library like libdrm for image placement calculations, but then you may still need some workaround stuff, like the intermediate depth buffer that I'm planning to use on TeraScale to fix mismatching depth and stencil pitches for small mips)
14:08 triang3l[d]: and also details like whether you use compression for mips
14:09 triang3l[d]: but ez with the page table compression switches probably
14:15 gfxstrand[d]: triang3l[d]: With modifiers, it should be totally feasible.
14:19 gfxstrand[d]: mohamexiety[d]: Thing is... I don't want to spend $300-500 USD on one. Like, if someone wanted to send me one, I'd be happy to plug it in and see if it works. But I'm not gonna spend piles of $$$ on a hacked up card I'll plug in once and never again.
14:20 gfxstrand[d]: Also, as much as I might be doing NVK in spite of NVIDIA in some ways, thoroughly pissing them off isn't really one of my career goals...
22:01 redsheep[d]: karolherbst[d]: If you want something native vulkan that is very highly GPU bound in nvk doom eternal is about as bound as it gets
22:02 redsheep[d]: Trouble is getting perfect consistency and not waiting around for it to load. Maybe you could capture and replay a scene though?
22:08 redsheep[d]: Vkmark probably has some quite GPU bound scenes as well
23:42 zmike[d]: karolherbst[d]: You could probably check out unigine stuff
23:42 karolherbst[d]: zmike[d]: afaik those are pretty memory heavy
23:43 zmike[d]: 🤔
23:43 zmike[d]: Furmark?
23:44 karolherbst[d]: I kinda liked to work on the pixmark_piano benchmark, as memory clocks matter 0, and doing some super tiny micro opt gave me a stable +0.2% improvement 😄
23:44 zmike[d]: Nice
23:44 zmike[d]: We must beat the blob.
23:44 karolherbst[d]: e.g. https://gitlab.freedesktop.org/mesa/mesa/-/commit/a61c388d077edf78321ee31c84b24c6cce24ccbc
23:46 zmike[d]: Micro indeed
23:46 karolherbst[d]: yeah.. same instruction count and everything
23:46 karolherbst[d]: just replacing a variable runtime instruction with a fixed one
23:46 karolherbst[d]: though i2i and f2f might even be fixed as well
23:46 karolherbst[d]: just higher latency
23:47 karolherbst[d]: anyway, on top of seeing those differences, the benchmark only wiggles by +-1 score if you run it a few times
23:47 karolherbst[d]: it's incredible how stable the results are
23:48 karolherbst[d]: anyway, those are the kind of benchmarks I'm after
23:51 zmike[d]: I've been in wsi jail too long to remember how fps works
23:53 zmike[d]: Not looking like I'll be out before end of year