00:34 calico: Hello, I wanted to give the nouveau driver a try on some of my 10-year-old Nvidia GPUs. I have a GT 630 (Fermi), a GTX 650, and a GTX 760, all ASUS cards. The first issue I hit is that Nouveau was using DRI2 by default, which led to artifacts when switching windows in XFCE (I recorded a video if you devs would like to see it). This was easily solved by using DRI3 instead of the default. The second issue I found is that I'm unable to reclock the GT
00:34 calico: 630 (Fermi, GF108), even though, according to the docs, this feature should be "mostly implemented". And the third issue is that I see these kinds of "ACPI Error: AE_NOT_FOUND ..." messages in the BIOS, which according to some threads are related to using nouveau.
00:45 ermine1716[d]: ACPI Error messages are most likely not related to nouveau
00:49 ermine1716[d]: and you'll be better off buying newer cards, which have had the most effort invested
01:07 calico: last issue to point out: very bad 3D rendering perf both in video games and Geeks3D benchmarks, like 5 fps or less. See: https://imgur.com/a/MWIGQ6p
01:11 calico: https://imgur.com/a/MWlGQ6p
01:11 gfxstrand[d]: If your card isn't clocking up, that's why the perf is bad.
01:16 calico: that's for the GTX 760, ok so the current clock during that test was: 07: core 405 MHz memory 648 MHz
01:16 calico: I'll do a second test with the max clock:
01:17 gfxstrand[d]: There may be other reasons as well. The nouveau GL driver isn't exactly spectacular.
01:17 calico: according to those logs, now it's supposed to be clocked up: https://bpa.st/KFEA
01:17 calico: I'll post results in 2 mins
01:20 calico: way better ... 46 fps https://imgur.com/azpoeT4
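For readers following the reclocking exchange above: the clock line calico pasted ("07: core 405 MHz memory 648 MHz") has a regular shape, so it can be parsed mechanically. The sketch below is a minimal, hedged illustration. It assumes each performance-level line looks like the one quoted in the log, optionally with a core range such as "270-862"; the function name and the returned dictionary layout are hypothetical and not part of nouveau.

```python
import re

# Assumed line shape, based on the line quoted in the log:
#   "<id>: core <MHz or lo-hi MHz> MHz memory <MHz> MHz"
PSTATE_LINE = re.compile(
    r"^(?P<id>[0-9A-Za-z]{2}):\s+core\s+(?P<core>\d+(?:-\d+)?)\s+MHz"
    r"(?:\s+memory\s+(?P<mem>\d+)\s+MHz)?"
)


def parse_pstate_line(line):
    """Parse one performance-level line; return None if it does not match."""
    m = PSTATE_LINE.match(line.strip())
    if m is None:
        return None
    # A single clock "405" becomes the range (405, 405).
    lo, _, hi = m.group("core").partition("-")
    mem = m.group("mem")
    return {
        "id": m.group("id"),
        "core_mhz": (int(lo), int(hi or lo)),
        "memory_mhz": int(mem) if mem is not None else None,
    }


if __name__ == "__main__":
    print(parse_pstate_line("07: core 405 MHz memory 648 MHz"))
```

Comparing the parsed core clock before and after a reclock attempt is one way to confirm that the change actually took effect, which is what the 5 fps versus 46 fps results above suggest.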
02:09 calico: btw, what's the current status of NVK support for Kepler? also OpenGL 4.5 and 4.6 support?
02:11 orowith2os[d]: awilfox: might be able to answer that ^^
02:16 awilfox[d]: NVK on Kepler is probably not worth trying to use rn unless you are a dev and want to help fix it
02:16 awilfox[d]: I still have hope it will come "some day" but that day is def not today for normal usage
02:25 calico: ok
02:27 calico: > unless you are a dev and want to help fix it
02:28 calico: if no one else is working on it, and if I had the time and the docs, I'd do it
02:49 gfxstrand[d]: awilfox[d]: It kinda works right now, barely. I keep meaning to delete codegen support, though, and then it won't until someone adds NAK support. But also someone needs to implement Kepler image support all over again regardless.
02:49 gfxstrand[d]: And then there's probably a bunch of bug fixing.
02:51 gfxstrand[d]: The bad news is that Kepler has two different ISAs. The good news is that it has hardware scoreboarding so they're a bit easier than Maxwell and Volta.
02:53 gfxstrand[d]: The other good news is that "let's bring up a new ISA" is not a bad entrypoint for community developers as folks can kinda work on it at their own pace without conflicting with other developers.
02:54 gfxstrand[d]: Fermi should also be possible, in theory. It just has a different copy engine and IDK what the ISA situation is. I'd bring up Kepler first.
03:06 calico: specific GPUs under Fermi codename or all of them?
03:11 calico: according to techpowerup, there's no support for Vulkan on GF108 so ...
03:12 calico: but I don't know if it's just on prop driver level or on hw level too
03:22 gfxstrand[d]: That's just because Nvidia decided not to bother as far as I can tell.
03:23 gfxstrand[d]: I filed a couple issues for Kepler support. I have no plans to work on it but it gives people a starting point and a place to organize.
03:23 gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12531
03:33 mhenning[d]: gfxstrand[d]: I think fermi isa is very similar to kepler 1 with some minor changes in texture argument ordering and stuff
03:42 gfxstrand[d]: Yeah, texturing is different but that's in a NIR lowering pass so it's easy. If we can reuse kepler1 NAK support, that makes Fermi a lot easier.
08:40 karolherbst[d]: fermi has short encodings, which are a bit of a pain, but also not required
15:08 gfxstrand[d]: Ah, right. Fermi doesn't have bindless. That's why we don't want to support it.
15:09 gfxstrand[d]: And that's also why all the texturing instructions changed on Kepler.
15:41 gfxstrand[d]: And Kepler doesn't have images. I don't actually care too much about that one, though. It can just be a lowering pass that we only kick off on Kepler. That's easy enough to keep out of the way of the rest of the driver.
15:41 karolherbst[d]: kepler has images, just need to know the format at compile time
15:42 karolherbst[d]: same for fermi
15:42 gfxstrand[d]: Oh? So there's a dimensional load, just not a formatted one?
15:42 karolherbst[d]: yeah
15:42 gfxstrand[d]: Oh, that's not so bad then.
15:43 karolherbst[d]: suldb and sustgb/p are supported
15:43 gfxstrand[d]: We have `nir_format_convert.h` so patching in format conversion is easy. We just won't support `shaderStorageImage*WithoutFormat`.
15:43 karolherbst[d]: there are different sust ops for kepler and fermi
15:43 karolherbst[d]: so that's a bit weird
15:43 gfxstrand[d]: meh
15:43 gfxstrand[d]: that's fixable
15:43 karolherbst[d]: same for suld
15:44 karolherbst[d]: but yeah
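The idea discussed above (the hardware can load and store raw texels, and the format conversion is patched into the shader because the format is known at compile time) can be illustrated outside of NIR. The sketch below is only an illustration in Python, assuming an R8G8B8A8_UNORM image with R in the low byte; the function names are hypothetical and this is not the `nir_format_convert.h` API.

```python
def unpack_rgba8_unorm(raw):
    """Convert a raw 32-bit texel into four floats in [0, 1].

    Assumes R is stored in the lowest byte and A in the highest.
    """
    return tuple(((raw >> (8 * i)) & 0xFF) / 255.0 for i in range(4))


def pack_rgba8_unorm(rgba):
    """Convert four floats into a raw 32-bit texel, clamping to [0, 1]."""
    raw = 0
    for i, channel in enumerate(rgba):
        clamped = min(max(channel, 0.0), 1.0)
        # Round to nearest integer before placing the byte.
        raw |= int(clamped * 255.0 + 0.5) << (8 * i)
    return raw


if __name__ == "__main__":
    texel = 0xFF000080
    print(unpack_rgba8_unorm(texel))
    print(hex(pack_rgba8_unorm((1.0, 0.0, 0.0, 1.0))))
```

A compile-time lowering pass would emit the equivalent of `unpack_rgba8_unorm` after a raw load and `pack_rgba8_unorm` before a raw store, which is why a format that is only known at run time (`shaderStorageImage*WithoutFormat`) cannot be handled this way.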
15:44 karolherbst[d]: I think the issue with bindless and fermi is that fermi's samplers and images are strictly separate, not quite sure how that works out there
15:44 gfxstrand[d]: separate images and samplers is fine.
15:45 karolherbst[d]: I mean, you need two binding tables
15:45 gfxstrand[d]: Having binding tables at all is a problem.
15:45 karolherbst[d]: yeah.. though I'm not really sure how it works on fermi
15:45 gfxstrand[d]: Separate image/sampler doesn't make much difference there. Intel is all separate, too.
15:46 karolherbst[d]: ahh
15:46 gfxstrand[d]: It's all the code to scrape data out of the shaders and descriptor sets at draw time and build the table that I'm opposed to.
15:46 karolherbst[d]: mhhh
15:46 karolherbst[d]: okay.. so in regards to images
15:46 karolherbst[d]: fermi only has them in fragment and compute stages
15:47 gfxstrand[d]: Yeah, that's not gonna work...
15:47 karolherbst[d]: apparently, just going by the gallium driver there
15:47 gfxstrand[d]: Well, I guess we could maybe disable them. I think there's a bit for that.
15:47 karolherbst[d]: yeah.. not sure how useful that would be, might be enough for zink tho
15:47 gfxstrand[d]: We could just disable `vertexPipelineStoresAndAtomics`. IDK how much DXVK will hate us for that, though.
15:48 karolherbst[d]: gaming on fermi is entirely pointless
15:48 gfxstrand[d]: heh
15:48 karolherbst[d]: they reduced the boot clocks
15:48 karolherbst[d]: and we have no reclocking
15:48 karolherbst[d]: so most fermi GPUs have like 150MHz clocks or something
15:48 gfxstrand[d]: Okay, I'm just going to declare fermi dead then.
15:48 karolherbst[d]: yeah...
15:48 gfxstrand[d]: Between images, binding tables and clocks, there's no point.
15:48 gfxstrand[d]: You might as well run llvmpipe
15:48 karolherbst[d]: the only point I'd see is to be able to use zink instead of gallium
15:49 gfxstrand[d]: llvmpipe
15:49 karolherbst[d]: well.. that hogs your CPU
15:49 karolherbst[d]: maybe I should plug in a fermi and see how bad it really is
15:49 gfxstrand[d]: I don't think it's worth carrying a pile of pain in NVK just for ancient GPUs that barely work to begin with.
15:50 karolherbst[d]: yeah..
15:50 karolherbst[d]: and then there is the copy subchan pain as well...
15:50 gfxstrand[d]: Kepler is maybe interesting because it can reclock. Even that I'm skeptical about. I just don't care because, like Earth, it's mostly harmless.
15:51 karolherbst[d]: the only thing I can think of is the ISA mess and nil
15:52 karolherbst[d]: and maybe some custom code here and there as a fallout from the images descriptors being different
15:52 karolherbst[d]: ohh yeah.. another reason against fermi: entirely different compute interfaces
15:53 karolherbst[d]: doesn't even use QMD afaik
15:53 gfxstrand[d]: Yeah, I don't want to deal with that.
15:54 karolherbst[d]: would probably better to have it's own driver 😄
15:54 karolherbst[d]: *its
15:54 gfxstrand[d]: Though we're likely to end up with some differences for compute dispatch anyway. NVIDIA likes their inline QMDs and there's probably a reason for that.
15:54 karolherbst[d]: well.. it doesn't require memory reads
15:55 karolherbst[d]: when was that added?
15:55 karolherbst[d]: mhhh ohhh mhhh
15:55 karolherbst[d]: it still takes a buffer address...
15:55 karolherbst[d]: I wonder...
15:55 karolherbst[d]: pascal..
15:56 karolherbst[d]: maybe it's just an optimized upload interface or something
15:56 gfxstrand[d]: 🤷🏻‍♀️
15:56 gfxstrand[d]: It makes the QMDs show up in the pushbuf which is nice by itself.
15:56 karolherbst[d]: yeah
15:57 karolherbst[d]: I'm sure the command just writes to the buffer, though it might already cache it or something
15:57 gfxstrand[d]: Yeah. There's a QMD cache starting on Ampere, I think. It might go through that.
15:57 gfxstrand[d]: Right now we have to invalidate it at the top of every command buffer. IDK what the perf implications of that are.
15:58 karolherbst[d]: mhh.. yeah, probably not great
15:58 gfxstrand[d]: Maxwell B, actually
15:58 gfxstrand[d]: That was an entertaining bug to fix.
15:59 karolherbst[d]: sounds horrible tbh
16:00 gfxstrand[d]: "It looks like this test is executing the wrong shader. How is that possible? Am I seeing things? It really looks like that's what's happening. WTF? Arg.... Yup. That is what's happening. How? Why? WTH? :facepalm: Oh, there's a cache..."
17:07 karolherbst[d]: I'm hitting a bug where llvmpipe crashes its shader, when compiled with `NIR_DEBUG=print` ...
17:08 karolherbst[d]: apparently removing `validate_ssa_dominance` always triggers it, the bad part is, before `validate_ssa_dominance` was removed, it crashes with `NIR_DEBUG=print`, and the shader is equal... so I suspect something weird is going on there
17:12 gfxstrand[d]: Uh oh...
17:13 gfxstrand[d]: Dropping this here in case it's useful later: https://cfallin.org/blog/2022/06/09/cranelift-regalloc2/
17:14 karolherbst[d]: mhhh.. llvmpipe is the only driver supporting function calls, so maybe that messes something up.. but wild...
17:14 gfxstrand[d]: That's plausible
17:15 karolherbst[d]: apparently `LP_DEBUG=cs` prints it without crashing, and the shader indeed changes with validate_ssa_dominance
17:15 karolherbst[d]: ohhhhhhh wait...
17:17 karolherbst[d]: mhhh I wonder if that's related to `nir/functions: force inlining for barriers.` because the change is that something doesn't get inlined...
17:17 karolherbst[d]: and there are barriers
17:17 karolherbst[d]: 6714689613aebec164f55b7ba8db41c8261885ab
17:18 karolherbst[d]: I can see how ssa_dominance could mess with the code from that commit
17:19 karolherbst[d]: weird...
17:24 karolherbst[d]: should ask Dave later
17:49 mhenning[d]: gfxstrand[d]: Yeah, I read cfallin's blog - a lot of the cranelift work is pretty interesting
17:49 mhenning[d]: I asked him a while back if he had considered using the ssa register allocation algorithms from Bouchez or Hack's theses and he said he hadn't really considered them
17:50 mhenning[d]: but yeah, I think the way they verify their register allocator is especially interesting and is something I've wondered about implementing for NAK
17:53 gfxstrand[d]: Once I'm back in a headspace where I can think about RA, I'd like to do some more reading.
17:53 gfxstrand[d]: We really need something better for coalescing. IDK if the cranelift bundling thing is what we want but we need something.
17:54 gfxstrand[d]: Unfortunately, we have to deal with vectors, which cranelift doesn't. Vectors in your RA suck.
17:55 gfxstrand[d]: And I really wish I could blow them off as "Oh, it's just for texturing" but 64-bit integers are everywhere if you have SSBOs.
17:55 zmike[d]: 🤠
17:57 mhenning[d]: Yeah. We definitely need more work on coalescing. There are some more techniques from Colombet's thesis that might be useful and are designed for ssa allocators, which might be worth looking into
17:58 mhenning[d]: Bouchez has parallel copy motion which is interesting but I'm a little skeptical about
17:59 mhenning[d]: and also we have these vector and fixed reg heuristics already that only operate within a basic block and I'd like to extend to whole-program
18:00 mhenning[d]: but yeah, I'm not sure there's an obvious next step for that one
18:09 zmike[d]: if anyone has any zink/nvk bugs the rodeo has begun so make sure they're filed
18:24 calico: since you were talking about supporting NVK on Kepler, Fermi, ... what about older gens like Tesla?
18:24 gfxstrand[d]: That's definitely not happening.
18:25 gfxstrand[d]: Fermi probably isn't happening.
18:25 gfxstrand[d]: We don't even have class headers for Tesla
18:26 calico: I know it's not happening ... but theoretically, if there was enough time and enough documentation, would it be doable or would there be hardware limitations?
18:29 orowith2os[d]: Hardware limitations, you'd probably only get so far in supporting vk
18:29 gfxstrand[d]: Maybe? I mean, it has compute shaders.
18:29 gfxstrand[d]: 🤷🏻‍♀️
18:29 orowith2os[d]: I was thinking more in the case of older gens, Fermi and older
18:30 orowith2os[d]: Kepler is doable because I know the Nvidia driver supports 1.2 on it
18:31 gfxstrand[d]: Kepler is kinda reasonable.
18:32 gfxstrand[d]: In theory, anything with compute shaders should be able to. It may or may not pass conformance thanks to bugs but it should be possible.
18:32 gfxstrand[d]: The GLES 3.1 feature set is a pretty low bar in the desktop space.
18:36 mhenning[d]: zmike[d]: I'm running out of ideas for how to fix https://gitlab.freedesktop.org/mesa/mesa/-/issues/11901 , if you want to wrangle that one in your rodeo
18:41 zmike[d]: I'm currently trying to get weston to start
18:41 zmike[d]: for some reason enabling HIC usage causes nvk to try forcing host-visible memory type
18:41 zmike[d]: which is...not really what anyone wants
18:43 zmike[d]: and if I hack that out then trying to run anything in that weston explodes
18:43 zmike[d]: :stressheadache:
18:48 zmike[d]: does rebar still not work ?
18:55 tiredchiku[d]: :thonk:
19:01 gfxstrand[d]: zmike[d]: HIC?
19:01 gfxstrand[d]: zmike[d]: Depends on your GPU and motherboard. It should work on Ampere+ with most configurations.
19:10 zmike[d]: host image copy
19:10 zmike[d]: forcing a host-visible heap when that usage is requested is pretty bad
19:10 zmike[d]: you shouldn't be advertising support at all
19:12 gfxstrand[d]: Yeah, we shouldn't. I thought we had logic for that.
19:13 gfxstrand[d]: Ugh... That would mean disabling 1.4...
19:13 zmike[d]: you do have logic, and it removes all the device-local memory types
19:13 gfxstrand[d]: I meant logic for HIC
19:13 zmike[d]: so do I
19:14 gfxstrand[d]: Oh, I mean that we just shouldn't advertise HIC without ReBAR.
19:14 zmike[d]: ah
19:14 zmike[d]: yes
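The rule agreed on above (do not advertise host image copy unless ReBAR is available) can be written down as a small predicate. This is a hedged sketch in Python, not NVK code. It assumes that "ReBAR is available" can be approximated as "the mappable BAR is at least as large as VRAM", and both function names are hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def has_full_vram_mapping(bar_size_bytes, vram_size_bytes):
    """Assumption: ReBAR is in effect when the BAR can cover all of VRAM."""
    return bar_size_bytes >= vram_size_bytes


def should_advertise_host_image_copy(bar_size_bytes, vram_size_bytes):
    """Only expose the feature when every VRAM image can be CPU-mapped.

    Without this gate, the driver would have to force host-visible memory
    types for images that request the usage, which is the behavior being
    criticized in the discussion.
    """
    return has_full_vram_mapping(bar_size_bytes, vram_size_bytes)


if __name__ == "__main__":
    print(should_advertise_host_image_copy(256 * MIB, 8 * GIB))
    print(should_advertise_host_image_copy(8 * GIB, 8 * GIB))
```

Gating the feature at advertisement time keeps the memory-type selection for images unchanged, so device-local types never need to be removed to satisfy a usage flag.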
19:15 zmike[d]: but also at the top of `nvk_get_image_memory_requirements` you need to delete that workaround
19:15 mohamexiety[d]: zmike[d]: how does it explode? you mean removing this right? https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/nouveau/vulkan/nvk_image.c?ref_type=heads#L1091
19:16 zmike[d]: 1) immediate gpu hang 2) yes
19:17 mohamexiety[d]: yeah I remember removing this back then but gfxstrand[d] said we should keep it in case of issues. I wonder why it hangs though
19:17 zmike[d]: very easy to repro: `MESA_LOADER_DRIVER_OVERRIDE=zink weston -Bdrm --continue-without-input --xwayland -i0` over ssh while the machine isn't running a display server
19:17 zmike[d]: then try to run something in that xwayland
19:30 asdqueerfromeu[d]: gfxstrand[d]: Unless 🔺 moves to NVIDIA I guess
19:32 karolherbst[d]: the problem with tesla is, that their command buffer layout is entirely different, so yeah, that's not gonna happen 😛
19:32 gfxstrand[d]: Our memory heap/type code is so confusing. :blobcatnotlikethis:
19:34 gfxstrand[d]: In theory it looks like we try to always advertise a mappable VRAM heap, it just may be smaller if you don't have ReBAR. The problem is that the nouveau kernel driver will just give you bad maps if it runs out of BAR and it won't tell you.
19:34 karolherbst[d]: yeah.. that's not great
19:36 gfxstrand[d]: We also can't map arbitrary pieces of BOs to try and get around restrictions. We have to always map starting at the front. There's no good technical reason for this AFAICT but it's a core DRM restriction.
19:37 karolherbst[d]: mhhh...
19:37 karolherbst[d]: I mean...
19:37 karolherbst[d]: I can see some of the reasoning, but yeah...
19:38 gfxstrand[d]: But also, if we have to put the BO in the first 256MB of physical VRAM to map it, that wouldn't help much.
19:39 gfxstrand[d]: Nouveau should either fail to allocate if it can't place it or evict to system RAM if there's an active map.
19:39 gfxstrand[d]: But there's a real limit as to what we can do in NVK if the kernel is handing out bad maps.
19:41 karolherbst[d]: use a temporary signal handler to verify it's sane... 🙃
19:49 mhenning[d]: gfxstrand[d]: I thought the issue was that the kernel driver tries to evict and sometimes runs into bugs
19:50 gfxstrand[d]: mhenning[d]: We also have those bugs. I'm honestly not sure where all the bugs are. I just know memory pressure kills us.
19:51 karolherbst[d]: it's a pain on low VRAM cards...
19:51 karolherbst[d]: at some point push buffers just fail to execute
20:02 mhenning[d]: so, can we pester some of the kernel people to fix it?
20:05 gfxstrand[d]: This Weston thing feels weird, though. It sounds too reproducible to be the pressure issues.
21:52 airlied[d]: the only place I've seen bad maps recently was mapping tiled images in host memory via the bar
21:52 airlied[d]: I was writing test hacks for nova and noticed the test seemed to fail on nouveau as well
21:53 airlied[d]: the VRAM BAR should be able to map anything from anywhere in VRAM into it, just 256MB at a time
21:53 gfxstrand[d]: Oh, that might explain the Wayland issues. Is it trying to do HIC on dma-bufs?
21:53 airlied[d]: (by bad maps, I assume reads returning 0xbadxxxx)
21:55 airlied[d]: I'm just back home, so I'll have a bit of jetlag etc to get back
22:00 gfxstrand[d]: No worries. I'm still only barely here.
22:03 gfxstrand[d]: gfxstrand[d]: mohamexiety[d] this might be something to look into.
22:11 mohamexiety[d]: gfxstrand[d]: wait I am not sure I understand, sorry. some of the HIC images are going in sys mem?
22:14 airlied[d]: https://paste.centos.org/view/raw/df8bbfaa if you can dig through the noise in there to nv50_sysram_test, playing around with the kind stuff caused some weirdness
22:15 airlied[d]: the test just creates a coherent dma allocation and tries to map it into bar1 and read from it
22:20 gfxstrand[d]: mohamexiety[d]: No, it's that the tiling parameters that get passed on BO creation (PTE kind) for dma-bufs might be messing up maps.
22:21 mohamexiety[d]: hm I see. will look into that then
22:24 gfxstrand[d]: Also, if the image comes from some other client, we don't control the BO creation parameters so we can't safely map it anyway.
22:25 gfxstrand[d]: Really, we shouldn't be doing HIC on dma-bufs at all.
23:39 gfxstrand[d]: Firing up my space heater (CTS box) for the first time this year. :frog_party: