00:07fdobridge: <airlied> @gfxstrand if you are around care to give an ack or rb for the two ioctls I'm okay with how the userspace is going to look and I'd like to just merge them since they are trivial
00:08fdobridge: <airlied> just a verbal ack/rb here is fine
00:14fdobridge: <gfxstrand> ack
00:15fdobridge: <gfxstrand> Actually, RB
00:15fdobridge: <gfxstrand> Let me know what branch to update the headers from
00:27fdobridge: <airlied> they are in drm-fixes now
00:30fdobridge: <karolherbst🐧🦀> @airlied do you have any WIP branch or so where you started porting the gl driver to `VM_BIND`? Otherwise I'll just look into it next week and see how far I get.
00:45fdobridge: <airlied> nah I got as far as seeing it needed libdrm_nouveau to be rewritten, and wished we'd just pulled that into mesa 6 months ago like I suggested, and gave up 🙂
00:48fdobridge: <karolherbst🐧🦀> I'm mostly sure it takes more than just rewriting libdrm_nouveau :ferrisUpsideDown: not quite sure we can just use the existing API for that?
00:48fdobridge: <airlied> primarily I didn't want to rewrite util/vma.c in libdrm_nouveau
00:48fdobridge: <karolherbst🐧🦀> fair
00:49fdobridge: <karolherbst🐧🦀> porting to `VM_BIND` also requires changes to command submission and stuff, right?
00:50fdobridge: <airlied> yes you'd have to move to the new exec API, though it might be possible to hack support on top of the old one
00:50fdobridge: <airlied> since you aren't doing anything fancy, just 1:1 allocating bo and vm space
00:51fdobridge: <karolherbst🐧🦀> mhh yeah...
00:51fdobridge: <karolherbst🐧🦀> I have _some_ ideas on how to make this all not painful, but I'd have to see when I'll work on it
00:51fdobridge: <airlied> I'm also not sure how clean it would be to abstract old and new support at once
00:51fdobridge: <airlied> step one, just get past whatever block you have, and merge libdrm_nouveau into mesa
00:52fdobridge: <karolherbst🐧🦀> it's probably easier to just reimplement it...
00:53fdobridge: <karolherbst🐧🦀> but yeah.. I haven't really thought about compat yet
00:57fdobridge: <airlied> it's easier to import and then remove all the unneeded bits if you want to keep stuff working
00:58fdobridge: <karolherbst🐧🦀> maybe, but the unneeded bits are also like 80% of the code
00:59fdobridge: <karolherbst🐧🦀> and I like the new code we have in mesa much better anyway
01:00fdobridge: <karolherbst🐧🦀> but I'll see...
01:00fdobridge: <karolherbst🐧🦀> we'll probably need some kind of indirection for the new/old stuff anyway
01:00fdobridge: <karolherbst🐧🦀> like what radeonsi and/or r600 was doing?
01:05fdobridge: <airlied> the problem is a bunch of the NVIF and abi16 stuff is all intertwined
01:10fdobridge: <gfxstrand> Anyone know how I'm supposed to WFI the DMA engine?
01:13fdobridge: <airlied> doesn't WFI WFI the DMA engine?
01:14fdobridge: <gfxstrand> Ah, 6F has a WFI
01:18fdobridge: <karolherbst🐧🦀> I know
01:19fdobridge: <karolherbst🐧🦀> I already implemented it for nvk once on top of the old uapi, it's not _that_ bad
01:19fdobridge: <karolherbst🐧🦀> but anyway, if we want to support VM_BIND we need some kind of wrapper anyway
01:28fdobridge: <airlied> yeah I was thinking of just adding the vma handling and static binding on bo allocation in libdrm_nouveau, but then I realised I didn't have vma.c and ran away
01:31fdobridge: <gfxstrand> So copy+paste vma.c?
01:31fdobridge: <gfxstrand> There might already be something in libdrm for AMD
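(For reference, "vma.c" here is Mesa's `src/util/vma.c` VA-range allocator. Below is a minimal sketch, not NVK's actual winsys code, of how a winsys might use that heap to hand out GPU VA for 1:1 binds at BO allocation time; the heap bounds are made up and the bind call itself is only a placeholder comment.)
```c
/* Sketch only: use Mesa's util/vma heap to carve out GPU VA for a BO at
 * allocation time.  Heap bounds and the bind ioctl are placeholders. */
#include <stdint.h>
#include <errno.h>
#include "util/vma.h"

struct ws_device {
   struct util_vma_heap va_heap;
};

static void ws_device_init_va(struct ws_device *dev)
{
   /* Reserve [4 KiB, 1 TiB) for buffer VA; the exact range is made up. */
   util_vma_heap_init(&dev->va_heap, 0x1000, (1ull << 40) - 0x1000);
}

static int ws_bo_assign_va(struct ws_device *dev, uint64_t size,
                           uint64_t align, uint64_t *va_out)
{
   uint64_t va = util_vma_heap_alloc(&dev->va_heap, size, align);
   if (va == 0)
      return -ENOMEM;
   /* A real driver would issue the VM_BIND ioctl here for [va, va + size). */
   *va_out = va;
   return 0;
}

static void ws_bo_release_va(struct ws_device *dev, uint64_t va, uint64_t size)
{
   /* A real driver would unbind first, then return the range to the heap. */
   util_vma_heap_free(&dev->va_heap, va, size);
}
```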
01:31fdobridge: <gfxstrand> Yeah, 6F_WFI isn't enough. 🙄
01:32fdobridge: <airlied> no import libdrm_nouveau into mesa 🙂
01:32fdobridge: <gfxstrand> That works too. Marek has an MR
01:32fdobridge: <airlied> I actually have a libdrm_nouveau MR from a while back
01:32fdobridge: <airlied> or maybe just a branch
01:32fdobridge: <gfxstrand> But also, yes, libdrm_driver needs to die in general
01:33fdobridge: <airlied> we don't want all of libdrm for this, just the pointless driver abstractions
01:33fdobridge: <karolherbst🐧🦀> I think it's just pointless to move, because after moving I'd rewrite everything anyway
01:33fdobridge: <gfxstrand> We learned this all with Intel long ago
01:33fdobridge: <karolherbst🐧🦀> none of the old code will remain
01:33fdobridge: <gfxstrand> I mean, if you'd rather just rewrite it to not use libdrm, that works, too
01:33fdobridge: <karolherbst🐧🦀> there is really no value in any of the code there
01:33fdobridge: <karolherbst🐧🦀> yeah.. as I said, I already did it for nvk
01:34fdobridge: <airlied> https://gitlab.freedesktop.org/airlied/mesa/-/commits/nouveau-libdrm-nouveau-import/?ref_type=heads does it
01:34fdobridge: <karolherbst🐧🦀> gl won't need much more than that
01:34fdobridge: <airlied> if you want to interoperate with old uAPI it will need a bunch more though
01:34fdobridge: <karolherbst🐧🦀> 80% of the code in libdrm_nouveau is for the DDX
01:34fdobridge: <karolherbst🐧🦀> and only the ddx
01:34fdobridge: <airlied> and I don't really want to clutter up the nice nvk winsys with BO tracking 😛
01:34fdobridge: <karolherbst🐧🦀> why?
01:34fdobridge: <karolherbst🐧🦀> I wrote the code for nvk against the old uapi
01:35fdobridge: <airlied> yeah and we removed a bunch of it once we got the new one
01:35fdobridge: <karolherbst🐧🦀> yeah
01:35fdobridge: <karolherbst🐧🦀> but I can also copy from the initial version 😛
01:36fdobridge: <gfxstrand> Also, FYI: I'm not sure I really like how nouveau_ws_device/bo/context stand today. I'm going to rework them one more time at some point in the future. So IDK that it's worth trying to share them.
01:36fdobridge: <karolherbst🐧🦀> yeah, I'm not going to force to share it
01:36fdobridge: <karolherbst🐧🦀> I can also just copy the code I'd need
01:36fdobridge: <gfxstrand> Unfortunately, I don't have a specific list of grievances or I'd have reworked them already. 😝
01:37fdobridge: <gfxstrand> Ugh... the DMA engine really doesn't want to WFI...
01:37fdobridge: <karolherbst🐧🦀> what I might end up sharing is some of the arch specific properties
01:37fdobridge: <karolherbst🐧🦀> stuff like `max_warps_per_mp_for_sm`
01:38fdobridge: <karolherbst🐧🦀> though.. GL doesn't really need that I think? Mhhh...
01:38fdobridge: <karolherbst🐧🦀> we never resize the tls buffer in gl...
01:38fdobridge: <karolherbst🐧🦀> we just allocate a really big one
01:39fdobridge: <karolherbst🐧🦀> but anyway... any time I look at the libdrm_nouveau code I get lost, because of that ioctl indirection nonsense going on there...
01:40fdobridge: <gfxstrand> Yeah, go ahead and share nv_device_info
01:40fdobridge: <gfxstrand> That one's fine
01:41fdobridge: <karolherbst🐧🦀> I think I just dislike the libdrm_nouveau code so much, I'd rather not see it inside mesa at all :ferrisUpsideDown:
01:42fdobridge: <gfxstrand> I dislike most of src/gallium/drivers/nouveau enough that I'd rather not see it in Mesa. 😛
01:42fdobridge: <karolherbst🐧🦀> 😄
01:42fdobridge: <gfxstrand> 🌶️ -take, I know
01:47fdobridge: <gfxstrand> I wonder if I have to use semaphores on my DMAs
01:48fdobridge: <gfxstrand> That feels clunky
01:49fdobridge: <gfxstrand> But also maybe what I need to do? 🤷🏻♀️
01:49fdobridge: <karolherbst🐧🦀> try `NON_PIPELINED`?
01:50fdobridge: <karolherbst🐧🦀> or what DMA are you using anyway?
01:50fdobridge: <gfxstrand> Already NON_PIPELINED
01:50fdobridge: <gfxstrand> And FLUSH
01:51fdobridge: <karolherbst🐧🦀> is it b5 you are using or something else?
01:51fdobridge: <gfxstrand> Yup, b5
01:51fdobridge: <karolherbst🐧🦀> mhhh...
01:52fdobridge: <gfxstrand> Drp... I'm an idiot
01:52fdobridge: <karolherbst🐧🦀> wrong buffer?
01:59fdobridge: <gfxstrand> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27757
01:59fdobridge: <gfxstrand> Worse
02:00fdobridge: <karolherbst🐧🦀> oof
02:03fdobridge: <redsheep> Would this affect performance? I am working on figuring out how to build mesa and use the devenv thing right now, I was going to test !27755 but it's slow going because steam makes the lib32 situation complicated...
02:06fdobridge: <gfxstrand> Oh, the above? No. It only affects a rare hang on device destroy which really only matters if you're running 18 threads of CTS and DOSing nouveau's hang recovery.
02:06fdobridge: <redsheep> Ah, okay. Maybe !27755 will be interesting to play with then
02:06fdobridge: <gfxstrand> This one might affect perf, though: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27755
02:06fdobridge: <gfxstrand> Oh, yeah, sorry... I didn't know what 27755 was until I linked it. 😂
02:08fdobridge: <redsheep> Either it doesn't affect performance in talos, or specifying the icd path to the devenv json from my build just doesn't work for some reason, or my build doesn't actually include !27755 for some reason
02:09fdobridge: <redsheep> And I have no idea which it is 😦
02:11fdobridge: <gfxstrand> Oh, it very well may not change perf
02:12fdobridge: <redsheep> I really wish there were better docs for all this, but I guess I shouldn't expect a doc called "5 easy steps to become a driver developer" to exist
02:13fdobridge: <gfxstrand> *sigh*
02:13fdobridge: <gfxstrand> Yeah...
02:14fdobridge: <redsheep> Assuming talos is a 64 bit application I do think it's likely I did it all right and that performance is no different
02:16fdobridge: <redsheep> How does the icd path mix with zink? Can I just point VK_ICD_FILENAMES at my devenv icd and expect zink to pick it up?
02:17fdobridge: <redsheep> Suppose that doesn't help if I am actually testing zink but I assume I just set LD_LIBRARY_PATH for that and it will work too
02:17fdobridge: <gfxstrand> Yup
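(For anyone following along, the setup being described looks roughly like the sketch below. The devenv ICD filename, build paths, and install prefix are assumptions; use whatever your Mesa build actually generated.)
```sh
# Point the Vulkan loader at the devenv ICD json from the Mesa build tree
# (exact filename/path depends on the build; this one is an example).
export VK_ICD_FILENAMES=$HOME/mesa/build/src/nouveau/vulkan/nouveau_devenv_icd.x86_64.json

# For testing zink built from the same tree (per the LD_LIBRARY_PATH idea
# above; assumes the build was installed into a local prefix).
export LD_LIBRARY_PATH=$HOME/mesa-prefix/lib64:$LD_LIBRARY_PATH
export LIBGL_DRIVERS_PATH=$HOME/mesa-prefix/lib64/dri   # may also be needed for GL
export MESA_LOADER_DRIVER_OVERRIDE=zink

# Quick sanity check of which Vulkan driver the loader actually picked up.
vulkaninfo --summary | grep -i driver
```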
02:28fdobridge: <redsheep> Ok I managed to confirm I am running against my own build through the dxvk hud, and I fetched the branch from !27755 so I don't think I messed anything up, it's just not making a difference in talos then
02:37fdobridge: <mhenning> `vulkaninfo | grep git` should print out the exact commit hash you're running - it's useful to check that you're running the code that you expect
02:38fdobridge: <redsheep> Yep ok that's certainly easier
02:38fdobridge: <redsheep> Thanks!
02:59fdobridge: <redsheep> That turned out to be a really important verification, I was not in fact running the code I expected due to a complex web of rookie mistakes. Now that I can actually tell what is going on I fixed all of that and now I am actually going to test 27755
03:06fdobridge: <redsheep> Well would you look at that, once I can actually A-B test properly I see that 27755 DOES increase performance. I managed to find a repeatable test in the witness and it goes from 60 to 68 fps. Talos remains a wash but that's a 14% uplift in the witness
03:06fdobridge: <redsheep> @gfxstrand Turns out that MR was a nice performance win after all ^^^
03:07fdobridge: <gfxstrand> Cool! Mind throwing that in the MR?
03:07fdobridge: <redsheep> Will do
03:07fdobridge: <gfxstrand> I'm still working on the details of how I want to expose it but that's good data to have
03:14fdobridge: <gfxstrand> Any idea what's up with these?
03:14fdobridge: <gfxstrand> ```
03:14fdobridge: <gfxstrand> [ 3165.180724] nouveau 0000:65:00.0: fifo: fault 01 [WRITE] at 000000000054f000 engine 05 [BAR2] client 08 [HUB/HOST_CPU_NB] reason 0d [REGION_VIOLATION] on channel -1 [017fa33000 unknown]
03:14fdobridge: <gfxstrand> ```
03:15fdobridge: <gfxstrand> Looks like maybe someone's writing a read-only map?
03:17fdobridge: <gfxstrand> Everything's still alive for now and not failing but that's still really weird
03:17fdobridge: <gfxstrand> @airlied ?
03:18fdobridge: <airlied> I wonder what a region violation is
03:18fdobridge: <airlied> Also channel -1 seems suspicious
03:18fdobridge: <gfxstrand> It's on BAR2, whatever that is
03:19fdobridge: <gfxstrand> I'm also seeing this:
03:19fdobridge: <gfxstrand> ```
03:19fdobridge: <gfxstrand> [ 3548.332601] nouveau 0000:65:00.0: fifo: fault 00 [READ] at 0000008000010000 engine 07 [HOST0] client 06 [HUB/HOST] reason 03 [VA_LIMIT_VIOLATION] on channel 3 [017f65a000 deqp-vk[121752]]
03:19fdobridge: <gfxstrand> ```
03:19fdobridge: <gfxstrand> And giant piles of
03:19fdobridge: <gfxstrand> ```
03:19fdobridge: <gfxstrand> [ 3668.480285] nouveau 0000:65:00.0: fifo: PBDMA0: 01000000 [] ch 3 [017f65a000 deqp-vk[125677]] subc 0 mthd 0000 data 00000000
03:19fdobridge: <gfxstrand> ```
03:20fdobridge: <gfxstrand> Like, okay, maybe I'm doing SET_OBJECT on a somehow already destroyed context? That's moderately believable but I honestly don't know how.
03:22fdobridge: <redsheep> Isn't BAR2 just the resized bar? I see it on BAR1 but I see BAR2 for it on an rdna 2 gpu
03:22fdobridge: <gfxstrand> There's 2 or 3 BARs
03:22fdobridge: <gfxstrand> I'm not sure if that's 0 or 1-indexed.
03:23fdobridge: <redsheep> It doesn't even look like I have a BAR2
03:23fdobridge: <gfxstrand> If it's 0-indexed, then BAR2 is the 32MiB one (edited)
03:23fdobridge: <redsheep> ``` BAR 0: current size: 16MB, supported: 16MB
03:23fdobridge: <redsheep> BAR 1: current size: 32GB, supported: 64MB 128MB 256MB 512MB 1GB 2GB 4GB 8GB 16GB 32GB
03:23fdobridge: <redsheep> BAR 3: current size: 32MB, supported: 32MB
03:23fdobridge: <redsheep> ```
03:23fdobridge: <gfxstrand> heh
03:24fdobridge: <gfxstrand> It's not BAR 0 or BAR 1 but a secret third BAR...
03:25fdobridge: <gfxstrand> It's possible that ch 3 is just permadead on my machine due to a heavy CTS run
03:26fdobridge: <redsheep> Does your gpu even show a BAR2? Seems like it would blow things up to try to interact with a non-existent bar
03:27fdobridge: <gfxstrand> Yeah, I have 0, 1, 3 as well
03:35fdobridge: <Sid> same here
04:33fdobridge: <gfxstrand> @airlied for whatever it's worth, I almost never saw those kinds of messages in my serial CTS runs. They're also not causing any device lost in userspace stuff as far as I can tell. Seems like some sort of pressure thing. Something the kernel is doing with massively parallel runs.
05:08fdobridge: <gfxstrand> I don't think this new kernel likes me
05:17fdobridge: <redsheep> Not sure if it merits an issue yet but I've found that while zink can run the plasma x11 session it can't run the Wayland one.
05:17fdobridge: <redsheep> Probably not worth worrying about until modifiers merge
12:29fdobridge: <pavlo_it_115> I want to try to do this... I realize this is a VERY big deal. What do I need for this? What components are needed, the programming language (I assume C), what branches, and so on.
12:30fdobridge: <pavlo_it_115> I am ready to devote all my time to it (edited)
12:30fdobridge: <karolherbst🐧🦀> you'd have to reverse engineer nvidia PMU firmware
12:30fdobridge: <karolherbst🐧🦀> and find a way to reliably extract it
12:31fdobridge: <karolherbst🐧🦀> and the firmware interface isn't stable, so anything can change between nvidia driver versions
12:31fdobridge: <karolherbst🐧🦀> and not just major updates, also minor
12:41fdobridge: <pavlo_it_115> https://cdn.discordapp.com/attachments/1034184951790305330/1210567221131350026/0840fc04632cf890.png?ex=65eb0794&is=65d89294&hm=0cb9179e83afe037feafbc2ac4a873ab62c7d325576d80cd223c73b59d063fbe&
12:42fdobridge: <pavlo_it_115> 🤔
12:43fdobridge: <karolherbst🐧🦀> for GPUs where we can use gsp it's all fine
12:43fdobridge: <karolherbst🐧🦀> the problem are all the others
12:48fdobridge: <pavlo_it_115> I don't understand the situation a bit, sorry
12:56fdobridge: <pavlo_it_115> If I'm not mistaken, before this there were attempts to create a firmware using RE? I don't know how it works at all... If there were attempts, is the infrastructure preserved for all this? Or do we still have problems unpacking the required firmware? (edited)
12:56fdobridge: <karolherbst🐧🦀> writing the firmware is not the problem here really
12:56fdobridge: <pavlo_it_115> Who has tried to work with this?
12:56fdobridge: <karolherbst🐧🦀> on some GPUs we simply can't use our own firmware, as we can't control the fans or voltage
12:57fdobridge: <karolherbst🐧🦀> it's just pointless to spend time on this unless you have like a year to spare
12:58fdobridge: <pavlo_it_115> Well, I have no problems with time. I would like to acquire the skills for this somewhere
13:06fdobridge: <pavlo_it_115> Judging by the name of the .bin, pascal probably also has something like RISC-V (edited)
13:09fdobridge: <pavlo_it_115> no, it's a crazy assumption (edited)
13:09fdobridge: <karolherbst🐧🦀> pascal requires the firmware to be signed
13:09fdobridge: <karolherbst🐧🦀> otherwise you can't control the fans and the voltage
13:10fdobridge: <pavlo_it_115> Well, do we have to wait for Nvidia to sign our firmware?
13:10fdobridge: <pavlo_it_115> I realized
13:10fdobridge: <pavlo_it_115> https://cdn.discordapp.com/attachments/1034184951790305330/1210574522961363076/08a7d85a6d238bd3.png?ex=65eb0e61&is=65d89961&hm=ac18463915a80bee89d169de662beeddd0bf7f6da40c3368618c93cdae4b6271&
13:11fdobridge: <karolherbst🐧🦀> nvidia won't do that
13:11fdobridge: <karolherbst🐧🦀> and they also won't release any firmware for us there
13:16fdobridge: <pavlo_it_115> After all, Kepler was fine, no, they just needed to make a signed firmware. Inhumans
13:17fdobridge: <pavlo_it_115> But they still released GSP firmware for new video cards. It's just business. Buy new video cards from us to use the open driver... There are no words for them.. (edited)
13:17fdobridge: <triang3l> Can Nvidia's KMDs for Maxwell and Pascal technically be considered as having "stable UAPI" at this point?
13:17fdobridge: <pavlo_it_115> https://tenor.com/view/raiden-metal-gear-rising-floor-punch-jack-the-ripper-gif-23152533
13:18fdobridge: <bylaws> if you really do want to add support, you can just extract the Nvidia blob's firmware and add support for it in the nouveau kmd
13:19fdobridge: <bylaws> But that would be unshippable in any distro by default
13:20fdobridge: <pavlo_it_115> AUR or do it yourself (edited)
13:20fdobridge: <pavlo_it_115> How it is done with video codecs, a nouveau-fw package (edited)
13:22fdobridge: <triang3l> Whether any distribution of it outside Nvidia's own packages is compatible with Nvidia's license (at least in the US) would also be a question to investigate
13:24fdobridge: <karolherbst🐧🦀> mhhh.. I guess we could grab the latest firmware there...
13:24fdobridge: <karolherbst🐧🦀> but uhhh
13:24fdobridge: <karolherbst🐧🦀> still needs to be reverse engineered
13:26fdobridge: <triang3l> ~~along with the entirety of command submission, buffer allocation and sync~~ :frog_gears:
13:26fdobridge: <karolherbst🐧🦀> mhh?
13:27fdobridge: <karolherbst🐧🦀> that's not important, because back then you had multiple firmwares
13:27fdobridge: <karolherbst🐧🦀> and the PMU was its own thing
13:39fdobridge: <pavlo_it_115> Do we have any documentation for this?
13:41fdobridge: <karolherbst🐧🦀> we don't
14:23fdobridge: <gfxstrand> Even if the setup required running a python script that downloaded one particular blob driver version, extracted the PMU firmware, and installed it, that would probably make users happy, as long as it's fairly robust.
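(The extraction half of that idea is the easy part; a hypothetical starting point for such a script is sketched below. The driver version and URL pattern are examples only, and everything after unpacking, i.e. locating the PMU image and REing its interface, is the part that remains open.)
```sh
# Hypothetical starting point for the script described above: fetch one
# known blob driver version and unpack it.  The version is an example;
# the interesting/hard part begins after this step.
VER=390.157
wget "https://us.download.nvidia.com/XFree86/Linux-x86_64/${VER}/NVIDIA-Linux-x86_64-${VER}.run"
sh "NVIDIA-Linux-x86_64-${VER}.run" --extract-only
ls "NVIDIA-Linux-x86_64-${VER}/"   # the firmware images live somewhere in here
```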
14:28fdobridge: <karolherbst🐧🦀> yeah, but then we get to the hard part: reverse engineering the firmware's interfaces
14:30fdobridge: <gfxstrand> Yup
14:30fdobridge: <gfxstrand> Not saying that's easy
14:32fdobridge: <marysaka> The PMU firmware is different on tegra btw?
14:32fdobridge: <marysaka> if the interface is the same it could be a nice starting point
14:33fdobridge: <karolherbst🐧🦀> mhhh.. good question
14:58fdobridge: <!DodoNVK (she) 🇱🇹> Why don't more Vulkan drivers adopt a hybrid acceleration model when needed (aka offloading unsupported stuff to some custom shader or even the CPU)? This way we could maybe have Vulkan 1.3 support on Kepler 🍩
15:12fdobridge: <babblebones> The love for the hardware has to be pretty high to tailor these
15:12fdobridge: <babblebones> A lot of people just ewaste and forget
15:14fdobridge: <babblebones> Maybe a framework or lib to help lower the barrier would incentivise?
15:15fdobridge: <pixelcluster> not everything can be offloaded to custom shaders easily
15:16fdobridge: <pixelcluster> a lot of stuff is probably technically possible, but slow to the point it's unusable
15:16fdobridge: <triang3l> And not on all hardware… Unless you really mean offloading to the CPU. Like NPOT textures back in the dark days :cursedgears:
15:17fdobridge: <!DodoNVK (she) 🇱🇹> I think that will still be better than lavapipe though
15:18fdobridge: <triang3l> What blocks 1.3 support on Kepler though? :frog_gears:
15:18fdobridge: <triang3l> I think RADV could easily have scalar block layout on GFX6 maybe with some descriptor patching
15:19fdobridge: <mhenning> iirc, kepler can't support the memory model, which isn't something you can really emulate
15:20fdobridge: <triang3l> Is D3D11's globallycoherent support not enough for it? 🥵
15:22fdobridge: <triang3l> or does it not work across something like binding aliasing?
15:23fdobridge: <triang3l> (I completely don't know what the limitations are on Kepler other than typed load memes)
15:27fdobridge: <mhenning> I don't know the details
15:39fdobridge: <gfxstrand> Vulkan 1.3 on Kepler basically means lavapipe the moment you use SSBOs.
15:40fdobridge: <gfxstrand> It's about ordering of memory loads and stores.
15:40fdobridge: <gfxstrand> If it is possible to ensure ordering on Kepler, it's really expensive.
15:41fdobridge: <gfxstrand> Some VERY smart people at NVIDIA looked into it and determined it just wasn't feasible.
15:41fdobridge: <gfxstrand> It's possible that one of us is smarter than them and can figure it out but unlikely
15:42fdobridge: <zmike.> nothing is better than lavapipe
15:43fdobridge: <gfxstrand> Also, if you're using a PCI device and flipping back and forth between CPU and GPU, the PCIe bandwidth is probably bad enough that you may be better off just using lavapipe
15:47fdobridge: <!DodoNVK (she) 🇱🇹> Is lavapipe better than other Vulkan drivers or is not having a Vulkan driver better than lavapipe? 🤷♀️
15:47fdobridge: <!DodoNVK (she) 🇱🇹> How does v3dv do the CPU-side stuff?
15:48fdobridge: <gfxstrand> @airlied Kernel is still blowing up hard with big parallel runs. Here's my dmesg from last night:
15:48fdobridge: <gfxstrand> https://cdn.discordapp.com/attachments/1034184951790305330/1210614083666317342/dmesg.zst?ex=65eb3339&is=65d8be39&hm=84ee023d9e8480e05934c6aecd86e308f22c0ed93c6dd5e07452812d32f72550&
15:49fdobridge: <gfxstrand> What CPU side stuff? If you're careful with where you place resources, you can do some limited CPU side stuff without burning PCIe like mad. If it's an "oops, someone needs X feature" fallback, it's a lot harder.
15:50fdobridge: <!DodoNVK (she) 🇱🇹> I've heard that the v3dv driver does some stuff on the CPU (I think it was in your XDC 2022 talk)
15:51fdobridge: <gfxstrand> First, v3dv is on a low-end Arm GPU where it hurts less. Second, it's a unified memory architecture so there's no PCI in the way.
15:51fdobridge: <gfxstrand> Third, they should probably stop doing that in most cases.
15:52fdobridge: <gfxstrand> And the cases where they're doing it are pretty tightly restricted. It's not an "oops, we don't have shader feature X" situation
15:53fdobridge: <pavlo_it_115> Can we adapt an existing PMU from maxwell for pascal? Or does it not work like that?
15:58fdobridge: <gfxstrand> You can't just take a PMU firmware from one GPU and run it on a different GPU
16:44fdobridge: <orowith2os> are there more details on this somewhere?
16:44fdobridge: <orowith2os> ooc
16:44fdobridge: <orowith2os> I'm just going to read all I can on these weird little things until I'm really confident enough to provide anything to the discussion :v
17:08fdobridge: <gfxstrand> No, that's all the detail I have
17:41fdobridge: <pavlo_it_115> I meant to "adapt" the PMU
17:41fdobridge: <pavlo_it_115> After all, Pascal is the successor of Maxwell and they are similar. Or is it really necessary to write everything from scratch?
17:51fdobridge: <mhenning> The firmware on Pascal is signed. We cannot modify it.
17:53fdobridge: <gfxstrand> And you can't run a Maxwell firmware on Pascal
17:57fdobridge: <pavlo_it_115> 😵💫
17:58fdobridge: <gfxstrand> Yeah, if someone gets it working at all it's going to be by extracting FW from the blob driver and R/Eing the interface. There's no other way.
18:13fdobridge: <pavlo_it_115> @gfxstrand I know almost nothing. I am a simple user. At most I can compile a package. I don't know how the video card works, I don't know how the driver works. I assume that the programming language for writing the driver is C and Rust. I know absolutely nothing about the driver. I can't even understand GitLab. But I want to learn all this and help create the Nouveau driver. And I don't even know where to start getting acquainted with all of this.
18:14fdobridge: <pavlo_it_115> Please help me get started. I understand you if you don't want to deal with it
18:16fdobridge: <redsheep> I'm in a sort of similar spot but I've started picking stuff up by just reading all the patches I can and trying to understand the intentions. This stuff just isn't documented enough anywhere to learn more efficiently than that without sucking up lots of time from those who can spend their time more effectively on just implementing new stuff instead.
18:17fdobridge: <Sid> I just test things and bother the people smarter than me to fix it, but if you wanna get started with mesa development, pick an open issue, try to implement it, and ask questions
18:17fdobridge: <Sid> with the kernel, it's a bit more difficult
18:17fdobridge: <pac85> I'd suggest to start programming and learning some graphics programming first
18:18fdobridge: <Sid> yeah, I imagine it's easier if you know how vulkan works
18:18fdobridge: <gfxstrand> Reading patches isn't a bad place to start
18:19fdobridge: <gfxstrand> We're kinda running out of really easy stuff, unfortunately.
18:19fdobridge: <Sid> I've heard https://vkguide.dev/ is a good resource to get started with vulkan
18:19fdobridge: <gfxstrand> Do you at least know C?
18:19fdobridge: <pavlo_it_115> not very strong. but yes
18:20fdobridge: <gfxstrand> And have you done anything with hardware ever? Even playing around with an Arduino counts.
18:20fdobridge: <pavlo_it_115> No
18:24fdobridge: <pavlo_it_115> never encountered arduino (edited)
18:24fdobridge: <pavlo_it_115> > And have you done anything with hardware ever?
18:24fdobridge: <pavlo_it_115> Programmatically never
18:26fdobridge: <redsheep> Do you know if utilizing nvidia's closed kernel module has been explored? If ripping the PMU firmware from a driver download is potentially viable then maybe just installing their module would be too? I know it's probably messier on the mesa side but if openRM can provide even a basic starting point for understanding the closed one that seems like it might be easier than the RE work on the PMU
18:28fdobridge: <redsheep> Also it moves the RE problem somewhere that sounds easier to probe to me
18:34fdobridge: <redsheep> Though given how often y'all talk about needing uapi changes to do something right I suppose a kmd you can't change is probably a pretty big problem
19:03fdobridge: <gfxstrand> There may be some value in carrying the code to support it. It kinda depends on how much churn there ends up being.
19:04fdobridge: <gfxstrand> For their older legacy drivers, I doubt they change the kernel interface much
19:05fdobridge: <gfxstrand> But, from a user PoV, unless we're faster than them, why bother with closed kernel and open userspace?
19:20fdobridge: <phodius_> can I run zink on the proprietary driver? How would I set it, i.e. something like MESA_LOADER_DRIVER_OVERRIDE=zink
19:20fdobridge: <phodius_> for nvidia
19:24fdobridge: <redsheep> At the least it means the userspace driver can operate more cohesively with the ecosystem, which afaict is the cause of most of the breakages when using the prop driver
19:24fdobridge: <zmike.> I think probably it won't work too well right now because of https://gitlab.freedesktop.org/mesa/mesa/-/issues/10513 and related issues
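(For completeness, the invocation being asked about would look roughly like the following; it is untested and, per the issue linked above, not expected to work well yet. The ICD path and the glvnd variable are assumptions.)
```sh
# Zink on top of the proprietary Vulkan driver (experimental / likely broken).
__GLX_VENDOR_LIBRARY_NAME=mesa \
MESA_LOADER_DRIVER_OVERRIDE=zink \
VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nvidia_icd.json \
glxinfo -B | grep -i renderer
```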
19:25fdobridge: <gfxstrand> That's okay. I'm just trying to get a sense for what folks know. I'm also starting to wonder if we don't need to do something more structured for some of the really new folks.
19:26fdobridge: <gfxstrand> Like, do we need a #noubies channel for people who are learning to ask newbie dev questions without getting lost in the main chatter?
19:27fdobridge: <gfxstrand> (Someone French is going to slap me for "noubies". 😂 )
19:28fdobridge: <gfxstrand> I'm going to put some non-trivial thought into possible spin-up projects.
19:28fdobridge: <gfxstrand> We're starting to run out of things that are totally trivial
19:29fdobridge: <gfxstrand> And going forward I'm likely to just spend 20 min and implement something if it's that easy and something that's actually useful.
19:29fdobridge: <airlied> Also it's not just the PMU fw you have to extract afaik
19:29fdobridge: <airlied> You can't use any of the NVIDIA fws supplied for nouveau
19:29fdobridge: <airlied> You have to extract all the fws and RE the loading sequence
19:30fdobridge: <airlied> And recreate that, then RE the PMU interface
19:31fdobridge: <airlied> Learning to RE is a bit different than learning to program: tools like ghidra (and I forget the other decompiler), and learning x86 assembly
19:32fdobridge: <airlied> Ah IDA is the windows one
19:34fdobridge: <averne> what's the standing of nouveau/mesa wrt information obtained through decompilation, actually? considering that sometimes TOSs disallow it (but some legislations allow it anyway)
19:35fdobridge: <airlied> If it was done clean room style it's probably okay, but just dumping register programming sequences probably not
19:36fdobridge: <pac85> Most legislations allow it for the purpose of interoperability (which making stuff work on Linux is part of) but it needs to be clean room to not violate copyright (edited)
19:37fdobridge: <redsheep> That makes just using their kmd seem more appealing.
19:38fdobridge: <redsheep>
19:38fdobridge: <redsheep> Here's my logic, I imagine nouveau being useful to 3 kinds of people:
19:38fdobridge: <redsheep> 1. Those who just need their distro to work correctly out of the box so they can get far enough to install the prop driver easily. That's already largely accomplished and getting better all the time.
19:38fdobridge: <redsheep>
19:38fdobridge: <redsheep> 2. Those on recent hardware who want full functionality and performance and to use as much free software as possible on principle, or don't like things the Nvidia driver breaks. I'm in this camp, and we're heading towards this being great.
19:38fdobridge: <redsheep>
19:38fdobridge: <redsheep> 3. Those on hardware Nvidia doesn't support anymore, or are soon to drop. This isn't addressed as well, and those people are going to be desperate enough for any solution that installing the Nvidia kmd won't be so distasteful. This is the group that letting mesa run on the Nvidia kmd helps.
19:38fdobridge: <redsheep> Though the part about new kernels not being able to use really old modules is an issue there and I'm not sure there's a good solution for that.
19:39fdobridge: <airlied> The problem with old NVIDIA is user env integration, egl and Wayland etc stuff rots quick
19:40fdobridge: <airlied> Also for what we use in nvk my guess is the driver uapi has barely changed from older kmds to newer ones
19:41fdobridge: <gfxstrand> Yeah, if we can allocate BOs, create a context, and get at the ring, that's really all we need. NVK's UAPI interface is tiny
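(Concretely, the "tiny" interface being referred to is a handful of nouveau ioctls. Below is a minimal, standalone sketch that pokes the first of them; the render node path and the include path are assumptions, and the closing comment lists the rest of the surface from memory, so check `include/uapi/drm/nouveau_drm.h` for the real definitions.)
```c
/* Minimal sketch: open a nouveau render node and query the chipset.
 * The device path and include path are assumptions. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>
#include <libdrm/nouveau_drm.h>

int main(void)
{
   int fd = open("/dev/dri/renderD128", O_RDWR);
   if (fd < 0)
      return 1;

   struct drm_nouveau_getparam gp = { .param = NOUVEAU_GETPARAM_CHIPSET_ID };
   if (drmCommandWriteRead(fd, DRM_NOUVEAU_GETPARAM, &gp, sizeof(gp)) == 0)
      printf("chipset: 0x%llx\n", (unsigned long long)gp.value);

   /* The rest of what NVK needs is roughly: CHANNEL_ALLOC (context),
    * GEM_NEW (BOs), VM_INIT/VM_BIND (VA space), and EXEC (the ring). */
   close(fd);
   return 0;
}
```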
19:43fdobridge: <pavlo_it_115> Of course. And you could also make the channel a forum where people post questions
19:43fdobridge: <redsheep> So for example the 470 driver that doesn't work well with Wayland on Kepler wouldn't suddenly work well with it by replacing their userspace with mesa?
19:45fdobridge: <airlied> It might be possible to fix some things to make it work, but there might be limitations, though those might be in the open bits of their kernel driver (even the old driver has some open pieces)
19:45fdobridge: <pavlo_it_115> Wayland worked perfectly for me on Kepler with mesa
19:46fdobridge: <redsheep> Those old drivers have to use different code in the compositor so it's really not ideal
19:47fdobridge: <redsheep> Oh you said mesa, yeah, but it's slow there ofc
19:49fdobridge: <redsheep> The idea is that the Nvidia module with mesa would let you run full speed without delving into firmware crazy land
19:50fdobridge: <!DodoNVK (she) 🇱🇹> I've heard the IDA+Hex-Rays combo is quite powerful (but Ghidra is okay too)
20:27fdobridge: <pac85> Isn't IDA super expensive?
20:32fdobridge: <redsheep> Hmm. In the last hour I've gone from convincing myself that trying to use their module is a good idea to being pretty sure I'm wrong.
20:32fdobridge: <redsheep>
20:32fdobridge: <redsheep> I'm realizing it's just not a long term solution, because once those modules don't work on supported kernels anymore you're either left trying to provide stable abi for it forever with some extra layer, or you reverse engineer it enough to try to ship binary patches or other hacks to keep it limping and at that point you might as well have reverse engineered enough to understand how they interact with their firmware so you can do it yourself
20:33fdobridge: <redsheep> So unsurprisingly a decade of logic from people more experienced makes more sense.
20:58fdobridge: <pavlo_it_115> So, how should I start writing PMU firmware as a beginner? What do I need to learn?
21:30fdobridge: <gfxstrand> @airlied I'm getting a mess of `[ 2172.327567] nouveau 0000:65:00.0: fifo: SCHED_ERROR 20 []` on drm-misc-next as of a couple days ago
21:31fdobridge: <gfxstrand> IDK what's up but this kernel is looking pretty near unusable for mass CTS runs
21:31fdobridge: <gfxstrand> I can get through one maybe every 2-3 tries
21:31fdobridge: <airlied> Does 6.8-rc work any better?
21:31fdobridge: <gfxstrand> I can try
21:31fdobridge: <gfxstrand> 6.8-rc from Linus?
21:31fdobridge: <airlied> Did you turn off gsp btw?
21:32fdobridge: <gfxstrand> No, GSP is on
21:32fdobridge: <gfxstrand> Ampere at the moment
21:33fdobridge: <airlied> Yeah Linus latest. Then try 6.7, like we haven't merged a lot of stuff after 6.8
21:33fdobridge: <gfxstrand> I'll reflog and figure out what I was running before, too
21:33fdobridge: <airlied> 6.8 has the common bo resv work
21:33fdobridge: <gfxstrand> Oh...
21:33fdobridge: <gfxstrand> That might be it
21:35fdobridge: <gfxstrand> This is what I was running before: https://gitlab.freedesktop.org/gfxstrand/linux/-/commits/nvk/?ref_type=heads
21:35fdobridge: <gfxstrand> It says 6.8-rc1
21:37fdobridge: <gfxstrand> Unfortunately, there are other bugs in that branch. 😩
21:38fdobridge: <gfxstrand> Like, that's where I was hitting that issue with renderdoc which I can't repro now
21:38fdobridge: <gfxstrand> But otherwise it was pretty stable for CTS runs
21:46fdobridge: <pavlo_it_115> https://www.phoronix.com/news/NVIDIA-550.54.14-Linux-Driver
21:46fdobridge: <pavlo_it_115> https://cdn.discordapp.com/attachments/1034184951790305330/1210704372884045824/46ff73114e6b4df7.png?ex=65eb8750&is=65d91250&hm=017183145aa95c180498baee5b9861e836bc16bd1ca52ef607d40e9cfcc1770d&
21:47fdobridge: <pavlo_it_115> 🤔
21:52fdobridge: <rinlovesyou> oh man, perhaps we'll actually see hw accelerated encoding for things like webrtc in chrome (edited)
21:55fdobridge: <gfxstrand> Chrome already does on some GPUs
21:55fdobridge: <gfxstrand> But it's turned off in a lot of desktop scenarios
21:56fdobridge: <rinlovesyou> yeah i mean we don't have vaapi support so currently streaming webrtc is entirely cpu bound on nvidia
21:57fdobridge: <rinlovesyou> discord screenshare for instance, when running the web version
22:10fdobridge: <gfxstrand> Ugh... That just died, too.
22:10fdobridge: <gfxstrand> I'm getting really tired of this...
22:11fdobridge: <gfxstrand> I may need to start regression testing with shards just so it survives
22:42fdobridge: <airlied> @gfxstrand I'm more worried some userspace changes might be causing new problems tbh, the kernel side hasn't been seeing a lot of work
22:43fdobridge: <airlied> Though I will do a test with rebar disabled next week to see if I can figure that one out
22:46fdobridge: <gfxstrand> Sure, but the kernel has been in a state since forever where the only reason I don't destroy my kernel every run is because NVK is really good at not hanging most of the time.
22:46fdobridge: <gfxstrand> Throw a Maxwell run at it and watch it burn!
22:47fdobridge: <gfxstrand> It's a hell of a lot better than it was but it's a long ways from robust
22:49fdobridge: <gfxstrand> Maybe 18x runs are just too much for it 🤷🏻♀️ (edited)
23:12fdobridge: <airlied> Like for me my care is pretty much gsp+ only, unless older card problems are generic issues
23:13fdobridge: <gfxstrand> Yeah, my point isn't that non-GSP is bad. It's that NVK hangs more on Maxwell and that DOSes the kernel faster
23:13fdobridge: <airlied> Your CPU seems to be good at exposing races I can't reproduce which doesn't help
23:13fdobridge: <gfxstrand> I mean, non-GSP does seem generally less stable but not fundamentally bad
23:13fdobridge: <airlied> I've done lots of complete gsp Turing/ampere runs with no problems here
23:13fdobridge: <gfxstrand> 😩
23:13fdobridge: <airlied> With mediocre ryzen and Intel CPUs
23:15fdobridge: <gfxstrand> Try nvk/pipelines in my tree. That's got ESO and seems to take down kernels even though all the tests pass
23:46fdobridge: <airlied> Also interesting if 9 or 12 threads cause less pain
23:47fdobridge: <gfxstrand> Dropping to 10 seems to be a bit more robust, maybe, but takes like 1.5 times longer
23:47fdobridge: <gfxstrand> I'm tempted to get a 2nd Ampere and shard across the two because I think I'm running enough threads that the CTS is GPU-limited
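(For context, the kind of run being discussed looks something like the following; the deqp-runner flags are from memory and the binary, caselist, and output paths are placeholders, with `--jobs` being the knob that gets dialed between 10 and 18 above.)
```sh
# Parallel CTS run of the sort discussed above; paths are placeholders.
deqp-runner run \
    --deqp ~/VK-GL-CTS/build/external/vulkancts/modules/vulkan/deqp-vk \
    --caselist ~/VK-GL-CTS/external/vulkancts/mustpass/main/vk-default.txt \
    --output /tmp/nvk-cts-run \
    --jobs 18 \
    -- \
    --deqp-log-images=disable \
    --deqp-log-shader-sources=disable
```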