17:00 gfxstrand[d]: zmike[d]: I'm starting to wonder if this ReBAR thing isn't something funky with your card. Has anyone else been able to reproduce on something that isn't Turing?
17:14 zmike[d]: my card may or may not be special
17:15 zmike[d]: what's the heap situation supposed to look like there?
17:18 tiredchiku[d]: gfxstrand[d]: no repro on my ampere
17:19 gfxstrand[d]: zmike[d]: Well, ReBAR isn't supposed to work on Turing unless you have weird VBIOS upgrades.
17:19 gfxstrand[d]: So if the kernel is advertising it by mistake...
17:19 gfxstrand[d]: I'm tempted to do `if < AMPERE: no ReBAR for you`
17:20 gfxstrand[d]: Which should probably be done in nouveau.ko
17:20 zmike[d]: I can neither confirm nor deny that my gpu has these upgrades
17:20 gfxstrand[d]: Before we jump to that conclusion, though. Can I get a backtrace?
17:20 zmike[d]: from what
17:21 gfxstrand[d]: the crash
17:21 gfxstrand[d]: I want to see what it's crashing on
17:21 zmike[d]: the crash is just hitting a zink sanity assert
17:21 gfxstrand[d]: Before jumping to absurd conclusions
17:22 gfxstrand[d]: Even knowing what assert might help
17:22 gfxstrand[d]: There's no bug report or info in the MR
17:22 zmike[d]: I assumed it would be easy to repro
17:22 zmike[d]: you're saying with turing I shouldn't see >=4gb on any heaps in vulkaninfo?
17:22 gfxstrand[d]: Not on the DEVICE_LOCAL+HOST_VISIBLE heap
17:23 zmike[d]: hm
17:23 karolherbst[d]: well.. _normally_ no, unless your MB bios has a hack
17:23 karolherbst[d]: or your vbios has a hack
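(Editor's note: the ReBAR heuristic being discussed can be sketched as follows. This is a hypothetical illustration, not NVK/zink code; the flag values mirror Vulkan's `VkMemoryPropertyFlagBits`, and the 256 MiB classic-BAR threshold is the conventional pre-ReBAR window size.)

```python
# Hypothetical sketch: treat ReBAR as present only when some memory type is
# both DEVICE_LOCAL and HOST_VISIBLE *and* its heap is larger than the
# classic 256 MiB BAR window. Flag values mirror VkMemoryPropertyFlagBits.

DEVICE_LOCAL = 0x1  # VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
HOST_VISIBLE = 0x2  # VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT

CLASSIC_BAR = 256 * 1024 * 1024  # pre-ReBAR PCI BAR size in bytes

def has_rebar(memory_types, heap_sizes):
    """memory_types: list of (property_flags, heap_index) tuples;
    heap_sizes: list of heap sizes in bytes, indexed by heap_index."""
    for flags, heap_index in memory_types:
        both = DEVICE_LOCAL | HOST_VISIBLE
        if (flags & both) == both and heap_sizes[heap_index] > CLASSIC_BAR:
            return True
    return False
```

A Turing card without a ReBAR VBIOS hack would expose its DEVICE_LOCAL+HOST_VISIBLE type only on a 256 MiB heap, so this returns False there.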
17:23 gfxstrand[d]: zmike[d]: What does your vulkaninfo give you for heaps?
17:23 zmike[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1336024396900597780/message.txt?ex=67a24d2e&is=67a0fbae&hm=e93bc0ab4b861ba7ae011ef8802604372a79ed0b687276425d582b922d983331&
17:24 zmike[d]: :fullheadache:: vulkaninfo not working without a valid x connection is insane
17:24 gfxstrand[d]: :facepalm:
17:25 zmike[d]: okay, now that I'm actually here at my pc
17:25 gfxstrand[d]: Just SSH from your phone. 😛
17:25 tiredchiku[d]: zmike[d]: this sounds like a design flaw honestly
17:25 zmike[d]: I'm sshed from my phone to my pc to my nvk pc
17:25 zmike[d]: obviously
17:25 gfxstrand[d]: obviously
17:26 karolherbst[d]: one or two layers of VPNs?

17:26 gfxstrand[d]: zmike[d]: It should work but sometimes it gets confused
17:27 zmike[d]: just like me
17:27 zmike[d]: oh ok
17:27 zmike[d]: actually this may be an even stupider issue than I thought
17:28 gfxstrand[d]: Uh oh
17:32 anholt: is there anything for pinning clock frequencies in gsp mode?
17:33 mhenning[d]: anholt: Nothing that's wired up yet
17:34 gfxstrand[d]: I'm not even sure if we can
17:35 karolherbst[d]: we should be able to, to a certain degree
17:35 mhenning[d]: I'd be a little surprised if nvidia didn't have anything like that available, but yeah I haven't actually looked at the gsp interface
17:35 karolherbst[d]: you can disable boosting, which doesn't mean it won't go lower
17:36 karolherbst[d]: nvidia even has an option for that in its GUI
17:37 karolherbst[d]: I _think_...
17:37 gfxstrand[d]: If we can set limits, that should be enough. But typically the firmware will want to be able to go lower if it hits thermals
17:45 gfxstrand[d]: But also, I really doubt most of the smaller (*60) PCI cards are hitting thermals much right now. I can barely get my 4060 to turn on the fan
17:46 anholt: (context: I've got trace perf testing going, but since getting gsp enabled the stats are much noisier)
17:46 karolherbst[d]: just throw furmark at it, that will do it
17:46 gfxstrand[d]: lol. Yeah...
17:46 zmike[d]: oh are we furmarking today?
17:46 zmike[d]: because I've been furmarking already today
17:47 zmike[d]: also I found the issue
17:47 zmike[d]: as expected, my gpu doesn't have rebar and I was just bad at math
17:47 gfxstrand[d]: https://tenor.com/view/confused-computing-counting-math-problems-gif-14678592
17:47 zmike[d]: https://c.tenor.com/iKq0McbAqCMAAAAC/math-zach-galifianakis.gif
17:48 zmike[d]: I am, however, puzzled as to how nobody else hit this
17:48 gfxstrand[d]: You have exactly the right amount of VRAM?
17:49 zmike[d]: I have enough to trigger this bug, yes
17:49 zmike[d]: though I think anyone on any system where rebar is not active would hit this
17:49 gfxstrand[d]: 🤷🏻‍♀️ I couldn't hit it on my 12GB 2060
17:49 zmike[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33359
17:50 zmike[d]: if you don't have rebar, zink will fill the rebar heap using device-local
17:50 zmike[d]: and then enable rebar anyway
17:50 zmike[d]: really making me second guess all the sanity checks I put in if nobody but me hit any of them
17:52 gfxstrand[d]: 😬
17:52 gfxstrand[d]: Are we sane? No, no we're not...
17:52 zmike[d]: well I am
17:52 zmike[d]: that's for sure
17:53 gfxstrand[d]: Okay, I will go back to debugging games.
17:53 gfxstrand[d]: Unless there's some other Zink+NVK thing you want me to look at
17:53 zmike[d]: not at this moment
17:53 karolherbst[d]: that reminds me of this weirdo issue I've hit on nvidias blob driver, where I was like.. plugging in a 48GB VRAM GPU to play some 32 bit games, and well.. nvidia allocated like 3 GiB of virtual memory for _reasons_ and the game kept crashing 🥲
17:53 zmike[d]: but I think there will soon be since I may or may not attempt to do some B E N C H I N G soon
17:54 karolherbst[d]: oh yeah.. maybe I should tell nvidia folks if they aren't aware of it, dunno if it's still an issue though 😄
17:54 zmike[d]: :stressheadache:
17:58 HdkR: Oops, there goes all your virtual memory
18:08 gfxstrand[d]: 640k should be enough for anybody
18:11 phomes_[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1336036424205729804/Screenshot_From_2025-02-03_19-09-31.png?ex=67a25861&is=67a106e1&hm=bfb4b740c9f7c76220862187e3dcd42336e0a232e998df541d7eaba42f7717a3&
18:11 phomes_[d]: 0ad released a new version with vulkan support. NVK gets a pretty good fps in the menu
18:12 zmike[d]: [ Demo Quick Stats ]
18:12 zmike[d]: - demo : FurMark (GL) (built-in: YES)
18:12 zmike[d]: - renderer : zink Vulkan 1.4(NVIDIA GeForce RTX 2070 (NVK TU106) (MESA_NVK))
18:12 zmike[d]: - 3D API : OpenGL 4.6 (Compatibility Profile) Mesa 25.0.0-devel (git-1540d8d983)
18:12 zmike[d]: - resolution : 1920x1080
18:13 zmike[d]: - SCORE : 5088
18:13 zmike[d]: - duration : 60000 ms
18:13 zmike[d]: - FPS (min/avg/max) : 59 / 84 / 89
18:13 zmike[d]: - GPU 0: NVIDIA GeForce RTX 2070 [10DE-1F07]
18:15 karolherbst[d]: that's kinda.... low, isn't it?
18:16 zmike[d]: [ Demo Quick Stats ]
18:16 zmike[d]: - demo : FurMark (GL) (built-in: YES)
18:16 zmike[d]: - renderer : NVIDIA GeForce RTX 2070/PCIe/SSE2
18:16 zmike[d]: - 3D API : OpenGL 4.6.0 NVIDIA 565.77
18:16 zmike[d]: - resolution : 1920x1080
18:16 zmike[d]: - SCORE : 3602
18:16 zmike[d]: - duration : 60016 ms
18:16 zmike[d]: - FPS (min/avg/max) : 46 / 60 / 61
18:16 zmike[d]: - GPU 0: NVIDIA GeForce RTX 2070 [10DE-1F07]
18:16 zmike[d]: 🤔
18:16 zmike[d]: think this one was vsynced
18:16 karolherbst[d]: yeah
18:16 zmike[d]: what's the nvidia env var for that?
18:17 zmike[d]: __GL_SYNC_TO_VBLANK=0 I guess
18:17 zmike[d]: yeah here we go
18:17 gfxstrand[d]: Nooo! We're winning! Don't make us not win!
18:17 zmike[d]: HERE WE GO
18:17 karolherbst[d]: `__GL_SYNC_TO_VBLANK=0`?
18:17 karolherbst[d]: ahh
18:17 karolherbst[d]: try with the gallium gl driver for the lols 😄
18:18 karolherbst[d]: look, I know what the result will look like, I'm not sure you do
18:18 zmike[d]: [ Demo Quick Stats ]
18:18 zmike[d]: - demo : FurMark (GL) (built-in: YES)
18:18 zmike[d]: - renderer : NVIDIA GeForce RTX 2070/PCIe/SSE2
18:18 zmike[d]: - 3D API : OpenGL 4.6.0 NVIDIA 565.77
18:18 zmike[d]: - resolution : 1920x1080
18:18 zmike[d]: - SCORE : 6944
18:18 zmike[d]: - duration : 60002 ms
18:18 zmike[d]: - FPS (min/avg/max) : 88 / 115 / 119
18:18 zmike[d]: - GPU 0: NVIDIA GeForce RTX 2070 [10DE-1F07]
18:19 karolherbst[d]: RIP nvk
18:19 karolherbst[d]: dunno how much worse the gallium driver was, but I think it was one of the things in the 80% area compared to the blob
18:19 mhenning[d]: tbh we're closer than I expected us to be
18:20 karolherbst[d]: but maybe on modern GPUs, the gallium driver will be slow as heck
18:20 notthatclippy[d]: zmike[d]: any chance for zink+blobvk?
18:20 zmike[d]: [ Demo Quick Stats ]
18:20 zmike[d]: - demo : FurMark (GL) (built-in: YES)
18:20 zmike[d]: - renderer : zink Vulkan 1.3(NVIDIA GeForce RTX 2070 (NVIDIA_PROPRIETARY))
18:20 zmike[d]: - 3D API : OpenGL 4.6 (Compatibility Profile) Mesa 25.0.0-devel (git-1540d8d983)
18:20 zmike[d]: - resolution : 1920x1080
18:20 zmike[d]: - SCORE : 6503
18:20 zmike[d]: - duration : 60003 ms
18:20 zmike[d]: - FPS (min/avg/max) : 75 / 108 / 116
18:20 zmike[d]: - GPU 0: NVIDIA GeForce RTX 2070 [10DE-1F07]
18:20 notthatclippy[d]: Haha, thanks
18:21 zmike[d]: have to reboot again to test nouveau...
18:22 karolherbst[d]: I wonder if making furmark fast will also fix the other perf issues faith was seeing
18:22 karolherbst[d]: I'm not quite sure what furmark is testing, but it kinda tests things making the gpu run very hot
18:22 karolherbst[d]: so I suspect it kinda tries to run a lot of things in parallel or something? Dunno...
18:24 karolherbst[d]: though maybe my testing with nouveau was unfair, because uhm.. on kepler we didn't clock down on hot thermals, sooo.. uhm...
18:24 zmike[d]: nouveau looking almost identical to nvk
18:24 karolherbst[d]: heh
18:24 zmike[d]: SAD
18:24 notthatclippy[d]: How is furmark when it comes to querying metrics? We had issues with Unigine and some others that would have a side thread that would just constantly hammer the driver asking for any and every thing they possibly could. And if that metric is exposed by GSP, well, you're starving other things that need it.
18:24 karolherbst[d]: and here I thought nouveau would be a bit faster 😄
18:24 zmike[d]: [ Demo Quick Stats ]
18:24 zmike[d]: - demo : FurMark (GL) (built-in: YES)
18:24 zmike[d]: - renderer : NV166
18:24 zmike[d]: - 3D API : OpenGL 4.3 (Compatibility Profile) Mesa 25.0.0-devel (git-1540d8d983)
18:24 zmike[d]: - resolution : 1920x1080
18:25 zmike[d]: - SCORE : 4954
18:25 zmike[d]: - duration : 60000 ms
18:25 zmike[d]: - FPS (min/avg/max) : 65 / 82 / 85
18:25 zmike[d]: - GPU 0: NVIDIA GeForce RTX 2070 [10DE-1F07]
18:25 karolherbst[d]: almost
18:25 karolherbst[d]: maybe a few optimizations missing 🙃
18:25 karolherbst[d]: I think I might still have a few patches here and there
18:26 zmike[d]: phoronix tomorrow: NVK open source driver DESTROYED by proprietary NVIDIA
18:35 redsheep[d]: "Lower your GPU performance with this one easy trick"
18:36 redsheep[d]: Honestly though getting 70+% of prop is pretty good, as far as the spread of results I've seen
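(Editor's note: the "70+%" figure checks out against the vsync-off FurMark scores posted above; a quick calculation for reference.)

```python
# Relative FurMark (GL) scores from the runs above, vsync disabled,
# using the proprietary GL driver (6944) as the baseline.

def relative(score, baseline):
    """Score as a percentage of the baseline, one decimal place."""
    return round(100.0 * score / baseline, 1)

nvk, blob_gl, zink_on_blob, nouveau_gl = 5088, 6944, 6503, 4954
```

With these numbers, NVK lands at roughly 73% of the proprietary driver, nouveau GL at roughly 71%, and zink on the proprietary Vulkan driver at roughly 94%.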
18:42 anholt: sure would be nice if gfxreconstruct would crash less when replaying. it's so close to awesome.
18:45 zmike[d]: if you have issues with replay please file them and I'll tag them so they are put in the expedited queue
18:50 gfxstrand[d]: The story of so many Vulkan tools...
19:12 gfxstrand[d]: Does VKD3D not use swapchains?!?
19:13 zmike[d]: it does
19:15 gfxstrand[d]: Yeah, I'm just not as good at coding as I thought. 🤡
19:15 zmike[d]: big monday feel
19:16 gfxstrand[d]: Turns out counters work better if you actually increment them. Who knew?!?
19:16 zmike[d]: bizarre
19:22 asdqueerfromeu[d]: gfxstrand[d]: I mean you coded a new Vulkan driver from scratch and got 1.3 conformance for it in under 2 years (so that's quite impressive)
19:29 gfxstrand[d]: That just means I must think I can code really good.
19:32 gfxstrand[d]: *Dragon Age: The Veilguard* is currently stalling about 1000 times per frame. It seems like I should be able to get that down...
19:34 zmike[d]: why even bother with a dead game
19:35 mhenning[d]: gfxstrand[d]: What kind of stalls? WFIs?
19:41 gfxstrand[d]: Yeah, WFIs
19:43 mhenning[d]: Okay, cool. that lines up with a hunch that I've had
19:43 gfxstrand[d]: The current WFI code is very much "correct first"
19:44 gfxstrand[d]: Which, for getting NVK off the ground was the right call. But it needs to be improved.
19:44 mhenning[d]: yeah, sounds right
19:44 karolherbst[d]: that's going to be fun around 3D + compute 🙃
19:45 gfxstrand[d]: Even with all the shenanigans that game and VKD3D are doing, there's only like a dozen or two render passes.
19:45 gfxstrand[d]: It does use some ExecuteIndirect, though, so that's 🤡
19:47 gfxstrand[d]: I also need to figure out a better plan for compute and copy synchronization.
19:47 gfxstrand[d]: Right now we're just setting all the stall flags
19:48 gfxstrand[d]: But at least stalling all the time is mostly correct!
19:48 karolherbst[d]: you probably want to use the 3D or compute upload paths as well
19:48 karolherbst[d]: or well..
19:48 gfxstrand[d]: Maybe?
19:48 karolherbst[d]: optimize against what's used before
19:48 gfxstrand[d]: What we really want is to get our kernel issues sorted out so we can have proper async copy queues
19:49 karolherbst[d]: like if you have an upload and compute before and after, you probably also want to do it on compute, not 3d
19:49 karolherbst[d]: or copy
19:49 karolherbst[d]: yeah....
19:49 gfxstrand[d]: Maybe?
19:49 karolherbst[d]: well.. switching from 3D to compute is a WFI
19:50 gfxstrand[d]: That's kinda fine
19:50 airlied[d]: Did you just remove the WFIs and see if it goes faster 🙂
19:50 gfxstrand[d]: You can only do compute between render passes.
19:50 karolherbst[d]: sure, but there are things you can do on both e.g. const buffer uploads
19:50 gfxstrand[d]: airlied[d]: I'm guessing it'll be very incorrect, which isn't a great indicator of fastness
19:50 karolherbst[d]: though not sure if switching to copy also invokes a WFI
19:50 mhenning[d]: gfxstrand[d]: It can give you an upper bound
19:50 airlied[d]: But if it's still slow it is a good indicator WFI isn't your problem
19:51 gfxstrand[d]: karolherbst[d]: We only do that on 3D right now. Compute is all UBOs
19:51 karolherbst[d]: I see
19:52 karolherbst[d]: but yeah.. sub channel switches are to be avoided, though I think copy is a special case there
19:52 karolherbst[d]: _but_ 3D and compute have data uploads in their class, so might be better to use that
19:53 airlied[d]: This is where I blame zcull and everyone ignores me 🙂
19:53 karolherbst[d]: correct
19:53 gfxstrand[d]: airlied[d]: I'm not ignoring you
19:53 karolherbst[d]: but yeah.. zcull...
19:54 mhenning[d]: I'm not ignoring zcull either, it's just not especially easy
19:54 karolherbst[d]: the thing is, there is also implicit zcull going on even if you don't enable it
19:54 gfxstrand[d]: karolherbst[d]: Oh, copies just always stall right now. 🤡
19:54 mhenning[d]: really? I thought you needed to set up buffers?
19:54 mhenning[d]: for zcull
19:54 karolherbst[d]: yes and no
19:55 zmike[d]: how does linux determine what is the "first" gpu on a system?
19:55 karolherbst[d]: you just get a cleared thing if you switch the depth buffer
19:55 zmike[d]: like for enumerations and such
19:55 karolherbst[d]: instead of reusing the old cache
19:55 karolherbst[d]: more or less
19:55 gfxstrand[d]: zmike[d]: Like, /dev/dri order? It's based on PCI paths, I think.
19:56 gfxstrand[d]: But then VK_MESA_device_select shuffles it
19:56 zmike[d]: yeah I mean without shuffling
19:56 karolherbst[d]: as I understand things, the hardware has an internal zcull buffer, and the buffer you bind is used to save/restore the content
19:57 gfxstrand[d]: Maybe? If there is, it's very automatic because we're not doing anything to flush it or anything.
19:57 karolherbst[d]: yeah, it gets cleared when you switch the depth buffer
19:57 anholt: pci paths usually get lucky, but /dev/dri order is really just module probe order.
19:58 airlied[d]: zmike[d]: boot_vga
19:58 mhenning[d]: Aren't there cases where zcull isn't safe to enable, though? I assumed it would need driver involvement
19:58 gfxstrand[d]: karolherbst[d]: And when you WFI, if it really exists
19:58 airlied[d]: whatever card has the vga resources at boot is usually considered the "primary"
19:58 zmike[d]: you mean a monitor plugged?
19:58 airlied[d]: no whatever the BIOS decides
19:58 karolherbst[d]: gfxstrand[d]: there might be more cases where it gets cleared, yes
19:58 zmike[d]: hmmm
19:58 zmike[d]: I checked my bios and I don't see any settings for that at least
19:58 karolherbst[d]: mhenning[d]: the hardware can disable it itself
19:59 airlied[d]: usually the BIOS has a display init order
19:59 airlied[d]: PEG slots
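(Editor's note: the `boot_vga` mechanism airlied describes is visible from userspace via sysfs; each PCI device directory exposes a `boot_vga` file containing "1" for the device that owned the VGA resources at boot. A minimal sketch, with the base path as a parameter so it can be tested against a fake sysfs tree:)

```python
import os

def find_boot_vga(sys_bus_pci="/sys/bus/pci/devices"):
    """Return the PCI address of the boot VGA device, or None."""
    for dev in sorted(os.listdir(sys_bus_pci)):
        path = os.path.join(sys_bus_pci, dev, "boot_vga")
        try:
            with open(path) as f:
                if f.read().strip() == "1":
                    return dev
        except OSError:
            continue  # non-VGA devices have no boot_vga file
    return None
```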
19:59 gfxstrand[d]: karolherbst[d]: I doubt it. Not when you have your depth buffer bound as a texture. How would the hardware detect that?
19:59 gfxstrand[d]: And yes there are CTS tests which do that and they work
20:00 gfxstrand[d]: There might be some test-accel-only mode that's on by default but at that point it's just a cache
20:02 karolherbst[d]: gfxstrand[d]: don't you have to disable depth testing there anyway?
20:03 gfxstrand[d]: No
20:04 gfxstrand[d]: It's still depth testing.
20:04 gfxstrand[d]: You do have to WFI between draws but there's no flush
20:04 karolherbst[d]: well.. zcull happens before the fp gets invoked
20:06 gfxstrand[d]: That's fine. That's just early depth testing
20:06 karolherbst[d]: though it could be that atm nvk just disables zculling entirely...
20:07 asdqueerfromeu[d]: airlied[d]: Wouldn't that cause data races though?
20:08 gfxstrand[d]: Absolutely! You've gotta race to go fast. 😛
20:09 karolherbst[d]: mhhhh... it might be that writing to the depth buffer doesn't really mess with zcull operation
20:10 karolherbst[d]: though not sure
20:11 gfxstrand[d]: Proper zcull has to be very carefully driver-managed.
20:11 karolherbst[d]: could also be that the internal buffer is recreated every time
20:11 gfxstrand[d]: You can have a cache be automagic but that's not going to get you what actual zcull will.
20:12 redsheep[d]: Wild theory: I wonder if wfi causing stuff to reset or clear helps explain msaa often slowing stuff so much, if there's more work/bandwidth to reset something that gets bigger with msaa on
20:13 gfxstrand[d]: Maybe?
20:13 gfxstrand[d]: MSAA is likely more about not having compression, though.
20:15 gfxstrand[d]: Damn! 7 FPS with all the WFI gone. 😭
20:15 gfxstrand[d]: Time for a new theory!
20:15 redsheep[d]: Isn't that exactly what it was before?
20:16 gfxstrand[d]: yup
20:16 gfxstrand[d]: There are plenty of other things that might be stalling us out, though
20:18 gfxstrand[d]: cbufs are a known culprit
20:18 karolherbst[d]: oh no, not the cbufs
20:19 skeggsb9778[d]: karolherbst[d]: yeah, subc switches are an implicit WFI
20:19 karolherbst[d]: skeggsb9778[d]: also with copy?
20:20 karolherbst[d]: I _think_ copy was special there
20:20 skeggsb9778[d]: yes (ish, maybe) - i think there's likely some special logic around the GRCEs (the ones on the same channel as the graphics objects), but i've not gotten around to looking into that yet
20:20 skeggsb9778[d]: but, proper async copies are a separate channel
20:20 karolherbst[d]: ahhh, that might be it
20:21 gfxstrand[d]: I'm not too worried about subc switches. Render passes make those pretty rare unless we implement draw stuff with compute, which we don't.
20:21 skeggsb9778[d]: yeah, not so common on current hw
20:21 skeggsb9778[d]: prior to g80, you wouldn't even have *enough* subchannels for all the objects you had to use 😛
20:21 gfxstrand[d]: I mean, we do have them for compute right now because we use some macros which is super annoying.
20:21 karolherbst[d]: ohh right...
20:22 karolherbst[d]: turing not having compute mme.....
20:22 gfxstrand[d]: But I don't think that's what's killing us
20:22 karolherbst[d]: why nvidia... why
20:22 karolherbst[d]: yeah...
20:22 karolherbst[d]: WFI might be annoying, but they shouldn't hurt _that_ much
20:22 gfxstrand[d]: It also shouldn't be happening mid-pass
20:23 karolherbst[d]: engine and performance counters would be nice to have at this point 🙃
20:25 gfxstrand[d]: no kidding
20:25 karolherbst[d]: I'm sure GSP does the engine counter stuff these days, because it's a massive and giant pita to do it on the host
20:26 gfxstrand[d]: probably
20:26 karolherbst[d]: however
20:26 karolherbst[d]: I know how to read and set them, lol
20:26 karolherbst[d]: it's just upwards counting mmio registers
20:26 karolherbst[d]: but something needs to read them and clear them regularly because they overflow fast
20:26 gfxstrand[d]: What kind of data do we get out of them?
20:27 karolherbst[d]: you can tell how much an engine idled
20:29 karolherbst[d]: https://github.com/karolherbst/nouveau/commit/b50914aa305afbb85cec50814fc3a33a65db0a18
20:29 karolherbst[d]: implemented it in firmware at some point, but never managed to actually upstream any of it
20:29 karolherbst[d]: also because.. well.. it requires firmware support, making it pointless on newer gpus
20:30 gfxstrand[d]: ah
20:30 karolherbst[d]: but yeah..
20:30 karolherbst[d]: you basically let the firmware count "10000 cycles" and then you get a value of how many of those the engine was idle
20:31 karolherbst[d]: doesn't tell you much, but it can tell you how much your shaders or the memory controller were bored
20:31 karolherbst[d]: there is also a pcie counter
20:31 karolherbst[d]: on laptops that one is almost always at 100% busy 😄
20:31 karolherbst[d]: or pretty high
20:32 karolherbst[d]: there is some weird stuff going on tho
20:32 karolherbst[d]: like nvidia marks memory at 100% even if the counter is a bit idle
20:32 karolherbst[d]: never figured out their heuristics
20:32 karolherbst[d]: but anyway.. nvidia's GPU and memory load is based on those counters
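(Editor's note: the load calculation karolherbst describes can be sketched as below. The 10000-cycle window comes from his description; the 32-bit register width is an assumption, used only to show why the counters must be read and cleared regularly: a free-running upward counter wraps, so deltas have to be taken modulo the counter width.)

```python
WINDOW_CYCLES = 10_000  # firmware sampling window, per the description above
COUNTER_BITS = 32       # assumed MMIO register width

def engine_load(idle_cycles, window=WINDOW_CYCLES):
    """Percent busy over one sampling window, given the reported idle count."""
    return 100.0 * (window - idle_cycles) / window

def counter_delta(prev, curr, bits=COUNTER_BITS):
    """Difference between two reads of a wrapping, upward-counting register."""
    return (curr - prev) % (1 << bits)
```

If a read misses a full wrap of the counter, the delta is silently wrong, which is exactly why something has to poll them frequently.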
20:33 gfxstrand[d]: knowing whether we were memory or shader bound would be nice
20:33 karolherbst[d]: yeah.. for that those can help
20:34 karolherbst[d]: also if the pcie counter says 100% busy, then that also points out a problem 😄
20:34 mohamexiety[d]: mtijanic iirc said he'd do it (or guide someone to) whenever we get the GSP bump to 570
20:34 redsheep[d]: Given recent Blackwell pcie scaling benchmarks that shouldn't be anywhere near being the bottleneck
20:35 redsheep[d]: Maybe the counters are weird though and show that busy more than it is? Idk
20:35 mohamexiety[d]: mohamexiety[d]: re: wiring up HW utilization
20:35 karolherbst[d]: redsheep[d]: well.. if the driver is pointlessly reuploading stuff and shit
20:35 karolherbst[d]: but yeah
20:35 karolherbst[d]: it _shouldn't_ be
20:36 karolherbst[d]: though some games can trigger suboptimal paths
20:36 redsheep[d]: You can make applications that stress the bus, but normal ones don't seem to unless there's a bug
20:36 karolherbst[d]: talos principle was pcie bound on nouveau 🙃
20:36 karolherbst[d]: just speeding up pcie got you a nice perf improvement
20:37 redsheep[d]: karolherbst[d]: Well the good news is Talos 1 removed that renderer so you don't have to worry about that anymore 😜
20:37 karolherbst[d]: lol
21:09 notthatclippy[d]: mohamexiety[d]: Just to clarify, the only reason why I'm saying that is because there is a simple, obvious and wrong way to do it 😛
21:09 notthatclippy[d]: So if someone wants to handle it, reach out to me. Otherwise, if I ever get some spare time to actually set up a more permanent nouveau test system, I'll do it to preempt someone submitting the wrong thing.
21:10 mohamexiety[d]: I could take a poke at it with guidance when the time comes. kinda curious about that wrong way :KEKW:
21:10 mohamexiety[d]: I am guessing it's constantly poking at the GSP to get the values, which overwhelms it and stalls other more important stuff?
21:24 notthatclippy[d]: Pretty much. Also does a number on your power consumption.
21:25 tiredchiku[d]: yeah, that tracks
21:25 tiredchiku[d]: considering nvidia-smi can and will wake up a suspended GPU to get its info 😅
22:32 gfxstrand[d]: karolherbst[d]: I'm happy to blame that on nouveau GL. 🙃
22:54 gfxstrand[d]: Shouldn't we be getting a GSP bump soon for Blackwell?
23:03 redsheep[d]: gfxstrand[d]: That was talked about for Blackwell enablement. Are you having as much trouble as I am ordering a 5090?
23:04 redsheep[d]: I'm signed up for stock alerts same way I was for 3090 and 4090 and I've had like 20x fewer alerts than those did
23:04 gfxstrand[d]: Stock is really low, I think. I haven't gotten one yet.
23:06 mohamexiety[d]: redsheep[d]: keep in mind the launch coincides with Chinese New Year, which means that all logistics/production/etc is halted for like 10 days since January 29
23:07 mohamexiety[d]: and that's with launch stock being much lower than Ada stock
23:07 mohamexiety[d]: so I wouldn't expect things to get better until a while later sadly
23:08 gfxstrand[d]: Right. Yeah. That'll do it.
23:08 redsheep[d]: mohamexiety[d]: Yeah and global trade and logistics isn't exactly a rosy situation right now
23:08 mohamexiety[d]: yeeeeah 😐