01:25mhenning[d]: i started hacking at zcull a bit and i managed to make everything slower
01:25mhenning[d]: so maybe it's doing something?
01:28zmike[d]: progress!
02:22Pheoxy[AWSTUTC8][m]:posted a file: (18487KiB) < https://matrix.org/oftc/media/v1/media/download/AbOLzhV7VxavUeic288EO0Bv6VLCdmWC1ChJF6xP-FrxfBMDaJlFqmIpz7gOPVGbzJH3O5f5f_TBkdOQopNcc21CeVFt1NxQAG1hdHJpeC5vcmcvc0JmaEdnQkhzdmhRV1lZZXhTT0FXVlNo >
02:23Pheoxy[AWSTUTC8][m]: What's going on here with NVK_DEBUG=all? Playing War Thunder, or at least trying to, and it's spamming logs about HDR? I don't even have HDR support on this laptop...
02:48mhenning[d]: I think that stands for header, not high dynamic range
02:58gfxstrand[d]: Pheoxy[AWSTUTC8][m]: NVK_DEBUG=push dumps out every command we send to the GPU. You really don't want that when running a full game.
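For reference, capturing that dump from a short run of a small app is far more manageable than a full game; a minimal sketch, assuming the dump goes to stderr and using vkcube as a stand-in for any small Vulkan app:

```sh
# Dump every method NVK sends to the GPU, redirected to a file;
# vkcube is just an example app here.
NVK_DEBUG=push vkcube 2> nvk-push.log
```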
03:04Pheoxy[AWSTUTC8][m]: <gfxstrand[d]> "Pheoxy [AWST/UTC+8]: NVK_DEBUG=..." <- What are your preferred variables for getting debug info? Specifically, I'm trying to get logs for what's causing the fps drops in some specific games with smoke effects. It's very noticeable with native Vulkan and seems to hit VKD3D as well.
03:09gfxstrand[d]: There isn't much you can log which will tell us anything useful for performance analysis. Figuring out why something slows down due to a game setting or effect really requires looking at the game itself. File a bug with the games you're seeing this with, your HW information, a screenshot of the game graphical settings page, and maybe a short video or two.
03:14gfxstrand[d]: Also, smoke effects causing FPS drops is pretty common. It may be that you just don't see it on the blob driver because it's enough faster overall that you don't notice.
03:43Pheoxy[AWSTUTC8][m]: <gfxstrand[d]> "Also, smoke effects causing..." <- Just confirmed it happens on the blob driver as well; it's just able to compensate better.
03:43Pheoxy[AWSTUTC8][m]: It's a bit crazy in comparison, as it goes from over 200fps to under 30fps
03:44Pheoxy[AWSTUTC8][m]: On the blob driver it's still sitting around the 130fps range when I try to recreate the same fps drop
03:45Pheoxy[AWSTUTC8][m]: Even more annoying because War Thunder, for example, uses a ton of smoke effects, so ground battles at 20fps at best compared to air battles at 200fps is a massive jump.
03:45Pheoxy[AWSTUTC8][m]: Seems the game devs had issues with clouds too but that's been fixed.
03:46Pheoxy[AWSTUTC8][m]: Anything that specifically needs testing though?
04:55gfxstrand[d]: Yeah, there are some games from that era that were really bad with the smoke. There was this one WWII flying game I played on PS3 that would drop to like 10 FPS if you flew through too much smoke. 🤦🏻♀️
04:57gfxstrand[d]: The thing that would be helpful is if you could get a RenderDoc trace of a frame with lots of smoke and get it to me somehow. I'm on my phone right now, so it's probably not the best moment to walk you through that.
05:21Pheoxy[AWSTUTC8][m]: <gfxstrand[d]> "The thing that would be helpful..." <- No worries, I'll see what I can do myself today. I'm editing my game startup bash script to output all the logs I can get, with optional parameters to make it a bit easier, because I keep forgetting certain `env` variables or programs for getting trace logs.
05:24Pheoxy[AWSTUTC8][m]: What's your preference for running NVK? `VK_ICD_FILENAMES` or `DRI_PRIME`?
05:24Pheoxy[AWSTUTC8][m]: I'm testing on an ASUS laptop with integrated Intel and a dedicated NVIDIA RTX 3070, and sometimes it likes to pretend the NVIDIA card doesn't exist, or, most commonly, Steam decides to preshade on the Intel card instead.
05:26Pheoxy[AWSTUTC8][m]: It's a pain
05:36tiredchiku[d]: VK_ICD_FILENAMES is more reliable
05:47gfxstrand[d]: Yeah, that's what I always use.
07:24Pheoxy[AWSTUTC8][m]: I'll use VK_ICD_FILENAMES, as after some testing it's actually working every time.
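For reference, a minimal sketch of the `VK_ICD_FILENAMES` approach; the manifest path is an assumption and varies with the distro and the Mesa install prefix:

```sh
# Make NVK the only driver the Vulkan loader considers; the exact
# path to the ICD manifest depends on where Mesa is installed.
export VK_ICD_FILENAMES=/usr/share/vulkan/icd.d/nouveau_icd.x86_64.json
vulkaninfo --summary  # should now report the NVIDIA GPU via NVK
```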
08:35gfxstrand[d]: \o/
08:49gfxstrand[d]: I kinda want vkbench but for GPU overhead/stalls...
08:50gfxstrand[d]: I think cbuf binds are killing us but I need to be able to prove it
08:51gfxstrand[d]: And then I need to figure out what to do about it
08:53gfxstrand[d]: Unfortunately, memory reads from the MME suck
08:55gfxstrand[d]: Otherwise, I'd just have a thing which reads a few values out of descriptors and LOAD_CONSTANT_BUFFER
08:56gfxstrand[d]: I can potentially do stuff on the CPU but yuck
08:59gfxstrand[d]: I should also figure out how fast `MME_DMA_READ_FIFOED` is
09:00gfxstrand[d]: And whether or not it stalls stuff
09:00gfxstrand[d]: I sure hope it doesn't stall anything. I don't see why it would.
09:04gfxstrand[d]: Or maybe our shaders just suck but I doubt they suck enough to cause all the problems.
09:04Pheoxy[AWSTUTC8][m]: Sounds like a test in that area would be nice then, to be honest?
09:27gfxstrand[d]: That's why I'm writing one. 🙂
09:43Pheoxy[AWSTUTC8][m]: While getting all the logging done, my poor bash script has turned into Python... At least the log output is looking nice.
09:43Pheoxy[AWSTUTC8][m]: Is there even a point in also using `VK_LAYER_PATH`? I see some people use it but it seems to work fine without it for me.
09:52gfxstrand[d]: You only need that if you're trying to use a layer you've built yourself
10:20gfxstrand[d]: Okay, I've got a microbenchmark where I can double the per-draw time with WFI. Now to see what all stalls.
10:53gfxstrand[d]: Okay, yeah, cbuf bind stall confirmed
10:54karolherbst[d]: oh no
10:54gfxstrand[d]: Yeah. BIND_GROUP_CONSTANT_BUFFER is as bad as WFI, at least in this micro
10:54gfxstrand[d]: womp womp
10:55karolherbst[d]: mhhh... makes sense tho
10:55gfxstrand[d]: I'm guessing it's not quite as bad as WFI. It probably just stalls all the shaders in that group or something.
10:55karolherbst[d]: yeah...
10:55gfxstrand[d]: So maybe binding a constant buffer for VS doesn't stall FS.
10:56karolherbst[d]: I'm kinda wondering how the hardware partitions the cbuf cache honestly
10:56gfxstrand[d]: (I don't actually know that that's true.)
10:56karolherbst[d]: nah
10:56karolherbst[d]: I think it's the same hardware
10:56karolherbst[d]: would be weird to have per stage caches there
10:56gfxstrand[d]: This is super annoying...
10:56gfxstrand[d]: I need new descriptor code
10:57karolherbst[d]: yeah.. I made the suggestion of making it all a bit more dynamic and selecting the index at runtime so it won't evict too often (or move buffers around)
10:57gfxstrand[d]: I'm tempted to do preambles...
10:57karolherbst[d]: but... that's gonna be some work
10:58gfxstrand[d]: Ugh, runtime indices would be a mess
10:58karolherbst[d]: and also a pain, because ALUs can't take indirect indices
10:58gfxstrand[d]: yeah
10:58gfxstrand[d]: That's a no go
10:58karolherbst[d]: ~~could patch the shaders~~ though
10:58gfxstrand[d]: You can't shader patch in Vulkan
10:58karolherbst[d]: at least within a pipeline, you could reorder ubos to optimize for this
10:59gfxstrand[d]: I think we just figure out how not to use bound UBOs
10:59karolherbst[d]: all ALUs can take bindless UBOs, no?
10:59gfxstrand[d]: yes
10:59karolherbst[d]: though the question then is.. how is the hardware evicting ubos under the hood
11:00gfxstrand[d]: The problem with bindless UBOs is that there's extra indirection vs. what I can do with bound right now
11:00karolherbst[d]: oh?
11:00karolherbst[d]: ahh right, need to load the metadata
11:00gfxstrand[d]: But I think I can get rid of that with preambles...
11:00gfxstrand[d]: Well, not really preambles. A gather pass.
11:00gfxstrand[d]: I don't really want to add that complexity to the driver but....
11:01karolherbst[d]: mhhhh
11:02gfxstrand[d]: I did this for Intel once: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4745
11:02karolherbst[d]: what would that help with tho?
11:03gfxstrand[d]: It would let me scrape descriptors out of sets and put everything in cb0 with streaming LOAD_CONSTANT_BUFFER which doesn't stall.
11:04gfxstrand[d]: Including images
11:04gfxstrand[d]: It's not real preamble shaders but it's as close as I think we can reasonably get on NV
11:05gfxstrand[d]: Well, get without adding function pointer support to NAK.
11:05gfxstrand[d]: 🙃
11:05karolherbst[d]: heh
11:05karolherbst[d]: but yeah... couldn't you also do it within an mme tho? 😄
11:05gfxstrand[d]: That's a damn complex MME
11:05gfxstrand[d]: But theoretically yes maybe
11:06karolherbst[d]: could do both and see what's faster
11:06gfxstrand[d]: Eh, the shader will be faster
11:07gfxstrand[d]: It's just more complex in the driver
11:07gfxstrand[d]: But having an MME which does a multi-level crawl of descriptors isn't exactly low-complexity.
11:07karolherbst[d]: yeah, fair
11:09karolherbst[d]: so the idea is basically to move hot ubo descriptors into cb0, so shaders pull them from there directly instead of moving through the indirection in each thread
11:09gfxstrand[d]: yup
11:09gfxstrand[d]: We can even go so far as to move individual hot uniforms, depending on how complex we want to get.
11:10karolherbst[d]: mhhhh yeah.. I can see how doing the desc fetches in each thread can mess with caches compared to just doing it once
11:10gfxstrand[d]: Like, if we had real preambles, we could hoist basically anything
11:10karolherbst[d]: I _think_ having UBO indices optimized in pipelines is also probably a good idea
11:10gfxstrand[d]: Including D3D12 descriptor fetches. 🙂
11:10karolherbst[d]: not sure if you already do that
11:10karolherbst[d]: and how often that is even beneficial in vulkan
11:11karolherbst[d]: not sure "binding the same ubo in each stage" is even a thing anymore
11:11gfxstrand[d]: Nah
11:11karolherbst[d]: uhm.. accessing rather
11:11gfxstrand[d]: We just do cb0 which is the same everywhere and then ubo1+ is different per-stage
11:11karolherbst[d]: yeah.. I'm wondering how often it happens in vulkan, that the same buffer is used across stages
11:12gfxstrand[d]: Not often enough to be worth special-casing
11:12karolherbst[d]: though that information being only available in pipelines might not help _that_ much overall
11:12gfxstrand[d]: And it's a really weird link optimization with the current compile flow. It would have to be a pretty big benefit to be worth it.
11:12karolherbst[d]: but might help nonetheless if the same shader is invoked repeatedly
11:13karolherbst[d]: are you unbinding unused slots?
11:13gfxstrand[d]: I think so?
11:13karolherbst[d]: mhhhhhh
11:13gfxstrand[d]: But also, I think the real answer is to figure out how to stop using anything but cb0
11:13karolherbst[d]: so if an entire stage uses, e.g., slots 1-8, it might help to not unbind at all and leave gaps
11:13karolherbst[d]: *pipeline
11:13gfxstrand[d]: Use cb0, stream everything, everything else is bindless
11:13karolherbst[d]: and then you won't ever have to bind if that same pipeline is run a couple of times
11:14karolherbst[d]: enable/disable might not even cause a WFI if the buffer stays the same
11:15gfxstrand[d]: So one change we could make would be to bind cbufs based on the pipeline layout instead of basing it on the shader.
11:15karolherbst[d]: yeah and leave gaps so later stages don't overwrite a slot
11:15gfxstrand[d]: No. FS and VS are unaware of each other
11:15gfxstrand[d]: The HW has bind groups for a reason
11:15karolherbst[d]: well.. you have 18 slots
11:16gfxstrand[d]: I have 18 * 5
11:16karolherbst[d]: ohhh...
11:16karolherbst[d]: mhhh
11:16karolherbst[d]: mhhh
11:16karolherbst[d]: let me check something
11:17gfxstrand[d]: gfxstrand[d]: If we did it based on pipeline layouts (or effective pipeline layouts) we could move the binding to descriptor set bind time rather than having it be a draw-time validation thing.
11:17gfxstrand[d]: The blob uses 16-31 for dynamic UBOs always
11:17gfxstrand[d]: Or 8-15, rather
11:17gfxstrand[d]: Something like that
11:18karolherbst[d]: okay.. I might have some more info on what's evil and bad there...
11:19karolherbst[d]: are you using I2M for cb uploads?
11:20gfxstrand[d]: gfxstrand[d]: So right now we have the problem where, if you have two shaders A and B, A uses UBO X and B uses UBO Y, and you switch back and forth between the shaders, we'll keep rebinding X and Y. If we based it on pipeline layouts, we could avoid some of that.
11:20gfxstrand[d]: karolherbst[d]: I2M?
11:20karolherbst[d]: inline data stuff
11:20gfxstrand[d]: For cb0, yes
11:20karolherbst[d]: okay
11:20gfxstrand[d]: Well, LOAD_CONSTANT_BUFFER
11:20karolherbst[d]: gfxstrand[d]: yeah.. you probably want to avoid that... I wonder if leaving the address alone already helps a lot or not
11:21gfxstrand[d]: Nah, re-binding the same address over and over stalls
11:21gfxstrand[d]: You have to avoid the bind entirely
11:21karolherbst[d]: mhhh, so you could assume your code is perfect and never unbind an unused slot 😄
11:22gfxstrand[d]: Avoiding unbinds is complex
11:22gfxstrand[d]: And is a great recipe for weird faults
11:22karolherbst[d]: okay... soooo
11:22karolherbst[d]: I found some bits
11:22karolherbst[d]: apparently it makes a difference if you update one or multiple cbs at the same time
11:23karolherbst[d]: but given you only have one internal cb, you probably don't mess that part up
11:24karolherbst[d]: apparently it helps to only update the parts which actually change
11:24gfxstrand[d]: Well, sure
11:24karolherbst[d]: and there is some alignment you need to care about to optimize that even further
11:25gfxstrand[d]: I'm already tracking a bunch of state in MME scratch regs so I can avoid double-updates.
11:26karolherbst[d]: mhhh
11:26karolherbst[d]: do you do anything with `SET_ROOT_TABLE`?
11:26gfxstrand[d]: No
11:26gfxstrand[d]: I assume that's useful for something but IDK what
11:26karolherbst[d]: you might want to
11:27karolherbst[d]: it's a perf optimization
11:28karolherbst[d]: don't know how it works, but it sounds useful combined with e.g. cb0
11:29gfxstrand[d]: Yeah, it looks like it's probably a faster cb0 or something
11:29karolherbst[d]: yeah
11:29karolherbst[d]: something like that
11:30karolherbst[d]: might be worth checking how nvidia uses it
11:31karolherbst[d]: ohhh
11:31karolherbst[d]: I found what might help with the overall cb situation.. mhhh
11:31karolherbst[d]: as in: preventing WFI
11:33karolherbst[d]: I just can't find the relevant stuff in the class for it...
11:37karolherbst[d]: mhhh yeah... no idea... it sounds like it's helping when there are partial updates to cbs and prevents WFI between pipelines
11:38karolherbst[d]: but no idea how it's configured
11:38gfxstrand[d]: I think cb0 is kind of okay
11:39karolherbst[d]: yeah.. it sounds useful for e.g. push constants, but they end up in cb0, right?
11:39gfxstrand[d]: yup
11:39gfxstrand[d]: I mean, it might make cb0 even faster
11:39karolherbst[d]: yeah
11:39gfxstrand[d]: But the real problem right now is all the other cbs
11:39karolherbst[d]: okay, let's focus on that then
11:40karolherbst[d]: but anyway.. there are ways to make cb0 faster, guess that's good to know regardless
11:40gfxstrand[d]: yeah
11:40karolherbst[d]: especially if you can update cb0 without a WFI
11:40karolherbst[d]: maybe it's a transparent feature and you already make use of it, who knows
11:41gfxstrand[d]: Yeah
11:41karolherbst[d]: I _wonder_ if only using bindless ubos helps a lot here...
11:41karolherbst[d]: what size are you using for bindless ubos btw?
11:42karolherbst[d]: the size of the buffer or the range the shader might access?
11:42karolherbst[d]: or aren't you using them at all?
11:43karolherbst[d]: okay... sooo...
11:43karolherbst[d]: you don't want to treat bindless buffers as being strictly correlated to real buffers
11:43gfxstrand[d]: I've filed it so I don't forget again: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12576
11:43karolherbst[d]: so binding sub ranges from the same memory allocation can be beneficial
11:43karolherbst[d]: well..
11:44karolherbst[d]: "using" rather
11:44gfxstrand[d]: karolherbst[d]: Yes, it does. Except for the extra indirections.
11:45karolherbst[d]: soo.. I think what you want to do here with bindless ubos is to find the many small ranges that are actually accessed and use those, instead of a buffer + size approach. No idea what you do there, but yeah.. I should read the code
11:46gfxstrand[d]: I mean, I could smash the descriptors but that sounds like a lot of headache and extra instructions.
11:46karolherbst[d]: like if you have a UBO, and the shader accesses [0-0x100) and [0x1000-0x1100), then you should treat it as two bindless UBOs, both of size 0x100
11:47karolherbst[d]: can construct the second base address in the shader
11:47gfxstrand[d]: Yeah, for constant offset things we can. It's just annoying.
11:47karolherbst[d]: the important part is to not indicate you care about [0x100-0x1000)
11:47karolherbst[d]: uhm.. to indicate you don't care about those parts
11:48gfxstrand[d]: Why? Does it just load the whole thing on a cache miss?
11:48karolherbst[d]: unsure
11:48karolherbst[d]: but it sounds like it does
11:48gfxstrand[d]: That would be annoying
11:48karolherbst[d]: worth checking if it makes a difference
11:48gfxstrand[d]: The blob doesn't do any shenanigans here
11:49karolherbst[d]: mhh, I see
11:50karolherbst[d]: I can see that having multiple bases might also cause multiple fetches and that can cause issues...
11:50karolherbst[d]: maybe it's more important with bigger gaps
11:50karolherbst[d]: or maybe the hardware can handle it...
11:52gfxstrand[d]: I would hope the hardware is a little smarter than "Go load 4MiB" the moment you get a miss.
11:54karolherbst[d]: ohh yeah.. sounds like the hardware can deal with it actually
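A minimal sketch of the range-splitting idea floated above, with an invented descriptor struct (real NVK bindless UBO descriptors are driver-internal and look different):

```c
#include <stdint.h>

/* Hypothetical bindless-UBO descriptor, for illustration only. */
struct ubo_range {
    uint64_t gpu_addr; /* base GPU VA of the bound range */
    uint32_t size_B;   /* bound size in bytes */
};

/* The shader only touches [0x0, 0x100) and [0x1000, 0x1100) of one
 * buffer: instead of a single descriptor spanning the whole buffer,
 * emit two tight ranges so the cold gap [0x100, 0x1000) is never
 * advertised to the constant cache.  The compiler would rebase
 * accesses to the second range so they hit out[1] at offset 0. */
static void
split_hot_ranges(uint64_t buf_gpu_addr, struct ubo_range out[2])
{
    out[0] = (struct ubo_range){ .gpu_addr = buf_gpu_addr,          .size_B = 0x100 };
    out[1] = (struct ubo_range){ .gpu_addr = buf_gpu_addr + 0x1000, .size_B = 0x100 };
}
```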
11:54gfxstrand[d]: Ugh... Real preambles would be such a PITA
11:55karolherbst[d]: https://forums.developer.nvidia.com/t/profiling-constant-cache/306586/2
11:55karolherbst[d]: perf counters would be nice 🙃
11:56gfxstrand[d]: Yeah
11:56karolherbst[d]: anyway.. those are the things related here
11:56karolherbst[d]: IDC mainly for ubos
11:58karolherbst[d]: ehh.. maybe IMC
11:58karolherbst[d]: those terms, and what they're doing, are confusing
11:59karolherbst[d]: ohh yeah..
11:59karolherbst[d]: IMC if the access is uniform
12:01karolherbst[d]: but anyway.. if bindless ubos help.....
12:01karolherbst[d]: how does bindless ubos compare to ldg.constant?
12:04gfxstrand[d]: Way faster
12:04karolherbst[d]: ahh
12:04gfxstrand[d]: And we get free bounds checking
12:04karolherbst[d]: so ldg.constant is probably more of a caching thing and is not using the cb thing at all
12:04karolherbst[d]: just something I was wondering about
12:04gfxstrand[d]: Yeah. ldg.constant is just ldg but totally ignoring that other things might be writing.
12:04karolherbst[d]: okay
12:05gfxstrand[d]: bindless UBOs are super fast
12:05karolherbst[d]: I wonder what nvidia does for GL....
12:05gfxstrand[d]: If I can get rid of that pesky indirection
12:05karolherbst[d]: ohh.. right...
12:06gfxstrand[d]: The reason why bound UBOs are faster now is because I'm pulling the descriptor out of the descriptor set with the MME and binding the actual UBO.
12:06karolherbst[d]: nvidia just pulls the bindless handle out of the descriptor?
12:06gfxstrand[d]: Mostly?
12:07gfxstrand[d]: Except dynamic UBOs are always bound AFAICT
12:07gfxstrand[d]: Unless you use the max number. Then they probably fall back to bindless.
12:07karolherbst[d]: mhhhh
12:08gfxstrand[d]: Maybe I can do more-or-less the same thing I'm doing now with the MME and just stash stuff in CB0 instead of binding.
12:08gfxstrand[d]: That might not be too hard to hack up
12:09gfxstrand[d]: I don't especially like doing it all with MME but maybe it's okay
12:09karolherbst[d]: ohhh yeah, that should work
12:10karolherbst[d]: add some "virtual slots" or something 😄
12:10gfxstrand[d]: Yup
12:10gfxstrand[d]: Kinda dumb but also probably fine
12:15gfxstrand[d]: There's still a load but at least we can skip the descriptor set base address fetch
12:16karolherbst[d]: you mean a load besides from cb0?
12:20gfxstrand[d]: We still have to load something from cb0
12:26karolherbst[d]: yeah.. well.. that's for free (basically)
12:26karolherbst[d]: or at least I'd consider them to be for free, given it's never rebound and only partially updated and whatever
12:27karolherbst[d]: might be able to optimize it later, so I think that's fine to ignore as a cost
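A minimal sketch of what a "virtual cbuf slot" could look like; every name and offset here is invented, and the real version is whatever lands in the nvk/virt-cbs branch mentioned below:

```c
#include <stdint.h>

/* Hypothetical layout: reserve a window in cb0 where the driver streams
 * each buffer's address/size with LOAD_CONSTANT_BUFFER instead of
 * issuing a real BIND_GROUP_CONSTANT_BUFFER (which stalls).  Shaders
 * then read the address from cb0 and do a bindless access, skipping
 * the descriptor-set base fetch. */
struct virt_cbuf {
    uint64_t gpu_addr; /* GPU VA of the buffer */
    uint32_t size_B;   /* bound size in bytes */
    uint32_t _pad;
};

#define VIRT_CBUF_BASE_B 256u /* invented byte offset of the window in cb0 */

/* Byte offset in cb0 of slot N's virtual descriptor. */
static inline uint32_t
virt_cbuf_offset_B(uint32_t slot)
{
    return VIRT_CBUF_BASE_B + slot * (uint32_t)sizeof(struct virt_cbuf);
}
```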
14:59gfxstrand[d]: Okay, virtual cbuf slots are implemented. Running dEQP-VK.ubo.*
15:37gfxstrand[d]: No difference in *The Witness* on my hackity hack branch for virtual cbufs
15:37gfxstrand[d]: No difference means nothing bad
15:38gfxstrand[d]: https://tenor.com/view/i-don%27t-know-idk-idk-about-that-gif-7336250873949261946
15:50gfxstrand[d]: No difference in DA:TV
15:52gfxstrand[d]: I guess if we want to do this we probably need to do something about image descriptors, too.
15:58gfxstrand[d]: gfxstrand[d]: Not quite. It's the same as actual cbufs but NVK_DEBUG=no_cbuf is still faster. :facepalm:
16:02gfxstrand[d]: Which makes no sense, because my branch has one `BIND_GROUP_CONSTANT_BUFFER()` at context creation time and never again.
16:04gfxstrand[d]: Are my streaming updates stalling things?!?
16:04gfxstrand[d]: I really need a bigger GPU than a 4060 to test this stuff
16:10gfxstrand[d]: redsheep[d]: If you wanted to throw my nvk/virt-cbs branch at some things...
16:10zmike[d]: if only the 5000 series wasn't a paper launch
16:21gfxstrand[d]: My virt-cbuf branch does turn off cbuf textures implicitly so maybe that's why it's no better?
16:21gfxstrand[d]: Wait, no. We don't have those at all in that branch. I haven't rebased since it landed.
16:21gfxstrand[d]: IDK what's going on then
16:40karolherbst[d]: gfxstrand[d]: I think so yes
16:41karolherbst[d]: need to wait for shaders to finish executing before you can update the contents
16:41gfxstrand[d]: I thought there was some hardware magic that made it versioned or something
16:41karolherbst[d]: I'm not sure if that's transparent or if there is something we need to enable to get that
16:42karolherbst[d]: but yes, that exists, but no idea how that's working out practically
16:43karolherbst[d]: are you overwriting the entire cb0 or only parts?
16:43gfxstrand[d]: only minimal parts
16:44gfxstrand[d]: Like one or two dwords at a time if needed
16:45karolherbst[d]: mhhhh
16:45karolherbst[d]: yeah with versioning that shouldn't stall at all
19:44dwlsalmeida[d]: hey, this is not a nouveau discussion per se, but since there's so much Rust creeping in, I wonder if we can figure out a way to support `--error-format=json` when invoking rustc through meson
19:45dwlsalmeida[d]: this is so rust-analyzer can highlight errors, instead of having to compile and comb through the stuff in the terminal
19:46karolherbst[d]: does that integrate well with e.g. vscode?
19:46dwlsalmeida[d]: yes, by setting `rust-analyzer.check.overrideCommand`
19:46karolherbst[d]: mhh
19:46dwlsalmeida[d]: in fact, that's my objective too
19:46karolherbst[d]: can one append stuff? otherwise might make sense to handle it within the meson vscode extension
19:47karolherbst[d]: anyway, gtg, but yeah, that would be cool to have properly working
19:48dwlsalmeida[d]: in the rust-for-linux project, someone managed to find a solution:
19:48dwlsalmeida[d]: `make KRUSTFLAGS=--error-format=json`
19:49dwlsalmeida[d]: this means that you'll actually run `make` whenever you save etc, but I don't think it matters much
19:49dwlsalmeida[d]: the end result is that you finally get a `cargo`-like experience when using an IDE like vscode and others
19:50dwlsalmeida[d]: with errors/warnings showing up inline etc
19:50dwlsalmeida[d]: I wonder if it's just a matter of appending `--error-format=json` to the right meson.build file, i'll check that later
20:01redsheep[d]: gfxstrand[d]: I can probably test some this evening. New job has me very busy. My old setup to build stuff is probably bad now too, so it might take some doing
20:06gfxstrand[d]: No worries. I'm starting to suspect I'm still getting stalling for some reason.
20:09tiredchiku[d]: I'd have offered to test but my 3070 isn't much bigger than your 4060 😅
20:23anholt: dwlsalmeida[d]: yeah, I'd love to get a solution for rust-analyzer in vscode in mesa.
20:45karolherbst[d]: dwlsalmeida[d]: well.. you can just modify your rust_args locally
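A sketch of what that can look like, assuming the build dir already has rust enabled so meson exposes its built-in `rust_args` option (an in-tree alternative would be `add_project_arguments('--error-format=json', language : 'rust')` in meson.build):

```sh
# Set the per-language option on an existing build dir, then rebuild;
# no meson.build edits needed.
meson configure builddir -Drust_args=--error-format=json
ninja -C builddir
```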
20:46karolherbst[d]: let me try that...
20:47karolherbst[d]: mhhh
20:49karolherbst[d]: mhhhhh
21:00dwlsalmeida[d]: 🤞
21:00dwlsalmeida[d]: (did it work? :D)
21:23karolherbst[d]: it didn't
21:24karolherbst[d]: `rust-analyzer.check.overrideCommand` is more for `cargo check` and I _think_ meson will need some alternative which doesn't mean recompiling everything, but not quite sure
21:24karolherbst[d]: though I think it would make sense to have a `meson check` command with some flags
21:25karolherbst[d]: you probably also don't want the warnings/errors of subprojects
21:25dwlsalmeida[d]: at least for the kernel, `make KRUSTFLAGS=--error-format=json` worked fine here
21:25dwlsalmeida[d]: are you sure you defined the override thing right? i.e., for me that is:
21:25dwlsalmeida[d]:
```json
"rust-analyzer.check.overrideCommand": [
    "make",
    "LLVM=1",
    "KRUSTFLAGS=--error-format=json",
    "-j32"
],
```
21:26dwlsalmeida[d]: i.e.: note that this is an array
21:27karolherbst[d]: but that doesn't really work with incremental builds, does it?
21:27dwlsalmeida[d]: what do you mean?
21:28karolherbst[d]: like, when it builds again
21:28dwlsalmeida[d]: it works like it would if the project was using cargo, i.e.: every time you save, you run `make` (in this example)
21:28dwlsalmeida[d]: instead of cargo check
21:28dwlsalmeida[d]: it takes a couple of seconds, of course
21:28karolherbst[d]: but won't errors in other files vanish?
21:30karolherbst[d]: e.g when you have multiple crates
21:30dwlsalmeida[d]: I'm not sure I get you, why would any errors disappear? all errors will be output by make, then all errors will be consumed by vscode at all times
21:30dwlsalmeida[d]: doesn't matter how many crates
21:30karolherbst[d]: sure, but if it doesn't compile certain files, it won't output warnings for those
21:31karolherbst[d]: errors sure, it fails to compile so you get everything you need
21:31karolherbst[d]: but e.g. clippy warnings in crates would just not be part of that if you touch a file of another crate
21:33dwlsalmeida[d]: hmmm
21:33karolherbst[d]: though I doubt cargo can handle it, because it only compiles a single crate anyway 🙃
21:37gfxstrand[d]: dwlsalmeida[d]: File a meson issue. Dylan or Xavier will look at it.
23:13mhenning[d]: NV2080_CTRL_CMD_GR_GET_SM_ISSUE_RATE_MODIFIER is interesting. I wonder if everything defaults to full speed
23:36karolherbst[d]: maybe it's one of those quadro flags
23:38mhenning[d]: oh, I was wondering if it was related to either hash rate limiting or power management
23:39karolherbst[d]: maybe, but nvidia has a few hardware flags to artificially make the GPU slower for things only mattering for like.. workstation workloads 🙃
23:39karolherbst[d]: there are a few driver hacks to flip those things, but I think they usually work on a symbol level instead of whacking the hardware
23:42mhenning[d]: NV2080_CTRL_GR_GET_SM_ISSUE_RATE_MODIFIER_FFMA_FULL_SPEED doesn't sound like something that only workstation cares about to me
23:42mhenning[d]: but for all I know they're just debug stuff
23:43karolherbst[d]: heh
23:57gfxstrand[d]: Might as well throw it on and see what happens. 🤷🏻♀️
23:58gfxstrand[d]: I really need to resurrect my NAK branch for better scheduling. But first I need to rewrite the dependency solver again. :frog_upside_down:
23:59gfxstrand[d]: Right now it can't handle the case where an instruction is variable latency with respect to one instruction type but not another.