04:40mangodev[d]: aaaaaAAAAAaaa the gitlab is down *again*
04:40mangodev[d]: so i'm just gonna ping you here in response to your comments in the issue that i saw earlier today
04:40mangodev[d]: [here](https://crash-stats.mozilla.org/report/index/675031b8-9d6f-4f44-bac5-5adce0250429) is a crash report, but idk how helpful it'll be because all that's new is a new `Debug Identifier` field in `Modules`
04:40mangodev[d]: mangodev[d]: whoops, forgot the ping
04:40mangodev[d]: mhenning[d]
04:46mangodev[d]: something else that's kind of alarming that i've noticed while daily driving nvk is that i think there's a memory leak, but i'm not 100% sure enough on it to file a report yet
04:47mangodev[d]: certain things render FAR slower after leaving the system on for a while (about 8-10 hours idle)
04:47mangodev[d]: modals especially (such as clicking on an image in discord, using your system screenshotter, or even the minecraft pause menu)
04:48mangodev[d]: i suspect something slowly starts going wrong over time, though i have zero idea what it could be
05:09mhenning[d]: mangodev[d]: oh, yeah that didn't do what I hoped it would do
05:10mhenning[d]: do you have symbols installed for both zink and nvk, or just nvk?
05:11mangodev[d]: mhenning[d]: i have mesa itself in debug mode
05:11mangodev[d]: *with debug symbols
05:24mhenning[d]: mhh. okay, you could try with `MOZ_CRASHREPORTER_DISABLE=1` and then see if you get a coredump once it crashes
05:24mhenning[d]: you might need to enable coredumps on your system if they're not enabled already
06:56kar1m0[d]: Yapper final boss
07:58karolherbst[d]: ohhh.. VSCode has a function to expand a rust macro
07:58karolherbst[d]: like to see the output
07:58karolherbst[d]: rust-analyzer rather
12:43snowycoder[d]: Need help from those more familiar with nvidia naming conventions.
12:43snowycoder[d]: Kepler uses `.sd` a lot, and I don't know what it means,
12:43snowycoder[d]: in subfm the modifier can either be `.bl`, `.pl` or `.sd`, I think that bl and pl stand for block linear and pitch linear for the calculation it does, but for `.sd` it checks whether to do block or pitch linear based on a parameter loaded from the descriptor.
12:43snowycoder[d]: Same thing with `imadsp`, it has lots of modifiers, but with `.sd` it loads them from parameters.
12:43snowycoder[d]: Can it be `.stored_descriptor`? It doesn't sound too natural to me
16:09karolherbst[d]: airlied[d]: gfxstrand[d] I was working on something to make the latencies less of a disaster: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/fb6eac4d02256297c886e78aab463f221d1d7c89
16:09karolherbst[d]: I'm just wondering how to deal with logistical issues...
16:09karolherbst[d]: like sometimes the tables merge a couple of categories in one column/row
16:10karolherbst[d]: and I wonder if I just want to ignore it or add a way to also specify which columns there are
16:10karolherbst[d]: there are also sometimes left out rows, or entries not having values, etc.. but I can encode this with the macro...
16:10karolherbst[d]: also want to encode the special latency types directly there
16:11karolherbst[d]: marysaka[d]: ^^
16:37mhenning[d]: karolherbst[d]: At least for what you're currently doing, that doesn't need a macro at all.
16:37mhenning[d]: This:
16:37mhenning[d]: CoupledDisp => [ 4, 4, 5 ],
16:37mhenning[d]: CoupledAlu => [ 4, 4, 5 ],
16:37mhenning[d]: CoupledFMA => [ 5, 5, 4 ],
16:37mhenning[d]: could just be a match that returns a `[u8; 3]` and then later code could pick an index out of the array
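The suggestion above can be sketched as follows. This is a minimal illustration, not code from the linked commit: the `Category` enum and the function names are hypothetical, and the chat does not say what the three columns mean, so the column index is left abstract.

```rust
/// Hypothetical instruction categories, named after the rows quoted above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Category {
    CoupledDisp,
    CoupledAlu,
    CoupledFma,
}

/// One table row per category, expressed as a plain `match` returning `[u8; 3]`.
/// The numbers are the ones quoted in the chat.
fn latency_row(cat: Category) -> [u8; 3] {
    match cat {
        Category::CoupledDisp => [4, 4, 5],
        Category::CoupledAlu => [4, 4, 5],
        Category::CoupledFma => [5, 5, 4],
    }
}

/// Later code picks a column out of the row (panics if `column >= 3`).
fn latency(cat: Category, column: usize) -> u8 {
    latency_row(cat)[column]
}
```

For example, `latency(Category::CoupledFma, 2)` looks up the third column of the `CoupledFMA` row.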
16:37karolherbst[d]: yeah, but we have bypass rules and other funky things
16:38karolherbst[d]: like the entire "different latencies if predicated" stuff
16:38karolherbst[d]: it could also encode the need of scoreboards
16:39karolherbst[d]: like e.g. `write_after_write` in the same file
16:40karolherbst[d]: what I kinda prefer is that Nvidia gives us tables with those things, and we can just express them as tables as well
16:42mhenning[d]: even then, a proc macro feels a little excessive. you could probably do all of that with macro_rules
16:43karolherbst[d]: you can't
16:43karolherbst[d]: like literally
16:44mhenning[d]: why not?
16:44karolherbst[d]: because you can't invoke macros in match arm position
16:44karolherbst[d]: well.. could use ifs
16:44karolherbst[d]: but that makes other things harder
16:45karolherbst[d]: but doing combinatorial sets is a huge pain in macro_rules
16:45karolherbst[d]: there are ways, but that just makes the macros really really really really ugly and hacky
16:45karolherbst[d]: and hard to change
16:45karolherbst[d]: I tried for a couple of hours and it's just not working out
16:46karolherbst[d]: and the proc macro is way cleaner than whatever macro rules I've come up with
16:47karolherbst[d]: but then again... if you used ifs, it might not be too hard, but then having rows for 2 or more categories becomes kind of a pain, because even using `matches!(...)` is troublesome
16:48karolherbst[d]: but anyway... those special rules are encoded in a special way, and doing that with macro_rules is also gonna suck if you start matching against constants/symbols
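For context on the `macro_rules` limitation discussed above: a macro invocation cannot expand to individual match arms, but a macro can emit the whole `match` (or the whole function around it). The sketch below shows that pattern with one row shared by several categories and a missing row returning `None`. The names `latency_table!`, `raw_latency` and `Category` are hypothetical, and the sketch deliberately does not attempt the combinatorial row/column case, the predicated latencies or the bypass rules that the chat describes as painful with `macro_rules`.

```rust
/// Hypothetical categories; `Decoupled` has no row in the table below.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Category {
    CoupledDisp,
    CoupledAlu,
    CoupledFma,
    Decoupled,
}

/// Generates a lookup function from table rows. Because a macro cannot
/// expand to bare match arms, the macro emits the entire `match` itself.
macro_rules! latency_table {
    ($fn_name:ident, $($($cat:ident)|+ => [$($lat:expr),+ $(,)?]),+ $(,)?) => {
        fn $fn_name(cat: Category) -> Option<[u8; 3]> {
            match cat {
                // One arm per row; a row may name several categories.
                $($(Category::$cat)|+ => Some([$($lat),+]),)+
                // Categories without a row have no entry.
                #[allow(unreachable_patterns)]
                _ => None,
            }
        }
    };
}

latency_table! {
    raw_latency,
    CoupledDisp | CoupledAlu => [4, 4, 5],
    CoupledFma => [5, 5, 4],
}
```

Whether this stays readable once special latency types and scoreboard requirements are added is exactly the open question in the discussion; the sketch only covers the simple table case.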
17:15karolherbst[d]: now one can fill one row/col for multiple categories like a match pattern: https://gitlab.freedesktop.org/karolherbst/mesa/-/commit/39a128e96521c9830a338eef57ec1d6ca562bebf#81587241fdb91e609305b6e365184451645d7908_212_225
17:34mhenning[d]: gfxstrand[d]: marysaka[d] pls review https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34750 The first commit is a fix that I'd like to get in soon, the other two are lower priority
17:35marysaka[d]: on it
17:49marysaka[d]: took a look at my notes and yeah I completely messed up there sorry about that... I think your patches are good mhenning[d]
17:50karolherbst[d]: what's the policy on merging anyway, does Faith want to review everything or can we just merge stuff? 🙃
17:51karolherbst[d]: also.. when will we switch to zink by default for more gpus in the future?
17:51karolherbst[d]: I'm asking for uhm.. RHEL reasons
17:51mohamexiety[d]: iirc Faith mentioned that we want more time to guarantee zink is fine on maxwell/pascal
17:52karolherbst[d]: okay, so no immediate timeline yet?
17:52marysaka[d]: karolherbst[d]: Considering 25.0.5 is due for tomorrow and that the first patch only restricts the problematic encoding, it should be fine to merge after vk3d tests + CTS imo
17:52mohamexiety[d]: I remember something about 25.2 but dont quote me on it
17:52mohamexiety[d]: we also dont have GL CTS submissions on pascal/maxwell yet iirc, right?
17:53karolherbst[d]: marysaka[d]: I was asking for my delay patch
17:53karolherbst[d]: mohamexiety[d]: we do
17:53mohamexiety[d]: on zink?
17:53karolherbst[d]: mhhhhh
17:53karolherbst[d]: right...
17:53karolherbst[d]: no idea 🙂
17:53mhenning[d]: marysaka[d]: no need to be sorry for writing imperfect code. that's just how it goes
17:54marysaka[d]: karolherbst[d]: It should only trigger for MMA opcodes right? so I think it's not a problem even if it gets backported
17:54karolherbst[d]: marysaka[d]: yeah....
17:56mhenning[d]: karolherbst[d]: I've been merging things with one reviewer who knows the related code, which is often Faith but not always Faith
17:56mhenning[d]: but yeah I don't think we've discussed a more formal policy there yet
17:57karolherbst[d]: right
17:57karolherbst[d]: I haven't done any testing besides the MMA stuff with that MR so far
17:57mhenning[d]: have you done a full cts run?
17:58karolherbst[d]: not really asking about a real policy either, just what the vibes are, because I'm not that much in the loop when it comes to NVK MRs
17:58karolherbst[d]: nope
17:59mhenning[d]: If cts is happy with it and it fixes mma stuff, I think that's a reasonable amount of testing for that one
17:59karolherbst[d]: okay, but yeah, I was more interested in how much Faith still wants to watch over things
17:59marysaka[d]: yeah I agree on that too
18:00karolherbst[d]: will do some testing later, but it's a long weekend and I've been working like a lot this weekend already 🙃
18:00karolherbst[d]: but the MMA stuff was so fun to work on 😄
18:01karolherbst[d]: but I also kinda prefer a better way to express those latencies, because with those bypass things it can quickly become a complete disaster
18:32airlied[d]: I think the nop patch is fine to merge
18:33airlied[d]: I was very confused why those tests regressed, had forgotten that rebasing might have busted nop handling
18:36karolherbst[d]: I also should look at performance stuff soon, but I think just merging the overall HMM support is already good. Though the mov optimization stuff seems important. I've also implemented the bypass latency and that worked fine at least
18:36karolherbst[d]: mhhh.. LDSM I should also look into or clean up your patches...
18:37karolherbst[d]: airlied[d]: have you seen any of the ssbo OOB patterns in the perf testing you've been doing?
18:37karolherbst[d]: I've seen some of it in the CTS
18:37karolherbst[d]: like where NAK ends up doing real ifs instead of predicated bcsel
18:37karolherbst[d]: or predicated mov even
18:42gfxstrand[d]: marysaka[d]: I trust the two of you. Merge it.
18:42airlied[d]: I don't think the test shader I was optimising cared that much about ssbo oob, I don't remember seeing it
18:44airlied[d]: I'd get the first coop matrix perf shmem test case and dump that shader, my branch points out a lot of NAK things
18:45airlied[d]: It would be good for internal RH reasons to gather some perf stats from either coop matrix perf or llama.cpp at various points
18:46airlied[d]: Once initial coop mat lands, then as we iterate through the optimisations and coopmat2
18:47karolherbst[d]: right
18:47karolherbst[d]: something for next week
18:47karolherbst[d]: I think the MR is more or less ready to get reviewed tho
18:47karolherbst[d]: I left significant changes in their own MR on top of Mary's stuff
18:48karolherbst[d]: though I _think_ the `nak: Allow up to 16 SSA values in SSARef` patch isn't needed anymore
18:48karolherbst[d]: marysaka[d]: any idea why you added it?
18:49marysaka[d]: karolherbst[d]: because I had a vec16
18:49karolherbst[d]: mhhh
18:49marysaka[d]: for like 16x16x16 if my memory serves me right
18:49karolherbst[d]: I do know that I've seen some vec8, but those get treated specially, no?
18:49karolherbst[d]: mhhh
18:49karolherbst[d]: on ampere?
18:49marysaka[d]: Yeah I think
18:49karolherbst[d]: right... I should test Ampere...
18:49karolherbst[d]: aand hook up 16x8x16 because I've disabled it for now..
18:50karolherbst[d]: 16x8x32 is also supported.. fun
18:50marysaka[d]: yeah that one
18:50marysaka[d]: that gets you 16 values
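The "16 values" figure can be reproduced with simple arithmetic, assuming each operand matrix is spread evenly over a 32-lane subgroup. That even distribution and the helper name `elements_per_lane` are assumptions of this sketch, not something stated in the chat.

```rust
/// Assumed subgroup (warp) width.
const LANES: u32 = 32;

/// Per-lane element counts for the A (MxK), B (KxN) and C/D (MxN) operands
/// of an MxNxK matrix multiply-accumulate, assuming every matrix is
/// distributed evenly across all lanes.
fn elements_per_lane(m: u32, n: u32, k: u32) -> (u32, u32, u32) {
    ((m * k) / LANES, (k * n) / LANES, (m * n) / LANES)
}
```

Under that assumption, 16x8x32 gives `(16, 8, 4)`, so the A operand needs 16 values per lane, while 16x16x16 gives `(8, 8, 8)`, which would match the vec8 case mentioned above.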
18:50karolherbst[d]: there are also 4 bit ones.. like 8x8x32
18:51karolherbst[d]: but.......
18:51marysaka[d]: but I know gfxstrand[d] was against that patch anyway, we should do something about this
18:51karolherbst[d]: I don't want to look into 4 bit int types yet 🙃
18:51karolherbst[d]: that's going to be a massive pain
18:51marysaka[d]: VK doesn't have that right?
18:51marysaka[d]: or is it part of the other extension?
18:52karolherbst[d]: indeed.. vulkan doesn't have int4 so far it seems
18:52karolherbst[d]: but there is e4m3 and e5m4
18:52karolherbst[d]: *e5m2
18:53karolherbst[d]: that's tensor coop stuff starting with blackwell
18:53karolherbst[d]: bfloat is also going to be fun
18:56karolherbst[d]: my cmat_convert code is horrible.... I really shouldn't shuffle 8 bit values channel by channel around 😄
18:56karolherbst[d]: but I kinda wonder if there is a more sane way of doing it...
18:56karolherbst[d]: those things generally operate in quads somehow
18:56karolherbst[d]: or the layout is designed in a way, that you can move things around in quads
18:57karolherbst[d]: but some values need to be moved to different ssa values and that's just pain
18:58gfxstrand[d]: marysaka[d]: Which patch?
18:58marysaka[d]: gfxstrand[d]: `nak: Allow up to 16 SSA values in SSARef`
18:58karolherbst[d]: anyway.. please review the MMA MR and tell me better ways of converting Matrix layouts 🙃
18:59gfxstrand[d]: marysaka[d]: Right... We probably need to do something there. I just don't want to bloat every instruction by 48B per source.
19:00marysaka[d]: yeah it's just too much :blobcatnotlikethis:
19:01gfxstrand[d]: But I think we can do something. We can have a magic value in v[3] that means "v[0] and (maybe) v[1] are actually a Box"
19:02gfxstrand[d]: We just need to rejigger it and write some unsafe code and add a drop implementation.
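The 48B figure follows if each SSA value is a 4-byte handle: widening the inline array from 4 to 16 entries grows it from 16 to 64 bytes. The sketch below checks that arithmetic and shows a safe inline-or-heap alternative. It is not the hand-packed layout proposed above (a magic value in `v[3]` with unsafe code and a manual drop); it uses an ordinary enum, which needs room for its own discriminant, only to illustrate the trade-off. `SsaValue`, `SsaRefSketch` and `extra_bytes_per_source` are hypothetical names.

```rust
use std::mem::size_of;

/// Stand-in for a 4-byte SSA value handle (an assumption of this sketch).
type SsaValue = u32;

/// Inline storage for up to 4 values.
type Inline4 = [SsaValue; 4];
/// What naively widening the inline storage to 16 values would look like.
type Inline16 = [SsaValue; 16];

/// Extra bytes per source if every reference carried 16 inline slots.
fn extra_bytes_per_source() -> usize {
    size_of::<Inline16>() - size_of::<Inline4>()
}

/// Safe alternative: keep small vectors inline, spill larger ones to the heap.
enum SsaRefSketch {
    Inline { len: u8, v: Inline4 },
    Spilled(Box<[SsaValue]>),
}

impl SsaRefSketch {
    fn new(vals: &[SsaValue]) -> Self {
        if vals.len() <= 4 {
            let mut v = [0; 4];
            v[..vals.len()].copy_from_slice(vals);
            SsaRefSketch::Inline { len: vals.len() as u8, v }
        } else {
            // Only the rare wide case pays for an allocation.
            SsaRefSketch::Spilled(vals.into())
        }
    }

    fn as_slice(&self) -> &[SsaValue] {
        match self {
            SsaRefSketch::Inline { len, v } => &v[..*len as usize],
            SsaRefSketch::Spilled(b) => &b[..],
        }
    }
}
```

Here the `Box` is dropped automatically by the enum. Packing the tag into an existing slot, as suggested in the chat, would avoid the separate discriminant at the cost of unsafe code and a hand-written `Drop`.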
19:02airlied[d]: vulkan does at least have bfloat now
19:04karolherbst[d]: what's the best way to run the full CTS these days?
19:06mhenning[d]: `deqp-runner` is what I use
19:08gfxstrand[d]: I have a little script that wraps deqp-runner and saves me some typing. But it's not really necessary.
19:08gfxstrand[d]: gfxstrand[d]: I can get around to typing that but I'm on the road today and I'm the one who does the driving.
19:21mhenning[d]: gfxstrand[d]: I think I have an idea for how to do this. I'll hack at it a bit
20:49karolherbst[d]: any important flags I should pass to `deqp-vk`?
20:50karolherbst[d]: otherwise I just use `--deqp-visibility=hidden` as always
20:52airlied[d]: just let deqp-runner do its thing usually, I posted Faith's wrapper script in here a few days ago
20:52airlied[d]: vulkan cts doesn't usually do windows so you don't need hidden
20:52airlied[d]: just unset DISPLAY and avoid wsi tests 🙂
20:54karolherbst[d]: heh
20:55karolherbst[d]: airlied[d]: this one?
20:55airlied[d]: yes I've sometimes used that, but I often just let deqp-runner go, helps also to have latest CTS and build a release build
20:58airlied[d]: probably should go do another round of spot the tests that use VRAM readback and complain
21:32karolherbst[d]: looks like intel landed bfloat support https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34105
22:13orowith2os[d]: gfxstrand[d]: re: Box<T> sizing stuff
22:13orowith2os[d]: Did you figure something out?
22:21gfxstrand[d]: I haven't looked. I've been road tripping since Friday.
22:22karolherbst[d]: uhhhhh
22:22karolherbst[d]: I just noticed something
22:23karolherbst[d]: even on turing, nouveau operates in pcie 1.0 mode
22:23karolherbst[d]: which is a bit odd... not sure if that's due to thunderbolt.. but that should be pcie 3.0, no?
22:24karolherbst[d]: uhh.. looks like not with my laptop...
22:24karolherbst[d]: pain
22:29karolherbst[d]: yeah... it's TB4, so 3.0 x 4
22:29karolherbst[d]: but the bridge only caps to 1.0 speeds
22:31karolherbst[d]: yeah.. the egpu case reports 8.0.. mhh
22:33karolherbst[d]: I should run the CTS on my desktop 🙃
23:51gfxstrand[d]: karolherbst[d]: Yes you should
23:52gfxstrand[d]: You really don't want that TB cable to cause you to miss an interrupt in the middle of the CTS.
23:52karolherbst[d]: the cable isn't the issue, it's just that driving the GPU with 2.5 x4 makes it very slow 😄
23:52karolherbst[d]: deqp-runner said something like 16 hours
23:53karolherbst[d]: but not sure why it dropped down to 2.5...
23:54skeggsb9778[d]: RM will change PCIE settings based on demand btw - not sure if that's what you're seeing
23:57skeggsb9778[d]: ```bskeggs@unit-00 ~ $ sudo lspci -d 10de::0300 -vv | grep LnkSta
23:57skeggsb9778[d]: LnkSta: Speed 2.5GT/s (downgraded), Width x16```
23:58skeggsb9778[d]: run xonotic quickly...
23:58skeggsb9778[d]: ```bskeggs@unit-00 ~ $ sudo lspci -d 10de::0300 -vv | grep LnkSta
23:58skeggsb9778[d]: LnkSta: Speed 16GT/s, Width x16```
23:58x512[m]: Who physically control PCI settings? GPU or PCI host controller?
23:59karolherbst[d]: yeah.. the tb bridge is stuck at 2.5...
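To put the link speeds above in perspective, the sketch below parses an `lspci -vv` `LnkSta:` line like the ones quoted and estimates raw per-direction bandwidth from the transfer rate, lane count and line encoding (8b/10b for 2.5 and 5 GT/s, 128b/130b for 8 GT/s and above). The function names are hypothetical, and the result ignores protocol overhead, so it is an upper bound rather than a measured throughput.

```rust
/// Parse the transfer rate (GT/s) and lane count out of a line such as
/// "LnkSta: Speed 2.5GT/s (downgraded), Width x16".
fn parse_lnksta(line: &str) -> Option<(f64, u32)> {
    let speed: f64 = line
        .split("Speed ")
        .nth(1)?
        .split("GT/s")
        .next()?
        .trim()
        .parse()
        .ok()?;
    let width: u32 = line
        .split("Width x")
        .nth(1)?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect::<String>()
        .parse()
        .ok()?;
    Some((speed, width))
}

/// Map a transfer rate to (PCIe generation, encoding efficiency).
fn generation(gt_per_s: f64) -> Option<(u32, f64)> {
    // Compare in tenths of GT/s to avoid matching on floats directly.
    match (gt_per_s * 10.0).round() as u32 {
        25 => Some((1, 8.0 / 10.0)),
        50 => Some((2, 8.0 / 10.0)),
        80 => Some((3, 128.0 / 130.0)),
        160 => Some((4, 128.0 / 130.0)),
        320 => Some((5, 128.0 / 130.0)),
        _ => None,
    }
}

/// Raw per-direction bandwidth in GB/s, before protocol overhead.
fn raw_bandwidth_gb_s(gt_per_s: f64, lanes: u32) -> Option<f64> {
    let (_, efficiency) = generation(gt_per_s)?;
    Some(gt_per_s * efficiency * lanes as f64 / 8.0)
}
```

With these numbers, a 2.5 GT/s x4 link tops out at about 1 GB/s, while 8 GT/s x4 would allow roughly 3.9 GB/s, which is the gap between what the bridge negotiates and what the link could carry at PCIe 3.0 speeds.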