00:48mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1365852014155665448/image.png?ex=680ed050&is=680d7ed0&hm=fdd7fef1bd804538d7d9af6124890b341d6d997a7c09523796a82ab18c90db3d&
00:48mangodev[d]: now it's completely dead, even the gitlab :|
05:36karolherbst[d]: I suspect most of those issues are actually fastly and the config not being correct
05:49damo22: we had a gitlab instance at work that often restarted itself, i don't know why, maybe the OOM killer?
09:30karolherbst[d]: mhhh.. I'm doing the matrix layout translation with 8 shuffles and 4 bcsel and I'm wondering if one can do it cheaper...
09:30karolherbst[d]: for a 4 component value that is
09:30karolherbst[d]: so 2 shuffles and a bcsel per component
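The "2 shuffles and a bcsel per component" pattern can be modeled in a few lines. This is a toy sketch and not the real NVIDIA coop-matrix layout. It assumes a hypothetical 8-lane subgroup holding one vec4 per lane and a made-up layout change in which, within each lane pair, the even lane collects both lanes' low halves and the odd lane collects both lanes' high halves. Because every output component draws from at most two source components, each one costs exactly two shuffles and one select. `shuffle`, `bcsel`, and `translate` are illustrative helpers.

```rust
// Toy model of a hypothetical 8-lane subgroup where each lane holds a vec4.
// regs[comp][lane] mirrors how the compiler sees one value per component.
const LANES: usize = 8;
const COMPS: usize = 4;

type Regs = [[u32; LANES]; COMPS];

/// Subgroup shuffle: lane `l` reads `value` from lane `src(l)`.
fn shuffle(value: &[u32; LANES], src: impl Fn(usize) -> usize) -> [u32; LANES] {
    let mut out = [0; LANES];
    for (l, slot) in out.iter_mut().enumerate() {
        *slot = value[src(l)];
    }
    out
}

/// Per-lane select in the style of NIR's `bcsel`.
fn bcsel(cond: impl Fn(usize) -> bool, a: &[u32; LANES], b: &[u32; LANES]) -> [u32; LANES] {
    let mut out = [0; LANES];
    for (l, slot) in out.iter_mut().enumerate() {
        *slot = if cond(l) { a[l] } else { b[l] };
    }
    out
}

/// Hypothetical layout change: in each lane pair, the even lane collects the
/// low halves (comps 0..2) of both lanes, the odd lane the high halves (comps 2..4).
/// Each output component costs 2 shuffles + 1 bcsel.
fn translate(regs: &Regs) -> Regs {
    let mut out = [[0; LANES]; COMPS];
    for j in 0..COMPS {
        let (even, odd) = if j < 2 {
            // even lane: its own comp j; odd lane: comp j + 2 of the partner lane
            (shuffle(&regs[j], |l| l), shuffle(&regs[j + 2], |l| l ^ 1))
        } else {
            // even lane: comp j - 2 of the partner lane; odd lane: its own comp j
            (shuffle(&regs[j - 2], |l| l ^ 1), shuffle(&regs[j], |l| l))
        };
        out[j] = bcsel(|l| l % 2 == 0, &even, &odd);
    }
    out
}
```

In this toy layout, half of the shuffles are identity reads that a real backend would fold away. The sketch only illustrates the shape of the cost: for a vec4, 8 shuffles and 4 selects.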
10:27dancegroupchoco: you are all down for fraud and abuse and tyranny soon. Your abusers will also all be found from Estonia to Germany and brutally treated. That is not how abortion leftovers like you should be treated, but you crossed the line, next time choose your actions as well as victims better.
10:27dancegroupchoco: it's not only freedesktop that goes down.
11:40kar1m0[d]: Is gitlab still down?
11:41kar1m0[d]: Also freedesktop seems to still be down
14:32karolherbst[d]: I think I'll run the coop matrix stuff through my volta just for the fun..
14:32karolherbst[d]: actually.... nevermind that, 😄
14:33karolherbst[d]: it has no IMMA and the HMMA support is a huge pain
14:33karolherbst[d]: though.. Turing and Volta share support for 8x8x4 HMMA
14:34karolherbst[d]: and 8x4x8
14:35karolherbst[d]: but those execute HMMA in several steps.. shouldn't be toooo hard, but still
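For reference, the math behind "8x8x4" (M x N x K, following PTX's `mma.m8n8k4` naming) is D = A * B + C, where A is 8x4, B is 4x8, and C and D are 8x8. The sketch below covers only those semantics. It uses f32 everywhere for simplicity, while real HMMA takes f16 A/B, and it makes no claim about how the hardware splits the work across lanes or execution steps.

```rust
// Reference math for an m8n8k4 MMA: D = A * B + C,
// with A: 8x4, B: 4x8, C and D: 8x8 (f32 used throughout for simplicity).
const M: usize = 8;
const N: usize = 8;
const K: usize = 4;

fn mma_884(a: &[[f32; K]; M], b: &[[f32; N]; K], c: &[[f32; N]; M]) -> [[f32; N]; M] {
    let mut d = *c; // start from the accumulator
    for i in 0..M {
        for j in 0..N {
            for k in 0..K {
                d[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    d
}
```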
14:36karolherbst[d]: I should fix the latency stuff first
18:35snowycoder[d]: I can load the first row of the first gob of an image (progress!) 🎉
18:36mangodev[d]: snowycoder[d]: oh?
18:36mangodev[d]: on kepler i assume?
18:37snowycoder[d]: mangodev[d]: Yes, image address computation must be done manually with some special opcodes, really weird arch
18:37mangodev[d]: oh damn
18:37mangodev[d]: sweet
18:38asdqueerfromeu[d]: snowycoder[d]: I'm GOBsmacked that actually works 🥁
18:38mangodev[d]: -# i can't wait for the workweek so this discord and the gitlab become active again
18:40snowycoder[d]: Me too, at least i have more time during weekends since there's no work
18:41mangodev[d]: been trying to debug why firefox crashes so much and i've gotten little to no progress on it, debug symbols haven't helped
18:42orowith2os[d]: snowycoder[d]: lucky. My work will schedule me all week.
18:42orowith2os[d]: they have me six days in a row, and would do seven or more if I wouldn't get OT.
18:42mangodev[d]: mangodev[d]: after that is done, i'm gonna try debugging why discord crashes spontaneously as well (may be for the same reason though 👀)
18:43snowycoder[d]: orowith2os[d]: Oh god, I couldn't survive that
18:44snowycoder[d]: mangodev[d]: Discord never works for me, even with the proper nvidia driver; it would be awesome to have it working
18:44orowith2os[d]: I was on for 39 hours two days ago, and I traded for today, so I'm at an even 40. I have no wiggle room in being able to come in early or clock out late like I normally would.
18:44orowith2os[d]: :wires:
18:45mangodev[d]: snowycoder[d]: worked okay-ish for me, but was just stuttery
18:45mangodev[d]: on nvk it's smooth, aside from the random soft crashes
18:45orowith2os[d]: orowith2os[d]: I think I was on for 21 hours last week? came in on an off day, stayed later, and I ended the week at 34.75
18:45orowith2os[d]: got me two new pairs of shoes. lmao
18:46mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1366123318553350235/image.png?ex=680fccfc&is=680e7b7c&hm=e132c67bf25af1a4e4b90559670e8957abede696d28c234f595ea0723e115fae&
18:46mangodev[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1366123318775644230/image.png?ex=680fccfc&is=680e7b7c&hm=9696b5baa387d7664066fc1e44ecffbd6c7603de64e9a924ab2a97074159a0ec&
18:46mangodev[d]: firefox stop shitting yourself damnit
19:20mohamexiety[d]: snowycoder[d]: huh what goes wrong?
19:20mohamexiety[d]: I have had no issues on both arch (well, cachy) using the discord package and fedora using the flatpak
19:21mohamexiety[d]: that's with both nvidia and mesa
19:21mohamexiety[d]: the only problematic thing was screen sharing but I think that's just a linux thing at this point. never looked too much into it
19:21mangodev[d]: haven't had issues with discord on nvidia official other than the horrific stutter (applied to all web browsers) and lack of hw accel screenshare (also like that on nvk too)
19:22mangodev[d]: on nvk, discord randomly soft crashes (though thankfully, it recovers silently)
19:22mangodev[d]: still annoying getting a freeze and blank screen every 20 minutes though
19:34snowycoder[d]: mohamexiety[d]: It just crashes at start 🤷‍♂️ .
19:34snowycoder[d]: I have been using vesktop for a while since it had better screen sharing, didn't test real discord though.
19:34snowycoder[d]: I would need to search the bug more, I just switched to discord web because I didn't have time.
19:39gfxstrand[d]: mangodev[d]: I won't be around much this next week. Others should be, though.
19:56karolherbst[d]: where was this table with all the instruction encoding stuff?
19:58airlied[d]: https://kuterdinel.com/nv_isa_sm89/ that one?
19:59karolherbst[d]: yeah, thanks
19:59karolherbst[d]: I'm almost done with Turing.. just need to hook up 8x8x4 HMMA 🙃
19:59karolherbst[d]: heh...
19:59karolherbst[d]: it's not in that table.. well wrong sm
19:59karolherbst[d]: mhhh
20:00karolherbst[d]: does it exist for sm75?
20:00mohamexiety[d]: the table? nope
20:00mohamexiety[d]: but you can generate it yourself I think
20:00mohamexiety[d]: it'll take a lot of time tho
20:00karolherbst[d]: huh.. interesting.. `DMMA`
20:00mohamexiety[d]: https://github.com/kuterd/nv_isa_solver/issues/1 the instructions are here
20:00mohamexiety[d]: (see last comment)
20:01karolherbst[d]: I wonder if it exists on Turing...
20:01mohamexiety[d]: just swap sm 89 with sm 75
20:01karolherbst[d]: probably not
20:04karolherbst[d]: mhhhh
20:05airlied[d]: just use nvfuzz a bit
20:05airlied[d]: needs nvdisasm installed
20:06karolherbst[d]: nah
20:06karolherbst[d]: got it
20:06karolherbst[d]: 0x236 is the opcode
20:06karolherbst[d]: it is a funky instruction tbh
20:07mhenning[d]: karolherbst[d]: If you're just looking for opcodes, there are lists here: https://gitlab.freedesktop.org/mhenning/re/-/tree/main/opclass?ref_type=heads
20:08karolherbst[d]: yeah.. but I already just decremented the opcode until I hit the right one
20:08karolherbst[d]: well.. had to only try out 6 or so
20:09mhenning[d]: Well, we have it available for the next opcode you want to look up
20:13karolherbst[d]: mhhh
20:14karolherbst[d]: I don't find the `.F32.F32` variant...
20:17karolherbst[d]: mhhh.. odd
20:18karolherbst[d]: bit 76 is flipping the dest to .F32, so mhh
20:19karolherbst[d]: 77 is `.SATFINITE`
20:19karolherbst[d]: ohh
20:19karolherbst[d]: found it...
20:19karolherbst[d]: it's 78 🙃
20:19karolherbst[d]: it's illegal without .F32 dest.. no wonder I missed it
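The bits found above can be captured in a small encoder sketch that also rejects the illegal combination. Only the opcode value 0x236 and bits 76, 77, and 78 come from the discussion. The placement of the opcode in the low 12 bits of the 128-bit word is an assumption based on other sm70+ encodings, and `encode` and its flag names are hypothetical, not NAK's encoder API.

```rust
// Sketch of building a 128-bit instruction word with the bits found above.
// Assumption: the opcode occupies the low 12 bits, as in other sm70+ encodings.
// All other fields (registers, predicates, scheduling) are omitted.
const OPCODE_MASK: u128 = 0xfff;
const BIT_DST_F32: u32 = 76; // destination becomes .F32
const BIT_SATFINITE: u32 = 77; // .SATFINITE
const BIT_F32_F32: u32 = 78; // the .F32.F32 variant, illegal without an .F32 dest

fn encode(opcode: u16, dst_f32: bool, f32_f32: bool, satfinite: bool) -> Result<u128, &'static str> {
    if f32_f32 && !dst_f32 {
        return Err(".F32.F32 requires an .F32 destination");
    }
    let mut word = u128::from(opcode) & OPCODE_MASK;
    if dst_f32 {
        word |= 1u128 << BIT_DST_F32;
    }
    if satfinite {
        word |= 1u128 << BIT_SATFINITE;
    }
    if f32_f32 {
        word |= 1u128 << BIT_F32_F32;
    }
    Ok(word)
}
```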
20:42karolherbst[d]: okay.. no hardware errors
20:44karolherbst[d]: sooooooo
20:44karolherbst[d]: what about vec8 in nak? 🙃
21:15karolherbst[d]: mhhh... I think I'll have to write some PTX code to figure that one out...
21:18karolherbst[d]: ahhh
21:18karolherbst[d]: nope
21:18karolherbst[d]: it passes 🙃
21:18karolherbst[d]: forgot to put the B input in column major
21:18karolherbst[d]: in the encoding
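The B-operand bug comes down to where element (row, col) lives under each layout. If a buffer is written in one layout and read in the other, the elements come out in the wrong order, while the results still look plausible. `offset` and `load` below are hypothetical helpers for illustration.

```rust
// Flat-buffer offset of element (row, col) in a rows x cols matrix.
fn offset(row: usize, col: usize, rows: usize, cols: usize, col_major: bool) -> usize {
    if col_major { col * rows + row } else { row * cols + col }
}

/// Load a rows x cols matrix from `buf` under the given layout.
fn load(buf: &[f32], rows: usize, cols: usize, col_major: bool) -> Vec<Vec<f32>> {
    (0..rows)
        .map(|r| (0..cols).map(|c| buf[offset(r, c, rows, cols, col_major)]).collect())
        .collect()
}
```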
21:22karolherbst[d]: the 8x8x4 ones have their own layout, it's a bit of a huge pain
21:22karolherbst[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32777/diffs?commit_id=42cdc4b6c29d7ac101474dadd00c9e5c2ae67e59#0a9ff4b9096a5b117f30194f854d39bda79d1b21_442_501 could be worse
21:23karolherbst[d]: f32 uses 4 steps...
21:37orowith2os[d]: You could generate code that's so bad it makes Nvidia engineers reel in disgust, and they submit patches to make it faster
21:37orowith2os[d]: 🧠
21:41HdkR: Implying they don't already see that everyday internally </s>
21:41HdkR: 🔥
21:42karolherbst[d]: I'm more annoyed by the shuffles in the `cmat_convert` path 🙃
21:42karolherbst[d]: why use 1 matrix memory layout when you can have... 3?
22:24gfxstrand[d]: karolherbst[d]: Uh... Is there vec8 in hardware?
22:33HdkR: e2m1 going crazy.
22:54karolherbst[d]: gfxstrand[d]: .... maybe?
22:54karolherbst[d]: ehh yeah, for HMMA.884.F32
22:54karolherbst[d]: 256 dest size
22:54karolherbst[d]: for HMMA.884.F32.F32 also the C source is vec8
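The 256-bit figure follows from simple arithmetic. PTX describes `mma.m8n8k4` as being computed by a quad-pair (8 threads). An 8x8 accumulator therefore gives 64 / 8 = 8 elements per thread, which at f32 is 8 x 32 = 256 bits, a vec8 of 32-bit registers. The f16 accumulator needs only half that. The helper name is hypothetical, and the sketch assumes an even split across 32-bit GPRs.

```rust
// Per-thread 32-bit register count for an m x n fragment split evenly over `threads` lanes.
// Assumption (from PTX's description of mma.m8n8k4): each MMA is computed by a
// quad-pair, i.e. 8 threads.
fn regs_per_thread(m: u32, n: u32, elem_bits: u32, threads: u32) -> u32 {
    let bits_per_thread = m * n * elem_bits / threads;
    bits_per_thread / 32
}
```

For example, `regs_per_thread(8, 8, 32, 8)` gives 8 registers for an f32 D or C (the vec8 case), and `regs_per_thread(8, 8, 16, 8)` gives 4 for f16.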
22:55mhenning[d]: I think blackwell also has 256-bit loads for some things
22:55gfxstrand[d]: Pain
22:56karolherbst[d]: could be worse
22:57gfxstrand[d]: In principle, it's not hard to change. It just bloats all the instructions.
23:00gfxstrand[d]: I love `as_slice()` but it means all the sources have to be the same size so if it's vec8 then everything's a vec8. 🫤
23:01gfxstrand[d]: `RegRef` doesn't really care but `SSARef` needs to be able to reference 8 different things.
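A rough way to see why vec8 "bloats all the instructions" is to consider a fixed-capacity reference, where every source reserves room for the largest vector even if it holds a scalar. The type below is hypothetical and is not NAK's actual `SSARef`. On typical targets, raising the capacity from 4 to 8 grows every such source from 20 to 36 bytes.

```rust
use std::mem::size_of;

// Hypothetical fixed-capacity SSA reference (not NAK's real type): every source
// reserves room for CAP values whether it uses them or not.
#[allow(dead_code)]
struct SSAValue(u32); // stand-in for an SSA value index

#[allow(dead_code)]
struct FixedSSARef<const CAP: usize> {
    len: u8,
    vals: [SSAValue; CAP],
}
```

`size_of::<FixedSSARef<4>>()` is 20 and `size_of::<FixedSSARef<8>>()` is 36 (4-byte alignment, with padding after the `u8`).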
23:04karolherbst[d]: mhhh
23:04karolherbst[d]: let me check blackwell.. not caring _that_ much about the old way HMMA, but...
23:05karolherbst[d]: yeah...
23:05karolherbst[d]: blackwell has 256 bit load and stores
23:05karolherbst[d]: though...
23:05karolherbst[d]: that's vec4x2
23:06karolherbst[d]: you have two destination vectors 🙃
23:07karolherbst[d]: and two source vec4 for STG
23:08karolherbst[d]: ui..... blackwell is gonna be fun
23:08karolherbst[d]: the ISA docs I got are actually very detailed compared to older gens
23:10karolherbst[d]: coop matrices are going to be very fun there
23:11karolherbst[d]: ahh.. Hopper DMMA also needs 256 bit sources
23:12karolherbst[d]: but yeah.. the tensor stuff on blackwell also uses 256 bit quite a bit...
23:12karolherbst[d]: so I guess there isn't really a way to avoid it long term
23:14karolherbst[d]: anyway.. I think vec8 regs are only used in matrix/tensor ops
23:19snowycoder[d]: gfxstrand[d]: We could use a Box just for big stuff if it's infrequent enough
23:33gfxstrand[d]: Yeah. Maybe.
23:35gfxstrand[d]: If `SSARef` ends up punting to a vec or box when things get big, that's probably not the worst thing.
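One shape the "punt to a vec or box when things get big" idea could take is small-vector-style storage. Up to 4 values stay inline, which is the common case, and only rare wider operands such as the vec8 HMMA.884.F32 sources go to the heap. `as_slice()` still returns one slice type for both cases. This is a hypothetical sketch, and the type and method names only echo the discussion.

```rust
// Hypothetical sketch, not NAK's actual SSARef: up to 4 values inline,
// wider vectors (e.g. vec8 MMA operands) spill to a heap-allocated slice.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SSAValue(u32); // stand-in for an SSA value index

#[derive(Clone, Debug, PartialEq, Eq)]
enum SSARef {
    Inline { len: u8, vals: [SSAValue; 4] },
    Boxed(Box<[SSAValue]>),
}

impl SSARef {
    fn new(vals: &[SSAValue]) -> Self {
        assert!(!vals.is_empty(), "an SSARef needs at least one value");
        if vals.len() <= 4 {
            let mut inline = [SSAValue(0); 4];
            inline[..vals.len()].copy_from_slice(vals);
            SSARef::Inline { len: vals.len() as u8, vals: inline }
        } else {
            SSARef::Boxed(vals.into()) // rare case: pay for a heap allocation
        }
    }

    /// Same slice view regardless of where the values live.
    fn as_slice(&self) -> &[SSAValue] {
        match self {
            SSARef::Inline { len, vals } => &vals[..usize::from(*len)],
            SSARef::Boxed(b) => &b[..],
        }
    }

    fn is_boxed(&self) -> bool {
        matches!(self, SSARef::Boxed(_))
    }
}
```

The trade-off is an extra branch in `as_slice()` and an allocation for wide operands, in exchange for not growing every scalar source.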
23:37gfxstrand[d]: I really need to look over the unbox MR