00:58 airlied[d]: karolherbst[d]: looking at the shader delta for that one shader, I think there'll be plenty to do, and that's only the first extension, the 2nd one put a lot more into the driver/compiler side to sort out
03:08 gfxstrand[d]: The 2nd extension is going to need some thinking and some serious NIR work. It adds function pointers of sorts.
03:08 gfxstrand[d]: I've been thinking about how we want to implement it.
03:12 gfxstrand[d]: Fortunately, I think most of the lowering can be in common code in terms of coopmat1, with just a bit of tuning code in the driver. But still...
03:36 airlied[d]: oh aco has an MR to force a vector into regalloc
03:37 airlied[d]: might make sense to generalise that at some point
03:58 gfxstrand[d]: Uh... Maybe? I'm skeptical
04:00 gfxstrand[d]: With 64-bit stuff, the problem isn't that we don't know it's a vector when parsing NIR. It's that NAK's RA is inherently scalar and our vector collector isn't working well enough.
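Side note for readers: a toy illustration of why a purely scalar RA struggles with 64-bit values. It assumes, hypothetically, that a 64-bit value must live in an even-aligned pair of 32-bit registers (Rn, Rn+1). The allocator functions below are made up for illustration and are not NAK's register allocator or its vector collector.

```python
from typing import List, Optional, Set


def alloc_scalar(free: Set[int], n: int) -> List[int]:
    """Scalar-style allocation (hypothetical): take the n lowest free registers
    one at a time, ignoring whether they end up adjacent or aligned."""
    regs = sorted(free)[:n]
    for r in regs:
        free.discard(r)
    return regs


def alloc_vector(free: Set[int], n: int, align: int) -> Optional[List[int]]:
    """Vector-aware allocation (hypothetical): find the lowest base that is a
    multiple of `align` with base..base+n-1 all free, and claim that block."""
    if not free:
        return None
    for base in range(0, max(free) + 1, align):
        regs = list(range(base, base + n))
        if all(r in free for r in regs):
            for r in regs:
                free.discard(r)
            return regs
    return None


def is_aligned_pair(regs: List[int]) -> bool:
    """True if the two 32-bit halves of a 64-bit value sit in (R2k, R2k+1)."""
    return len(regs) == 2 and regs[0] % 2 == 0 and regs[1] == regs[0] + 1
```

With R1 through R4 free, the scalar version hands back R1:R2, which is misaligned and would need copies into a proper pair. The vector-aware version picks R2:R3 directly.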
04:03 airlied[d]: I'm badly hacking it now to see if I can vector collect smarter
04:42 airlied[d]: okay maybe after I read phiwebs a few times
06:28 airlied[d]: https://gitlab.freedesktop.org/airlied/mesa/-/commit/1ecdee24d628df77af03badc5421ef00fe1f2544 probably naive but does seem to do what I want 🙂
06:50 gfxstrand[d]: MOAR FLOPS!!!
06:51 tiredchiku[d]: https://tenor.com/view/everybody-do-the-flop-flop-dance-flop-gif-3526611
06:52 tiredchiku[d]: huh
06:52 tiredchiku[d]: RADV already converts vertex shaders to mesh shaders?
06:56 airlied[d]: if by that you mean the hardware looks like a mesh shader, then yes kinda
06:57 airlied[d]: don't think it does a conversion to API mesh shaders
06:58 airlied[d]: gfxstrand[d]: I think smarter address calculations is probably the next big delta I see
07:00 airlied[d]: or maybe some predication
08:28 tiredchiku[d]: interesting
15:11 gfxstrand[d]: airlied[d]: Yeah, I've definitely seen some issues there.
15:12 gfxstrand[d]: And predication should help
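Side note for readers: a minimal sketch of the if-conversion idea behind "predication". Instead of branching, both sides of a small if/else run, and each instruction carries a predicate guard, loosely modeled on the `@P0` / `@!P0` guards in NVIDIA SASS. The tuple IR, `if_convert`, and `run` are all hypothetical and do not reflect NAK's IR.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical IR: (guard, dst, op, srcs). guard is None (always runs),
# "p" (runs when the predicate is true) or "!p" (runs when it is false).
Instr = Tuple[Optional[str], str, str, Tuple[str, ...]]

OPS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def if_convert(then_block, else_block) -> List[Instr]:
    """Flatten an if/else diamond into one straight-line, guarded block."""
    out: List[Instr] = []
    for dst, op, srcs in then_block:
        out.append(("p", dst, op, srcs))
    for dst, op, srcs in else_block:
        out.append(("!p", dst, op, srcs))
    return out


def run(prog: List[Instr], regs: Dict[str, int], p: bool) -> Dict[str, int]:
    """Interpret the guarded block; disabled instructions write nothing."""
    regs = dict(regs)
    for guard, dst, op, srcs in prog:
        enabled = guard is None or (guard == "p") == p
        if enabled:
            regs[dst] = OPS[op](*(regs[s] for s in srcs))
    return regs
```

For `x = p ? a * 2 : b + 1`, the converted block has no branch at all. The trade-off is that both sides occupy issue slots, which is why this kind of transform is usually applied only to short branches.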
15:45 gfxstrand[d]: eBay just told me my Volta is showing up today.
15:46 gfxstrand[d]: I'm gonna try to CTS it over the weekend. Here's hoping everything passes and I can submit conformance. 🤞🏻
15:47 gfxstrand[d]: And I think we're 2-3 bugs from conformance on Maxwell.
15:47 gfxstrand[d]: Getting everything to Vulkan 1.4 would be nice
15:48 gfxstrand[d]: Mesa 25.1 is looking like it'll be a big release for NVK.
15:48 magic_rb[d]: If only the gpus werent stuck at boot clocks...
15:48 gfxstrand[d]: Well, yeah...
15:48 magic_rb[d]: I still appreciate the work, dont get me wrong :)
15:49 magic_rb[d]: But its still a sad future for my 1060
15:49 gfxstrand[d]: And there are some that do. The 1060 is screwed, though.
15:50 magic_rb[d]: Yeah some can be reclocked manually afaik. So theoretically CPU side automatic reclocking is possible
15:50 magic_rb[d]: Not the 1060 thats just stuck forever
15:50 gfxstrand[d]: My 750 Ti reclocks fine, I think. Not that it's a particularly powerful card...
15:51 magic_rb[d]: Its just sad that something i paid 350€ for is ewaste by now
15:51 magic_rb[d]: Even though it would be completely capable of lower end compute tasks
15:51 gfxstrand[d]: magic_rb[d]: The bigger problem on Maxwell, AFAIK is fan control. It doesn't matter if you can ramp up the clocks if the fans don't spin.
15:52 magic_rb[d]: Eh, nothing an external fan controller couldnt solve
15:52 magic_rb[d]: I cant attach an external clock tho :P
15:52 gfxstrand[d]: Yeah.
15:53 magic_rb[d]: I was thinking about that for my sparkle arc a310, apparently the fan control there is horrible
15:55 magic_rb[d]: Btw if anyone wants a 1060, i could part with it. I have 0 use for it
15:56 magic_rb[d]: I will be at xdc so i could take it with
15:56 gfxstrand[d]: There are Maxwell patches from a while ago that allow manual clock control. So you could hack something together I suppose. The patches don't work on Pascal, though. Those are just sunk.
15:56 magic_rb[d]: I dont have a maxwell card i could test with
15:56 magic_rb[d]: And i dont plan to buy one :/
15:58 gfxstrand[d]: Oh, for sure. I just buy random cards so I can test them. Then they go live in the pile.
15:58 magic_rb[d]: :D i dont have a big enough room for that
15:59 magic_rb[d]: I already have way too many things
16:00 gfxstrand[d]: I usually go for the cheap ones, though. I think if you eBay'd my entire collection, you still wouldn't get enough to buy a 5090.
16:01 magic_rb[d]: Yeah if youre only testing, not much point in buying the high end ones
16:01 magic_rb[d]: Though, for benchmarking? A high end card might uncover more bottlenecks
16:02 gfxstrand[d]: Yeah, that's why I really want to get a 5090 once they're available
16:04 magic_rb[d]: Jesus, 2.5k
16:13 zmike[d]: 2.5 will be cheap compared to how much they cost once they actually exist
16:15 snowycoder[d]: mhenning[d]: welp, neither modern nvdisasm nor nvcc work for nv30, do you know if there's an archive for older bins?
16:36 mhenning[d]: snowycoder[d]: Are you sure you mean nv30? That's a part from 2003 which predates cuda or compute shaders
16:37 snowycoder[d]: Sorry, sm30
16:37 snowycoder[d]: I'm using sm35 that is still supported in cuda 10 (I found an old docker container)
16:39 mhenning[d]: Ah, right. Nvidia should have older cuda versions on their website
16:40 mhenning[d]: Maybe this is the right one? https://developer.nvidia.com/cuda-11-8-0-download-archive
16:41 mhenning[d]: Yeah, looks like the 11.8 disassembler supports kepler https://docs.nvidia.com/cuda/archive/11.8.0/cuda-binary-utilities/index.html#instruction-set-ref
19:04 _lyude[d]: btw gfxstrand[d] just wanted to give you an update because this took way longer than I expected: have had a limited amount of time to look at the cursor issue because of what turned into basically a 3 day dentist appointment 😫
19:59 gfxstrand[d]: _lyude[d]: Ugh... That sounds horrible! Don't worry about it. 💜
20:01 mohamexiety[d]: dentistry stuff is always awful. hope you're feeling a bit better at least
20:01 _lyude[d]: yeah I'm doing fine haha, mouth hurts but things are going as well as they can
21:17 gfxstrand[d]: IEEE 754 is hard. :blobcatnotlikethis:
21:23 mhenning[d]: gfxstrand[d]: I thought you were off of work today? Is that what you do in your free time - read the IEEE floating point spec?
21:23 karolherbst[d]: mood
21:27 gfxstrand[d]: 😂
21:28 airlied[d]: hope it's at least a printed and bound copy for more relaxed reading
21:30 gfxstrand[d]: Nah. I have enough of it memorized that I can lie in bed thinking about floating point (which is what I did the night before last).
21:30 gfxstrand[d]: I am not a normal person.
21:32 gfxstrand[d]: So I finally sat down and actually typed out the thing I thought up in bed:
21:32 gfxstrand[d]: https://gitlab.freedesktop.org/gfxstrand/mesa/-/commit/aae65a2c358b89147ca7fe755d8394e1ff526498
21:33 mhenning[d]: I dunno I definitely sit around wondering about edge cases involving denorms in my free time, so you at least have company
21:34 karolherbst[d]: denorms? *sigh*
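Side note for readers: the linked commit's contents aren't described in the log, so this does not reproduce it. As general background on the denorm edge cases mentioned above, here is a small sketch for IEEE 754 binary32. A value is denormal (subnormal) when its 8-bit exponent field is zero and its 23-bit mantissa is nonzero. "Flush to zero" replaces such values with a zero of the same sign. The helper names are made up for illustration.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def is_denorm_f32(x: float) -> bool:
    """Exponent field all zeros and mantissa nonzero means subnormal."""
    b = f32_bits(x)
    exponent = (b >> 23) & 0xFF
    mantissa = b & 0x7FFFFF
    return exponent == 0 and mantissa != 0


def flush_denorm_f32(x: float) -> float:
    """Flush-to-zero: subnormals become a zero that keeps the sign bit."""
    b = f32_bits(x)
    if is_denorm_f32(x):
        return bits_f32(b & 0x80000000)
    return bits_f32(b)
```

The smallest positive normal binary32 value is 2^-126. Anything smaller but nonzero, down to 2^-149, is subnormal.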
21:35 karolherbst[d]: gfxstrand[d]: impressive, glad we have you to solve those issues
21:36 tiredchiku[d]: meanwhile I'm here wondering if expired poison is more poisonous or less
21:36 gfxstrand[d]: Differently poisonous?
21:36 mhenning[d]: that's another good question
21:36 tiredchiku[d]: but at least now I know I'm missing the right mindset :ha:
21:37 tiredchiku[d]: gfxstrand[d]: sure, but is that more or less deadly than the original :doomthink:
21:56 gfxstrand[d]: https://tenor.com/view/princess-mostly-dead-slightly-alive-the-princess-bride-gif-15503255
22:15 asdqueerfromeu[d]: magic_rb[d]: I mean it has playable performance in GTA 5 with 4K resolution (at least the 6 GB variant)
22:40 snowycoder[d]: Wait, why does nvidia-proper driver generate instruction dependencies?
22:40 snowycoder[d]: They are in all nvcc-generated code for sm_30, sm_35, sm_37 (all kepler versions).
22:40 snowycoder[d]: P.s. Nop encoding works!
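Side note for readers: one likely reason, as described in NVIDIA's Kepler GK104 whitepaper, is that Kepler moved part of instruction dependency checking from hardware into the compiler. Math pipeline latencies are fixed, so the compiler can work out ahead of time when an instruction's operands will be ready and encode that in the binary. The sketch below computes per-instruction stall counts for straight-line code. The latency table, single-issue timing, and `Instr` type are hypothetical, and it does not reproduce Kepler's real control-word encoding.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical fixed latencies in cycles; real Kepler numbers are not given here.
LATENCY: Dict[str, int] = {"mov": 9, "ffma": 9, "iadd": 9}


@dataclass
class Instr:
    op: str
    dst: Optional[int]      # destination register, or None
    srcs: Tuple[int, ...]   # source registers


def compute_stalls(prog: List[Instr]) -> List[int]:
    """For each instruction, how many cycles to wait before issuing so that
    every source register written earlier in the block is ready."""
    ready_at: Dict[int, int] = {}  # register -> cycle its value is available
    cycle = 0
    stalls = []
    for ins in prog:
        need = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        stall = max(0, need - cycle)
        stalls.append(stall)
        cycle += stall  # wait until operands are ready
        if ins.dst is not None:
            ready_at[ins.dst] = cycle + LATENCY[ins.op]
        cycle += 1  # assume one instruction issued per cycle
    return stalls
```

Here an `ffma` that reads the result of the preceding `mov` waits 8 cycles, while an independent `iadd` issues immediately. This is the sort of information hardware would otherwise track with scoreboarding.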
22:50 magic_rb[d]: asdqueerfromeu[d]: Problem is drivers, the proprietary driver is probably gonna drop support soon and the only place i could put it into are headless machines. And i do not trust the driver to not segfault my kernel into oblivion, especially with large ZFS pools around
22:50 magic_rb[d]: And i have a rule about running as little proprietary code as possible, running proprietary code in the kernel is exactly the opposite of that rule
22:51 magic_rb[d]: And since the 1060 can still pull its weight the whole situation bothers me that much more
22:51 mohamexiety[d]: gfxstrand[d]: goals honestly :KEKW:
23:32 redsheep[d]: gfxstrand[d]: I have officially called off the 5090 search, by the way. Went ahead and used the flood of stock pings I was getting to help the friend waiting on my 4090 get a 9070xt instead. Waiting for next gen, whatever that turns out to be called. In any case, no blackwell testing coming from me, at least not any time soon.