00:15esdrastarsis[d]: If dmesg is clean, it's stable, ship it!
00:22gfxstrand[d]: hehe
00:22gfxstrand[d]: Gotta love things that changed on Ampere B...
00:25gfxstrand[d]: Good thing my 3060s are Bs
01:31gfxstrand[d]: This extension has an annoying number of corners. 😰
01:37gfxstrand[d]: gfxstrand[d]: Turing was the "shove as much in as we can" gen. Ampere B is the "Okay, seriously. We can do this better" gen.
01:38gfxstrand[d]: And Volta is the "Sure it's half-baked but look! New ISA!"
01:40redsheep[d]: That all fits pretty well with the apparent chip design and product strategy
01:41redsheep[d]: Volta was like a souped-up proof of concept, Ampere was where they got serious and started making all the new stuff performant
02:18gfxstrand[d]: Test run totals:
02:18gfxstrand[d]: Passed: 40633/107044 (38.0%)
02:18gfxstrand[d]: Failed: 2/107044 (0.0%)
02:18gfxstrand[d]: Not supported: 66409/107044 (62.0%)
02:18gfxstrand[d]: Warnings: 0/107044 (0.0%)
02:19gfxstrand[d]: Waived: 0/107044 (0.0%)
02:25gfxstrand[d]: And fixed the last two. Full CTS running
02:25gfxstrand[d]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31585
02:25mhenning[d]: it says 0% failures so it's perfect, right?
02:25gfxstrand[d]: hehe
02:26gfxstrand[d]: Okay. Bed time. I'll let the CTS run over night. Tomorrow I'll swap in my Turing and see how that's doing. Annoyingly, Turing is different (and maybe totally incorrect).
02:44gfxstrand[d]: `Pass: 380310, Fail: 40, Skip: 364647, Timeout: 2, Flake: 1, Duration: 19:55, Remaining: 52:48`
02:44gfxstrand[d]: Nooooooo
02:44gfxstrand[d]: Oh, well. I'll debug tomorrow when I'm not doing Q&A
03:08gfxstrand[d]: Looks like I probably won't out-of-date my talk, though. 🙈
13:35magic_rb[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1293929819293089813/1000002912.jpg?ex=6709298c&is=6707d80c&hm=979af26806babcffc8f36ad82fb5f3b79200ebe540223b6b3a6ef47b73306692&
13:35magic_rb[d]: Art
13:35asdqueerfromeu[d]: magic_rb[d]: Art nouveau 🥁
13:36magic_rb[d]: Nah this is a hardware problem i think
14:01mohamexiety[d]: NVK time in 5 minutes! https://www.youtube.com/watch?v=pDsksRBLXPk
14:19rhed0x[d]: I was hoping she'd say a few words on performance :(
14:22rhed0x[d]: like what are the main blockers for usable performance
14:29rhed0x[d]: someone asked ❤️
14:33mhenning[d]: I'd add to Faith's answer that there's still some obvious low hanging fruit in terms of performance. What comes to mind are: hierarchical z, instruction scheduling (in the sense of reordering), turning small ifs into predicated execution, using weaker memory orderings
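The "turning small ifs into predicated execution" item above can be sketched in plain C. This is an illustrative stand-in for what a compiler would emit as predicated ISA (e.g. `@p`-guarded instructions), not actual NVK code; the function names are made up:

```c
/* A small if/else like this compiles to a branch: */
static int select_branchy(int x, int a, int b)
{
    int r;
    if (x > 0)
        r = a;
    else
        r = b;
    return r;
}

/* Predication evaluates both sides and keeps one result based on
 * a predicate, so there is no branch to mispredict or diverge on.
 * The arithmetic select below stands in for the hardware's
 * predicated moves/selects. */
static int select_predicated(int x, int a, int b)
{
    int p = (x > 0);           /* the "predicate register" */
    return p * a + (1 - p) * b;
}
```

For tiny bodies the predicated form is usually cheaper than a branch, since both sides execute anyway when a warp diverges.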
14:59ahuillet[d]: (we call it zcull btw)
14:59ahuillet[d]: probably insn scheduling is the most important, but I haven't measured it to confirm.
15:06mhenning[d]: ahuillet[d]: oh, yeah all of the vendor names are jumbled together in my head
15:15mhenning[d]: ahuillet[d]: yeah, I'm not really sure which will be the biggest improvement on that list. I could see memory orderings being really significant if we're memory bound on a lot of apps
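As a rough analogy for the "weaker memory orderings" point, here is a C11 atomics sketch. GPU memory scopes are not C11 semantics, but the idea is the same: an access with no cross-thread ordering requirement can drop the fences a sequentially-consistent access implies. The counter here is hypothetical:

```c
#include <stdatomic.h>

/* A statistics counter that no other memory operation is ordered
 * against: a relaxed RMW is sufficient and avoids the implicit
 * fencing of the default seq_cst atomics. */
static atomic_uint hits;

static void record_hit(void)
{
    atomic_fetch_add_explicit(&hits, 1, memory_order_relaxed);
}

static unsigned read_hits(void)
{
    return atomic_load_explicit(&hits, memory_order_relaxed);
}
```

A Vulkan shader making every access the strongest available ordering is correct but leaves bandwidth on the table, which is why this shows up when memory bound.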
15:26notthatclippy[d]: Maybe we could find a way to disable individual bits in the blob driver and you can benchmark their effect, then prioritize based on that.
15:44mhenning[d]: yeah. to be honest, we probably want to do all of these things anyway, so doing them in the right order might not be super important
15:44mhenning[d]: if y'all want to give us info on how the memory orderings work or instruction latency tables that would save us some reverse engineering work though 🙂
15:44mhenning[d]: (actually, it's possible karol has some of that under nda, I'm not totally sure what he has)
15:45notthatclippy[d]: Yeah, I think such things would have to go through RH via NDA. They can request whatever they're missing through the formal channels.
15:46karolherbst[d]: I got the instruction latency info, just didn't really find time to type the code
15:48mhenning[d]: karolherbst[d]: For which gens?
15:48karolherbst[d]: Turing and Ampere
15:49karolherbst[d]: I think
16:22gfxstrand[d]: karolherbst[d]: I should have it shortly as well.
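To show why those latency tables matter for scheduling, here is a toy issue model: one instruction issues per cycle, and a consumer stalls until its producer's result is ready. With the same instructions, reordering independent work into a producer's shadow hides its latency. The model and numbers are illustrative, not real NVIDIA latencies:

```c
struct insn {
    int latency; /* cycles until the result is ready */
    int dep;     /* index of the producing insn, or -1 if independent */
};

/* Returns total cycles to issue the whole sequence under the toy
 * model: issue advances one cycle per insn, plus any stall waiting
 * for a not-yet-ready operand. */
static int sched_cycles(const struct insn *prog, int n)
{
    int ready[32]; /* result-ready cycle per insn; small programs only */
    int t = 0;
    for (int i = 0; i < n; i++) {
        if (prog[i].dep >= 0 && t < ready[prog[i].dep])
            t = ready[prog[i].dep]; /* stall on the operand */
        ready[i] = t + prog[i].latency;
        t += 1; /* one issue slot per cycle */
    }
    return t;
}
```

E.g. a 4-cycle op immediately followed by its consumer stalls for 3 cycles; moving three independent 1-cycle ops in between absorbs the stall entirely. That reordering decision is exactly what needs the per-instruction latency numbers.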
18:40airlied[d]: Was there any zcull progress?
18:47gfxstrand[d]: I started poking at it some but didn't get far.
19:05redsheep[d]: Is fragment shader interlock next after fragment shading rate? Kind of sounded like it from that slide
19:10redsheep[d]: Exciting that you'll have the latency info soon, hopefully that makes a big difference
19:36gfxstrand[d]: Oh, good. Element decided to stop working so I'm no longer in the XDC matrix room. :facepalm:
19:39gfxstrand[d]: redsheep[d]: Probably not. I'm actually going to vanish in a couple of weeks until probably February. I'll be on Discord a bit but probably not writing much code. Certainly not taking on something like interlock.
19:40gfxstrand[d]: When I get back, I'll probably focus on perf and trying to sort out some of the compute stuff.
19:46karolherbst[d]: gfxstrand[d]: good to know it's not just me
19:48anarsoul: element is down for me as well
19:49skeggsb9778[d]: mhenning[d]: compression is another one, particularly as you need it for zero-bandwidth clear etc too
19:49skeggsb9778[d]: however, that might be tricky, as the VM_BIND code in nouveau forces everything to use small pages for some reason
19:49skeggsb9778[d]: and large pages are needed for all those things
19:50skeggsb9778[d]: those gave a very decent perf boost on earlier gens when it was added
19:51mhenning[d]: yeah, Faith added compression to the list on matrix
19:56mohamexiety[d]: yeah compression is my next target. currently trying to see if I can improve host copies atm but after that I'll start with that one
20:08gfxstrand[d]: skeggsb9778[d]: Oh, we should definitely fix that.
20:09gfxstrand[d]: But from what I've been told, compression works with sparse okay on Turing+
20:09gfxstrand[d]: Pre-Turing, yeah, there are more annoying restrictions.
20:19airlied[d]: gfxstrand[d]: just catching your talk, for nvk port I think porting to the Linux private API would be a good start, pretty sure their windows private data will be very close to the linux apis
20:22blockofalumite[d]: gfxstrand[d]: ~~This is what one gets for using Matrix~~
20:30gfxstrand[d]: airlied[d]: Yeah, probably. But now that I have the kernel APIs abstracted a bit, either should be pretty easy.
20:30gfxstrand[d]: And I've already got Nvidia engineers trying to troll me into porting NVK.
20:31airlied[d]: for amd I'd probably look at the stuff we expose from the kernel side on Linux and try and find matching values, since it's usually very similar info just probably reordered
20:38airlied[d]: when are you writing a d3d12 umd in mesa? 🙂
20:42zmike[d]: https://cdn.discordapp.com/attachments/1034184951790305330/1294037445972725812/20241010_164210.jpg?ex=67098dc9&is=67083c49&hm=b87b4bbfa3bb664a4ce890f0b38e78ffee6ab90f6be19a2eacb7fb7bbd92a7ba&
20:42zmike[d]: gfxstrand[d]: uh oh
20:43zmike[d]: In accordance with my branding policy I've had to cover one of your stickers
20:43gfxstrand[d]: airlied[d]: It's damn tempting, especially with the spirv announcement.
20:45redsheep[d]: That would remove the reliance on vkd3d-proton and the suboptimal perf on Nvidia... I've heard some people even say d3d12 feels like an API that's almost 1:1 with Nvidia hardware, would be interesting
20:45gfxstrand[d]: But there are too many interesting projects I could work on. I doubt that one will trickle to the top of the list.
20:48redsheep[d]: mohamexiety[d]: Let me know if I can help with that at all, I've done some poking at it, if you saw my spreadsheet with data on how prop appears to utilize L2
20:49redsheep[d]: IIUC the switch people have it working there, so there's probably some insight to gain from that
20:53redsheep[d]: Oh I wonder how that interacts with fragment shading rate, I'd bet the cache might hold more when that's on, it certainly works the other way around when msaa is on
20:54redsheep[d]: Might be worth making a vulkan port of trianglebin and adding a few more controls
20:57mohamexiety[d]: yeah there are quite a few starting points
20:59gfxstrand[d]: I think the first thing to do is to hack things up and try to figure out how compression impacts the memory layout of images. I know it has some effect but we don't know what it is yet.
20:59gfxstrand[d]: I also don't know how many layers we have to punch through to do that.
21:00gfxstrand[d]: I do know that I accidentally had compression enabled on Kepler and it was faulting right after the image.
21:03redsheep[d]: Kepler had render compression? That's news
21:08gfxstrand[d]: It's in the headers
23:47gfxstrand[d]: There we go. FSR merged and my XDC talk is now out of date. 😄