14:08 kuter7639[d]: Wanted to ask why Mesa backends generally use register allocation in SSA form (Braun-Hack et al. algorithm) instead of more Chaitin-esque allocators. Is it because SSA-based allocation is faster? I remember seeing somewhere that vector register usage (adjacent registers being written to by an instruction) had something to do with it, but I don't remember where.
14:10 kuter7639[d]: Any pointers to papers or discussions would be appreciated.
14:17 kuter7639[d]: if anyone else is interested, there is this talk that talks about this a bit. https://www.youtube.com/watch?v=lHhV6KyNCG0
14:19 karolherbst[d]: yeah.. I think the tldr is that SSA reg alloc is practically O(n log n)
14:46 gfxstrand[d]: O(n)
14:46 gfxstrand[d]: Well, data-flow gets to be a bit more like n^2 worst case
14:46 gfxstrand[d]: But traditional graph coloring is like O(n^4) or something stupid like that.
15:00 karolherbst[d]: ahh
15:02 karolherbst[d]: I think the worst part of graph coloring is that you retry after spilling
15:02 dadschoorse[d]: I think another advantage of SSA reg alloc is that you can compute a fixed maximum register pressure beforehand. That allows you to do spilling as a separate step and since you have that register pressure info, it can also be used to schedule memory loads without causing additional spilling
15:03 karolherbst[d]: yeah.. that's somewhat useful on nvidia, where the amount of registers used actually impacts how many threads you can run in parallel
15:03 karolherbst[d]: though the question is if an additional spill offsets the perf gain
15:04 gfxstrand[d]: karolherbst[d]: Yeah, graph coloring itself is only about O(n^2), maybe a little more. But then you retry every spill and things go downhill fast.
15:04 karolherbst[d]: yeah...
15:04 karolherbst[d]: though codegen was smart enough to spill everything at once and then retry
15:05 karolherbst[d]: but there is a bug
15:05 karolherbst[d]: and uhm..
15:05 karolherbst[d]: it's an annoying one
15:05 gfxstrand[d]: Also, there's no way that you can spill up-front with graph coloring unless you literally spill everything.
15:06 gfxstrand[d]: SSA-based allocators re-shuffle the register file to defragment as they go. Graph coloring can't. This can lead to shaders which theoretically consume less than half of the register file failing to allocate. I did a bunch of experiments with this when I was working on IBC.
15:07 karolherbst[d]: I have an example where I made it fail with 1/4
15:07 karolherbst[d]: codegen that is
15:07 dadschoorse[d]: did you ever write a ssa reg alloc for IBC?
15:07 gfxstrand[d]: There are other RA strategies which aren't technically SSA such as linear scan with 2nd chance bin-packing but they tend to be equivalent to SSA-based in the end.
15:07 gfxstrand[d]: dadschoorse[d]: No I didn't
15:07 gfxstrand[d]: I wanted to but I spent most of my time just getting the thing to work.
15:07 gfxstrand[d]: And I didn't end up going all-in on SSA, either, which was a mistake.
15:10 dadschoorse[d]: seems like intel is still on that path, even with the recent brw MRs that moved more things to ssa
15:12 dadschoorse[d]: kuter7639[d]: as far as I understand, vector registers are a problem that you have to solve when you want to do ssa regalloc, not something where ssa regalloc has some major advantage
15:12 gfxstrand[d]: IDK what Intel is doing
15:12 gfxstrand[d]: I've washed my hands of it at this point.
15:13 gfxstrand[d]: dadschoorse[d]: Vector registers are also annoying for graph coloring but a lot of the graph coloring research was done on really strange architectures so the standard papers handle it okay if you know how to set up your register classes.
15:13 dadschoorse[d]: dadschoorse[d]: aco's vector handling isn't the greatest for code gen for example
15:15 dadschoorse[d]: I think daniel had some ideas for how to improve vector handling in aco, but he always finds something else to work on instead
15:20 gfxstrand[d]: Yeah, his new plan is basically what I did for NAK. IDK if it's actually better or not, though.
15:21 gfxstrand[d]: It has its own set of problems
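The point made above about computing a maximum register pressure beforehand can be illustrated with a small sketch. It is not code from any Mesa backend: the instruction encoding and the name `max_register_pressure` are hypothetical, and it assumes a single basic block of scalar values with no vector or register-class constraints. It walks the block backwards, tracks the set of live values, and reports the largest set seen. A spilling pass could compare that number against the register file size before any registers are assigned.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical instruction encoding: (defined value or None, list of used values).
Instr = Tuple[Optional[str], List[str]]


def max_register_pressure(block: List[Instr], live_out: Set[str]) -> int:
    """Return the largest number of simultaneously live values in one block."""
    live = set(live_out)          # values live after the last instruction
    pressure = len(live)
    for dest, srcs in reversed(block):
        if dest is not None:
            # The destination occupies a register right after the
            # instruction, even if nothing ever reads it.
            pressure = max(pressure, len(live | {dest}))
            live.discard(dest)    # not live before its own definition
        live.update(srcs)         # sources are live before the instruction
        pressure = max(pressure, len(live))
    return pressure


# Assumed example block:
#   a = load; b = load; c = add a, b; d = mul c, a; e = add d, b; store e
example_block: List[Instr] = [
    ("a", []),
    ("b", []),
    ("c", ["a", "b"]),
    ("d", ["c", "a"]),
    ("e", ["d", "b"]),
    (None, ["e"]),
]
```

In the example block, `a`, `b` and `c` are all live right after `c` is defined, so the function reports a pressure of 3. A real allocator would need cross-block liveness and would have to account for vector registers, which this sketch leaves out.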
17:45 zmike[d]: struggles to use gitlab
17:50 zmike[d]: so unbelievably slow
17:50 karolherbst[d]: yeah...
17:50 karolherbst[d]: maybe we should restart it 😄
17:54 karolherbst[d]: zmike[d]: I figured out what's wrong 😢
17:54 karolherbst[d]: some domain spams new accounts
17:55 zmike[d]: :hypertensionheadache:
17:56 redsheep[d]: Careful with your sodium intake there zmike
17:56 HdkR: We must return to mailing lists to solve the problem
17:57 karolherbst[d]: 8 new users since my last message 🥲
17:57 tiredchiku[d]: damn
17:57 zmike[d]: if only we had more than one person able to block these spammers
17:57 karolherbst[d]: yeah.. maybe I should ask what I need to do in order to block domains
17:57 karolherbst[d]: because it all comes from the same email domain
17:58 gfxstrand[d]: zmike[d]: does anyone but you review kopper patches?
17:59 clangcat[d]: karolherbst[d]: I mean you should easily be able to black list the domain
17:59 clangcat[d]: Though
17:59 karolherbst[d]: clangcat[d]: well.. you can't do that in the gitlab UI
18:00 clangcat[d]: karolherbst[d]: I mean would you not block it in your hosting/server software
18:00 tiredchiku[d]: iptables time
18:00 clangcat[d]: Or that
18:00 clangcat[d]: there are many places
18:00 karolherbst[d]: yeah, but I don't have access
18:00 c133[d]: I thought nftables was the future
18:00 clangcat[d]: karolherbst[d]: what email provider is it?
18:00 tiredchiku[d]: something something
18:00 tiredchiku[d]: idk
18:00 clangcat[d]: c133[d]: Ehhh I mean people at every point have gone "this is the future"
18:00 tiredchiku[d]: networking is for people who hate themselves anyway
18:00 clangcat[d]: then it's not
18:01 clangcat[d]: tiredchiku[d]: But i hate myself
18:01 clangcat[d]: and I don't do networking
18:01 tiredchiku[d]: cat :(
18:01 karolherbst[d]: clangcat[d]: a spamming one
18:01 juri_: hate yourself less. :)
18:01 tiredchiku[d]: I agree juri_
18:01 karolherbst[d]: the domain doesn't even have a webpage
18:01 clangcat[d]: karolherbst[d]: Is it like an actual email provider
18:01 clangcat[d]: or just some domain someone bought
18:02 karolherbst[d]: just some domain
18:02 clangcat[d]: juri_: yea really not that easy
18:02 zmike[d]: gfxstrand[d]: nope
18:03 zmike[d]: unless they're my patches, and then I have to try wrangling airlied or ajax to rubber stamp me
18:04 clangcat[d]: zmike[d]: You could just make a separate online identity
18:04 clangcat[d]: Build them up until they can review patches.
18:05 clangcat[d]: Then you can review your own patches
18:05 zmike[d]: :galaxybrain:
18:06 clangcat[d]: zmike[d]: Call them like amike.
18:06 clangcat[d]: that way no one will suspect it's just you.
18:08 karolherbst[d]: anyway.. I've sent an abuse report, maybe that takes them down quicker than admins being available 😄
18:11 karolherbst[d]: ohh, I can actually do that myself
18:11 karolherbst[d]: zmike[d]: should be better now
18:12 karolherbst[d]: maybe
18:12 karolherbst[d]: dunno
18:12 karolherbst[d]: maybe it takes a while
18:13 karolherbst[d]: okay.. so sign-ups have stopped at least
18:13 karolherbst[d]: the server might still try to create new accounts
18:15 clangcat[d]: karolherbst[d]: It would surprise me if it's automated
18:15 karolherbst[d]: yeah.. dunno
18:15 karolherbst[d]: I mean.. it's not hard to automate this
18:15 clangcat[d]: Oh no
18:15 clangcat[d]: wouldn't
18:15 clangcat[d]: Blegh brain is shit
18:16 karolherbst[d]: let's see how long this is already ongoing
18:17 karolherbst[d]: oh wow...
18:17 karolherbst[d]: that's a looooot
18:17 karolherbst[d]: the domain kinda started this week
18:18 karolherbst[d]: ehh last week
18:18 clangcat[d]: karolherbst[d]: Anyway who'd you piss off to get them to spam you
18:18 karolherbst[d]: 16400 accounts since last Wednesday
18:18 clangcat[d]: XD
18:21 karolherbst[d]: okay.. no new accounts in the last 10 minutes at least
18:22 karolherbst[d]: gitlab still slow though :blobcatnotlikethis:
18:23 clangcat[d]: karolherbst[d]: I mean still probably has to process all the times this domain tries to connect
18:23 karolherbst[d]: yeah...
18:25 karolherbst[d]: also a db with that many users might slow down gitlab regardless
18:26 karolherbst[d]: ahh.. feels faster now
18:27 clangcat[d]: karolherbst[d]: yea that too
20:02 gfxstrand[d]: Are NVIDIA VRAM pages 16KiB or 64KiB?
20:03 karolherbst[d]: 64 k
20:04 karolherbst[d]: though there is also support for 4k pages
20:04 karolherbst[d]: some archs also have 128k
20:04 karolherbst[d]: anyway... 4k, 64k, 2M are supported everywhere, 128k is pre-ada and 512M is GA100
20:05 gfxstrand[d]: So 4k is supported even for VRAM?
20:06 gfxstrand[d]: Then why are we bumping things to 64K various places? Or is that just to satisfy max image alignment requirements?
20:06 karolherbst[d]: sparse seems to be 4k and 64k only
20:06 karolherbst[d]: gfxstrand[d]: I don't really know the details here, there are also some notes about things being decided at boot time and such
20:07 gfxstrand[d]: 😢
20:07 karolherbst[d]: but I think that's related to what pages sizes are supported
20:07 karolherbst[d]: not one being chosen
20:07 gfxstrand[d]: I know that Intel only supported 16k (or was it 64k?) and bigger for VRAM for $reasons
20:08 gfxstrand[d]: But if NVIDIA is 4k pages always, that makes things simpler.
20:08 karolherbst[d]: but yeah.. I suspect it has to do with alignment, because the GPU does have random alignment requirements for various things
20:08 gfxstrand[d]: As long as those are virtual alignments and not physical, we're fine
20:23 gfxstrand[d]: Looks like nouveau just uses the OS page size
20:28 notthatclippy[d]: Does Mesa take a big chunk from the kmod and then suballocate, or does it let the kernel handle all the individual allocations?
20:30 notthatclippy[d]: The proprietary driver does a lot of suballocating, on the graphics but especially the compute side, so larger page sizes make a lot more sense there. Particularly when you're dealing with 64+GB of VRAM that needs to be arbitrarily shared between applications
21:16 gfxstrand[d]: Vulkan apps should.
21:16 gfxstrand[d]: NVK doesn't currently sub-allocate for things like pushbufs but we could.
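The two ideas in this exchange, bumping sizes up to a page-size alignment and sub-allocating from one larger kernel allocation, can be sketched together. This is an illustration only, not NVK, nouveau, or proprietary-driver code: `align_up` and `BumpSuballocator` are hypothetical names, offsets are plain integers rather than GPU addresses, and nothing is ever freed.

```python
from typing import Optional


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (value + alignment - 1) & ~(alignment - 1)


class BumpSuballocator:
    """Hands out aligned offsets from one large, already-allocated chunk."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.offset = 0  # first unused byte in the chunk

    def alloc(self, size: int, alignment: int) -> Optional[int]:
        """Return an aligned offset for size bytes, or None if the chunk is full."""
        start = align_up(self.offset, alignment)
        if start + size > self.chunk_size:
            return None
        self.offset = start + size
        return start


KIB = 1024
heap = BumpSuballocator(2 * 1024 * KIB)   # assumed 2 MiB chunk from the kernel
first = heap.alloc(100, 4 * KIB)          # lands at offset 0
second = heap.alloc(100, 64 * KIB)        # skips ahead to the next 64 KiB boundary
```

With a 64 KiB alignment the second request lands at offset 65536 even though only 100 bytes were used before it, which shows why larger page sizes pair naturally with sub-allocation: the padding is absorbed inside one chunk instead of costing a separate kernel allocation each time.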