14:38marmistrz_: I'm wondering: what is the current state of automatic reclocking support in nouveau? The feature matrix states WIP, but how far is that from "done"?
14:40imirkin_: karol has patches
14:40imirkin_: for ... some part of it
14:40imirkin_: i'd say it's about 20% done
14:40imirkin_: where 100% done is "enabled by default"
14:41karolherbst: 20% sounds about right
14:42karolherbst: latest version is here: https://github.com/karolherbst/nouveau/commits/better_dr_v2
14:42karolherbst: I think
14:42karolherbst: let me check
14:42karolherbst: nope, the important bits aren't there yet
14:43karolherbst: mhh I don't have a working version anymore, because those were all flawed by design
14:49karolherbst: imirkin_: do you have some kind of list in your mind about all the painful parts about nvir RA?
14:50imirkin_: most painful is messing with control flow once you're in SSA
14:50imirkin_: oh. wait. nvir RA.
14:51imirkin_: the spill pass is downright wrong.
14:51imirkin_: i've tried to wrap my head around how to fix it, but i've come up short
14:51imirkin_: it's ultimately a result of how the merging stuff works.
14:51karolherbst: yeah okay, I am kind of aware of some bugs in the spill pass
14:51karolherbst: I know that one
14:51karolherbst: I tried to fix it ;)
14:52imirkin_: i also think that there are problems with fixed regs
14:52karolherbst: fixed regs?
14:52imirkin_: like around function calls
14:52karolherbst: like "this has to be in $r1"?
14:52imirkin_: (as with our pre-written procedures)
14:52imirkin_: it generally works but ... there appear to be issues
14:52imirkin_: that goes to bugs with the RA
14:52karolherbst: I think we also have some issues with deep for loops and spilling in combination
14:53imirkin_: as for RA features ... i think we could do better with merged regs
14:53imirkin_: there are a lot of constraints, and i think they're not being properly represented
14:53imirkin_: which in turn causes the sub-par RA
14:53imirkin_: perhaps there's a way to express them differently which will let us allocate better
14:53imirkin_: it's pretty common to e.g. see
14:54imirkin_: mov $r5..r9, various values
14:54imirkin_: mov $r12..r15, $r5..r8
14:54imirkin_: do thing with $r12q
14:54karolherbst: ah, right
14:54imirkin_: whereas if it originally moved the values into the right place
14:54imirkin_: then this would all have been fine.
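[Editor's sketch of the copy pattern described above. This is a toy model, not nvir code: the function names, the first-fit policy, and the 64-register file are assumptions made for illustration. It compares assigning four values one at a time and then copying them into an aligned quad against placing them in the quad from the start.]

```python
def first_free(used, count=1, align=1, nregs=64):
    """Return the lowest base register with `count` free slots, aligned to `align`."""
    for base in range(0, nregs - count + 1, align):
        if all(r not in used for r in range(base, base + count)):
            return base
    raise RuntimeError("out of registers")


def allocate_naive(used, nvalues=4):
    """Give each value its own register first, then copy into an aligned quad."""
    used = set(used)
    scalars = []
    for _ in range(nvalues):
        reg = first_free(used)
        used.add(reg)
        scalars.append(reg)
    quad = first_free(used, count=4, align=4)
    # Every (dst, src) pair here is an extra mov the shader has to execute.
    moves = [(quad + i, src) for i, src in enumerate(scalars) if quad + i != src]
    return quad, moves


def allocate_constrained(used):
    """Pick the aligned quad up front, so the values are produced in place."""
    quad = first_free(set(used), count=4, align=4)
    return quad, []


# With $r0..$r4 already live, the naive path lands the values in $r5..$r8
# and then needs four movs into $r12..$r15; the constrained path needs none.
naive_quad, naive_moves = allocate_naive({0, 1, 2, 3, 4})
direct_quad, direct_moves = allocate_constrained({0, 1, 2, 3, 4})
```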
14:55karolherbst: I was thinking about having like an unbalanced tree based register allocation thing, where you can have leaves of different sizes. Kind of how memory allocation can be implemented
14:56karolherbst: no idea if this would be a good idea here
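[Editor's sketch of one possible reading of the idea above: a buddy-style allocator, borrowed from memory allocation, applied to a register file. The class name, the 64-register size, and the power-of-two block sizes are assumptions; nothing in the log says nvir does or should work this way.]

```python
class BuddyRegFile:
    """Buddy-style allocator over a register file; block sizes are powers of two."""

    def __init__(self, nregs=64):
        self.nregs = nregs
        # free[size] lists the base registers of free, size-aligned blocks.
        self.free = {nregs: [0]}

    def alloc(self, size):
        """Return the base of a free, size-aligned block, or None if none is left."""
        cand = size
        while cand <= self.nregs and not self.free.get(cand):
            cand *= 2
        if cand > self.nregs:
            return None
        base = self.free[cand].pop()
        # Split the block in halves, keeping the upper half free each time.
        while cand > size:
            cand //= 2
            self.free.setdefault(cand, []).append(base + cand)
        return base

    def release(self, base, size):
        """Free a block and merge it with its buddy for as long as that is free."""
        while size < self.nregs:
            buddy = base ^ size
            peers = self.free.get(size, [])
            if buddy not in peers:
                break
            peers.remove(buddy)
            base = min(base, buddy)
            size *= 2
        self.free.setdefault(size, []).append(base)
```

A 3-register (96-bit) value would have to round up to a 4-register block in this scheme, which is one obvious cost of the approach.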
14:58imirkin_: "thinking" and "RA" are generally incompatible concepts
14:58karolherbst: it was just an idea
14:58imirkin_: what you really want to do is find a paper that describes the constraints you're trying to solve under
14:58imirkin_: and implement the thing that that paper proposes verbatim
15:03robclark: karolherbst, btw, there is a graph coloring thing which deals w/ register classes which can conflict (ie. overlap) with registers from another class.. that is what util/register_allocate implements.. not that I've looked at what nvir does, but that might help w/ places you need certain values in successive registers
15:03robclark: (ir3 needs that for arguments to tex instructions and a few other places)
15:03robclark: (and iirc i965 is similar that way)
15:06karolherbst: well in nvir we can have things like ld b64 $r1d c0[0x0], where it can also be a ld b128 $r1q c0[0x0]. And in RA we just try to mark the whole range of registers being used, basically
15:07imirkin_: yeah, so we do the merging in MemoryOpt
15:07imirkin_: which is prior to RA
15:07karolherbst: well we merge two ld ops together, but in theory we could have 64bit loads as inputs, right?
15:07karolherbst: maybe that will be a thing with spir-v or nir?
15:07karolherbst: no idea
15:08karolherbst: well I assume this isn't a thing in nir, but I don't see why spir-v shouldn't do it
15:09robclark: nir can have 32b or 64b (or now 16b) values..
15:09karolherbst: same goes for spir-v
15:09imirkin_: it's not so much the bits of the value
15:10imirkin_: as a sequence of registers
15:10imirkin_: freedreno has that too
15:10karolherbst: so we might end up with a ld b64 before doing any opts
15:10robclark: heheh, iirc spir-v can have 183 bit values if you want :-P
15:10karolherbst: spir-v has a way to declare bit widths of types
15:10karolherbst: which is kind of awesome
15:10imirkin_: but unlike freedreno, fermi/kepler1 gpu's have 64 regs, not 256.
15:11imirkin_: and sucking them up 4 at a time causes all sorts of problems.
15:11imirkin_: we almost never have RA issues on kepler2+ :)
15:11robclark: but from RA perspective, if 64b ops work on 2 consecutive 32b regs, then you'd define two register classes, one w/ 32b values, one w/ 64b with half as many regs, each of which conflict w/ corresponding two 32b regs
15:12imirkin_: robclark: yeah, our RA doesn't really have "classes" - it's somewhat different concepts.
15:12karolherbst: kind of like if you take $r1 from the 32bit class, $r0 from the 64bit class is occupied?
15:12robclark: karolherbst, right
15:12karolherbst: I see
15:12karolherbst: sounds like a good idea
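[Editor's sketch of the overlapping-class idea described above. It does not use the util/register_allocate API; the class names "r32" and "r64", the helper functions, and the 64-register file are assumptions made for illustration. Entry i of the 64-bit class covers 32-bit registers 2i and 2i+1, so the two classes conflict wherever those ranges overlap.]

```python
NREGS = 64                      # assumed number of 32-bit registers
WIDTH = {"r32": 1, "r64": 2}    # 32-bit registers covered by one entry of each class


def physical_regs(reg_class, index):
    """Set of 32-bit registers covered by entry `index` of `reg_class`."""
    width = WIDTH[reg_class]
    return set(range(index * width, index * width + width))


def conflicts(a, b):
    """True if two (class, index) entries share at least one 32-bit register."""
    return bool(physical_regs(*a) & physical_regs(*b))


def free_entries(reg_class, taken):
    """Entries of `reg_class` that conflict with nothing in `taken`."""
    count = NREGS // WIDTH[reg_class]
    return [i for i in range(count)
            if not any(conflicts((reg_class, i), t) for t in taken)]


# Taking $r1 from the 32-bit class blocks entry 0 of the 64-bit class,
# which covers $r0 and $r1; entry 1 ($r2/$r3) is still available.
remaining = free_entries("r64", [("r32", 1)])
```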
15:12robclark: imirkin, hmm.. what algo *does* nvir use?
15:14robclark: karolherbst, fwiw, https://cgit.freedesktop.org/mesa/mesa/tree/src/util/register_allocate.c#n55
15:14imirkin_: whatever calim implemented... it's based on some paper
15:14imirkin_: but i'm not 100% sure which one
15:14karolherbst: robclark: sounds good
15:15imirkin_: feel free to peruse.
15:15karolherbst: robclark: all I know is, that we can have 32bit, 64bit, 92bit and 128bit values in nvir
15:15imirkin_: i assume "RIG" node came from somewhere
15:15imirkin_: 96bit :)
15:15karolherbst: ahh right
15:15karolherbst: 192, ends with 2
15:15imirkin_: 92 would just be mean.
15:15robclark: GCRA.. I guess it is some sort of graph coloring at least..
15:15karolherbst: 24bit is mean already
15:15karolherbst: but 92... :D
15:16imirkin_: robclark: it also deals nicely with the nv50-era situation of having everything be 32-bit regs, but sometimes addressing them 16-bit at a time
15:16imirkin_: well, it basically has this "simplify" step which messes around with ... stuff.
15:16karolherbst: well with maxwell we also have hi/lo bits in some instructions
15:17karolherbst: is that a thing on fermi/kepler?
15:17imirkin_: robclark: it also deals with predicates, unions, etc
15:17imirkin_: karolherbst: yeah, it is. but with nv50 we actually do everything as 16-bit
15:17imirkin_: for the benefit of mul
15:17karolherbst: simplify isn't doing what the name suggests... at least not when I looked into it
15:17imirkin_: karolherbst: should be called "complicate" :)
15:17karolherbst: imirkin_: sounds totally like super pain
15:18karolherbst: I am sure if you understand that stuff, it indeed "simplifies"...
15:18karolherbst: but maybe not in the meaning we are used to
15:18robclark: fwiw, ir3 does 32b, 64b, 96b, 128b, 256b, and 320b sizes :-P https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/freedreno/ir3/ir3_ra.c#n87 (where 320b is really just 10 consecutive registers)
15:19robclark: well, it doesn't really have even 64b alu instructions, most of the larger sizes are for variations of tex fetch instructions
15:19karolherbst: play well, nvir does have those
15:20karolherbst: I think
15:20karolherbst: well, yeah, I am pretty sure it does
15:20karolherbst: it feels like it is cheated away in some form, but yeah
15:21karolherbst: basically before doing RA we don't care about any of that, not really. there are some split passes to make things... compatible? would need to look that up
15:22karolherbst: right we split integer mad and muls
15:23karolherbst: the 64bit ones
15:27imirkin_: yeah, coz for opt purposes, we want to keep it as 64-bit
15:28imirkin_: and then we fix it up in the legalize step or around there
15:28imirkin_: (which is post-ssa-opt lowering)
15:30karolherbst: we actually have a SSA based opt to split those
15:34imirkin_: well... not an opt
15:34imirkin_: it's a fixup
15:34karolherbst: okay, right
15:34karolherbst: it is a pass
15:34imirkin_: but that pass prevents a lot of optimizations, so it comes last
15:35imirkin_: (or last-ish)
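[Editor's sketch of what splitting a 64-bit integer multiply into 32-bit pieces amounts to arithmetically. This is only the arithmetic, not nvir's actual lowering code; the function name is hypothetical. The low 64 bits of a*b need the full product of the two low halves plus the two cross products folded into the high word.]

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def mul64_via_32(a, b):
    """Low 64 bits of a * b, computed from 32-bit halves of the operands."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # 32x32 -> 64 product of the low halves: its low word is the result's
    # low word, its high word carries into the result's high word.
    low_product = a_lo * b_lo
    res_lo = low_product & MASK32

    # Only the low 32 bits of each cross product can reach the result.
    res_hi = ((low_product >> 32) + a_lo * b_hi + a_hi * b_lo) & MASK32
    return (res_hi << 32) | res_lo
```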
16:28karolherbst: well I kind of like the idea of those "util/register_allocate.h" things. I would like to remove some own written code and replace it with common code within nvir
16:29imirkin_: yeah ..... i generally view the nvir code as being better
16:29karolherbst: well, the register class thing pretty much convinced me
16:29imirkin_: won't help anything.
16:30karolherbst: it will clean the code up
16:30karolherbst: why not?
16:30imirkin_: the problem isn't the RA.
16:30karolherbst: non understandable code is a problem as well
16:30imirkin_: all RA is non-understandable.
16:31karolherbst: then all code of RA is the problem
16:32karolherbst: I am just saying, if we can replace some code in nvir with code, which more people know about, then this is a good thing
16:32karolherbst: and if we end up with less code in the nouveau part, then this makes it easier for us, hopefully
16:34karolherbst: or we continue with our situation, where nobody really knows what to do and we somehow try to fix it or not, because nobody has time
16:47imirkin_: well - here's the deal
16:48imirkin_: there are two things going on
16:48imirkin_: first of all - "you break it, you buy it"
16:48imirkin_: i.e. if you want to flip things over to $other thing, fine, but now you're the owner/expert of it, and get to fix everyone's bugs
16:48imirkin_: secondly - you can't just do it willy nilly - it has to not regress everything horribly
16:48imirkin_: my guess is that it will.
16:49imirkin_: my guess is that it's a giant waste of time
16:49imirkin_: but i could be wrong.
16:49karolherbst: I am just currently thinking if a completely new RA pass might be a better idea, so that we can switch between the old and the new one and keep both until the new one is better in most regards
16:50imirkin_: and it's not my time
16:52karolherbst: well, that is how I see it: the RA code should be written in such a way that even a new dev shouldn't have trouble at least understanding the code. Even if that means not understanding the concepts behind it. Or not being able to change anything about it.
16:52karolherbst: and the current situation doesn't help anybody
16:54imirkin_: what's hard to understand about the current code
16:54imirkin_: i just have no idea how the algo behind it operates
16:54imirkin_: either way ... wtvr
16:54imirkin_: my above points stand no matter what else is said
16:54karolherbst: well, right
16:55imirkin_: i certainly have little interest in trying to wedge a different RA algo into the current compiler, since the existing RA seems perfectly fine
16:55imirkin_: however if you'd like to give that a go, be my guest.
16:55imirkin_: imo it's an enormous waste of time, but it's your time.
16:56karolherbst: maybe, maybe not
16:56imirkin_: the argument at the end can't be "but i just spent a bunch of time on this, so even though it sucks, i'd like to merge it anyways" :)
16:57karolherbst: I didn't put much thought into what would be the "best" way of action here
16:57karolherbst: well, I am sure it is possible to cut the entire RA thing into smaller pieces and see what we may replace or rather fix
17:00imirkin_: not sure why you're so focused on RA
17:00imirkin_: it's pretty good given all the constraints.
17:01imirkin_: if you want to improve things, add a scheduling pass. or review RSpliet's.
17:01karolherbst: right and it has bugs nobody fixed so far
17:01karolherbst: I am sure if those were easier to fix, somebody would have already
17:01imirkin_: the bugs aren't in the RA.
17:01imirkin_: the bugs are just adjacent to the RA, like spilling
17:01imirkin_: and have nothing (directly) to do with the RA
17:01karolherbst: well right
17:02imirkin_: switching to a different RA won't make those bugs go away
17:02karolherbst: but the spill pass is called while doing ra
17:02karolherbst: you can't take the spill pass away without RA being not usable
17:03imirkin_: this conversation is going in circles. i'm done. i think you understand and agree with what i'm saying.
17:03imirkin_: specifically - if you replace it, it can't be worse.
17:05karolherbst: well I don't know what the best thing to do is here. just want to keep everything an option until somebody looked into it and comes to some kind of conclusion about what to do
19:18Lyude: When nvapeek says "..." what does that mean?
19:20mwk: it means all values in a range were equal to 0
19:23marmistrz_: karolherbst, and how much is "ready for occasional use after enabling it manually"? :D
19:23marmistrz_: (re: auto reclocking)
19:24karolherbst: mhh I don't think currently anything is
19:30marmistrz_: and "won't burn my GPU"? :P
20:56pmoreau: When you think it will be a quick pull request, and ten hours later you are still working on it and realising you need to add more stuff. --"
21:12RSpliet: imirkin_: RIG presumably is just Register Interference Graph ?
21:16RSpliet: In fact, that statement shouldn't have had a question mark
21:17imirkin_: that sounds nice and official.
21:17imirkin_: anyways, i've definitely seen the simplify thing in some papers
21:17imirkin_: it's based on some paper.
21:17imirkin_: the idea is that you "simplify" the graph iirc
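[Editor's sketch of the textbook graph-coloring "simplify" step, in the Chaitin/Briggs style. The log does not identify which paper nvir follows, so this is an assumption about the general shape and not a description of nvir's code. Nodes with fewer than k interferences are removed and pushed on a stack; when none exist, a high-degree node is picked as a potential spill. Colors are then assigned while popping the stack.]

```python
def simplify(graph, k):
    """graph maps each node to the set of nodes it interferes with.

    Returns (stack, spill_candidates). Nodes are removed one at a time:
    any node with degree < k is trivially colorable; otherwise the node
    with the highest degree is removed and recorded as a potential spill.
    """
    adj = {n: set(neigh) for n, neigh in graph.items()}
    stack, spill_candidates = [], []
    while adj:
        low = [n for n in adj if len(adj[n]) < k]
        if low:
            node = min(low)  # deterministic pick for the example
        else:
            node = max(adj, key=lambda n: (len(adj[n]), n))
            spill_candidates.append(node)
        for other in adj.pop(node):
            adj[other].discard(node)
        stack.append(node)
    return stack, spill_candidates


def select(graph, stack, k):
    """Pop nodes in reverse removal order and give each the lowest free color.

    Nodes that find no free color are left out of the result; those are
    the ones that would actually have to be spilled.
    """
    colors = {}
    for node in reversed(stack):
        used = {colors[m] for m in graph[node] if m in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[node] = free[0]
    return colors


# Triangle a-b-c plus d hanging off a.
rig = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
stack3, spills3 = simplify(rig, 3)
colors3 = select(rig, stack3, 3)
```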
21:21RSpliet: Ah yes, I guess that's the one part where RA is doing non-standard cleverness, which seems to be part of the spilling logic
21:30RSpliet: I suspect it's all less complex than it looks. Spill logic searches for RIG node with as many references as possible, puts them in a shortlist of low vs high degree (# interferences). It's a heuristic
21:32RSpliet: I suspect I must say RIG nodes because it might cover more than one SSA value (probably to deal with phi nodes, seems like the purpose for the loop in line 1247)
21:32RSpliet: And more random suspicions :-)
21:37imirkin_: it's really not that complicated.
21:37imirkin_: the simplify algo is the complicated thing
21:37imirkin_: but that's the thing from the paper
21:38imirkin_: the issue is in the literal logic for how the spiller adjusts stuff, in the presence of merged nodes
21:45RSpliet: Yeah, it sounds like the kind of thing where you go on a 3-day study-trace-debug-repeat bender and find the bugger, appreciating the RA code in the process. Starting from scratch isn't going to result in inherently better code than this...
21:46imirkin_: i think i have a handle on what's going wrong, but not necessarily how to fix it
21:46imirkin_: however it's a tenuous handle, and insufficient to explain to anyone
21:47RSpliet: Sounds like you've gotten to the end of day two of your bender ;-)
21:47imirkin_: [i.e. with the spilling issues]
21:47imirkin_: yeah, unfortunately it was like 6 months ago
21:47imirkin_: and i never got to day 3 :)
21:58Lyude: Would someone with the vbios repo mind looking at my nvf0 mmiotrace and see if they know what part of the hw initialization process this is happening around? https://paste.fedoraproject.org/paste/sCZ7ha3V2tXRI1pgzWnTKw I've been trying for a pretty good while now and I can't figure out where the right spot to throw those mmio writes for blcg actually is (they seem to be written at a different point than the
21:58Lyude: rest of the BLCG registers, and get written in that sequence twice)
22:09Lyude: karolherbst RSpliet ^ if you have any idea
22:12RSpliet: Lyude: already got hold of an old fashioned hex-editor to find those regs in the big blobby bios?
22:13RSpliet: oh wait, you're not sure whether it's an init table...
22:13Lyude: I can; but I'm not sure what the bios has to do with this. i'm really moreso just trying to figure out where to stick these writes in nouveau
22:13Lyude: like: all of the other BLCG writes fit in the right spot if you put them in therm->oneinit
22:14Lyude: erm, sorry, CG_CTRL
22:14Lyude: fb does the blcg writes at the end of oneinit
22:14Lyude: but i can't figure out where these writes need to be done in nouveau
22:14RSpliet: how about a hook that gets called from fb?
22:15Lyude: well yeah but the problem is I'm not sure -where- in fb's init process
22:15Lyude: additionally; it looks like they might happen after we set up some sort of memory related thing judging by the repetition of the writes later
22:15Lyude: or something that gets set up in a pair of two
22:16RSpliet: oh very likely, you might want to post some more context around those writes in fpaste, like 30 lines before and after or sth
22:16Lyude: yeah sure
22:17Lyude: RSpliet: https://paste.fedoraproject.org/paste/ESt1pTwK7WS3iD0kSzBDbg
22:17Lyude: it's definitely somewhere near the start of fb and i2c
22:18Lyude: (also you can look those up in the vbios repo, I uploaded my traces there)
22:19RSpliet: I suspect "PFFB.UNKC00.UNK08_SYSRAM_ADDR <= 0xfefdd000" is going to be your biggest hint
22:20Lyude: that's what I figured as well; but if I look where that gets written it's the same spot we're writing the rest of the registers: https://github.com/Lyude/linux/commit/11f353a3a9e1c689dad071b70e414298fa83d776#diff-5985947ac9b42b52b15d300e6335090c
22:22RSpliet: That's not a bad thing is it? Nouveau does not nearly initialise as much as the blob
22:22RSpliet: it doesn't touch clocks, it doesn't touch the line buffer...
22:23Lyude: that's true I suppose
22:23Lyude: just trying to be extra careful, since I've already found a few CG things that you can only misprogram on certain gpus
22:24RSpliet: blob continues emptying the page that the UNKC00.UNK08_SYSRAM_ADDR is pointing to...
22:26RSpliet: Or... that's my suspicion. But that's not a very strong suspicion considering it's pointing to VRAM through PBUS, and that's physically addressed (so I doubt that can map to that SYSRAM page)