03:38 imirkin: karolherbst: on the bright side, reproduced the issue :)
03:38 imirkin: now i'll investigate.
03:40 imirkin: definitely a neat example of a fail
03:40 imirkin: no obviously natural reason for it to be dying
03:40 imirkin: the only odd thing is the pre-allocated regs for outputs, but that should be quite well-supported
03:40 imirkin: could be that this is the problematic part somehow though
03:41 imirkin: [esp odd is that there are _17_ of them, but that's still a fraction of the total 64. even if there were 0 overlap, we'd still have enough regs for all the values i think]
03:54 imirkin: hrmph. feels like there might be an off-by-1 somewhere in the live interval calculation
03:56 imirkin: (or something equally dumb)
03:59 imirkin: hrm, no. i think the intervals just work differently than i imagined?
04:00 karolherbst: imirkin: 17*4 > 64
04:00 karolherbst: I was not joking when I said those values take up vec4 slots
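The arithmetic behind the two positions can be checked with a small sketch. The 64-register budget and the count of 17 pre-allocated outputs come from the log; the helper names `regs_needed` and `fits` are hypothetical and exist only for illustration.

```python
TOTAL_REGS = 64  # register budget mentioned in the log


def regs_needed(num_values, slots_per_value):
    """Registers consumed if every value occupies `slots_per_value` slots."""
    return num_values * slots_per_value


def fits(num_values, slots_per_value, total=TOTAL_REGS):
    """True if the values fit in the budget with zero overlap between them."""
    return regs_needed(num_values, slots_per_value) <= total


# 17 scalar outputs fit easily; 17 values each reserving a vec4 slot do not.
print(fits(17, 1), regs_needed(17, 1))
print(fits(17, 4), regs_needed(17, 4))
```

If each of the 17 values really reserves a full vec4 slot, the demand is 68 registers, which exceeds 64 even before any other value is counted.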
04:00 imirkin: ok
04:00 imirkin: i mean ... that's also a theory, but pretty sure not a correct one
04:00 imirkin: i don't see where that'd be happening
04:01 karolherbst: it's some issue in how we handle the nodes
04:01 karolherbst: wait..
04:01 karolherbst: maybe I remember the place
04:01 imirkin: perhaps the shift is getting applied where it shouldn't be
04:01 imirkin: since these are 32-bit regs (vs 8-bit or whatever)
04:01 imirkin: there's a shift of 2 applied in some places
04:01 imirkin: applying it incorrectly = death :)
04:01 karolherbst: no, it was something more annoying
04:02 karolherbst: imirkin: okay.. sooo
04:02 karolherbst: the issue was with the RelDegree table
04:02 imirkin: sad
04:02 karolherbst: yes
04:02 imirkin: that's the thing i understand the least.
04:02 karolherbst: yep
04:02 karolherbst: so
04:02 karolherbst: you index it and it returns something
04:02 imirkin: yes, i got that far :p
04:02 karolherbst: and it turns out that those values get a vec4 slot reserved
04:02 karolherbst: because I think they _might_ get used as vec4
04:03 karolherbst: but they just aren't
04:03 imirkin: no...
04:03 karolherbst: the shader is really really crafted to hit this issue
04:03 karolherbst: it won't happen without those tex
04:03 imirkin: yeah, i'm sure the merge's are key
04:03 karolherbst: sooo
04:03 imirkin: but it does insert the constraint mov's
04:03 imirkin: so at least that's nice
04:03 karolherbst: what the algo is doing is to assume the worst case
04:03 karolherbst: or well.. that's the reldegree table
04:04 karolherbst: so in the worst case, we have to assign vec4 slots, because of tex
04:04 karolherbst: or something
04:04 karolherbst: and I think it propagates through the shader
04:04 karolherbst: my knowledge here is very fuzzy as I looked into it like 2 years ago
04:05 imirkin: i mean ... all the RIG nodes are 1 color, except the 2 which are 4 (and need to be 4)
04:05 karolherbst: yeah, that's the fun part
04:05 karolherbst: the RIG stuff is all fine
04:06 karolherbst: I am convinced it's in the reldegree table and how we use it
04:06 imirkin: i'm definitely weirded out by the live intervals
04:06 karolherbst: :)
04:06 imirkin: like
04:06 imirkin: it gets printed as [a, b)
04:06 karolherbst: when I started to look into this issue the shader had like 200 instructions and I was like: let's make it smaller
04:06 imirkin: oh, i guess for defs ... makes sense
04:06 imirkin: the reg is still used on instruction b
04:07 imirkin: but b can also redefine that reg if it wants
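The half-open `[a, b)` convention described here can be sketched as follows. This is only an illustration of the idea, not nouveau's actual interval code; the `Interval` type and `overlaps` helper are hypothetical names.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    begin: int  # serial of the defining instruction (inclusive)
    end: int    # serial of the last-use instruction (exclusive)


def overlaps(x, y):
    """Two half-open intervals overlap only if each starts before the other ends."""
    return x.begin < y.end and y.begin < x.end


# A value defined at instruction 0 and last read at instruction 5,
# and a second value defined by instruction 5 itself.
old = Interval(0, 5)
new = Interval(5, 9)

print(overlaps(old, new))             # the two can share one register
print(overlaps(Interval(0, 6), new))  # read again after 5: they interfere
```

Because the end point is excluded, instruction `b` can read a register as a source and redefine it as a destination without the two values being treated as interfering.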
04:07 imirkin: anyways, neat example
04:07 karolherbst: yeah
04:07 imirkin: i won't have time to sort it out today
04:08 imirkin: but i'll try to figure it out later
04:08 karolherbst: I like how it's obvious that something is wrong :D
04:08 imirkin: yeah
04:08 imirkin: it never even gets to assigning any colors
04:08 imirkin: so something's pretty fucked up-front
04:08 karolherbst: yep
04:08 imirkin: question is 'what'
04:08 karolherbst: but the code totally looks fine
04:09 karolherbst: it's just that the degrees we use are odd
04:09 imirkin: i like it :)
04:09 karolherbst: that shader made me think that our RA is busted :p
04:10 karolherbst: maybe you find a different solution than to replace it, but who knows
04:10 imirkin: i can see why you might think that
04:10 karolherbst: although I think we just need to change how we deal with the reldegree stuff
04:10 karolherbst: back then I even understood what that RelDegree stuff is
04:28 imirkin: fun. nvc0 works, nve4 doesn't.
04:28 imirkin: has to do with the tex args coalesce probably then
04:29 imirkin: ah yeah, single quad arg vs 3 + 1
04:45 karolherbst: imirkin: look at how the value gets read out from RelDegree, I think this was key for me understanding what's going on
04:45 imirkin: yeah
04:45 imirkin: i'll dig into it more later
04:45 imirkin: heading off to sleep soon
04:46 karolherbst: me as well...
15:57 cwabbott: imirkin: in case you didn't know, because I don't think the nouveau code has a reference, the nouveau RA uses the same extension to graph coloring as util/ra
15:58 cwabbott: it's based on "Retargetable Graph-Coloring Register Allocation for Irregular Architectures"
15:58 cwabbott: so if you want to actually understand it you can go read that paper
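For orientation, the colorability test from that paper (by Runeson and Nyström) can be sketched roughly as below. Here `p(B)` is the number of registers in class B, and `q(B, C)` is the largest number of registers in B that a single register of class C can block. The class names, the sizes (64 scalars, 16 aligned vec4 groups), and the table values are assumptions chosen to mirror the numbers in the discussion above; they are not taken from nouveau's RelDegree table.

```python
# p(B): how many registers each class offers (assumed sizes).
P = {"scalar": 64, "vec4": 16}

# q(B, C): worst-case number of class-B registers blocked by one class-C register.
Q = {
    ("scalar", "scalar"): 1,
    ("scalar", "vec4"): 4,   # one vec4 covers four scalar registers
    ("vec4", "scalar"): 1,   # one scalar sits inside exactly one aligned vec4
    ("vec4", "vec4"): 1,
}


def trivially_colorable(node_class, neighbor_classes):
    """p,q test: the node can always be colored if its neighbors,
    in the worst case, block fewer registers than its class offers."""
    blocked = sum(Q[(node_class, c)] for c in neighbor_classes)
    return blocked < P[node_class]


# 17 scalar neighbors block at most 17 of 64 registers.
print(trivially_colorable("scalar", ["scalar"] * 17))
# If those 17 neighbors are counted as vec4, they block up to 68.
print(trivially_colorable("scalar", ["vec4"] * 17))
```

Under these assumed tables, treating 17 neighbors as vec4 rather than scalar is enough to make a scalar node fail the test, which is one way a worst-case degree table could stall allocation before any color is assigned.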
23:28 anholt: heads up: I'm moving, and the jetson CI will be unavailable for a couple days.