00:36karolherbst: okay, there is really something wrong in the gddr5 r1373f4_init function
00:39karolherbst: mhh, maybe not, now that I think about it
00:39karolherbst: pstates 07 and 0a only have a static memory clock, so none of this matters for them
03:15karolherbst: I think I got it
03:15karolherbst: I think this "--" is wrong
03:17karolherbst: and it has to be a << 16 shift
03:17karolherbst: const u32 mcoef = ((ram->P2 << 16) | (ram->N2 << 8) | ram->M2);
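A minimal C sketch of the field layout implied by the corrected line above: three coefficients packed as P2 << 16 | N2 << 8 | M2. The helper names `pack_mcoef` and `unpack_mcoef` are hypothetical and not nouveau code, and masking each field to 8 bits is an assumption (later in this log RSpliet describes the PLL register as three values of no more than 8 bits).

```c
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical helper: pack three PLL coefficients as P << 16 | N << 8 | M.
 * Each field is assumed to be at most 8 bits wide. */
static u32 pack_mcoef(u32 P, u32 N, u32 M)
{
    return ((P & 0xff) << 16) | ((N & 0xff) << 8) | (M & 0xff);
}

/* Hypothetical inverse: split a packed word back into P, N and M. */
static void unpack_mcoef(u32 mcoef, u32 *P, u32 *N, u32 *M)
{
    *P = (mcoef >> 16) & 0xff;
    *N = (mcoef >> 8) & 0xff;
    *M = mcoef & 0xff;
}
```

For example, pack_mcoef(2, 40, 1) yields 0x00022801, and unpacking that word gives back P = 2, N = 40, M = 1.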
07:03pmoreau: Going to update glibc version support in Valgrind, unless someone is against it. :-)
07:54imirkin: pmoreau: you should check with joi as to what his procedure was for that
07:55pmoreau: I made the same patch as he did for 2.21 (well, obviously I did change the 2.21 into 2.22 :D)
07:56pmoreau: but some testing would have probably been better...
07:57imirkin: ok, if you're doing the same thing he did, then you're in the clear
09:54pmoreau: mwk: I updated the pull request to take into consideration your comments. I also added a commit to rename CRTC to HEAD, as Nvidia uses HEAD.
09:55pmoreau: mwk: I'll need to rebase the EVO patches once the renaming patches are accepted, and add a patch for capabilities regs in PDISP.
11:54karolherbst: does anybody want to check if that stabilizes the 0f pstate with GDDR5 on gk104+? https://github.com/karolherbst/nouveau/commit/eaa1769dcd7f8295e037f2104ce60264938d8ddd
11:55karolherbst: ohh nvm
11:55karolherbst: doesn't seem to be stable enough yet
11:55pmoreau: Everything just crashed?
11:56karolherbst: not as bad as usual
11:56karolherbst: no sched errors or anything
11:56karolherbst: the values are still off a bit
11:56karolherbst: but at least I hoped this would be a bit more stable
12:06karolherbst: it's really bad
12:06karolherbst: some mcoef values just crash my system
12:06karolherbst: not the nouveau card, but my system with the intel gpu
12:13karolherbst: pmoreau: do you think it would help to get like 20 combinations of clocks and reg values from the blob?
12:16karolherbst: the values from the blob at least seem to work for both of those regs
12:23karolherbst: who is the one with the most knowledge about this gddr5 code, skeggsb ?
12:25RSpliet: but this has little to do with GDDR5 and a lot with PLLs
12:27karolherbst: sadly I don't really get this stuff yet; I just know that both of these values seem to be wrong
12:28RSpliet: the boundaries in which PLLs work well are determined by the technology at TSMC or GlobalFoundries (or similar, I'm guessing TSMC), which is probably one of the bigger secrets in the industry
12:28RSpliet: all you have to work with is the PLL limits table in the VBIOS, the PLL locked bit to verify a value and the documentation of the different fields that have been RE'd
12:30RSpliet: chances are high that all used PLLs are roughly the same (although the RAM one might be more advanced), thus if you want to do some serious REing just pick a different PLL
12:30RSpliet: one that you can have dysfunctional for a while... back then I used one of the video engine PLLs
12:31karolherbst: I've never worked with a PLL that seriously before. I once used one to set the clock on an ARM board, but that was a piece of cake
12:31RSpliet: well, serious is a big word
12:32RSpliet: there's like 6 parameters and a lot of clock tree routing
12:32karolherbst: I meant like, the values were well documented
12:32karolherbst: or the reg
12:32karolherbst: to use
12:32imirkin_: karolherbst: the main problem with pll's is that they have multipliers and dividers. and the ranges of combinations that work well are not exactly perfectly documented
12:32karolherbst: yeah I figured
12:32karolherbst: that's why I asked if I should collect the values from the blob
12:32karolherbst: I can reclock in 8MHz steps pretty high
12:32RSpliet: (and multipliers are dividers too, but in the feedback loop :-P)
12:33karolherbst: and got like 10 values the blob used
12:33karolherbst: maybe even more
12:33karolherbst: but I didn't go that high or low
12:33imirkin_: karolherbst: the other thing that you're missing is link-training i think
12:33karolherbst: I don't think that other stuff is wrong
12:34karolherbst: only these two regs seem to be the problem
12:34RSpliet: it would surprise me if that is the only problem
12:34karolherbst: yeah well, with the blob values it worked stable
12:34RSpliet: your card != pretty much every other card on the planet. Some stuff might work for you but not for others
12:34karolherbst: but 0a and 07 were unusable because of constant values inside code
12:34karolherbst: yeah I know
12:35karolherbst: but it may be possible to find some kind of pattern or anything
12:35RSpliet: my experience: yes, but not when working with a single card :-)
12:35karolherbst: but I have only one
12:35karolherbst: I would do the same with others, but I need nvidia-settings access for that
12:35karolherbst: with coolbits enabled
12:36RSpliet: any contribution is good... but make sure to exhaust your options and not think you're done too easily ;-)
12:37RSpliet: and wrt "having nvidia-settings", I have a bootable USB hard drive with a kernel that does mmiotrace and the blob just for that reason
12:37RSpliet: data on there I don't care about, so hangs and crashes are completely irrelevant
12:38karolherbst: with nvidia-settings I change the memory clocks ;)
12:38RSpliet: you missed my point
12:38RSpliet: such a HDD can be booted on any machine, including ones that you don't administer
12:40karolherbst: I don't have direct access to any other kepler card and can't think of any way to get another one currently anyway
12:40karolherbst: so I really only have my card here and all the mmiotraces in git
12:41karolherbst: those are the only things I can use right now (and a fermi card)
12:41karolherbst: and I already traced the blob here and most of the stuff seems to be okay
12:42karolherbst: nouveau isn't that far off, only this PLL stuff is really off
12:43karolherbst: interestingly enough, though: both regs have the same values for 4008MHz, 4038MHz and 4056MHz mem clock
12:46karolherbst: blob messed up. have to reboot :/
12:50prg: i've got a gk106, can i help somehow?
12:51karolherbst: prg: gddr5?
12:51karolherbst: then maybe
12:51karolherbst: which model?
12:52prg: though it keeps locking up with nouveau even if i'm not fiddling with pstates
12:52karolherbst: currently I am interested in the values from "nvapeek 0x132000 0x30" on the blob
12:53karolherbst: especially highest pstate and different memory clocks
12:53karolherbst: especially reg 0x132004 and 0x132024
12:53prg: 00132000: 98010000 00011301 00000000 00000000
12:53prg: 00132010: 00000000 00000249 00000000 00000000
12:53prg: 00132020: 20030001 00032301 f0000000 00000300
12:53prg: when idle
12:54karolherbst: nouveau or blob?
12:54karolherbst: 00132000: 98010000 00010b01 00000000 00000000
12:54karolherbst: 00132010: 00000000 00000249 00000000 00000000
12:54karolherbst: 00132020: 20030001 00021d01 f0000000 00000300
12:54karolherbst: this is for me
12:54karolherbst: only those two regs are off
12:54prg: i have no idea what any of that means
12:55karolherbst: mhh, I am not sure either, only that 0x132004 and 0x132024 seem to be very important ;)
12:55karolherbst: could you open nvidia-settings for that card and set to performance mode?
12:55karolherbst: and read the regs again
12:55karolherbst: after it reports the high values
12:56prg: 00132000: 98030001 00011301 10000000 00000000
12:56prg: 00132010: 00000000 00000fff 00000000 00000000
12:56prg: 00132020: 20030001 00072801 f0000000 00000300
12:56karolherbst: that's interesting
12:56karolherbst: it behaves exactly as mine
12:56karolherbst: values a little bit different
12:56karolherbst: but basically same behaviour
12:57karolherbst: 0x132004 stays the same
12:57karolherbst: on nouveau it doesn't
12:57karolherbst: I think the current value there is for the "base" pstate clocks
12:58karolherbst: sadly it's hard to hit the middle pstate on the blob
12:58karolherbst: so I can't really verify that
12:59prg: 00132000: 98010000 00011301 00000000 00000000
12:59prg: 00132010: 00000000 00000249 00000000 00000000
12:59prg: 00132020: 20030001 00011d01 f0000000 00000300
12:59prg: when it's on the second lowest level
13:01karolherbst: 0x132024 reg
13:01karolherbst: 07: 00021d01
13:01karolherbst: 0f: 00062801
13:04karolherbst: prg: which memory clocks do you have for each pstate?
13:04karolherbst: at least your values verify a bit what I already thought
13:05prg: 07: core 324 MHz memory 648 MHz, 0a: core 324-862 MHz memory 1620 MHz, 0d and 0f: core 549-1254 MHz memory 6008 MHz
13:05karolherbst: the 0x132024 reg shouldn't change for "default" pstate clocks
13:05karolherbst: prg: do you have coolbits enabled?
13:05prg: nfi what that is
13:06karolherbst: with that you enable some clocking options inside nvidia-settings
13:06karolherbst: like increasing/decreasing max clocks on 0f pstate
13:06prg: can't find anything like that in there
13:07karolherbst: its a xorg server setting
13:07karolherbst: Option "Coolbits" "8"
13:07prg: don't have that
13:07karolherbst: but currently I don't need anything from that
13:07karolherbst: I think your values are a good start
13:08karolherbst: especially because our 0a pstate has the same memory clock
13:11karolherbst: prg: for the middle pstate I guess you just waited the right time and executed the command
13:11prg: was running a light game, so nvidia-settings showed it sitting there for a while
13:11karolherbst: ahh nice
13:13karolherbst: if you want you could do the same for nouveau, but I don't think it's important, because 07 and 0a will look fine and 0f completely off
13:13prg: with 0f the system is gone
13:13prg: (or it was when i tried the last time)
13:13karolherbst: I have a hybrid gpu laptop luckily
13:14karolherbst: so I can do stuff like that without issues
13:15karolherbst: okay, the 0x132004 reg seems to be constant as long as the base clock doesn't change
13:15karolherbst: like if the memory is clocked to the given value for a specific pstate, it's always the same across all pstates
13:15karolherbst: how to get this value, I don't know
13:16karolherbst: but I think the gpu sets it itself, because nouveau doesn't write it for 07 or 0a
13:16karolherbst: and nouveau seems to pick the right value
13:16karolherbst: or at least the gpu does so on 07 and 0a
13:17karolherbst: ohh no, a bit is wrong
13:17karolherbst: but basically its fine
13:18karolherbst: 0x132024 seems to be highly clock related
13:18karolherbst: prg: we have the same value there for the 0a pstate where we have the same clock
13:18prg: um, great?
13:19karolherbst: not sure
13:19karolherbst: would need more data
13:19karolherbst: could only be the case if it's the base pstate clock
13:19karolherbst: I am pretty sure you could clock memory to 4008 with coolbits on 0f blob
13:19karolherbst: maybe the reg is also the same then
13:19karolherbst: maybe not
13:19karolherbst: I will check 6008
13:20karolherbst: but this is like +50% clock for me :/
13:20karolherbst: could be fine
13:21karolherbst: okay, seems to have worked
13:21karolherbst: same reg value
13:21karolherbst: even the first one is the same now
13:21karolherbst: now it's getting a bit strange
13:23karolherbst: now thats something
13:23karolherbst: prg: now I got your 00132000 value for all pstates
13:23karolherbst: just by setting max clock +2000
13:24prg: that means...?
13:24karolherbst: don't know
13:24karolherbst: it seems to be a linear value? that would most likely be strange
13:25karolherbst: yeah, its not
13:25karolherbst: but it stays the same after switching back from the 0f pstate
13:25karolherbst: so I think it most likely doesn't matter for the other pstates
13:26karolherbst: what the reg contains
13:27karolherbst: nouveau uses 00000a04 for me at 0f
13:28karolherbst: and this just seems really wrong
13:29RSpliet: karolherbst: I hope you're aware that you really are re-inventing the wheel now
13:34RSpliet: let me repeat that in case info got lost in the void
13:34karolherbst: RSpliet: yeah I know, but I couldn't find the information I have now anywhere
13:34RSpliet: karolherbst: I hope you're aware that you really are re-inventing the wheel now
13:34karolherbst: blob just messed my system
13:35RSpliet: the read_pll function in nouveau already explains the different values in the register you're looking at
13:35RSpliet: no, it's not linear, they are three values of no more than 8 bits
13:35karolherbst: yeah, it was a bad guess and I already knew it was wrong
13:35RSpliet: and then there is read_clk, which tries to decode the routing before it decides whether the PLL is used or bypassed
13:36karolherbst: reg 00132000 changes from 98010000 to 98030001 while going above 2400MHz memory clock
13:37RSpliet: knowing this, and knowing the PLL values are a multiplier, a divider and probably a ^2-divider, you can imagine that there are multiple PLL-coefficient values leading to the same clock
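RSpliet's point that several coefficient sets produce the same clock can be made concrete with a small C sketch. It assumes f_out = f_ref * N / (M * 2^p), i.e. it treats the third value as the power-of-two divider he guesses at, works in kHz, and ignores the VBIOS PLL limits table entirely; `pll_khz` and `count_exact_coefs` are hypothetical names, not nouveau functions, and the range 0..5 for p is a guess.

```c
#include <stdint.h>

/* Assumed model: f_out = f_ref * N / (M * 2^p), all frequencies in kHz. */
static uint32_t pll_khz(uint32_t ref_khz, uint32_t N, uint32_t M, uint32_t p)
{
    return (uint32_t)(((uint64_t)ref_khz * N) / ((uint64_t)M << p));
}

/* Count (N, M, p) triples that hit target_khz exactly, with N and M as
 * 8-bit fields (1..255) and p in 0..5. Real hardware also restricts the
 * VCO and input ranges, which this sketch does not model. */
static unsigned count_exact_coefs(uint32_t ref_khz, uint32_t target_khz)
{
    unsigned count = 0;

    for (uint32_t p = 0; p <= 5; p++)
        for (uint32_t M = 1; M <= 255; M++)
            for (uint32_t N = 1; N <= 255; N++)
                /* exact match: ref * N == target * M * 2^p */
                if ((uint64_t)ref_khz * N == ((uint64_t)target_khz * M) << p)
                    count++;
    return count;
}
```

With a 27000 kHz reference and a 1080000 kHz target, this finds 10 exact triples, for example (N, M, p) = (40, 1, 0) and (80, 1, 1). That is why a coefficient generator needs extra restrictions to settle on one the hardware actually likes.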
13:37karolherbst: and I know that nouveau isn't using the right clock for me
13:39RSpliet: it has an algorithm to generate a certain set of coefficients, and apparently additional restrictions need to be applied
13:40RSpliet: have you already looked up exactly what those bits in 132000 mean?
13:40karolherbst: I'm doing that right now
13:40karolherbst: but nouveau uses the blob value for 0f
13:40karolherbst: but I forgot what the lookup binary is called :/
13:41RSpliet: it's called "lookup"
13:42karolherbst: I have to add a value too
13:42RSpliet: you may
13:42RSpliet: and it will reveal interesting information
13:42karolherbst: bit 1 is enable
13:42karolherbst: okay, that makes sense
13:42karolherbst: the other bit is PLL_LOCK
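Comparing the two 0x132000 snapshots from earlier bit by bit shows that exactly two bits differ: 0x98010000 ^ 0x98030001 = 0x00020001, i.e. bits 0 and 17. The sketch below encodes that. Naming bit 0 ENABLE and bit 17 LOCK is an interpretation of the lookup output discussed here, not verified documentation, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit meanings for 0x132000 (interpretation, not verified docs). */
#define PLL_CTRL_ENABLE (1u << 0)
#define PLL_CTRL_LOCK   (1u << 17)

/* Mask of the bits that differ between two register snapshots. */
static uint32_t changed_bits(uint32_t before, uint32_t after)
{
    return before ^ after;
}

static bool pll_enabled(uint32_t ctrl)
{
    return (ctrl & PLL_CTRL_ENABLE) != 0;
}

static bool pll_locked(uint32_t ctrl)
{
    return (ctrl & PLL_CTRL_LOCK) != 0;
}
```

Under that reading, the low-clock value 0x98010000 has the PLL neither enabled nor locked, while 0x98030001 (above 2400MHz) has both bits set.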
13:43karolherbst: but why does that get enabled after the memory clock hits a specific value?
13:43karolherbst: it wasn't set on the blob even on 0f pstate if the memory clock is below 2400MHz
13:43RSpliet: just for shits and giggles, look up 0x132020
13:43karolherbst: having it set to 2402 MHz even crashed the card
13:44karolherbst: okay but 20030001 sounds pretty uninteresting, because the value is always the same
13:44RSpliet: and... look at the code
13:45RSpliet: (nvkm/subdev/clk/gk104.c, start from read_mem)
13:46karolherbst: okay, wild guess: after a specific clock the other PLL is getting used instead of the first one?
13:46karolherbst: no doesn't make sense
13:47RSpliet: keep reading... ;-)
13:47karolherbst: but is read_clk still memory related?
13:47karolherbst: or do I have to check read_pll now
13:48RSpliet: well, you don't have to check anything, but if unsure, I'd say drill deeper
13:48karolherbst: yeah, but core clock seems to be right
13:48karolherbst: read_pll may be interesting
13:49karolherbst: the switch is nice
13:51RSpliet: sclk means "source clock"
13:52karolherbst: for giggles I could check with blob values and see if the result is right
13:52karolherbst: but I highly assume it is
14:00karolherbst: RSpliet: what does read_div do? read from one reg and divide by the value from the second reg?
14:01karolherbst: ohh its in the file
14:01karolherbst: nvm then
14:01imirkin_: mwk: any guesses on what SHR.W might mean? /*0008*/ SHR.U32.W R3, R0, 0x2; /* 0x3828008000270003 */
14:01imirkin_: [this is maxwell]
14:02karolherbst: imirkin_: wild guess: shift?
14:03imirkin_: SHR = shift right, yes. what's the W?
14:03karolherbst: is there a shr with another letter?
14:04imirkin_: the W is a flag. it can just not be there.
14:04karolherbst: and there aren't any others
14:05karolherbst: imirkin_: https://code.google.com/p/asfermi/wiki/OpcodeLogic#SHR ?
14:05karolherbst: can't make anything out of that
14:05karolherbst: but maybe you can
14:05imirkin_: whoa. i was not aware of that info
14:05imirkin_: mwk: --^
14:05imirkin_: unfortunately it doesn't explain anything
14:05imirkin_: just tells you how to set it
14:06imirkin_: i guess it goes back to fermi though, good to know
14:07karolherbst: found another thingy
14:07karolherbst: its also inside PTX
14:07imirkin_: ptx isa describes it?
14:08karolherbst: "SHR.U32.W R0, R0, 0x1e;"
14:08karolherbst: just found it there
14:08imirkin_: which means what
14:08karolherbst: I assumed it may be part of PTX then
14:08imirkin_: doesn't look like there's a .w in ptx
14:10karolherbst: tid also has a .W flag
14:10karolherbst: but thats something different I guess
14:11karolherbst: no idea then
14:12imirkin_: wasn't really expecting you to have one ;)
14:12karolherbst: I know
14:13imirkin_: gah. arb_gpu_shader_fp64-tf is broken at least on maxwell. kind of assume broken everywhere.
14:13imirkin_: as is arb_get_texture_sub_image-get ???
14:15imirkin_: something wrong with cubemap arrays :(
14:16imirkin_: yeah, both of those are broken everywhere
14:20karolherbst: RSpliet: either I did something wrong or I got 0 with the blob values
14:21RSpliet: I'd assume the former :-P
14:21airlied: imirkin_: yeah tf is broken, I have unreviewd patches I think
14:21karolherbst: I got sclk = read_div(priv, 0, 0x137320, 0x137330); a 0 there
14:22imirkin_: airlied: ah ok. good to know. (i think i sorta remember those... assumed they had landed and something was wrong in nouveau)
14:22airlied: imirkin_: btw I started int64 for lols, though it might take some time to get anywhere more :)
14:22imirkin_: airlied: i actually did too
14:22imirkin_: airlied: started some parser stuff
14:23imirkin_: probably the exact same stuff you did
14:23airlied: yeah most likely :)
14:23karolherbst: RSpliet: found the issue
14:23airlied: the least fun stuff
14:23imirkin_: iirc i got to a point where i thought i could parse int64 constants
14:23karolherbst: I mean my mistake
14:23imirkin_: but didn't actually test it
14:24imirkin_: airlied: yeah, your patch looks very familiar ;)
14:24imirkin_: airlied: except i called it 'ln' instead of 'n64'
14:24imirkin_: but i like yours better
14:24RSpliet: karolherbst: for your entertainment: I drew out the schematics for GT215-GT218 on https://envytools.readthedocs.org/en/latest/hw/pm/gt215-clock.html#clock-source , it's not going to help you understand kepler, but gives you an idea of how tangled these trees might look
14:25imirkin_: airlied: you got a little further on the glsl stuff, but i think i started making ast mods
14:25imirkin_: airlied: [1-9][0-9]*[ul|UL] -- you probably meant (ul|UL)
14:25airlied: oh yes I did
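The difference imirkin_ points out: inside brackets, `|` is just another literal character, so `[ul|UL]` matches exactly one of `u`, `l`, `|`, `U`, `L`, while `(ul|UL)` matches the two-character suffix. The sketch below checks this with POSIX extended regular expressions (`regcomp`/`regexec` from `<regex.h>`) standing in for the lexer's own pattern syntax, which treats bracket expressions the same way. The hex pattern is a hypothetical illustration of the missing hex-constant case imirkin_ mentions next, not Mesa's actual rule.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Pattern as written in the patch: a bracket expression, i.e. one char. */
#define BRACKET_PATTERN "^[1-9][0-9]*[ul|UL]$"
/* Suggested fix: an alternation group matching the two-char suffix. */
#define GROUP_PATTERN   "^[1-9][0-9]*(ul|UL)$"
/* Hypothetical hex-constant pattern, for illustration only. */
#define HEX_PATTERN     "^0[xX][0-9a-fA-F]+(ul|UL)$"

/* Returns true if s matches the POSIX extended regex pattern. */
static bool matches(const char *pattern, const char *s)
{
    regex_t re;
    bool ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

With these, "42ul" is rejected by the bracket version but accepted by the group version, while the bracket version happily accepts "42|".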
14:25karolherbst: crystal is 27MHz?
14:25RSpliet: usually is
14:25imirkin_: also you don't handle hex consts
14:25RSpliet: in rare cases it isn't
14:26airlied: imirkin_: yeah I realised that before going to bed :-)
14:26imirkin_: airlied: i don't really plan on working on it though
14:26imirkin_: i've just never futzed with the parser, wanted to see what it was all about
14:26imirkin_: aka feel free to continue ;)
14:27karolherbst: RSpliet: how would the value be in the code? 27 or 27000 or 27000000
14:27airlied: well my hope is it's just find every DOUBLE and add INT64 :-P
14:27RSpliet: karolherbst: also, I don't like how muxes look so square in this drawing tool I used, but just look past that. "a" is selector, "x" is value
14:27karolherbst: I assume 27000
14:27RSpliet: I think we did code in kHz as a middle ground between precision and 32-bit reliability
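The 32-bit argument for kHz is easy to check: UINT32_MAX is 4294967295, so a u32 holding Hz tops out just below 4.3 GHz. The 27 MHz crystal (27000000 Hz) fits either way, but a value like the 6008 MHz memory clock prg reported earlier would be 6008000000 Hz and overflow, while 6008000 kHz fits comfortably. A minimal check, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a frequency given in kHz can also be stored in a u32 as Hz.
 * The multiplication is done in 64 bits so it cannot itself overflow. */
static bool khz_fits_u32_as_hz(uint32_t khz)
{
    return (uint64_t)khz * 1000u <= UINT32_MAX;
}
```

The cutoff sits at 4294967 kHz; anything at or above 4294968 kHz no longer fits as Hz, which is why kHz is the safer unit for memory clocks in this range.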
14:27karolherbst: these are muxes
14:27karolherbst: now it makes sense
14:28karolherbst: somehow I completely failed to see that
14:28RSpliet: all but the VCO are muxes
14:29karolherbst: yeah, I know
14:29karolherbst: I just need a little poke then I know whats going on
14:30airlied: imirkin_: http://paste.fedoraproject.org/255517/67417814/
14:30airlied: I think is what is needed for tf fix
14:31imirkin_: airlied: errr
14:31imirkin_: are you sure it doesn't need the same dvec2 vs dvec3 handling?
14:32imirkin_: anyways... dunno
14:32imirkin_: trying to beat the G80 into submission
14:33airlied: imirkin_: yeah I might have a different patch somewhere, that one was one I just found here
14:36karolherbst: RSpliet: I get 11904758.0 now, but this seems wrong. I think I will do it more carefully tomorrow
14:40airlied: imirkin_: sent the glext.h update at least :-)
16:41imirkin_: airlied: int64 -- should probably be int64_t too :)
17:46marcosps: imirkin_: around?
17:47marcosps: happy weekend :)
17:47marcosps: imirkin_: I worked a little in the tesselation support...
17:48marcosps: imirkin_: https://paste.fedoraproject.org/255527/68607814/
17:48marcosps: The only problem now is how to get the 2nd value...
17:49imirkin_: yeah this is not the right approach
17:49imirkin_: the right approach is to make 2d indexing for dst work properly in the first place
17:49imirkin_: none of that should be based on whether it's in a tessctrl shader
17:49imirkin_: look at how the 2d stuff works for sources
17:49imirkin_: do the exact same thing for destinations
17:50marcosps: imirkin_: hum... got it...
17:50marcosps: imirkin_: at least I could learn about the dump :P
17:55marcosps: imirkin_: thanks, I'll take a look and send new patch ASAP...
19:08imirkin_: well that was fun. in hindsight i should have figured out the bug much sooner. oh well.
19:09imirkin_: although bugs tend to work that way. much easier to solve once you know what they are :)