04:08 RSpliet: karolherbst: ping
04:09 karolherbst: hi
04:09 RSpliet: as you might know, nouveau picks 1431b as its coefficients for 0x137020
04:10 karolherbst: do you mean 137004?
04:10 RSpliet: (implying M == 27? )
04:10 RSpliet: no, I mean the PLL configured by 137020
04:11 RSpliet: although you're right that this register itself doesn't contain coeffs
04:11 karolherbst: ahh then 137024
04:11 RSpliet: the other ones all pick 1f for M
04:11 karolherbst: and P=1?
04:11 RSpliet: for all PLLs yes
04:12 RSpliet: they appear to have locked fine
04:12 karolherbst: with nouveau that is?
04:12 RSpliet: yes
04:13 karolherbst: mhh okay
04:13 karolherbst: maybe we should look at the entire PCLOCK range and see if something else is odd
04:13 RSpliet: you have my trace for NVIDIA... I don't think I have that trace myself at the moment :-P
04:13 karolherbst: but maybe the issue is something completely different
04:13 karolherbst: nvapeek 137000 0x1000 from nouveau at 0f and nvidia at 0f would be good to have
04:13 RSpliet: (nor an installation where I can easily obtain it)
04:13 karolherbst: ahh okay
04:14 karolherbst: did you check if 0a runs fine?
04:14 RSpliet: yes, perflvl a runs fine
04:14 RSpliet: which bits do I need to touch to disable clock gating, so that nvatiming gives valuable output?
04:14 RSpliet: or is that only a problem on Tesla?
04:15 karolherbst: no clue
04:15 RSpliet: mupuf:?
04:16 mupuf: hey there
04:16 mupuf: give me a sec to catch up
04:17 RSpliet: mupuf: on Tesla I had to disable clock gating in ptherm to get meaningful clocks from nvatiming, do we have bits like that on kepler?
04:17 mupuf: RSpliet: yes, it is only a problem on tesla
04:17 karolherbst: well nvatiming doesn't work at all for me here anyway
04:18 mupuf: karolherbst: well, instead of dismissing it, it would be better to understand its output or fix it
04:18 mupuf: this is the only tool that gives us a clock reading
04:18 karolherbst: true indeed
04:18 karolherbst: well, the PCOPY thing doesn't work
04:19 karolherbst: the PCOPY[0] calls are stuck forever
04:19 RSpliet: okay, in that case HUB6 and HUB7 are clocked ~125MHz too low
04:19 mupuf: karolherbst: you can limit the output to just the perf
04:19 karolherbst: ohh the hubs are working
04:19 RSpliet: they actually both correspond to the shader clock
04:21 karolherbst: mhh with nvidia I get 965/905 and it is clocked at 862
04:23 karolherbst: same with nouveau though
04:23 karolherbst: or nearly the same: 954/902
04:27 mupuf: karolherbst: that sounds weird...
04:28 karolherbst: mhh on 07 I get 405MHz which is the right value
04:29 karolherbst: there are some other oddities too though
04:29 karolherbst: PGRAPH.DISPATCH's clock: 1s = 18446744072485340365 cycles --> frequency = 18446744072485.339844 MHz (nvidia)
04:29 RSpliet: hmm, okay, I confirm that hub07 is clocked too low with nouveau
04:30 mupuf: karolherbst: it likely read 0xbadf or something
04:30 karolherbst: mupuf: ohh makes sense
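The PGRAPH.DISPATCH cycle count above is exactly 2^64 - 1224211251, i.e. a negative number printed as an unsigned 64-bit integer. A minimal sketch of that reinterpretation follows. It assumes (this is not confirmed in the log) that nvatiming derives the count from a difference of counter reads, which a bogus read such as the 0xbadf pattern mupuf mentions could push below zero. Both helpers and the 10 GHz sanity limit are hypothetical.

```python
def as_signed64(value: int) -> int:
    """Reinterpret the low 64 bits of value as a two's-complement signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def plausible(cycles_per_second: int, max_mhz: int = 10_000) -> bool:
    """Hypothetical sanity check: reject negative or absurdly high readings."""
    signed = as_signed64(cycles_per_second)
    return 0 < signed <= max_mhz * 1_000_000


# Cycle count printed for PGRAPH.DISPATCH in the log above.
cycles = 18446744072485340365
print(as_signed64(cycles))  # -1224211251
print(plausible(cycles))    # False
```

A check like this would let a tool flag the reading as invalid instead of printing an 18-trillion-MHz clock.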
04:30 RSpliet: karolherbst: could you check what coefficients nvidia picks for 0x137024 for me please?
04:31 karolherbst: { M = 0x1b | N = 0x43 | P = 0x1 } and { M = 0x11 | N = 0x22 | P = 0x1 }
04:31 RSpliet: the first is nouveau?
04:31 karolherbst: no, both nvidia
04:31 karolherbst: it changes it later
04:31 RSpliet: oh okay
04:32 karolherbst: first one is from 100s and the second 200s
04:32 karolherbst: mupuf: seems like M isn't fixed; the clocks are just designed so that there is always a good M, but P stays 1
04:33 RSpliet: the second set is weird, there's no 1620MHz clock in my VBIOS anywhere
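For reference, the coefficient sets karolherbst reads back fit a simple f_out = f_ref * N / (M * P) model with the 810 MHz reference clock that comes up later in this log. The sketch below assumes that model and treats P as a plain divider, which is at least harmless for P = 1, the only value the core PLLs use here. pll_out_mhz is a hypothetical helper, not a nouveau function.

```python
REFCLK_MHZ = 810  # assumed reference clock feeding the 0x1370xx core PLLs


def pll_out_mhz(m: int, n: int, p: int, ref_mhz: float = REFCLK_MHZ) -> float:
    """PLL output in MHz, assuming f_out = f_ref * N / (M * P)."""
    if m <= 0 or p <= 0:
        raise ValueError("M and P must be positive")
    return ref_mhz * n / (m * p)


# The two sets NVIDIA wrote to 0x137024 (at 100 s and at 200 s).
first = pll_out_mhz(0x1b, 0x43, 0x1)   # 810 * 67 / 27
second = pll_out_mhz(0x11, 0x22, 0x1)  # 810 * 34 / 17
print(f"{first:.0f} MHz, {second:.0f} MHz")  # 2010 MHz, 1620 MHz
```

The same model gives 810 * 0x48 / 0x1a, about 2243 MHz, for the { M = 0x1a | N = 0x48 } set NVIDIA programs into 0x137040, which is as close as this PLL gets to the 2244 MHz VBIOS value.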
04:34 karolherbst: well
04:34 karolherbst: maybe nvidia does some crazy stuff when there are no cstates
04:35 RSpliet: or it clocks down PLLs in phases to increase the likelihood of locking in time?
04:35 karolherbst: they slide the pll on the tegra
04:36 karolherbst: but that is done by the hw
04:37 mupuf: RSpliet, karolherbst: They have this DFS switch in the PLLs, maybe that can explain the difference we see between the reported value and the one seen in nvatiming?
04:37 karolherbst: wasn't that for later chips?
04:37 RSpliet: mupuf: for nouveau the report in nvatiming corresponds with the coeffs we set... is that what you refer to?
04:38 karolherbst: mupuf: the DFS stuff is maxwell
04:39 karolherbst: gk110 has some oddity too, but the first keplers are okay
04:39 mupuf: karolherbst: ah, sorry
04:39 mupuf: RSpliet: nope, I was referring to a funny bit in the PLLs NVIDIA added in maxwell
04:39 mupuf: under the name DVFS
04:40 mupuf: which is likely the equivalent of clock throttling on idle
04:40 mupuf: that was introduced on tesla
04:40 karolherbst: hopefully
04:40 karolherbst: though I doubt it works the other way around too
04:41 mupuf: oh, right, it would be increasing the clock, which would be weird :D
04:41 mupuf: RSpliet: if nvatiming reports something varying with the coefficients, then everything is likely good
04:41 mupuf: except maybe the clock tree parsing, which would account for the clock difference
04:41 karolherbst: yeah, but it's within the realm of possibility ;)
04:42 karolherbst: well on nouveau I get 11MHz lower clocks
04:42 RSpliet: mupuf: for hub07 it is, but the clock doesn't correspond with my expectations from the VBIOS ;-)
04:42 karolherbst: RSpliet: neither does it for me on a and f
04:42 karolherbst: or do you mean the pll config?
04:42 RSpliet: PLL matches nvatiming - neither match the VBIOS clock
04:43 karolherbst: what gives you nvatiming?
04:43 RSpliet: the command?
04:43 karolherbst: the both clocks
04:44 RSpliet:takes a deep breath
04:44 RSpliet: let me try again
04:44 RSpliet: for hub07
04:44 RSpliet: nvatiming reports a clock of 2116MHz
04:45 RSpliet: this corresponds with... nothing
04:46 karolherbst: 0: freq 2117 MHz unkn[0] 1a unkn[1] 1 voltage 4
04:46 karolherbst: ?
04:46 RSpliet: that's not hub07
04:46 karolherbst: ahh okay, it's hub6?
04:46 RSpliet: hub07 according to the VBIOS should be 2244
04:46 karolherbst: ohh okay, now I see it
04:46 RSpliet: following the coeffs set in nouveau: (810 * 67) / 27 = 2010
04:47 RSpliet: ah here we go
04:47 RSpliet: that's the ROP freq
04:47 RSpliet: okay, still good
04:47 RSpliet: hub07 is the clock afterwards
04:48 karolherbst: for me hub7: 2: domain 1 percent 105 min 810 max 906 and hub6: 966
04:48 karolherbst: and the second is lower on nouveau
04:48 RSpliet: nouveau sets it to 2116MHz
04:48 RSpliet: what coefficients are in my trace - ugh, could you just send me my trace please?
04:49 karolherbst: okay, domain 4 is hub6 and domain 1 is hub7?
04:49 RSpliet: idk, I don't give a damn about the boost table yet
04:50 karolherbst: k
04:50 karolherbst: well it somewhat fits for me
04:51 karolherbst: in 10 minutes I will be gone for like an hour or so
04:52 RSpliet: right
04:52 RSpliet: NVIDIA actually sets that clock to 2244MHz
04:52 RSpliet: HUB07 that is
04:53 karolherbst: yeah, that fits
04:53 karolherbst: hub6 too?
04:53 RSpliet: see the coeffs in 137040 -> { M = 0x1a | N = 0x48 | P = 0x1 }
04:53 RSpliet: nouveau currently sets it to 2116MHz, which I'd say is wrong
04:53 karolherbst: yeah, but do you think this affects anything that bad?
04:55 karolherbst: ohh which version of nouveau are you using currently?
04:55 RSpliet: then once the PLL is set to 1800MHz
04:55 RSpliet: which is not in the pstates
04:55 RSpliet: or actually 1803MHz
04:58 mupuf: karolherbst: there are weird relations between the clocks
04:58 mupuf: not all the clocks are independent
04:58 karolherbst: okay
04:58 karolherbst: okay
04:58 mupuf: being off by just a bit is often enough
04:58 karolherbst: yeah then I will play around with this a bit
04:58 mupuf: for example, on tesla, shader clock had to be 2x the core clock or more
04:59 karolherbst: so just messing with boost table percentage until it crashes :)
04:59 mupuf: but NEVER under that
04:59 mupuf: and indeed, sometimes, we would compute a shader clock that would be lower than 2x core and ... CRASH!
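mupuf's Tesla rule can be expressed as a guard that clock calculation could apply before programming the PLLs. This is an illustrative sketch only: the 2x ratio is the Tesla constraint described above, nothing here claims an equivalent constraint exists on Kepler, and whether a raised shader clock is actually reachable depends on the PLL limits. The function names and example clocks are hypothetical.

```python
def tesla_clocks_ok(core_mhz: float, shader_mhz: float) -> bool:
    """Tesla constraint from the log: shader clock must be >= 2x the core clock."""
    return shader_mhz >= 2 * core_mhz


def clamp_shader_clock(core_mhz: float, shader_mhz: float) -> float:
    """Raise a computed shader clock to 2x core instead of ever going below it."""
    return max(shader_mhz, 2 * core_mhz)


# Hypothetical numbers: a computed shader clock just under the limit.
core, shader = 600.0, 1190.0
print(tesla_clocks_ok(core, shader))     # False
print(clamp_shader_clock(core, shader))  # 1200.0
```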
04:59 karolherbst: k, then it makes sense
04:59 karolherbst: I am only off by 11MHz with one clock
04:59 karolherbst: so maybe that's why I am fine
04:59 karolherbst: but RSpliet is off by 135MHz?
05:00 mupuf: karolherbst: as much as possible, we should try to generate exactly the same clock as the blob
05:00 mupuf: as in, same coefficients everywhere
05:01 karolherbst: k
05:01 karolherbst: sounds like a thing to do
05:03 karolherbst: RSpliet: so the pll at 137020 is too low? Or others (too)?
05:06 RSpliet: 137040
05:07 RSpliet: (aka HUB07)
05:08 RSpliet: let's see if manually increasing it fixes my problems
05:08 RSpliet: the same is probably true for HUB06, but that does not have a dedicated PLL
05:15 RSpliet: yeah, HUB06 (which uses 1370e0) is wrong too
05:16 RSpliet: also, nouveau is not very tidy with disabling lock detection after it's been detected
05:17 RSpliet: that said, manually increasing the two clocks makes no difference, rendering still hangs after one or two frames
05:20 RSpliet: in fact, it doesn't even touch the PLL lock scanning logic prior to testing it
05:21 RSpliet: which could lead to spurious timeouts if it's been disabled
05:21 RSpliet: I'll see if I can patch that up tonight...
05:22 RSpliet: (then again... time, gheh)
06:38 RSpliet: mwk: on G84 you write that the G80 cycles through partitions on every gob (256 bytes), I take it that's so that contiguous memory accesses are evenly distributed over partitions. Is this 256B gob-size and per-gob cycling still happening on Kepler, or did the parameters change?
06:41 RSpliet: ugh, sorry, I should be a bit more precise before bashing enter: In the hwdocs you write that the G84 cycles[...]
06:57 RSpliet: karolherbst: what could 0x1548 be, besides a trigger on bit 31, a selector on the lowest byte, and a 1 in bit 8 ?
06:58 karolherbst: mhh to what is this value related?
07:00 RSpliet: it's touched several times interleaved with changing PLLs
07:00 RSpliet: my first arsepull guess would be to train logic designed to reduce errors in clock-crossing domains
07:00 karolherbst: ohh you mean the iommu reg x1548?
07:00 RSpliet: yes
07:01 RSpliet: but it could just as well just temporarily pause parts of PBUS while changing clocks
07:01 karolherbst: mhh
07:01 karolherbst: it is touched after reclocking each domain for sure
07:02 RSpliet: or during
07:02 karolherbst: mhh
07:02 karolherbst: bit 31 seems to be COMMIT
07:02 RSpliet: sure? look at how it sets 0x80000102 before submitting and executing a memory reclocking script
07:03 RSpliet: and sets it back to 102 after
07:03 RSpliet: so could equally well be an enable bit
07:03 karolherbst: mhh I only see 101, then 80000101 writes and stuff
07:03 karolherbst: the value is committed after the first write for sure
07:03 RSpliet: in my trace there's 101, 102 and 104
07:03 karolherbst: and that bit clears itself
07:03 karolherbst: yeah in mine too
07:04 karolherbst: but it is committed directly after
07:04 karolherbst: 0x101 then 0x80000101
07:04 RSpliet: no, it's not self-clearing, the reg is never read by NVIDIA
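The fields RSpliet lists for 0x001548 (bit 31, bit 8, low byte) can be split out mechanically to compare trace values. A hedged sketch: the field names are placeholders, because whether bit 31 is a self-clearing commit or an enable bit is exactly what is being argued here, and the low byte is only known to take the values 0x01, 0x02 and 0x04 in these traces. Reg1548 and decode_1548 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reg1548:
    bit31: bool    # trigger/commit or enable: disputed in this log
    bit8: bool     # set in every value seen in the traces
    selector: int  # bits 7:0, seen as 0x01, 0x02, 0x04


def decode_1548(value: int) -> Reg1548:
    """Split a 0x001548 write into the fields discussed above."""
    return Reg1548(
        bit31=bool((value >> 31) & 1),
        bit8=bool((value >> 8) & 1),
        selector=value & 0xFF,
    )


# Values that appear in the traces discussed above.
for v in (0x101, 0x80000101, 0x102, 0x80000102, 0x104):
    print(hex(v), decode_1548(v))
```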
07:04 mwk: RSpliet: gob is 0x200 bytes since Fermi
07:05 mwk: and the cycling algorithm is *very* different
07:05 karolherbst: RSpliet: ohh you are right :O
07:05 mwk: I have no idea how it works
07:05 karolherbst: let me play with that reg on my nvidia a bit
07:05 RSpliet: mwk: okay, but it does cycle roughly at n*gob granularity for small n?
07:06 mwk: oh, it cycles exactly at gob granularity
07:06 mwk: but which partition is selected at every turn is... complicated
07:06 RSpliet: for me that explanation is good enough, thanks :-)
07:06 mwk: it's not simple round-robin like on G80, that's for sure
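To make mwk's contrast concrete, the simple round-robin scheme described for G80/G84 hands each 256-byte gob to the next partition in turn. The sketch below models only that old scheme; per mwk, Fermi and later use 0x200-byte gobs with a different, undocumented partition selection, so this does not describe Kepler. The partition count in the example is hypothetical.

```python
G80_GOB_BYTES = 0x100  # 256-byte gob, G80/G84 era


def g80_partition(addr: int, num_partitions: int, gob_bytes: int = G80_GOB_BYTES) -> int:
    """Round-robin: consecutive gobs land on consecutive memory partitions."""
    return (addr // gob_bytes) % num_partitions


# Eight consecutive gobs on a hypothetical 4-partition board.
print([g80_partition(a, 4) for a in range(0, 8 * G80_GOB_BYTES, G80_GOB_BYTES)])
# [0, 1, 2, 3, 0, 1, 2, 3]
```

Contiguous accesses therefore spread evenly over partitions, which is the purpose RSpliet guesses at.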
07:06 karolherbst: RSpliet: mhhh, it doesn't seem to affect the gpu that much while running
07:11 RSpliet: doesn't mean it's not important for a stable reclock :-)
07:12 RSpliet: gnurou: is there anything you know about register 0x001548 and its relation to clock changes?
07:15 karolherbst: RSpliet: I think I will follow mupuf's lead for now and try to set the right clocks first
07:16 RSpliet: karolherbst: that's a good start :-)
07:17 RSpliet: domains 1 and 4 might correspond with HUB06 and HUB07 indeed (in arbitrary order), that might be something you can verify by monitoring the coefficients NVIDIA picks for 0x13704x and 0x1370Ex
07:18 karolherbst: ohh let me check something
07:18 karolherbst: I think I know what is wrong
07:19 RSpliet: your theory?
07:19 karolherbst: k
07:19 karolherbst: now I get nvidias clocks
07:20 RSpliet: (sorry, that wasn't cynical, I'm curious what your theory is)
07:22 karolherbst1: yeah well
07:22 karolherbst1: unloading nouveau crashes my kernel sometimes....
07:22 karolherbst1: I should look into this
07:23 RSpliet: karolherbst1: in fact, those domain numbers appear to correspond with the clocks as listed in order in the pstate table. So 1 -> hub07 (fecs?), 2 -> rop, 4 -> hub06 (tpc?)
07:23 mupuf: RSpliet: yep, that was my observation too
07:23 karolherbst: RSpliet: wait, I explain in a second
07:24 karolherbst: RSpliet: you remember I told you, your vbios is one of the odd ones, without cstates?
07:24 RSpliet: yes
07:24 karolherbst: I would expect nvidia using 2244 for two hubs
07:24 karolherbst: and 2010 for something else?
07:24 RSpliet: it does
07:24 karolherbst: but with nouveau you get 2117 and 2244?
07:24 RSpliet: with nouveau I get 2117 twice
07:25 karolherbst: k..
07:25 woshty: My 105M seems to run hotter with nouveau. How to check if pm is fully enabled? How to view current voltage/freq settings?
07:25 karolherbst: then I'll show you
07:25 karolherbst: https://gist.github.com/karolherbst/1920c9a69c6eed5bb6fd
07:25 RSpliet: did you limit them based on the boost table header value, rather than the independent domain values?
07:26 karolherbst: I think nouveau uses the pstate.base clock on your chip
07:26 karolherbst: It could be that I messed that part up, right
07:26 karolherbst: let me check
07:26 mupuf: woshty: PM is manual on nouveau. You can check information about the voltage using "$ sensors"
07:26 mupuf: and frequencies, you need to access debugfs, look for the 'pstate' file
07:27 karolherbst: RSpliet: do you want to check if this check is triggered for you? https://github.com/karolherbst/nouveau/commit/49f0f6823b807b9fbb8863958b68d83a670bd753
07:27 karolherbst: RSpliet: or if the creation of cstates is failing for some odd reasons
07:27 karolherbst: you should get one in each pstate, but I believe you have none
07:28 karolherbst: RSpliet: in nvkm_cstate_new, the "list_add(&cstate->head, &pstate->list);" call should be reached and nvkm_cstate_new be called for each pstate once
07:29 karolherbst: this function sets the domains to the right clock depending on the percentages in the boost table
07:30 karolherbst: RSpliet: or maybe nouveau doesn't parse the cstep table right, because yours is just different than all the others
07:30 RSpliet: karolherbst: unadjusted, the clocks for HUB06 and HUB07 should be 2244
07:30 RSpliet: because that's what's listed in the PSTATES
07:31 karolherbst: nvidia doesn't care what is listed in the pstates on my gpu
07:31 RSpliet: nouveau does, right?
07:31 karolherbst: pstate.base
07:31 karolherbst: but
07:31 karolherbst: pstate.base is only used as a fallback
07:31 karolherbst: when there is no cstate
07:32 RSpliet: that function is wrong in line 218 (223 after patching)
07:32 karolherbst: also
07:32 RSpliet: it adjusts the clock to cstepX.freq, it should get the domains subentry and take the clock from that?
07:33 karolherbst: no
07:33 karolherbst: it goes like this
07:33 karolherbst: parse the pstate table
07:33 karolherbst: put the base clock into pstate->base
07:33 karolherbst: each new cstate is a copy of pstate->base, with the clocks from the cstep table
07:34 karolherbst: then the clocks get adjusted through the boost table
07:34 karolherbst: that's what nouveau is doing
07:34 RSpliet: but presumably that CSTEP table is only valid for the shader (GPC) clock
07:34 karolherbst: there are no domain subentries (except in the pstate table)
07:34 karolherbst: RSpliet: it doesn't work on kepler gpus that way anymore
07:35 karolherbst: usually there are like 40+ cstates
07:35 karolherbst: and there is no table for each domain for each of them
07:35 karolherbst: you take the cstep clock, adjust through boost table and there you get your domain clocks
07:35 RSpliet: okay, so with that logic
07:35 karolherbst: some clocks stay the same in each cstate though
07:35 karolherbst: video engines for example
07:36 RSpliet: the base clk for HUB07 should be 2117 (CSTEP value) * 1.06 (percentage in BOOST table) = 2244
07:36 karolherbst: so only domain 1,2 and 4 are changed
07:36 karolherbst: yeah
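The flow karolherbst describes (copy pstate->base, take the CSTEP clock, scale the core domains by their BOOST table percentages) can be sketched as follows. This is a simplified model, not nouveau's nvkm_cstate_new: the BoostDomain type, the integer rounding, the 106% for HUB06 (inferred from both hubs being 2244 MHz) and the optional min/max clamp (modelled on the "min 810 max 906" fields in the boost entry quoted earlier) are assumptions. Clocks are keyed by domain number (0 gpc, 1 hubk07, 2 rop, 4 hubk06), not by pstate table row.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class BoostDomain:
    percent: int                   # BOOST table percentage applied to the CSTEP clock
    min_mhz: Optional[int] = None  # assumed optional lower clamp
    max_mhz: Optional[int] = None  # assumed optional upper clamp


def cstate_clocks(cstep_mhz: int, boost: Dict[int, BoostDomain],
                  base: Dict[int, int]) -> Dict[int, int]:
    """Derive per-domain clocks (MHz) for one cstate, keyed by domain number."""
    clocks = dict(base)  # each cstate starts as a copy of pstate->base
    for domain, entry in boost.items():
        freq = cstep_mhz * entry.percent // 100
        if entry.min_mhz is not None:
            freq = max(freq, entry.min_mhz)
        if entry.max_mhz is not None:
            freq = min(freq, entry.max_mhz)
        clocks[domain] = freq
    return clocks


# RSpliet's numbers: CSTEP 2117 MHz, GPC at 100%, HUB07 and HUB06 at 106%.
boost = {0: BoostDomain(100), 1: BoostDomain(106), 4: BoostDomain(106)}
print(cstate_clocks(2117, boost, {}))  # {0: 2117, 1: 2244, 4: 2244}
```

Domains without a boost entry (video engines, for example) keep whatever pstate->base held, matching karolherbst's remark that some clocks stay the same in every cstate.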
07:36 karolherbst: this is already done in nouveaus code
07:36 karolherbst: but
07:36 karolherbst: you have no cstates
07:36 karolherbst: either the creation fails or something else is odd
07:37 karolherbst: you have three in the vbios though
07:37 RSpliet: okay, but how do you explain that the base clocks for HUB06 and HUB07 are *incorrectly* set without factoring in the 1.06 ratio read from the boost table?
07:37 karolherbst: RSpliet: can you verify this loop is executed? https://github.com/karolherbst/nouveau/blob/49f0f6823b807b9fbb8863958b68d83a670bd753/drm/nouveau/nvkm/subdev/clk/base.c#L438-L444
07:38 karolherbst: RSpliet: I would guess an index issue somewhere in the code
07:39 karolherbst: RSpliet: maybe you should print out all the domain clocks in nouveau and see what is set to what
07:41 karolherbst: ohh only NVKM_CLK_DOM_FLAG_CORE are adjusted
07:41 karolherbst: which is gpc, hubk07, rop and hubk06
07:42 karolherbst: gpc is 0x0, hubk07 is 0x1, rop is 0x2 and hubk06 is 0x4
07:42 karolherbst: as the domain indices
07:43 RSpliet: yes that's normal
07:43 RSpliet: what is domain->bios
07:43 karolherbst: RSpliet: just to clarify it, with nouveau 0x1 and 0x4 are set to 2117 MHz?
07:43 karolherbst: and on nvidia to 2244
07:44 karolherbst: hubk7 and hubk6
07:44 RSpliet: yes, didn't double-check domain 2
07:44 RSpliet: but to come back to my question, what is domain->bios?
07:44 karolherbst: and 0x2 is 2010?
07:45 karolherbst: ohhh
07:45 karolherbst: maybe the bios offset?
07:45 karolherbst: let me check
07:45 RSpliet: nvkm_clk_adjust expects the domain number
07:46 karolherbst: ohh
07:46 karolherbst: domain->bios is the table index
07:47 karolherbst: *row index
07:47 RSpliet: in the PSTATE table yes, in the BOOST TABLE it's the domain number
07:48 karolherbst: nvbios_perfSp is for the pstate table
07:48 karolherbst: so yeah
07:48 karolherbst: mhh
07:48 RSpliet: where does clk->base_khz come from?
07:48 karolherbst: maybe the clocks aren't adjusted at all?
07:48 karolherbst: baseclocks
07:48 karolherbst: ohh wait
07:48 karolherbst: yeah
07:49 karolherbst: BASE CLOCK table
07:49 karolherbst: but yours is empty
07:49 karolherbst: ohhhh
07:50 karolherbst: I think I messed up
07:50 RSpliet: okay, but I take it NVKM_CLK_DOM_FLAG_BASECLK is never set, so that check must be invalid
07:50 RSpliet: and skipped
07:50 karolherbst: https://github.com/karolherbst/nouveau/blob/49f0f6823b807b9fbb8863958b68d83a670bd753/drm/nouveau/nvkm/subdev/clk/base.c#L647
07:50 karolherbst: NVKM_CLK_DOM_FLAG_BASECLK is for the gpc
07:50 karolherbst: the cstep freq
07:50 karolherbst: yeah I think I need to add some checks here
07:51 karolherbst: RSpliet: can you load nouveau with config=NvBoost=2 ?
07:51 karolherbst: and check if the clocks are better?
07:51 karolherbst: but that still doesn't explain why the clocks aren't adjusted
07:52 RSpliet:apologises to the rest of the chan for a rather verbose debugging session
07:52 karolherbst: as long as it is nouveau related nobody is allowed to mind anyway :p
07:52 karolherbst: though we could add a nouveau-debugging channel for this kind of stuff :D
07:53 RSpliet: yes, with NvBoost=2 the clocks are better
07:53 karolherbst: awesome :)
07:53 karolherbst: then it's me who messed this part up
07:53 karolherbst: still unstable?
07:54 RSpliet: yep
07:54 karolherbst: k
07:54 RSpliet: 1481a were my coefficients, that corresponds with the blob
07:55 karolherbst: dmesg contains the usual garbage I guess?
07:56 karolherbst: RSpliet: k, we could do one thing. We only reclock memory
07:57 karolherbst: just to rule out that this is an issue
07:58 karolherbst: RSpliet: in nvkm_pstate_prog, don't call return nvkm_cstate_prog(clk, pstate, -1);
07:58 karolherbst: this should then only reclock memory
07:58 karolherbst: now I wish somewhat that memory is reclocked in a weird way
08:06 RSpliet: or I grab my old kernel and only reclock memory - the voltage issue blocks other engines from changing their clock
08:07 karolherbst: ohh with the old one only memory was reclocked anyway?
08:07 karolherbst: ohh yeah
08:07 karolherbst: that was what you said
08:08 RSpliet: that works fine
08:16 karolherbst: mhh
08:17 karolherbst: the last thing that comes to mind is decreasing the cstate until it gets stable, or just comparing PCLOCK between nouveau and nvidia and checking what is different
08:17 RSpliet: yeah, I assumed I could do that by reducing the BOOST entry
08:17 karolherbst: ohh no
08:17 karolherbst: cstep
08:18 RSpliet: sorry, that's what I meant
08:18 karolherbst: k
08:18 RSpliet: and I did, quite strongly - to 1605MHz
08:18 RSpliet: but to no avail
08:18 karolherbst: mhh
08:18 karolherbst: did 1080 work on f?
08:19 RSpliet: is that because I set NvBoost=2 ?
08:19 karolherbst: not that it is a memory issue which only gets triggered by enough load
08:19 karolherbst: RSpliet: there is an issue with the baseclocks in my branch I didn't take care of yet
08:19 karolherbst: it seems to fail cstate creation whenever there are no base clocks
08:20 karolherbst: because any clock is higher than 0
08:20 karolherbst: usually
08:20 RSpliet: sure... what do I alter in the VBIOS to reduce my base clocks on kepler? the pstate? the cstep table?
08:21 karolherbst: base clocks only limit available cstates
08:21 karolherbst: they don't change clocks
08:21 RSpliet: (btw, that unk[0], is that the value they use for M by any chance in the PLL?)
08:21 karolherbst: that would be weird
08:22 RSpliet: just random correspondence I pick up staring at it, we can look at that later
08:22 karolherbst: RSpliet: this is my cstep table: https://gist.github.com/karolherbst/ff21bd90f76251dc7870
08:23 RSpliet: right
08:23 karolherbst: and if the base clock says 1464, then 26 is the cstate which is the normal max clock for that gpu on max load without boosting
08:23 karolherbst: if you enable boosting, the driver selects cstates above it
08:23 karolherbst: but the base clock alone doesn't change the available clocks
08:23 karolherbst: it only limits
08:23 karolherbst: as everything boosting related limits something
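Two points from this exchange, that base clocks only limit which cstates are usable and that an empty BASE CLOCK table made cstate creation fail because every clock is higher than 0, can be illustrated with a small filter. A hedged sketch: treating 0 as "no base clock" and the allow_boost flag (a stand-in for something like config=NvBoost=2) are assumptions, not a description of the actual checks in karolherbst's branch. The clock lists are illustrative.

```python
from typing import List


def naive_usable_cstates(cstep_mhz: List[int], base_clock_mhz: int) -> List[int]:
    """Buggy filter: with an empty table (base 0) every clock is higher, so none survive."""
    return [i for i, f in enumerate(cstep_mhz) if f <= base_clock_mhz]


def usable_cstates(cstep_mhz: List[int], base_clock_mhz: int, allow_boost: bool) -> List[int]:
    """Base clocks only limit the cstate list; they never change a cstate's clocks."""
    if allow_boost or base_clock_mhz == 0:  # 0 models "no BASE CLOCK entry"
        return list(range(len(cstep_mhz)))
    return [i for i, f in enumerate(cstep_mhz) if f <= base_clock_mhz]


no_base = [1605, 1861, 2117]  # illustrative CSTEP clocks, no base clock in the VBIOS
print(naive_usable_cstates(no_base, 0))                 # []
print(usable_cstates(no_base, 0, False))                # [0, 1, 2]
print(usable_cstates([1000, 1464, 1600], 1464, False))  # [0, 1]
```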
08:24 RSpliet: okay, so again: what do I alter in the VBIOS to reduce the clocks it sets on kepler?
08:25 karolherbst: in your case, the CSTEP table entries
08:25 RSpliet: I altered the higest entry - from 2117 to 1605, and that didn't change the clocks nouveau set
08:26 karolherbst: yeah, because I messed up the baseclock stuff
08:26 karolherbst: it should change it with config=NvBoost=2 set
08:26 RSpliet: oh right
08:30 karolherbst: I've updated my branch by the way: 1. P=1 fixed for core PLLs, 2. fixed that baseclock issue (hopefully)
08:31 karolherbst: uhhh
08:31 karolherbst: nouveau needs to disable interrupts more quickly on unload
08:32 karolherbst: my crash was inside gk104_fifo_intr+0x71a
08:32 karolherbst: while waking up a kworker
08:32 karolherbst: wake_up(&fifo->runlist[runl].wait);
08:33 RSpliet: hmm no, even with NvBoost=2 it doesn't change the clocks
08:33 karolherbst: weird
08:33 RSpliet: and note to whoever: 4188ac is not a register
08:33 karolherbst: but the plls are still right?
08:33 RSpliet: yes
08:34 RSpliet: come to think of it, maybe I'm failing at faking my VBIOS, but how hard can it be
08:34 mupuf: RSpliet: show us how you do it?
08:34 mupuf: and remember, unload nvidia
08:34 RSpliet: mupuf: there is no nvidia
08:34 karolherbst: mupuf: he does it for nouveau
08:34 mupuf: well, reload the module then :D
08:35 mupuf: mouahahah, captain obvious :p
08:35 RSpliet: but nvafakebios -e 6998:06 /home/rs855/gtx650.rom; insmod <path>/nouveau.ko config=NvBoost=2
08:36 karolherbst: well it sure worked with the blob for me
08:36 karolherbst: RSpliet: you could also mess with the pstate table...
08:36 RSpliet: /sys/kernel/debug/dri/0/vbios.rom lists the VBIOS with the original value
08:37 karolherbst: I never faked the vbios for nouveau though
08:37 karolherbst: ohhhh wait
08:37 karolherbst: RSpliet: I've got a brilliant idea for you
08:37 karolherbst: config=NvBios
08:37 RSpliet: yeah
08:37 RSpliet: just what I was thinking
09:02 RSpliet: karolherbst: with shader reduced to 1605MHz it runs fine
09:02 RSpliet: (other domains scaled proportionally)
09:04 RSpliet: 1861MHz still works
09:06 RSpliet: at 2GHz it crashed after a small number of frames
09:09 karolherbst: mhh okay
09:10 karolherbst: RSpliet: does something change if you reclock to a first?
09:10 RSpliet: no
09:16 karolherbst: mhh
09:17 karolherbst: any better idea than comparing PCLOCK regs?
09:19 RSpliet: understanding why after an NVIDIA reclock, 131c04 is 11201, whereas on a nouveau reclock it's 1001
09:20 karolherbst: 131c04 or 137c04?
09:20 RSpliet: 131c04
09:20 RSpliet: mind you, nvidia doesn't write to these regs
09:21 RSpliet: they get updated... possibly by a training sequence
09:21 karolherbst: maybe
09:29 karolherbst: RSpliet: there are some unknown bits in 0x137140, 0x1371d0 and 0x137250
09:29 karolherbst: W 0x137140 0x81200202
09:29 karolherbst: does nouveau also sets the reg to this?
09:29 RSpliet: that's not unknown, that's a divide-by-four on PLL-bypass
09:30 karolherbst: the 0x1200000 part?
09:31 RSpliet: same format as all the other div regs
09:31 karolherbst: mhh okay, demmio wasn't displaying it
09:32 RSpliet: off the top of your head, baseclock for the memory clocks was about 62.3MHz?
09:32 RSpliet: PLLs bring it up mountains?
09:32 karolherbst: you mean the refclock?
09:32 karolherbst: it's between 150-250MHz roughly for high clocks
09:34 RSpliet: the input of 13202x
09:38 karolherbst: yeah, 150-250MHz
09:39 karolherbst: the high PLL only multiplies
09:40 karolherbst: RSpliet: the lower one has M=1, 5<=P<=7, 0x25<=N<=0x2b
09:40 karolherbst: any combination of that possible
09:40 karolherbst: and the input of it is the crystal
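The coefficient ranges karolherbst gives for the lower (refclock) PLL can be enumerated to see which refclocks are reachable. This sketch assumes f = 27 MHz * N / (M * P) with P as a plain divider, the same arithmetic used for 0x132024 further down. Note that NVIDIA's trace also shows P = 8, so 5..7 is the range karolherbst has observed, not a confirmed hardware limit. low_pll_outputs is a hypothetical helper.

```python
from itertools import product
from typing import List, Tuple

CRYSTAL_KHZ = 27_000  # 27 MHz crystal feeding the lower memory PLL


def low_pll_outputs() -> List[Tuple[int, int, float]]:
    """All (N, P, f_kHz) with M = 1, 0x25 <= N <= 0x2b and 5 <= P <= 7."""
    return [(n, p, CRYSTAL_KHZ * n / p)
            for n, p in product(range(0x25, 0x2C), range(5, 8))]


outs = low_pll_outputs()
f_min = min(f for _, _, f in outs)
f_max = max(f for _, _, f in outs)
print(f"{len(outs)} combinations, {f_min / 1000:.1f} to {f_max / 1000:.1f} MHz")
# 21 combinations, 142.7 to 232.2 MHz
```

That range is in the same ballpark as the 150-250 MHz refclock karolherbst quotes for high clocks.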
09:43 RSpliet: karolherbst: is it supposed to multiply by 18?
09:43 RSpliet: 00019c: R[0x132004] := 0x00011201 # PMPLL.MCLK0_COEF
09:44 RSpliet: I want to find out whether NVIDIA ever sets the memclk to be exactly 2GHz
09:44 karolherbst: I thought say so
09:44 karolherbst: you mean 2.5GHz?
09:44 RSpliet: no
09:44 RSpliet: 2
09:44 karolherbst: why 2?
09:44 karolherbst: and no, it doesn't
09:44 RSpliet: how do I know for sure?
09:45 karolherbst: because I looked into that
09:45 RSpliet: on my board?
09:45 karolherbst: my gpu has a 2/4 GHz clock
09:45 karolherbst: and nvidia is quite sloppy setting the clocks
09:45 RSpliet: okay, so something very close to 2GHz
09:45 karolherbst: yeah
09:45 karolherbst: usually -+10MHz
09:46 RSpliet: that's fine by me
09:46 karolherbst: k
09:46 karolherbst: anyway, the PLL at 132000 only multiplies as you might have noticed
09:46 RSpliet: yes, I noticed
09:47 karolherbst: but fN can be adjusted without issues on the pll at 132020, nvidia just doesn't do it well
09:48 karolherbst: one thing I wonder about: where does the 810MHz refclock come from?
09:48 RSpliet: gk20a code has a table to translate p to an actual divider; are these the same PLLs used on other kepler boards?
09:48 karolherbst: I am pretty sure that the refclock PLL was the bigger issue with the memory stuff
09:49 karolherbst: don't think so, let me check
09:49 karolherbst: that thing looks strange
09:49 RSpliet: yes, that's why I'm worried
09:50 karolherbst: RSpliet: did you check nvatiming with NvBoost=2 ?
09:50 RSpliet: the memclk is not in there, I'll do that some other time
09:50 karolherbst: I meant for the core
09:50 RSpliet: I know
09:50 RSpliet: we never set P to 8 I presume
09:50 RSpliet: for the core
09:50 karolherbst: but the cores are differently configured than the memory
09:51 karolherbst: well
09:51 karolherbst: by accident we don't
09:51 karolherbst: but it could happen
09:51 karolherbst: or do you mean for the memory refclock?
09:51 RSpliet: but P=8 occurs in NVIDIAs trace for mem refclk
09:51 karolherbst: ahh well, never happened on my gpu
09:52 karolherbst: then P=8 is also good
09:52 karolherbst: https://github.com/karolherbst/nouveau/blob/master_4.4/drm/nouveau/nvkm/subdev/fb/ramgk104.c#L980
09:54 karolherbst: ohhh
09:54 karolherbst: that's P what is 8?
09:54 karolherbst: mhhh
09:54 karolherbst: I meant N
09:54 karolherbst: PMPLL.MCLK0_COEF => { M = 0x1 | N = 0x8 | P = 0x1 }
09:54 karolherbst: from demmio
09:55 karolherbst: yeah well, that seems to be the low clock
09:55 karolherbst: there we only have one PLL to begin with
09:55 karolherbst: then one pll may contain garbage but it doesn't matter
09:56 karolherbst: ohh wait, down below there is also PMPLL.MCLK1_COEF => { M = 0x1 | N = 0x28 | P = 0x8 }
09:56 karolherbst: okay
09:57 karolherbst: 3.915 GHz
09:58 karolherbst: ohhh
09:58 karolherbst: wrong value
09:58 karolherbst: 2.43GHz
09:58 karolherbst: yeah, that's the value
09:58 karolherbst: *2 and you got your 5GHz nearly
09:58 karolherbst: I ignored fN
09:59 karolherbst: let me see if I get P=8 on my gpu
10:01 RSpliet: 00013c: R[0x132030] := 0x14921007 # 0x132030
10:01 RSpliet: 00014c: R[0x132024] := 0x00082801 # PMPLL.MCLK1_COEF
10:01 RSpliet: 00019c: R[0x132004] := 0x00011201 # PMPLL.MCLK0_COEF
10:01 RSpliet: those add up to a very large clock
10:01 karolherbst: 4.860Ghz?
10:02 karolherbst: or 2.43GHz depending on how you want it
10:02 RSpliet: how? N is 12 there, not 8
10:03 karolherbst: (27000 * 0x28 / 0x8) * 0x12
10:04 karolherbst: fN being 0x1492 odd value
10:04 karolherbst: ohh wait
10:04 RSpliet: ah yes, I ignored the div value being changed, so I had an invalid refclk
10:04 RSpliet: makes sense
10:04 karolherbst: yeah well, fN doesn't matter anyway
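The memory clock arithmetic from the last few lines can be reproduced by splitting the coefficient registers into fields. A sketch under stated assumptions: the field layout (M in bits 7:0, N in bits 15:8, P in bits 23:16) is inferred from how demmio decodes the values in this log, P is treated as a plain divider, and fN from 0x132030 is ignored, as karolherbst does. decode_coef and mclk_khz are hypothetical helpers.

```python
CRYSTAL_KHZ = 27_000  # 27 MHz crystal, as in karolherbst's formula


def decode_coef(value: int) -> dict:
    """Split a PLL coefficient register into M, N, P (assumed layout)."""
    return {"M": value & 0xFF, "N": (value >> 8) & 0xFF, "P": (value >> 16) & 0xFF}


def mclk_khz(mclk1_coef: int, mclk0_coef: int) -> float:
    """Two-stage memory clock, ignoring fN."""
    c1 = decode_coef(mclk1_coef)  # 0x132024: refclk PLL, fed by the crystal
    c0 = decode_coef(mclk0_coef)  # 0x132004: only multiplies here (M = P = 1)
    refclk = CRYSTAL_KHZ * c1["N"] / (c1["M"] * c1["P"])
    return refclk * c0["N"] / (c0["M"] * c0["P"])


print(decode_coef(0x00082801))           # {'M': 1, 'N': 40, 'P': 8}
print(mclk_khz(0x00082801, 0x00011201))  # 2430000.0
```

This is (27000 * 0x28 / 0x8) * 0x12 = 2.43 GHz; doubling it gives the roughly 4.86 GHz figure mentioned above.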
10:05 karolherbst: but I don't get P=8 on my gpu
10:05 karolherbst: which clock does nvidia-settings reports on highest pstate?
10:05 RSpliet: idk
10:06 karolherbst: I thought when I put in your clock I should get the same...
10:07 karolherbst: I will try something
10:08 karolherbst: k, P=8 seems to work on my gpu, but I don't think there are many technical limits anyway
10:15 RSpliet: right, appears there's quite a lot left to RE in this area
10:18 RSpliet: oh, sorry for not explaining btw: 2GHz comes from the training table; I recall Ben telling me he never saw a reclock to 2GHz, but we've learned more about the PLLs since
10:18 karolherbst: well at least it seems that our main issue is rather core related, not memory (though there might be issues we will find, when our core stuff is more stable)
10:18 RSpliet: I am curious whether part of the training routine should be executed at 2GHz
10:19 karolherbst: mem training?
10:20 RSpliet: yes
10:20 karolherbst: well
10:20 karolherbst: the closest I get to 2GHz is 1.998GHz
10:21 RSpliet: it doesn't have to be precisely 2GHz, that's not my worry
10:21 karolherbst: but how will that help us with the gpu you have?
10:21 RSpliet: my worry is that we execute part of the training routines at the maximum speed, or not at all, rather than at 2GHz
10:21 karolherbst: I really doubt it is memory related
10:24 karolherbst: as far as I am concerned the memory situation is quite good currently, only some display stuff missing while reclocking, and it is usually much more stable than reclocking the core
10:25 RSpliet: likely
10:25 karolherbst: you could run pixmark_piano and see if that also locks up the gpu, because with it the memory load is beneath 2%
10:26 RSpliet: I brought the memclk back to 1500MHz and it still hangs
10:26 karolherbst: clock from a pstate?
10:27 karolherbst: k
10:27 karolherbst: because there is a magic _not_use_ area in the memory clock
10:27 karolherbst: clocks near the limit where you go into 2 PLL mode are pretty unusable for whatever reasons
10:28 karolherbst: even with nvidia
10:28 RSpliet: I know
10:29 RSpliet: at a known stable memfreq it still hangs, so it's likely non-mem
10:30 RSpliet: gnurou: as I kind of spammed the backlog, allow me to repeat my earlier question: is there anything you know about register 0x001548 and its relation to clock changes?
10:57 karolherbst: RSpliet: how do you get to 0x001548 by the way?
11:00 karolherbst: mhh 0, 2, 1 is the order of clocking by the way
11:16 karolherbst: RSpliet: I am out of ideas, seriously. At this point I would just compare what is inside the mmio regs and see if I find something useful in there, but even the trace doesn't touch much else, and nouveau seems to be pretty close except for 1548
11:32 karolherbst: RSpliet: you know what? Maybe we should try whether a higher voltage fixes it; maybe there is something that actually lowers the voltage requirement, or something else
11:34 karolherbst: RSpliet: so the idea would be to return info.max in nvkm_volt_map: after the if (vmap) check: https://github.com/karolherbst/nouveau/blob/stable_reclocking_kepler_v2/drm/nouveau/nvkm/subdev/volt/base.c#L117
12:51 karolherbst: mupuf: I think I will also do the power sensors configuration now. I should have done this earlier, but it makes sense. Any preference here, or just max samples to get nice average values?
13:46 buhman: where's that A+ latest-nouveau image thing?
13:49 buhman: ahh https://github.com/hakzsam/archlive-nouveau
13:50 buhman: aren't there builds somewhere?
13:51 pmoreau_: buhman: There are some here: https://nouveau.pmoreau.org/
13:51 buhman: ahh
13:51 buhman: that's the page
13:51 buhman: should be /topic imo ;p
13:51 pmoreau_: Missing the ones from the past week due to valgrind failing to build; need to fix that
13:51 pmoreau_: s/ones/last one
13:53 hakzsam: oh archlive :)
13:54 pmoreau_: ;-)
13:54 pmoreau_: Still alive
13:54 hakzsam: yes
15:06 RSpliet: karolherbst: I'm afraid I don't have a kepler at hand right now, mind giving this one a test and verify nothing changed? https://github.com/RSpliet/kernel-nouveau-nv50-pm/commit/8ce553a53d6e222dd534735dac6a09133953a931
15:07 RSpliet: (apart from the values of the ctrl regs, they now disable lock test after use)
17:33 Nykalt: Does anyone know what the RAMRO entries actually look like?
17:33 Nykalt: and (weird question incoming) what "naughty"-method could I submit to a DMA channel to cause a write on RAMRO?
18:02 mwk: Nykalt: I do
18:03 mwk: https://github.com/envytools/envytools/blob/master/rnndb/fifo/nv1_pfifo.xml#L195
18:04 mwk: on a DMA channel, there's no way to cause a write on RAMRO
18:04 mwk: it's only used for PIO channel
18:04 mwk: channels
18:05 mwk: you're one of the PS3 guys, aren't you?
18:05 Nykalt: oh, I see. thank you. I forgot to check the XML files, heheh
18:06 Nykalt: yes I'm AlexAltea. I had to swap nicknames so that my other bouncer could connect
18:07 mwk: basically, aiming at any invalid method in 0-0xfc range on any subchannel will cause a write
18:07 mwk: and you control the data word, ie. every odd word
18:08 mwk: plus a few address bits
18:08 Nykalt: this was just another attempt at creating a RAMHT entry. I thought about overlapping RAMHT and RAMRO and causing RAMRO to store something useful
18:08 Nykalt: I know you said yesterday something about PGRAPH, but I still don't get how these damn 0x401100+ regs behave
18:08 mwk: hmmm
18:08 Nykalt: but well further testing will make me understand
18:08 mwk: this is actually a reasonable plan
18:08 mwk: you control the second word, which is the interesting part for RAMHT
18:09 mwk: and you can know what the first word is going to be, which is the handle for RAMHT
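mwk's description implies each RAMRO entry is a pair of 32-bit words: an even word describing the rejected method (which the submitter can predict) and an odd word holding the data (which the submitter controls). The sketch below only models that pairing and how the same two words would be read if the slot overlapped a RAMHT entry (handle first, context second). It does not reproduce the bit layout of either word (nv1_pfifo.xml has that), the byte order is an assumption, and all names are hypothetical.

```python
import struct
from typing import Dict, List, Tuple


def split_pairs(dump: bytes, byteorder: str = "<") -> List[Tuple[int, int]]:
    """Split a raw dump into (even_word, odd_word) pairs.

    Assumption: entries are 8 bytes; word 0 = method/address info,
    word 1 = data. `byteorder` is "<" or ">" (struct notation) and
    must be chosen to match how the dump was taken.
    """
    if len(dump) % 8:
        raise ValueError("dump length must be a multiple of 8 bytes")
    words = struct.unpack(f"{byteorder}{len(dump) // 4}I", dump)
    return list(zip(words[0::2], words[1::2]))


def as_ramht(pair: Tuple[int, int]) -> Dict[str, int]:
    """Reinterpret a RAMRO-written pair as a RAMHT (handle, context) entry."""
    first, second = pair
    return {"handle": first, "context": second}


entries = split_pairs(struct.pack("<4I", 0x1234, 0xCAFE, 0x5678, 0xBEEF))
print([as_ramht(e) for e in entries])
```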
18:09 mwk: the only thing you have to do is get a PIO channel
18:09 Nykalt: right, but I don't have PIO channels, and I don't have userland access to 0x800000 - ...
18:10 mwk: you don't?
18:10 mwk: how do you submit commands, then?
18:10 mwk: you should be easily able to switch a channel into PIO mode by flipping a bit in some PFIFO register IIRC
18:10 Nykalt: the driver allocates a buffer in CPU memory, and PFIFO fetches from there
18:10 mwk: yep, but how do you trigger the fetch?
18:11 Nykalt: as long as put != get, it will read it.
18:11 mwk: and how do you access put?
18:11 Nykalt: oh there's the driver info area, right
18:12 Nykalt: DMA control
18:12 Nykalt: sorry
18:12 Nykalt: DMA control, this thing: https://github.com/AlexAltea/nucleus/blob/master/nucleus/gpu/rsx/rsx.h#L23-L30
18:13 Nykalt: (that was written long time ago, i don't trust any of that)
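The put/get rule Nykalt describes is the usual producer/consumer ring: the CPU writes command words, then advances put, and the fetcher consumes while get != put. Advancing put is what kicks the fetch, which is why mwk asks where put lives. A toy model with hypothetical names; wrapping modulo the buffer size is a simplification and says nothing about how the RSX actually handles the end of the buffer.

```python
class ToyPushbuffer:
    """Toy ring: CPU side advances `put`, fetcher side advances `get`."""

    def __init__(self, size_words: int):
        self.buf = [0] * size_words
        self.put = 0
        self.get = 0

    def submit(self, words):
        for w in words:
            nxt = (self.put + 1) % len(self.buf)
            if nxt == self.get:
                raise BufferError("ring full")
            self.buf[self.put] = w  # write the command word first...
            self.put = nxt          # ...then publish it by advancing put

    def fetch_all(self):
        out = []
        while self.get != self.put:  # fetch as long as put != get
            out.append(self.buf[self.get])
            self.get = (self.get + 1) % len(self.buf)
        return out


pb = ToyPushbuffer(8)
pb.submit([0x1, 0x2])
print(pb.fetch_all())  # prints [1, 2]
```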
18:14 mwk: that's the FIFO user area
18:14 mwk: somewhere at 0x800000+ MMIO
18:14 Nykalt: Oh, I see
18:15 mwk: try to determine what MMIO address it maps to
18:15 mwk: there are two options
18:15 mwk: there's 0x800000+ area, which is old-style, each channel is 0x10000 bytes, and it has full PIO capabilities and (IIRC) slightly limited DMA caps
18:16 mwk: and there's 0xc00000+ area, which is new-style, each channel is 0x1000 bytes, and it has DMA capabilities only
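To "determine what MMIO address it maps to", the two layouts can be told apart by base and stride. A minimal sketch; the bases and strides come from mwk's description, while the end bounds are assumptions (the old area is taken to stop where the new one begins, and the new one at the end of a 16 MiB MMIO window). The function name is hypothetical.

```python
OLD_BASE, OLD_STRIDE = 0x800000, 0x10000  # old-style: PIO-capable
NEW_BASE, NEW_STRIDE = 0xC00000, 0x1000   # new-style: DMA only
NEW_END = 0x1000000  # assumption: 16 MiB MMIO window


def classify_user_area(addr: int):
    """Return (area, channel_id, offset_in_channel), or None if outside."""
    if OLD_BASE <= addr < NEW_BASE:
        rel = addr - OLD_BASE
        return ("old/0x800000", rel // OLD_STRIDE, rel % OLD_STRIDE)
    if NEW_BASE <= addr < NEW_END:
        rel = addr - NEW_BASE
        return ("new/0xc00000", rel // NEW_STRIDE, rel % NEW_STRIDE)
    return None


print(classify_user_area(0xC05040))  # prints ('new/0xc00000', 5, 64)
```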
18:16 Nykalt: ok
18:16 mwk: if they use the 0x800000 area, you should be able to just switch the channel to PIO mode, and you're done
18:17 mwk: it becomes the PIO submission area
18:17 mwk: and for 0xc00000... not sure how it behaves if the channel is PIO, maybe you can get a RAMRO error out of it
18:18 mwk: oh, and you'll need to disable the RAMRO interrupt, otherwise the hv will likely kick your ass for causing the error... but that's simple if you can access MMIO
18:18 Nykalt: ok, most likely it will be 0xC00000 as PIO doesn't seem to be used anywhere, but I will check
18:19 Nykalt: NV04_PFIFO_MODE could that be it?
18:19 mwk: yeah, I'm afraid so, but...
18:20 mwk: what page size does PS3 use?
18:20 mwk: yep, NV04_PFIFO_MODE is the PIO/DMA switch
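Flipping a channel between PIO and DMA through NV04_PFIFO_MODE amounts to a single-bit read-modify-write. The sketch only computes the new register value and does not touch hardware. It assumes one bit per channel, indexed by channel ID, with a set bit meaning DMA; that polarity is an assumption to verify against the rnndb definition. Function names are hypothetical. Per mwk, the RAMRO interrupt would also need to be disabled before provoking an error.

```python
def set_channel_pio(mode_reg: int, chid: int) -> int:
    """Return NV04_PFIFO_MODE with channel `chid` switched to PIO.

    Assumption: bit `chid` set = DMA mode, clear = PIO mode.
    """
    return mode_reg & ~(1 << chid)


def set_channel_dma(mode_reg: int, chid: int) -> int:
    """Return NV04_PFIFO_MODE with channel `chid` switched to DMA."""
    return mode_reg | (1 << chid)


print(hex(set_channel_pio(0xF, 2)))  # prints 0xb
```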
18:20 Nykalt: 4 KB, 64 KB, 1 MB
18:21 mwk: 4kB, eh
18:21 mwk: I kind of hoped they had a larger page size that'd force them to use the old area for proper protection
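mwk's remark about page size comes down to whether one page can cover exactly one channel's user area. Assuming, for illustration, that the hypervisor isolates channels purely through page mappings, a channel area is mappable on its own only if a page never reaches into a neighbouring channel's area. The function name is hypothetical; the strides and page sizes are the ones from the log.

```python
def page_isolates_channel(stride: int, page_size: int) -> bool:
    """True if pages of `page_size` can map one channel area without
    exposing a neighbouring channel (areas assumed page-aligned)."""
    return page_size <= stride and stride % page_size == 0


for page in (4 << 10, 64 << 10, 1 << 20):  # 4 KB, 64 KB, 1 MB
    print(page, page_isolates_channel(0x1000, page),
          page_isolates_channel(0x10000, page))
```

With 4 KB pages the new-style 0x1000 areas can already be isolated, so nothing forces the old 0x800000 area; with 64 KB pages only the 0x10000-byte old-style areas could be.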
18:23 Nykalt: yeah...
18:23 Nykalt: well thanks a lot once again :D
19:13 airlied: skeggsb: some race condition on i2c registering in -next? see linux-next: Tree for Mar 14
19:18 skeggsb: airlied: yeah I've seen it, I dunno what's up there yet, we've changed nothing on our side there this cycle.. need to look at what is happening still
19:20 skeggsb: airlied: some other driver at the end of that mail is hitting the same WARN btw
19:43 airlied: ah cool can let i2c guys worry about it then :-P
19:44 airlied: might be an ordering issue they started warning on
19:44 skeggsb: it looks like:
19:44 skeggsb: /* Can't register until after driver model init */
19:44 skeggsb: if (unlikely(WARN_ON(!i2c_bus_type.p))) {
19:44 skeggsb: which, well, seems like we shouldn't be hitting that...
19:51 gnurou: RSpliet: this is a scratch register whose usage is software-defined, I see nothing that indicates a relation to clocks