00:08 mupuf: karolherbst: nice finding (the temp offset)
00:08 karolherbst: I was searching for memory voltage thingy though :/
00:08 mupuf: TEMP_ALT and TEMP_ALT_OFFSET
00:09 karolherbst: nv117 doesn't have it :/
00:09 karolherbst: or the traces are strange
00:09 karolherbst: but this should be like always there
00:10 karolherbst: I assume there is something even newer on maxwell and on late fermi/kepler it was just an experiment
00:12 karolherbst: mupuf: what's wrong with variants="GF117:GM107" ?
00:12 karolherbst: GF119 doesn't seem to be picked up
00:25 mupuf: karolherbst: why don't you boot reator and check if the gm107 really does not have it
00:26 mupuf: nvidia very rarely deletes stuff from their hw
00:41 karolherbst: I meant like it wasn't used
00:45 karolherbst: its there
00:45 karolherbst: offset 0x40 though
00:45 karolherbst: 0x1d temp then
00:46 karolherbst: makes sense after boot
00:46 karolherbst: increasing
00:46 karolherbst: okay
01:09 karolherbst: mupuf: should there be a bitfield inside the reg or is it fine that demmio just prints the hex value?
01:10 karolherbst: mupuf: and later we should check if there are any pre-GF119 cards with that reg
01:10 mupuf: yes, would be nice to check
01:10 karolherbst: anyone with a fermi here online?
01:10 mupuf: and no, no need for a bitfield
01:14 karolherbst: mupuf: that's all what is left for PTHERM: https://gist.github.com/karolherbst/954c8cfd28b0df8d6d73
01:16 mupuf: more fun for later :)
01:16 karolherbst: and I think nothing is related to memory voltage
01:16 karolherbst: because in my clocking trace there is only reads from 0x020144
01:16 mupuf: no, you would have a gpio for that, most likely
01:21 karolherbst: mupuf: [0] 18353.851804 MMIO32 R 0x00d638 0x00007000 PGPIO.GPIO[0xa] => { SPECIAL_IDX = NVIO_SLI_SENSE_0 | MODE = NORMAL | OUT | OE | IN } ?
01:22 mupuf: so?
01:24 karolherbst: its like the only thing inside the trace
01:25 karolherbst: does it seem okay for memory volting?
02:20 karolherbst: mupuf: I have some MXM GPIOs, I think I will also document them for nvbios
02:26 karolherbst: I also have a panel self refresh pin
02:26 mupuf: oh, sweet
02:27 karolherbst: "43 = HW Slowdown Enable. On assertion HW will slowdown clocks (NVCLK, HOTCLK) using either _EXT_POWER, _EXT_ALERT or _EXT_OVERT settings (depends on GPIO configured: 12, 9 & 8 respectively). Than SW will take over, limit GPU p-state to battery level and disable slowdown. On deassertion SW will reenable slowdown and remove p-state limit. System will continue running full clocks."
02:27 karolherbst: and this I also have
02:28 karolherbst: in total I have 9 unknown GPIO types, will make a commit for all of them
02:29 karolherbst: 134 = FB Clamp: This function is used to monitor the FB clamp signal driven by the Embedded Controller (EC) for JT memory self-refresh entry and exit.
02:29 karolherbst: memory self refresh? like turning the gpu off and this on, so that power consumption is really low?
02:30 karolherbst: or at least the cores and other stuff
02:35 karolherbst: "52 = Thermal Alert: Interrupt input from external thermal device. Indicates that the device needs to be serviced." also nice
02:35 karolherbst: I assume this is for automatic clocking if the gpu gets too hot
02:36 RSpliet: karolherbst: 46 is relevant from the top of my head
02:37 RSpliet: ah, and 24
02:57 karolherbst: RSpliet: yeah, I will just add everything I have here
02:57 karolherbst: but you mean like hex or dec?
02:58 karolherbst: ahh you mean for memory?
02:58 karolherbst: RSpliet: I only have 46
02:58 karolherbst: no 24
03:01 karolherbst: what can I do with this one? 135 = FB Clamp Toggle Request: This function is used to request the Embedded Controller (EC) to toggle the FB clamp signal.
03:01 RSpliet: decimal, for memory reclocking
03:01 karolherbst: yeah, found them in the doc
03:01 karolherbst: do I need like both or only one?
03:01 RSpliet: you don't *need* them, but those GPIOs are likely to change when reclocking memory
03:02 RSpliet: I think this is implemented in nouveau even
03:02 karolherbst: yeah, but I only have the 46 one
03:02 karolherbst: 24 seems to be the one switching between 1.5V and 1.8V
03:02 karolherbst: but I don't have it
03:03 RSpliet: if you don't have it, it's not used for sure ;-)
03:03 RSpliet: on Tesla 46 relied on the ODT setting
03:03 RSpliet: but just check whether the SEQ script in your trace contains a change of that GPIO line
03:04 karolherbst: I think "ENVY_BIOS_GPIO_THERM_ALERT = 0x11" is wrongly named
03:04 karolherbst: 0x34 should be alert
03:05 karolherbst: and 0x11 is just "Thermal Event Detect"
03:05 karolherbst: any ideas how to handle it?
03:11 karolherbst: https://github.com/karolherbst/envytools/commit/1f17265aeef77a965d9e11e31ad2c589f367fc06
03:17 RSpliet: karolherbst: I believe the list is derived from reverse engineering rather than from the open-gpu-docs
03:17 karolherbst: yeah I assumed as much
03:17 RSpliet: patches welcome
03:17 karolherbst: ^^
03:17 karolherbst: I am just worried about the rename
03:17 karolherbst: if you are fine with that, then its okay
03:17 RSpliet: why would you be worries?
03:17 RSpliet: *worried
03:18 karolherbst: maybe someone doesn't like it :D
03:18 RSpliet: then they will tell you
03:18 RSpliet: no need to "worry" about it ;-)
03:19 karolherbst: playing around with the ENVY_BIOS_GPIO_HW_SLOWDOWN_ENABLE gpio
03:26 karolherbst: RSpliet: having problem setting the mem volt gpio
03:26 karolherbst: SEQ script: https://gist.github.com/karolherbst/321a8f69ffe2a19f1a88#file-gistfile1-txt-L1257-L1406
03:26 karolherbst: blob seems to change it between 0x00002000 and 0x00007000
03:26 karolherbst: but I only manage to get 0x00006000 and 0x00007000
03:27 karolherbst: 0x00d638 is the reg
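The two value pairs quoted above for 0x00d638 can be compared bit by bit. This is a minimal sketch in Python; the helper name `changed_bits` is made up for illustration, and no meaning is assigned to the individual bits, since the log does not establish one.

```python
def changed_bits(a: int, b: int) -> int:
    """Return a mask of the bits that differ between two register values."""
    return a ^ b

# Values quoted in the log for register 0x00d638.
blob_toggle = changed_bits(0x00002000, 0x00007000)    # what the blob switches between
manual_toggle = changed_bits(0x00006000, 0x00007000)  # what plain pokes achieved

print(f"blob toggles bits:   0x{blob_toggle:08x}")
print(f"manual poke toggles: 0x{manual_toggle:08x}")
```

Under this comparison the blob flips two bits (0x4000 and 0x1000), while the manual pokes only moved 0x1000.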
03:29 RSpliet: GPIO lines don't work as straight forward as most registers
03:29 RSpliet: look up the meaning of the lines and fiddle with an unconnected GPIO lines to see how you can make it change value
03:29 karolherbst: yeah
03:29 karolherbst: I assumed
03:30 karolherbst: after this SEQ script the gpio turns to 0x00002000 somehow
03:30 karolherbst: 0x00d604 doesn't exist at all
03:31 karolherbst: aha
03:31 karolherbst: now its 0x00002000
03:31 RSpliet: like magic
03:31 karolherbst: nvapoke 0x00d638 0x00006000 && nvapoke 0x00d604 0x1
03:31 karolherbst: !
03:31 karolherbst: pstate change to 0f without crash
03:32 karolherbst: thats unusual
03:32 karolherbst: okay nice
03:32 karolherbst: testing some heavier games now
03:33 karolherbst: gpio back to 0x00007000 after pstate change to 0a
03:34 karolherbst: nice
03:34 RSpliet: I'm fairly sure the code in nouveau attempts to set the GPIO, but you might have found a way to "unlock" GPIO changes. ... Or you got lucky with your swap to 0f :-P
03:34 karolherbst: it works
03:34 karolherbst: also bioshock infinite didn't crash
03:34 karolherbst: okay, now it hangs
03:34 RSpliet: hmm
03:35 karolherbst: will check blob which value is for which state though
03:36 karolherbst: lsm
03:36 karolherbst: wow, the driver really didn't like that
03:36 karolherbst: FB_FLUSH_TIMEOUT
03:36 karolherbst: vm timeout 1: 0x00160000 5
03:37 karolherbst: mhh
03:37 karolherbst: seems less bad though
03:42 karolherbst: mupuf: updated
03:45 karolherbst: wow, the card is really in a bad mood now
03:46 karolherbst: will reboot
03:59 karolherbst: yeah
03:59 karolherbst: that sounds good
03:59 karolherbst: 00007000 is the value for the 07 pstate
03:59 karolherbst: its 00002000 on 0f
03:59 karolherbst: mupuf: one step nearer to stable 0f on gddr5 I guess
03:59 karolherbst: RSpliet: ^
04:01 mupuf: karolherbst: what GPIO is that?
04:02 karolherbst: 42
04:02 karolherbst: mem thingy
04:02 karolherbst: its 00007000 also for 0a
04:02 karolherbst: I could actually switch to 0f twice while something heavy was running
04:02 mupuf: RSpliet: do we really attempt to change the memory voltage?
04:02 karolherbst: yes
04:03 RSpliet: mupuf: it should
04:03 mupuf: ok, interesting
04:03 karolherbst: it goes back to 00007000 after a switch to 0a
04:03 RSpliet: on Tesla it does
04:03 mupuf: RSpliet: good job :p
04:03 karolherbst: but if I switch to 0f it stays 00002000
04:03 karolherbst: blob does the same
04:03 karolherbst: but there is still something missing
04:03 mupuf: so, what did you do to suddenly make it work?
04:03 karolherbst: magic
04:03 karolherbst: nvapoke 0x00d638 0x00006000 && nvapoke 0x00d604 0x1
04:03 karolherbst: :D
04:04 karolherbst: this changes 0x00d638 to 00002000
04:04 mupuf: where did you see this magic being done?
04:04 karolherbst: SEQ script
04:04 karolherbst: https://gist.github.com/karolherbst/321a8f69ffe2a19f1a88#file-gistfile1-txt-L1266-L1267
04:04 karolherbst: these two lines
04:05 karolherbst: do you know what is really strange?
04:05 karolherbst: nvapeek 0x00d604 gives ....
04:05 RSpliet: mupuf: it's hardly magic
04:05 mupuf: RSpliet: output enable?
04:05 karolherbst: so 0x00d604 doesn't exist :O
04:05 RSpliet: karolherbst: that's not true
04:05 RSpliet: mupuf: likely
04:05 karolherbst: yeah I know
04:05 mupuf: karolherbst: you need to start using the lookup tool
04:06 mupuf: and not being able to read a reg does not mean it does not exist
04:06 mupuf: well, it depends what you read back
04:06 karolherbst: yeah I know
04:06 karolherbst: I also found a reg nvapeek could read only sometimes
04:07 karolherbst: strange though, I don't have any 24 GPIO
04:07 karolherbst: mhh
04:07 karolherbst: gpio is 46 though
04:07 karolherbst: GPIO 10: line 10 tag 0x2e [MEM_VREF] OUT NEG DEF 1 gpio: normal
04:07 karolherbst: this one
04:08 karolherbst: 0x10f330 seems also important
04:09 karolherbst: yeah
04:09 karolherbst: changes value on pstate change
04:09 karolherbst: 00100070 for 0a and 07
04:09 karolherbst: 00100064 for 0f
04:11 karolherbst: okay
04:12 karolherbst: still something missing
04:15 karolherbst: mhh
04:15 karolherbst: FB PAUSE
04:19 karolherbst: okay...
04:20 karolherbst: either I was wrong and nouveau already does that or I messed the card in a way it works for nouveau now
05:50 karolherbst: what is pxbar?
05:50 karolherbst: seems something memory related, but can't figure what exactly
05:51 RSpliet: xbar sounds like a crossbar interconnect
05:51 karolherbst: I have a lot of writes into it
05:51 RSpliet: could be used for any kind of communication really
05:52 karolherbst: mhh
05:52 RSpliet: others might know better what this one is for
05:52 karolherbst: though the only value written into the regs is 0x0
05:52 RSpliet: oh, and nouveau pauses the card
05:52 karolherbst: so likely not important for memory?
05:53 karolherbst: while switching memory stuff?
05:53 RSpliet: yes
05:53 karolherbst: okay
05:53 karolherbst: I try to remove all unrelated lines from the trace now :/
05:53 RSpliet: study the nouveau fuc implementation
05:53 karolherbst: its a little messy
05:54 karolherbst: I tried already to follow the memory calc stuff
05:54 karolherbst: but I think a trace would be a lot more clear
05:54 RSpliet: a trace wouldn't reveal how nouveau pauses the card
05:54 karolherbst: how does pausing the card work?
05:55 RSpliet: study the nouveau fuc implementation
05:55 karolherbst: do you think the issue could be there?
05:56 karolherbst: RSpliet: you mean this? https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c?id=refs/tags/v4.2-rc6
05:58 RSpliet: no, I said "fuc implementation"
05:58 RSpliet: aka ramfuc
06:07 karolherbst: so the ramfuc.h file?
06:08 karolherbst: ahh
06:08 karolherbst: subdev/pmu/fuc
06:19 mwk: pmoreau: I'm sorry, I totally forgot you had two envytools PRs pending
06:19 pmoreau: mwk: No problem, they were waiting for some name cleanup I had to take care of. :-)
06:19 mwk: I commented on the blocklinear one
06:19 pmoreau: I saw your comments, I'll address them a bit later this afternoon
06:19 mwk: okay
06:20 pmoreau: Thanks for having a look!
06:20 mwk: I'll have a look at the display one
06:20 pmoreau: The EVO one? You had a look already iirc and you thought it was ok to merge, once we fixed the namings across envytools.
06:21 mwk: then it stays good, OK
06:22 pmoreau: Well, if you want to have another look to be sure, I won't stop you from doing it. ;-)
06:33 karolherbst: mwk: are you koriakin?
06:34 karolherbst: mwk: "Thermal Alert: Interrupt input from external thermal device. Indicates that the device needs to be serviced."
06:34 pmoreau: karolherbst: He is
06:56 mwk: karolherbst: what about it?
06:56 karolherbst: you asked me something in the PR ;)
06:57 mwk: I asked that two lines down :p
06:57 karolherbst: or did you mean like all of them?
06:57 mwk: about SLI_SENSE
06:57 karolherbst: ahhh
06:57 karolherbst: I thought you meant the diff
06:57 karolherbst: okay, checking
06:57 karolherbst: SLI Raster Sync A
06:58 mwk: the SLI_SENSE name is based just on the stuff I easily figured out, ie. they're used to figure out if SLI connector is properly connected
06:58 karolherbst: SLI Raster Sync A: This signal is carried across the SLI bus to synchronize the RG between GPUs. This signal will always be set as Alternate.
06:58 mwk: turns out they have a more interesting use for it once the link is up
06:58 mwk: yep
06:58 karolherbst: the doc has nice bits
06:58 karolherbst: about these gpios
06:58 mwk: the magic start-a-frame signal
06:59 karolherbst: beware of 0x41 though :D
06:59 mwk: hm?
06:59 karolherbst: like 0x40 but "This signal is just the second GPIO that can be used for Raster sync from each GPU. It should only be defined when we have 2 pin sets being used on one board to allow more than two GPUs to run in SLI mode. One will be used with one pin set for input and the other will be used with the other pin set for output."
06:59 mwk: yep
07:00 pmoreau: karolherbst: From which documentation are you taking this out?
07:00 karolherbst: open gpu doc
07:00 mwk: it's exposed on the second SLI connector when there are two
07:00 karolherbst: pmoreau: ftp://download.nvidia.com/open-gpu-doc/DCB/2/DCB-4.x-Specification.html
07:00 mwk: ftp://download.nvidia.com/open-gpu-doc/DCB/2/DCB-4.x-Specification.html#_gpio_assignment_table
07:00 pmoreau: Oh right, thanks! :-)
07:01 karolherbst: the MXM bits sound interesting though
07:01 karolherbst: just have to check what the pins are for
07:05 karolherbst: ...
07:05 karolherbst: GPIO0 GPIO1 GPIO2
07:05 karolherbst: yeah well
07:07 karolherbst: okay, I think the entire gddr5 0? => 0f things should be in here: https://gist.github.com/karolherbst/659802e29cfc1539d13b
07:07 karolherbst: a little bit more, though
07:07 karolherbst: but nothing should be left out
07:08 karolherbst: the COEF is different
07:08 karolherbst: CLK7 is memory?
07:30 karolherbst: I really need a second machine
07:54 marcosps1: imirkin: around? :)
07:54 imirkin: marcosps1: yeah... chances are the issue is that tgsi_text isn't prepared for parsing 2d output addressing
07:55 imirkin: and something goes horribly wrong when it encounters that scenario, leading to the totally wrong instruction
07:56 marcosps1: imirkin: But do you think I can send both patches now and send another ones to solve this address problem?
07:56 imirkin: is there a point in having those other patches upstreamed separately?
07:57 imirkin: imo it makes sense to fix a whole feature and upstream the patches as a group
07:57 marcosps1: imirkin: Ok so
07:57 marcosps1: imirkin: do you have any tip to solve the address problem? (I think this will be more challanging...)
07:58 marcosps1: *challenging
07:58 imirkin: marcosps1: look at where outputs are parsed
07:58 imirkin: and make sure that they have the same handling as inputs
07:58 imirkin: for 2-dimensional indexing
07:58 marcosps1: imirkin: Ok, I'll try to do it ASAP...
07:58 imirkin: i.e. <opcode> <dst> <src> <src> <src> etc
07:58 imirkin: look at the logic that handles the <dst>
07:59 imirkin: it's probably separate from the logic that handles the <src>'s
07:59 imirkin: (due to swizzling differences)
07:59 imirkin: destinations just get a writemask, while sources get a full swizzle
08:00 marcosps1: imirkin: noted here :)
08:02 marcosps1: imirkin: for now tess_eval is working fine, just tess_ctrl has this problem... I'm leaving, thank you for the help again :)
08:03 imirkin: only tess ctrl does 2d output writes
08:07 karolherbst: mupuf:
08:07 karolherbst: nvapeek 0x137000 0x10 : 00137000: 00010000 00012b1f e0000000 00000000
08:07 karolherbst: for 135MHz clock in blob
08:07 karolherbst: 0x137004 is COEF
08:08 karolherbst: the same as nouveau has for 405MHz
08:08 karolherbst: :/
08:08 karolherbst: so there is something else if you want to get lower clocks
08:10 karolherbst: or it just works, but nouveau reports it wrong. don't know
08:12 RSpliet: karolherbst: just look inside the nouveau clock code, this stuff is well implemented
08:12 karolherbst: RSpliet: this is somehow outside cstate boundaries
08:12 karolherbst: its inside the boost table
08:12 RSpliet: "clock code"
08:13 RSpliet: eg nvkm/subdev/clk
08:15 karolherbst: yeah, I know that part. But I don't think you can reach clocks that low by just using the cstate/pstate mechanics as currently done
08:18 RSpliet: nvkm/subdev/fb/ramgk104.c generates clock values ln 763 onward
08:19 karolherbst: but this is for memory, isn't it?
08:19 RSpliet: yep
08:19 karolherbst: and also ddr3
08:19 karolherbst: but I meant the core and also have gddr5
08:20 RSpliet: GDDR5 code has a similar chunk
08:20 RSpliet: other cores are done inside nvkm/subdev/clk
08:21 karolherbst: yeah, these bits seem correct for me inside the boundaries of the cstep values inside the vbios
08:21 karolherbst: but I am more like talking about values outside these boundaries
08:22 RSpliet: ok, that bit I can't tell you. Nouveau probably doesn't set those clocks; question is whether the blob properly changes the clocks or rather has a clock gating mechanism
08:22 karolherbst: mhh
08:22 karolherbst: the boost table reports the values the blob is reporting
08:23 karolherbst: cstep starts at freq 810 MHz
08:23 karolherbst: boost table: 0: pstate 7 min 270 MHz max 810 MHz
08:23 karolherbst: and 270 is actually the lowest value the blob uses
08:23 RSpliet: sure
08:23 RSpliet: but there's more than one way to adjust a clock
08:23 karolherbst: yeah
08:23 karolherbst: I figured as much
08:24 karolherbst: so that's what I try to do, to find out how the blob does this
08:30 karolherbst: wow, there are a lot of unknown regs inside PCLOCK
08:31 karolherbst: mhhh
08:31 karolherbst: maybe
08:31 karolherbst: no
09:05 karolherbst: hmpf
09:06 karolherbst: does gddr5 0f work stable anywhere?
09:06 karolherbst: ...
09:06 imirkin: blob drivers ;)
09:06 karolherbst: rephrase: is there somebody who can use 0f pstate with gddr5 in a stable manner?
09:07 karolherbst: :D
09:07 karolherbst: at least the PCLOCK regs seems fine :/
09:07 karolherbst: couldn't find anything useful for boost clocking either
09:08 karolherbst: except 0x137024 and 0x137044 look strange
09:08 karolherbst: but thats like CLK1 and CLK2
09:09 karolherbst: too bad I can't disable this boost clocking :(
09:10 karolherbst: or keep the blob at 07 max speed
09:14 karolherbst: well ...
09:45 karolherbst: okay, nice, found something now
09:46 karolherbst: https://gist.github.com/karolherbst/659802e29cfc1539d13b#file-gistfile1-txt-L74-L87 these lines aren't done inside nouveau I think
09:47 karolherbst: like 100% sure, but I don't know if they are important or something
09:50 karolherbst: yeah
09:50 karolherbst: and the regs are the same for nouveau and blob 07
09:50 karolherbst: but blob 0f are different
09:56 karolherbst: here are both scripts if somebody want to dig deeper: https://gist.github.com/karolherbst/a5dd956189a533ff3e6d
10:06 karolherbst: mupuf: any idea what I should try out? https://gist.github.com/karolherbst/a5dd956189a533ff3e6d#file-nvapeek-0x132000-0x30
10:06 karolherbst: its the PMPLL part
10:07 karolherbst: 00132004 looks really odd on nouveau 0f
10:22 imirkin_: anyone using reator?
10:53 imirkin_: ssssuper. yet-another difference in gm107 txq indirect handling compared to the other gens.
11:21 imirkin_: oh duh. of course none of the tess stuff works on maxwell
11:22 imirkin_: well at the very least opcode emission seems right. need to figure out how all the base address stuff works though =/
11:33 karolherbst: :/
11:36 imirkin_: and of course nvdisasm doesn't actually want to decode their own shaders
11:38 karolherbst: obviously
11:38 karolherbst: and my funny reg value isn't a constant in the source :/
11:50 imirkin_: urgh. looks like the vsetp decoding is off in envydis for gm107 :(
11:51 imirkin_: problem upon problem.
12:07 karolherbst: nice
12:07 karolherbst: I am sure that this line isn't right: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c?id=refs/tags/v4.2-rc6#n160
12:09 imirkin_: any thoughts on how to make it right? :)
12:09 karolherbst: value is 0xa04 but should be more like 0x10b01
12:09 imirkin_: off-by-one for N2 and way off for P2 i guess?
12:10 imirkin_: actually all the values are off
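The comparison of 0xa04 against 0x10b01 can be made explicit by splitting both words into fields. This is only a sketch: the byte-wise layout (M in bits 0-7, N in bits 8-15, P from bit 16 up) is an assumption inferred from the remarks above, not a documented register format, and `split_coef` is a hypothetical helper.

```python
def split_coef(value: int) -> dict:
    """Split a coefficient word into byte-sized fields (ASSUMED layout)."""
    return {
        "M": value & 0xFF,          # assumed: bits 0-7
        "N": (value >> 8) & 0xFF,   # assumed: bits 8-15
        "P": (value >> 16) & 0xFF,  # assumed: bits 16-23
    }

nouveau = split_coef(0x00000A04)  # value nouveau writes, per the log
blob = split_coef(0x00010B01)     # value expected from the blob, per the log

for name in ("M", "N", "P"):
    print(f"{name}: nouveau={nouveau[name]:#x} blob={blob[name]:#x}")
```

With that layout every field differs between the two values, which would fit the "all the values are off" remark.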
12:11 karolherbst: mhhh
12:11 karolherbst: used the blob value
12:11 karolherbst: and got something like this: [ 986.874409] nouveau E[ PFB][0000:01:00.0][0x00000000] ramcfg data for 4455MHz not found
12:49 karolherbst: imirkin_: do you know whats also strange?
12:49 karolherbst: nouveau just sets a different clock
12:49 karolherbst: not the one in the cstep or turbo table
12:49 imirkin_: might explain some of the differences ;)
12:49 imirkin_: iirc ben was talking about having to do double-reclocks some of the time
12:50 imirkin_: dunno if that's what's happening there
12:50 imirkin_: i.e. to jump to a really high freq, have to jump to a middle one first
12:50 karolherbst: I could try to set the clock to something like 3.5GHz
12:50 karolherbst: maybe this works stable enough
13:00 karolherbst: imirkin_: value in cstep is 4008MHz but nouveau sets it to 4050MHz
13:00 imirkin_: i don't know anything about that code, sorry
13:01 imirkin_: btw, i'm going to push a change to disable tess on maxwell
13:01 imirkin_: unclear if i'll be able to work it out by mesa 11
13:01 karolherbst: got 1620MHz on 0f now :D
13:02 karolherbst: but I think you can't just write random stuff in it
13:02 RSpliet: imirkin_: probably has something to do with link training
13:05 karolherbst: wow
13:06 karolherbst: yay
13:08 karolherbst: seems to work now
13:08 karolherbst: https://gist.github.com/karolherbst/a059addc76cdbf9acab8
13:08 karolherbst: performance seems to be a little worse though
13:09 karolherbst: and 0a and 07 get like 360MHz mem clock reported
13:12 karolherbst: okay, the 360 MHz mem clock is the real value indeed
13:12 karolherbst: imirkin_: what stresstest would you do?
13:13 karolherbst: changing every 0.5 seconds infinite?
13:13 imirkin_: unigine heaven shouldn't be too bad
13:13 karolherbst: running bioshock infinite
13:13 karolherbst: after like 10 changes between 0a and 0f still working
13:14 karolherbst: yep
13:14 karolherbst: seems stable enough
13:14 imirkin_: iirc mupuf tested up to 1M switches ;)
13:14 karolherbst: :D
13:14 karolherbst: but I think it matters that you do something in between
13:14 karolherbst: also having a 360MHz mem clock isn't that good :/
13:15 karolherbst: slow as hell
13:15 karolherbst: pstates 07 and 0a are unusable now because of that :/
13:15 karolherbst: but I use constant values
13:17 karolherbst: borderlands presequel between 30 and 40 fps
13:17 karolherbst: sometimes even 45
13:17 karolherbst: its pretty awesome if you think about it
13:17 karolherbst: max settings
13:18 karolherbst: imirkin_: [ 4785.202773] nouveau E[bioshock.i386[1240]] nv50cal_space: -16
13:19 karolherbst: do you know what that could be?
13:19 imirkin_: yeah... ran out of pushbuf space
13:19 imirkin_: gpu not processing commands fast enough
13:19 karolherbst: ohh
13:19 imirkin_: which is generally aka gpu hung
13:19 karolherbst: mhh
13:19 karolherbst: memory clocked at 360MHz?
13:19 karolherbst: may be the issue
13:20 imirkin_: or rather, ran out of ib ring space
13:20 karolherbst: never happened before
13:20 imirkin_: but... same diff
13:20 karolherbst: okay
13:20 karolherbst: changed pstates 20 times while running borderlands
13:20 karolherbst: and no issue
13:21 karolherbst: furmark :D
13:22 karolherbst: okay, running
13:23 karolherbst: 1442 points
13:23 karolherbst: nice
13:24 karolherbst: GTX 680 was at 583 in the phoronix benchmark
13:24 karolherbst: mupuf: ping!
13:25 karolherbst: wow
13:26 karolherbst: the gputest triangle benchmark actually hangs my intel gpu now :O
13:27 karolherbst: okay
13:27 imirkin_: that's how fast the nvidia gpu is? fast enough that the intel gpu hangs its head in shame?
13:27 karolherbst: imirkin_: I reclocked while borderlands and bioshock, ran all the gputest benchmarks, and talos starts still fine
13:27 karolherbst: :D
13:27 karolherbst: its DRI_PRIME
13:27 karolherbst: and triangle pushes a lot of frames through
13:28 karolherbst: glxgears default size: 55504 frames in 5.0 seconds = 11100.729 FPS
13:29 karolherbst: yeah
13:29 karolherbst: my cursor sometimes just hangs
13:29 karolherbst: running just "talos" and glxspheres at top speed
13:30 karolherbst: talos at 20 fps high settings :)
13:30 karolherbst: ultra actually
13:30 imirkin_: is that good?
13:30 karolherbst: yeah
13:30 karolherbst: even the blob has problems there
13:30 karolherbst: talos has ingame fps counter
13:30 imirkin_: i have no idea what to expect from those games on non-shit gpu's
13:31 karolherbst: so I will check blob performance there too
13:31 imirkin_: i just know that all of these games barely run on any of my chips at lowest settings + 1200x800 or so res
13:31 karolherbst: dynamic lighting still a problem though
13:31 karolherbst: talos at full hd :)
13:32 RSpliet: karolherbst: yes, well a hammer isn't very useful at fixing lightbulbs
13:32 RSpliet: but I'm glad to hear you're enjoying your new-found performance :-P
13:32 karolherbst: :D
13:32 karolherbst: 0f stable at gddr5?
13:32 karolherbst: sounds good to me
13:33 karolherbst: but then we actually know where the issue is, don't we?
13:33 RSpliet: well, you know
13:33 RSpliet: I've been too busy with work to pay too much attention to the details
13:33 karolherbst: RSpliet: my "patch" https://gist.github.com/karolherbst/a059addc76cdbf9acab8
13:33 imirkin_: ship it!
13:34 karolherbst: :D
13:34 karolherbst: wow :O
13:34 karolherbst: thats a new record
13:34 imirkin_: ?
13:34 RSpliet: doesn't the original code come up with similar values?
13:34 karolherbst: ohh talos changed settings
13:34 karolherbst: too bad
13:35 karolherbst: I just thought I hit like 70% blob performance
13:35 RSpliet: and/or test for PLL lock and revert otherwise?
13:35 karolherbst: imirkin_: blob at 40 fps, nouveau at like 19fps
13:35 imirkin_: heh
13:35 karolherbst: still 50%
13:36 imirkin_: still some ways to go i guess
13:36 karolherbst: yeah well
13:36 karolherbst: half is done I would say
13:36 imirkin_: are you all clocked up?
13:36 karolherbst: :D
13:36 imirkin_: 8GT/s?
13:36 karolherbst: yes
13:36 karolherbst: everything up
13:36 imirkin_: incl highest cstate?
13:36 karolherbst: but the blob may actually get higher
13:36 karolherbst: yes
13:37 imirkin_: well, the big things where we suck are instruction ordering and general compiler opts
13:37 RSpliet: imirkin_: it's not terrible...it helps though if you know what to optimise for :-P
13:37 karolherbst: yeah, but 50% is still good
13:37 imirkin_: although i somewhat doubt that amounts to a 2x in perf
13:37 karolherbst: mhhh
13:38 karolherbst: talos has some gl issues anyway
13:38 karolherbst: maybe a bad idea to test against that
13:38 imirkin_: but what do i know
13:38 RSpliet: karolherbst: what's the resulting clock frequency of your card after applying that hack?
13:38 karolherbst: 4059?
13:38 karolherbst: yeah, 4059
13:39 karolherbst: without hack 4050
13:39 imirkin_: that's a lot of hz
13:39 karolherbst: cstate max is 4008
13:39 karolherbst: :D
13:39 karolherbst: imirkin_: blob allows to put like 4008 more on that
13:39 imirkin_: huh?
13:39 karolherbst: so max is 8016 if you really want to
13:39 karolherbst: coolbits
13:39 imirkin_: errrrrr
13:39 imirkin_: that seems high.
13:39 karolherbst: well
13:40 imirkin_: are you sure it's not a factor-of-2 diff in how things are printed?
13:40 karolherbst: I think it is
13:40 RSpliet: high as a kite... blob doesn't get it stable up there
13:40 karolherbst: raw cstate values are like 2004000
13:40 imirkin_: and it's ddr ram, i.e. double-double-rate, or quad-rate ;)
13:40 karolherbst: yes
13:40 karolherbst: so who knows what the real clock is anyway
13:41 imirkin_: [jk, in case that's not clear. ddr = double-data-rate]
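The suspected factor of 2 can be checked with simple arithmetic. This is a sketch only: it assumes the raw cstate value 2004000 is in kHz, which the log does not state, and `to_mhz` is a hypothetical helper.

```python
RAW_CSTATE = 2_004_000  # raw table value quoted above; the kHz unit is an ASSUMPTION

def to_mhz(raw_khz: int, transfers_per_clock: int = 1) -> float:
    """Convert a raw kHz value to MHz, optionally scaled by a data-rate factor."""
    return raw_khz / 1000 * transfers_per_clock

print(to_mhz(RAW_CSTATE))     # plain clock
print(to_mhz(RAW_CSTATE, 2))  # printed with a factor of 2
print(to_mhz(RAW_CSTATE, 4))  # printed with a factor of 4
```

Under that assumption the three results are 2004, 4008 and 8016 MHz, so the 4008 and 8016 figures would differ only in the multiplier used for printing.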
13:41 karolherbst: RSpliet: fun fact: with that hack 0a and 07 stay at 360MHz mem clock
13:42 RSpliet: how is that fun? it probably just means it bypasses the PLL altogether and just uses a post-divider to get relatively close
13:42 karolherbst: RSpliet: these are the values I was checking against: https://gist.github.com/karolherbst/a5dd956189a533ff3e6d#file-nvapeek-0x132000-0x30
13:42 karolherbst: nouveau is nicer to the hardware anyway
13:43 karolherbst: the blob just doesn't care about the cstates completely
13:43 RSpliet: I'm guessing crystal 27MHz, what-used-to-be-RPLL-in-nouveau-terms multiplies it by 80, post-divider of 16
13:43 karolherbst: yes
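The guess above is plain multiply-then-divide arithmetic. A minimal sketch, assuming the simple model output = reference * multiplier / post-divider; `pll_out_mhz` is a hypothetical helper and the real clock tree may have more stages.

```python
def pll_out_mhz(ref_mhz: float, multiplier: int, post_divider: int) -> float:
    """Output frequency for a reference multiplied up and then divided down."""
    return ref_mhz * multiplier / post_divider

guessed = pll_out_mhz(27, 80, 16)     # numbers guessed in the log
alternative = pll_out_mhz(27, 80, 6)  # same multiplier, divider of 6 (illustration only)

print(guessed, alternative)
```

The guessed numbers give 135 MHz, while a divider of 6 would give the 360 MHz reported for 0a and 07. Which divider the hardware actually uses is not established in this log.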
13:44 karolherbst: 00132004 reg : mcoef 00132024 reg : rcoef
13:45 RSpliet: I know
13:45 karolherbst: its funny that these values seems to matter a lot
13:45 RSpliet: well yes
13:45 RSpliet: you can't make a PLL do arbitrary things
13:45 karolherbst: yeah okay
13:45 karolherbst: but I think I got like 5-10% less performance now
13:45 karolherbst: maybe nouveau just calculates the current clock wrongly
13:46 karolherbst: and earlier the mem clock was set to 4500 or something
13:46 karolherbst: actually, I could try that out with the blob
13:46 RSpliet: my guess is that the ram->P2 may never be 0
13:47 RSpliet: which is kind of what the PLL table says too
13:48 RSpliet: so yes, there might be a bug in the calculation code for the PLL coefficients
13:48 karolherbst: okay, the blob seems to be able to do 2MHz steps
13:49 karolherbst: nope
13:49 karolherbst: I don't get the blob even near the values nouveau uses
13:50 RSpliet: karolherbst: so there's your lead for a proper patch: make sure the pll generation code adheres to the boundaries in the PLL limits table
13:50 karolherbst: mcoef: 10011101 00010e01 00010b01
13:50 karolherbst: these values I get
13:50 karolherbst: mhh
13:50 karolherbst: okay, will try that
13:51 RSpliet: (and double-check whether this is the case... not sure if P2 encodes a shift value or a divider. In the first case a shift of 0 is a division by 1, which is within range)
13:52 karolherbst: I could try to poke nouveau values in and check what the blob reads
13:52 RSpliet: it's not about decoding the values, it's about generating
13:52 karolherbst: I know
13:52 karolherbst: I just wanted to know what nouveau tried to use
13:52 RSpliet: you can try that, but it's likely already documented
13:53 RSpliet: reading code is simpler, because it doesn't crash your system ;-)
13:53 karolherbst: well
13:54 karolherbst: the blob was very unhappy with nouveaus values
13:57 karolherbst: I really don't know why
13:57 karolherbst: but my intel card seems to feel the pain the nvidia gpu has to endure
14:01 karolherbst: mupuf: up to checking some gddr5 cards today or tomorrow?
14:18 karolherbst: oh well
14:22 karolherbst: how can I debug that? "insmod: ERROR: could not load module Dokumente/repos/nouveau/drm/nouveau/nouveau.ko: Cannot allocate memory"
14:25 karolherbst: blob still loads :/
14:28 karolherbst: strange, works now
14:49 karolherbst: mhh
14:50 karolherbst: RSpliet: mcoef is like garbage for 07 and 0a all the time
14:50 karolherbst: or maybe these values shall not be stable at all
14:51 karolherbst: only 0f gets the same
15:55 karolherbst: well look at that
15:56 karolherbst: RSpliet: it seems like there is some sort of instability with the values. Sometimes they are like the blob and sometimes not
15:58 RSpliet: karolherbst: randomness is not part of the algorithm, there must be something else wrong... uninitialised value?
15:59 karolherbst: mhh
15:59 karolherbst: or me messing with the values?
15:59 karolherbst: stock nouveau seems more stable
16:00 karolherbst: bit 2 is sometimes different though
16:00 karolherbst: okay, will investigate this tomorrow
16:00 karolherbst: have to sleep now
16:27 imirkin_: man, this maxwell tess stuff is no joke
16:28 imirkin_: all of a sudden it seems like they think that writing to gmem in tcp is a good idea
16:28 RSpliet: tcp?
16:28 imirkin_: tess control program
16:28 imirkin_: and there are these funny $directbewriteaddresslow/$directbewriteaddresshigh special register values
16:28 RSpliet: right... not that you need a protocol covering 7 OSI layers :-P
16:28 imirkin_: [which is where stuff is being written]
16:35 imirkin_: i'm going to ignore this and hope that some interested party comes along to implement it
16:35 imirkin_: maxwell can't even complete a piglit run with either nouveau or blob fw, so i'm not too worried about it getting a lot of use :)