03:18 Wolf480pl: Yesterday I was here and I had an issue with HUB_INIT timing out on NVE4, and I was told to try the kernel module from the hack-gk106m branch. It worked, but today it stopped working - "HUB_INIT timed out" is back.
03:28 Wolf480pl: but if I first try to `insmod nouveau.ko runpm=0` without modprobing dependencies - it fails because ttm is missing - then I `modprobe ttm`, then `insmod nouveau.ko runpm=0`, then it works ok, without any errors
03:45 Wolf480pl: actually, same thing happens with the mainline nouveau.ko
03:53 joi: karolherbst, imirkin: don't use --mmt-trace-file, mmt can detect those files automatically; the tracing failure on the newer blob is probably due to another device file being opened
03:56 joi: demmt fails to find objects probably because they were created on another fd, which mmt didn't catch
03:58 karolherbst: joi: I see
03:58 karolherbst: will check
03:58 karolherbst: joi: you mean a device file in /dev, but even with the newest there is only nvidia0 and nvidiactl
03:59 karolherbst: and dri/card1 of course
04:02 joi: you should not see dri nodes with blob drivers...
04:02 karolherbst: wait
04:02 karolherbst: oh no, not that stupid bug again
04:02 joi: ?
04:02 karolherbst: yeah, stupid modesetting
04:03 karolherbst: https://bugs.freedesktop.org/show_bug.cgi?id=91388
04:03 karolherbst: this bug
04:03 karolherbst: X is just crazy
04:03 karolherbst: now, my main X server loads modesetting for the nvidia card
04:03 karolherbst: although the nvidia card is loaded with blob on another one
04:03 karolherbst: if I load nouveau now, the main server crashes :)
04:10 joi: karolherbst: can you paste output of "strace -e open $command" on 352.21?
04:10 karolherbst: which command?
04:11 joi: bin/shader_runner tests/spec/arb_gpu_shader5/execution/sampler_array_indexing/fs-simple-texture-size.shader_test -auto
04:11 karolherbst: ohh, anything nvidia related I guess
04:11 karolherbst: k
04:12 karolherbst: https://gist.github.com/karolherbst/a609709fb083cc70ad39
04:14 joi: heh, so it's not some unknown /dev node :/
04:19 karolherbst: should I do something else?
04:21 joi: if you want you could debug it and find how those missing objects are created
04:24 joi: if on mmt side you'll inject OBJECT_INFO calls on object 0xcaf00002 (which demmt can't find) at each ioctl, you might figure out which ioctl (or something else) created this object
04:25 joi: mmt already has code for ioctl injection (inject_ioctl_call function)
04:32 karolherbst: imirkin: pretty wild guess, but could the 0f pstate instability come from the fact that the card stays in a lower PCIe mode?
04:51 mupuf: karolherbst: I doubt the PCIe clock changes the stability of the GPU
04:52 karolherbst: well, the card stays at 2.5 link status anyway
04:52 karolherbst: couldn't a 5.0 or 8.0 status improve performance anyway?
04:53 karolherbst: at least from the blob driver I know, that it pushes link speed according to performance states
04:53 karolherbst: 07: 2.5, 0a: 5.0, 0f: 8.0
04:54 karolherbst: and because memory seems to be the issue on high states, maybe pci just is too slow for very high clocks
04:56 mupuf: karolherbst: it will change the load time
04:56 mupuf: that's it
04:57 mupuf: I doubt any modern game would be limited by the PCIe bw
04:57 karolherbst: mhh but how hard would it be to set the card to a higher link speed?
04:58 mupuf: well, how about you check it out for yourself ? :D
04:58 karolherbst: I am trying, but so far I haven't found anything useful
04:58 mupuf: it should be easier to find the programming sequence than finding the PWM controller since we already know the addresses of the PCI registers
04:59 karolherbst: ahh, so we just check what the driver tells the card, so that the card set itself to a higher link speed, I see
05:01 mupuf: well, drivers are exactly that
05:01 mupuf: bashing registers in the right order
05:01 mupuf: and with the right values
05:01 mupuf: that is all there is
05:02 karolherbst: yeah, I see something related to it in the demmio output
05:02 karolherbst: PPCI.EXP_LNK_CMD_STA shows the current status
05:02 mupuf: some fancy hw have asynchronous command submission ... construct a list of operations to do in one big buffer and send it to the gpu
05:03 mupuf: now try to find a reclocking operation that increases the link speed or decreases it
05:03 mupuf: it is possible that it would be as simple as a write to PPCI.EXP_LNK_CMD_STA
05:03 karolherbst: yeah, I have to check where I get a 5_0 or 8_0 status
05:03 karolherbst: ...
05:04 karolherbst: now I know why grep on demmio doesn't work
05:04 karolherbst: or not that simple at least
05:04 mupuf: colors?
05:04 mupuf: you can disable colors
05:04 mupuf: I always less the output of demmio
05:05 mupuf: and use the search function of less
05:05 karolherbst: I see
05:05 mupuf: when I deal with a mmiotrace a lot, I also decompress it
05:05 mupuf: it is much faster for searches
05:05 karolherbst: I have lines like this: PPCI.EXP_LNK_CMD_STA => { CMD = { ASPMC = 0 | CCC | CLOCKPM } | STA = { SPEED = 5_0GT | WIDTH = 16 | SL_CLK } }
05:05 mupuf: that's a read
05:05 karolherbst: yeah
05:06 karolherbst: https://gist.github.com/karolherbst/84e49016cf3d7e6b6ad4 ?
05:07 karolherbst: ohh
05:07 karolherbst: I see a pattern
05:07 karolherbst: maybe not
05:08 karolherbst: 197.202728 MMIO32 R 0x088088 0x11030140 PPCI.EXP_LNK_CMD_STA => { CMD = { ASPMC = 0 | CCC | CLOCKPM } | STA = { SPEED = 0x3 | WIDTH = 16 | SL_CLK } } this is 8_0 ?
05:08 karolherbst: most likely
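[Editor's note] The question above can be checked by decoding the dword by hand. The sketch below is not the demmio patch; it only assumes the standard PCIe layout, with Link Control in the low 16 bits and Link Status in the high 16 bits. The function name and the dictionary keys are made up for illustration, loosely following the field names demmio prints.

```python
# Decode a 32-bit PCIe Link Control / Link Status dword, such as the value
# demmio prints for PPCI.EXP_LNK_CMD_STA.
# Low 16 bits: Link Control. High 16 bits: Link Status.

SPEEDS = {1: "2_5GT", 2: "5_0GT", 3: "8_0GT"}  # Link Status speed codes


def decode_lnk_cmd_sta(value):
    cmd = value & 0xFFFF          # Link Control
    sta = (value >> 16) & 0xFFFF  # Link Status
    speed_code = sta & 0xF
    return {
        "aspmc": cmd & 0x3,                  # ASPM control, bits 1:0
        "ccc": bool(cmd & (1 << 6)),         # common clock configuration
        "clockpm": bool(cmd & (1 << 8)),     # clock power management enable
        "speed": SPEEDS.get(speed_code, hex(speed_code)),
        "width": (sta >> 4) & 0x3F,          # negotiated link width
        "sl_clk": bool(sta & (1 << 12)),     # slot clock configuration
    }


print(decode_lnk_cmd_sta(0x11030140))
```

For 0x11030140 the speed code is 3, which is 8.0 GT/s, and the width field is 16, matching the rest of the decoded line.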
05:08 mupuf: karolherbst: http://pastebin.com/raw.php?i=ez5aLDFJ <-- on my maxwell card
05:08 karolherbst: :)
05:09 karolherbst: okay, patch for demmio coming
05:09 mupuf: what???
05:09 mupuf: what patch?
05:09 martm: mupuf: hi
05:10 mupuf: martm: hi
05:10 mupuf: karolherbst: the register 88150 seems like a status register
05:10 mupuf: or a force refresh or something
05:10 martm: mupuf: is it possible to access physical ram in gpu agnostic way map it to cpu address space, so that the gpu would know that those pages can't be used by gpu anymore?
05:11 mupuf: martm: no, you would need to change the gpu memory allocator
05:11 martm: hmm, yeah ok that is what thought
05:11 mupuf: karolherbst: the relevant writes are probably above
05:12 mupuf: the write to the register 0x088150 may just be to force the PCIe circuitry to go in and out of usage
05:12 karolherbst: preparing patch :)
05:12 mupuf: since everything is power and clock gated as much as possible
05:12 mupuf: karolherbst: what patch?
05:12 karolherbst: displaying 8_0 pcie speed
05:12 karolherbst: in demmio
05:13 karolherbst: pretty minor and unimportant
05:13 mupuf: AH!
05:13 mupuf: ok, that I accept as valid
05:13 mupuf: imirkin: hey, did you take the mmiotrace you wanted?
05:14 karolherbst: mupuf: https://github.com/karolherbst/envytools/commit/73634e92c7d9adb86580505ced24190e0612121d
05:16 mupuf: well, it sounds plausible :D
05:16 karolherbst: :D
05:16 mupuf: did you check with what nvidia returns in the system settings?
05:16 karolherbst: yeah
05:16 karolherbst: always checked with lspci and nvidia-settings
05:17 mupuf: very well, let me push it then
05:17 karolherbst: the demmio output also has now 8_0 values
05:17 karolherbst: it had 0x3 before
05:18 karolherbst: mupuf: makes sense, doesn't it? https://gist.github.com/karolherbst/d694d50867d74e413c26
05:19 mupuf: karolherbst: can you pastebin your patch? Can't find a way to get it from github
05:19 karolherbst: .patch
05:20 mupuf: thanks
05:20 mupuf: pretty clever, works great with wget
05:21 martm: hmm, can swiotlb be used to make sure that video memory in gpu virtual address space is physically contiguous?
05:21 karolherbst: mupuf: it also works with PRs
05:21 mupuf: only the TTM allocator could make sure of this ... but that would be a pain
05:21 karolherbst: mupuf: https://patch-diff.githubusercontent.com/raw/envytools/envytools/pull/11.patch ;)
05:22 mupuf: and it would also not be foolproof
05:22 mupuf: martm: you may be interested in glisse's patchset
05:22 mupuf: https://lkml.org/lkml/2015/1/5/715
05:22 karolherbst: mupuf: is it enough to check the PPCI stuff or should there be more
05:23 mupuf: everything should be in the PPCI range
05:23 mupuf: which is smaller than needed, apparently
05:23 mupuf: martm: what is your goal?
05:29 martm: mupuf: i want to make a big allocation in vram, to force a context to the location where i could read it easily, so it would be in physical location which i roughly know, then the script learns the context and knows where the appropriate addresses are located, and then start sending the opcodes via vbo, which is bit geeky
05:30 mupuf: and you want to do that with the proprietary driver, right?
05:30 martm: yeah
05:30 mupuf: well, I doubt this will ever succeed
05:31 mupuf: and why not simply use the peephole to do this?
05:31 martm: hmm, what is it, some pointers?
05:31 mupuf: I mean, you can access the entire memory of the gpu through a PCIe ram window
05:32 mupuf: if you are worried about finding where the blob allocates its contexts in memory, it should not be too hard to find
05:32 mupuf: and it should be a constant address to begin with
05:32 martm: mupuf: yeah but its hard to calibrate the whole vram, would want to do it roughly based of some quite new context, that only has some data in it
05:32 mupuf: you can read back the address of the context from a known location
05:33 mupuf: there is hw-based context management
05:33 mupuf: and the list of channels (graphics contexts) is available through MMIO
05:33 martm: in fact yeah this allocation stuff, is where i am not an expert of, i am not worried about that i can not send the context via vbo, based of open source code at least
05:34 mupuf: why do you want to do this via vbo?
05:34 mupuf: you know that we reverse engineered the IOCTL interface of the blob?
05:34 mupuf: https://github.com/shinpei0208/gdev everything you need
05:35 mupuf: the ioctls may have changed
05:36 mupuf: there is no way your approach could be done nicely. Using the ioctls is much easier and fool-proof
05:36 mupuf: and it will be gpu-independent
05:38 martm: because ibo or vbo is high level interface for it, and in source code it seems they all stuff that next to context, basically into context, at least index buffers, so its kinda easy and gpu agnostic way to try to do some vulkan like experiments:)
05:38 martm: but i am trying to grasp your earlier sentences at the moment
05:38 karolherbst: mupuf: okay, I think I am nearly there
05:39 karolherbst: found a 8_0 => 5_0 and a 5_0 => 2_5 speed change with the same commands in between but one or two values are different
05:39 mupuf: karolherbst: brilliant, put that to pastebin!
05:39 karolherbst: wait
05:39 karolherbst: this is down speed though
05:40 karolherbst: up speed seems to be different :/
05:40 mupuf: still, put it to pastebin, to save it
05:40 karolherbst: mupuf: https://gist.github.com/karolherbst/4c2400f85ee54c4aa11c
05:40 karolherbst: this pattern is damn strong
05:40 mupuf: martm: well, you need to understand one thing, the blob does userspace-based command submission
05:40 karolherbst: :D
05:41 karolherbst: line 17 is different
05:41 mupuf: but you still need to make some calls to the driver to allocate a context and memory
05:41 martm: mupuf: still, would you elaborate on how exactly I would know the context address? it seemed quite interesting, but knowledge-wise it is way out of my league
05:41 martm: ?
05:41 karolherbst: but its a read :/
05:41 karolherbst: damn
05:41 mupuf: and no idea how pushing commands works, but the gdev code does that
05:42 mupuf: martm: I can elaborate, but I really don't think you will go anywhere close to what you want
05:42 mupuf: and using the gl api to write to a command buffer sounds completely nuts
05:42 mupuf: just go to a lower-level interface
05:43 mupuf: I would even say that you should just use nouveau for your experiment
05:43 mupuf: this is a serious project you are talking about
05:43 karolherbst: mupuf: updated: https://gist.github.com/karolherbst/4c2400f85ee54c4aa11c
05:43 martm: yeah with nouveau and all drm drivers, its not a problem to write a patch
05:43 karolherbst: there should be _something_ useful in them
05:44 karolherbst: ahhh
05:44 karolherbst: I think, yes
05:44 karolherbst: yeah
05:44 mupuf: a patch :D That will take a lot more just to implement vulkan's command submission stuff. But hey, how did you get access to it?
05:45 karolherbst: mupuf: look at lines 12-18 in file1 and file2
05:45 karolherbst: first is 5_0 to 2_5
05:45 karolherbst: second is 8_0 to 5_0
05:46 mupuf: karolherbst: thanks
05:46 karolherbst: 1 and 3 is the same
05:46 karolherbst: different time
05:46 martm: mupuf: i think it should be possible with windows based memory manager, there you could map physical pages in unified way, and theyd be locked from graphics use, then if context management is intelligent one round about knows where it is stuffed
05:47 martm: but in linux i do not know how to do that or android
05:47 karolherbst: okay, got it
05:48 karolherbst: W 0x08c040 0x80009000 0x8c040 <= value: {2_5: 0x80089000, 5_0: 0x80049000, 8_0: 0x80009000}
05:48 karolherbst: R 0x08c040 0x80009000 0x8c040 should read above value
05:48 karolherbst: mhh
05:48 karolherbst: W 0x08c040 {2_5: 0x80089000, 5_0: 0x80049000, 8_0: 0x80009000} 0x8c040
05:48 karolherbst: R 0x08c040 above value 0x8c040
05:48 karolherbst: W 0x08c040 above +1 0x8c040
05:48 karolherbst: R 0x08c040 above value 0x8c040
05:48 karolherbst: => state change
05:49 karolherbst: mupuf: seems valid?
05:49 karolherbst: there is a SLEEP 0.150000ms in between the last write and read on down speed though
05:50 karolherbst: no, 0.15 to 2_5
05:50 karolherbst: 0.115 to 5_0
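[Editor's note] The write / read / write+1 / read pattern described above can be replayed against a stand-in register file. This is only a sketch: FakeMMIO and replay_speed_change are hypothetical names, the values are the ones from karolherbst's trace and may differ on other boards, and real hardware access (for example through nvapoke) is not modelled here.

```python
import time

REG_LINK_CTL = 0x08C040  # register observed in the trace

# Values seen in the nve6 trace; treat them as board-specific.
TRACE_VALUES = {"2_5": 0x80089000, "5_0": 0x80049000, "8_0": 0x80009000}


class FakeMMIO:
    """Dict-backed stand-in for real register access (hypothetical)."""

    def __init__(self):
        self.regs = {}
        self.log = []

    def write32(self, addr, value):
        self.log.append(("W", addr, value))
        self.regs[addr] = value

    def read32(self, addr):
        value = self.regs.get(addr, 0)
        self.log.append(("R", addr, value))
        return value


def replay_speed_change(mmio, speed, settle_s=0.00015):
    """Replay the four-step pattern from the trace for one target speed."""
    value = TRACE_VALUES[speed]
    mmio.write32(REG_LINK_CTL, value)      # W: target value
    mmio.read32(REG_LINK_CTL)              # R: read it back
    mmio.write32(REG_LINK_CTL, value + 1)  # W: "above +1"
    time.sleep(settle_s)                   # trace shows ~0.115-0.15 ms here
    return mmio.read32(REG_LINK_CTL)       # R: final read


mmio = FakeMMIO()
print(hex(replay_speed_change(mmio, "8_0")))
```

The delay defaults to the 0.15 ms seen on the downward change to 2_5; whether the hardware actually needs it is not established by the trace.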
05:51 mupuf: karolherbst: I got my maxwell to go from 2.5GT/s to 5.0GT/s
05:52 karolherbst: :)
05:52 karolherbst: nice
05:52 mupuf: nvapoke 0x08c040 0x80045801 <-- that was apparently enough for me
05:52 karolherbst: :D
05:52 karolherbst: okay
05:52 karolherbst: will try that
05:52 karolherbst: 8.0 why not
05:52 mupuf: you got 8.0?
05:52 mupuf: this 0x1 at the end is a commit bit
05:53 karolherbst: I will try
05:53 karolherbst: mupuf: https://gist.github.com/karolherbst/4c2400f85ee54c4aa11c#file-a-summary
05:53 karolherbst: this should sum it up
05:53 mupuf: and 0x08b8c0's lowest byte seems to indicate something
05:53 karolherbst: yeah, got 8.0
05:53 karolherbst: nvapoke 0x08c040 0x80009001
05:54 karolherbst: now some testing
05:54 karolherbst: this speed :O
05:55 karolherbst: 414.321398 frames/sec - 812.998019 Mpixels/sec glxspheres on 8.0
05:55 karolherbst: 239.625527 frames/sec - 470.202794 Mpixels/sec with 2.5
05:55 karolherbst: glxspheres just nearly doubled
05:55 karolherbst: just by PCIe link speed
05:55 karolherbst: on 07 pstate
05:56 mupuf: yeah, super simple to change the bw
05:57 karolherbst: 684.428353 frames/sec - 1343.012691 Mpixels/sec
05:57 karolherbst: 0f 8_0
05:57 mupuf: I got up and down
05:57 karolherbst: now, that's some serious performance :)
05:57 mupuf: lol, but that's a benchmark, not a game
05:58 karolherbst: yeah
05:58 mupuf: anyway, adding support for it may not be too hard
05:58 mupuf: we need to understand those values though
05:58 mupuf: and how to compute them
05:58 mupuf: bit 0 is the commit
05:58 karolherbst: will try talos
05:58 karolherbst: 0f was pretty stable though
05:59 mupuf: unless you are vram-limited, it should not make any difference
06:00 karolherbst: just let us hope it does
06:00 karolherbst: :D
06:02 karolherbst: it does
06:02 karolherbst: 16.2 fps => 18.4 fps
06:02 karolherbst: from 2_5 to 5_0
06:03 martm: mupuf: just one last thing, may you dig out the link for this glisse patchset for ttm?
06:03 martm: up
06:03 karolherbst: 20 fps on 8_0
06:03 karolherbst: mupuf: +25% performance?
06:03 karolherbst: :D
06:04 karolherbst: okay, 0f isn't more stable sadly
06:04 karolherbst: but still
06:05 karolherbst: imirkin: finished :)
06:08 karolherbst: mupuf: what's strange, though, is that I'd think 3GB of vram is usually enough :/
06:09 mupuf: karolherbst: interesting!
06:09 karolherbst: yeah
06:09 karolherbst: testing borderlands now
06:10 mupuf: We definitely need to get that included in nouveau then
06:10 karolherbst: yeah
06:10 karolherbst: will run gputest too
06:10 mupuf: previous testing didn't show any speed improvement
06:10 karolherbst: maybe it depends on the games
06:10 karolherbst: antichamber may also have a big impact
06:10 karolherbst: maybe none
06:10 karolherbst: it was only gpu memory limited
06:10 mupuf: martm: https://lkml.org/lkml/2015/1/5/715
06:11 martm: thanks
06:13 karolherbst: mupuf: okay
06:13 karolherbst: borderlands from 24 avg to 26 avg
06:13 karolherbst: so less than 10% there
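[Editor's note] The relative gains quoted so far can be recomputed from the raw numbers in the log. A small sketch, using only figures posted above; the helper name gain_percent is made up, and the labels are the editor's shorthand.

```python
def gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


# (label, result before, result after), all taken from the log above
results = [
    ("glxspheres, 07 pstate, 2_5 -> 8_0", 239.625527, 414.321398),
    ("talos, 2_5 -> 5_0", 16.2, 18.4),
    ("talos, 2_5 -> 8_0", 16.2, 20.0),
    ("borderlands, avg fps", 24.0, 26.0),
]

for name, before, after in results:
    print(f"{name}: {gain_percent(before, after):+.1f}%")
```

This gives roughly +73% for glxspheres, about +14% and +23% for talos, and about +8% for borderlands, in line with the rounded figures mentioned in the conversation.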
06:15 mupuf: not bad again!
06:16 karolherbst: mupuf: antichamber no difference at 07, around <10% on 0a
06:16 karolherbst: from 37 to 40 fps
06:16 karolherbst: but this game is crazy anyway
06:16 karolherbst: 20fps at 07
06:16 karolherbst: :D
06:16 karolherbst: gpc doesn't matter
06:17 karolherbst: now let's check gputest
06:20 karolherbst: mupuf: are you doing some tests, too? not that I have a strange setup and it only works here
06:20 mupuf: nope, not running any test
06:21 karolherbst: but I think it would make sense to have the same interface for pcie speed states as for pstates
06:21 karolherbst: and cstates
06:21 mupuf: glxgears does not get faster :D
06:21 mupuf: I can try xonotic if you want
06:22 karolherbst: glxspheres should speed up a lot
06:22 karolherbst: yeah xonotic would be nice
06:22 karolherbst: but I doubt there will be much change
06:22 mupuf: hmm, the settings must be a bit high
06:22 karolherbst: yeah
06:22 karolherbst: I always used pretty high ones
06:23 karolherbst: talos had gpu memory set to ultra
06:23 mupuf: I am at 25 fps in average right now
06:23 mupuf: at 5.0GT
06:23 karolherbst: pstate?
06:23 mupuf: the lowest one
06:23 karolherbst: I noticed, that 07 doesn't care much about that
06:23 mupuf: I cannot reclock this gpu
06:23 karolherbst: :/
06:23 karolherbst: maybe it helps a little though
06:24 karolherbst: how much fps at 2_5?
06:25 karolherbst: gpu test triangle nearly doubled
06:25 karolherbst: mmhh
06:25 karolherbst: no
06:25 mupuf: yeah, it spends more time at 24fps average instead of 25
06:25 mupuf: meh
06:25 karolherbst: not that much, but +50%
06:25 mupuf: there are higher priorities for this gpu anyway :D
06:25 karolherbst: :D
06:26 karolherbst: but if that works, it should work for a lot of cards
06:26 mupuf: like, finding how to change the voltage
06:26 mupuf: anyway, your values and mine are different
06:26 mupuf: I wonder if it is the same values per families or if it depends on the vbios
06:26 mupuf: or if we should be able to compute them from the state
06:27 karolherbst: mhhh
06:27 karolherbst: yeah
06:29 martm: hmm, that hmm, makes wanna hmm
06:29 mupuf: imirkin: Hey, I get a lot of "nouveau: kernel rejected pushbuf: Device or resource busy" with maxwell
06:34 karolherbst: mupuf: so now I should check how to get these addresses and values?
06:35 mupuf: yeah, I'll help you
06:35 karolherbst: furmark 624 points at 2_5
06:35 mupuf: busy talking on #intel-gfx about a benchmarking tool I wrote, will start by comparing other cards' values first
06:35 mupuf: send me your ssh public key
06:36 mupuf: and I will give you access to our vbios repo database
06:36 mupuf: containing a ton of mmiotraces
06:36 karolherbst: ...
06:36 karolherbst: yeah okay how, allthough public is public
06:37 karolherbst: 640 on 8_0 :/
06:37 mupuf: I can't parse your sentence
06:37 mupuf: what do you mean?
06:37 karolherbst: I am thinking about of how to send you the public key
06:38 karolherbst: mhhh
06:38 karolherbst: its a AES-128-CBC key only anyway
06:38 karolherbst: don't think I use it for anything currently
06:38 karolherbst: mupuf: is pm in irc fine?
06:39 mupuf: karolherbst: it is a public key, sure!
06:39 mupuf: ssh key, right
06:39 mupuf: not gpg
06:39 mupuf: and this is definitely not AES
06:39 mupuf: it should be rsa or dsa
06:39 mupuf: or another asymmetric algo
06:40 karolherbst: mhhh
06:40 karolherbst: maybe I messed it up
06:40 karolherbst: will regenerate keys then
06:40 mupuf: probable
06:40 mupuf: you can send me your public key that you use for github
06:40 mupuf: that is fine
06:41 mupuf: no need to have a key per service
06:41 karolherbst: I don't use any for github
06:41 karolherbst: application token
06:42 karolherbst: no, "Personal access tokens" is the name in github
06:45 mupuf: ah, ok
06:45 mupuf: well, I do not support that, sorry :D
06:45 karolherbst: I see
06:46 karolherbst: gpu test plot: 13000 at 2_5, 22300 at 8_0
06:47 mupuf: eww, that is bad
06:47 mupuf: can you jump from 2.5 to 8.0 without a crash?
06:48 karolherbst: yeah
06:48 karolherbst: usually
06:49 karolherbst: never had a crash now
06:50 mupuf: told you so :D
06:50 karolherbst: ?
06:50 karolherbst: :D
06:51 karolherbst: idea
06:52 karolherbst: mhh
06:53 karolherbst: the first number seems to be important though
06:54 karolherbst: the card really didn't like it to be poked with 0x70009000
06:55 karolherbst: I think I better reboot
06:55 imirkin: mupuf: mmt trace. and i got one, but i'll want more and i'll need to test nouveau fixes. so not done with it yet, sorry
06:56 imirkin: mupuf: but if you want to swap it out and put another card in, that's fine... i can do this later
06:56 mupuf: imirkin: I'll put the nv50 back in when I am done
06:57 imirkin: mupuf: i got the maxwell issues too. i suspect that gnurou's fix may be required.
06:58 mupuf: oh, I can test that
07:02 imirkin: normally you just alter the relevant fields, and don't touch other stuff
07:03 mupuf: agreed
07:04 karolherbst: mupuf: any specific file name or directory name required?
07:04 mupuf: yes
07:04 mupuf: of course :D
07:05 karolherbst: I mean I will put in nve6 and such
07:05 karolherbst: but does the trace need a special name?
07:06 mupuf: ah!
07:06 mupuf: mmiotrace.txt.xz makes sense
07:07 mupuf: or nve6.txt.xz
07:07 mupuf: your call
07:07 mupuf: it does not matter :)
07:09 karolherbst: done
07:10 mupuf: adding your vbios too would be appreciated
07:11 mupuf: and the output of nvapeek 101000 > strap_peek
07:11 mupuf: oh, and open strap_peek and get rid of the "101000: "
07:17 imirkin: would be cool to add the pci width programming to the pstate stuff -- iirc the vbios specifies what link a particular pstate ought to use
07:17 mupuf: imirkin: yes
07:17 mupuf: that's what we are trying to figure out
07:17 karolherbst: mupuf: should I compress the bios?
07:18 mupuf: nope
07:18 karolherbst: imirkin: currently I added something like that for cstates but with a different file
07:18 karolherbst: but well
07:18 karolherbst: in the end it shouldn't be needed anymore
07:19 imirkin: well, i think it's fine to expose that stuff too
07:19 karolherbst: yeah
07:19 imirkin: but when programming the pstate, it should pick the right default
07:19 karolherbst: yeah
07:19 karolherbst: this works now
07:19 karolherbst: if you want, you can try it out yourself
07:21 imirkin: that's ok, i don't need your help to crash my box -- i can handle that just fine all by myself :p
07:21 karolherbst: :D
07:21 karolherbst: why should it crash?
07:22 imirkin: because it usually does
07:22 mupuf: karolherbst: if it does not crash, you are not trying hard-enough :D
07:22 karolherbst: :D
07:23 mupuf: so, not the entire register is double buffered
07:23 mupuf: darn it
07:23 karolherbst: ?
07:23 mupuf: finding the bitfields is going to be interesting
07:23 karolherbst: ahh
07:24 mupuf: so freaking annoying to have to reboot after each test
07:31 karolherbst: mupuf: do you know whats also nice: nvidia does switch pcie speed always when also switching pstates
07:31 mupuf: right, it is what imirkin was saying
07:32 mupuf: the values for your nve6 and mine are the same
07:32 mupuf: except for the 8GT/s I guess
07:32 mupuf: yepo
07:32 mupuf: sounds good
07:32 mupuf: let's check on other gpus now
07:34 karolherbst: there is this strange write into 0x8b8c0
07:34 karolherbst: don't know what this does
07:36 mupuf: yeah, there are a lot of writes we do not understand
07:36 imirkin: 8GT/s is PCIe 3.0 or something?
07:37 karolherbst: yeah
07:37 karolherbst: imirkin: talos speed up by 25% for me :)
07:37 imirkin: pcie 4.0 = 16GT/s
07:37 imirkin: although it's not a thing quite yet
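[Editor's note] For reference, the theoretical per-direction bandwidth behind these link speeds follows from the transfer rate and the line encoding (8b/10b for 2.5 and 5.0 GT/s, 128b/130b for 8.0 and 16.0 GT/s). A sketch of the arithmetic; the table and function names are made up, and protocol overhead above the line encoding is ignored.

```python
# speed label -> (transfer rate in GT/s, payload bits, total bits per symbol group)
GENERATIONS = {
    "2_5": (2.5, 8, 10),      # PCIe 1.x, 8b/10b encoding
    "5_0": (5.0, 8, 10),      # PCIe 2.0, 8b/10b encoding
    "8_0": (8.0, 128, 130),   # PCIe 3.0, 128b/130b encoding
    "16_0": (16.0, 128, 130), # PCIe 4.0, 128b/130b encoding
}


def bandwidth_gb_per_s(speed, lanes=16):
    """Theoretical one-direction bandwidth in GB/s for a given link speed."""
    rate, payload, total = GENERATIONS[speed]
    gbit_per_lane = rate * payload / total  # usable Gbit/s per lane
    return gbit_per_lane / 8.0 * lanes      # bytes, summed over all lanes


for speed in GENERATIONS:
    print(f"{speed}: {bandwidth_gb_per_s(speed):.2f} GB/s at x16")
```

On an x16 link this works out to about 4 GB/s at 2.5 GT/s, 8 GB/s at 5.0 GT/s and just under 16 GB/s at 8.0 GT/s.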
07:37 karolherbst: yeah
07:38 karolherbst: but the pcie 3.0 results are promising
07:38 karolherbst: gputest plot +50%
07:38 imirkin: neat
07:38 karolherbst: borderlands +10%
07:39 karolherbst: xonotic like +0.5% :D
07:39 karolherbst: no, I think it really only helps if the bus is really a small bottleneck
07:39 karolherbst: but seems to be the case often enough
07:40 karolherbst: and the values are pretty easy to find in the logs
07:41 karolherbst: imirkin: that's what the blob does: https://gist.github.com/karolherbst/4c2400f85ee54c4aa11c#file-a-summary
07:41 mupuf: so far, it seems pretty trivial to compute the numbers
07:42 karolherbst: nice
07:45 mupuf: here we go, I wrote a script that dumps all the values taken by 8c040 on all the DB
07:46 mupuf:is lazy
07:46 karolherbst: :)
07:46 mupuf: ls
07:47 karolherbst: I will check now if there is some kind of relation between pstates and pcie speed
07:48 mupuf: hmm, not a lot of GPUs do the pcie changes :o
07:48 mupuf: well, we do not have a lot of fermi+ mmiotraces :s
07:48 karolherbst: I see
07:49 mupuf: but hey, it really is simple to compute the value that needs to be set
07:51 karolherbst: talos at 07 pstate, ultra/ultra-buggy/ultra: 2_5: 5.2fps, 5_0:
07:51 mupuf: mask = c0000
07:51 karolherbst: ...
07:51 karolherbst: talos at 07 pstate, ultra/ultra-buggy/ultra: 2_5: 5.2fps, 5_0: 5.52 fps, 8_0: 5.6 fps
07:52 mupuf: and simply write 8 if you want PCIe 2.5, 4 for 5.0 and 0 for 8.0
07:52 karolherbst: yeah
07:52 mupuf: this works across the boards
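[Editor's note] The rule stated above (mask 0xc0000; 8, 4 or 0 for 2.5, 5.0 or 8.0 GT/s) together with the commit bit in bit 0 can be written as a read-modify-write helper. This is a sketch, not nouveau code: the names are made up, and the meaning of every bit outside the mask and the commit bit is unknown, so the helper leaves them untouched.

```python
SPEED_MASK = 0x000C0000  # "mask = c0000"
SPEED_FIELD = {"2_5": 0x8, "5_0": 0x4, "8_0": 0x0}  # hex digit at bits 16..19
COMMIT_BIT = 0x1         # bit 0, described in the log as the commit bit


def link_ctl_value(current, speed, commit=True):
    """Return the value to write to 0x08c040 for the requested link speed.

    `current` is the value read back from the register; only the speed
    field and (optionally) the commit bit are changed.
    """
    value = (current & ~SPEED_MASK) | (SPEED_FIELD[speed] << 16)
    if commit:
        value |= COMMIT_BIT
    return value & 0xFFFFFFFF


# Starting from the 8_0 value in the nve6 trace:
print(hex(link_ctl_value(0x80009000, "2_5", commit=False)))  # speed field only
print(hex(link_ctl_value(0x80089000, "8_0")))                # with commit bit
```

Applied to the values quoted earlier, this reproduces 0x80089000, 0x80049000 and 0x80009000 for the three speeds, and 0x80009001 for the committed 8.0 GT/s write.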
07:54 karolherbst: talos at 0a pstate, ultra/ultra-buggy/ultra: 2_5: 9.2 fps, 5_0: 10.4 fps, 8_0: 11.2 fps
07:54 karolherbst: mupuf: nice
07:54 karolherbst: talos really shows nicely how gpc bottlenecked it is at 07
07:54 karolherbst: and how pcie bandwidth is more important at 0a
07:55 RSpliet: it shouldn't take much trying to find the right bits, but I can't help but wondering whether we should inform the PCIe subsystem as well...
07:55 karolherbst: RSpliet: I think the card does it
07:56 karolherbst: I think
07:56 karolherbst: at least lspci is happy
07:56 karolherbst: imirkin: with the faulty options enabled: 4.8 at 2_5 => 6.2 at 8_0
07:57 karolherbst: but that doesn't say much though
07:57 RSpliet: you should look at game load times (texture uploads) too
07:57 mupuf: RSpliet: yeah, there are a bunch of surrounding writes
07:57 karolherbst: RSpliet: will try to do that
07:58 RSpliet: I never went the mile of figuring this out, but I'm sure it can make quite a difference on cards as old as 2nd gen Tesla
07:58 mupuf: the difference should be quite high
07:58 mupuf: RSpliet: it did
07:59 mupuf: xexaxo worked on the lane changes back in the days
07:59 mupuf: and that was messy
07:59 mupuf: changing the clock is apparently trivial
07:59 mupuf: will try my regular stress test
08:07 mupuf: well, seems pretty stable to me
08:07 karolherbst: yeah, I've got the same feeling
08:07 mupuf: as in, changing thousands of times per second the clock speed does not crash when running xonotic
08:07 mupuf: this was ... unexpected!
08:07 karolherbst: :D
08:08 mupuf: this almost feels like intel's hw where you just say which frequency you want and a microcode will set it for you :)
08:08 karolherbst: :)
08:08 mupuf: so, I guess I should write a patch for nouveau then
08:09 mupuf: first, one for envytools
08:09 karolherbst: interface like pstate?
08:09 mupuf: since I at least know the meaning of 3 bits
08:09 karolherbst: nice
08:09 mupuf: no, pstate will change the speed as necessary
08:09 mupuf: as in, it should already be indicated in the pstate vbios table which clock needs to be set
08:09 karolherbst: I mean will there be a new interface or how do you want to implement it in nouveau?
08:10 RSpliet: karolherbst: no interface is needed
08:10 mupuf: no, no new interface. Nouveau will just set the PCIe clock when reclocking to a certain pstate
08:10 karolherbst: okay
08:10 karolherbst: so like the blob does
08:10 mupuf: yes
08:11 karolherbst: mhh
08:11 karolherbst: then no 8_0 on 0a :(
08:11 karolherbst: sad
08:11 RSpliet: you can always hack up your vbios if you really want to
08:11 karolherbst: wait
08:11 karolherbst: mhh
08:11 mupuf: and ask nouveau to read your vbios from a file instead of from PRAMIN
08:12 karolherbst: my vbios also just tells pstate 0f: 5_0 pcie
08:12 karolherbst: "-- ID 0xf Voltage entry 4 PCIe link width 255 5.0 GT/s --"
08:12 RSpliet: yeah, not sure if the vbios readout method supports the right encoding for 8_0 yet
08:12 karolherbst: I think the issue was, that the pcie3.0 feature was experimental at first
08:12 karolherbst: ahh okay
08:13 mupuf: karolherbst: hmm, the blob sets the 8.0 GT/s?
08:13 karolherbst: yeah
08:13 karolherbst: but I think with older version I needed a flag
08:13 mupuf: need to add support for it in nvbios too
08:13 mupuf: ah, because your vbios does not seem to indicate to use the 8.0 GT/s
08:13 karolherbst: yeah
08:14 karolherbst: I checked today without the options
08:14 mupuf: one thing at a time
08:14 karolherbst: but nvidia just uses 8_0
08:14 martm: glisse: can we talk about your patchset somewhere?
08:24 martm: more importantly do you think its possible to map minor bit of vram and a big amount of system memory with hmm device, so that gpu would understand that this is vram when i create the context, then i'd know it where stuff ends up?
08:24 martm: in
08:28 mupuf: karolherbst: when did pcie 2.0 get introduced?
08:29 mupuf: ok, a long time ago
08:29 mupuf: the reg I see seems to be for nve4+
08:29 karolherbst: 8000?
08:29 mupuf: you mean geforce 8?
08:29 karolherbst: G9x
08:29 karolherbst: G8x doesn't have it
08:30 karolherbst: yeah, Geforce 8000+
08:30 mupuf: interesting, my GT 750 is supposed to be PCIe 8.0
08:30 mupuf: err, 3.0
08:30 mupuf: but it never goes to 8.0 GT/s
08:31 imirkin_: what about your mobo?
08:31 mupuf: you may be right, they may only have enabled this later
08:31 mupuf: imirkin: oh, right :D
08:31 mupuf: let me check
08:31 karolherbst: yeah
08:32 mupuf: oh, actually, it was one of the first PCIe 3.0 mobos
08:32 mupuf: lucky me
08:32 imirkin_: prolly only 1 of the pcie slots has it
08:32 karolherbst: so 8_0 works?
08:32 mupuf: Z68S-G43-G3
08:32 imirkin_: ugh. msi. i'm never touching their shit again.
08:33 mupuf: karolherbst: when I set it, it just defaults to 5 GT/s
08:33 imirkin_: mupuf: Gen3 (1x16)
08:33 mupuf: imirkin: me neither, because of their bios update system
08:33 karolherbst: what does lspci tell you?
08:33 mupuf: karolherbst: lspci reads the same reg I am reading
08:33 imirkin_: so only one of the x16 slots has pcie3
08:33 mupuf: I guess the top one
08:33 imirkin_: usually yeah
08:34 karolherbst: should be on the board
08:34 mupuf: so ... I guess I should check with the blob
08:34 imirkin_: http://www.msi.com/media/product/five_pictures2_2467_20110928115335.jpg62405b38c58fe0f07fcef2367d8a9ba1/1024.png
08:34 soreau: imirkin: What problems did you have with MSI?
08:34 imirkin_: i don't see it written =/
08:35 imirkin_: soreau: they sold a GT210 that was advertised as DX 10.1 so i assume it was a GT21x, but in actuality it was a G84
08:35 karolherbst: its so much the biggest one :D
08:35 imirkin_: so i'm never touching their shit again.
08:35 karolherbst: :/
08:35 imirkin_: er, make that a "GeForce GT 210"
08:36 mupuf: ah, right, the time of the geforce 8/9
08:36 mupuf: nvidia got a lot of criticism for this
08:36 imirkin_: well, i tried to be careful. in hindsight, the clock frequency should have tipped me off, but i didn't look carefully enough.
08:39 RSpliet: karolherbst: on NVA3/5/8 there seems to be more black magic than what your patch describes
08:39 RSpliet: something related to bus width
08:39 karolherbst: mhh
08:40 RSpliet: and the register layout is different ;-)
08:40 karolherbst: yeah
08:41 karolherbst: I mean, I did only test it on my nve6 card yet
08:41 imirkin_: why keep things the same when you can change them from generation to generation
08:41 karolherbst: mupuf is trying them all ;)
08:41 imirkin_: that's the logic that applies to argument order for the tex instruction, i don't see why it shouldn't apply to everything else :)
08:43 karolherbst: RSpliet: but what kind of black magic do you mean exactly?
08:43 RSpliet: oh I haven't found the trace bits for it
08:43 RSpliet: but just replaying logic I found in an old trace, that looks close enough to what you reproduced, doesn't make it Go Faster (tm)
08:44 karolherbst: but do you find any reads where 5_0 or 2_5 is stated?
08:44 RSpliet: yes
08:44 karolherbst: mhh
08:44 karolherbst: what does lspci tell you?
08:44 RSpliet: 2,5 all the way
08:44 karolherbst: it should tell you which pcie speed the card is using
08:44 karolherbst: mhh
08:44 karolherbst: cap?
08:44 RSpliet: 5.0
08:44 karolherbst: okay
08:45 karolherbst: maybe you could pastebin the part between 5_0 and 2_5 if its not more than 20 lines
08:45 RSpliet: that's probably not very useful, I'm not looking at trace from this particular card
08:45 RSpliet: (just wanted to find the 3 corresponding regs)
08:45 imirkin_: my GK208 looks like it can also do 8GT/s btw
08:45 karolherbst: :)
08:46 imirkin_: what do i fiddle with to get that?
08:46 imirkin_: (it's not the primary gpu, so i'm a lot more willing to destroy it)
08:46 karolherbst: imirkin_: depends
08:46 karolherbst: do you have a trace?
08:46 imirkin_: mmmmmaybe
08:46 imirkin_: not by the looks of it
08:46 karolherbst: imirkin_: I did something like that: cat mmiotrace.log.demmio | grep -A250 -B250 -e 8_0GT -e 5_0GT -e 2_5GT
08:47 karolherbst: then search for patterns like mine
08:47 karolherbst: but maybe mupuf can tell you how to get the values in a more direct manners already
08:47 imirkin_: ok, so it's not at the point where you've figured out the thing that needs to happen?
08:47 karolherbst: *manner
08:47 karolherbst: we did
08:47 imirkin_: what's the reg again? 8c040?
08:47 karolherbst: yeah
08:47 karolherbst: on my card at least
08:47 karolherbst: you could try a nvapeek
08:47 mupuf: imirkin: yes, let me commit the rnndb part
08:47 karolherbst: should give you 0x80089000
08:47 RSpliet: there's a few link training regs too I think
08:48 imirkin_: i get 40489000
08:48 karolherbst: mhh
08:48 tobijk: karolherbst: you have an nve6 right?
08:48 karolherbst: yes
08:48 karolherbst: imirkin_: maybe 0x40449000
08:48 mupuf: imirkin: means you run at 2.5 GT/s
08:48 karolherbst: will give you 5_0
08:48 karolherbst: and 40409000 8_0
08:48 tobijk: i really should check my nve7 at some point in the future :>
08:48 karolherbst: mupuf: right?
08:48 imirkin_: erm... why does lookup fail so miserably :(
08:49 mupuf: try nvapoke 8c040 40089001
08:49 mupuf: that should set the highest
08:49 imirkin_: lookup -a GK208 8c040 40489000
08:49 mupuf: actually, you were at GT5
08:49 imirkin_: that fails for me =/
08:49 mupuf: not 2.5
08:49 imirkin_: lspci said 2.5
08:49 mupuf: because I have not pushed the patch :D
08:49 karolherbst: wait
08:49 karolherbst: mupuf: he got 40489000
08:49 mupuf: imirkin: ah, right, just like me
08:49 imirkin_: LnkSta: Speed 2.5GT/s, Width x8
08:49 karolherbst: 4048 9000
08:50 imirkin_: (coz it's a stupid gk208, only 8x of it is there in the first place)
08:50 imirkin_: er, x8
08:50 karolherbst: maybe 4040 9000 and then 4040 9001 works
08:50 karolherbst: !
08:50 karolherbst: 404 = 8x
08:50 mupuf: nvapoke 8c40 40049001 --> will set the speed to 5 GT/s
08:50 imirkin_: probably not
08:50 karolherbst: 800=16x ?
08:51 imirkin_: 8c040 maybe? :p
08:51 mupuf: imirkin: yes, sorry
08:51 mupuf: :D
08:51 karolherbst: mupuf: what about the second 4?
08:52 mupuf: karolherbst: it has nothing to do with the speed
08:52 karolherbst: I see
08:52 mupuf: so leave it alone, as imirkin said before
08:52 mupuf: imirkin: the value is double buffered and needs to be committed
08:52 mupuf: the blob then checks the commit bit is gone before continuing
08:53 imirkin_: mupuf: so your claim is that i could do 40409000 and get 8GT/s?
08:53 mupuf: imirkin: does not work on my card
08:53 mupuf: the fuses may disable it
08:53 imirkin_: does lspci say that the dev and link can both do it?
08:53 mupuf: that's why I wanted to check on the blob
08:53 tobijk: meh where to see the link speed? :>
08:54 imirkin_: this is what lspci says for me: http://hastebin.com/qawivesiyo.vhdl
08:54 karolherbst: I always did nvapoke 0x8c040 0x80009000 then nvapoke 0x8c040 0x80009001
08:54 imirkin_: hmmmm i guess the device doesn't care
08:54 imirkin_: or rather, the device doesn't advertise such things
08:55 imirkin_: (at least not in pci config space)
08:55 mupuf: LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
08:55 imirkin_: what about LinkCtl2?
08:55 mupuf: so, I guess I need to set up some regs before the hw lets me do it
08:55 imirkin_: er, LnkCtl2
08:55 mupuf: LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
08:55 imirkin_: aha
08:55 imirkin_: mine says 8GT/s
08:56 imirkin_: don't ask me wtf that field is... but... it's different ;)
08:56 mupuf: 0x088088 says I am at 2.5 GT/s
08:56 tobijk: i guess i should be happy: http://hastebin.com/exeracamud.xml
08:57 imirkin_: PPCI.EXP_LNK_CMD_STA => { CMD = { ASPMC = 0x3 | CCC } | STA = { SPEED = 8_0GT | WIDTH = 8 | SL_CLK } }
08:57 imirkin_: booya!
08:57 mupuf: tobijk: why? Because your card supports all the states?
08:57 karolherbst: mupuf: looks nice after patch
08:57 karolherbst: :)
08:57 mupuf: imirkin: don't want to insult you, but you don't have a large ... width :D
08:57 tobijk: does it not clock up then? (when activated? :>
08:58 mupuf: tobijk: what do you mean?
08:58 mupuf: We are cooking up the support for this
08:58 RSpliet: mupuf: yes, that's what I said 15 mins ago
08:58 RSpliet: there's probably some different black magic for setting the width
08:58 mupuf: RSpliet: the width is black magic, from what I remembered
08:58 imirkin_: 40 -> 46 fps on karolherbst's talos trace
08:59 tobijk: mupuf: ke so the nvapoke does not do something for me
08:59 karolherbst: imirkin_: :)
08:59 imirkin_: mupuf: it's as wide as i can go
08:59 mupuf: :D
08:59 karolherbst: imirkin_: 2_5 to 8_0?
08:59 imirkin_: like i said... stupid GK208
08:59 imirkin_: karolherbst: ya
08:59 karolherbst: mhh
08:59 mupuf: ok, so it does work for you too
08:59 karolherbst: disappointing
08:59 mupuf: tobijk: your turn
08:59 imirkin_: karolherbst: at lowest perf level
08:59 karolherbst: ahhh!
08:59 mupuf: nvapeek 8c040
08:59 karolherbst: 0a should change a _lot_
09:00 mupuf:will try to get 8_0, if the blob allows it
09:00 imirkin_: bleh, didn't load it with pstate=1
09:00 karolherbst: mupuf: NVreg_EnablePCIeGen3=1 :)
09:00 karolherbst: but it may work better without it
09:00 karolherbst: don't know with recent driver
09:01 martm: https://lkml.org/lkml/2015/5/21/764 this gets a bit too complicated to/for me
09:01 imirkin_: karolherbst: btw, i only have 2 perf levels
09:01 imirkin_: nouveau [ CLK][0000:01:00.0] 07: core 405 MHz memory 810 MHz
09:02 imirkin_: nouveau [ CLK][0000:01:00.0] 0f: core 967 MHz memory 2002 MHz
09:02 karolherbst: ....
09:02 karolherbst: 0f is dangerous
09:02 karolherbst: but if you are up to
09:02 karolherbst: might work
09:02 imirkin_: nah, this is ddr3
09:02 imirkin_: it works fine
09:02 karolherbst: ahh
09:02 karolherbst: :)
09:02 imirkin_: although i had the voltage issue too iirc
09:02 karolherbst: how much faster?
09:02 karolherbst: mhh
09:02 imirkin_: nouveau E[ CLK][0000:01:00.0] failed to lower voltage: -22
09:02 karolherbst: imirkin_: but does it report 967Mhz?
09:02 imirkin_: this is from before... right now i don't have pstate=1 :(
09:02 tobijk: ./nvapeek 8c040
09:02 tobijk: 0008c040: 80089000
09:03 karolherbst: tobijk: +1 :)
09:03 imirkin_: tobijk: nuke the second 8
09:03 mupuf: tobijk: nvapoke 8c040 80089001
09:03 imirkin_: nvapoke 8c040 80009001; nvapeek 8c040; nvapeek 88088
09:03 karolherbst: mupuf: can the non commit poke be skipped?
09:03 imirkin_: oh, i only did one poke
09:03 mupuf: karolherbst: no, you need to set it
09:03 imirkin_: and then read it back
09:04 karolherbst: I never read it back
09:04 imirkin_: which i assume posts it
09:04 karolherbst: I just poked twice
09:04 mupuf: imirkin: no, it is the 1 at the end that posts it
09:04 mupuf: that's the commit bit
09:04 imirkin_: hmmmm
09:04 imirkin_: well, i only did the one poke
09:04 imirkin_: with the commit bit set
09:05 imirkin_: and it worked.
09:05 imirkin_: but i did read the reg back
09:05 mupuf: right
09:05 imirkin_: since i assumed the write had to be posted
09:05 mupuf: and when you read, the commit bit is gone
09:05 imirkin_: yeah
09:05 mupuf: as expected
09:05 karolherbst: mhhh
09:05 mupuf: the commit bit takes less than 1us to be cleared
09:05 karolherbst: I need to set the value before commit
09:05 tobijk: mhm
09:05 imirkin_: there are some regs that are buffered, so you have to do *something* afterwards before they take effect
09:05 tobijk: 0008c040: 80009000
09:05 tobijk: 00088088: 11030043
09:06 karolherbst: gpu gone now :D
09:06 imirkin_: tobijk: PPCI.EXP_LNK_CMD_STA => { CMD = { ASPMC = 0x3 | CCC } | STA = { SPEED = 8_0GT | WIDTH = 16 | SL_CLK } }
09:06 imirkin_: you're good
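The decode pasted above can be reproduced by hand. What follows is a minimal Python sketch, assuming that 0x88088 mirrors the standard PCIe Link Control register in its low 16 bits and the Link Status register in its high 16 bits, which is what the PPCI.EXP_LNK_CMD_STA output suggests. The helper name `decode_lnk_cmd_sta` is hypothetical and is not part of envytools.

```python
# Encodings of the "Current Link Speed" field in the PCIe Link Status register.
LINK_SPEEDS = {1: "2_5GT", 2: "5_0GT", 3: "8_0GT"}


def decode_lnk_cmd_sta(value):
    """Split a 32-bit Link Control/Status word into the fields seen in the log."""
    cmd = value & 0xFFFF          # Link Control (low half)
    sta = (value >> 16) & 0xFFFF  # Link Status (high half)
    return {
        "aspmc": cmd & 0x3,                 # ASPM control, bits 1:0
        "ccc": bool(cmd & (1 << 6)),        # common clock configuration
        "speed": LINK_SPEEDS.get(sta & 0xF, "unknown"),
        "width": (sta >> 4) & 0x3F,         # negotiated link width, bits 9:4
        "sl_clk": bool(sta & (1 << 12)),    # slot clock configuration
    }


# The value tobijk read back from 0x88088.
print(decode_lnk_cmd_sta(0x11030043))
```

For tobijk's readback of 11030043 this gives the same fields as the lookup output: ASPMC 0x3, CCC set, 8_0GT, width 16, SL_CLK set.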
09:06 karolherbst: "./nvapoke 0x8c040 0x8008901" was bad here
09:06 karolherbst: ohhh
09:06 karolherbst: it was garbage
09:06 imirkin_: off-by-one
09:06 karolherbst: ....
09:06 imirkin_: (digit)
09:06 karolherbst: well
09:07 karolherbst: nouveau doesn't get removed
09:07 karolherbst: reboot it is I guess
09:07 imirkin_: you probably wanted 80089001
09:07 karolherbst: yeah
09:07 RSpliet: I *think* 0x2 triggers lane width change, 0x1 triggers speed change
09:07 mupuf: yeah, once you screw up with the values, you die
09:07 imirkin_: and 0x3 triggers a dynamic hazard? :)
09:07 tobijk: imirkin_: right, its fine :)
09:07 karolherbst: mhh
09:07 mupuf: RSpliet: what 0x2?
09:07 RSpliet: mupuf: in 88088?
09:07 imirkin_: i think he means 8c040
09:08 RSpliet: imirkin_ you think better than I do
09:08 mupuf: hehe
09:08 mupuf: RSpliet: and where is the lane width value?
09:08 imirkin_: it should support x1 right? that's a thing i think?
09:08 imirkin_: and probably x4 too
09:08 RSpliet: beats me, i'm still staring at nva8
09:08 imirkin_: was x2 ever a thing?
09:08 mupuf: ah, ok!
09:09 mupuf: so this reg is also on the nva8?
09:09 mupuf: did not find it in the mmiotraces I have
09:11 mupuf: imirkin: [ 2082.975] (EE) NVIDIA(0): Failed to initialize the GLX module; please check in your X
09:11 mupuf: do you have this error in your setup?
09:11 imirkin_: hmmm, that extra 4 bit seems to do nothing
09:11 imirkin_: mupuf: no
09:11 imirkin_: mupuf: check my startx script
09:11 mupuf: yeah, that's what I am trying to do
09:11 imirkin_: iirc i had to manually ln -s libglx.so to something
09:12 RSpliet: on nva8 it's most likely 0x88460
09:12 imirkin_: those dirs it points to were carefully set up
09:12 imirkin_: i had to run ldconfig -something
09:12 imirkin_: and a couple of manual things
09:12 imirkin_: so just reuse my dirs instead of trying your own
09:13 imirkin_: are there traces with other bits set in 8c040?
09:13 mupuf: imirkin: there are
09:13 mupuf: let me PM you the list
09:13 tobijk_: so much for playing with that a bit more ;-)
09:13 karolherbst: mupuf: I know that GLX error
09:14 karolherbst: mupuf: https://github.com/Bumblebee-Project/Bumblebee/issues/675 like this?
09:14 imirkin_: mupuf: i have to work now, so won't be able to look
09:15 imirkin_: figured i'd futz with it a bit though ;)
09:16 mupuf: imirkin: does not work even with your script
09:16 mupuf: but go to work, I will just reboot on my blob distro and check there
09:16 imirkin_: mupuf: well, it worked for me last night
09:16 mupuf: :D
09:17 mupuf: well, I just screw up a lot then!
09:17 imirkin_: (i am at work... that's where the GK208 is)
09:17 karolherbst: mupuf: you need to add the system xorg to your module path
09:17 imirkin_: hrmph. memory clock appears to not have changed :(
09:17 imirkin_: AC: core 966 MHz memory 810 MHz
09:17 mupuf: karolherbst: it is done :)
09:18 karolherbst: k
09:18 karolherbst: imirkin_: :/
09:18 karolherbst: imirkin_: with my branch you can reclock even without volt support ;)
09:19 imirkin_: i didn't get any error
09:19 karolherbst: strange
09:19 imirkin_: iirc i get the error when trying to *downclock*
09:19 mupuf: karolherbst: the blob does not use the 8 GT/s speed
09:19 mupuf: so...
09:19 karolherbst: ...
09:19 karolherbst: even with the option?
09:19 karolherbst: and what blob version is it?
09:21 karolherbst: ohhh
09:21 karolherbst: mupuf: what cpu do you have?
09:21 imirkin_: hm, well i can't seem to get it to get off width x8 by futzing with that register
09:21 mupuf: karolherbst: haven't used the option
09:21 karolherbst: mupuf: https://forums.geforce.com/default/topic/516811/geforce-drivers/we-need-pci-e-3-0-working-on-x79-chipset-nvidia-33-/
09:21 mupuf: let's try that
09:21 karolherbst: ivb is the first intel cpu with native PCIe3.0 support
09:21 karolherbst: before that
09:21 karolherbst: some X79 boards can run at 8GT with PCIe2.0
09:22 mupuf: karolherbst: oh, my cpu is a crappy i3
09:22 mupuf: that may explain
09:22 karolherbst: model?
09:22 mupuf: i3-2120
09:22 karolherbst: that's snb
09:23 mupuf: yep
09:23 mupuf: that may explain why it does not get enabled
09:23 karolherbst: right
09:23 mupuf: and my i7-920 is even older
09:23 karolherbst: ivb+ is required
09:24 mupuf:wonders if he should buy a skylake or just go for a broadwell
09:24 imirkin_: might have to wait for a skylake outside your office :p
09:24 mupuf: I know that :D
09:24 imirkin_: well i've done something to kill perf... now it's going at 30
09:24 imirkin_: instead of 45fps
09:24 imirkin_: heh
09:25 mupuf: hehe
09:25 mupuf: well, let's have a look at the bios part
09:26 karolherbst: mupuf: any idea why the blob does write without the commit flag first?
09:27 karolherbst: wait..
09:27 karolherbst: ahhhh
09:27 karolherbst: interesting
09:28 karolherbst: mupuf: https://gist.github.com/karolherbst/dc56b8a3d7cc690e1e5b
09:29 karolherbst: so maybe its kind of probing if the card will support it or not
09:29 tobijk: mh do you need to "poke" that register twice?
09:29 mupuf: karolherbst: I doubt that
09:29 mupuf: I think they first write the value and then commit it
09:30 mupuf: just to be sure that the commit bit is not triggered before the value has been completely sent
09:30 karolherbst: I see
09:30 mupuf: or it is an artefact of the way they write to a reg
09:30 karolherbst: and the first three digits?
09:30 mupuf: AKA, the horror that is the tegra code
09:30 mupuf: this is probably the reason
09:31 mupuf: tobijk: no, no need to poke it twice
09:31 mupuf: karolherbst: what you see is not weird at all
09:31 mupuf: read the doc I pushed
09:31 tobijk: maybe i had done something wrong then, i had to :O
09:31 mupuf: https://github.com/envytools/envytools/commit/640ffeb9f95ce4f51c2c00fe9a5b00877f24bb83
09:32 mupuf: tobijk: possibly
09:32 tobijk: lets retest :>
09:32 tobijk: good the bios resets it just fine
09:33 karolherbst: mupuf: shouldn't 80029000 or 80019000 be something?
09:33 karolherbst: or is it like 8: 2_5 4: 5_0 2: 8_0 1: 16_0 0: highest?
09:36 imirkin_: fwiw looks like kepler was the first to support pcie 3.0
09:37 imirkin_: so makes sense that the reg would only be there on GK104- or at least be different on fermi
09:37 mupuf: karolherbst: exactly what you said
09:37 karolherbst: k
09:37 mupuf: and the rest is just other values we do not know yet what they are
09:37 mupuf: the first 8 is probably an enable bit
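The encoding agreed on above can be written down as a small value calculator. This is only a sketch of how the chat reads 0x8c040 on these Kepler cards: the speed code sits in bits 19:16 (the "second 8" of 80089000), bit 0 is the commit bit, and every other bit is left untouched because its meaning is unknown. The names `SPEED_CODES` and `link_speed_poke_value` are made up for illustration. The code computes a number and touches no hardware; as the log shows, poking a wrong value can take the GPU down until reboot.

```python
SPEED_MASK = 0x000F0000  # bits 19:16, assumed to hold the speed code
COMMIT_BIT = 0x00000001  # bit 0, cleared by the hardware once applied

# Codes as stated in the conversation; "highest" lets the link pick its maximum.
SPEED_CODES = {"2_5": 0x8, "5_0": 0x4, "8_0": 0x2, "highest": 0x0}


def link_speed_poke_value(current, speed):
    """Return the value to write to 0x8c040 for the requested link speed."""
    value = current & ~(SPEED_MASK | COMMIT_BIT)  # keep all unknown bits as-is
    value |= SPEED_CODES[speed] << 16
    return value | COMMIT_BIT


for speed in ("2_5", "5_0", "8_0", "highest"):
    print(speed, hex(link_speed_poke_value(0x80089000, speed)))
```

Starting from a readback of 80089000, "highest" yields 80009001, the value poked earlier in the log. Starting from 40489000, "5_0" yields 40449001, which keeps the unrelated 4 in bits 23:20.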
09:37 karolherbst: mhh
09:37 karolherbst: tobijk: had 404
09:38 mupuf: yes, so what?
09:38 tobijk: 404?
09:38 mupuf: and my maxwell has different values
09:38 karolherbst: I see
09:38 karolherbst: k
09:38 karolherbst: have to reboot again :/
09:38 mupuf: check it yourself, look at the other mmiotraces
09:39 tobijk: mupuf: just to point that out: poking once is enough as you said :>
09:39 mupuf: tobijk: good :)
09:45 tobijk_: i should stop killing my system ;-)
09:45 karolherbst: :D
09:45 karolherbst: yeah, me too
09:45 mupuf: with that done, there is just nouveau left!
09:46 tobijk_: :)
09:46 karolherbst: nice
09:46 karolherbst: mupuf: I get "-- ID 0xf Voltage entry 4 PCIe link width 255 8.0 GT/s --" now
09:46 karolherbst: :)
09:46 mupuf: magic!
09:48 tobijk_: mh did you happen to have the luck to downclock it again?
09:49 karolherbst: I could play around with it as I wanted
09:50 tobijk_: so what did you do for the downclocking?
09:50 tobijk_: i played and hanged my system :D
09:51 karolherbst: nice
09:52 karolherbst: mupuf: what's the nicer way of doing this: https://github.com/karolherbst/envytools/commit/28029649f9043c4f5c2e6b87c5fa80a240e0a0ac
09:53 karolherbst: ?
09:53 karolherbst: with that I get my 56 vid entries
09:54 mupuf: karolherbst: hmm, make sure it works for all vbios
09:54 tobijk_: mupuf: if you are going for this, don't forget to set the link speed at every wakeup... :)
09:54 mupuf: and you may have a lead
09:54 mupuf: tobijk_: good point!
09:55 imirkin_: karolherbst: check what nouveau does... that parser is more advanced
09:55 tobijk_: my system resets the speed when nouveau goes to sleep again
09:55 karolherbst: nouveau gives me 56
09:55 mupuf: hmm, an nva8 pretends to have PCIe 3
09:55 imirkin_: mupuf: unlikely.
09:55 mupuf: imirkin: agreed :D
09:56 karolherbst: when was version 0x50 introduced?
09:57 mupuf: karolherbst: use the vbios repo to check
09:58 karolherbst: mhh
09:58 karolherbst: mupuf: nouveau/nvkm/subdev/bios/volt.c ?
09:59 mupuf: and nvbios/ in envytools
09:59 mupuf: RSpliet: were you the one adding support for the PCIe speed in nvbios?
09:59 karolherbst: *cnt = nv_ro08(bios, volt + 3); in 4.2
09:59 RSpliet: back then, yes
10:00 mupuf: RSpliet: ack, it looks weird
10:00 mupuf: well, I guess it is time to use my nvafakebios skills
10:00 RSpliet: you look weird
10:00 RSpliet: sorry, I need more context for that...
10:00 mupuf: RSpliet: that's a huge improvement from plain ugly :D
10:01 mupuf: did you check with nvafakebios that the blob was obeying the value?
10:02 RSpliet: yes
10:02 mupuf: interesting
10:02 mupuf: karolherbst: I need your help
10:03 karolherbst: mupuf: k
10:03 mupuf: well, let me check one thing and then I will need your help
10:03 mupuf: you will need to use the blob
10:04 karolherbst: no problem
10:11 mupuf: seems like the PCIe speed is not at the space it used to be :D
10:11 karolherbst: RSpliet: you have also 49 vids
10:12 karolherbst: nvbios should list less?
10:12 karolherbst: mupuf: you too
10:12 karolherbst: nve7
10:12 RSpliet: karolherbst: I have about 10 graphics cards, ranging from a TNT to a Kepler
10:13 karolherbst: then the nve7 one
10:13 RSpliet: even a Riva 128 at my parents place
10:13 RSpliet: ok
10:13 mupuf: and I have 30+ nvidia gpus :D
10:13 RSpliet: no, I think they are all supposed to be listed. We don't really do a lot with it, but I think the idea is that a perfmode has a variety of core/shader clocks attached, with varying voltages probably
10:14 karolherbst: I will bulk test my vid change in envytools with all cards supported
10:15 tobijk_: btw, where is the vbios database? i fail to see it :/
10:17 mupuf: tobijk_: that is because it is private
10:17 tobijk_: heh, ok
10:17 mupuf: give me your public ssh key and you will get access to it
10:17 mupuf: it is private because it contains email addresses
10:20 karolherbst: mupuf: with my nvbios change: https://gist.github.com/karolherbst/cdffaed2a39aa9dd77b6
10:21 tobijk_: and you just put out all emails :/
10:21 karolherbst: ....
10:21 karolherbst: silly me
10:21 karolherbst: gone
10:22 karolherbst: sorry for that :/ I completely overlooked it in the path
10:22 imirkin_: so with the faster clock looks like i get 48-49 fps with that trace. but still slow memory =/ i wonder why my memory doesn't clock up.
10:23 mupuf: imirkin: did you ask it to>
10:23 mupuf: ?
10:23 imirkin_: NvMemExec should default to 1
10:23 mupuf: there is an additional parameter you need to set
10:23 mupuf: since when?
10:23 imirkin_: like 3.16
10:23 imirkin_: drivers/gpu/drm/nouveau/nvkm/subdev/fb/ramgk104.c: if (!nvkm_boolopt(device->cfgopt, "NvMemExec", true)) {
10:25 mupuf: hmm
10:27 karolherbst: https://gist.github.com/karolherbst/b5485f6e5a7310437c87
10:27 tobijk_: that repo is damn slow :O
10:27 karolherbst: this is the diff now
10:27 karolherbst: mupuf: looks good?
10:28 tobijk_: mupuf: you got the repo hooked up at some isdn line? :D
10:28 mupuf: tobijk_: it is not slow, it is huge!
10:29 tobijk_: 4.00 KiB/s
10:29 mupuf: what speed do you download it from? It has 100MBit/s bw
10:29 RSpliet: ISDN is faster...
10:29 mupuf: wtf
10:29 mupuf: you are downloading from france through the moon or what?
10:29 tobijk_: looks like :)
10:30 mupuf: abort and try again, there is something wrong
10:30 RSpliet: does that work with pregnancy too?
10:31 mupuf: RSpliet: :D
10:31 imirkin_: time to get dual isdn?
10:32 mupuf: yeah, bond that connection, dude!
10:32 mupuf: :D
10:32 tobijk_: heh long time ago
10:32 mupuf: but yeah, my internet connection in Helsinki is up to 350 MBit/s
10:32 mupuf: and I really get 300
10:33 mupuf: tobijk_: darn it, you are right, I get a snail-like speed!
10:34 mupuf: karolherbst, how fast was it for you?
10:34 karolherbst: don't know
10:34 karolherbst: fast enough so that I didn't complain
10:34 karolherbst: didn't check it though
10:34 mupuf: ack
10:35 tobijk_: lets test another ip version grml
10:35 karolherbst: is this fine this way or should I rather << 4?
10:35 mupuf: karolherbst: it weighs 430MB
10:35 karolherbst: https://github.com/karolherbst/envytools/commit/28029649f9043c4f5c2e6b87c5fa80a240e0a0ac
10:35 tobijk_: oh
10:35 mupuf: so I guess you had more than 4kb/s
10:35 karolherbst: yeah
10:35 tobijk_: 5MB/s
10:35 karolherbst: I suppose
10:35 mupuf: ah, it finally picked up!
10:36 tobijk_: ipv6 broken on that box
10:36 mupuf: yep, I see you using half my bw!
10:36 mupuf: get off my lawn! :D
10:37 tobijk_: hehe :)
10:37 tobijk_: done
10:39 tobijk_: i'll add my bios later :>
10:40 mupuf: good
10:40 mupuf: vbios + mmiotrace + output of nvapeek 101000
10:40 mupuf: saved as strap_peek
10:40 tobijk_: ke
11:04 karolherbst: mupuf: okay, so what about the help you needed from me?
11:04 mupuf: karolherbst: not ready yet
11:04 karolherbst: k
11:05 karolherbst: imirkin_: I have an idea: piglit run time test :)
11:05 karolherbst: :D
11:06 imirkin_: have fun
11:11 mupuf: RRRrrr I can't seem to get MaxwellBiosTweaker to run
11:11 mupuf: would have loved to check out what they have!
11:11 mupuf: it may require windows 7
11:12 karolherbst: :/
11:12 karolherbst: is it like working with hardware or is bios file enough?
11:15 mupuf: file is apparently enough
11:16 mupuf: those look interesting: http://forum.hardware.fr/hfr/Hardware/2D-3D/voltage-debride-unlocked-sujet_967737_1.htm
11:17 karolherbst: ohh mono
11:18 karolherbst: somehow it looks corrupt
11:19 mupuf: what does?
11:19 karolherbst: "System.TypeInitializationException: An exception was thrown by the type initializer for  ---> System.InvalidProgramException: Invalid IL code in : (): IL_0008: stloc.1 "
11:19 karolherbst: the tool
11:20 mupuf: lovely
11:34 imirkin_: skeggsb: any idea why my memory speed doesn't change on a DDR3 GK208 but the kernel module thinks that all's well?
11:36 imirkin_: skeggsb: this is a log with CLK/PFB=debug: http://hastebin.com/pizaqafici.txt
12:22 karolherbst: had anybody checked if the blob driver is changing PCIe state before or after pstate change?
12:22 karolherbst: although I should see the pstate change here myself
12:23 mupuf: yes, you can check that already :)
12:23 mupuf: and it does not really matter in the end, does it?
12:23 karolherbst: depends
12:23 karolherbst: if we know the order
12:23 mupuf: so, nothing interesting in maxwell vbios tweaker
12:23 mupuf: darn it
12:23 karolherbst: we can assume the order won't change across chips
12:23 karolherbst: maybe
12:24 mupuf: well, there is, but nothing related to the pcie speed
12:24 mupuf:is suspecting that there is a new table for that
12:24 mupuf: which is ... weird
12:24 mupuf: but they did it for the voltage already
12:24 karolherbst: so maybe its always like that: pstate => volting => clocking => pcie
12:24 karolherbst: but who knows
12:25 karolherbst: mupuf: keyword for pstate in demmio?
12:25 mupuf: no, it is voltage upping -> clock upping
12:25 mupuf: clock down -> voltage down
12:25 mupuf: the rest does not matter
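The ordering rule stated above fits in a few lines. This is a sketch only, with a hypothetical function name and microvolt arguments chosen for illustration. The idea is that the chip should never run the higher clocks on the lower voltage, so the voltage goes up first and comes down last.

```python
def reclock_order(cur_uv, new_uv):
    """Return the order of operations for one perf-level change.

    cur_uv / new_uv are the current and target voltages in microvolts.
    """
    if new_uv > cur_uv:
        # Going up: raise the voltage first, then the clocks.
        return ["set_voltage", "set_clocks"]
    # Going down (or staying level): drop the clocks first, then the voltage.
    return ["set_clocks", "set_voltage"]


print(reclock_order(900000, 1100000))
print(reclock_order(1100000, 900000))
```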
12:25 mupuf: there is no keyword for that
12:25 mupuf: just look for writes in PCLOCK
12:25 karolherbst: k
12:25 karolherbst: thanks
12:27 mupuf: will try with kepler
12:35 mupuf: RSpliet: you were not a liar, it works on fermi
12:35 mupuf: let's check kepler
12:36 mupuf: there is something bad with maxwell, either the blob is ignoring my changes or it does much better sanity checks ... or it is hardcoded
12:37 RSpliet: mupuf: I try to lie as little as possible yes
12:38 karolherbst: mupuf: would you follow me if I say there has to be some upvolting between pcie1.0 to pcie3.0 change and clocking operations?
12:38 mupuf: RSpliet: well, you got me suspicious!
12:38 mupuf: so, upvolting, changing to 8GT/s then reclocking?
12:38 karolherbst: mhh too many assumptions
12:38 karolherbst: no, first pcie then volting then reclocking
12:39 mupuf: yeah, matches what I remember
12:39 mupuf: but hey, again, who cares?
12:39 karolherbst: somebody who can't revolt at all?
12:40 karolherbst: if the order is good and there is only a handful of operations going on, then maybe I find the right thing by guessing
12:41 mupuf: karolherbst: you really do not make any sense here
12:41 mupuf: are you talking about what the hw wants or what is the best in our driver?
12:41 karolherbst: what the blob does
12:42 mupuf: not being able to change the voltage is a no-go
12:42 mupuf: why do you care what it does in this case?
12:42 mupuf: Have you read about cargo culting?
12:42 karolherbst: a bit
12:43 imirkin_: it's a like a cult where everyone wears cargo pants, right?
12:43 karolherbst: :D
12:44 mupuf: imirkin: oh yeah!
12:44 mupuf: so, right, the maxwell is just acting up
12:44 mupuf: the kepler allows me to change the pcie speed
12:45 mupuf: bastard maxwell :D
12:45 mupuf: karolherbst: ok, you have got a kepler, right?
12:45 karolherbst: I guess so
12:45 karolherbst: nve6
12:45 mupuf: can you test on your nve6 to fake the vbios?
12:45 karolherbst: doesn't work
12:46 karolherbst: but I can try again
12:46 mupuf: ?
12:46 karolherbst: maybe I did something wrong
12:46 mupuf: what do you mean by doesn't work?
12:46 karolherbst: but the blob didn't seem to see the changes
12:46 mupuf: what were you trying to change?
12:46 mupuf: and what version of the blob were you using?
12:47 karolherbst: I used 352
12:47 karolherbst: and I was trying to disable pstates
12:47 mupuf: works for me
12:47 mupuf: hmm
12:47 karolherbst: maybe I did something wrong
12:47 mupuf: ok, let's try that together
12:47 karolherbst: will try and we will see
12:48 mupuf: nvafakebios -e 0x7fb6:19 vbios.rom
12:48 mupuf: then quit X
12:48 mupuf: rmmod nvidia
12:48 mupuf: then restart x
12:48 mupuf: and make sure you do rmmod nvidia
12:50 karolherbst: "Edit offset 0x7fb6 from 0xa to 0x19 (hex, 8 bits)"
12:50 karolherbst: good?
12:50 karolherbst: wait
12:50 mupuf: sounds good
12:50 karolherbst: I try to understand what I should do
12:51 karolherbst: should nvidia be loaded before doing nvafakebios or after?
12:51 mupuf: it does not matter
12:51 mupuf: as long as you unload it before restarting X
12:51 karolherbst: k
12:51 karolherbst: so I should do it while the card is used by X
12:51 mupuf: it does not matter
12:52 karolherbst: k
12:52 mupuf: it should set the pcie speed to 2.5 for the highest perf lvl
12:52 mupuf: please verify using nvidia-settings
12:52 mupuf: when we confirm then, we can move to setting the pcie speed to 8 on the lowest perflvl
12:53 mupuf: and see how it fares!
12:53 mupuf: but one thing at a time
12:53 karolherbst: okay, got 8.0 GT/s
12:53 mupuf: for the highest perflvl?
12:53 karolherbst: yeah
12:53 mupuf: not good
12:53 karolherbst: okay again
12:53 mupuf: checked with nvidia-settings?
12:53 imirkin_: he can't fake his vbios though
12:53 mupuf: imirkin: ah, you also checked he was doing it right?
12:54 karolherbst: mhh I trusted nvidia-settings there
12:54 karolherbst: maybe its wrong
12:54 karolherbst: lets try again
12:54 imirkin_: iirc the blob just reads from PROM
12:54 mupuf: what the heck
12:54 mupuf: well, sorry then karolherbst...
12:54 mupuf: imirkin: you'll be the one checking if you do not mind :D
12:54 imirkin_: either there's something funky about his hw
12:54 karolherbst: :D
12:55 imirkin_: what am i checking?
12:55 imirkin_: send me an email with instructions.
12:55 imirkin_: you have my vbios
12:55 imirkin_: i don't actively use that gpu, so unloading/loading various modules is easy
12:55 mupuf: even with the blob?
12:56 imirkin_: ya
12:56 imirkin_: it's a secondary gpu
12:56 imirkin_: my main X instance doesn't touch it
12:56 mupuf: ok, good!
12:56 imirkin_: i'm not using it coz it doesn't have a DP port while the onboard haswell does
12:57 imirkin_: and even if it did, nouveau doesn't support MST
12:57 karolherbst: mhhh, yeah don't know: I did this: start X with card, nvafakebios, stop X, remove nvidia, start X with nvidia again and check
12:58 mupuf: imirkin: nvafakebios -c1 -e 6ea4:09 vbios.rom
12:58 mupuf: karolherbst: you did the right thing
12:58 imirkin_: mupuf: email with instructions :p
12:59 imirkin_: [coz i might not do it right now]
13:03 mupuf: done
13:08 karolherbst: mupuf: so can I help you with something else or am I useless because neither traces nor nvafakebios work :(
13:11 tobijk_: mh my system really hates nvagetbios :-(
13:14 mupuf: karolherbst: you are not useless :p
13:14 karolherbst: okay, my card is
13:14 karolherbst: although
13:14 karolherbst: mhh
13:15 tobijk_: 88KB sized nve7 bios? seems a bit small
13:17 karolherbst: its fine
13:17 karolherbst: ...
13:18 karolherbst: now I was stupid...
13:18 karolherbst: mupuf: demmio can only read for nve* cards if they changed pcie speed?
13:19 karolherbst: ahh GK104-
13:19 mupuf: demmio does not know anything about pcie speed
13:19 mupuf: it just interprets a mmiotrace
13:19 imirkin_: or anything, really... it just pushes every read/write through rnndb
13:20 karolherbst: yeah I know, I am just doing silly things now
13:20 mupuf: I guess :D
13:20 mupuf: at some point, it will start making sense
13:20 imirkin_: [and it figures out the chipset variant by waiting for a read of mmio reg 0 to work that out]
13:20 mupuf: there is a lot to catch up with
13:20 karolherbst: I just xzcat every trace through demmio to look for CONFIG_LINK ...
13:21 karolherbst: you can imagine how that went
13:21 imirkin_: no need to xzcat
13:21 imirkin_: demmio -f foo
13:21 karolherbst: ahh okay
13:21 imirkin_: should auto-unxz it
13:21 karolherbst: seems faster indeed
13:22 karolherbst: mupuf: do you have a clue now how to check where to write what into from the bios or do you need more information?
13:22 karolherbst: I could check some traces
13:23 ukleinek:considers buying a machine having a GT 630 installed. That one should work fine with Debian jessie, right? (i.e. Linux 3.16, Mesa 10.3.2)
13:24 imirkin_: ukleinek: most likely it should work, yes. worst case, you'll have to update some software
13:28 ukleinek: imirkin_: as long as I don't have to resort to the non-free driver that's ok
13:28 karolherbst: ukleinek: just asking, do you need a dedicated gpu?
13:28 imirkin_: ukleinek: a GT 630 is either a GF108, GK107, or GK208
13:28 imirkin_: all 3 of those chips are reasonably well-supported
13:29 imirkin_: the latter 2 are likely to even be able to reclock semi-properly
13:29 ukleinek: karolherbst: the main application is office stuff and video playback
13:29 karolherbst: mhh
13:30 ukleinek: it's part of a "silent" machine, so it should have working fan management
13:30 karolherbst: then you should be better off with an intel only laptop to be honest, but if you want to help nouveau, then you should take it
13:30 ukleinek: it's a desktop system
13:30 karolherbst: doesn't really matter
13:30 karolherbst: which cpu?
13:31 ukleinek: karolherbst: http://www.arlt.com/PC/Komplett-PCs/fluesterleise-PCs/PC-ARLT-Mr-Whisper-Pro-SSD-i5.html
13:31 ukleinek: so: i5
13:31 karolherbst: yeah
13:31 karolherbst: intel hd gpu on the cpu
13:31 imirkin_: sounds a lot like my box :)
13:32 karolherbst: ukleinek: if it comes to the worst: you just use the intel gpu
13:32 imirkin_: unless the intel gpu isn't hooked up, i'd definitely suggest using that
13:32 imirkin_: however on my box the intel gpu only has DP and VGA outputs
13:32 karolherbst: ...
13:32 karolherbst: apparently I have three HDMI outputs...
13:32 karolherbst: at least xrandr is saying so
13:32 ukleinek: karolherbst: sounds like a nice idea. I will ask for that on the shop
13:32 karolherbst: :D
13:32 imirkin_: so if you need something else, you'll need a DP->x adapter
13:32 imirkin_: karolherbst: yeah, on the dock
13:33 karolherbst: yeah
13:33 karolherbst: although I think it's for stuff like HDMI over DP
13:33 karolherbst: and HDMI over miniDP
13:33 imirkin_: as it happens, i needed precisely DP so it was more convenient to use the intel gpu
13:33 karolherbst: :)
13:33 imirkin_: karolherbst: nah, those still show up as DP, at least on my SNB
13:33 imirkin_: (i have a DP -> HDMI adapter)
13:33 karolherbst: ohh the board has HDMI and DVI
13:33 mupuf: karolherbst: no idea what your question means :s
13:34 karolherbst: mupuf: I mean, can you tell what to write where to change pci speed on any nvidia gpu by checking the bios?
13:34 ukleinek:wonders if the shop guy will understand my question. Assume I take a linux cd with me, what should I look for?
13:34 karolherbst: ukleinek: why not download it?
13:35 ukleinek: karolherbst: to test in the shop before buying
13:35 mupuf: karolherbst: no, the bios is just there to give you the data you need, as a driver, to operate
13:35 karolherbst: I see
13:35 mupuf: but it does not tell you how you are supposed to use them
13:35 karolherbst: mupuf: ahhh k
13:35 mupuf: to know how you are supposed to use them, you need to look at the mmiotraces
13:35 karolherbst: okay
13:36 karolherbst: I just thought, that all of the information could be in there somehow, or part of it
13:36 karolherbst: like where to write into
13:37 mupuf: nope :s
13:37 mupuf: well, sometimes it is
13:37 karolherbst: would be too easy then I guess
13:37 mupuf: but it is the exception rather than the rule
13:37 mupuf: there are scripts inside the vbios that the driver needs to parse and execute
13:38 karolherbst: okay, so usually the bios only holds information that is really unique to the model
13:38 karolherbst: or most likely unique
13:38 mupuf: oh, I was about to say that the ram timing registers were written inside the vbios
13:38 mupuf: but it is just their values
13:38 mupuf: yes
13:38 mupuf: anything that the manufacturer of the gpu needs to adjust
13:38 mupuf: the rest is hardcoded in the driver
13:38 karolherbst: okay
13:39 karolherbst: makes sense
13:39 karolherbst: so for which cards do we know how to change pcie states? I could check for others then if you want
13:40 mupuf: hey hey hey, maybe nvidia uses the second vbios of my nv117
13:40 mupuf: hmm, I need to check into this
13:44 tobijk: mupuf: 128k vbios from nvagetbios which it says is broken or an 88k bios from debugfs?
13:44 mupuf: tobijk: have you tried nvagetbios -s prom > vbios.rom?
13:45 tobijk: yep same output as with pramin
13:45 mupuf: put the debugfs one then
13:45 mupuf: nouveau does the right thing
13:45 tobijk: ok :)
13:46 ukleinek: the documentation to the mainboard tells: "With the 4th Gen Intel® Core processors featuring Intel® HD Graphics core inside, MSI's 8 series motherboards that equips with 3 video ports support the triple display output a more flexible usage. "
13:46 ukleinek: that looks good, doesn't it?
13:47 karolherbst: sounds about right
13:47 karolherbst: ukleinek: I hope this weak dedicated gpu insanity will stop at some point
13:48 karolherbst: it's even worse if you get something like HD 4600 + 720M or something
13:50 ukleinek: is the GT 630 a weak gpu?
13:50 karolherbst: yeah
13:50 karolherbst: pretty
13:51 karolherbst: I think there might be use cases where you need the nvidia card, but then only blob will help you
13:51 karolherbst: with nouveau, the intel HD 4600 can compete with my nvidia 770M one in some games
13:51 karolherbst: but that's only because of driver state
13:52 ukleinek: so you're saying the gpu performance of the in-cpu-gpu is comparable to the GT 630?
13:53 karolherbst: you have to see, that the top of the line intel haswell gpu (5200) is as fast as GT 640
13:53 karolherbst: the 4600 is maybe half as powerful as the 5200 one
13:53 karolherbst: and the GT 630 is even less powerful than the 640 one
13:54 karolherbst: ukleinek: if you compare open source drivers, then the intel card is about twice as fast, maybe a little less
14:03 ukleinek: karolherbst: thanks for your information.
14:04 RSpliet: ukleinek: you might want to reconsider the relative performance between the two when the official driver is involved
14:05 RSpliet: not to mention the NVIDIA counterparts can do Cuda and OpenGL 4.5, which I'm fairly sure the IGP can't. It's not as if there's no use-case for those discrete relatively-low-power GPUs
14:06 karolherbst: I just assumed he doesn't want to use the blob one "<ukleinek> imirkin_: as long as I don't have to resort to the non-free driver that's ok"
14:07 RSpliet: well sure, but your statement "I hope this weak dedicated gpu insanity will stop at some point" is a bit coloured ;-)
14:07 Karlton: it's worth mentioning that all modern intel hardware still requires blobs
14:08 Karlton: just not from linux or mesa
14:09 Karlton: except for i915 that has blobs in linux-firmware too
14:11 Karlton: but that is only for really new intel gpus
14:20 tobijk: imirkin_: i had some mmiotraces for you last year for my nv86, did you commit them already in mupufs repo somewhere?
14:41 karolherbst: Karlton: mhh, haswell doesn't need them though
14:43 karolherbst: RSpliet: yeah, I know, but once I saw something like a GT 610M as dedicated gpu and I just thought, yeah, that makes sense
14:43 martm: i minorly looked at my unknown land in the kernel, seems that most drivers will handle mmap, but it does not give control directly which physical pages it mmaps, i believe with something like kprobes or systemtap
14:44 martm: it can be controlled by tapping into pfn_map_range or something
14:46 mupuf:thinks that where is waldo is a simple game compared to "Where the fuck is this nva8?"
14:47 mupuf: hakzsam: Hey, did I give you an nva8 by any chance? :D
14:47 RSpliet: the one in labri or the one in your home?
14:47 mupuf: home
14:47 mupuf: the one in labri ... is still there :D
14:48 tobijk: mupuf: crate-wars? :>
14:48 mupuf: hehe, there are only 2 places where I keep my GPUs
14:48 mupuf: and it is not in any of those places
14:49 mupuf: meh, I will plug the nva3 instead
14:49 RSpliet: insert it like you mean it
14:49 RSpliet: damn, I've clearly done too little this weekend
14:50 mupuf: you were too busy being super sick :p
14:50 hakzsam: mupuf, no
14:50 karolherbst: mupuf: just checking mvce and saw this: "PUNITS.PCI <= { PCIE_VERSION = 2 | PCI_CLASS = DISPLAY | PCIE_SPEED = 2P5GT }"
14:51 hakzsam: mupuf, did you forget your a_?
14:51 hakzsam: *a8
14:51 mupuf: at least, I knew where to look for the nva3 ... the kitchen table, of course!
14:51 mupuf: ah ah, I doubt I forgot any card
14:51 mupuf: 02:00.0 VGA compatible controller: NVIDIA Corporation GT218 [GeForce 210] (rev a2)
14:51 RSpliet: gheh, NVA3, breakfast of champions
14:51 mupuf: it was in my main machine :D
14:51 mupuf: idiot :D
14:51 hakzsam: mupuf, ahah :)
14:51 karolherbst: :D
14:52 tobijk: :O
14:56 mupuf: RSpliet: do you remember what happens when the bios says to change the pcie speed but it is not supported?
14:57 RSpliet: I have no idea
14:58 RSpliet: I always ignored this bit of performance management so far
14:58 mupuf: same here
14:59 karolherbst: mupuf: MMIO32 W 0x088460 0xb06c2221 on nva8
14:59 mupuf: karolherbst: yes?
14:59 karolherbst: pretty sure
14:59 mupuf: pretty sure of what?
14:59 karolherbst: pcie speed change
15:00 mupuf: oh, well, I need to find one that supports it in my cardset
15:00 karolherbst: which cards do you have?
15:01 mupuf: I have 3 nvaX
15:01 mupuf: nva3, nva5 and nva8
15:01 karolherbst: it's for nva8
15:01 RSpliet: no NVA0? :-O
15:01 mupuf: ah, right
15:01 mupuf: 4 :D
15:01 mupuf: but the nva0 is weird :D
15:01 karolherbst: so nva3, nva4 and nva5?
15:02 mupuf: ?
15:02 mupuf: a4 and a5 do not exist
15:02 RSpliet: of course A5 does exist, silly
15:02 mupuf: a0, a3, a5, a8, ac, aa and af
15:02 mupuf: sorry :D
15:03 karolherbst: ....
15:03 karolherbst: please :D
15:03 karolherbst: wanna test it on a3?
15:03 RSpliet: now I'm curious though, what BUS is the NVAC hooked up to? is it PCI?
15:03 RSpliet: /AGP/PCIe
15:03 mupuf: well, I will test it on a card that supports it :p
15:04 karolherbst: your nva3 it is
15:04 karolherbst: I hope
15:05 karolherbst: this one is strange
15:05 mupuf: nope, it does not support it
15:05 mupuf: and same for the nva5
15:05 mupuf: I wonder if it is supported at all on the nvaX
15:05 mupuf: I can try on the nva0
15:05 karolherbst: what? pcie speed change?
15:05 RSpliet: NVA8 more likely
15:05 mupuf: let's take out the beast
15:05 karolherbst: the trace indicates it does
15:06 mupuf: RSpliet: why? it is the slowest of all the sane NVAX
15:06 RSpliet: I know, but I've seen it on my laptop ;-)
15:06 karolherbst: there are 2_5 and 5_0 reads in the trace
15:06 mupuf: karolherbst: what? :o
15:06 RSpliet: maybe you just need to fake together a bios and roll :-P
15:06 karolherbst: W 0x088460 0xb06c2221 for nva3
15:06 karolherbst: 2_5 to 5_0
15:06 RSpliet: karolherbst: what's the next read of 88460?
15:07 tobijk: RSpliet: roll and kill it on the way? :D
15:07 RSpliet: perhaps the card tries, the motherboard fails ;-)
15:07 karolherbst: 0xb06c2220
15:07 karolherbst: or wait
15:07 karolherbst: wrong trace
15:07 karolherbst: 43.316267 MMIO32 W 0x088460 0xb06c2221 PPCI+0x460 <= 0xb06c2221
15:08 karolherbst: later: 43.316456 MMIO32 R 0x088088 0x11020108 PPCI.EXP_LNK_CMD_STA => { CMD = { ASPMC = 0 | RCB | CLOCKPM } | STA = { SPEED = 5_0GT | WIDTH = 16 | SL_CLK } }
15:08 RSpliet: tobijk: oh please, you wouldn't believe how I harassed my GPUs in the process of reverse engineering the memory reclocking bits.
15:08 mupuf: indeed
15:08 karolherbst: before the write: 43.314492 MMIO32 R 0x088088 0x11010108 PPCI.EXP_LNK_CMD_STA => { CMD = { ASPMC = 0 | RCB | CLOCKPM } | STA = { SPEED = 2_5GT | WIDTH = 16 | SL_CLK } }
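The two 0x088088 reads quoted above can be decoded by hand. A minimal sketch, assuming the dword follows the standard PCI Express Link Control (low 16 bits) and Link Status (high 16 bits) layout; the function name and the dictionary keys are illustrative, chosen to mirror the field names demmio printed:

```python
# Sketch: decode the Link Control/Status dword read at 0x088088 in the trace.
# Assumes the standard PCI Express register layout; names are illustrative.

SPEEDS = {1: "2_5GT", 2: "5_0GT"}

def decode_lnk_cmd_sta(value):
    ctl = value & 0xFFFF          # Link Control half
    sta = (value >> 16) & 0xFFFF  # Link Status half
    return {
        "ASPMC": ctl & 0x3,                # ASPM control, bits 1:0
        "RCB": bool(ctl & (1 << 3)),       # read completion boundary
        "CLOCKPM": bool(ctl & (1 << 8)),   # clock power management enable
        "SPEED": SPEEDS.get(sta & 0xF, "unknown"),  # current link speed
        "WIDTH": (sta >> 4) & 0x3F,        # negotiated link width
        "SL_CLK": bool(sta & (1 << 12)),   # slot clock configuration
    }

before = decode_lnk_cmd_sta(0x11010108)  # read before the 0x088460 write
after = decode_lnk_cmd_sta(0x11020108)   # read after the 0x088460 write
print(before["SPEED"], "->", after["SPEED"], "width", after["WIDTH"])
```

Only the speed field differs between the two reads; width stays at 16 in both.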
15:08 tobijk: RSpliet: hehe :)
15:09 karolherbst: I think there are enough traces to get pcie speed change working for almost every card
15:09 RSpliet: provided you know how to get width from 8 to 16
15:09 tobijk: karolherbst: yeah as RSpliet noticed, the card may do it just fine, but with an old mainboard it will just fail horribly :>
15:10 karolherbst: RSpliet: I was talking about the speed not the lanes ;)
15:10 RSpliet: or refuse the change, it's a two-way negotiation I believe
15:10 tobijk: that'd be lovely
15:10 karolherbst: so, who hasn't tried this at home today?
15:10 karolherbst: :D
15:15 mupuf: yeah, there is a nice potential of fucking up :D
15:15 mupuf: luckily, we can add that as a parameter
15:15 mupuf: NvPCIeExec :D
15:15 karolherbst: yeah
15:16 tobijk: the way to go (want problems -> reclocking, want even more problems -> pcie upclocking...)
15:16 karolherbst: I currently fear, that two nva5 cards use different writes :/
15:17 karolherbst: mhh
15:17 karolherbst: no, false alarm
15:18 karolherbst: one card read the link cap though
15:18 mupuf: no idea how to do that from the kernel :D
15:18 karolherbst: mupuf: 2594.507497 MMIO32 R 0x088084 0x00052d02 PPCI.EXP_LNK_CAP => { SPEED = 5_0GT | WIDTH = 16 | ASPMS = 0x3 | L0SEL = 0x2 | L1EL = 0x2 | CLKPM | PORT = 0 } ;)
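The link capability value in that read decodes the same way. A small sketch, again assuming the standard PCI Express Link Capabilities bit layout; the function name and keys are made up here to match the names in the demmio output:

```python
# Sketch: decode the Link Capabilities dword read at 0x088084 in the trace.
# Assumes the standard PCI Express register layout; names are illustrative.

SPEEDS = {1: "2_5GT", 2: "5_0GT"}

def decode_lnk_cap(value):
    return {
        "SPEED": SPEEDS.get(value & 0xF, "unknown"),  # max link speed
        "WIDTH": (value >> 4) & 0x3F,    # max link width
        "ASPMS": (value >> 10) & 0x3,    # ASPM support
        "L0SEL": (value >> 12) & 0x7,    # L0s exit latency
        "L1EL": (value >> 15) & 0x7,     # L1 exit latency
        "CLKPM": bool(value & (1 << 18)),  # clock power management
        "PORT": (value >> 24) & 0xFF,    # port number
    }

cap = decode_lnk_cap(0x00052D02)
print(cap)
```

A driver could compare the SPEED field here against the current speed in the link status before attempting any change.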
15:19 karolherbst: dirty workaround!
15:19 karolherbst: mhhh
15:19 karolherbst: no idea either
15:19 karolherbst: wait
15:19 karolherbst: mupuf: there is a mask
15:19 karolherbst: in drm
15:20 karolherbst: mupuf: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_pci.c?id=refs/tags/v4.2-rc3#n368
15:25 karolherbst: okay, it's "W 0x088460 0xb06c2221" for nva3, nva5 and nva8 for sure
15:27 karolherbst: and "W 0x088460 0xb06c2211" for back down to 2_5
15:29 karolherbst: somebody want to try that out?
15:30 mupuf: karolherbst: so, it is a bit more complex than that
15:30 mupuf: and it still does not work for me
15:30 karolherbst: :/
15:30 mupuf: you also need to write to 0x0884b8
15:30 mupuf: write 0x40 there
15:30 karolherbst: yeah
15:30 karolherbst: saw it
15:30 mupuf: then write in 0x088460
15:31 mupuf: and then it will increment a number in 00088188
15:31 mupuf: which I guess is a way to say it has done the work
15:31 karolherbst: mhh checking
15:31 mupuf: if you do not write 0x40 to 0x0884b8, the number does not get increased
15:31 mupuf: I will mmiotrace this gpu
15:31 mupuf: but I have low hope
15:31 karolherbst: this pattern does indeed exist in the traces
15:32 mupuf: of course, I never mmiotrace my nva8...
15:32 karolherbst: nva8 does it in the trace as you said though
15:33 karolherbst: ahh yours
15:33 karolherbst: mhh
15:33 mupuf: yeah, the one at home
15:33 mupuf: not at my previous research center :p
15:33 karolherbst: mhh
15:33 karolherbst: maybe there is this third write needed as well
15:33 mupuf: let's take it out for a spin!
15:33 mupuf: third?
15:33 karolherbst: W 0x0884b8 0x00000040
15:34 mupuf: nvidia does not change the link speed
15:34 karolherbst: the one without commit
15:34 karolherbst: ...
15:34 mupuf: that's the first one
15:34 karolherbst: ohh
15:34 karolherbst: W 0x088460 0xb06c2220
15:34 karolherbst: W 0x088460 0xb06c2221
15:34 karolherbst: but if nvidia doesn't do it, there is less hope
15:34 mupuf: yes, that is what I see in the nva3 trace
15:34 mupuf: indeed :p
15:34 mupuf: that's what I keep telling you :D
15:35 mupuf: I need to mmiotrace this nva3 and then the nva8
15:35 mupuf: let's do that
15:35 karolherbst: funny that your trace tells me, it reads the 5_0 state out of the gpu
15:35 karolherbst: or is it another nva8?
15:38 martm: another learning day, mupuf, can this hw context fall back to gart/gtt/system memory too when there is no vram available?
15:41 martm: probably it can, in which case or in any way, imo i don't see big problems to do some minor enhancements in future for blobs or open source drivers, gonna go back to facebook movie
15:44 mupuf: karolherbst: yeah, another gpu
15:44 mupuf: I have met a lot of nvidia gpus in my life
15:44 mupuf: this is the one from labri that you are checking
15:44 mupuf: that is my previous research center
15:45 karolherbst: okay, I think I understood now in which order the blob driver is doing stuff: "upclocking": upclock pcie speed, upvolt, upclock. "downclocking": downclock, downvolt, downclock pcie speed
15:46 tobijk: sounds reasonable
15:48 karolherbst: if that's true, my upvolt has to be somewhere in there: https://gist.github.com/karolherbst/e8e5b6abf5769048492f
15:50 mupuf: karolherbst: it obviously is not the case :p
15:50 martm: i don't quite know but, prolly dx11 for instance is quite strong, it only lags due to draw call overheads compared to dx12, prolly the only possible enhancement there
15:50 karolherbst: :(
15:51 tobijk: karolherbst: in front of it? :D
15:51 mupuf: but you are getting the hang of it
15:51 karolherbst: yeah
15:51 tobijk: upvolt upclock pcie, ...
15:51 karolherbst: but why isn't it there?
15:51 mupuf: that is one of the first approaches I went with
15:51 mupuf: comparing a trace where I knew everything and the new weird one
15:51 mupuf: and they do not match
15:51 karolherbst: :D
15:52 karolherbst: but I am not comparing old ones
15:52 mupuf: as in, where there should be voltage changes, there is ... nothing
15:52 karolherbst: I tried that and saw they are too different
15:52 karolherbst: ahh
15:52 karolherbst: yeah :/
15:52 karolherbst: what should it look like if there is indeed a voltage change?
15:53 karolherbst: or lets rephrase that: how can I see where the driver reclocked to a pretty high cstate?
15:53 karolherbst: and something around that looks like a volt change
15:55 mupuf: if I knew, I would have told you already :D
15:55 mupuf: there is one thing though
15:55 mupuf: I already wrote about it in this channel
15:56 mupuf: at 0x10eXXX something
15:56 mupuf: close to another PWM controller
15:56 karolherbst: okay
15:56 mupuf: let me find it again
15:56 mupuf: try 10eb30 or 10eb40
15:56 mupuf: that's what I seem to remember
15:57 karolherbst: nope
15:57 karolherbst: will find
15:58 karolherbst: but it was 10eb30 for you indeed
15:58 mupuf: you cannot find it in your trace?
15:58 karolherbst: nope
15:58 mupuf: maybe it is only starting from maxwell that it is here
15:58 mupuf: lovely
15:58 mupuf: two more regs to find :D
15:59 karolherbst: :/
15:59 mupuf: darn it, already 2am
15:59 karolherbst: 2?
15:59 mupuf: yeah, Finland is ahead of you by an hour
15:59 karolherbst: I see
16:03 karolherbst: mupuf: nv117?
16:04 mupuf: yeah, that's the only gpu I have with this PWM controller
16:04 karolherbst: both traces are fine?
16:05 karolherbst: yeah, found it
16:06 karolherbst: did you test pcie upclock with it ?
16:06 karolherbst: the upclock stuff looks exactly like mine
16:11 mupuf:was fed up with remembering that tag 0x81 was PWM_VID
16:11 karolherbst: :)
16:15 mupuf: no luck, my nva8 also does not support the 5 GT/s speed
16:16 karolherbst: :/
16:16 karolherbst: but why...
16:16 mupuf: that is a very valid question
16:16 karolherbst: model?
16:16 mupuf: well, maybe this is an early one
16:16 mupuf: or only the mobile ones support it?
16:17 karolherbst: 210?
16:18 karolherbst: there are 8400 GS ones with only 2.5
16:18 karolherbst: there are 210 ones with only 2.5
16:19 karolherbst: the quadro only has 2.5
16:19 karolherbst: but all the others should have 5.0
16:20 mupuf: yes, I have a 210
16:20 mupuf: out of luck I guess :p
16:20 karolherbst: there are 210 also with 5.0 support
16:20 karolherbst: :)
16:21 mupuf: RSpliet: do you still have your nva8 laptop?
16:24 mupuf: I guess I should check a fermi before going to bed
16:24 mupuf:is only interested in the vbios parsing right now
16:24 karolherbst: I see
16:24 karolherbst: should be easy enough
16:25 karolherbst: which one?
16:25 karolherbst: maybe I will try with the fermi I have here then :)
16:25 mupuf: I am plugging the weird one, the nvc0
16:26 mupuf: GF100
16:26 mupuf:needs to get used to the nvidia names
16:26 karolherbst: I have a 630M here
16:26 karolherbst: but don't know which one
16:26 mupuf: that, I will never learn
16:26 karolherbst: I think the GF117 one
16:26 mupuf: check out what it means in codenames or, better, wikipedia
16:26 mupuf: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
16:26 karolherbst: yeah
16:26 karolherbst: I know it
16:27 karolherbst: it's either GF108 or GF117
16:27 karolherbst: check for 630M, then you know why I am not sure ;)
16:28 mupuf: sooo! The nvc0 has it
16:28 mupuf: very well
16:28 karolherbst: :)
16:28 karolherbst: it works?
16:29 mupuf: what works? Changing the pcie speed with the same technique we use on kepler? No idea
16:30 karolherbst: mhh
16:30 karolherbst: it seems to have this other way
16:30 karolherbst: W 0x0884b8 0x00000040
16:30 karolherbst: W 0x088460 0xb06c2220
16:30 karolherbst: W 0x088460 0xb06c2221
16:31 karolherbst: it does a lot of things though
16:31 karolherbst: maybe I work on that on my fermi card
16:31 karolherbst: don't know if you want to do that
16:33 mupuf: well, as you have seen, this is indeed the same style as nva3
16:33 mupuf: which is great for me :)
16:33 karolherbst: yeah
16:37 mupuf: it works
16:38 karolherbst: very good
16:38 karolherbst: there are fermis with pcie x3 though
16:39 karolherbst: or are there?
16:39 mupuf: there is one thing that needed to be done for me first
16:39 mupuf: nvapoke 0x02241c 81
16:39 mupuf: after that, it is pretty trivial, nvapoke 00088460: b06c2221 --> 5GT/s
16:39 mupuf: nvapoke 00088460: b06c2211 --> 2.5 GT/s
16:39 karolherbst: yeah I see that
16:40 mupuf: the last 1 is again a commit bit
16:40 karolherbst: what about W 0x0884b8 0x00000040?
16:40 mupuf: not needed
16:40 karolherbst: k
16:40 karolherbst: "W 0x02241c 0x00000081 PUNITS.PCI <= { PCIE_VERSION = 2 | PCI_CLASS = DISPLAY | PCIE_SPEED = FULL "
16:40 karolherbst: obviously enough ... :D
16:40 mupuf: as opposed to 0x02241c 1
16:40 mupuf: which forces it to 2_5 GT
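The nvc0 pokes mupuf describes can be written down as a list of register writes. This is only a sketch of what was observed in this conversation: the register addresses and values are the ones quoted above, but the split of the 0x088460 value into a base, a speed selector in bits 7:4 and a commit bit in bit 0 is an assumption inferred from comparing 0xb06c2221, 0xb06c2211 and 0xb06c2220, and the function name is hypothetical. It does not touch hardware.

```python
# Sketch: build the (register, value) writes mupuf used on his nvc0.
# The field split of the 0x088460 value is an ASSUMPTION inferred from the
# three values seen in the log, not documented hardware behaviour.

PUNITS_PCI = 0x02241C      # 0x81 decoded as PCIE_SPEED = FULL in the trace
PPCI_LINK_CTRL = 0x088460  # hypothetical name for PPCI+0x460

BASE = 0xB06C2200                      # bits left unchanged in every observed write
SPEED_SEL = {"2_5GT": 1, "5_0GT": 2}   # assumed selector in bits 7:4
COMMIT = 1                             # "the last 1 is again a commit bit"

def link_speed_writes(speed):
    """Return the register writes for the requested link speed."""
    value = BASE | (SPEED_SEL[speed] << 4) | COMMIT
    return [(PUNITS_PCI, 0x81), (PPCI_LINK_CTRL, value)]

for reg, val in link_speed_writes("5_0GT"):
    print(f"W 0x{reg:06x} 0x{val:08x}")
```

Dropping the commit bit from the computed value gives 0xb06c2220, the uncommitted write that precedes the committed one in the nva3 and nvc0 traces.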
16:41 mupuf: yeah, it is indeed pretty obvious
16:41 mupuf: just like it is time for me to go to bed!
16:41 mupuf: see you!
16:41 karolherbst: bye
16:41 mupuf: imirkin_: plugging the nv50
16:41 karolherbst: didn't find anything useful for my vid patch though :/
16:42 karolherbst: like the kernel is lying to me
20:57 imirkin: blah, looks like my mobo at home is old ... no 5GT/s at all :(
20:59 imirkin: hrmph, although it's supposed to be 2x PCIe 2.0 x16 slots
21:01 imirkin: might be the fault of the GF108 itself actually
22:00 imirkin: ugh. g80 just doesn't support first/last level clamping in the sampler view, only in the sampler :(
22:01 imirkin: last level is easy enough, but first level is an enormous pain. urgh.
23:25 mupuf: imirkin: too bad for the g80!
23:25 mupuf: I have grave news... My main pc died during the night
23:26 mupuf: after 6 years of continuous running, it decided to meet its creator (it still has a long way to Valhalla ... err. Africa)
23:28 imirkin: mupuf: noooo :(
23:28 imirkin: i'm still debugging the g80
23:28 imirkin: getting closer
23:28 mupuf: I guess this settles the debate of what machine I will get next!
23:28 mupuf: have fun, reator is still alive and kicking
23:29 mupuf: I will check if swapping the PSU is enough to get the main pc to start again
23:29 mupuf: if not, then I guess it is either the mobo or the proco and I cannot do anything to fix it easily
23:30 mupuf: have fun with the g80!