00:13 pmoreau: tagr: Got it, I'll do that. :-)
01:33 RSpliet: tagr: vbios P table at offset 0x38, I think it was just labelled "BASE CLOCK" by karolherbst, but didn't quite like the name :-P
01:37 karolherbst: mhhh
01:37 karolherbst: seems like there is some ddos again?
01:41 RSpliet: karolherbst: there's always a certain level of DDoS going on on freenode
01:41 karolherbst: yeah, but today it was a bit more than usual
01:44 karolherbst: or they just limited to 20.000 connections :/
01:49 pq: yeah, I don't often see several seconds of lag, but there was just now
01:56 karolherbst: if somebody wants to look over this, it would be great: https://github.com/karolherbst/nouveau/compare/master...karolherbst:stable_reclocking_kepler
02:22 tagr: RSpliet: hmm... there was some work going on a while ago about publishing information about these tables...
02:49 tagr: RSpliet: is that the table that has ID 0x43?
04:53 laust: Hello
07:26 Tom^: karolherbst: i found a new issue which i bet is gonna be one hell of a job to find. the card at idle on max clocks is making a very weird buzzing sound it shouldnt be doing. :P
07:26 Tom^: an electric buzzing one
07:27 Tom^: maybe its because im running at clocks i shouldnt be at on idle, and the fan ramps down so i start hearing it.
07:28 Tom^: clocks/volt
07:29 karolherbst: Tom^: does nvapoke 0x20200 22722455 change anything?
07:30 karolherbst: but I think with proper power gating the issue will be less noticeable too
07:32 Tom^: nah nvapoke doesnt change much, but on second thought i think im just hearing it because im on 1.8v idle which i wouldnt be otherwise, and the fan is just so slow im hearing buzz from either the card or my psu
07:32 karolherbst: 1.18?
07:32 karolherbst: ohh
07:32 karolherbst: you idle at 0f
07:32 karolherbst: yeah well
07:32 Tom^: yes. :p
07:32 karolherbst: don't do that :p
09:05 gordan: How well does nouveau compare to Nvidia's binary drivers in terms of performance for older cards (e.g. Fermi 480)?
09:05 gordan: I know there have until recently been issues with re-clocking on Kepler and later, but without that, is the performance comparable on older Fermi cards?
09:06 gordan: I cannot seem to find any recent information on the subject.
09:16 imirkin_: gordan: clock-for-clock, nouveau gets about 60-80% of blob perf
09:17 imirkin_: gordan: with fermi, the clock you get is the one your card happens to boot to
09:17 imirkin_: and there's no further reclocking (for now)
09:19 gordan: imirkin: Doesn't Fermi boot to an idle clock?
09:19 gordan: I thought all Nvidia cards going way back do that.
09:20 gordan: If it's stuck at boot clocks, then presumably that's going to be the same problem as with Kepler/Maxwell.
09:20 imirkin_: gordan: diff cards boot to diff clocks
09:20 imirkin_: gordan: on average, fermi gpu's boot to mid-level clocks, but it depends
09:28 gordan: Right. So if I hack the BIOS to always run at max clocks, I'll still only get 60-80% of blob's performance?
09:29 imirkin_: depends on the specific app, but that's a good estimate
09:30 gordan: OK, thanks.
09:31 imirkin_: (not sure how you'd hack the vbios...)
09:32 gordan: imirkin_: Using one of the many, many Nvidia VBIOS editing tools of course.
09:33 imirkin_: gordan: do they let you control something like that? seems unlikely.
09:36 gordan: Kepler BIOS Tweaker does, IIRC.
09:36 karolherbst: gordan: yeah well, for kepler cards that is ;)
09:36 imirkin_: gordan: really? i assumed they just manipulated the various reclocking tables that have clock freqs/etc
09:37 imirkin_: not the actual startup scripts used
09:37 gordan: Don't the startup scripts use the said tables as well?
09:38 imirkin_: unlikely
09:38 imirkin_: usually it's just fixed sequences of commands
09:38 gordan: I thought they did, but I may be wrong. Last time I was messing about with the script blobs was when I was trying to mod the Q6000 BIOS to reduce the RAM size from 6GB to 1.5GB to make it work on a GTX480.
09:39 imirkin_: last i stared at the vbios assembly, they just executed the usual init tables
09:39 imirkin_: which in turn contain fixed sequences of commands
09:41 gordan: Fair enough.
09:42 gordan: So how much of the 20-40% performance hit is in fact coming from the clock speeds being off?
09:42 imirkin_: 0
09:42 gordan: I see. Thanks.
09:42 imirkin_: like i said, that's clock-for-clock
09:43 karolherbst: imirkin_: I think those kepler benchmarks are no fair deal, because nouveau clocks higher than the blob :/
09:43 imirkin_: meh wtvr
09:43 imirkin_: i don't mind cheating, as long as it's in our favor
09:43 karolherbst: :D
09:43 imirkin_: ;)
09:43 karolherbst: ahhhh meh
09:44 karolherbst: infinite while loop
09:44 karolherbst: :(
09:44 imirkin_: those can take a while to complete
09:44 karolherbst: yeah lol
09:44 imirkin_: back in the bad old days computers would detect those and halt to prevent from wasting precious cpu time
09:45 karolherbst: :)
09:45 karolherbst: at least gcc could have warned me
09:45 imirkin_: you're right -- blame gcc. that's who's to blame.
09:45 karolherbst: well it was obvious to notice
09:45 karolherbst: while (var) { other_var = something }
09:45 karolherbst: ...
09:46 karolherbst: anyway, I have to reboot now, don't I?
09:46 gordan: In fairness, GCC has a lot to answer for when it comes to performance of compiled code.
09:46 karolherbst: or can I just kill those kworkers
09:46 gordan: Intel's C compiler still produces binary code that utterly annihilates GCC at runtime performance.
09:47 imirkin_: karolherbst: killing the kworkers is a bit tricky
09:47 karolherbst: mhhh
09:47 karolherbst: I know
09:47 imirkin_: gordan: on intel chips... probably. does it really annihilate? or do we have diff opinions of what an annihilation looks like?
09:48 karolherbst: imirkin_: a lot more SIMD optimizations done by the compiler
09:48 karolherbst: on ffmpeg this can mean up to 15% more perf
09:49 karolherbst: but I think most of the gain comes from the math library they used instead of all those glibc functions
09:49 gordan: imirkin_: On AMD chips, too. On anything with SSE, in fact.
09:49 gordan: ICC does a pretty damn good job at vectorizing things. GCC still falls flat on its face with that sort of thing.
09:50 karolherbst: gordan: you can use the icc math library with gcc
09:50 karolherbst: then you also gain most of the perf increase
09:50 gordan: That'll speed up the math library stuff, it won't speed up your own code.
09:50 imirkin_: ah. so in the majority of cases, it performs about the same then?
09:50 karolherbst: imirkin_: yeah
09:50 imirkin_: since vectorization is only a thing for pretty rare scenarios
09:50 gordan: If it's pointer chasing, it'll be similar.
09:50 karolherbst: gordan: it will, because sorting stuff is also math ;)
09:50 gordan: imirkin_: Not rare at all.
09:50 imirkin_: doing math in code is rare.
09:50 gordan: Depends on how much time your code spends where.
09:51 gordan: Anything that has loops can be vectorized.
09:51 imirkin_: hakzsam: do you have a trace somewhere? i want to see what's going on
09:51 karolherbst: well, seems like I have to reboot :/
09:51 gordan: e.g. building MySQL with ICC speeds it up by about 30% or so on average.
09:52 gordan: I honestly expected more; my number crunching code speeds up by almost 4x.
09:52 hakzsam: imirkin, scp reator:/home/hakzsam/gf110-cuda-vectorAdd.mmt .
09:52 gordan: Back in the days of Pentium 4, the difference was more like 7.2x
09:53 gordan: (a.k.a. Pentium 4 didn't suck, it's the compilers that were too crap to leverage its features properly)
09:53 imirkin_: hakzsam: it's booted into a partition that doesn't have my keys apparently... can you put it up somewhere?
09:54 imirkin_: gordan: yeah, that's like saying that the alpha didn't suck, the compilers were just unable to properly use its memory ordering rules
09:54 hakzsam: imirkin, I don't have access to fdo here.. by email?
09:54 imirkin_: gordan: and same for ia64
09:54 imirkin_: hakzsam: email or filebin
09:54 karolherbst: meh, freetype updates are always strange :D
09:55 karolherbst: ...
09:55 hakzsam: imirkin, http://filebin.net/a629b1lxwb
09:55 imirkin_: thanks
09:57 imirkin_: sadness. so it uses constbuf to upload code? gr.
09:57 gordan: imirkin_: Indeed, I wholeheartedly agree. AMD has a LOT to answer for, for getting us all stuck with a bastard offspring of x86 for years to come.
09:57 hakzsam: imirkin, yah
09:57 imirkin_: gordan: amd64 is way better than ia64
09:58 hakzsam: have to go, see you
09:59 gordan: imirkin_: Let's not go there.
10:00 gordan: I look forward to the day x86 and bastard variants thereof are dead and buried, be it at the hands of ARM, MIPS or something different entirely.
10:07 karolherbst: but what about all those games :O
10:09 gordan: Well, maybe Nvidia will open source the technology they use in K1/X1. IIRC it is based on the same approach that Transmeta used in the Crusoe/Efficeon lines.
10:10 gordan: Takes x86 binary code and cross-compiles it to native code, with caching.
10:10 gordan: Similar to what DEC's FX!32 did on Alpha.
10:10 gordan: Near native x86 performance on a different bare metal hardware.
10:10 gordan: On Transmeta stuff the real instruction set of the underlying hardware was never even published, AFAIK.
10:11 gordan: And besides, games don't need that much CPU these days, relatively speaking; most gaming stuff tends to be GPU-bottlenecked, so running x86 stuff with something like Exagear on ARM wouldn't be that terrible.
10:11 imirkin_: huh? K1/X1 are arm
10:12 gordan: From the user perspective, they look, smell and feel like ARM, yes.
10:12 gordan: As did Transmeta stuff.
10:13 karolherbst: mhhh, I am a big fan of duck typing
10:13 karolherbst: so then K1/X1 are ARMs, right?
10:13 gordan: I seem to recall reading that underneath they are quite different. Nvidia were going to make them x86 SoCs, but in the ongoing spat with Intel over the "No SLI without Nvidia PCIe hub, unless it's AMD because we don't make AMD motherboards", Intel wouldn't licence them any more x86 stuff (as they didn't licence them QPI which killed Nvidia's motherboard business overnight).
10:14 gordan: I cannot seem to find the article in question right now, but the gist of it is that K1 and T1 are a different microarchitecture underneath but there is ARM code cross compiling going on down to something fancy.
10:19 karolherbst: how does that look to you? https://gist.github.com/karolherbst/85324c8255690937d3fa
10:19 karolherbst: clk: base: 705 MHz, boost: 797 MHz
10:42 karolherbst: Tom^: do you have some time?
10:44 karolherbst: ohh actually, I think I can do that on reator
10:48 imirkin_: karolherbst: what's the diff between boost 1 and 2?
10:49 karolherbst: 1: "marketing boost clock" is max, 2: all clocks enabled with valid voltages
10:50 karolherbst: 1 should be always under the tdp for most of the cards, where 2 can go over it currently
10:50 imirkin_: and by "marketing" you mean "contents of boost table"?
10:50 karolherbst: because we don't check the tdp
10:50 karolherbst: imirkin_: yeah, there is a boost_clock, which is identical to what the gpu was sold with
10:51 karolherbst: but this is more like an average clock
10:51 karolherbst: the clock the consumer can expect the gpu to boost to
10:52 karolherbst: currently nouveau doesn't even check for a valid voltage, so :/
10:52 karolherbst: but 2 would be what is closest to what nouveau does currently
11:48 hakzsam: imirkin_, well, I need a trace which uses BIND_TSC/TIC with COMPUTE
11:48 hakzsam: do you have one btw?
11:49 imirkin_: hakzsam: no... just copy the gf100_3d stuff and hope for the best
11:50 hakzsam: I would prefer to make sure it works :)
11:50 imirkin_: overrated
11:50 imirkin_: anyways... easy enough, just do something with texture()
11:50 hakzsam: yep
12:45 hakzsam: imirkin, pull request updated
13:01 karolherbst: Tom^: could you test the stable_reclocking_branch from my repository?
13:01 karolherbst: this is basically stock nouveau just with some improvements
13:01 karolherbst: you will need nouveau.pstate=1 again
13:01 karolherbst: and the pstate file is in sysfs again
13:10 imirkin_: hakzsam: lgtm
13:11 hakzsam: imirkin_, thks, I'll push
13:11 karolherbst: by the way: any objections on this one? https://github.com/envytools/envytools/pull/21
13:12 imirkin_: karolherbst: ship it
17:01 marcosps: imirkin: around? I'm testing a shader program, and it's crashing with nouveau (testing with current master and using DRI_PRIME to use nvidia board)
17:02 marcosps: imirkin: but dmesg doesn't show anything related to the crash, just some ACPI messages...
17:02 marcosps: imirkin: when it crashes, my entire gnome session goes down...
17:12 marcosps: Hi guys, running apitrace it shows a glClear as incomplete :)
19:53 Tom^: karolherbst, will do on friday. boarding an airplane to stockholm in an hour and im there for two days.