00:00 mupuf_: why? Because that's what we need
00:00 karolherbst: the blob also sometimes uses a higher voltage for the same freq
00:00 karolherbst: depending on temperature or load
00:00 mupuf_: the driver should not put a voltage higher than the minimum voltage that is sufficient
00:00 karolherbst: but it does
00:00 karolherbst: the voltages your patch uses are stable here, though
00:00 mupuf_: no, the table is just not telling us what we want
00:01 mupuf_: the temperature can be affecting a little, but this should just be one coefficient
00:01 mupuf_: as for changing the voltage under load, I doubt it
00:01 mupuf_: you change cstate under load
00:02 mupuf_: and it changes the voltage
00:02 karolherbst: yeah, but the PWM had 64 for 1°C and 61 for like 50°C
00:02 karolherbst: mhhh
00:02 mupuf_: I will not take your word for it :D It is easy for the blob to just be updating the state needlessly
00:02 karolherbst: blob uses 0x2e for 705MHz at idle
00:02 mupuf_: I will try to reproduce on a very-controlled scenario
00:02 karolherbst: but 0x3e for 862-135 MHz under load
00:03 mupuf_: like the one I used to get my reading
00:03 karolherbst: okay
00:03 mupuf_: you need to check all the clocks too
00:03 mupuf_: not just the main ones
00:04 karolherbst: will do
00:04 karolherbst: but the blob uses also non cstate clocks
00:04 karolherbst: so
00:04 mupuf_: really? :D
00:04 mupuf_: that's fun
00:04 karolherbst: yeah
00:04 karolherbst: 135MHz is clock used by blob for low load
00:04 karolherbst: but lowest cstate is 405MHz
00:04 karolherbst: the boost table has this clock though
00:05 karolherbst: brb
00:05 karolherbst: also I can clock higher than the highest cstate
00:06 mupuf_: you know that the cstate are not everything
00:06 mupuf_: there is the pstate too
00:06 karolherbst: pstate has the same bounds
00:07 karolherbst: that's my boost table entry: "0: pstate 7 min 270 MHz max 810 MHz"
00:07 mupuf_: ....
00:07 mupuf_: well, 135 is half 270
00:08 karolherbst: yeah
00:08 mupuf_: :D
00:08 mupuf_: not sure where it gets us, but ... good to know :D
00:08 karolherbst: but PM_Mode table says 405MHz
00:08 karolherbst: also the pstate file of nouveau says 405
00:09 karolherbst: mhh the boost table is strange though
00:10 mupuf_: yes
00:10 karolherbst: https://gist.github.com/karolherbst/397ce9495d9986336211
00:10 mupuf_: it is not entirely understood
00:10 karolherbst: the min/max are like the pstate sysfs file of nouveau
00:10 karolherbst: but the "header" has a lower min
00:12 karolherbst: fun though, the blob seems to use 15MHz steps or something
00:13 karolherbst: will try to collect voltage data for each clock depending on temperature...
00:13 mupuf_: :)
00:14 mupuf_: I would say force the blob to use the maximum clock
00:14 mupuf_: and vary the voltage
00:14 mupuf_: you'll see if the blob changes something based on temperature
00:15 karolherbst: I already did that
00:16 karolherbst: it was between 61 and 64 (the pwm value)
00:16 mupuf_: you forced it to the max clock?
00:16 karolherbst: yeah
00:16 mupuf_: hmm, small variance
00:16 karolherbst: only low temp had an effect though
00:16 mupuf_: would be fun to see if the voltage is lowered as we reach the FSRM thresholds
00:16 karolherbst: I think after 50°C nothing changed
00:17 mupuf_: ack
00:17 karolherbst: but I only tried 90°C max or something?
00:17 karolherbst: maybe 95
00:17 mupuf_: well, that is a task you can do
00:17 mupuf_: just make sure that the clocks DO NOT CHANGE :D
00:17 karolherbst: 100% gpu core load?
00:17 karolherbst: :D
00:17 mupuf_: no
00:18 karolherbst: mhh otherwise I get only 705MHz
00:18 karolherbst: not 862MHz
00:18 karolherbst: there has to be load
00:18 mupuf_: do not set a load
00:18 karolherbst: even if I tell the driver to go into max perf mode
00:18 mupuf_: otherwise, you will see the effect of boost
00:18 karolherbst: I meant I run a benchmark in the background
00:18 karolherbst: mhhh
00:18 karolherbst: there is no boost ;)
00:18 mupuf_: boost lowers the speed when the temperature gets too high
00:19 karolherbst: shouldn't the clock change in nvidia-settings if the blob boosts?
00:19 karolherbst: or should I read the PLLs out
00:19 karolherbst: but as far as I know, there is no boost on Linux so far
00:19 karolherbst: at least not for my gpu
00:20 karolherbst: but I could also check what the blob does at 135MHz and 705MHz at no load
00:21 mupuf_: reads the PLLs out
00:22 mupuf_: oh, you have a power sensor too, sweet
00:22 mupuf_: a power sensor, but no power budget
00:22 mupuf_: well done :D
00:22 mupuf_: oh, and it does spread spectrum
00:22 mupuf_: no idea how anyone reversed this table though
00:23 karolherbst: mupuf_: will your pwr tool read the "right" freq out?
00:23 mupuf_: yes
00:23 karolherbst: okay, then there is no boost
00:23 karolherbst: because otherwise the blob has to boost by at least 150MHz :D
00:24 karolherbst: otherwise I would be disappointed
00:25 mupuf_: they say in the table, up to 5%
00:25 mupuf_: that's almost nothing
00:25 mupuf_: I will check again, but I do not see why linux would be different than windows
00:26 mupuf_: they use the same code
00:26 karolherbst: the table says 112 for me
00:26 karolherbst: "1: domain 4 percent 112 min 810 max 1932"
00:27 mupuf_: nvidia-smi -q -i 0 -d SUPPORTED_CLOCKS --> sweet!
00:27 mupuf_: http://devblogs.nvidia.com/parallelforall/increase-performance-gpu-boost-k80-autoboost/
00:27 mupuf_: it is domain 4
00:27 mupuf_: it is not your core frequency
00:28 mupuf_: 4:0x7f1d: 2a 03 00 00 : hub06 freq = 810 MHz
00:28 karolherbst: k
00:29 karolherbst: mupuf_: "Unknown Error" :D
00:29 mupuf_: hehe
00:29 mupuf_: have to go
00:30 karolherbst: k
00:45 karolherbst: mupuf_: https://gist.github.com/karolherbst/397ce9495d9986336211
00:46 mupuf_: ack, I wonder why it would increase the voltage when cold
00:47 mupuf_: but hey, I am not a quantum physicist
00:47 karolherbst: mhh
00:47 karolherbst: after 97°C it downclocks into the 07 pstate and 405MHz
00:48 karolherbst: mhh
00:48 mupuf_: yeah, that's another behaviour I could probably add to nouveau quite easily
00:48 karolherbst: mupuf_: ranges with two voltages are because of which way you change the temp
00:49 karolherbst: if you go up, the voltage drops at the upper bound
00:49 karolherbst: but increases only if you reach the lower bound
00:49 mupuf_: ok, so there are hysteresis cycles
00:49 mupuf_: lovely
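A minimal sketch of the hysteresis karolherbst describes (voltage drops when the temperature rises past an upper bound, but only goes back up once it falls below a lower bound). The PWM values 64 (cold) and 61 (warm) are from the log; the 45/50°C thresholds and the class itself are hypothetical illustrations, not the blob's or nouveau's logic.

```python
class HysteresisVoltage:
    """Temperature-driven voltage step selection with hysteresis (hypothetical).

    boundaries[k] = (down_below, up_at) separates level k from level k + 1:
    the level rises past it once temp >= up_at, and only falls back once
    temp < down_below (down_below < up_at gives the hysteresis band).
    """

    def __init__(self, boundaries, pwm_levels):
        assert len(pwm_levels) == len(boundaries) + 1
        self.boundaries = boundaries
        self.pwm_levels = pwm_levels
        self.level = 0  # start at the coldest level

    def update(self, temp):
        # Temperature rising: move up while at/above the current upper bound.
        while self.level < len(self.boundaries) and temp >= self.boundaries[self.level][1]:
            self.level += 1
        # Temperature falling: move down only once below the lower bound.
        while self.level > 0 and temp < self.boundaries[self.level - 1][0]:
            self.level -= 1
        return self.pwm_levels[self.level]


# PWM 64 (cold) and 61 (warm) are from the log; 45/50 degC are hypothetical.
ctl = HysteresisVoltage([(45, 50)], [64, 61])
for t in (30, 50, 47, 44):
    print(t, ctl.update(t))
```

At 47°C the result depends on the direction the temperature came from, which is exactly the "ranges with two voltages" effect.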
00:49 mupuf_: anyway, can't work on this right now
00:50 mupuf_: have fun!
00:50 mupuf_: find this information in the vbios :)
00:50 karolherbst: the other engines are also clocked differently
00:50 karolherbst: not only the core
00:50 karolherbst: but the others also change depending on the load, well
00:51 karolherbst: 1411MHz has the voltage id of 35
00:51 karolherbst: 812500-937500 in the map table
00:56 karolherbst: pwm 0x21 - 0x35
00:57 karolherbst: blob uses 0x2e-0x32
00:57 karolherbst: looks similar to what I saw with 862MHz
00:58 karolherbst: blob seems to be in the upper third of the voltage map table range
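A worked check of where the blob's PWM values fall, ASSUMING the PWM range 0x21-0x35 maps linearly onto the 812500-937500 µV voltage map range quoted above. The log does not confirm that the mapping is linear or that the endpoints line up exactly; this is only illustrative arithmetic.

```python
PWM_MIN, PWM_MAX = 0x21, 0x35          # PWM range from the log
UV_MIN, UV_MAX = 812_500, 937_500      # voltage map range (microvolts) from the log


def pwm_to_uv(pwm):
    """Map a PWM value to microvolts, ASSUMING a linear relationship."""
    if not PWM_MIN <= pwm <= PWM_MAX:
        raise ValueError(f"PWM {pwm:#x} outside {PWM_MIN:#x}-{PWM_MAX:#x}")
    return UV_MIN + (pwm - PWM_MIN) * (UV_MAX - UV_MIN) // (PWM_MAX - PWM_MIN)


def position_in_range(pwm):
    """Fraction of the PWM range covered: 0.0 at PWM_MIN, 1.0 at PWM_MAX."""
    return (pwm - PWM_MIN) / (PWM_MAX - PWM_MIN)


for pwm in (0x2E, 0x32):
    print(f"{pwm:#x}: {pwm_to_uv(pwm)} uV, {position_in_range(pwm):.2f} of range")
```

Under that assumption, 0x2e-0x32 lands at roughly 894-919 mV, i.e. 65-85% of the range.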
01:19 karolherbst: mupuf_: ohh my gpu fan reacts to nvaforcetemp
01:20 mupuf_: no shit sherlock :p
01:20 karolherbst: you meant it may read the power consumption
01:21 karolherbst: but voltage, clock and load stayed the same
01:21 karolherbst: the blob can't control the fan though
01:21 karolherbst: and nouveau also can't, although it tries
01:22 karolherbst: so there has to be something done over the MXM bus
01:22 karolherbst: maybe these 26.28.30 pins?
01:23 karolherbst: mhh maybe I remember it wrong and the fan is indeed connected to the card
01:24 karolherbst: mupuf_: why don't I have a power budget?
01:24 karolherbst: aren't the entries valid?
01:24 karolherbst: peak: 80W, sounds reasonable
01:27 karolherbst: mupuf_: at some clocks there aren't any voltage changes depending on the temp, I think pwm 38 is somehow the min value the blob uses here
01:28 mupuf_: karolherbst: read the line with a *
01:28 mupuf_: in the power budget
01:29 mupuf_: and no, 80W is not a reasonable power budget for your GPU :D
01:29 mupuf_: you are insane :D
01:29 mupuf_: 10 or 15W is a max
01:29 karolherbst: why not?
01:29 karolherbst: 80W is fine
01:29 karolherbst: I have a 150W AC thingy
01:30 karolherbst: 180 actually
01:30 mupuf_: and how do you dissipate the heat out of a laptop?
01:30 karolherbst: good fans?
01:30 karolherbst: its a 770M
01:30 mupuf_: with tiny whiny (yes, that's what they do) fans
01:30 karolherbst: 960 cores at 862MHz ?
01:30 karolherbst: yeah, official TDP is 75W
01:31 karolherbst: the laptop can also be built with a 780M
01:31 karolherbst: which has 110W
01:31 karolherbst: 122 actually
01:31 karolherbst: :D
01:31 karolherbst: 80W is fine
01:31 specing: mupuf_: mostly through the keyboard and onto karolherbst's hands
01:31 mupuf_: https://www.techpowerup.com/gpudb/2129/geforce-gtx-770m.html
01:32 mupuf_: shit, you are right
01:32 mupuf_: this is INSANE
01:32 karolherbst: yeah
01:32 karolherbst: :D
01:32 karolherbst: guess what
01:32 karolherbst: +135MHz overclocking and I hardly reach 80°C
01:32 specing: laptop works 20 minutes on max power @ battery?
01:32 mupuf_: and you can hardly touch your laptop? :D
01:32 karolherbst: specing: something like that
01:32 mupuf_: brb
01:32 karolherbst: no, the keyboard is always cold
01:32 mupuf_: then we parse this freaking table wrong
01:32 specing: karolherbst: interesting
01:33 karolherbst: the cpu and gpu are pretty much at the back
01:33 karolherbst: specing: intel gpu
01:33 specing: oh so it is a big laptop?
01:33 karolherbst: the nvidia one is off usually
01:33 karolherbst: yeah
01:33 specing: ok
01:33 specing: "laptop"
01:33 specing: meaning mobile desktop with a built-in UPS
01:33 karolherbst: specing: this one http://www.notebookcheck.net/Review-One-K56-3N2-Clevo-P157SM-Notebook.93275.0.html
01:33 karolherbst: different branding though
01:33 karolherbst: but clevo p157sm
01:34 karolherbst: and only 770M
01:35 specing: an optical drive?!!?
01:35 karolherbst: :D
01:35 specing: what is this, 2005?
01:35 karolherbst: mhh I removed mine
01:35 karolherbst: and now I have three HDDs
01:35 karolherbst: specing: for old games maybe?
01:36 karolherbst: usually a gaming laptop needs an optical drive, otherwise it would upset customers
01:36 karolherbst: sometimes games have 30GB downloads
01:36 karolherbst: so....
01:36 specing: lol bios interface
01:36 karolherbst: I have a bios mod for shitty stuff
01:37 karolherbst: like everything
01:37 specing: I'm slowly migrating to libreboot
01:37 karolherbst: yeah well, I don't think it will work on my laptop anytime soon
01:37 specing: *never*
01:38 karolherbst: the keyboard is awesome anyway
01:38 karolherbst: fancy LED stuff :D
01:38 specing: LOL
01:38 specing: is there a nipple?
01:38 karolherbst: where?
01:38 specing: in the keyboard, doh
01:38 karolherbst: nope
01:38 specing: so its trash
01:38 karolherbst: why?
01:38 specing: nipples are epic
01:38 specing: try it
01:38 karolherbst: mhh somehow I didn't like them
01:38 specing: now I only use a mouse for games
01:39 specing: a real mouse*
01:39 karolherbst: yeah well, I have a mouse too, so
01:39 specing: because the nipple is so fantastic
01:39 specing: plus it saves a lot of space on my desk
01:40 specing: no thinklight?
01:41 specing: oh man, the touchpad looks so ugly
01:42 karolherbst: yeah, sadly it isn't the best, too
01:42 karolherbst: it's one of the poor parts of it
01:44 specing: looks to me like the GPU is the only good part of it :P
01:46 karolherbst: the CPU is also good
01:46 karolherbst: and the display is quite okay
01:58 specing: karolherbst: cpu is backdoored, display is TN
01:58 karolherbst: oh well, there is not much choice for the cpu, is there?
02:01 karolherbst: and what is especially bad about TN displays?
02:01 karolherbst: TN seems to have the lowest latency
02:03 karolherbst: I always thought TN for gaming and IPS for professional graphic design
02:03 karolherbst: and this display is one of the better TNs out there
02:05 karolherbst: mupuf_: this is the GPU with the fan: http://www.notebookcheck.net/fileadmin/Notebooks/One/K56-3N2/innen2.jpg
02:06 karolherbst: seems like the connector is on the mother board indeed
02:07 karolherbst: ohh right I need to get my subwoofer to work somehow...
02:07 karolherbst: never cared about that really
02:26 specing: karolherbst: TN is for cheap, IPS is for less cheap
02:27 specing: and IPS panels are cheap too
02:27 specing: just that laptop manufacturers put a high margin on them because its "premium" or something
02:30 karolherbst: yeah well, but doesn't IPS have latency problems?
02:30 karolherbst: I know they are better for color quality and stuff
02:30 karolherbst: but if you don't want to buy an expensive IPS you get high latencies afaik
02:31 specing: not sure if you would call that a problem... it is ~3ms higher or something
02:34 specing: and the panels are not expensive
02:34 specing: the chromebook pixel 3:2 one was $70
02:34 specing: 2560x1700
02:35 karolherbst: I see
02:36 specing: 13"
02:38 specing: afaik Medion sells a Full HD IPS laptop with an SSD for 500-600 EUR
02:38 karolherbst: IPS != IPS, it depends a lot on how good they actually are
02:39 specing: well...I got the impression that even the worst IPS is better than the best TN
02:39 specing: eh I can't into words
02:40 karolherbst: mhhh
02:40 karolherbst: really
02:40 specing: or I can
02:40 specing: hmm
02:40 karolherbst: I highly doubt that
02:40 karolherbst: my TN is pretty good actually
02:40 karolherbst: no color distortion from the side
02:40 karolherbst: and only a bit from up and below
02:41 karolherbst: black on white font is a problem, but everything else is fine
02:42 specing: I have two T400 thinkpads and the display is washed out more than my dishes
02:48 karolherbst: wow, then you are unlucky
02:49 karolherbst: but somehow I get the feeling, that lenovo laptops are worse than the name tells you
02:49 karolherbst: I never heard about a "good" lenovo laptop
02:55 specing: the keyboard is great
02:55 specing: and the thinklight, and the dual battery hotswap system (I get 11h on a charge, 16h if I hotswap with another battery)
02:56 karolherbst: mupuf_: lowest voltage used by blob is 837500
02:56 specing: and the ultrabay concept and the docking and trackpoint
02:56 karolherbst: I am not saying they are bad for business and work usage, but for performance intense stuff I never heard anything good
02:56 specing: karolherbst: We have a project to build our own laptops: openlunchbox.com #openlunchbox
02:57 karolherbst: please less JS "effects" on the site
02:57 specing: what
02:58 specing: the wordpress theme? Lol
02:58 karolherbst: the smooth scrolling doesn't look good
02:58 karolherbst: and it stutters
02:59 karolherbst: with browser smooth scrolling it looks fine though
03:00 karolherbst: I guess there is a smooth scrolling emulation somehow
03:00 karolherbst: mhh
03:00 karolherbst: which actually is also enabled, when the browser has it
03:00 karolherbst: this is strange
03:01 specing: *shrug*
03:02 specing: I don't think anyone is a webdev or has time for it
03:02 specing: much better stuff to do
03:02 karolherbst: yeah, but then please disable it ;)
03:02 karolherbst: it feels odd
03:02 karolherbst: at least here
03:04 karolherbst: specing: "Laptops over 13″ wide are too large for mobile use while travelling on trains, planes and buses." actually no, 15" is nice, too
03:04 karolherbst: but I also say I can't work on anything smaller than that
03:04 karolherbst: even 13" is too small for me
03:05 karolherbst: but I like the idea of an open ECU
03:05 karolherbst: usually ECUs do important stuff, but if you can't change their settings, they may do stupid stuff
03:17 karolherbst: mupuf_: 837500 is the lowest max_voltage
03:17 karolherbst: in the map table
03:18 karolherbst: ohh, okay, that makes sense then
03:20 karolherbst: I need to set temp under 0 :/
03:21 specing: karolherbst: my 14" T400 feels just perfect
03:21 karolherbst: yeah maybe 14" is also good
03:21 specing: well, both of my T400 :)
03:21 specing: if it had an open ECU and an IPS it would be great
03:24 karolherbst: *EC
03:24 karolherbst: ECU is for cars and stuff
03:27 karolherbst: ohh mhh, I should be more careful when toying with nvaforcetemp
03:27 karolherbst: reached 84°C ...
03:27 specing: karolherbst: yes, EC
03:28 specing: karolherbst: there is also an open ECU project @ openlunchbox
03:28 specing: (btw)
03:28 karolherbst: mupuf_: I get the feeling, that max_voltage is used at lowest temp
03:28 nchauvet: sorry, I just messed up my git send-email command to the nouveau ml (I hadn't validated my subscription to the list at that time). If there is a list moderator, could they allow my post? If not, I will try to resend using the same Message-ID later today
03:29 karolherbst: and depending on temp, it lowers the voltage
03:29 karolherbst: and the lowest cstate entry is garbage
03:29 nchauvet: this is about http://permalink.gmane.org/gmane.linux.ports.tegra/23428 and http://permalink.gmane.org/gmane.linux.ports.tegra/23429
03:29 karolherbst: voltage map ID[12].voltage_max is the lowest voltage used by the blob
03:30 karolherbst: it never goes lower
03:58 fling: Will 4.2 fix these errors? -> http://dpaste.com/1RD788A
03:58 fling: Display hangs when I open a huge picture with xombrero on 3.18
03:59 karolherbst: fling: you could try if newest kernel/mesa/ddx/xorg-server fixes it
04:00 karolherbst: if not, then it may be possible to track it down
04:00 karolherbst: but usually with old software you won't get much support here
04:01 fling: ok, I will update the userspace stack before upgrading the kernel.
04:03 karolherbst: mupuf_: slowly I see some kind of pattern between freq and voltage used
04:09 karolherbst: mupuf_: my data so far: https://gist.github.com/karolherbst/397ce9495d9986336211
04:11 karolherbst: it's hard to collect data per temp for a specific freq though :(
04:14 specing: < karolherbst> usually EC do important stuff, but if you can't change the settings of it, it does maybe stupid does
04:14 specing: if it ends up like nouveau you won't be able to change anything anyway :)
04:14 karolherbst: ?
04:15 specing: I miss the clocking settings of the nvidia blob :(
04:15 karolherbst: coolbits ?
04:15 specing: yeah
04:15 karolherbst: its there?
04:15 specing: yeah
04:15 karolherbst: ohh
04:15 karolherbst: you mean on nouveau
04:15 karolherbst: well
04:15 specing: yes
04:15 karolherbst: this is a difficult part
04:15 karolherbst: first: how to do it right
04:15 specing: can't change anything without recompiling/rebooting with idk what
04:16 karolherbst: second: how to do it without damaging cards :p
04:16 specing: you can damage cards? 0.o
04:16 karolherbst: mhh depends
04:16 karolherbst: heat is a problem
04:16 karolherbst: currently nouveau won't downclock on high temp
04:16 specing: I damaged my G84M by gaming too hard on it
04:16 specing: went to like 105'C
04:16 karolherbst: see
04:16 karolherbst: ugh
04:16 karolherbst: with nouveau?
04:16 specing: for couple of hours
04:16 specing: no silly, nvidia blob
04:17 karolherbst: well
04:17 karolherbst: then it's nvidia's fault
04:17 specing: actually its european commission/parliament's RoHS fault
04:17 specing: goddamn unleaded solder!!!!
04:17 karolherbst: why?
04:18 karolherbst: lead-free has higher melting points
04:19 specing: I think it is the thing that holds the die to the package that is the problem or something
04:19 karolherbst: mhh
04:19 karolherbst: but lead is something you usually don't want to use
04:20 karolherbst: for various reasons
04:20 specing: karolherbst: it is also less sticky, probably suffers more from metal fatigue
04:20 specing: reasons being?
04:20 karolherbst: toxic
04:21 specing: walking in the city for 1 min is much more toxic than that
04:21 specing: unless you inhale lead, which ... well idk
04:21 karolherbst: nope, skin contact is also kind of bad already
04:22 karolherbst: but its rare that this happens
04:22 specing: when do you contact lead?
04:22 specing: yeah
04:22 karolherbst: the main problem is garbage though
04:22 specing: actually it shouldn't happen at all
04:23 specing: garbage must be separated anyway
04:23 specing: and then taken to a recycling plant
04:23 karolherbst: ,...
04:23 specing: so no problem there either
04:23 karolherbst: yeah extract lead from electronics
04:23 karolherbst: fun times
04:24 karolherbst: nobody does that, because its too expensive
04:24 specing: electronics are a special waste and must be disposed of properly
04:24 karolherbst: yeah, and where does electronic garbage land in the end?
04:24 specing: well, that is not lead's problem, is it?
04:24 karolherbst: garbage without lead is not as bad as garbage with lead actually
04:25 karolherbst: there are of course other problems
04:25 specing: garbage with lead takes more time to become garbage!
04:25 karolherbst: but lead is something you can replace by other things
04:25 specing: but not well
04:25 specing: I hate RoHS solder
04:25 specing: there is no replacement for lead
04:25 specing: nothing comes even close
04:26 karolherbst: tin solder is fine, too
04:26 karolherbst: never had problems with it
04:27 Hoolootwo: tin whiskers are a problem and it's a pain to rework
04:27 karolherbst: ohh right
04:28 karolherbst: but working with lead yourself is kind of not good :/
04:31 specing: karolherbst: you can wear a mask. Something you cannot do in the city without getting cops called on you
04:31 karolherbst: :D
04:32 specing: there was a contaminant measurement station in the centre of the nearby city
04:33 specing: they quietly moved it after values were above maximum allowed
04:33 karolherbst: :D
04:36 specing: can't even imagine how deadly it is living in beijing and other cities where they have daily odd/even license plate bans
04:40 karolherbst: mupuf_: https://gist.github.com/karolherbst/397ce9495d9986336211#file-gistfile1-txt-L11-L28
04:41 karolherbst: specing: yeah
04:41 karolherbst: sometimes you really see how dirty the air is :/
04:42 karolherbst: mupuf_: so I think I now somehow know how the stuff works: the blob usually uses ranges with 5 voltages
04:42 karolherbst: usually stepped like the upper part suggested with 705MHz
04:43 karolherbst: and then there are steps by around 14MHz which increase the pwm value by 1
04:43 karolherbst: 38 is pwm of 07 pstate
04:43 karolherbst: 40 is lowest value at 0a and 0f
04:51 karolherbst: mupuf_: max voltage seems to be volt_table[cstate.VID-4].voltage_max
04:53 karolherbst: mhh okay last assumption isn't quite right
04:58 karolherbst: okay, the freqs are the cstates actually
04:59 karolherbst: upper bound is 64 on 0f
05:05 karolherbst: now the fun part starts
05:05 karolherbst: changing the max clock drastically lowers the voltage
05:30 karolherbst: mupuf_: got it
05:30 karolherbst: the voltage map table is right
05:31 karolherbst: got the entire voltage range for a clock covered
05:31 karolherbst: will try some more
05:33 karolherbst: mhh, for lower clocks its kind of wrong sadly :/
05:36 karolherbst: seems like I was wrong :/
05:41 karolherbst: mhh so setting the max clock in nvidia-settings very high actually decreases power consumption...
05:41 karolherbst: wtf
05:41 karolherbst: and lowering it, increases power consumption
05:59 karolherbst: overclocking from 862MHz to 992MHz only increases temp by 3°C with a gpu core only benchmark
06:06 specing: lol
06:07 karolherbst: specing: voltage stays the same
06:08 RSpliet: have you tried turning it off and on again?
06:08 RSpliet: :-P
06:08 RSpliet: I'd suspect a problem in the measurement set-up
06:09 karolherbst: mhh
06:09 karolherbst: I think the blob uses a stupid algo to overclock
06:10 RSpliet: their algorithms are hardly ever stupid
06:10 karolherbst: this one is
06:10 RSpliet: they are often derived from hardware requirements
06:10 karolherbst: lower clocks get different voltage according to max clock set
06:10 RSpliet: and obviously, I don't see those
06:11 karolherbst: so if I change the max clock, the voltage changes for a specific clock
06:11 RSpliet: well, that makes sense? it assumes your max clock corresponds with the max voltage you have
06:11 karolherbst: well
06:11 karolherbst: the highest clock gets the same voltage
06:11 karolherbst: if 700 is the highest, it has the same voltage as 1000 if this is the highest clock
06:12 karolherbst: I don't see why this should be a "good" way to do it
06:13 RSpliet: ok, sure... voltage requirement isn't necessarily linear with clock speed
06:13 karolherbst: driving lower clocks at a higher voltage than required is stupid
06:13 karolherbst: RSpliet: I think you don't understand
06:13 karolherbst: if the gpu runs at 700MHz
06:13 karolherbst: the used voltage changes with the max clock
06:13 karolherbst: so if I lower the max clock used
06:13 karolherbst: then the voltage at 700 increases
06:13 karolherbst: if I raise the max clock, voltage at 700 decreases
06:14 RSpliet: yes, that does make sense if the mapping is "max volt corresponds with max speed", you spread out your voltage range over a larger range of clocks
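A hypothetical model of RSpliet's explanation: if the driver spreads a fixed voltage range linearly across [lowest clock, user-set max clock], then raising the max clock lowers the voltage picked for any fixed clock such as 700 MHz. The clocks 405, 862 and 992 MHz come from the log; reusing the 812500-937500 µV range and the linear spread are assumptions, not the blob's confirmed algorithm.

```python
def voltage_at(freq, f_min, f_max, uv_min, uv_max):
    """Linearly spread [uv_min, uv_max] over [f_min, f_max] (hypothetical model)."""
    freq = max(f_min, min(freq, f_max))  # clamp into the configured clock range
    return uv_min + (uv_max - uv_min) * (freq - f_min) // (f_max - f_min)


# 405 MHz (lowest cstate), 862 MHz (stock max) and 992 MHz (overclock) are from
# the log; the 812500-937500 uV range is reused here as an illustrative assumption.
for f_max in (862, 992):
    print(f_max, voltage_at(700, 405, f_max, 812_500, 937_500))
```

In this model the highest clock always gets the top voltage, matching "if 700 is the highest, it has the same voltage as 1000 if this is the highest clock".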
06:15 RSpliet: even if they keep the voltage at 700 equal
06:15 karolherbst: but why should you do that?
06:15 karolherbst: this makes underclocking somehow useless
06:15 RSpliet: GPUs aren't designed for underclocking
06:15 karolherbst: and overclocking the only sane thing to do
06:15 RSpliet: they're designed to run within the parameters laid out in the VBIOS
06:15 karolherbst: there is no reason to drive the gpu at non maxed out clocks then
06:16 RSpliet: anything outside it (including CoolBits) is your own responsibility
06:16 karolherbst: yeah sure, but the algo is somehow bad nevertheless
06:16 RSpliet: no
06:16 RSpliet: I think you're missing the point of "optimise for the common case"
06:16 RSpliet: where optimising is a combination of power consumption, stability and reliability
06:17 karolherbst: yeah okay, I don't say that the blob does something bad at stock clock, just changing the max clock may lead to strange "behaviour"
06:18 karolherbst: so now, after analyzing a bit what the blob does for a specific clock, there are some bounds to the voltage used + a strong dependency on the voltage of the max clock
06:25 karolherbst: mhh this sounds bad "[76647.281159] pci 0000:01:00.0: Invalid ROM contents"
06:35 karolherbst: wow, I just found a shader with 222 BBs ...
06:35 karolherbst: or maybe the id is just corrupted a bit
06:36 karolherbst: even the exit instruction isn't the last in the entire function
06:49 karolherbst: imirkin_: how can I extract the generated binary out of the mmt again?
07:01 imirkin: karolherbst: demmt
07:01 karolherbst: imirkin: and then just search for the binary?
07:03 imirkin: search for START_ID
07:03 imirkin: repeat until you get the shader you want
07:05 karolherbst: k
07:42 karolherbst: imirkin: is a 400MB mmt file bad?
07:43 imirkin: not sure what you mean
07:43 imirkin: the quality of an mmt file isn't dependent on its size
07:43 karolherbst: I meant more how much time does it take to process this file
07:43 imirkin: fast.
07:43 karolherbst: mhhh
07:43 karolherbst: then something is wrong
07:44 imirkin: perhaps you're doing something unexpected?
07:44 karolherbst: "--3827-- WARNING: unhandled syscall: 318" is this bad?
07:44 imirkin: mmmmmm
07:44 imirkin: it's not _good_
07:45 karolherbst: mhh
07:45 karolherbst: using newest blob
07:45 imirkin: but it could just be a new linux syscall
07:45 karolherbst: ohh okay
07:45 imirkin: yeah. getrandom. who cares.
07:45 karolherbst: okay
07:46 karolherbst: okay, still doesn't get the trace through demmt, strange... ohh wait
07:46 karolherbst: because I am stupid that is
07:46 karolherbst: forgot -l
07:46 karolherbst: ...
07:48 karolherbst: "??? [unknown: ffe007f7] [unknown instruction]"
07:48 karolherbst: ohh wow, a lot of unknown stuff
07:48 karolherbst: https://gist.github.com/karolherbst/e08289cb63003b453fe1
07:49 karolherbst: maybe I need to update first
07:50 karolherbst: mhh nothing new except glibc 2.22 support
08:04 karolherbst: imirkin: end of a program, do you know what this is about? https://gist.github.com/karolherbst/867fba2c1262573acbbb
08:05 imirkin: yeah don't worry about that
08:05 imirkin: there's no real way to tell where the program ends
08:05 imirkin: demmt does it based on upload ranges
08:05 imirkin: however if 2 upload ranges overlap or are adjacent, it merges them
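A minimal sketch of the range merging imirkin describes: overlapping or adjacent upload ranges collapse into one, which is why two programs uploaded back to back can show up as a single listing. This is an illustration of the idea, not demmt's actual code; ranges are assumed to be half-open [start, end).

```python
def merge_upload_ranges(ranges):
    """Merge half-open [start, end) ranges that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous range.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]


print(merge_upload_ranges([(0x0, 0x10), (0x10, 0x20), (0x40, 0x50), (0x18, 0x30)]))
```

The adjacent pair (0x0-0x10, 0x10-0x20) merges even though the two uploads never overlap.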
08:06 karolherbst: ohh k
08:08 karolherbst: and this one is garbage? https://gist.github.com/karolherbst/e08289cb63003b453fe1
08:09 imirkin: that's... quite odd
08:10 imirkin: since it's at the beginning of the shader
08:10 imirkin: sometimes it might upload BS shaders to start with, perhaps?
08:10 imirkin: should definitely not be the case by render time
08:11 karolherbst: mhh
08:11 karolherbst: this occurs more than just once
08:12 karolherbst: but directly after another program
08:12 karolherbst: not that directly
08:12 karolherbst: but pretty close
08:16 karolherbst: mhh type is TCP
08:17 imirkin: ah, probably the empty TCP then
08:17 imirkin: surprising it doesn't start with an exit
08:18 imirkin: perhaps that got overwritten
08:18 karolherbst: updated the gist: https://gist.github.com/karolherbst/e08289cb63003b453fe1
08:18 imirkin: it's never executed, just uses its shader header
08:18 karolherbst: ohh k
08:18 imirkin: PB: 0x00000020 GK104_3D.SP[0x2].SELECT = { PROGRAM = TCP }
08:18 imirkin: yeah, i think if it were enabled, there'd be another bit set
08:19 karolherbst: okay
08:19 karolherbst: I thought it would be nice to check the pixmark_piano benchmark for stuff nouveau does bad, but this 4000 instruction program ...
08:20 karolherbst: I chose this because it only uses the gpu core, no memory load
08:21 karolherbst: mhhh, I have strange joinats: 20001c07 600001b3 joinat 0x7ae0 [unknown: 00001c00 00000000]
08:23 karolherbst: funny, I found something the blob could optimize ...
08:24 karolherbst: maybe there is a reason
08:25 imirkin: that 0x1c is probably a predicate... joinat's can be predicated? that's crazy...
08:25 imirkin: or maybe it's just setting those bits because it can
08:25 karolherbst: https://gist.github.com/karolherbst/ae6ee04452f80a1f0ca1 couldn't the second mov store the negated value so that the "neg" can be left out in the fma operation?
08:25 imirkin: 0x1c would correspond to p7, which == "always true"
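A sketch of decoding the guard predicate from the low word of the joinat above (20001c07). It assumes the GF100/GK104 encoding where bits 10-12 hold the predicate index and bit 13 negates it, so 0x1c00 in those bits means p7, the always-true predicate. GK110 uses a different layout; treat this as an illustration, not an envydis reference.

```python
def decode_predicate(word0):
    """Extract the guard predicate from the low 32 bits of a GF100/GK104 instruction.

    Assumed layout: bits 10-12 = predicate index, bit 13 = negate flag.
    """
    index = (word0 >> 10) & 0x7
    negated = bool((word0 >> 13) & 0x1)
    return index, negated


index, negated = decode_predicate(0x20001C07)  # low word of the joinat from the log
name = "always (p7)" if index == 7 and not negated else f"{'!' if negated else ''}$p{index}"
print(index, negated, name)
```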
08:25 karolherbst: mhh
08:26 imirkin: mmmaybe
08:26 imirkin: but $r1 might get used again
08:26 imirkin: oh, but it doesn't
08:26 karolherbst: yeah
08:26 karolherbst: sin overwrites it, doesn't it?
08:26 imirkin: but the neg doesn't matter
08:26 imirkin: it wouldn't be any fewer instructions without it
08:27 karolherbst: and the value can't be moved inside the fma?
08:27 imirkin: no, only the second arg can be immediate
08:27 karolherbst: k
08:28 karolherbst: maybe the fma with the neg is faster? :D
08:28 imirkin: unlikely.
08:29 karolherbst: but this program is crazy
08:29 karolherbst: over 4000 instructions and not a single load/store
08:29 imirkin: frag shader?
08:29 karolherbst: FP is frag?
08:30 imirkin: frag shader outputs are picked up from registers directly
08:30 imirkin: FP = fragment program, so yes
08:30 karolherbst: I thought load/stores come into play if you need to spill?
08:30 imirkin: yes
08:30 imirkin: i thought you also meant generic stuff like "export"
08:31 karolherbst: no
08:31 karolherbst: spilling
08:31 karolherbst: nouveau doesn't spill, too
08:31 karolherbst: mhh
08:31 karolherbst: still big perf difference
08:31 imirkin: you normally don't spill
08:31 karolherbst: okay
08:31 imirkin: yeah, coz i bet they schedule intelligently
08:32 karolherbst: sadly PROG_DEBUG doesn't tell me the sched instructions ... :D
08:46 karolherbst: imirkin: what does registers / thread mean? I read that kepler has 255 registers per thread, but somehow I don't quite get how this should work
08:46 karolherbst: ohh wait, its only about GK110, but still
08:47 mwk: karolherbst: that's a max, you can actually select that
08:47 mwk: the thing is, there's a total of 64k registers per processor or so
08:47 mwk: so the more regs per thread you enable, the fewer threads you can have running at once
08:47 mwk: so you'd better use as few as possible
08:47 karolherbst: okay
08:48 karolherbst: but using more can speed up stuff because of less spilling?
08:48 karolherbst: and less load/store
08:48 mwk: quite possibly, yes
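A rough occupancy estimate makes mwk's trade-off concrete. This is a minimal sketch under assumed Kepler-like numbers: 65536 registers per multiprocessor (mwk's "64k ... or so"), a cap of 2048 resident threads, and residency counted in whole 32-thread warps. Register allocation granularity and per-block limits are ignored, so treat the results as upper bounds.

```python
def max_resident_threads(regs_per_thread, regs_per_sm=65536,
                         max_threads_per_sm=2048, warp_size=32):
    """Upper bound on threads resident on one multiprocessor.

    Assumptions (not from the chat): regs_per_sm, the thread cap and the
    warp size are illustrative Kepler-like defaults; register allocation
    granularity and per-block limits are ignored.
    """
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    threads = regs_per_sm // regs_per_thread
    # Threads are scheduled in whole warps, so round down to a warp multiple.
    threads -= threads % warp_size
    return min(threads, max_threads_per_sm)


for regs in (32, 63, 255):
    print(regs, max_resident_threads(regs))
```

Under these assumptions a shader at 63 registers per thread leaves room for 1024 resident threads, while one at 255 leaves only 256, which is why using more registers only pays off if it removes enough spilling.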
08:48 karolherbst: currently reading a paper about GK110
08:53 karolherbst: currently I'm trying to understand what a good way to schedule stuff is, but it's quite hard to see what's "better"
08:53 imirkin_: alternatively a little bit of spilling can be faster than a lot more register usage
08:53 imirkin_: it's a bitch :)
08:54 karolherbst: :D
08:54 karolherbst: but when does that happen
08:55 karolherbst: how can parallelism be increased for gpus? reducing live ranges?
08:55 imirkin_: reducing total register usage
08:55 imirkin_: i.e. reducing the max # of live values
08:56 karolherbst: doesn't this include reduced live ranges?
08:56 imirkin_: a bit, sure
08:56 karolherbst: what I was thinking about: I could hold a list of "active" registers and try to schedule first all instructions that depend on regs with only one use?
08:57 imirkin_: mwk: can you see anything bad coming from this? http://patchwork.freedesktop.org/patch/58760/
08:58 mwk: uh, no idea
08:59 karolherbst: what is the best way to find out how many regs are used in a binary? look for the highest reg number?
08:59 imirkin_: i suspect that outside of our manually-constructed blit shaders, this situation is very rare
08:59 imirkin_: karolherbst: in codegen, there's a foo->maxGPR i think
09:00 imirkin_: and we have to tell the gpu what that is so that it can do its allocations properly
09:00 karolherbst: when I see a "$r63" in the output, does that mean the program uses 64 regs?
09:00 imirkin_: no
09:00 imirkin_: $r63 == 0
09:00 imirkin_: always
09:00 imirkin_: you can't write a value to it
09:00 karolherbst: ohh okay
09:00 imirkin_: also any registers above the max gpr are == 0, i think
09:00 karolherbst: mhh okay
09:00 karolherbst: then I should check max gpr
09:00 imirkin_: i.e. if you set max gpr to 9, then $r10 == 0
09:00 imirkin_: max gpr is determined after RA :)
09:00 karolherbst: okay
09:01 imirkin_: but you can find out the architectural max from the target
09:01 imirkin_: targ->getFileSize(FILE_GPR)
09:02 karolherbst: and from the prog?
09:02 karolherbst: ohh prog->getTarget()
09:03 glennk: i think a program using r0, r2 vs r0, r1 would both count as 2 by the hardware?
09:03 imirkin_: i doubt it has register rewriting
09:04 glennk: radeons do
09:04 imirkin_: well, this hasn't been put to the test
09:04 imirkin_: i assume that using r0, r2 == 3 registers
09:04 imirkin_: (well, 4 probably... iirc it's in chunks of a certain size)
09:04 karolherbst: okay, that program uses 63 regs
09:04 imirkin_: karolherbst: there's a $r62 in there?
09:05 glennk: well, the warp count is in chunks
09:05 karolherbst: mhh, smaller shaders also have 63? mhh
09:05 karolherbst: even for a program with only "r10" as max
09:05 imirkin_: looks like 4 gprs is minimum
09:05 imirkin_: but after that it can be whatever
09:06 imirkin_: karolherbst: $r63 is used as a 0
09:06 imirkin_: so that you don't have to load immediates
09:06 imirkin_: registers can go anywhere
09:06 imirkin_: ($r255 on gk110+)
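Following imirkin_'s description, a rough way to read the register count off a disassembly is to take the highest `$rN` index plus one while skipping the always-zero register. A sketch, assuming the `$rN` text format seen in the chat, a minimum of 4 GPRs, and no special handling of wide (multi-register) operands; the authoritative number is still the max GPR value codegen determines after RA.

```python
import re


def regs_used(disasm, zero_reg=63, min_regs=4):
    """Estimate how many GPRs a program needs from its disassembly text.

    zero_reg: the always-zero register ($r63 on Fermi/Kepler1, $r255 on GK110+).
    min_regs: assumed minimum allocation (imirkin_: "looks like 4 gprs is minimum").
    Counts highest index + 1, so using r0 and r2 counts as 3 registers.
    Wide (multi-register) operands are not expanded; that would need extra handling.
    """
    indices = [int(n) for n in re.findall(r"\$r(\d+)", disasm)]
    indices = [i for i in indices if i != zero_reg]
    highest = max(indices) + 1 if indices else 0
    return max(highest, min_regs)


print(regs_used("mov b32 $r0 $r63\nadd f32 $r2 $r0 $r1"))
print(regs_used("add f32 $r10 $r9 $r63"))
```

With r0 and r2 in use this gives 3, clamped to the assumed minimum of 4; the `$r63` operand is ignored because it only ever supplies a zero.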
09:07 imirkin_: and don't even dare trying to look at nv50
09:07 karolherbst: imirkin_: prog->getTarget()->getFileSize(nv50_ir::FILE_GPR) is always 63?
09:07 imirkin_: for kepler, sure
09:07 imirkin_: for kepler1
09:07 imirkin_: and fermi
09:07 imirkin_: for kepler2/maxwell it's 255
09:07 imirkin_: and for nv50 iirc
09:07 karolherbst: mhh I thought I could check what is the highest amount of regs used for a program
09:07 imirkin_: coz it's in half-registers
09:07 imirkin_: there are really 128 of them
09:08 imirkin_: but you get to use short instruction if you stick to 64
09:08 imirkin_: karolherbst: before RA?
09:08 imirkin_: the whole point of RA is to determine that ;)
09:08 karolherbst: ohhh mhhh
09:08 night199uk: hey
09:08 karolherbst: how can I optimize for lower reg usage then?
09:08 imirkin_: karolherbst: fewer live values at once
09:08 imirkin_: karolherbst: perhaps i mentioned that you should be tracking the # of live values
09:09 imirkin_: and try to make sure you have no more than N
09:09 karolherbst: and live values are files with non scheduled usage but schedule sources?
09:09 imirkin_: and then pray that it translates into proper register usage
09:09 imirkin_: so e.g. if your target is 16
09:09 imirkin_: then you should schedule until you get up to 16 values
09:09 imirkin_: after which you should give preference to instructions that decrease the number of live values
09:09 imirkin_: or at least don't increase them
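imirkin_'s heuristic can be sketched as a greedy list scheduler over one basic block. Assumptions: instructions are in SSA form as (dst, [srcs]) tuples, block inputs are not counted as live, and a value is "live" from its definition until its last use is scheduled (or to the end, if it is live-out). This illustrates the idea only; it is not nouveau's scheduler.

```python
from collections import Counter


def schedule(instrs, inputs, live_out, target):
    """Greedy list scheduling that tries to keep live values <= target.

    instrs: list of (dst, [srcs]) in SSA form (each dst defined once).
    inputs: values available on entry to the block (not counted as live).
    live_out: values that must survive past the end of the block.
    Returns (new_order, peak_live).
    """
    remaining = Counter(s for _, srcs in instrs for s in srcs)
    defined = set(inputs)
    live = set()
    pending = list(instrs)
    order, peak = [], 0

    def delta(ins):
        # Change in the live count if `ins` were scheduled next.
        dst, srcs = ins
        dying = {s for s in srcs
                 if s in live and remaining[s] == srcs.count(s)}
        grows = 1 if remaining[dst] > 0 or dst in live_out else 0
        return grows - len(dying)

    while pending:
        ready = [ins for ins in pending if all(s in defined for s in ins[1])]
        if len(live) < target:
            pick = ready[0]               # below target: keep source order
        else:
            pick = min(ready, key=delta)  # at target: shrink (or grow least)
        pending.remove(pick)
        order.append(pick)
        dst, srcs = pick
        for s in srcs:
            remaining[s] -= 1
        for s in set(srcs):
            if remaining[s] == 0:
                live.discard(s)
        defined.add(dst)
        if remaining[dst] > 0 or dst in live_out:
            live.add(dst)
        peak = max(peak, len(live))
    return order, peak


ex_inputs = {"b", "c", "d", "e", "f", "g"}
ex_block = [("t1", ["b", "c"]), ("t2", ["d", "e"]), ("t3", ["f", "g"]),
            ("u", ["t1", "t2"]), ("a", ["u", "t3"])]
for target in (10, 2):
    order, peak = schedule(ex_block, ex_inputs, {"a"}, target)
    print(target, [dst for dst, _ in order], peak)
```

With a loose target the block keeps its source order and peaks at 3 live temporaries; with a target of 2 the scheduler hoists `u` ahead of `t3` and the peak drops to 2. Whether that becomes fewer registers is, as imirkin_ says, up to RA.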
09:10 karolherbst: mhh
09:10 imirkin_: perhaps those comments didn't make sense initially
09:10 imirkin_: but i hope they make sense now
09:10 karolherbst: I wanted to try it out to have min live values
09:10 imirkin_: but the first step in all this is merely *computing* the number of live values
09:10 imirkin_: which is a bit tricky. i don't recommend worrying about cross-BB things
09:11 imirkin_: just do it within a BB
09:11 karolherbst: what is a "live value"
09:11 karolherbst: ?
09:11 imirkin_: value that has to be kept in memory in order to perform a later computation
09:11 karolherbst: and memory also means register?
09:11 imirkin_: so like if you want to do a = b + c + d + e + f
09:11 imirkin_: you could do it as
09:11 imirkin_: x = b + c; y = d + e; x = x + f; a = y + x;
09:12 imirkin_: then you have to have 2 live values
09:12 imirkin_: in order to compute a
09:12 imirkin_: (assuming you can't just clobber b/c/d/e/f)
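imirkin_'s count can be reproduced with a standard backward liveness scan restricted to one basic block, as he suggests. A minimal sketch, assuming instructions are (dst, [srcs]) tuples and that the inputs b..f are not counted (they cannot be clobbered, so they stay alive regardless of order). Reordering the same additions into a single chain drops the peak to 1.

```python
def max_live(instrs, live_out, ignore=()):
    """Peak number of simultaneously live values in straight-line code.

    instrs: list of (dst, [srcs]) in program order, within one basic block.
    live_out: values still needed after the block.
    ignore: values not counted (e.g. block inputs that must be preserved).
    Walks backwards: just before an instruction, its dst is dead (about to
    be overwritten) and its srcs are live.
    """
    skip = set(ignore)
    live = set(live_out)
    peak = len(live - skip)
    for dst, srcs in reversed(instrs):
        live.discard(dst)
        live.update(srcs)
        peak = max(peak, len(live - skip))
    return peak


inputs = {"b", "c", "d", "e", "f"}
paired = [("x", ["b", "c"]), ("y", ["d", "e"]), ("x", ["x", "f"]), ("a", ["y", "x"])]
chained = [("x", ["b", "c"]), ("x", ["x", "d"]), ("x", ["x", "e"]), ("a", ["x", "f"])]
print(max_live(paired, {"a"}, inputs))   # 2, as in the chat
print(max_live(chained, {"a"}, inputs))  # 1
```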
09:12 mupuf_: seems like you two are having fun :)
09:12 karolherbst: a lot
09:12 karolherbst: mupuf_: I already had fun without him
09:12 karolherbst: a lot
09:12 night199uk: heh
09:12 karolherbst: stupid blob
09:12 night199uk: hey, imirkin_?
09:13 imirkin_: karolherbst: https://en.wikipedia.org/wiki/Live_variable_analysis
09:13 karolherbst: mupuf_: I found the second way to decrease/increase voltage on blob
09:13 night199uk: you have 5 to give me ideas on debugging mode setting?
09:13 imirkin_: night199uk: i have 5. dunno if i'll be able to help you with your issues.
09:13 night199uk: :-)
09:13 imirkin_: night199uk: modesetting is a bit outside my area of expertise... seems to work ;)
09:13 karolherbst: mupuf_: increasing/decreasing max freq changes voltage used at a specific freq
09:13 night199uk: those code snippets you sent me the other day were hella useful
09:13 night199uk: nv50_display.c
09:14 karolherbst: and with that I get easily outside the ranges in the voltage map table
09:14 night199uk: i didn’t find all that code before because of the way the regs are calculated now rather than direct
09:14 karolherbst: upper and lower end
09:14 night199uk: so i was able to compare that pdisp setting code to what’s in the UEFI
09:14 mupuf_: that's indeed one way
09:14 night199uk: but my display is still stuck on ‘mode not supported'
09:14 imirkin_: perhaps there are multiple bugs ;)
09:14 karolherbst: mupuf_: increasing max freq can actually reduce power consumption if you stay under the prior max freq :D
09:15 imirkin_: night199uk: are you setting up the SOR properly?
09:15 night199uk: yes, i made my pdisp very close to what’s in nouveau
09:15 mupuf_: sure, because this fucking table makes no fucking sense
09:15 night199uk: good question; that’s where I wanted to get some ideas
09:15 karolherbst: mupuf_: I think rather the blob makes no sense...
09:15 karolherbst: for clocks around 700: PWM value between 40 and 62
09:15 night199uk: if there are some key regs (like SOR you just mentioned) to look for
09:15 night199uk: all the mode timing related stuff seemed to be in this PDISP section
09:16 mupuf_: well, do you track all the clocks or not?
09:16 night199uk: what would be SOR regs to check if any, do you think?
09:16 imirkin_: what gpu are you on again? gf108?
09:16 night199uk: yup, gf108
09:16 karolherbst: mupuf_: yes
09:16 karolherbst: mupuf_: one specific of them had a range of 40-62 depending on temp and max clock
09:16 night199uk: i have the mmiotrace you sent me for gf108
09:16 mupuf_: karolherbst: max clock for the voltage map entry?
09:16 karolherbst: mupuf_: this isn't complete, but this gives an idea how the blob chooses pwm value https://gist.github.com/karolherbst/397ce9495d9986336211#file-gistfile1-txt-L11-L28
09:17 imirkin_: night199uk: and dvi, yes?
09:17 karolherbst: mupuf_: no, core max clock
09:17 night199uk: but that is freaking HUGE… if i have to i’ll sit down and go through it piece by piece
09:17 night199uk: yup dvi
09:17 mupuf_: karolherbst: in the boost table?
09:17 imirkin_: http://cgit.freedesktop.org/~darktama/nouveau/tree/drm/nouveau/nvkm/engine/disp/sornv50.c
09:17 karolherbst: mupuf_: nope, coolbits overclocking
09:17 imirkin_: http://cgit.freedesktop.org/~darktama/nouveau/tree/drm/nouveau/nvkm/engine/disp/corenv50.c
09:17 karolherbst: this gist is only with temp changes
09:18 karolherbst: first range is 07 pstate: pwm always 38
09:18 karolherbst: pstate 0a-0f has a min pwm of 40
09:18 mupuf_: karolherbst: dude, now YOU make no fucking sense :D
09:18 karolherbst: :D
09:18 mupuf_: ok, I am not sure you got why voltage needs to be increased
09:18 imirkin_: unless it's hooked up via pior? i never really understood the difference
09:18 mupuf_: let's go back to some theory
09:18 imirkin_: p = parallel, s = serial? dunno.
09:18 karolherbst: mupuf_: I know why
09:18 karolherbst: that's not the point
09:18 karolherbst: okay
09:18 karolherbst: I try it again
09:19 imirkin_: mupuf_: do _you_ get why voltage has to be increased? i'm not sure i do.
09:19 karolherbst: for example: the blob uses for 705MHz pwm values from 46-50 depending on the temperature
09:19 karolherbst: but
09:19 mupuf_: imirkin_: sure, I do
09:19 karolherbst: if I change the max clock to 997, it uses 40-44 for 705MHz
09:19 imirkin_: care to explain for those of us on the short bus?
09:19 mupuf_: it has to do with how long it takes to charge the gate
09:19 mupuf_: the gate of the transistor
09:19 night199uk: imirkin_: thanks, let me double check these as well
09:19 karolherbst: if I change the max clock to 997, it uses 40-44 for 705 MHz
09:19 mupuf_: which is kind of a capacitor
09:19 imirkin_: for the parasitic capacitance within?
09:19 karolherbst: ....
09:20 night199uk: http://pastebin.com/jD5x9C1a
09:20 mupuf_: well, it is parasitic, but also quite useful
09:20 karolherbst: if I change the max clock to 730MHz, then the blob uses 59-63 for 705MHz
09:20 night199uk: imirkin: that’s a dump of my pdisp sets from the custom UEFI driver and some similar ones i pulled from some nouveau traces on the web
09:20 RSpliet: mupuf_: does nvidia have a patent for this kind of micro-optimisation?
09:20 mupuf_: if you remember your equations of the charging time of a capacitor, you see that the voltage matters if you plan to reach a threshold fast
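For reference, the textbook picture behind this is an idealized first-order RC model (an assumption: the gate is a capacitance C charged through an effective resistance R from the supply V_DD, and the transistor switches once the gate crosses V_th, with V_DD > V_th):

```latex
V_{\text{gate}}(t) = V_{DD}\left(1 - e^{-t/RC}\right)
\quad\Longrightarrow\quad
t_{\text{switch}} = -RC\,\ln\!\left(1 - \frac{V_{th}}{V_{DD}}\right)
                  = RC\,\ln\frac{V_{DD}}{V_{DD} - V_{th}}
```

Since V_DD/(V_DD - V_th) falls toward 1 as V_DD rises, a higher supply shortens the switching time, which is why a higher clock (less time per cycle) needs a higher voltage.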
09:21 mupuf_: RSpliet: I am quite convinced that it is just bad tooling on our side
09:21 imirkin_: night199uk: the nouveau print is a dump of something i don't exactly know what
09:21 night199uk: interesting that the uefi driver seems to program a lot more into PDISP than nouveau
09:21 imirkin_: night199uk: not sure how to read the uefi bit
09:21 mupuf_: one does not fuck around with the voltage needlessly and aimlessly
09:21 karolherbst: mupuf_: yeah, I was just wondering, why the clock changes the voltage for a specific clock depending on the value of the highest clock used
09:21 mupuf_: it is a sure way of having a brownout
09:22 night199uk: imirkin: okay - the nouveau one was just pulled from a jira somewhere nouveau related, i was just trying to find a good reference of the PDISP sets to be honest and it seemed reasonable
09:22 RSpliet: mupuf_: that would be my first guess, but I'm calling prior art on it if NVIDIA now suddenly has an idea for further power reduction :-P
09:22 karolherbst: mupuf_: I didn't play around with the voltage, the blob did
09:22 imirkin_:needs to go back and look at how transistors work
09:22 mupuf_: karolherbst: sure, I was saying that I think nvidia would not be doing changes needlessly
09:22 imirkin_: mupuf_: does it matter if it's a FET transistor vs a NPN-type junction?
09:22 karolherbst: yeah, still I was surprised
09:22 mupuf_: and a change needs to be made ONLY when you change the frequency
09:22 karolherbst: yeah I know
09:23 karolherbst: that's why I found that wrong
09:23 mupuf_: imirkin_: there are nothing but FETs in a processor
09:23 imirkin_: mupuf_: ok... but does your comment apply to NPN as well?
09:23 mupuf_: NPN is switching current, fets are switching voltages
09:23 mupuf_: no, because there are no capacitances
09:23 imirkin_: ok
09:23 imirkin_: iirc we worked mostly with TTL, which iirc is npn
09:23 mupuf_: the gate is insulated from the other poles in a FET
09:24 imirkin_: (or pnp, whatever)
09:24 mupuf_: yes
09:24 mupuf_: and this insulation is what creates the capacitance
09:24 imirkin_: right
09:25 imirkin_: do they use a dielectric or something? or just a natural capacitance?
09:25 mupuf_: well, they use a dielectric, yes
09:25 imirkin_: k
09:25 mupuf_: with a high k constant
09:25 imirkin_: don't want to be the guy with the tweezers ;)
09:25 mupuf_: right :D
09:26 mupuf_: so, back in the logical days, we had a performance level requiring a specific voltage
09:27 mupuf_: setting a higher voltage would work, but it would be inefficient because it would increase both the static and dynamic power consumption
09:27 mupuf_: and increase the temperature which would in turn increase the static power consumption :D
09:28 mupuf_: so, high-voltage is bad
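The usual first-order model behind "high voltage is bad" is dynamic power P ≈ α·C·V²·f (α the activity factor, C the switched capacitance). Static leakage also grows with voltage and temperature but has no equally simple form, so this sketch covers only the dynamic part, and the voltages in the example are hypothetical.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of dynamic power after/before a voltage/frequency change.

    Uses the first-order CMOS model P_dyn ~ alpha * C * V**2 * f with the
    activity factor alpha and switched capacitance C held constant.
    """
    return (v_new / v_old) ** 2 * (f_new / f_old)


# Same clock, hypothetical 0.9 V -> 1.1 V:
print(round(dynamic_power_ratio(1.1, 0.9), 3))
```

At the same clock, going from 0.9 V to 1.1 V costs roughly 49% more dynamic power in this model, before any extra leakage or temperature rise.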
09:28 mupuf_: and low voltage is bad because it means we need to reduce the frequency to guarantee the switching times
09:28 mupuf_: what we want is the minimum voltage + a safety margin
09:29 mupuf_: the safety margin is needed to take care of the case when a big chunk of the GPU suddenly wakes up and starts draining power
09:29 mupuf_: the current the voltage controller needs to output is suddenly much higher
09:29 mupuf_: and that means the voltage goes down because of the increased losses through the Joule effect
09:30 mupuf_: well, actually, it is not that
09:30 mupuf_: it is due to how DC-DC conversion is done
09:31 mupuf_: but anyway, if all the chip was to power itself up at the same time, the dI/dt would be so high that the voltage would drop under the threshold
09:31 mupuf_: so, that's why clock gating and power gating are done in a tree
09:31 mupuf_: and there are built-in timers that will wait for a few cycles before delivering power and then the clock
09:32 mupuf_: imirkin_: remember the JOEs registers? That's what they are
09:32 mupuf_: I renamed them all and added the names from nvidia
09:32 mupuf_: I also added a shit ton of them
09:32 mupuf_: after painfully checking their bitfields
09:32 mupuf_: there are a ton of them...
09:34 mupuf_: so, the more you reduce the safety margin, the slower you need to wake up
09:35 mupuf_: but if you are too slow, then you cannot take advantage of micro pauses
09:36 mupuf_: so, one would expect that it is a careful decision to compute those voltages
09:36 mupuf_: for all the different frequencies
09:37 mupuf_: but then, the voltage mapping table says: set the voltage between .812 and 1.1V
09:37 mupuf_: WTF
09:40 karolherbst: mhh
09:40 karolherbst: I don't think the mapping table is that drastic
09:40 karolherbst: or is it for you?
09:40 mupuf_:do not get the value of putting the knowledge on how to compute those voltages in the driver when they could simply be in the bios
09:41 karolherbst: mh right
09:41 karolherbst: but I hope you understood what I meant about what the driver does
09:41 karolherbst: even if that includes coolbits messing
09:42 karolherbst: but I somehow understood which values the blob uses for a specific freq at least
09:42 karolherbst: even if I couldn't calculate them myself because I don't know which freqs to use
09:42 mupuf_: well, what I was saying is that you can reduce the safety margin when the gpu is not as crazy
09:42 karolherbst: yeah okay
09:42 karolherbst: but let me show you a simple example
09:43 karolherbst: if you want to use the voltage values from 600MHz for 700MHz you increase the max clock by 100MHz
09:43 karolherbst: then the blob does that
09:43 karolherbst: I don't get why, but it does
09:43 mupuf_: exactly what I was saying :D
09:44 karolherbst: yeah
09:44 mupuf_: makes no sense
09:44 mupuf_: shit!!!!
09:44 mupuf_: have to go!!!
09:44 karolherbst: k
09:44 karolherbst: have fu
09:44 karolherbst: n
09:46 mupuf_: be back in 30 minutes
09:49 mrnuke_i_: Hi, I'm wondering if there are any recent workstation cards that still use free microcode
09:49 imirkin_: mrnuke_i_: anything kepler should be able to use open microcode
09:50 imirkin_: mrnuke_i_: and actually GM107 as well
09:50 imirkin_: GM200+ require signed blobs (but not encrypted)
09:51 mrnuke_i_: So Quadro K1200 should still be good then :D
09:52 imirkin_: sorry, i don't really know much about marketing names
09:52 mrnuke_i_: I was getting that from http://nouveau.freedesktop.org/wiki/CodeNames/
09:52 mrnuke_i_: It's listed under GM107
09:53 imirkin_: ok. although maxwell support is worse than kepler
09:53 imirkin_: kepler kinda-sorta has reclocking, maxwell doesn't
09:53 imirkin_: also kepler's a lot more mature
09:55 Yoshimo: can maxwell make any medium progress without signed blobs?
09:55 karolherbst: Yoshimo: the idea of signed blobs is that you can only use signed blobs
09:57 Yoshimo: i don't know how much you can do without blob on such a card
09:57 RSpliet: very little
09:57 imirkin_: Yoshimo: well, the current stack should basically work on GM200+
09:57 mrnuke_i_: imirkin_: how bad can maxwell support get? It's kind of hard to find any card with three or more DP outs
09:57 imirkin_: Yoshimo: anyways, we can make progress if ben releases his tree with cpu-side ctxsw
09:58 imirkin_: mrnuke_i_: what are you trying to do?
09:58 Yoshimo: sorry imirkin what is ctxsw?
09:58 imirkin_: Yoshimo: context switching
09:58 imirkin_: Yoshimo: but either way even if he did, i know i def don't have the hardware
09:58 imirkin_: and don't know of anyone who'd be interested in doing the work either
09:58 imirkin_: so... meh
09:58 imirkin_: that's why i haven't pushed him on it
09:59 mrnuke_i_: imirkin_: I'm trying to build a server/workstation with zero blobs. Dual-socket, coreboot firmware. Three 4k monitors
09:59 imirkin_: mrnuke_i_: and you want this to work today?
09:59 imirkin_: mrnuke_i_: and what do you plan to do on these monitors?
10:00 mrnuke_i_: imirkin_: I plan to open lots of terminals to hack on things. And the occasional nexuiz/xonotic rage release
10:00 imirkin_: mrnuke_i_: ok... in that case you're probably ok with nouveau. afaik 4k support is pretty iffy
10:01 imirkin_: mst is presently unsupported
10:01 mrnuke_i_: imirkin_: 4k is iffy on every other driver I've seen. and MST didn't work on any intel GPU I tried
10:01 imirkin_: mrnuke_i_: hmm... MST worksforme on a haswell
10:02 imirkin_: mrnuke_i_: and afaik radeon has it working as well
10:02 mrnuke_i_: that's why I'm happy with separate DP outputs for every monitor
10:02 imirkin_: at least for the DP hub scenario
10:02 imirkin_: but 4k monitors often require MST by themselves as they're 2 panels
10:02 mrnuke_i_: imirkin_: So, I tried MST on a sandybridge, it could recognize the monitor, but it never output a working video signal
10:03 mrnuke_i_: imirkin_: I also tried it on a macbookpro, and that one timed out trying to detect the monitors
10:03 mrnuke_i_: imirkin_: though both worked well when I disabled MST on the monitor (and mine do show up as a single panel, luckily)
10:03 imirkin_: i don't think SNB supported MST
10:03 imirkin_: MST is DP1.2
10:03 imirkin_: SNB is probably DP1.1
10:03 mrnuke_i_: MST works on sandy with windows
10:04 imirkin_: hm ok
10:04 mrnuke_i_: or maybe it was ivybridge. The machine was a Lenovo T520
10:06 imirkin_: IVB def has DP1.2
10:06 imirkin_: you might ask the intel folks about non-working MST in linux. it _should_ work
10:06 imirkin_: but i don't think a single DP port has enough bw for multiple 4K monitors...
10:07 night199uk: you can’t get SST monitors?
10:07 mrnuke_i_: you could run them at 30Hz a piece. Not ideal, but for desktop use, it's quite fine
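A back-of-the-envelope check of the bandwidth point. Assumptions: 24 bits per pixel, blanking intervals ignored (real timings need somewhat more), and a DP 1.2 HBR2 link of 4 lanes at 5.4 Gbit/s each with 8b/10b coding, i.e. 17.28 Gbit/s of payload.

```python
HBR2_PAYLOAD_GBPS = 4 * 5.4 * 8 / 10  # DP 1.2: 4 lanes, 8b/10b coding


def stream_gbps(width, height, refresh_hz, bpp=24):
    """Active-pixel data rate of one video stream in Gbit/s (no blanking)."""
    return width * height * refresh_hz * bpp / 1e9


def fits_one_link(n_monitors, refresh_hz, width=3840, height=2160):
    """True if n identical streams fit in one HBR2 link (blanking ignored)."""
    return n_monitors * stream_gbps(width, height, refresh_hz) <= HBR2_PAYLOAD_GBPS


for n, hz in [(1, 60), (2, 60), (2, 30), (3, 30)]:
    print(n, hz, fits_one_link(n, hz))
```

Under these assumptions one link carries a single 4K stream at 60 Hz or two at 30 Hz, but three at 30 Hz (about 17.9 Gbit/s) already exceed it, so separate outputs per monitor remain the safer route.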
10:07 night199uk: i got the new asus pb279q which are SST DP
10:07 imirkin_: you can get SST monitors
10:07 night199uk: i also have 3 x 4k but not under linux
10:07 imirkin_: last i checked (admittedly a while ago), most were MST
10:07 night199uk: 3 x asus pb279q 4k sst
10:07 mrnuke_i_: night199uk: Dell P2715Q is SST, but you can also enable MST and daisychain them
10:07 night199uk: the newer gen are all sst
10:08 night199uk: mrnuke_i_: yup, i was trying to decide between those and these
10:08 night199uk: ended up getting these
10:08 imirkin_: mrnuke_i_: either way, if they're SST, i suspect they should work with nouveau
10:08 night199uk: the price is better now for sst 4k
10:08 imirkin_: MST is one of those "would be nice" features, but it requires very specific hardware
10:08 imirkin_: which almost no one has
10:09 imirkin_: namely the combination of an nvidia gpu that has a DP output, and a MST-capable setup
10:09 mrnuke_i_: MST is one of those standards created by drunk people out of touch with reality
10:09 night199uk: i read about a lot of problems with MST so avoided it
10:09 imirkin_: most people have one or the other, but not both
10:09 imirkin_: (actually there's a third important component: willingness to hack on it)
10:10 night199uk: and, i wouldn’t bother getting 3x4k again now :-/
10:10 night199uk: not yet
10:11 night199uk: wait til the prices fall more or the sw support for 4k gets better
10:11 imirkin_: mrnuke_i_: anyways, you may find that at default clocks, none of the gpu's are fast enough to reasonably support 3x 4k
10:12 imirkin_: so i'd still recommend a kepler, where you'd be able to move the clocks up at least a little
10:12 imirkin_: if not to the highest perf level
10:13 mrnuke_i_: so, maxwell can't get to higher clocks because it's not implemented yet, or because of some other technical issue?
10:13 night199uk: imirkin_: hey - quick question on what you sent me earlier… these data lists?
10:13 imirkin_: well, it's not RE'd yet
10:13 night199uk: { 0x0808, 0x610a48 },
10:13 imirkin_: night199uk: no idea what they are
10:13 night199uk: i can’t see where those are used
10:13 night199uk: ahhh, thanks :-)
10:13 imirkin_: night199uk: also they're different for diff gpu's... look at some of the other files
10:14 night199uk: np, i’ll go hunting
10:14 night199uk: these look interesting for what i’m troubleshooting
10:14 imirkin_: mrnuke_i_: changing clocks on memory is sadly a very complex matter
10:14 night199uk: seems to be a PDISP register (left) and a MMIO register (right)
10:14 imirkin_: turns out that turning off vram, reconfiguring it (properly!), and then turning it back on, all without the gpu dying in the middle, is a tricky process
10:14 mrnuke_i_: imirkin_: just initializing memory is a pain in the arsenal. I did DDR3 init on a desktop board a few years back
10:15 imirkin_: yeah, so ddr3 is a breeze compared to GDDR5
10:15 mrnuke_i_: imirkin_: BTW, will nouveau be able to initialize the GPU if we don't run the PCI option ROM in BIOS?
10:15 imirkin_: mrnuke_i_: yeah, should be
10:15 mrnuke_i_: sweet!!!!
10:15 imirkin_: mrnuke_i_: worst case you have to tell it to force the init
10:15 night199uk: how are you running the PCI option ROM?
10:15 imirkin_: but... note that it will achieve this by (effectively) running the option rom
10:15 night199uk: oh, you’re not?
10:16 imirkin_: it won't execute the x86 instructions within
10:16 imirkin_: but it'll run through its tables
10:16 night199uk: coreboot i guess
10:16 night199uk: doesn’t execute option roms
10:16 mrnuke_i_: are those code tables, like atombios, or just descriptor tables?
10:16 imirkin_: both
10:16 imirkin_: descriptor tables that tell it how things are hooked up
10:16 imirkin_: and a bunch of code listings to write various values to various registers
10:17 imirkin_: as well as some things that are executed when e.g. modesetting
10:17 imirkin_: which are somehow board-specific
10:17 night199uk: there is a vm that runs the code stored in the bios
10:17 imirkin_: sssort of.
10:17 imirkin_: the instructions are pretty high level things
10:18 mrnuke_i_: that sounds worryingly similar to atombios
10:18 imirkin_: mrnuke_i_: i believe it's identical
10:18 mrnuke_i_:facepalms
10:18 night199uk: atombios?
10:18 imirkin_: (i mean, the specific details are a bit different obviously)
10:18 night199uk: oh, the amd alt
10:19 imirkin_: ati, but yes
10:19 night199uk: amd uefi vbios at least are allegedly all in EBC though
10:19 night199uk: i’ve not cracked one open yet
10:19 imirkin_: the nvidia one might have been around longer... NV5+ definitely have it, i think some NV4's do too
10:19 imirkin_: (that's Riva TNT/TNT2, for the record)
10:20 night199uk: hey talking about that
10:20 night199uk: i noticed there are a bunch of unimplemented instructions in nouveau in the nvidia rom instruction set
10:20 mjg59: night199uk: Microsoft won't sign EBC, so there are very few EBC devices
10:21 night199uk: guess they’re not used in the port scripts and nouveau doesn’t use the same uefi bios based scripts for mode setting
10:21 night199uk: i could probably provide impl’s for those at some point if they’re useful though
10:21 imirkin_: night199uk: mmmaybe. i recently added a couple of opcodes that appeared in the x86 code to execute some old vbios's
10:21 imirkin_: night199uk: but we haven't gotten a valid "unknown opcode" report in quite a while
10:21 imirkin_: there are a few where we fake it...
10:21 night199uk: yeah, i figured it was a ‘do it when we need it’ thing
10:22 imirkin_: i.e. ignore the instruction because it does something annoying and seemingly unimportant
10:22 night199uk: mjg59: interesting
10:22 night199uk: mjg59: so amd is not EBC?
10:22 mjg59: Correct
10:22 night199uk: i have to check it up
10:22 mjg59: Option ROMs will almost always be x86
10:23 night199uk: i wonder how long that will hold true
10:23 night199uk: although i didn’t know microsoft wouldn’t sign
10:23 imirkin_: mrnuke_i_: you're basically SOL if you want an actual generically-working GPU. they're all a little customized, and that customization is expressed in terms of a funky sequence of register writes
10:23 night199uk: the number of option rom exploits now is getting huge
10:23 imirkin_: that set up things like memory parameters, etc
10:23 mjg59: Microsoft have no plans to support EBC
10:23 night199uk: it’s a massive attack vector
10:24 night199uk: although maybe ebc isn’t the right solution for that
10:24 mjg59: ebc buys you nothing from a security perspective
10:24 mjg59: You're programming a device that can do arbitrary DMA
10:24 night199uk: yer, i guess so
10:24 mjg59: In an environment with no memory protection
10:25 mjg59: If you can get arbitrary code in there, you win
10:26 mrnuke_i_: imirkin_: I've been SOL for years. But let's say someone maliciously modifies the code tables. Could the nouveau interpreter be fooled into doing funny things, like overwriting kernel memory?
10:26 imirkin_: mrnuke_i_: yes
10:27 imirkin_: mrnuke_i_: but it would have to be someone *exceedingly* clever
10:27 imirkin_: like mwk. or MAYBE mjg59 if he worked at it. that's about it.
10:27 night199uk: the code language is relatively restricted
10:27 mrnuke_i_: so there's no safeguard built into the interpreter. Not that you couldn't hide a "Do DMA" command in the register writes
10:27 imirkin_: the various commands are of the form "write value X to register Y"
10:28 imirkin_: right, so it'd basically require you to set up a VM on the gpu
10:28 imirkin_: and then perform writes via that mapping table
10:28 imirkin_: to convince the GPU to do sysmem writes
10:28 night199uk: there are different interpreters… one in the uefi vbios itself, the driver has another, nouveau has another
10:28 mrnuke_i_: well that could be prevented by an IOMMU
10:28 night199uk: so those are each exposed i guess
10:28 imirkin_: mrnuke_i_: sure.
10:28 night199uk: and differently for each
10:28 imirkin_: mrnuke_i_: if you have an iommu, there's no way (outside of parser bugs) to write to kernel memory
10:29 night199uk: but the instruction set is small and limited so only marginally exposed
10:29 imirkin_: however the instruction parsing is quite simple, so while i wouldn't RULE OUT any bugs, they're certainly unlikely
10:29 night199uk: agreed
10:29 imirkin_: it's never been fuzzed though. probably some divisions-by-0 in there somewhere
10:29 night199uk: at least in the vbios it always uses fixed offsets
10:30 mrnuke_i_: imirkin_: well, then the situation is a thousand times better than atombios. I was working on removing their interpreter a few months back, but after meeting huge resistance from radeon guys, my boss cancelled the project
10:30 imirkin_: mrnuke_i_: afaik atombios is the same thing...
10:30 imirkin_: just a sequence of instructions
10:30 imirkin_: not x86 code
10:30 imirkin_: (there's obviously x86 code in there too so that the option rom can execute)
10:31 imirkin_: mrnuke_i_: here's a quick sample from a vbios i happen to have laying around: http://hastebin.com/bivumawuya.1c
10:32 imirkin_: why does it have some BS like "NV_REG R[0x000000] &= 0xffffffff |= 0x00000000" you ask?
10:32 mrnuke_i_: imirkin_: those tables look so cute
10:32 imirkin_: "no idea" i answer :)
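The printed form reads as a masked read-modify-write, new = (old & mask) | data. A minimal sketch of that reading, using a hypothetical logging register file rather than nouveau's actual init.c code (and one that, unlike real hardware, accepts writes to any address), shows that with mask 0xffffffff and data 0 the value is unchanged but the register still sees a read followed by a write.

```python
class FakeMmio:
    """Hypothetical register file that logs every access."""

    def __init__(self, values=None):
        self.values = dict(values or {})
        self.log = []

    def rd32(self, addr):
        self.log.append(("rd", addr))
        return self.values.get(addr, 0)

    def wr32(self, addr, value):
        self.log.append(("wr", addr, value))
        self.values[addr] = value & 0xFFFFFFFF


def nv_reg(mmio, addr, mask, data):
    """Masked write as printed: R[addr] &= mask |= data."""
    mmio.wr32(addr, (mmio.rd32(addr) & mask) | data)


mmio = FakeMmio({0x000000: 0x12345678})
nv_reg(mmio, 0x000000, 0xFFFFFFFF, 0x00000000)
print(hex(mmio.values[0x000000]), mmio.log)
```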
10:32 mrnuke_i_: probably the GPU expects you to read the register
10:32 night199uk: register 0, right?
10:33 night199uk: actually no
10:33 imirkin_: but with those commands you can pretty much get the gpu to do anything you want
10:33 imirkin_: so a malicious user would be able to achieve that.
10:33 mrnuke_i_: or, most likely, they have reference code with comments about what to modify for each register
10:33 imirkin_: well, register 0 is read-only
10:33 mrnuke_i_: but the vendor didn't need to touch those and never bothered to remove them
10:33 imirkin_: perhaps they want to post the write to PMC.INTR
10:33 imirkin_: (that's reg 0x200)
10:34 night199uk: yer
10:34 night199uk: i found plenty of oddities like this even in the driver
10:34 night199uk: reads with absolutely no purpose
10:35 mrnuke_i_: night199uk: back when I did DDR3 init, I had documentation from the vendor. Lots of "write that, then read the register". Of course, the value you got was irrelevant, but you had to do the read for things to work
10:35 night199uk: for the DDR3 init, was it custom?
10:36 night199uk: the intel MRC is closely guarded, right?
10:36 night199uk: i don’t know how many open MRCs there are around
10:36 night199uk: you work on coreboot?
10:37 mrnuke_i_: yeah. at Intel
10:38 specing: what
10:38 mrnuke_i_: this thing is called FSP nowadays
10:38 night199uk: you work on coreboot @ intel?
10:38 mrnuke_i_: yeah
10:38 night199uk: FSP is free though, right?
10:38 night199uk: i thought FSP was only targetted at embedded, too
10:38 night199uk: baytrail and others
10:38 mrnuke_i_: I'm not sure it's either available for download or redistributable
10:38 night199uk: oh
10:38 night199uk: there was a link
10:39 night199uk: maybe its not FSP
10:39 mrnuke_i_: coreboot used to distribute MRC binary for sandy/ivy, but that was pretty specific to chromebooks
10:39 night199uk: here
10:39 night199uk: http://www.intel.com/content/www/us/en/intelligent-systems/intel-firmware-support-package/intel-fsp-overview.html
10:39 night199uk: with downloads
10:40 night199uk: although i didn’t check if they were secured
10:42 mrnuke_i_: there was chat about it on #coreboot a few years back, but there was a restrictive click-through involved. I'm not sure if that's still the case
10:42 night199uk: ah
10:42 night199uk: No cost or royalties: Intel FSP helps to reduce bill of materials (BOM) costs
10:42 karolherbst: mupuf_: I still think that the voltage table is at least not wrong, because the values seem to work
10:42 mrnuke_i_: BS. FSP makes integration into coreboot a PITA.
10:42 imirkin_: mrnuke_i_: for more info on the opcodes and what they all do, take a look at http://cgit.freedesktop.org/~darktama/nouveau/tree/drm/nouveau/nvkm/subdev/bios/init.c
10:43 night199uk: heh
10:43 night199uk: just quoting what it says on the site :-)
10:43 karolherbst: mupuf_: but maybe a safety margin should be added? Using the lowest voltage might not be stable enough, although it should be
10:43 night199uk: why maintain coreboot though?
10:43 night199uk: sure intel must have a tiano based bios
10:43 mrnuke_i_: night199uk: chromeos
10:43 night199uk: my goal once i get done on this nvidia uefi driver is to start working on booting this board with a custom tiano
10:44 mrnuke_i_: google want to use coreboot, so I have a job
10:44 night199uk: they didn’t start on uefi yet?
10:45 mrnuke_i_: night199uk: just try saying that FLA to any of the chrome firmware folks. You'll get burned at the stake
10:45 night199uk: haha
10:45 night199uk: i have no idea about chrome
10:45 night199uk: or chromeos
10:45 mrnuke_i_: neither do the people who write the firmware for those machines :p
10:45 night199uk: i’m very far away from the google niche
10:46 night199uk: lol
10:46 mrnuke_i_: chromeos sucks. No person in their right mind would use it to do _actual_ _work_
10:46 night199uk: tbh i don’t even know what it is
10:46 imirkin_: mrnuke_i_: what fraction of computer use is 'actual work'?
10:47 karolherbst: :D
10:47 imirkin_: by someone using chromeos -- clearly 0% ;)
10:47 mrnuke_i_: LOL
10:47 night199uk: about 0.03% of what i use computers for :-)
10:48 karolherbst: imirkin_: to be clear about those live values: a live value is a value whose source is scheduled and all of whose uses are unscheduled?
10:48 night199uk: sounds like another google niche
10:49 night199uk: i’m sure i’ll get beaten with it at work one day
10:49 night199uk: ‘but google do it, so it must be AWESOME!'
10:49 imirkin_: karolherbst: and *at least one use* unscheduled
10:49 imirkin_: karolherbst: i.e. values that can't be overwritten
10:49 imirkin_: yes, a niche like android?
10:49 night199uk: *facepalm*
10:49 karolherbst: imirkin_: ohh right, was thinking about that, but wrote the wrong thing
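The corrected definition says a value is live once its definition is scheduled and at least one use is still unscheduled. This can be written as a small predicate. The `Insn`/`Value` types below are hypothetical stand-ins for illustration and are not nouveau's codegen classes.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified IR types for illustration only.
struct Insn {
    int id;
};

struct Value {
    const Insn *def;                 // instruction that writes the value
    std::vector<const Insn *> uses;  // instructions that read it
};

using ScheduledSet = std::unordered_set<const Insn *>;

// Live = defined already, and *at least one* use not yet scheduled,
// i.e. the register holding it cannot be overwritten yet.
bool isLive(const Value &v, const ScheduledSet &scheduled) {
    if (scheduled.count(v.def) == 0)
        return false;  // not produced yet
    for (const Insn *use : v.uses)
        if (scheduled.count(use) == 0)
            return true;  // someone still needs to read it
    return false;  // every use is scheduled: register can be reused
}
```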
10:50 night199uk: i honestly have no idea - what’s the relationship between chromeos and android?
10:53 night199uk: oh you just meant android isn’t a niche? :-)
10:53 karolherbst: night199uk: I thought there is none?
10:54 karolherbst: chromeos is just a linux distribution with a google made UI?
10:54 RSpliet: does porn count as actual work?
10:54 imirkin_: so is android
10:54 imirkin_: RSpliet: only if you're the one being filmed
10:54 night199uk: RSpliet: absolutely i hope so
10:54 RSpliet: or the one filming...
10:54 imirkin_: meh... that's debatable
10:54 night199uk: RSpliet: if it does, filing my expenses just got a lot easier
10:55 karolherbst: imirkin_: but I think on chromeos you can actually install any Linux software
10:55 karolherbst: where on android this is a bit tricky
10:55 RSpliet: karolherbst: well... does it use X?
10:56 imirkin_: karolherbst: you can't do anything on android. and you can't do anything on chromeos
10:56 karolherbst: I suppose so
10:56 night199uk: so who actually uses chromeos then?
10:56 night199uk: and for what?
10:56 imirkin_: who uses tablets? and for what?
10:57 RSpliet: imirkin_: I do... painkillers to be precise, to relieve the headache that chromeos gives me
10:57 RSpliet: but this is all wayyyyy off-topic ;-)
10:57 night199uk: heh
11:01 karolherbst: imirkin_: the live values stuff gets a bit tricky, because I don't schedule a whole BB, just parts of it :/ have to think first about how to do it right :/
11:17 RSpliet: imirkin_: btw, since the current drm-next (so with the rewrite) my VTs (and plymouth, fwiw) don't work any more
11:17 RSpliet: have you noticed something similar?
11:18 RSpliet: during boot I basically lose picture as soon as nouveau loads (monitor in DPMS state or sth), and doesn't return until the login manager displays
11:20 mrnuke_i_: imirkin_: well, I just ordered a Quadro K1200 (maxwell)
11:21 mrnuke_i_: imirkin_: when you see me all angry and pissed off, you'll know I received it :p
11:30 imirkin_: mrnuke_i_: well, i would really have recommended a kepler. oh well.
11:31 imirkin_: RSpliet: are you on a ppc box? :)
11:31 imirkin_: or other bigendian
11:32 imirkin_: coz there was a fix that ben didn't push yet
11:32 imirkin_: RSpliet: which gpu btw?
11:32 imirkin_: i've only tested it on NV34 + ppc and GK208 but without display
11:35 RSpliet: imirkin_: nope, x86_64, I think NV94 at the moment
11:45 imirkin_: RSpliet: bisect ;)
11:48 RSpliet: as soon as I find the time and have figured out what breaks nv94 reclocking that didn't in 4.2, will do...
11:50 RSpliet: oh, and once I have a kernel that doesn't completely mess up Fedora packaging and tooling; I expected drm-next to be marked as a 4.3 kernel already, so now I have a 4.3 package with a 4.2 kernel
11:50 RSpliet: fun things break if you do that...
11:56 mrnuke_i_: imirkin_: I will pay for that, don't worry :p
17:09 mupuf_: skeggsb: hey, do you have a vbios from the gm204/6 you have?
17:09 mupuf_: or anyone here
17:09 mupuf_: I tried to read a bios from techpowerup but nvbios just craps itself out with it
17:10 mupuf_: actually, anyone here with new gpus, please send me your vbioses and mmiotraces
17:10 skeggsb: yeah, nvbios segfaults on mine too iirc
17:10 mupuf_: well, it does not matter, I can try to fix nvbios to make it work for it :)
17:11 mupuf_: but in the case of tech power up, the P table is just junk
17:11 mupuf_: and all the rest, actually
17:11 mupuf_: and I want to at least verify that the voltage table makes sense on the new maxwells before enabling voltage control on it
17:12 mupuf_: well, now that I think of it, I guess it is pointless since it likely is a privileged operation
17:12 mupuf_: it would be ridiculous if we could change the voltage but not the fan speed
17:12 imirkin_: mupuf_: i think there are some in bugzilla too
17:12 skeggsb: it's ridiculous full stop :P
17:13 mupuf_: skeggsb: :p
17:13 mupuf_: imirkin_: thanks, will have a look
17:13 mupuf_: skeggsb: did you see that the voltage table has a flag byte that tells you how you should read the header?
17:14 mupuf_: there is the GPIO mode
17:14 mupuf_: and the PWM one
17:14 * mupuf_ is almost done writing the patches
17:14 mupuf_: I think I am now satisfied with the result
17:14 skeggsb: i just mailed my gm206 vbios to the dumps addy fwiw
17:14 mupuf_: it does not look too crappy :D
17:14 skeggsb: yeah, i'm not surprised by that, i expected something similar
17:14 mupuf_: thanks!
17:15 mupuf_: there is one thing though, instead of the typical 27000 kHz constant we have for the crystal
17:15 mupuf_: I had to use 27*1024 kHz
17:16 mupuf_: do you think they really changed the crystal and we never noticed that our configuration of ptimer suddenly started being off by 2.4%?
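The two crystal constants differ by exactly the 2.4% mupuf_ quotes:

$$\frac{27 \times 1024\ \text{kHz}}{27000\ \text{kHz}} = \frac{27648}{27000} = 1.024$$

So the two assumptions about the crystal disagree by 2.4%, and any timer configuration derived from the wrong one would be off by that much.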
17:17 skeggsb: PTIMER's calibration is hard-wired since gk20a anyway
17:17 mupuf_: oh, interesting
17:17 mupuf_: but that still does not fix the other ones :p
17:17 mupuf_: I guess that needs to be added to a todo list
17:17 skeggsb: i'd be surprised if it had changed, but that shouldn't be too hard to check
17:17 mupuf_: and I may need to edit the patch if I see that the crystal indeed changed
17:18 mupuf_: right, just run nvatimings :D
17:31 mupuf_: [27018.530043] nouveau 0000:01:00.0: volt: Using PWM mode
17:31 mupuf_: [27018.532242] nouveau 0000:01:00.0: volt: current voltage: 0uv
17:31 mupuf_: oopsie :D
17:31 mupuf_: the code looked good, but damn was it broken :D
17:31 mupuf_: and still is :D
17:31 mupuf_: just the typical need to add ! to the conditions
18:06 mupuf: the patchset I sent seems to work pretty well :)
18:07 mupuf: I will ask karol to test this new version
18:07 mupuf: night!
18:07 mupuf: it is 4am here...
18:09 imirkin_: mupuf: didn't karol have pwm with version 0x40?
18:09 imirkin_: oh, no. it's 0x50. nevermind.
18:11 mupuf: yeah, he already tested an ugly version of the patchset
18:12 mupuf: well, I should say he is testing it continuously, given how often he reclocks
18:14 imirkin_: hehe
18:33 marcosps: Hi guys :)
18:37 marcosps: imirkin: could he move that isImmd64Load to Instruction class?
18:37 imirkin_: the target has to have knowledge of what is loadable and not loadable
18:38 imirkin_: diff ISAs will allow different things
18:38 imirkin_: in fact, i'm not sure what all the maxwell isa allows
18:39 imirkin_: ok, looks like it's the same. but who knows what some future isa will bring. this needs to be in the target.
18:39 imirkin_: now, whether a 64-bit immediate is being loaded or not, i'm a lot more pliable as to where that lives
18:40 imirkin_: but the "can this be loaded into this instruction" check absolutely must live in the target
18:46 marcosps: imirkin_: So, duplicating some checks won't hurt anyone... :)
18:48 imirkin_: i'd definitely like to see how it looks
18:48 imirkin_: but that seems the simplest so far.
18:48 imirkin_: you can share the code somewhere
18:48 imirkin_: but the 20-bit check can't exist in generic code
18:49 imirkin_: only in target code
18:50 marcosps: imirkin_: Ok, I'll work on it and share the code with you again once I've fixed all your comments.
18:50 imirkin_: we're talking about just a few lines of code though
18:50 imirkin_: so... probably not too significant.
18:51 marcosps: Also, about the first 20 bits, was it a misunderstanding on my part, or do we need to check them for something...?
18:51 imirkin_: you need to check that everything *below* the top 20 bits is 0
18:51 imirkin_: top 20 bits can be anything
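The exchange above makes two points. The "can this immediate be encoded?" decision belongs to the target, and for this encoding it amounts to "only the top 20 bits of the double may be non-zero". Here is a sketch of both points together. The class and method names (`Target`, `canEncodeImmF64`, `TargetWith20BitF64Imm`) are hypothetical and do not describe Mesa's actual nv50_ir target interface.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: generic code only asks the question,
// each ISA's target class answers it.
class Target {
public:
    virtual ~Target() = default;
    virtual bool canEncodeImmF64(double imm) const = 0;
};

// A target whose f64 immediates carry only the top 20 bits of the double.
class TargetWith20BitF64Imm : public Target {
public:
    bool canEncodeImmF64(double imm) const override {
        std::uint64_t bits;
        std::memcpy(&bits, &imm, sizeof bits);  // raw IEEE 754 bit pattern
        // Top 20 bits can be anything; the low 44 bits must all be zero.
        const std::uint64_t low44 = (std::uint64_t{1} << 44) - 1;
        return (bits & low44) == 0;
    }
};
```

For example, 1.0 (0x3ff0000000000000) and -2.5 (0xc004000000000000) pass, while 0.1 does not because its fraction fills the low bits.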
18:54 marcosps: imirkin_: So, I'm already doing it in the diff I sent to you...
18:55 imirkin_: so my comment was wrong?
18:55 imirkin_: btw, i highly recommend you send these emails to the list as well
18:55 imirkin_: that way (a) other people can provide feedback and (b) others can learn from your questions and the answers that are given
18:56 imirkin_: skeggsb: should reverse prime work if PGRAPH is disabled?
18:58 marcosps: imirkin_: right now I'm a little ashamed because I hardly know what to do or how to do some things... But I hope to get more familiar with the code by the time this task is finished...
18:58 imirkin_: marcosps: this stuff is complex
18:58 imirkin_: there is a TON of highly specific knowledge that you don't have
18:58 imirkin_: so it's ok
18:58 marcosps: imirkin_: thanks :)
18:59 marcosps: imirkin_: this comforts me, because it seems I'm the only one here trying to learn, all other guys are experienced in graphics programming :)
19:00 imirkin_: what other guys
19:00 imirkin_: i think you're overestimating the number of 'experts' here...
19:02 skeggsb: imirkin_: i have no idea actually
19:02 marcosps: imirkin_: :)
19:02 skeggsb: imirkin_: i still get confused about what is "prime" and what is "reverse prime" :P
19:02 imirkin_: skeggsb: a friend has a problematic GPU (hopefully your workaround will help), but i want to give him some way to use the reverse prime outputs
19:03 imirkin_: skeggsb: prime = offload gpu rendering
19:03 imirkin_: skeggsb: reverse prime = offload outputs
19:03 imirkin_: [he also ultimately wants MST, but i told him he'll have to wait on that one]
19:04 skeggsb: i really should finish that...
19:04 imirkin_: yes... your WIP branch is only 1.5 years old...
19:04 skeggsb: well, i know what needs to be done, just haven't done it properly yet :P
19:04 imirkin_: yeah, i know what needs to be done too... 'MST' :)
19:05 skeggsb: i'm kinda hoping someone fixes up the core drm mst helpers to be nicer for atomic before i have to do it myself
19:05 skeggsb: airlied: ;)
19:36 airlied: skeggsb: mlankhorst will do it :-P
19:55 marcosps: imirkin_: The first 20 bits of the double immediate should be placed in the emitter, right?
19:56 imirkin_: errr
19:56 imirkin_: in the instruction that is emitted, yes
19:58 marcosps: imirkin_: So, time to fix the emitter...
20:00 marcosps: imirkin_: I was looking for some uses of 0x1a, to see how to encode it in the mesa code... but I couldn't find any example in mesa...
20:00 imirkin_: ??
20:00 imirkin_: 26 is just the offset of the bits in the 64-bit instruction encoding
20:00 imirkin_: each instruction is a 64-bit thing
20:01 imirkin_: bits 26..45 are the 20 bits of the immediate
20:01 marcosps: Hum...
20:02 imirkin_: (0x1a == 26 btw)
20:02 marcosps: imirkin_: yes, the hex -> dec is fine, but I'm thinking how to apply this bitwise op :)
20:03 imirkin_: see how the other immediates are done
20:03 marcosps: Unfortunately, I've only used bitwise ops a couple of times, so I have to look at how to extract it...
20:03 imirkin_: it's *literally* the exact same thing
20:03 marcosps: hum....
20:04 imirkin_: http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp#n317
20:05 marcosps: imirkin_: Yes, I'm looking at it... I was looking at the shift left by 26...
20:05 imirkin_: getting all nice and comfy with bit manipulation is probably a good thing to do before messing with this stuff of course
20:05 imirkin_: code is 2 32-bit values
20:05 imirkin_: aka 1 64-bit value
20:05 imirkin_: code[0] is the first 32-bits, code[1] is the second 32 bits
20:06 imirkin_: so if you want to write a 20-bit value to bits 26..45
20:06 imirkin_: then you're writing 6 bits in 26..31 of the first 32-bits and bits 0..13 of the second 32 bits
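The split imirkin describes is 6 bits in bits 26..31 of `code[0]` and 14 bits in bits 0..13 of `code[1]`. It looks like this in code. `setImm20At26` is a hypothetical helper name and is not taken from nv50_ir_emit_nvc0.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Place a 20-bit immediate into bits 26..45 of a 64-bit instruction stored
// as two 32-bit words: code[0] = bits 0..31, code[1] = bits 32..63.
void setImm20At26(std::uint32_t code[2], std::uint32_t imm20) {
    assert(imm20 < (1u << 20));         // must fit in 20 bits
    code[0] |= (imm20 & 0x3f) << 26;    // low 6 bits  -> bits 26..31 of word 0
    code[1] |= (imm20 >> 6) & 0x3fff;   // high 14 bits -> bits 0..13 of word 1
}
```

Reassembling the two words as `(uint64_t)code[1] << 32 | code[0]` and extracting `(x >> 26) & 0xfffff` gives back the original 20 bits.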
20:24 marcosps: imirkin_: Thanks for the explanation about the emitter.
20:47 marcosps: imirkin: http://pastebin.com/nGjNMTuy
20:48 imirkin_: marcosps: so... the reality is actually that only double 64-bit immediates are allowed
20:48 imirkin_: marcosps: you should probably make it a double constructor...
20:49 imirkin_: your encoding also seems backwards
20:49 imirkin_: code[0] should get u64 >> 44 &0x3f
20:50 imirkin_: and so on
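Starting from the 64-bit double pattern, the 20 bits to encode are bits 44..63. That is why `code[0]` gets `u64 >> 44 & 0x3f`, and the remaining 14 bits come from `u64 >> 50`. Below is a sketch of the encode step with a matching decode to check the round-trip. It assumes the low 44 bits were already verified to be zero. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: encode the top 20 bits of a double's bit pattern into
// instruction bits 26..45 (code[0] bits 26..31, code[1] bits 0..13).
void emitF64ImmTop20(std::uint32_t code[2], std::uint64_t u64) {
    assert((u64 & ((std::uint64_t{1} << 44) - 1)) == 0);  // low 44 bits must be 0
    code[0] |= static_cast<std::uint32_t>((u64 >> 44) & 0x3f) << 26;  // 6 bits
    code[1] |= static_cast<std::uint32_t>((u64 >> 50) & 0x3fff);      // 14 bits
}

// Inverse: pull the 20 bits back out and restore them to the top of a double.
std::uint64_t decodeF64ImmTop20(const std::uint32_t code[2]) {
    const std::uint64_t lo6 = code[0] >> 26;
    const std::uint64_t hi14 = code[1] & 0x3fff;
    return (hi14 << 6 | lo6) << 44;
}
```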
20:50 imirkin_: + if (i->op == OP_SET) {
20:50 imirkin_: this has gotta go
20:50 imirkin_: unless you're keeping it for debugging
20:50 imirkin_: in which case it's fine
20:51 imirkin_: i->getSrc(1)->asImm()->reg
20:51 imirkin_: surprised that doesn't crash
20:51 imirkin_: but you're on the right track
20:52 marcosps: imirkin_: nice to see a quick review :)
20:52 marcosps: imirkin_: when you mention the double ctor, do you mean in ImmediateValue?
20:52 imirkin_: yes
20:52 imirkin_: i could go either way on that...
20:52 imirkin_: should definitely mark it as TYPE_F64 though
20:53 imirkin_: you're also missing a lot of little checks
20:53 imirkin_: but we can iron those out later
20:54 marcosps: imirkin_: So, can I drop my new ctor with uint64_t and use the double one that already exists?
20:54 imirkin_: ah yes
20:54 marcosps: hum...
20:54 imirkin_: just... be careful to pass in the bits interpreted as a double
20:54 imirkin_: you can't just do (double)foo
20:55 imirkin_: more like *(double*)&foo
20:55 imirkin_: but that will make the type aliasing people mad
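`(double)foo` converts the integer's numeric value. `*(double*)&foo` reinterprets the bits, but it breaks C++'s strict-aliasing rule. A common well-defined alternative is to copy the bytes with `std::memcpy`, or to use `std::bit_cast` in C++20. Here is a minimal sketch. The helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a 64-bit pattern as a double without violating strict
// aliasing. memcpy is well-defined here; *(double *)&bits is not.
double bitsToDouble(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "need 64-bit double");
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// The reverse direction: the raw IEEE 754 bits of a double.
std::uint64_t doubleToBits(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

By contrast, `(double)0x3ff0000000000000ULL` would yield roughly 4.6e18, not 1.0.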
20:55 marcosps: I liked my new ctor there... but, as you wish I will remove it :)
20:56 marcosps: imirkin_: double is tricky ....
20:56 imirkin_: it
20:56 imirkin_: it's double tricky!
20:56 marcosps: :)
21:13 marcosps: imirkin_: double is so tricky that even *(double*)&foo turns our double into a NaN...
21:15 imirkin_: that's coz you're assembling it wrong
21:17 marcosps: hum...
21:18 imirkin_: [at least that's my guess]
21:18 marcosps: Everything else stayed the same; only the parameter's type and the assignment changed. I'll try it a different way...
21:19 imirkin_: + uint64_t a = (uint64_t)merge->getSrc(0)->reg.data.u32 << 32 | b;
21:19 imirkin_: i think you want the opposite
21:19 imirkin_: i.e. b << 32 | a
21:19 imirkin_: i can never remember =/
21:20 imirkin_: yeah. high bits are in b.
21:20 * marcosps thinks about all the craziness in bit ops...
21:21 imirkin_: well there's no fundamental right answer to these things... it's all convention
21:21 imirkin_: little endian vs big endian, etc
21:21 imirkin_: (from gulliver's travels)
21:27 marcosps: Hum...
21:27 marcosps: imirkin_: I'm still trying hehehe
21:28 marcosps: so, the thing is: I need to get the src[1] << 32 | src[0]?
21:28 imirkin_: ya
21:28 imirkin_: you had src[0] << 32 | src[1]
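With the order settled (src(1) holds the high word, src(0) the low word), the merge is a single expression. The cast to 64 bits has to happen before the shift. Shifting a 32-bit value by 32 is undefined behaviour in C and C++. On x86 it typically leaves the value unchanged, because the hardware masks the shift count. `mergeHalves` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Combine two 32-bit halves into one 64-bit value: high word from src(1),
// low word from src(0). Widen first, then shift.
std::uint64_t mergeHalves(std::uint32_t lo /* src(0) */, std::uint32_t hi /* src(1) */) {
    return static_cast<std::uint64_t>(hi) << 32 | lo;
}
```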
21:29 marcosps: imirkin_: this worked :)
21:30 marcosps: but now it's giving 0, when I have 1.000 set on the IMM...
21:30 marcosps: btw, it's 1.00000000
21:32 imirkin_: pastebin the full debug output?
21:32 marcosps: imirkin_: http://pastebin.com/mmzmM4B4 the output
21:33 marcosps: imirkin_: http://pastebin.com/6tagRSCH shader
21:33 imirkin_: and your code?
21:35 marcosps: imirkin_: http://pastebin.com/m1KyRFSE
21:35 imirkin_: uint32_t a
21:35 imirkin_: a << 32 == a
21:37 imirkin_: also, isImmd64Load should be returning false for merge u64 %r67d %r65 %r66 (0)
21:37 imirkin_: yet i see the "double" print
21:38 marcosps: imirkin_: so, I need to put uint64_t for a too right?
21:38 imirkin_: that would be one way of doing it.
21:38 marcosps: hum...
21:39 marcosps: I changed to uint64_t and removed the shift, and it still shows 0 there :)
21:39 * marcosps likes this shift dance
21:41 imirkin_: ??
21:41 imirkin_: why'd you remove the shift??
21:41 marcosps: because with the shift it shows nan again :)
21:41 marcosps: imirkin_: http://pastebin.com/GuviupwK
21:42 imirkin_: well, you have to shift it
21:42 imirkin_: but
21:42 imirkin_: merge->getSrc(1)->reg.data.u32
21:42 imirkin_: that's wrong
21:43 imirkin_: you need to get the value being loaded in
21:43 imirkin_: the value in reg.data.u32 is ~random
21:43 imirkin_: so you need to get the insn, and then get *its* ->getSrc(0)
21:43 imirkin_: i.e. what isImmd32Load does
21:44 imirkin_: you should get isImmd32Load to also return the value being loaded
21:44 imirkin_: that way you don't have to duplicate the work
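imirkin suggests that `isImmd32Load` also hand back the value it found, taken from the defining instruction's source rather than from `reg.data.u32`. Then the 64-bit check can reuse it. `isImmd32Load`/`isImmd64Load` come from marcosps's patch, which isn't shown here, so the signatures and the tiny IR below are hypothetical stand-ins and not Mesa's nv50_ir API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified IR for illustration only.
enum class Op { MOV, ADD };

struct Insn {
    Op op;
    bool srcIsImm;      // is getSrc(0) an immediate?
    std::uint32_t imm;  // the immediate, meaningful only if srcIsImm
};

// True if insn loads a 32-bit immediate; if so, also report the value so
// callers don't have to repeat the lookup.
bool isImmd32Load(const Insn *insn, std::uint32_t *value) {
    if (!insn || insn->op != Op::MOV || !insn->srcIsImm)
        return false;
    if (value)
        *value = insn->imm;
    return true;
}

// A merge is a 64-bit immediate load only if the instructions defining
// both of its sources are 32-bit immediate loads (high word in src(1)).
bool isImmd64Load(const Insn *loDef, const Insn *hiDef, std::uint64_t *value) {
    std::uint32_t lo, hi;
    if (!isImmd32Load(loDef, &lo) || !isImmd32Load(hiDef, &hi))
        return false;
    if (value)
        *value = static_cast<std::uint64_t>(hi) << 32 | lo;
    return true;
}
```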
21:44 marcosps: the work is getting bigger every time we talk :)
21:44 imirkin_: to you, maybe
21:44 imirkin_: as you realize things you miss
21:45 imirkin_: bbl
21:50 marcosps: imirkin_: Now I got it. It worked here!
21:52 marcosps: imirkin_: http://pastebin.com/XxDdkpLY
21:52 marcosps: imirkin_: It's almost 2 AM here... I need to sleep and work tomorrow morning :)
21:53 marcosps: imirkin_: Thanks for all the help today. I hope tomorrow I'll continue to work on this task.
21:53 marcosps: See ya!
21:53 marcosps: imirkin_: Also, I just want to know if that value 0.007812 is right..
23:40 mlankhorst: airlied: what's wrong with mst and atomic then? seems to work for me..
23:53 imirkin: marcosps: >>> struct.unpack("d", struct.pack("L", 0x3f80000000000000))
23:53 imirkin: (0.0078125,)
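The same answer falls out of the IEEE 754 binary64 fields. 0x3f80000000000000 has a biased exponent of 0x3f8 = 1016 and a zero fraction, so its value is 2^(1016-1023) = 2^-7 = 0.0078125, whereas 1.0 is 0x3ff0000000000000. On imirkin's Python check: with native sizes, `"L"` is the platform's `unsigned long`, which is 8 bytes on 64-bit Linux but 4 bytes on Windows, so `"<Q"`/`"<d"` are the portable format codes. The sketch below decodes normal numbers only. The function name is illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a *normal* IEEE 754 binary64 bit pattern by hand:
// value = (-1)^sign * (1 + fraction / 2^52) * 2^(exponent - 1023).
// Subnormals, infinities and NaNs are deliberately not handled.
double decodeNormalF64(std::uint64_t bits) {
    const std::uint64_t sign = bits >> 63;
    const int exponent = static_cast<int>((bits >> 52) & 0x7ff);
    const std::uint64_t fraction = bits & ((std::uint64_t{1} << 52) - 1);
    assert(exponent != 0 && exponent != 0x7ff);  // normal numbers only
    const double mantissa = 1.0 + std::ldexp(static_cast<double>(fraction), -52);
    const double magnitude = std::ldexp(mantissa, exponent - 1023);
    return sign ? -magnitude : magnitude;
}
```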
23:55 imirkin: marcosps: but your encoding is off: 00000008: 00011c01 168e3f00 set b32 $r4 neu f64 $r0d $r0d [unknown: 00000000 00003f00]