00:04karolherbst: 9.5W now :)
00:05karolherbst: 10W at 07 pstate
00:05pmoreau: Down from?
00:05karolherbst: 10W => 9.5W
00:05karolherbst: mhh maybe 10.5W => 9.5W
00:05karolherbst: it is a bit unstable though
00:06karolherbst: nvafill is a bit broken
00:06karolherbst: it also pokes the offset in?
00:06karolherbst: or is this intended?
00:06karolherbst: when I do "nvafill 0x20200 0x10 27724545"
00:07karolherbst: I get 00020200: 27724545 27724549 2772454d 27724551
00:08pmoreau: This is not the intent according to the README file in nva/
00:15karolherbst: that wasn't that good what I did
00:15karolherbst: pmoreau: yeah, I thought so too
00:16karolherbst: found the issue
00:16karolherbst: nva_wr32(cnum, a+i, v+i); // <----
00:16karolherbst: v+i ?
00:25karolherbst: mhh okay, so I save 0.75W at stock clocks
00:25karolherbst: 11.68W => 10.9W
00:26karolherbst: 1.3W at 0f pstate
00:27karolherbst: 18.7W => 17.4W
00:27karolherbst: seems to work?
00:27karolherbst: is there something more to clock gating than just setting the regs? (besides finding the right values)
00:29karolherbst: so 0x20200 0x60 are clock gates
00:29karolherbst: for various stuff
00:30karolherbst: I get something like that on my kepler card: https://gist.github.com/karolherbst/bfa9fd697d5d894bee6d
00:30karolherbst: does anybody else want to check that range?
00:31hakzsam: karolherbst, yeah, probably
00:32hakzsam: I think most of those regs are already known but names can be different
00:32karolherbst: though there isn't something for that in rnndb
00:33karolherbst: I noticed that I can still run GL apps on the gpu after setting those bits
00:33karolherbst: so what does "clock gates" do? :D
00:33karolherbst: ohh perf decreases
00:35karolherbst: by like 1%
00:37karolherbst: blob doesn't change the regs after setting them?
00:41karolherbst: ohh I know
00:46ratherDIfficult: Once the current execution thread is selected, a corresponding group of mode bits 202 are decoded in operation 1432. Upon mode bits 202 being decoded in operation 1432, a control vector is afforded which includes a plurality of bits each of which indicate whether a particular code segment is to be accessed in ROM 1404 for processing the corresponding vertex data. Upon determining whether a code segment should be accessed in ROM 1404 and e
00:55ratherDIfficult: karolherbst: dunno status paid, but this patent really suggests that rasterizer code segment is read from ROM
00:55ratherDIfficult: whether it's the case with newer chips i dunno, however it's cited a lot in later nvidia patents
01:05ratherDIfficult: by even, well ok but how the heck to read from either pc or sp..gpu can do that internally but, gotta look if cuda or shaders can too?
03:25karolherbst: seems like the reg is already inside rnndb
03:25karolherbst: looks right so far according to tegra sources
03:31mupuf: karolherbst: I wonder why they look right :D
03:32karolherbst: mupuf: because it works
03:32karolherbst: got a black screen now
03:32karolherbst: after turning every engine off
03:32karolherbst: auto reduced power already, but it still worked
03:32karolherbst: and consumption was the same under load with on or auto
03:32mupuf: I meant that I took this information from the tegra sources
03:32karolherbst: I figured somebody looked that up
03:32mupuf: and I tried to reproduce but got 0 W difference
03:32karolherbst: I get a difference
03:32mupuf: I spent ages on clock and power gating
03:33karolherbst: it kind of works for me
03:33mupuf: clock gating is low-hanging fruit
03:33mupuf: and the regs mostly just need to be programmed and we can forget about them
03:33mupuf: except for a few engines
03:33karolherbst: 9.8W with everything off
03:33mupuf: power gating is messier
03:33mupuf: but hey, you may have one of the master enable bits set by default
03:33karolherbst: 13.3W with everything on
03:33mupuf: there is one in PMC
03:34karolherbst: most of the engines are set to auto by default for me
03:34mupuf: but there may be another one somewhere else
03:34karolherbst: or partly
03:34mupuf: did you change the values only for the ptherm range?
03:34mupuf: you know that you also need to set all the CG regs in each engine? :D
03:35karolherbst: I don't know that
03:35mupuf: if you want more perf, you'll need to do it
03:35karolherbst: okay, eng set to run, blk to auto
03:35mupuf: otherwise, you wait for absurdly long delays before enabling the blocks
03:35mupuf: but it is a safe default
03:35karolherbst: I see
03:35mupuf: which is what we want from a default boot state
03:36karolherbst: anyway, did you see my pwm graphs?
03:36mupuf: I documented all the ones I found
03:36ratherDIfficult: i gonna head out to see a babe, the documentation is pretty good at http://envytools.readthedocs.org/en/latest/hw/falcon/isa.html i have looked into that link, everything should be there to exploit nvidia card fully
03:36mupuf: and there are a ton of them
03:36mupuf: I saw one but it only had the core clock
03:36mupuf: which is not what I asked for :D
03:36mupuf: we need ALL the clocks
03:36karolherbst: okay, this graph is at stable load (== stable clock) but I adjusted max clock: http://www.plotshare.com/sessions/662564845/Plot2.png
03:37karolherbst: I know what you mean, but this isn't what I wanted to show you
03:37karolherbst: this graph is full load, and I just adjusted max clock: http://www.plotshare.com/sessions/202135770/Plot2.png
03:37karolherbst: max clock is the actual clock in the last graph
03:38mupuf: I see
03:38mupuf: seems like we will have some fun with this
03:38ratherDIfficult: mwk: good work basically, cheers, bbsomeotherday
03:38mupuf: but I would suggest we stop using their tool for now until we get what they are doing
03:38mupuf: I would rather want to understand the vbios tables first
03:39karolherbst: mhh voltage clock mapping is basically easy then
03:39karolherbst: there is always an upper and lower bound within a pstate
03:39mupuf: easy? did you prove your function? :D
03:39karolherbst: 07 is always 0x38 pwm for me
03:39karolherbst: 0a/0f is lower 40, upper 63
03:40karolherbst: maybe it was 64, really doesn't matter though
03:40mupuf: ok, and how do you compute that? Based on the cstate, right?
03:40karolherbst: and then each "clock step" has a valid range
03:40karolherbst: I checked what the blob did
03:40karolherbst: each step has a range of 5 values
03:40karolherbst: like 50-54
03:40karolherbst: 50 at low temp
03:40karolherbst: 54 at high temp
03:40karolherbst: other way around
03:41karolherbst: 54 at low temp, 50 at high temp
03:41karolherbst: so now, the lower cstates start at ranges beneath the lower bound (for me)
03:41karolherbst: so they always get 40
03:41mupuf: you know what we need to do?
03:41karolherbst: if you go one state higher, then do +1 lower/upper bound
03:41karolherbst: until you hit the pstate upper bound
03:42karolherbst: so if cstate 20 has 37-41, you get actually 40-41
03:42karolherbst: and cstate 21 has 38-42 (actually 40-42)
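The clamping described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the chat (each cstate has its own 5-value PWM range, limited to the pstate's lower and upper bound); the function name is hypothetical, and the bounds 40 and 63 are the ones quoted for the 0a/0f pstates.

```python
def clamp_range(step_range, pstate_lower, pstate_upper):
    """Clamp a cstate's (low, high) PWM range into the pstate bounds."""
    low, high = step_range
    # Clamp each end separately so a range lying entirely below the
    # lower bound collapses to (lower, lower) instead of inverting.
    low = min(max(low, pstate_lower), pstate_upper)
    high = min(max(high, pstate_lower), pstate_upper)
    return (low, high)


# The two cases quoted in the chat, with bounds 40..63:
print(clamp_range((37, 41), 40, 63))  # (40, 41)
print(clamp_range((38, 42), 40, 63))  # (40, 42)
# A low cstate whose whole range is beneath the lower bound:
print(clamp_range((30, 34), 40, 63))  # (40, 40)
```

The last call matches the remark that the lower cstates "always get 40".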
03:42mupuf: we need to write a tool in the userspace that monitors the clocks and computes the voltage necessary
03:42mupuf: and compare that to what the blob does
03:42mupuf: and does that in real time
03:42karolherbst: I tweaked your tool to get the pwm value for me
03:42mupuf: our function will be OK if the voltage is always higher than the blob
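A minimal sketch of the check mupuf describes, assuming the monitoring tool has already produced samples of (timestamp, voltage we would pick, voltage the blob picked). The function names and the sample numbers are invented for illustration, and treating "equal to the blob" as acceptable is an assumption.

```python
def undervolt_samples(samples):
    """Return the samples where our voltage is below the blob's.

    samples: iterable of (timestamp_s, our_uv, blob_uv) tuples,
    voltages in microvolts (hypothetical layout).
    """
    return [s for s in samples if s[1] < s[2]]


def policy_is_safe(samples):
    """True if our voltage is never lower than the blob's."""
    return not undervolt_samples(samples)


# Made-up samples at 100 Hz (one every 0.01 s):
samples = [
    (0.00, 900000, 900000),
    (0.01, 912500, 906250),
    (0.02, 893750, 900000),  # we would undervolt here
]
print(policy_is_safe(samples))
print(undervolt_samples(samples))
```

Keeping the offending samples, rather than only a yes/no answer, makes it easier to see which clocks the function gets wrong.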
03:43karolherbst: currently it is always lower
03:43karolherbst: at least for the pwm based ones
03:43mupuf: we need to work on that
03:43mupuf: and this is something you can do if you have time
03:43mupuf: you do not need to alter the bios
03:44mupuf: and this is something you can run on reator on any other card afterwards
03:44karolherbst: again, nvafakebios, doesn't work for me :)
03:44karolherbst: yeah I know
03:44mupuf: and it is a very good way of comparing our policy
03:44mupuf: that's what I'm saying, you do NOT need to change your vbios
03:44karolherbst: ahh okay
03:45mupuf: do you see the value?
03:45mupuf: it will help us a great deal, I assume
03:45karolherbst: but I have a table for my card basically
03:45karolherbst: I just stopped at some point because it got too boring to keep the card at a stable clock :/
03:45karolherbst: or too hard
03:45mupuf: Who cares about your card, we need to work on all of them :p
03:45karolherbst: I know
03:46mupuf: I agree, it can be boring
03:46mupuf: but automating stuff makes it less boring and it also improves the result
03:46karolherbst: but my assumption is, the blob takes the highest cstate, computes its voltage id, and then does this-> cstate id-1, voltage id -1
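That assumption can be written down as a small table generator, which is handy for comparing against values read back from the blob. This is a sketch of the unproven hypothesis only; the function name, the example ids and the lower bound of 40 are illustrative.

```python
def guess_voltage_ids(top_cstate, top_vid, lowest_cstate, lower_bound):
    """Map each cstate id to a guessed voltage id.

    Hypothesis from the chat: start from the highest cstate and its
    voltage id, then go down one voltage id per cstate id, never
    dropping below the pstate's lower bound.
    """
    table = {}
    for cstate in range(top_cstate, lowest_cstate - 1, -1):
        vid = top_vid - (top_cstate - cstate)
        table[cstate] = max(vid, lower_bound)
    return table


print(guess_voltage_ids(21, 42, 18, 40))
# {21: 42, 20: 41, 19: 40, 18: 40}
```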
03:47mupuf: prove it, build the tool and test your assumption
03:47karolherbst: how can I fake load in a stable way?
03:47mupuf: why fake the load? Just run your tool in real time
03:47karolherbst: it is messy to hit a specific cstate on the blob, when the cstates are like 14MHz away from each other
03:47mupuf: and run all the workloads you have
03:48mupuf: if you run your tool at 100Hz, you should be able to compare with the blob's output
03:49mupuf: and find out what their algorithm is
03:49mupuf: we cannot do it any other way
03:49mupuf: it is too dynamic
03:49mupuf: and this is especially true for PWM cards
03:49mupuf: the others are mostly fine because they only have 16 to 32 voltages
03:49mupuf: not 96
03:49karolherbst: or more
03:50karolherbst: did you upgrade the kernel on reator?
03:50mupuf: or more on the maxwell, yes
03:50mupuf: oh, shit, sorry
03:50karolherbst: I wanted to check your 0x60 div maxwell card
03:50mupuf: I just worked on my finnish yesterday
03:50karolherbst: I bet the blob does use higher pwm values than 0x60
03:50mupuf: well, you can always reboot on the partition with the blob
03:50mupuf: grub-reboot 3
03:50mupuf: then reboot
03:51karolherbst: mhhh, usually I just want to unload/load modules :/
03:51mupuf: which modules?
03:51karolherbst: unload nouveau, load nvidia
03:52karolherbst: but now it doesn't matter really
03:53karolherbst: okay, clock gating: all on 13.4W, all auto: 10.1W
03:53karolherbst: all stop: 10.1W
03:53mupuf: please send me the commands you ran on your card. it will help me
03:53karolherbst: but this is expected
03:53karolherbst: nvafill is broken, do you know?
03:54karolherbst: it has to be v
03:54mupuf: never used it
03:54karolherbst: not v+i
03:54karolherbst: yeah, for clock gating you should :D
03:54karolherbst: all on nvafill 0x20200 0x60 27722400
03:54karolherbst: all auto nvafill 0x20200 0x60 27722466
03:54karolherbst: all off nvafill 0x20200 0x60 277224aa
03:55mupuf: nvafill 0x20200 0x4 27722400 --> What should this do, for you?
03:55karolherbst: seems to be enough, but not really enough
03:55karolherbst: only 13.1W
03:56karolherbst: the blob seems to hit the first 8 regs
03:56karolherbst: except the 5th one?
03:56karolherbst: need to fill 0x20 range, to get 13.4W
03:57karolherbst: 0x14 needed for 13.4W actually
03:57karolherbst: maybe the 5th one is important :D
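To see which registers a given start and size cover, here is a small sketch. It assumes 32-bit registers at a 4-byte stride, which matches the earlier dump where a size of 0x10 produced four values; the helper name is made up.

```python
def regs_in_range(start, size, stride=4):
    """List the register addresses covered by a (start, size) fill."""
    return [start + off for off in range(0, size, stride)]


full = regs_in_range(0x20200, 0x60)
print(len(full), hex(full[-1]))  # 24 0x2025c

# A size of 0x14 covers exactly five registers, so the last one
# reached is the "5th one" mentioned above.
print([hex(r) for r in regs_in_range(0x20200, 0x14)])
# ['0x20200', '0x20204', '0x20208', '0x2020c', '0x20210']
```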
03:57mupuf:drops, can't do that at work
03:57karolherbst: 0x1c is higher though
03:58karolherbst: mupuf: you said you have some power sensor patches already?
03:59karolherbst: I would like to add this to hwmon, or is it already done in your patches?
03:59mupuf: done by my patches
03:59karolherbst: I did some hwmon work to expose my core voltage :D
04:00karolherbst: mupuf: are the patches somewhere public? I would like to take a look at them
04:00mupuf: and I will forbid you to work on this, if you don't mind. Not because I want to take credit for my work, but because what should be done depends on nvidia's decision with regards to the PMU fw
04:00mupuf: oh, very nice for the voltage!
04:00karolherbst: mupuf: https://github.com/karolherbst/nouveau/commit/3809b64367ae97f399fa4e922a6e740752724177
04:00mupuf: did you follow the sysfs interface of hwmon or did you come up with your names?
04:00karolherbst: I tried to get memory voltage as well
04:00karolherbst: but this is messy
04:00karolherbst: mupuf: I followed sysfs interface
04:01karolherbst: I also added nice labels :)
04:01mupuf: looks good. We will have problems with the memory one, yes
04:01karolherbst: there are also some unlucky ones without any VID gpios :/
04:02mupuf: well, if they do not have any voltage control, who cares about them?
04:02karolherbst: but the pstate table listed different voltages
04:02mupuf: you should not expose the voltage if the card does not support it
04:03mupuf: I know your code will return an error code, but I would rather not expose files that are not readable at any time
04:03mupuf: will be fun to check what the blob does on those
04:03karolherbst: pmoreau has such a card
04:04karolherbst: vbios already added
04:04mupuf: and I say that in a non-sarcastic way
04:04karolherbst: I know
04:04mupuf: oh, those cards!
04:04mupuf: well, screw them
04:05karolherbst: anyway, I just wondered, because the lower pstates want 0.9V and the highest 1.01V
04:05mupuf: I mean, I doubt they have any kind of voltage management
04:05karolherbst: there is no GPIO for that at least
04:05karolherbst: but also a voltage table with 4 entries
04:06mupuf: there may be an ACPI interface to increase the voltage
04:06mupuf: as I said, screw them for now!
04:06mupuf: we have enough on our plate for cards where reclocking actually matters
04:09karolherbst: so, now let's check what happens on maxwell blob
04:09karolherbst: anyway, it is not rare, that kepler owners have low core clock with my gddr5 patch at 0d/0e/0f pstates :/
04:10karolherbst: actually we don't know without my patch, because usually the card just crashes :)
04:10karolherbst: mupuf: grub-reboot 3 gave me your rc8 kernel again :/
04:10karolherbst: never used grub in my life, so :)
04:10mupuf: give me a sec
04:12pmoreau: mupuf: Eh! Don't screw my card like that!! :@
04:13mupuf: pmoreau: hehe
04:13karolherbst: pmoreau: do you have any troubles with the slower card on highest pstates?
04:13mupuf: karolherbst: my bad, it was grub-reboot 2
04:13mupuf: 0 is for nouveau
04:14mupuf: anyway, here it is
04:14mupuf: happy testing
04:14pmoreau: karolherbst: I'm not sure which one is the slower card. Maybe the G96 by default?
04:14karolherbst: should be the MCP
04:14pmoreau: I had no trouble with the MCP79 on highest pstate iirc
04:14karolherbst: mhh okay
04:14karolherbst: then it doesn't seem to matter
04:15karolherbst: mupuf: now I need a password :D
04:15mupuf: shit, sorry, adding your key
04:16karolherbst: okay, removing nouveau while engines are off, bad idea
04:16pmoreau: karolherbst: But it's been a long time since I last reclocked to highest pstate (or maybe I never did that even).
04:16pmoreau: I can play with it this evening
04:16karolherbst: pmoreau: I see
04:16mupuf: karolherbst: done
04:16karolherbst: yeah, you should test this, and if it crashes, then there might be a voltage control in the end
04:16karolherbst: mupuf: thanks
04:17pmoreau: The main motivation of having reclocking was to reclock to lowest pstate to get more battery :D
04:17karolherbst: pmoreau: :D
04:17karolherbst: strange tesla cards
04:18pmoreau: And fix the G96 to be able to power it off, and save even more battery
04:18hakzsam: pmoreau, power-gating would be better ;)
04:18hakzsam: go ahead :D
04:18karolherbst: mupuf: ohh is it a different partition?
04:19pmoreau: hakzsam: Yeah, but... Fixing the G96 had the nice side effect of also not having Nouveau crash after both cards are initialised. :-)
04:20pmoreau: And I already have my hands more than full with the SPIR-V compute thing ;)
04:21hakzsam: yeah, CL is also very important for Nouveau
04:23karolherbst: how to disable vsync on blob? :D
04:23mupuf: karolherbst: yes, I mounted the other one in /tmp/toto/
04:23pmoreau: mupuf: :D
04:23mupuf: pmoreau: :D
04:23pmoreau: mupuf: I love using "toto" everywhere!
04:23karolherbst: also you may want to change the language, but it's fine for me
04:23mupuf: of course, toto, tata
04:23pmoreau: Probably by far my main debug message
04:23pmoreau: Of course ;)
04:24mupuf: who needs foo and bar when you have toto and tata
04:24pmoreau: Exactly! And don't forget titi
04:25karolherbst: mhh strange
04:25karolherbst: I only get 60fps with nvidia
04:26karolherbst: should be off
04:26karolherbst: ahh there is also __GL_SYNC_TO_VBLANK
04:26karolherbst: __GL_SYNC_TO_VBLANK=0 :)
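For scripted benchmark runs, the same setting can be applied per process instead of globally. This is a sketch: `__GL_SYNC_TO_VBLANK` is the variable named above, while the helper name and the choice of `glxgears` as the program are only illustrative.

```python
import os


def env_without_vsync(base_env=None):
    """Return a copy of the environment with blob vsync turned off."""
    env = dict(os.environ if base_env is None else base_env)
    env["__GL_SYNC_TO_VBLANK"] = "0"
    return env


if __name__ == "__main__":
    # Show the variable that would be handed to the child process.
    print(env_without_vsync({})["__GL_SYNC_TO_VBLANK"])
    # To actually launch a GL program with it (not run here):
    #   import subprocess
    #   subprocess.run(["glxgears"], env=env_without_vsync())
```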
04:29karolherbst: mupuf: can I install glxspheres on the blob partition?
04:29karolherbst: I need something heavier than glxgears :/
04:29karolherbst: or is there something else already?
04:29mupuf: sure, make sure you have the space for it
04:29mupuf: you can use xonotic
04:30mupuf: and heaven
04:30mupuf: and possibly valley
04:30karolherbst: is there something already installed?
04:30mupuf: yes, all of those
04:31mupuf: valley is not, but heaven and xonotic are
04:31karolherbst: do you know where heaven is installed?
04:32karolherbst: k, thanks
04:32mupuf: xonotic is easier to launch though
04:33karolherbst: I fear that xonotic isn't heavy enough :/
04:46karolherbst: ahhh, much better :)
04:46karolherbst: the card is getting warmer
04:46karolherbst: mhhh okay
04:47karolherbst: mupuf: it seems you are right. The blob always stays below the div
04:47karolherbst: 0x54 is the duty
04:47karolherbst: which is pretty high
04:47karolherbst: but okay
04:49karolherbst: the blob is mocking me
07:12karolherbst: mwk: would you like to look at this fix?
07:13karolherbst: I kind of wondered if it was intentionally this way
07:15karolherbst: mupuf: any idea how long it will take to figure stuff out with nvidia regarding power sensor? I am surprised that some firmware is needed for that :/
07:17mwk: karolherbst: it was this way on purpose
07:17mwk: if you want to fill a range of regs with a *single* value, there's nvapoke <start> <size> <value>
07:18karolherbst: ohhh okay
07:18karolherbst: I see, thanks
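The difference between the two fill behaviours can be modelled without touching hardware. The sketch below uses a plain dict as a stand-in for the register space; the function names are hypothetical and are not envytools APIs. The incrementing variant mirrors the `nva_wr32(cnum, a+i, v+i)` line quoted earlier.

```python
def fill_incrementing(start, size, value, stride=4):
    """Write value+offset to each register, as nvafill was seen to do."""
    return {start + i: (value + i) & 0xFFFFFFFF
            for i in range(0, size, stride)}


def fill_constant(start, size, value, stride=4):
    """Write the same value to every register in the range."""
    return {start + i: value for i in range(0, size, stride)}


regs = fill_incrementing(0x20200, 0x10, 0x27724545)
print(" ".join(f"{v:08x}" for v in regs.values()))
# 27724545 27724549 2772454d 27724551

regs = fill_constant(0x20200, 0x10, 0x27724545)
print(" ".join(f"{v:08x}" for v in regs.values()))
# 27724545 27724545 27724545 27724545
```

The first line of output reproduces the dump from the start of the log, which is consistent with the increment being deliberate rather than a stray bug.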
07:24karolherbst: mupuf: what are these CG reg you were talking about?
07:24karolherbst: HW_CGBLK* ?
07:41kfsf: how is the hdmi sound output handled on nouveau supported cards?
07:42kfsf: i would like to know that before i buy a nvidia card
08:02ratherDIfficult: kfsf: i never had hw, wait until almighty imirkin wakes up, this guy knows some of it and is surprisingly willing to help in most cases, though probably someone made a matrix showing something too
08:07kfsf: ratherDIfficult: your nvidia card have no hdmi output?
08:20ratherDIfficult: kfsf: i have some old crap, but it played s-video, i had two of them, i was satisfied, but the computer it was bundled with was slow by today's standards, so i have no hdmi card
08:20ratherDIfficult: but nouveau worked ok
08:21ratherDIfficult: i bet your card would work with hdmi nicely, but ask someone else for details on how to get started, i'd get it working using google
08:22kfsf: i would like to know if it's exactly the same behaviour as with the radeon drivers. is the audio soundcard always listed in the alsa settings like on radeon?
08:23ratherDIfficult: kfsf: yeah i believe so, pulseaudio should list it, and any sound daemon really, they route to alsa i believe anyways
08:24kfsf: is it also disabled by default (but listed in alsa) and the sound gets enabled by EDID checks of the monitor?
08:25imirkin: kfsf: for the majority of nvidia gpu's (GT215+... earlier ones are not well supported for hdmi audio), the hda device is always there
08:25ratherDIfficult: ohm, that i do not know, but mixer has hdmi audio , should have yes to play with, goes over wire directly to anywhere wire was connected to
08:25imirkin: kfsf: however i *have* seen an instance where it only appeared when an hdmi cable would get plugged in.
08:25imirkin: kfsf: we don't handle the latter case particularly gracefully. i believe this was a laptop.
08:25karolherbst: mupuf: I pushed all the blob values inside the CG regs but nothing changed
08:25karolherbst: nothing as in nothing
08:26imirkin: kfsf: as for enabled/disabled, that's a product of what your userspace does, not the driver
08:26imirkin: kfsf: afaik there's no way to *disable* the audio entirely, but you can just not play anything into it
08:27imirkin: kfsf: also afaik no one's ever tried hdmi audio + maxwell -- perhaps that doesn't work. i'd stick to earlier chips (also for other reasons)
08:27kfsf: imirkin: on radeon driver there are sometimes EDID reading problems. In the case you have connected the pc to the tv and use tv as sound output, sometimes you boot up the pc and have then no sound
08:31karolherbst: so who wants to test some clock gating? Would be nice if somebody could then check power consumption or anything related
08:36imirkin: kfsf: i've never personally tested any of this, but i haven't really heard any complaints either
08:36imirkin: kfsf: there were some bugs on kepler with hdmi (due to improper audio setup) a while ago, but those were fixed over a year ago
08:37kfsf: imirkin: do you know if hdmi audio output is disabled by default and only enabled if EDID reports audio support?
08:37imirkin: kfsf: the only complaints i do hear about tv's is regarding the fact that they overscan
08:37imirkin: kfsf: afaik there's no way to disable hdmi audio
08:37imirkin: although perhaps we only enable it based on edid. i don't remember.
08:38ratherDIfficult: kfsf: sounds like it works well on all newer cards, just as i thought
08:43ratherDIfficult: kfsf: i think someone comments on this more, but i say if you are nouveau fan, i think kepler is a good bet, if you are gamer, choose something has more cores, i.e 7billion transistor one for pcie computer is of course very powerful card
08:43ratherDIfficult: but there are thinner keplers too, imo they were in three classes or versions based of how many cuda cores and stuff
08:58ratherDIfficult: kfsf: as i remember 1.7billion transistor one which runs most games already, and of course all your hdmi fancy resos 4k prolly already, but to play games on that , better choose 3.5billion or 7billion one
09:05ratherDIfficult: it seems no one wants to type with me, i get sober, and try to get a card to work with too, this shit is hard today, too much work ahead, but i migrate to fpga's
09:05kfsf: ratherDIfficult: thanks for the information :)
09:06kfsf: ratherDIfficult: do you know if nouveau requires some closed source binary firmware or is this only the case with the most recent 900 series where they require a signed firmware?
09:08ratherDIfficult: kfsf: not entirely, well don't buy maxwell card, i suggested kepler one, maxwell is definitely on blob firmware, but i think seems that kepler has both
09:09ratherDIfficult: i.e some open source version of the fuc or falcon or such stuff, or binary one can be used too, for details imirkin comments since bskeggs is not in timezone yet
09:10imirkin: kfsf: video decoding acceleration (i.e. h264/etc) requires closed firmware
09:10kfsf: imirkin: on every nouveau supported card?
09:10imirkin: kfsf: also there's no way to get GM20x going without signed firmware blobs (although in fairness, even with those blobs, you couldn't get anything working with nouveau either)
09:11imirkin: kfsf: all GM10x and older GPUs don't require any blobs for 3d acceleration with the latest nouveau
09:11imirkin: GM107 since 4.1, older stuff has been working for much longer
09:11imirkin: GM107 has a slew of various issues, i'd recommend a kepler if you were in the market for a new board
09:11imirkin: to use with nouveau
09:13kfsf: imirkin: it seems like nouveau is the "fastest" linux driver you can get for 3d and 2d acceleration when you dont accept any binary blobs on your system at all
09:13imirkin: kfsf: nouveau can't work without what's in the VBIOS
09:14imirkin: kfsf: similar to atombios in many respects
09:14kfsf: kfsf: the radeon driver has much better support for most things, but all of those things (speed, opengl,...) supported by the radeon driver are simply lost when you dont install the amd binary blobs
09:14imirkin: kfsf: i'm not sure why you care whether those blobs are on your HD or in a PROM on the gpu board...
09:15imirkin: i'd definitely recommend an amd board over nvidia
09:15kfsf: imirkin: yes, that's the next part. but on radeon you have the closed source binary blobs you have to install on the system AND you have the closed source firmware on the card itself. nouveau seems to only need the fw on the card
09:16imirkin: with open-source amd drivers, it's only closed-source firmware blobs on the card for the command processor and video decoding units
09:16imirkin: basically the same situation as nouveau, except that nouveau has open-source firmware for the context switching unit
09:17ratherDIfficult: i think i go with kepler to hack on, because of the tools, yeah i do not even care when the whole driver is closed source, definitely i don't care about firmware
09:17kfsf: imirkin: there is the openatom (open radeon bios). this should "fix" the binary blobs on the card itself
09:18imirkin: i guess atombios might allow nastier things than vbios command tables do
09:19kfsf: imirkin: you know http://sourceforge.net/projects/openradeonbios/ ?
09:21kfsf: imirkin: i know that its possible to send FM-signals with the GPU. thats why i think its important to finally replace just everything.
09:22kfsf: imirkin: i dont know why http://sourceforge.net/projects/openradeonbios/ is not really famous. its the only thing available and the thing the whole market is missing
09:23Yoshimo: maybe people are not missing it at all?
09:29kfsf: its important. you dont know whats inside the closed source fw and your gpu card can send wireless data without you knowing that. and you just dont care?
09:31Yoshimo: i think this is stuff you can do if the rest of the hardware is working
09:47karolherbst: kfsf: just because something is called "firmware" it doesn't mean it can do everything
09:47karolherbst: having every possible wifi driver inside the gpu vbios would be a little overkill if you ask me :/
09:47imirkin_: kfsf: how would your gpu do that?
09:48imirkin_: kfsf: at least for nvidia, the vbios contains commands that only affect the gpu device. if you have an iommu, that should prevent the gpu device from doing unauthorized things
09:48imirkin_: if the gpu device really wanted to do that, it could have on-board prom firmware with the relevant bits :)
09:50kfsf: imirkin_: http://bk.gnarf.org/creativity/vgasig/vgasig.pdf
09:51kfsf: pdf is named "FM radio transmitter using a VGA graphics card"
09:52karolherbst: I don't know how you get from "FM radio transmitter" to full wifi functionality
09:52karolherbst: you can also transfer bits with your sound card between devices, big deal
09:52karolherbst: because for 1kb you need like hours
09:52karolherbst: ohh maybe not hours, but minutes
09:52mjg59: And it's still only exfiltrating stuff that you've sent to the GPU
09:52karolherbst: but that doesn't mean you can use your sound card as a wifi device
09:52mjg59: So, other people's passwords but not your own
09:53karolherbst: this is in a <100 kHz domain this radio transmitter
09:53karolherbst: you can't really do much with that
09:54kfsf: you know that the Intel ME Firmware has access to all the hardware. there is a factory-admin-password that is always valid and that can be used to turn on the pc. then you can dd out the parts of the hdd you are interested in over fm
09:54karolherbst: kfsf: this is dangerous not because it is called "firmware"
09:54karolherbst: think about it
09:55mjg59: kfsf: The ME doesn't have direct access to external GPUs
09:55kfsf: its Management Engine. i know
09:55imirkin_: can you guys take this to #tinfoilhat or something?
09:55mjg59: If it *did*, it could just replace your known-good firmware with hostile firmware and do this anyway
09:56kfsf: mjg59: it has access to PCIe and can talk with the GPU
09:56mjg59: There are all kinds of reasons to be annoyed at non-free GPU firmware
09:56mjg59: But system security isn't really one of them
09:56karolherbst: anyway a gpu firmware isn't hostile by itself
09:56mjg59: The non-free firmware on your SSD is a much bigger concern there
09:56kfsf: imirkin_: there is no one in #tinfoilhat . I think they are all just hiding :D :D
09:57mjg59: Or the non-free firmware on your actual wifi card :p
09:57kfsf: mjg59: http://www.openssd-project.org/wiki/The_OpenSSD_Project
09:57kfsf: mjg59: you should only use ath9k supported wifi cards
09:58karolherbst: if we already talk about crazy stuff, then stick to gpus and think about ways how to make the VGA cable less attackable :)
09:58mjg59: kfsf: You're running an OpenSSD?
09:58kfsf: mjg59: they dont have any binary blobs for running and the preinstalled
09:58mjg59: And, again, the ME can just replace the running firmware on your wifi card
09:58mjg59: If this is your threat model, the GPU firmware is really not the biggest problem you have by a *long* way
09:59kfsf: mjg59: yes, the ME is the most important part that has to be replaced. thats also why its listed "Panic Level 9000+" http://www.coreboot.org/Binary_situation
10:00kfsf: mjg59: but because it cant be replaced, intel is dead for security purposes since ~2010
10:00mjg59: You can't currently replace the code running on the system management processor on AMDs either so /o\
10:02kfsf: mjg59: its way better than the intel cpus. intel just turns off your computer after 30 minutes and more modern ones dont boot anyway.
10:03kfsf: mjg59: here is a really fast multicore system you can build free(libre) computers with: ASUS KGPE-D16
10:03kfsf: mjg59: https://raptorengineeringinc.com/coreboot/kgpe-d16-status.php
10:05kfsf: back to topic: Would be great if someone could help do additional development on the binary situation at nvidia and amd :)
10:06kfsf: already done work on amd: http://sourceforge.net/projects/openradeonbios/
10:07kfsf: and i would love to see the nvidia 900 series never supported by nouveau. The free software community should not accept it when a hardware developer makes such hardware crap like requiring signed drivers
10:10karolherbst: kfsf: I think you misunderstood: nouveau doesn't really use non free vbios
10:10karolherbst: I mean you have to parse the vbios on the gpu, there is no other way, because information like clocks, and other card specifics are stored there
10:11imirkin_: also the specific commands to get it going, which are different board-to-board
10:11karolherbst: there are also some "blob" scripts in there, but I don't think they are used at all?
10:11imirkin_: no, they are
10:11imirkin_: can't do anything without 'em
10:11imirkin_: the init ones get skipped if the board's already initialized
10:12karolherbst: I see
10:13karolherbst: but are they partly "unknown" or does one know what they actually do?
10:13imirkin_: they init the board...
10:13pmoreau: It's mostly (only) setting regs iirc
10:14pmoreau: So you can use rnndb to figure out what they are doing
10:16kfsf: pmoreau: what do you think, when someone asks you how long you would take to make a free blob replacement, what would your answer be?
10:16imirkin_: pmoreau: yeah, like setting UNK1, UNK2, UNK3, UNK4, etc ;)
10:16pmoreau: imirkin_: Exactly! :D
10:17kfsf: imirkin_: what do you think, when someone asks you how long you would take to make a free blob replacement, what would your answer be?
10:17pmoreau: imirkin_: But some are known :)
10:18imirkin_: kfsf: my answer would be that we have nouveau already, and it's indistinguishably free from any other free implementation
10:19pmoreau: kfsf: Yeah, why would you want to start a new free driver?
10:19kfsf: imirkin_, pmoreau : i mean gpu vbios replacement
10:19kfsf: like http://sourceforge.net/projects/openradeonbios/ for AMD cards
10:20imirkin_: kfsf: can't be done.
10:20pmoreau: Ok, this was my first guess, but with imirkin_'s reply, I thought I was wrong
10:20imirkin_: kfsf: boards are produced faster than one could figure out what to do with them.
10:20imirkin_: each board revision needs something potentially different
10:22kfsf: imirkin_: i know of the fast development. but your answer at "can't be done" is just based on time. not on "the gpu itself checks the signature of the VBIOS" and the check key is burned into the GPU itself, right?
10:22pmoreau: Maybe when everything else works correctly, one could have a look at it.
10:22imirkin_: kfsf: time and board availability. there are a LOT of boards.
10:22imirkin_: kfsf: starting with GM200, the answer might be coz of required key signatures
10:23kfsf: imirkin_: there are those reference boards (at least at AMD). this are the one that should be supported mainly
10:23imirkin_: that's great. however you can't buy a reference board, and no board actually sold needs identical things.
10:24kfsf: imirkin_: ah, thats a question i always liked to ask. this key signatures are required by the GPU chip itself or by the VBIOS?
10:25imirkin_: the GPU itself requires signatures for the PMU which in turn is required to load secure ctxsw firmware, which in turn is required for acceleration
10:26kfsf: imirkin_: such reference gpu cards are not been sold?? https://www.techpowerup.com/205245/nvidia-geforce-gtx-980-reference-board-pictured.html
10:28kfsf: imirkin_: with other words - the nvidia 900 series can be run without 3d acceleration when you dont load the signed binary blob? That would be just same result like on AMD when you dont load there the binary blob - and thats fine
10:28karolherbst: kfsf: how is that fine?
10:29imirkin_: kfsf: GM20x modesetting is supported today.
10:30kfsf: karolherbst: sorry, unclear wording. its fine for me. just having graphics and 1080p video playback rendered by the CPU is fine for MYSELF.
10:31karolherbst: imirkin_: can tearing prevention work without accell support? I always thought you need some kind of compositing for that
10:32kfsf: imirkin_: i have read a bit more about the "not available reference boards" at nvidia. many people are using reference boards.
10:32kfsf: imirkin_: it looks like to be same situation like on AMD. When there is the NVIDIA logo on the PCB over the PCIe-slot, then it is in normal cases a reference board. On amd its "AMD"
10:35karolherbst: kfsf: the situation is, if somebody wants to do that, it is fine, but most of the time, there are issues which are just more important than other things
10:36kfsf: imirkin_: there is still my question: what do you think, when someone ask you how long you would take to make a free blob replacement, what would your answer be?
10:37kfsf: imirkin_: 10hours, 100hours?
10:37karolherbst: kfsf: you can't replace the vbios, because you need certain information from that
10:37karolherbst: kfsf: precise this question: for _one_ vbios or for _one_ chipset?
10:37imirkin_: kfsf: i would answer is that nouveau is free.
10:37imirkin_: so... 0 hours.
10:38karolherbst: and you don't measure programming tasks in hours :)
10:38kfsf: karolherbst: for one vbios from one reference card with those voltage regulators and so on
10:39karolherbst: kfsf: mhhh I would say, mhhh 1 year?
10:39karolherbst: but this is pretty optimistic
10:40kfsf: karolherbst: that would be perfectly fine :)
10:40karolherbst: if you say so
10:41kfsf: now having a 790ti with freed up vbios would be great :)
10:41karolherbst: kfsf: but as I said: you will need your vbios on your card
10:41kfsf: it was released before 1,5years
10:41karolherbst: there is no way around it
10:41ratherDIfficult: i go out, but the security debate is clearly not meant for me, since the 10years i've been physically surveilanced and hacked and exploited entirely, i don't feel like i have something to hide anymore, and the freaky techniques world has i don't think its possible to prevent access when there is enough interest in you
10:42kfsf: karolherbst: when you have a reference PCB bios that would support all reference cards produced. that are more then enough
10:42karolherbst: kfsf: no
10:43karolherbst: there is card specific stuff in that vbios
10:43karolherbst: not chipset specific
10:43kfsf: karolherbst: yes, like voltage regulations and so on
10:44kfsf: karolherbst: this is pcb specific and not only chipset specific
10:44karolherbst: if you want to do that, you could help RE the vbios :)
10:45kfsf: so every card that is technically same (every reference card) can run then the freed up vbios
10:45karolherbst: which would be one model
10:46karolherbst: the thing is, what should the result look like
10:47kfsf: karolherbst: its not one model. its mostly EVERY model that is been released first from every gpu developer. you can take the bios of the ASUS Nvidia reference board and flash it to MSI nvidia reference board - working perfectly fine
10:47karolherbst: if you need to respect everything what's in the vbios and you end up writing the same stuff again, it like putting "GPL" on a copy of the vbios
10:47mupuf: karolherbst: pushed all the blob values?
10:47mupuf: HW_CGBLK is what I was talking about, yes
10:47karolherbst: mupuf: yeah like 400 nvapokes :/
10:48karolherbst: I bet I did it wrong
10:48karolherbst: kfsf: I think you really misunderstand the situation
10:48mupuf: no idea when nvidia will release their fw along with their interface. The idea is that the power sensor is polled by the PMU/PDAEMON
10:48mupuf: yes, there are a ton of pokes
10:49mupuf: I have a tool to poke parts of a mmiotrace
10:49karolherbst: ohh nice
10:49mupuf: so as you can just trim a mmiotrace and replay it
10:49karolherbst: yeah I did some grep, cut, echo magic :D
10:49mupuf: I thought i pushed it in envytools, but i may be wrong
10:49mupuf: if not, it is in /root/ of reator
10:50mupuf: so, the power sensor is annoying because I do not want to get code in the kernel for no good reason
10:50mupuf: as in, I don't want the code to get ripped off
10:50mupuf: I have the code almost ready
10:50mupuf: I need to rebase it on the new nouveau code
10:50karolherbst: I see
10:50mupuf: I can land the vbios part though
10:50karolherbst: yeah I think this would be okay
10:51karolherbst: but how does it work currently within your tool?
10:52kfsf: can someone explain me why i am wrong with my thoughts? I have tested that by myself. you can flash "reference nvidia pcb" bios from one vendor to the "reference nvidia pcb" of an other vendor. they are just labeled asus, msi, ...
10:53kfsf: this are reference PCB: https://www.techpowerup.com/img/14-09-15/70c.jpg
10:53pmoreau: mupuf: Need to test the code on other Maxwell cards?
10:54mupuf: karolherbst: what tool? The crap code I sent you? It is a hack
10:54karolherbst: kfsf: because you would have to do that for each model of each generation of cards, and nobody will do that before the vbios is fully REed
10:54karolherbst: mupuf: yeah :D
10:54mupuf: pmoreau: which code?
10:55mupuf: karolherbst: pwr_read is a hack, it is a reverse engineering tool. If it does not work, fix it. We cannot do that in the kernel
10:55pmoreau: "I do not want to get code in the kernel for no good reason" and "I have the code almost ready"
10:55pecisk: Sandro just hinted that we will might have to choose navies in future
10:56pecisk: oops, wrong window :)
10:57mupuf: pmoreau: ah, this code. Well, if you need to read your power usage, I can provide the userspace code
10:57mupuf: it is easier
10:57karolherbst: mupuf: I know this is a hack, but what's wrong with adding nice power sensor support in the kernel?
10:57mupuf: nothing wrong, I want it more badly than you
10:57karolherbst: I see
10:57mupuf: but the kernel is not the right place for it, the right place is in pdaemon
10:57pmoreau: mupuf: It's just that I got access to a Maxwell card at work, so I could maybe try some things from time to time.
10:57karolherbst: I thought you had some kernel things done already or did I misunderstand you
10:58karolherbst: so you want to have it in pdaemon
10:58karolherbst: and for this to work you need firmware?
10:58mupuf: yes, and I got that working too ... but nvidia decided signing their pdaemon
10:58mupuf: and we need to support their interface
10:58mupuf: so we need to move our code to the new interface
10:58kfsf: i always asked me: when nouveau have reclocking problems, would the get "fixed", when you dont have to reclock? i think yes, right? Its just modding the bios to only support one speed. so no slowdown, just always higher power usage
10:58mupuf: but it is not documented
10:58mupuf: so we are stuck..
11:00karolherbst: kfsf: laptop users and this is a bad idea
11:00karolherbst: mupuf: I see
11:01karolherbst: but if you do that inside pdaemon you can read those values out through a nice kernel interface and push it into hwmon I assume
11:01mupuf: we keep being promised it is coming soon, but it is getting quite long :D
11:01karolherbst: but what is the reason why this is bad for the kernel?
11:01kfsf: karolherbst: not if they dont care about the battery time because the always have AC connected. Am i right with what i told?
11:01karolherbst: kfsf: nope, laptop user _do_ care
11:01mupuf: we have known about this issue for 2 years (we got warnings) and it has been a year since it happened
11:02mupuf: the reason it is bad is that we cannot have two masters on an i2c bus
11:02mupuf: we would need a mutex on the bus
11:02mupuf: this is doable, but if nvidia's fw does not do it, then we are screwed
11:02mupuf: and we would have to disable the code
11:02kfsf: karolherbst: not all notebook users care. maybe most, but not all. Back to the question: Am i right with my idea?
11:03mupuf: so, this is a potential problem and I want to avoid introducing code that could give out crappy results down the line
11:03karolherbst: kfsf: if you say we should strip functionality, then yes, because that won't happen
11:04karolherbst: you can do that for your card, but don't expect nouveau to ditch support for embedded vbios
11:04karolherbst: mupuf: mhh okay, I think I slowly get it
11:04kfsf: karolherbst: and by the way: solution for notebook users: before gaming flash the one-gpu-speed-only bios and reboot. After gaming flash back original bios
11:05mupuf: karolherbst: yeah, you do not have all the information
11:05mupuf: and I am bad at sharing them
11:05mupuf: as I said, this is not because I want to take credit for my work :D
11:05karolherbst: kfsf: now it's getting silly, try to convince any laptop user to do that, then continue to convince me
11:05karolherbst: mupuf: I know
11:05mupuf: I just think it is better use of your time not to care about this issue right now
11:05karolherbst: mupuf: I am just interested :)
11:06mupuf: very well :)
11:06kfsf: karolherbst: when i told my mom she have to press one button (script flashing bios) before gaming and after gaming press an other button - thats definitely not a problem for any user
11:06mupuf: so, what were the results on maxwell?
11:06mupuf: the blob does not use the highest cstates?
11:07karolherbst: kfsf: it is, because I won't reboot just because I want to play a game
11:07karolherbst: kfsf: as I said this is going to be silly, so I stop now
11:07karolherbst: mupuf: mhhh
11:07karolherbst: mupuf: it does, just the pwm value is much lower
11:07karolherbst: 0x54 on highest pstate
11:08mupuf: so, the voltage mapping table was easy? :D
11:08karolherbst: strange that 0x72 worked
11:08kfsf: karolherbst: its a solution until nouveau have working reclocking. The must-reboot its WAY better than not being able to play the game you want to play
11:09karolherbst: mupuf: no clue, I think the max value is just something else? I don't know
11:09mupuf: I do not understand what the max value means
11:09karolherbst: we need to get the voltage somehow
11:10karolherbst: I assumed the blob will tell us, but...
11:10karolherbst: maybe we have to use windows for that :D
11:10mupuf: maybe the temperature is the one deciding factor?
11:11karolherbst: it is, but not completely
11:11karolherbst: with different temp I get 5 different pwm values
11:11karolherbst: the pwm value doesn't change above 50°C though
11:11karolherbst: so this is kind of useless
11:12mupuf: so there are other factors
11:12karolherbst: max clock is one :D
11:12karolherbst: but you wanted to ignore nvidia tool for now
11:12mupuf: and what does maxclock mean?
11:12mupuf: as in, hw-wise?
11:12karolherbst: since kepler (over-)clocking changed
11:12mupuf: that's all I care about
11:12karolherbst: this is coolbits domain
11:13karolherbst: you can set an offset to the max clock in the blob
11:13karolherbst: something between -135MHz and +135MHz
11:13karolherbst: if this offset changes, the pwm values are changing for a specific clock like I showed you in the graph
11:47karolherbst: mupuf: do you know which problems there were with clock gates?
11:47karolherbst: auto seems to work for me, but I didn't do any long run tests
12:02mupuf: not all the engines support auto IIRC
12:37karolherbst: mupuf: mhh okay
12:37karolherbst: but you don't know any specifics I guess
12:37karolherbst: I have only a mobile gpu and just use it for gl stuff, so I can hardly test everything
12:40mupuf: well, I know we need to create an infrastructure that will allow the driver to enable/disable clock gating
12:41mupuf: when it needs to read/write to the register address space of this register
12:41mupuf: sounds good, right? :D
12:42karolherbst: what is the best place for that? not ptherm?
12:43karolherbst: I saw that the blob checks the reg while clocking though :/ maybe my traces are just weird but well
12:44karolherbst: sadly I don't have any personal gains from that :/
12:50mupuf: oops, you did not get it
12:50mupuf: some engines do not support the automatic clock gating
12:50mupuf: and they need to be in stop mode
12:51mupuf: what happens when you try to access a register of an engine that is clock gated?
12:51mupuf: well, you get ... an error
12:52mupuf: so, one needs to get it out of the clock gating mode for that
12:52mupuf: and so, the kernel needs to know the state
12:53hakzsam: basically, you disable CG before reading/write to registers and you re-enable it just after
12:54mupuf: but the kernel needs to track that
12:54mupuf: this is kind of the point of why ben added nv_rd32(subdev, ....)
12:54mupuf: so we could do it
12:54mupuf: and we need to
12:54mupuf: and if not only for clock gating
12:54hakzsam: that's why we need an infrastructure for each different blocks
12:54mupuf: but also for power gating
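The access pattern described above (take an engine out of clock gating, touch its registers, put the gating mode back) can be sketched as a toy model. This is only an illustration under stated assumptions: `FakeEngine`, `ungated`, `rd32` and `wr32` are hypothetical names, the "run"/"auto"/"stop" modes are plain strings taken from the chat rather than real register encodings, and nothing here is nouveau code.

```python
from contextlib import contextmanager

# Hypothetical gating modes; the names follow the chat ("run", "auto", "stop"),
# not any real register encoding.
RUN, AUTO, STOP = "run", "auto", "stop"


class FakeEngine:
    """Toy model of one engine: a register file plus a clock gating mode."""

    def __init__(self, name, mode=RUN):
        self.name = name
        self.mode = mode
        self.regs = {}

    def _check(self):
        # Per the chat, accessing a register of a clock-gated engine fails.
        if self.mode == STOP:
            raise IOError(f"{self.name}: register access while clock gated")

    def raw_read(self, addr):
        self._check()
        return self.regs.get(addr, 0)

    def raw_write(self, addr, value):
        self._check()
        self.regs[addr] = value & 0xFFFFFFFF


@contextmanager
def ungated(engine):
    """Force the engine to RUN for one access, then restore the saved mode."""
    saved = engine.mode
    engine.mode = RUN
    try:
        yield engine
    finally:
        engine.mode = saved


def rd32(engine, addr):
    # Every read goes through the wrapper, so the gating state stays tracked.
    with ungated(engine) as e:
        return e.raw_read(addr)


def wr32(engine, addr, value):
    with ungated(engine) as e:
        e.raw_write(addr, value)
```

Routing every access through one wrapper is what makes the tracking possible: the caller never needs to know whether the engine was in stop mode, and the saved mode is restored even if the access raises.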
13:00imirkin_: pmoreau: any chance you can get me that glxinfo? (glxinfo -l -s against your G96)
13:00imirkin_: [on mesa 11.0]
13:00pmoreau: imirkin_: Oh right! Sorry, had forgotten about it
13:00imirkin_: np, me too :)
13:03pmoreau: Something is wrong with my graphic stack apparently.
13:04pmoreau: What's the env var for debugging libGL and so on?
13:04karolherbst: mupuf: no engine is in stop mode for me with blob :/
13:04karolherbst: at least clock gates
13:05karolherbst: and everything inside those 0x20200+ regs
13:05karolherbst: pmoreau: LIBGL_DEBUG
13:05pmoreau: And = to what? I tried verbose, debug and all but still got no output
13:07imirkin_: pmoreau: LIBGL_DEBUG=verbose glxinfo
13:07imirkin_: what does that print?
13:07pmoreau: name of display: :0
13:07pmoreau: Error: couldn't find RGB GLX visual or fbconfig
13:07pmoreau: And the same without the verbose
13:07imirkin_: are you sure you're looking at the full output?
13:07imirkin_: i.e. not just stdout
13:07pmoreau: Where could have the rest of it gone?
13:07imirkin_: the rest would go to stderr
13:08pmoreau: I doubt xterm decided to throw stderr to /dev/null on its own
13:08pmoreau: But who knows
13:08imirkin_: can you do
13:08imirkin_: LIBGL_DEBUG=verbose strace -f -e open glxinfo
13:09mupuf: karolherbst: some engines will have to. At least on older hw, so we need the infrastructure
13:09imirkin_: pmoreau: perhaps your X server doesn't have libglx loaded?
13:10imirkin_: [check Xorg log]
13:10pmoreau: It did load it
13:10pmoreau: imirkin_: https://phabricator.pmoreau.org/P24
13:11imirkin_: pmoreau: xorg log
13:11pmoreau: imirkin_: https://phabricator.pmoreau.org/P25
13:12pmoreau: I missed the dlopen messages
13:12imirkin_: [ 11518.517] (EE) AIGLX error: dlopen of /usr/lib/xorg/modules/dri/nouveau_dri.so failed (libLLVM-3.7.so: cannot open shared object file: No such file or directory)
13:13pmoreau: I'll upgrade to 3.7...
13:13pmoreau: And hope I can still compile Mesa
13:13imirkin_: you can resolve it however you want
13:14imirkin_: but you need to be able to dlopen nouveau_dri.so :)
13:17karolherbst: mupuf: I was thinking I add support for auto engines first, because this seems easier
13:17karolherbst: I just have to figure out which one I can set to auto
13:17pmoreau: Now I can't open my compiled version of Mesa, cause it needs 3.6 :D
13:17pmoreau: Oh well
13:18mupuf: karolherbst: just put that in the init function
13:18mupuf: that should do the trick
13:18karolherbst: yeah I know
13:19karolherbst: but which one I am allowed to set to auto I do not know
13:19pmoreau: imirkin_: Got your data anyway: https://phabricator.pmoreau.org/P26
13:19mupuf: and fini should put it back to the original value (run?)
13:19mupuf: but that won't solve everything
13:19mupuf: we still need to set the master enable bit
13:19imirkin_: pmoreau: great thanks!
13:19mupuf: so you will need a cg subdev
13:19karolherbst: mupuf: do you know which one that is?
13:19karolherbst: because it kind of works for me out of the box
13:19mupuf: it is documented by nvidia
13:19mupuf: it is in PMC
13:20mupuf: I added it to rnndb IIRC
13:21karolherbst: okay, will check it
13:25karolherbst: mupuf: ohhh this is PMC.ENABLE
13:26mupuf: I am not sure how to handle the clock gating on nouveau
13:27karolherbst: mhh the blob leaves some bits on even on unload
13:27mupuf: it is a little more complex than just bashing the registers
13:27karolherbst: but enables lot more
13:27mupuf: I guess we could have a cg subdev
13:27mupuf: and have this subdev only export the master enable bit
13:27mupuf: and allow clock gating some engines
13:28ratherDIfficult: mupuf: i heard you several times talking about clock gating, and i can officially say that you are smarter than me, cause i do not know wtf, is clock gating
13:28mupuf: and also export the state of the engines
13:28karolherbst: power consumption is now more stable after enabling more bits
13:28imirkin_: could do it via the pgob interface... or something
13:28karolherbst: 9.860W is the card now
13:28mupuf: ratherDIfficult: WTF :D
13:29mupuf: imirkin_: PGOB is a state, not a hw feature
13:29mupuf: it just means that the engine is power gated on boot
13:29mupuf: right now, we have a hack to get the engine out of the power gated mode
13:29imirkin_: sure, but... you can adapt :)
13:29mupuf: isn't it done in the init() function of pgraph?
13:30karolherbst: ohh nothing really changed for me in PMC.ENABLE :/
13:30karolherbst: ohh no, it is the same
13:30karolherbst: okay, nice
13:31karolherbst: what is BLG?
13:31imirkin_: block-level gating
13:31karolherbst: this is the only thing not enabled by default for me
13:32karolherbst: "W 0x000200 0xffffffff" in the blob trace
13:32karolherbst: okay, why not
13:32karolherbst: yeah, the card just enables what it wants
13:32karolherbst: and the blob reads the actual value back
13:33karolherbst: the blob is weird
13:33imirkin_: the blob knows what it's doing
13:33karolherbst: I know
13:34karolherbst: but it starts by disabling nearly every bit again
13:34karolherbst: and slowly enables one bit by another
13:34imirkin_: first it checks which engines are available
13:34imirkin_: then it initializes them one by one
13:34karolherbst: and does a 0xffffffff later again
13:34imirkin_: for good measure ;)
13:34karolherbst: it sometimes disables some bits again mhhh
13:35imirkin_: hakzsam: where did you get those values from?
13:36imirkin_: hakzsam: also note that you're setting bits outside of the mask in the gf100 case
13:36karolherbst: mupuf: okay, so is there a difference in power gating handling compared to clock gating?
13:37karolherbst: or is it basically the same regs
13:39hakzsam: imirkin_, from mmio traces and yes, I just saw that bit which is outside of the mask...
13:40ratherDIfficult: mupuf: is it like how you guide a clock line through? heh just one thought before googling
13:41mupuf: hakzsam: and as said privately, check the bitfields again
13:41mupuf: I doubt it is 0x3ffff on fermi
13:41ratherDIfficult: its stupid thing, i see that now they have instant diodes...
13:41hakzsam: it is on nvd9
13:41karolherbst: what is this linux/tegra-powergate.h file :/
13:41hakzsam: mupuf, but obviously not for the gf100 case ;)
13:42mupuf: hakzsam: ?
13:42mupuf: how can you be so sure?
13:42karolherbst: ohh android sources, nvm
13:42mupuf: karolherbst: yop
13:42hakzsam: mupuf, because one bit is outside of the mask 0x3fff?
13:42karolherbst: okay, I still have more W than the blob ...
13:42karolherbst: what else? :D
13:43mupuf: karolherbst: power gating, of course :p
13:43karolherbst: ohhh okay
13:43mupuf: and this one is much messier to do
13:43hakzsam: mupuf, is there a fermi plugged in reator btw?
13:43mupuf: but hey, nvidia will release their fw and handle that for us, yeepee!
13:43karolherbst: what the heck
13:44mupuf: but we still will need to add support for all the cards older than maxwell
13:44karolherbst: powergate is on a board level on tegra?
13:44mupuf: hakzsam: nope
13:44karolherbst: not part of the gpu driver
13:44hakzsam: mupuf, would you want to plug one, please? I would like to check bitfield of some regs
13:45karolherbst: mhh I have to investigate this later
13:45mupuf: hakzsam: I could, but I would have to come out of my chair :D
13:45mupuf: which one do you want?
13:45hakzsam: all fermi, except GF117 and GF119
13:46mupuf: can I see the scan of the regs on your nvd9?
13:46mupuf: especially the one where you set the bit
13:46mupuf: I have no reason to think the bit would be non-sticky
13:47mupuf: but one never knows
13:47imirkin_: well it's not just a scan... it's actually interesting to see what the read/writes were in the trace
13:47hakzsam: mupuf, http://hastebin.com/gineroxodo.vala
13:50mupuf: imirkin_: of course!
13:50mupuf: hakzsam: ack, interesting
13:50ratherDIfficult: anyways good luck, the terminology is off my level, why can't you do some easier explanations sometimes?
13:51mupuf: ratherDIfficult: wikipedia has everything on it, AFAIR
13:51hakzsam: mupuf, did you plug the fermi?
13:51mupuf: hakzsam: nope
13:51ratherDIfficult: like anyways, to pausa a card there is trap and interrupt possibilities for mmio and command buffer dma
13:52hakzsam: imirkin_, btw, this could help a bit http://cgit.freedesktop.org/~darktama/nouveau/commit/?id=68227671487233ba5b01a5ac64baa8d896431f97
13:52ratherDIfficult: and with that said, there is no problem to get all the bits out, without caring what the firmware is
13:53hakzsam: imirkin_, that's why I said "unknown engines" in the commit msg :)
13:53imirkin_: is there significant perf increase by bumping the clock without the memory?
13:53hakzsam: +1000FPS with glxgears (and vblank_mode=0)
13:53mupuf: imirkin_: would be nice to know which timeout is which
13:53imirkin_: surprising that glxgears cares that much
13:54mupuf: if you feel like REing
13:54hakzsam: imirkin_, but I didn't test with heaven yet
13:54glennk: imirkin_, usually takes msaa and/or deferred rendering to start stressing memory on the mid/high range cards
13:55imirkin_: glennk: boot clocks can be pretty low
13:55imirkin_: glennk: like 10% of max speed
13:55imirkin_: for memory, not only shader cores
13:55glennk: right but compare the memory bandwidth of 8x msaa vs 1x
13:55imirkin_: if your point is that 8x msaa has higher memory bandwidth requirements, then i agree.
13:56glennk: its just where you hit the bottlenecks first, with 1x its often core before memory
13:56imirkin_: for memory that's clocked up to capacity, sure
13:56hakzsam: now, the next step is to fix memory reclocking on my gf119 (should be not as easy as the previous patch)
13:57imirkin_: but for memory that comes up at 10% of its actual speed... dunno
13:57imirkin_: hakzsam: i'd be in favor of a patch that enabled it without memory reclocking
13:59glennk: glxgears just writes depth and color and no texture reads so it'll hit fill rate limits before memory, likely even with memory at 1/10 its max speed
13:59hakzsam: imirkin_, mmh, so I need to disable NvMemExec
13:59imirkin_: hakzsam: yes
14:00hakzsam: mupuf, same bitfield on GF114, let me update my patch and increase that mask
14:03mupuf: set the mask only for this bit
14:05hakzsam: is this really a problem? I don't think
14:05hakzsam: I used the mask returned by nvascan as for other regs
14:05mupuf: well, I would advise getting rid of the mask and doing a plain write
14:06imirkin_: hakzsam: does the blob mask or write?
14:06hakzsam: we don't do that for gk104
14:06imirkin_: (you can tell if it's a mask if it reads it right before)
14:06mupuf: hakzsam: or better, write 0xffffffff and look at what the blob resets
14:07hakzsam:  43.551104 MMIO32 R 0x122310 0x00000078 PIBUS.MMIO_HUB.CFG[0x4] => 0x78
14:07hakzsam:  43.551116 MMIO32 W 0x122310 0x00000800 PIBUS.MMIO_HUB.CFG[0x4] <= 0x800
14:07hakzsam: imirkin_, I would say it's a mask?
14:07imirkin_: sure looks like a mask to me.
14:08hakzsam: mupuf, do you agree with those masks?
14:08imirkin_: hakzsam: they're sequential in the trace, right?
14:08imirkin_: hakzsam: you didn't just grep?
14:08hakzsam: sequential, yes
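The "it reads it right before" heuristic can be sketched as a small scan over decoded trace lines. This is a rough sketch that assumes lines shaped like the two pasted above (`MMIO32 R|W <addr> <value> ...`); the function name `read_then_write_pairs` is made up for illustration. A read directly before a write is only a hint of a read-modify-write, not proof of one.

```python
import re

# Matches decoded trace lines of the form shown in the chat:
#   " 43.551104 MMIO32 R 0x122310 0x00000078 PIBUS.MMIO_HUB.CFG[0x4] => 0x78"
LINE = re.compile(r"MMIO32\s+([RW])\s+(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)")


def read_then_write_pairs(lines):
    """Yield (addr, value_read, value_written) for each write that directly
    follows a read of the same address. Lines that do not parse are ignored."""
    prev = None
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        op = m.group(1)
        addr = int(m.group(2), 16)
        value = int(m.group(3), 16)
        if op == "W" and prev is not None and prev[0] == "R" and prev[1] == addr:
            yield addr, prev[2], value
        prev = (op, addr, value)


trace = [
    " 43.551104 MMIO32 R 0x122310 0x00000078 PIBUS.MMIO_HUB.CFG[0x4] => 0x78",
    " 43.551116 MMIO32 W 0x122310 0x00000800 PIBUS.MMIO_HUB.CFG[0x4] <= 0x800",
]
for addr, old, new in read_then_write_pairs(trace):
    print(f"{addr:#08x}: read {old:#x}, then wrote {new:#x}")
```

Note that in this pair the value written (0x800) keeps none of the bits that were read (0x78), so the trace alone does not say which bits a mask would cover. That is why the lines have to be sequential in the trace, and why grepping for one address is not enough.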
14:10hakzsam: imirkin_, I'll force NvMemExec to false for gf100 and that should be enough for now, right?
14:12mupuf: hakzsam: well, you do not know what the blob masks, so how about you try it?
14:12mupuf: force one reg to all 1 and see what it resets
14:12mupuf: you guessing is not OK
14:13hakzsam: I think it is, but I will check once I have finished the patches which allow user to reclocking the core clock :)
14:14hakzsam: *to reclock
14:23hakzsam: imirkin_, sent
14:24imirkin_: should give skeggsb something to look at on his flight back :)
14:25hakzsam: 07: ~1920 FPS, 0f: ~2820 FPS (with glxgears on my gf119)
14:25hakzsam: great improvements :)
14:25imirkin_: how about heaven?
14:25hakzsam: I'm going to check
14:40mupuf: hakzsam: so, I did the check for you
14:40mupuf: the blob does not mask stuff out, it writes the value
14:41mupuf: # nvapeek -c1 0x122330
14:41mupuf: 00122330: ff73ffff
14:41mupuf: # nvapeek -c1 0x122330
14:41mupuf: 00122330: 00100064
14:41mupuf: I hate cargo-culting stuff and using random values
14:41hakzsam: you're right
14:41mupuf: so please check for all the registers
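The reasoning behind that check can be written down as bit arithmetic. A minimal sketch, assuming the first nvapeek value (0xff73ffff) is the register after forcing bits on and the second (0x00100064) is what the blob left behind; the helper names and the narrow example mask are hypothetical.

```python
U32 = 0xFFFFFFFF


def bits_cleared(before, after):
    """Bits that were set before and are clear afterwards."""
    return before & ~after & U32


def consistent_with_masked_write(before, after, mask, value):
    """True if 'after' is what a read-modify-write with this mask would leave:
    after == (before & ~mask) | value."""
    return after == (((before & ~mask) | value) & U32)


before = 0xFF73FFFF  # register contents before the blob touched it
after = 0x00100064   # register contents after the blob ran

# Any mask the blob used would have to cover every cleared bit.
print(hex(bits_cleared(before, after)))

# A narrow, made-up mask cannot explain the result...
print(consistent_with_masked_write(before, after, 0x000000FF, 0x64))
# ...while a full-width mask, i.e. a plain write, does.
print(consistent_with_masked_write(before, after, U32, after))
```

Since nearly every set bit was cleared, a masked update would need a mask spanning almost the whole register, which is indistinguishable from writing the value outright.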
14:42hakzsam: I'll do, and this should be similar on gk104 but we also use mask ...
14:42mupuf: prove it ;)
14:42mupuf: and that's what cargo-culting means
14:43mupuf: copying without understanding
14:43hakzsam: imirkin_, http://hastebin.com/uqegajekat.avrasm
14:44imirkin_: hakzsam: nice... that's like a 50% bump
14:44imirkin_: from 0.000000001 fps to 0.0000000015 fps ;)
14:44hakzsam: yeah :)
14:44mupuf: can you check with nvatiming if the clock of the domain is the one expected?
14:45mupuf: we may not have the clock tree in the right state
14:52eyestrain: hello, i have a question. i get eyestrain from dithering, so i want to turn it off for all users, even at the login screen when nobody is logged in yet. how to do this? http://nouveau.freedesktop.org/wiki/Dithering/
14:54eyestrain: xrandr --output DP-1 --set "dithering mode" off
14:54eyestrain: this is what must be set for all users and even the command line
14:54imirkin_: hmmm... it defaults to on?
14:55eyestrain: its auto!
14:55eyestrain: i think my monitor does not report 8 bit
14:56imirkin_: i don't see anything that would set the dithering mode to non-off on init
14:57eyestrain: you believe its only active when the user sets it manually to on?
14:57imirkin_: or auto
14:58eyestrain: well its auto by default
14:58eyestrain: i use debian 8
14:58imirkin_: DITHERING_MODE_OFF = 0x00
14:58imirkin_: and i don't see what would ever set the dither mode other than userspace
14:58imirkin_: what gpu do you have?
14:58eyestrain: nvidia quadro nvs 295
14:59imirkin_: that's a G96 or so?
15:00imirkin_: i see it.
15:00imirkin_: yes, you're right, it gets set to auto ;)
15:00eyestrain: ok so how do we force off before or when any x starts?
15:01imirkin_: well it's just a kms property
15:01imirkin_: figure out how to set those and you're good to go
15:02imirkin_: there's the modetest tool, probably others
15:03imirkin_: eyestrain: what happens when you go to http://jonas-baehr.de/~mick/gradient.png ?
15:03imirkin_: do you see stripes or not?
15:04eyestrain: nope, no stripes
15:04imirkin_: so i guess we misdetect
15:05imirkin_: mind pastebinning the output of 'xrandr --prop' ?
15:05eyestrain: you see, this dithering problem is real big for sensitive people like me, we have forum threads going on about PWM and dithering, and using AMD cards is a big no because they force temporal dithering
15:05eyestrain: Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192 DP-1 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 531mm x 298mm EDID: 00ffffffffffff0009d1387945540000 3317010380351e782e30a5a455539f27 0d5054a56b80d1c081c081008180a9c0 b30001010101023a801871382d40582c 4500132a2100001e000000ff00564344 3034393034534c300a20000000fd0032 4c1e5311000a202020202020000000fc 0042656e512045573234343
15:06eyestrain: hm thats difficult to paste here
15:06imirkin_: pastebin please. or hastebin. or something else.
15:06imirkin_: but not in-line here
15:07imirkin_: cool thanks. now to remember how to operate parse-edid...
15:08eyestrain: its off because i used xrandr before
15:08imirkin_: yeah i get it
15:12imirkin_: hm yeah. that edid is a bit sparse.
15:13eyestrain: it's AMVA+ monitor, i think they only use 8 bit
15:14imirkin_: BenQ EW2440L -- that's the model right?
15:14imirkin_: so it doesn't have bpc filled in
15:15imirkin_: which makes us set it to 6, with a few exceptions where we set it to 8
15:15eyestrain: panel is BMS M240Q006 V0
15:15imirkin_: only set it to 6 for LVDS (and eDP). for DP it should default to 8
15:16eyestrain: whats LVDS, is it HDMI?
15:16imirkin_: it's internal laptop panel usually
15:17imirkin_: as is eDP
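The fallback described above (no bpc in the EDID, so assume 6 for internal panels and 8 otherwise) can be sketched as follows. This is an illustration of the rule as stated in the chat, not the driver's code: the connector names are plain strings, and the "dither when the panel depth is below the framebuffer depth" rule for auto mode is an assumption added here.

```python
# Connector names are illustrative strings, not kernel enum values.
INTERNAL_PANELS = {"LVDS", "eDP"}


def assumed_bpc(edid_bpc, connector):
    """Bits per component to assume for a display.

    edid_bpc is None (or 0) when the EDID does not fill in a depth."""
    if edid_bpc:
        return edid_bpc
    # Fallback from the chat: 6 for internal laptop panels, 8 for the rest.
    return 6 if connector in INTERNAL_PANELS else 8


def auto_dither_wanted(edid_bpc, connector, framebuffer_bpc=8):
    # Assumption: "auto" dithers only when the panel is shallower than the
    # framebuffer. The real heuristic may differ.
    return assumed_bpc(edid_bpc, connector) < framebuffer_bpc


print(assumed_bpc(None, "DP"), auto_dither_wanted(None, "DP"))
print(assumed_bpc(None, "LVDS"), auto_dither_wanted(None, "LVDS"))
```

Under these assumptions a DP monitor with a sparse EDID would not be dithered in auto mode, which does not match what is being reported here, so the actual decision must take something else into account.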
15:23imirkin_: i wonder if it detects 8bpc but decides to dither anyways
15:27eyestrain: to make it 10 bits?
15:27imirkin_: not sure, sorry
15:28eyestrain: i think most monitors dither themselves, so if they're 6bpc (cheap IPS), then on top of that the graphics driver might dither, too
15:29imirkin_: this logic was very much written with stupid cheap internal laptop panels in mind
15:29eyestrain: yeah i see, hope you can change the behavior
15:29imirkin_: patches welcome
15:30imirkin_: i.e. figure out why it decides to dither, and come up with an alternative heuristic that will fix it for your use-case
15:34eyestrain: any more information i could provide?
15:39eyestrain: BWT: the dithering, it's also used in the high resolution text console?
15:39eyestrain: bwt = by the way
15:41imirkin_: the dithering is a modesetting property
15:41imirkin_: it's set to whatever you set it to. you can use the modetest tool (part of libdrm) to change it. there are probably other ways of doing it as well.
15:42eyestrain: ok thank you for your time! i'll look into kernel mode settings and when i find out more i'll report
15:50RSpliet: oh boy... I might have just opened a wasp nest... fun stuff for the coming days
15:50imirkin_: oops? :)
15:50RSpliet: nah, no oops
15:51RSpliet: but those opcodes previously (lazily) labelled exit
15:51RSpliet: are probably related to a branching mechanism rather
15:51imirkin_: that's *like* exit :)
15:51RSpliet: our friend Max might consider it exit
15:52RSpliet: it's wrong, but faster