00:00karolherbst: when I am on battery my GPU only draws 40W
00:01karolherbst: and I am sure nouveau doesn't limit it
00:03Lekensteyn: "only" heh
00:03karolherbst: Lekensteyn: check /sys/class/power_supply/BAT0/current_now
00:03karolherbst: I think we messed the EC up a little
00:04karolherbst: ohh wait
00:04karolherbst: no, my mistake
00:06karolherbst: Lekensteyn: could it be, that the value doesn't make much sense?
00:08karolherbst: they are too high
00:09karolherbst: if divided by 0x10 then it starts to make sense
00:10karolherbst: Lekensteyn: but the problem still remains: it's just a 16bit value, too inaccurate and too volatile
00:11karolherbst: ohhhh wait
00:11karolherbst: silly me
00:11Lekensteyn: endianness by any chance?
00:12karolherbst: but still
00:12karolherbst: GPU idling at 0xf: 0xf350 values
00:12karolherbst: oh no, this is actually fine
00:13karolherbst: we just need to figure out how to map this to mA values
00:15Lekensteyn: two's complement: (0xf350 ^ 0xffff) + 1
00:16karolherbst: not quite
00:16karolherbst: the value seems a bit too high actually
00:16Lekensteyn: I have values around 1A, but that is with GPU off
00:17Lekensteyn: with GPU on it doubles (1.9-2A)
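A minimal sketch of the two's-complement suggestion above, applied to the 0xf350 reading from the chat. The helper name `to_signed16` is hypothetical, and the unit of the result is not known here: the chat itself says the mapping to mA still had to be worked out.

```python
def to_signed16(raw: int) -> int:
    """Interpret a 16-bit register value as a two's-complement signed integer."""
    raw &= 0xFFFF
    # If the sign bit (bit 15) is set, the value is negative.
    return raw - 0x10000 if raw & 0x8000 else raw


raw = 0xF350
# The expression from the chat yields the magnitude of the negative value.
magnitude = (raw ^ 0xFFFF) + 1
print(hex(magnitude), magnitude)  # 0xcb0 3248
print(to_signed16(raw))           # -3248
```

A negative sign would be consistent with a battery that is discharging, but that reading of the raw value is an assumption.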
00:18Lekensteyn: karolherbst: you killed your machine? :P
00:18karolherbst: gk104_fifo_intr strikes again
00:19karolherbst: GPU off: email@example.comV
00:20Lekensteyn: what is the relative difference in power consumption?
00:20karolherbst: GPU on@ 10W: firstname.lastname@example.orgV
00:21karolherbst: GPU email@example.comW: firstname.lastname@example.orgV
00:22karolherbst: ohh wait, the second value was at 14.84V
00:27karolherbst: it somehow fits though
00:29karolherbst: according to this, my keyboard LEDs drain 4W
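The current@voltage readings above turn into watts by multiplying the two. A rough sketch of doing the same from sysfs, assuming the battery driver exposes `current_now` and `voltage_now` in the power_supply class units (µA and µV); some batteries expose `power_now` instead, and the function name `battery_power_watts` is made up for illustration.

```python
from pathlib import Path


def battery_power_watts(supply_dir: str = "/sys/class/power_supply/BAT0") -> float:
    """Return the battery power in watts from current_now and voltage_now."""
    base = Path(supply_dir)
    # power_supply class attributes are reported in microamps and microvolts.
    microamps = int((base / "current_now").read_text().strip())
    microvolts = int((base / "voltage_now").read_text().strip())
    # Some drivers report discharge current as negative; use the magnitude.
    return abs(microamps) * microvolts / 1e12
```

Sampling this with the GPU off and then on, and subtracting, gives the kind of per-device estimate discussed above (for example 2 A at 15 V is 30 W).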
00:30Lekensteyn: I think I'll reduce the power consumption of my laptop and lights given the time ;)
00:30karolherbst: Lekensteyn: :D
00:31Lekensteyn: any final tests you want to do?
00:31karolherbst: Lekensteyn: I could add the code to the wmi module I am using
00:31Lekensteyn: to read battery status?
00:32karolherbst: it already reads out stuff from the EC
00:32karolherbst: like fan speed
00:32Lekensteyn: hm, but this info should normally be shown in sysfs (power_supply), let me check the source
00:33karolherbst: Lekensteyn: I am sure it simply uses the ACPI methods
00:33karolherbst: and the ACPI method returns 0xffffffff
00:33karolherbst: so what should it do?
00:34Lekensteyn: apply a hack like this: https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-B7130/BatteryFix.dsl https://github.com/Lekensteyn/acpi-stuff/tree/master/Clevo-B7130/clevo-battery-fix
00:34Lekensteyn: assuming that your _BST method has this check for negative values
00:34Lekensteyn: like this https://github.com/Lekensteyn/acpi-stuff/blob/master/Clevo-B7130/BatteryFix.dsl#L35
00:35Lekensteyn: apply it once, and then you can read through sysfs
00:35Lekensteyn: (don't simply load it, but adapt your own acpi method :))
00:36karolherbst: I do it through the wmi
00:36karolherbst: I already added hwmon stuff to it, so I can add this as well
00:37Lekensteyn: if you patch your BST method, then tools like powertop also report correct values
00:50karolherbst: Lekensteyn: https://gist.github.com/karolherbst/9ed0a6d52779fa35a759ddfa581ccf9c
00:55karolherbst: now the voltage
00:57karolherbst: +0x8 offset to BPR0
00:58karolherbst: also 16bit value
00:58karolherbst: but now with a different endianness
00:58karolherbst: because it makes sense
00:58Lekensteyn: is 0x32 for me
00:58Lekensteyn: that is
00:58karolherbst: 0x2a+8 ;)
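A small sketch of the offset and byte-order point above: two 16-bit fields in an EC dump, 8 bytes apart, read with different endianness. The dump contents are made-up bytes for illustration, `read_u16` is a hypothetical helper, and which field uses which byte order on the real hardware is an assumption.

```python
def read_u16(ec: bytes, offset: int, byteorder: str) -> int:
    """Read a 16-bit unsigned value from an EC dump with the given byte order."""
    return int.from_bytes(ec[offset:offset + 2], byteorder)


BASE_OFFSET = 0x2A                 # offset quoted in the chat ("0x2a+8")
VOLTAGE_OFFSET = BASE_OFFSET + 8   # 0x32, matching what Lekensteyn sees

# Fake 256-byte EC region with invented contents.
ec = bytearray(256)
ec[BASE_OFFSET:BASE_OFFSET + 2] = bytes([0x50, 0xF3])
ec[VOLTAGE_OFFSET:VOLTAGE_OFFSET + 2] = bytes([0x3A, 0x98])

print(hex(read_u16(ec, BASE_OFFSET, "little")))  # 0xf350
print(read_u16(ec, VOLTAGE_OFFSET, "big"))       # 15000
print(read_u16(ec, VOLTAGE_OFFSET, "little"))    # 38970, the wrong byte order
```

Picking the wrong byte order does not fail, it just produces a plausible-looking wrong number, which is why a lookup table of offsets and field layouts helps.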
00:59Lekensteyn: can you pass a copy of your dsdt? (or more specifically, your EC region)
01:03Lekensteyn: here with offsets: http://sprunge.us/OeLR (I made mistakes in the past with calculating offsets, so having a lookup table is helpful :))
01:04karolherbst: Lekensteyn: updated: https://gist.github.com/karolherbst/9ed0a6d52779fa35a759ddfa581ccf9c
01:04karolherbst: I think I should group those things better
01:07Lekensteyn: let me know if you made progress with finding the battery usage, I'll enter a different power state now!
01:07karolherbst: what do you mean?
01:07karolherbst: uhm wait
01:08karolherbst: no, should be fine
01:08karolherbst: hu? I thought this is the power usage already?
01:09Lekensteyn: what I meant is when you figure out why it consumes extra power
01:09karolherbst: first I really need to know
01:09karolherbst: now I work with precise values
01:09karolherbst: before that my assumption was based on calculated garbage
01:09karolherbst: I really like this now :) good thing we talked about it :p
01:10karolherbst: mhh odd
01:10karolherbst: the value can't be right though
01:11karolherbst: or not enough sampling
01:12karolherbst: my laptop draws 5W more than what the GPU reports... :(
17:00mark4: i think im having problems with nouveau. i cannot see any other possibility of it being anything else with the problems im seeing.
17:00mark4: firstly, every time I power on my start up has to be this
17:01mark4: 1: boot, 2: log in, 3: startx, 4: TAB OUT, 5: killall X, 6: startx. if i do not kill and restart X in that way NOTHING responds in X at all. ever
17:01mark4: cannot launch anything, dont get menus
17:01mark4: second issue:
17:02mark4: if i close the lid and the laptop suspends and i come back later again, the laptop does not "Seem" to respond but theres a difference to the above issue
17:02mark4: if i try to drag a existing window with the mouse it does not seem to move
17:02mark4: but if i suspend again and unsuspend it has moved
17:02mark4: i think things are moving but the frame buffer is no longer being updated. the ONLY fix in this case is a reboot
17:03mark4: cannot killall X and startx because the frame buffer still doesnt work
17:04imirkin: are you using plasmashell?
17:04mark4: no whatas plasmashell?
17:05mark4: kde plasma? fsck no that thing is garbage
17:05imirkin: some kde thing
17:05imirkin: what is "tab out" btw?
17:05mark4: ran plasma for about a year and every 2 weeks of using it plasma totally deletes 100% of its config and you have to totally reconfigure it from scratch
17:05mark4: control alt f2, log in, killall X
17:05mark4: alt f1
17:05imirkin: ah, switch vt
17:05imirkin: is the more common nomenclature.
17:06mark4: tab out is very common gamer speak lol
17:06mark4: i used to be a wow addict
17:06imirkin: usually it implies switching between applications in a window manager
17:07imirkin: switching to a different VT is a very different thing.
17:07mark4: yea i know lol
17:07mark4: but were nit picking here
17:07imirkin: when you've switched VT's, X is effectively suspended
17:07imirkin: no X application can make any progress
17:07mark4: i know this
17:08imirkin: so ... i'm not sure what you're complaining about
17:08mark4: you dont understand, i launch X and nothing responds at all. no launching of anything UNTIL i switch VT and KILL x and start it a second time
17:08mark4: so boot, log in, start x, ctrl alt f2, log in, killall x, exit, alt f1, startx. now everything works
17:10imirkin: i see
17:10imirkin: and what are you looking for?
17:11mark4: also, if i suspend my laptop, once i come out of suspend the frame buffer never gets any updates from any output to it. drag a window and it seems to stay where it is. but if i suspend again and come out of suspend the window WAS moved, its just the FB is never updated to show that move.
17:11mark4: what am i looking for? not having to KILLALL X every time i reboot the laptop?
17:11imirkin: yeah, that'd be nice
17:11mark4: i would like to do boot, log in, start x and be working
17:12mark4: is this a common issue with nouveau?
17:12imirkin: first time i've heard of such a thing
17:12mark4: ok because i was going to say, how do the devs think this would be acceptable to anyone lol
17:12mark4: if its not common its not their fault
17:12imirkin: i meant more like 'what are you looking for from me'
17:12imirkin: rather than in the more general sense
17:13mark4: well any help you can give now that you're familiar with the problem lol
17:13mark4: so far all we did was clarify what my issue was
17:13imirkin: pastebin dmesg and xorg log from that first "broken" X invocation would be a start
17:14mark4: there is NOTHING in dmesg or xorg.0.log
17:14imirkin: they're empty?
17:14mark4: i mean, nothing unusual
17:14mark4: no errors
17:15imirkin: all is well and yet nothing is working
17:15imirkin: without providing additional information, you're on your own
17:15mark4: until i kill X and start it again, then everything is working fine
17:15mark4: not sure what additional info i can supply lol
17:16mark4: Latest version available: 1.0.15
17:16mark4: 01:00.0 VGA compatible controller: NVIDIA Corporation GT218M [NVS 3100M] (rev a2)
17:16imirkin: you can supply the dmesg and xorg logs
17:18mark4: [ 90.917] (EE) Failed to load module "fbdev" (module does not exist, 0)
17:18mark4: [ 90.917] (II) LoadModule: "vesa"
21:41karolherbst: currently thinking about what we shall return through hwmon if the GPU is powered off
21:41karolherbst: always 0 or something like -EAGAIN?
21:42imirkin_: EAGAIN is probably better if that's well-handled by hwmon
21:43imirkin_: or ENODEV
21:45karolherbst: hwmon isn't the problem here
21:45karolherbst: it simply returns what the driver wants
21:45karolherbst: userspace is the issue
21:45karolherbst: I think there are monitoring applications which simply skip sensors returning an error
21:46karolherbst: but ksysguard seems to be broken for me right now, cause it doesn't give me the sensor list anymore
21:46imirkin_: ask the hwmon guys?
21:46imirkin_: and/or read the docs for the APIs?
21:48imirkin_: doesn't say anything useful
21:48imirkin_: check the libsensors source?
21:48karolherbst: probably the best idea
21:48karolherbst: at least "sensors" either returns the value or N/A
21:49imirkin_: that's a good start
21:49imirkin_: oh, also check what e.g. amdgpu does
21:49karolherbst: something broke
21:50karolherbst: power1_input returns -22 as the file content
21:51karolherbst: uhhh okay meh
21:59karolherbst: much better
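A rough sketch of why the "-22 as the file content" case above is a problem for userspace. It assumes a monitoring tool reads a hwmon attribute as a plain integer and treats a failed read as "not available"; how libsensors or ksysguard actually handle this is not verified here, and both function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_sensor(path: str) -> Optional[int]:
    """Return the integer in a sysfs attribute, or None if the read fails."""
    try:
        return int(Path(path).read_text().strip())
    except OSError:
        # A read that fails with an errno (e.g. EINVAL, ENODEV) lands here.
        return None


def format_reading(value: Optional[int]) -> str:
    """Render a reading the way a simple tool might: the value or N/A."""
    return "N/A" if value is None else str(value)
```

If the driver fails the read with an error, the tool can print N/A. If the driver instead writes "-22" into the file, the read succeeds and the tool has no way to tell the error code from a genuine reading of -22.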
22:00karolherbst: imirkin_: I guess I shall fix this the right way: https://github.com/karolherbst/nouveau/commit/a4860d3ae6e5deec5998ce62b047fa586be83b75.patch
22:01imirkin_: why is val becoming an err value?
22:01imirkin_: that sounds like the bug :)
22:01karolherbst: because we don't separate value and return code in nvkm
22:01imirkin_: so the thing consuming it needs to differentiate
22:02imirkin_: (and make the assumption that negative current/temperature are unlikely to happen)
22:02imirkin_: or separate the two out
22:02imirkin_: whichever way is fine, but the current thing is not :)
22:02karolherbst: well on a GPU it makes no sense to have negative values
22:02karolherbst: except the temperature sensors could report such
22:02karolherbst: I think I fix it the right way
22:03karolherbst: and split ret/value apart
22:03karolherbst: all the way
22:03imirkin_: and hopefully for ben too :)
22:03karolherbst: the current code is worse than what we had before, because the old hwmon didn't have this split either
22:04karolherbst: so we simply returned negative values and hwmon detected this as an error
22:04karolherbst: I think
22:04karolherbst: or maybe not?
22:04karolherbst: oh well, doesn't matter anyway
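An illustrative sketch of the two options discussed above: one return value that carries both readings and error codes, versus a separate status and value. This is not the nvkm or hwmon code (that is C); the function names are invented, and using ENODEV for a powered-off GPU is only one of the candidates mentioned in the chat.

```python
import errno
from typing import Tuple


def read_temp_overloaded(powered_on: bool, celsius: int) -> int:
    """Single return value: negative numbers double as error codes."""
    if not powered_on:
        return -errno.ENODEV
    return celsius


def read_temp_split(powered_on: bool, celsius: int) -> Tuple[int, int]:
    """Separate (status, value): status is 0 or a negative errno."""
    if not powered_on:
        return -errno.ENODEV, 0
    return 0, celsius


# With the overloaded form, a real reading equal to -ENODEV looks like an error.
cold = -errno.ENODEV
print(read_temp_overloaded(True, cold) == read_temp_overloaded(False, 0))  # True
# With the split form, the two cases stay distinguishable.
print(read_temp_split(True, cold) == read_temp_split(False, 0))            # False
```

The overloaded form only works if the consumer can assume negative values never occur, which holds for power and current but not necessarily for temperature.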
22:17karolherbst: it seems like the new preferred return type is long and not int
22:26karolherbst: and we may soon have to convert to m°C, but somehow I don't care enough about it to change that as well
22:45RSpliet: Guess we can't circumvent the negative value problem by converting hwmon to accept temperatures in kelvin? Getting a negative valid temperature then is about as likely as overflowing a 64-bit counter...
23:08karolherbst: imirkin_: I hope this is more to your liking? https://github.com/karolherbst/nouveau/commit/fbc648c728b091cff5be3ed79b9338182211a415
23:08karolherbst: RSpliet: it already accepts negative values
23:08imirkin_: seems fine
23:09karolherbst: RSpliet: which is the entire issue we are talking about already
23:09karolherbst: if it would interpret negative values as error codes, there wouldn't be any issues
23:26karolherbst: okay, now why do we return 0.6V if the GPU is powered off
23:27karolherbst: uhh, okay,that's why
23:27karolherbst: at least sensors does something reasonable: https://gist.githubusercontent.com/karolherbst/4dd67eeb320606e3205d84c429e5f75d/raw/1e190c32575f4bc31c75c6c5476d1d41e349261b/gistfile1.txt
23:29karolherbst: and ksysguard seems to be able to handle those sensors as well
23:29karolherbst: although it interprets an error code as 0, which is technically wrong
23:32tobijk: karolherbst: don't bother too much about ksysguard, with 4.13 it can't read out cpu clocks anymore :D
23:32karolherbst: I don't touch legacy software anyway
23:32karolherbst: and in kde anything <5 is legacy
23:33tobijk: ksysguard 5.10.5 that would be :>
23:33tobijk: i meant kernel 4.13 :>
23:33karolherbst: I see
23:33karolherbst: well it doesn't list the sensors for me
23:34tobijk: oh i actually have a nouveau sensor
23:34tobijk: never noticed
23:35tobijk: well, doesn't work
23:35karolherbst: we report quite a lot of stuff through hwmon