00:06karolherbst: huh.. new docs: https://download.nvidia.com/open-gpu-doc/Host-Fifo/volta/gv100/
00:36HdkR: Why the heck isn't this website just a github repo somewhere?
00:43karolherbst: beats me
00:44airlied: seems like something that should be asked :-)
00:45karolherbst: that power sensor stuff is annoying me
00:46karolherbst: even now, with the values read while running the nvidia driver
00:46karolherbst: nvidia-smi reports 21W, but when I calculate it, I get 28W
00:47gnarface: maybe aliens are stealing the extra 7W as a tax
00:47gnarface: like in that rick and morty episode
00:49karolherbst: probably :p
00:50karolherbst: the ina3221 doc also has some tables with offset data... but applying those doesn't get me spot on either :/
00:50karolherbst: ohh, actually I forgot to apply something
00:56karolherbst: but I doubt I actually have to adjust it myself
01:02karolherbst: on a closer look, those adjustments are in a <1% area
01:03m00n: Hello, just wondering if there is any support for SLI setups? If not, what happens to the secondary card, does it just sit there idle? I don't have a second monitor anymore, so I can't devote it to a screen.
01:05karolherbst: m00n: I guess the GPU just does nothing
01:06gnarface: m00n: in theory you could run cuda stuff on it or use it as a seat warmer or something
01:06gnarface: i note this from the featurematrix page "SLI or even multicard setups are very rare among developers. You should start hacking with us, if you have such a setup."
01:07gnarface: (but every status is N/A or TODO) https://nouveau.freedesktop.org/wiki/FeatureMatrix/
08:44karolherbst: mupuf: btw, did you ever run into the issue that you weren't able to recalculate the power consumption reported by the nvidia driver?
08:44karolherbst: was checking that yesterday and I wasn't able to get the same value as nvidia-smi reported
08:45karolherbst: 21W reported, 28W calculated from the values read out over i2c :/
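For reference, a minimal sketch of how a power figure is usually derived from INA3221 readings: the shunt-voltage register gives the drop across the shunt resistor, dividing by the resistance gives current, and multiplying by the bus voltage gives power per rail. The register scaling (40 µV per LSB for shunt voltage, 8 mV per LSB for bus voltage, 13-bit values left-justified in 16 bits) is taken from the INA3221 datasheet as I understand it; the shunt resistance is a per-board value that normally comes from the VBIOS rail table, and the helper names are hypothetical, not nouveau's.

```python
"""Sketch: INA3221 channel power from raw register words (hypothetical helpers)."""

# Assumed register scaling, per the INA3221 datasheet:
SHUNT_LSB_V = 40e-6   # shunt-voltage registers (0x01, 0x03, 0x05): 40 uV per LSB
BUS_LSB_V = 8e-3      # bus-voltage registers (0x02, 0x04, 0x06): 8 mV per LSB


def reg_to_signed13(raw: int) -> int:
    """Drop the 3 unused low bits and sign-extend the remaining 13-bit value."""
    value = (raw & 0xFFFF) >> 3
    if value & 0x1000:
        value -= 0x2000
    return value


def channel_power(shunt_raw: int, bus_raw: int, shunt_ohms: float) -> float:
    """Power in watts for one channel: P = V_bus * (V_shunt / R_shunt)."""
    v_shunt = reg_to_signed13(shunt_raw) * SHUNT_LSB_V
    v_bus = reg_to_signed13(bus_raw) * BUS_LSB_V
    return v_bus * (v_shunt / shunt_ohms)


def total_power(channels) -> float:
    """Sum over (shunt_raw, bus_raw, shunt_ohms) tuples, one per monitored rail."""
    return sum(channel_power(s, b, r) for s, b, r in channels)
```

The raw words must be in register order (MSB first). As a hypothetical example, a shunt word of 0x0FA0 (20 mV) across a 5 mOhm shunt on a bus word of 0x2EE0 (12 V) works out to 4 A and 48 W for that rail.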
08:46karolherbst: mupuf: but... it totally fits
08:47karolherbst: on high load: 83W vs 144W
08:47karolherbst: and 160W is what I got as the highest reported one under nouveau
08:47karolherbst: I am just wondering if you have any idea what this is about
08:47karolherbst: error seems bigger the higher the consumption
08:48mupuf: karolherbst: sounds like you have a different resistor than what you think
08:48mupuf: or the ADC is not linear, which would be weird
08:48karolherbst: the error isn't linear
08:49mupuf: karolherbst: can you try plotting it?
08:49karolherbst: yeah, I will
08:49mupuf: X/Y plot
08:49karolherbst: have to check when I have time for that today :) today is meeting day
08:49mupuf: AKA scatter plot
08:49karolherbst: yeah... currently I only have three stable values
08:49karolherbst: adaptive idle, high perf idle and full load :)
08:50karolherbst: I'll try to find more stable values I could check
08:51karolherbst: mupuf: btw, we also have the config value for the alarm register in the vbios :)
08:51karolherbst: most likely
08:53karolherbst: i2cget ... 0x0f w: 0x0360 vbios: rail.config + 0x3: 60 34... it's a bit off, but maybe they mask it
08:53karolherbst: or maybe it's something else...
08:54mupuf: karolherbst: don't try to get stable values
08:55mupuf: just run whatever load and keep on logging the expected vs nvidia's
08:55karolherbst: mhh... actually, I had some tool reading out the power consumption from nvidias driver...
08:55karolherbst: just have to find it
08:56karolherbst: ohh, doesn't matter on gm204 anyway... nvidia-smi might be able to report it in this non weirdo gui thing
09:00karolherbst: nvidia-smi --query-gpu=power.draw --format=csv | tail -n 1: 11.72 W
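mupuf's advice a bit further up was to keep logging nvidia's figure alongside the calculated one. A sketch of the nvidia side, assuming the same `power.draw` query field; `noheader` and `nounits` are standard `--format` modifiers, while the helper names are hypothetical:

```python
import subprocess


def parse_power_draw(output: str) -> float:
    """Return the power.draw value (watts) from the last line of nvidia-smi CSV output.

    Accepts both "11.72 W" (plain --format=csv) and "11.72" (with nounits).
    Raises ValueError if the driver printed a placeholder instead of a number.
    """
    last = output.strip().splitlines()[-1].strip()
    if last.endswith("W"):
        last = last[:-1].strip()
    return float(last)


def query_power_draw() -> float:
    """Ask nvidia-smi for the current board power draw of the last listed GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_power_draw(result.stdout)
```

Taking the last line mirrors the `tail -n 1` in the command above. Calling `query_power_draw()` in a loop next to the i2c-based calculation gives the two columns for a scatter plot.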
09:04karolherbst: i2cget is just stupid as it returns values in big endian :/
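The byte order issue has a simple cause: an SMBus read-word transfers the low byte first, while the INA3221 sends its registers MSB first, so a word read with `i2cget ... w` comes out byte-swapped (the kernel has `i2c_smbus_read_word_swapped()` for exactly this case). A minimal fix-up; `swap16` is a hypothetical helper:

```python
def swap16(word: int) -> int:
    """Swap the two bytes of a 16-bit value (SMBus low-byte-first <-> MSB-first register)."""
    word &= 0xFFFF
    return ((word & 0x00FF) << 8) | (word >> 8)
```

Applied to the 0x0360 read from register 0x0f earlier in the log, this gives 0x6003.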
09:14mupuf: why are you using i2cget?
09:14karolherbst: it works
09:14mupuf: but this is not what nvidia reads
09:14karolherbst: what does nvidia read?
09:15mupuf: use nvaspyi2c
09:15mupuf: this is spying the i2c bus to get the exact measurement nvidia got
09:15mupuf: it decodes the SDA/SCL lines
09:15mupuf: as the PMU polls on it
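mupuf describes nvaspyi2c as decoding the SDA/SCL lines while the PMU polls the sensor. The following is not nvaspyi2c's code, just a minimal sketch of that kind of decoding, assuming a list of sampled (SCL, SDA) logic levels: SDA falling while SCL is high marks a START, SDA rising while SCL is high marks a STOP, and data bits are sampled on each rising SCL edge, eight data bits plus one ACK bit per byte. `decode_i2c` and the test-waveform generator `encode_i2c` are hypothetical helpers.

```python
from typing import List, Tuple

Sample = Tuple[int, int]            # (SCL, SDA) logic levels, 0 or 1
Segment = List[Tuple[int, bool]]    # (byte, acked) pairs between two bus conditions


def decode_i2c(samples: List[Sample]) -> List[Segment]:
    """Decode sampled SCL/SDA levels into START-delimited segments of bytes."""
    segments: List[Segment] = []
    current = None                  # bytes of the segment being decoded, None when idle
    bits: List[int] = []
    prev_scl, prev_sda = samples[0]
    for scl, sda in samples[1:]:
        if prev_scl == 1 and scl == 1 and prev_sda != sda:
            # SDA changing while SCL is high is a bus condition, not data.
            if current is not None:
                segments.append(current)
            # Falling SDA = START (or repeated START), rising SDA = STOP.
            current = [] if sda == 0 else None
            bits = []
        elif prev_scl == 0 and scl == 1 and current is not None:
            # Data bits are sampled on the rising edge of SCL, MSB first.
            bits.append(sda)
            if len(bits) == 9:
                value = 0
                for bit in bits[:8]:
                    value = (value << 1) | bit
                current.append((value, bits[8] == 0))  # 9th bit: ACK when SDA is low
                bits = []
        prev_scl, prev_sda = scl, sda
    return segments


def encode_i2c(segments: List[Segment]) -> List[Sample]:
    """Synthesize an idealized waveform, e.g. for testing the decoder."""
    samples: List[Sample] = [(1, 1)]                    # idle bus: both lines high
    for segment in segments:
        samples += [(0, 1), (1, 1), (1, 0), (0, 0)]     # (repeated) START, then SCL low
        for value, acked in segment:
            for bit in [(value >> (7 - i)) & 1 for i in range(8)] + [0 if acked else 1]:
                samples += [(0, bit), (1, bit), (0, bit)]
    samples += [(0, 0), (1, 0), (1, 1)]                 # STOP
    return samples
```

If the capture is too coarse to see every SCL edge, bits get dropped or duplicated, which is one way a decoder like this ends up reporting errors or garbage reads.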
09:19karolherbst: mupuf: doesn't work here
09:20karolherbst: all reads are errors
09:21karolherbst: mupuf: and even if we did get other values that way, we'd be in a world of pain anyway, because we don't know _how_ nvidia got those values in the first place
09:22karolherbst: keep in mind that this is a gm204
09:25karolherbst: mhh, although a few values seem to get out
09:25karolherbst: just a pain to parse it
09:25karolherbst: anyway, the values are the same
09:28karolherbst: uff, scratch that, some values are obviously wrong as the reg -> data matching doesn't work reliably
09:29karolherbst: getting a shunt value in a bus reg and vice versa
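Register reads on the INA3221 are two transfers: a write that sets the register pointer, then a two-byte read (MSB first). A decoder therefore has to carry the last pointer per device forward to label each read. The sketch below, with a hypothetical `pair_register_reads` helper, shows that bookkeeping. If a pointer write is missed in the capture, the following read is silently attributed to the previous register; that is one plausible way to end up with a shunt value under a bus register, though the log doesn't confirm it's what happened here.

```python
from typing import Dict, List, Tuple


def pair_register_reads(segments: List[List[int]]) -> List[Tuple[int, int, int]]:
    """Match each 2-byte read to the register pointer last written to that device.

    segments: decoded I2C segments as raw bytes; first byte = (7-bit address << 1) | R/W.
    Returns (device_address, register, 16-bit value) tuples, value assembled MSB first.
    """
    pointer: Dict[int, int] = {}
    reads: List[Tuple[int, int, int]] = []
    for seg in segments:
        if not seg:
            continue
        address, is_read = seg[0] >> 1, seg[0] & 1
        if not is_read:
            if len(seg) >= 2:
                pointer[address] = seg[1]   # first data byte of a write sets the pointer
        elif len(seg) >= 3 and address in pointer:
            reads.append((address, pointer[address], (seg[1] << 8) | seg[2]))
    return reads
```

With a hypothetical device at 0x40, the sequence `[0x80, 0x01]`, `[0x81, 0x12, 0x34]`, `[0x80, 0x02]`, `[0x81, 0x56, 0x78]` yields register 1 = 0x1234 and register 2 = 0x5678; drop the second pointer write and both values land on register 1.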
10:00karolherbst: mupuf: ohh, actually, the error is an affine function, f(x) = a + bx...
10:01mupuf: karolherbst: you are probably probing the wrong bus, there are many buses ;)
10:02karolherbst: no, I was probing the right one
10:02karolherbst: but I got the same values as I got with i2cget, just i2cget is way more reliable
10:05karolherbst: mupuf: https://gist.github.com/karolherbst/d10fdd9d1c5a33230ab177e0347149d7
10:24karolherbst: x*0.55 + 4.35 is what gets me to nvidias values :/
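Instead of reading the correction off a few stable points, the affine relation can be fitted by ordinary least squares over any logged (calculated, reported) pairs, in line with mupuf's scatter-plot suggestion. A minimal sketch; `fit_affine` is a hypothetical helper, and the 0.55/4.35 coefficients above came from Karol's own data in the gist, which isn't reproduced here:

```python
from typing import Sequence, Tuple


def fit_affine(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a + b*x; returns (a, b).

    xs: power calculated from the sensor, ys: power reported by nvidia-smi.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

Fed only the two pairs quoted in this log, (28 W, 21 W) and (144 W, 83 W), it gives roughly b ≈ 0.53 and a ≈ 6.0, which shows how much the coefficients depend on which points go into the fit.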
10:28karolherbst: mupuf: soo, any ideas?
10:49karolherbst: only thing in the rails are those "60 34" bytes... although there are two more rail entries in the vbios with values...
10:50karolherbst: sadly we can't just fake the vbios on that gpu :/
10:51karolherbst: mhh.. maybe on my kepler we have the same issue, but nvidia-smi doesn't report the power consumption there
10:51karolherbst: should probably check on a maxwell1 at work
11:07mupuf: I don't see why there would be an offset
11:07mupuf: but yeah, this is likely described in the vbios somewhere
11:07mupuf: seems stupid though
11:16karolherbst: yeah... :/
11:17karolherbst: just to make sure, I tried the values from nvaspyi2c and calculated the power consumption, and I also got values like 140W and so on :/
11:17karolherbst: soo something is weird there
11:18karolherbst: I tried to see if we have something similar going on with kepler/maxwell1 in order to be able to RE the vbios
11:19karolherbst: mupuf: what bothers me the most is why the sensor itself returns values that are too high. I thought the INA3221 was supposed to be accurate enough not to do this
11:41mupuf: it's not what is worrying me
11:41mupuf: show me your code
11:41mupuf: and try with different values for the resistor
11:42mupuf: physically, it makes no sense to need an affine function to fix the output
11:45karolherbst: mupuf: 6 mOhm seems to work for low loads... but for high loads the power consumption is still too high
11:45karolherbst: something is just super odd here
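mupuf's resistor suggestion can be checked with a single ratio. Assuming the usual P = V_bus · V_shunt / R, a wrong shunt resistance only scales the result, so the calculated-to-true ratio should be the same at every load:

```latex
P_{\text{calc}} = \frac{V_{\text{bus}}\,V_{\text{shunt}}}{R_{\text{assumed}}}, \qquad
P_{\text{true}} = \frac{V_{\text{bus}}\,V_{\text{shunt}}}{R_{\text{true}}}
\quad\Longrightarrow\quad
\frac{P_{\text{calc}}}{P_{\text{true}}} = \frac{R_{\text{true}}}{R_{\text{assumed}}}
```

Taking nvidia's reported figures as P_true, the pairs in this log give 28/21 ≈ 1.33 at idle and 144/83 ≈ 1.73 under high load. A single resistor mismatch would make these equal, so by itself it can't explain both points, which fits the observation that 6 mOhm fixes low loads but not high ones.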
12:28karolherbst: mupuf: are you currently at home?
12:28mupuf: karolherbst: nope, sorry
12:28mupuf: and reator is not available anymore. It has a different drive in it now
12:28karolherbst: ahh, I see
12:28karolherbst: stupid maxwell2 :/
12:29karolherbst: there is no way to get the power consumption on kepler gpus with the nvidia driver, is there?
12:36mupuf: kepler 2 had it IIRC
12:36karolherbst: right.. but I have no kepler2 at home :p
12:37karolherbst: I already see it coming: I will go to the office today, check the GPUs there and none of those show this behaviour I have on my gm204 at home...
12:38karolherbst: but... I think the maxwell1 GPUs I have at the office don't have any power sensors anyway...
12:38mupuf: karolherbst: yeah, that's annoying
13:46karolherbst: mupuf: ... I have a maxwell1 here where the power consumption is reported... but it doesn't have a sensor
13:47karolherbst: I guess this would be fun if we want to RE how to calculate the power consumption on gpus without a sensor
13:48mupuf: oh, interesting
13:48mupuf: it likely sources it from the voltage regulator
13:48karolherbst: on idle it reports 1W
13:49karolherbst: but the budget is also only 30W :D
13:49HdkR: 1w total GPU usage while idle would be nice
13:50karolherbst: mupuf: btw, I've got myself an eGPU case for work in order to fix crashes/bugs with those cases, but it's super nice to just unplug/plug new GPUs without having to reboot :)
13:50mupuf: karolherbst: nice!
13:50karolherbst: modprobe -r nvidia; .. turn off case, switch GPU, turn on case; modprobe nvidia :)
13:50mupuf: great idea! Is it working well?
13:50karolherbst: nouveau crashes when you unplug/turn off the case :D
13:50karolherbst: but besides that, yeah
13:50karolherbst: rendering seems to work fine
13:50HdkR: Oh hey, I was wanting to try that
13:50karolherbst: the PCIe bus is just .... slow
13:50karolherbst: 4x 3.0...
13:51HdkR: Since I have an eGPU as well
13:51karolherbst: I've found a case for 240€ :)
13:52karolherbst: and it has a 400W PSU
13:52karolherbst: or 450?
13:52karolherbst: anyway, it's enough
13:52karolherbst: mhh... I ran out of gpus to check :/
13:53HdkR: Main issue I have is pushing a 4k display back over the pipe at 60hz isn't actually feasible
13:53karolherbst: not really, no
13:53HdkR: Caps out at ~47FPS when running games and things D:
13:54karolherbst: FHD is the max you should go
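A back-of-the-envelope check on the eGPU link: PCIe 3.0 runs 8 GT/s per lane with 128b/130b encoding, so x4 gives a ceiling of about 3.94 GB/s per direction, while copying uncompressed 32-bit 4K frames at 60 Hz needs about 1.99 GB/s. The sketch only does that arithmetic; it ignores packet and protocol overhead, the enclosure's own link limits, and the game's other traffic, all of which eat into the headroom, so treat it as an upper bound rather than an explanation of the ~47 FPS cap. The helper names are hypothetical.

```python
PCIE3_TRANSFERS_PER_S = 8e9     # PCIe 3.0: 8 GT/s per lane
PCIE3_ENCODING = 128 / 130      # 128b/130b line coding


def pcie3_bytes_per_s(lanes: int) -> float:
    """Per-direction data rate after line coding, before packet/protocol overhead."""
    return lanes * PCIE3_TRANSFERS_PER_S * PCIE3_ENCODING / 8


def frame_stream_bytes_per_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> int:
    """Bytes per second needed to copy uncompressed frames across the link."""
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    link = pcie3_bytes_per_s(4)
    frames = frame_stream_bytes_per_s(3840, 2160, 4, 60)
    print(f"x4 link: {link / 1e9:.2f} GB/s, 4K60 frames: {frames / 1e9:.2f} GB/s")
```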
13:55mupuf: karolherbst: this is great for testing though
13:55karolherbst: ohh actually, let me check my pascal, just in case
13:55mupuf: fewer chances of the host dying
13:55karolherbst: mupuf: why is that?
13:56karolherbst: afaik there is no additional protection
13:57karolherbst: you have your PCIe bus, you can do DMA, you can mess up
13:59HdkR: IOMMU doesn't save you from everything
14:00karolherbst: iommu is optional
14:00HdkR: Don't hurt me like that
14:01HdkR: iommu should never be optional :P
14:01HdkR: As much as I love PCIe devices being able to snoop any memory it wants to
14:02karolherbst: I really hate those 6+2 pin plugs...
14:03karolherbst: no power consumption reported with a GTX 780 either :/
14:04HdkR: Always start with the two pins first, makes life easier
14:05HdkR: Little plastic tab on the 6pin side for holding the two pins in place sucks :P
14:08karolherbst: mhhh.. actually... let me check something
15:12karolherbst: mupuf: ever run into the issue that the i2c adapter doesn't support the smbus commands?
15:13mupuf: this is indeed possible
15:13mupuf: I was reading some datasheet last month and they generally try to be compatible, but no guarantees
15:13karolherbst: maybe it's because of volta... *sigh*
15:18karolherbst: *sigh*... no GPU available at the office which would be in any way helpful for me....
15:18HdkR: What would be helpful?
15:18karolherbst: HdkR: gpu where nvidia-smi reports the power consumption _and_ where we are able to fake the vbios
15:19karolherbst: like a 780 ti I think?
15:19HdkR: Yea, that's a very small window
15:19karolherbst: maybe some quadro/tesla cards as well
15:19karolherbst: mupuf has a few of those :p
15:20karolherbst: HdkR: or you tell us what we do wrong while reading out the power sensor :p
15:21HdkR: lol, I'm just an SM tard. I don't know anything about that anyway :P
15:36karolherbst: HdkR: mhh, but I thought that a GTX 780 supported the power.draw option :/
15:37karolherbst: or maybe it never did
15:40HdkR: Not sure, never owned one :P
15:46karolherbst: heh.. something is odd
15:46karolherbst: I just can't use the i2c stuff on any device on that machine
15:48karolherbst: even nvidia seems to fail to use it
15:50HdkR: Seems like a pretty big issue
15:50karolherbst: heh... after loading nouveau it works
15:51karolherbst: ohhh... I am sure something super stupid is going on here
15:55karolherbst: HdkR: are you aware of the nvidia build stuff checking for certain i2c things?
16:00karolherbst: ohh, I have an idea
16:01HdkR: build stuff?
16:02HdkR: I don't know anything about any i2c communication anyway
16:12karolherbst: why is nvidia making it so hard for us :D
16:22Hijiri: X is freezing after a while when I'm playing a certain game in WINE, and dmesg brings up messages like this https://paste.debian.net/hidden/d15c9183/
16:22Hijiri: I see some stuff about Chrome_dThread, don't know what that is, but maybe firefox is crashing with the game?
16:24Hijiri: actually that might not be related, it doesn't show up before some other instances of the crash
16:24Hijiri: but maybe it is, I don't know
16:38karolherbst: Hijiri: uff, yes.. the hardware context essentially crashed and we just don't handle it all that well
16:38karolherbst: and then X just freezes for no reason at all
16:39barteks2x: Could nouveau cause the whole system to become very unstable and eventually cause a kernel panic from using opengl from multiple threads?
16:39karolherbst: Hijiri: MESA_DEBUG=flush might help, but might also reduce perf
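Setting the variable only for the affected program keeps the performance cost contained. In a shell that is just `MESA_DEBUG=flush <command>`; a small launcher sketch for scripted runs follows. The helper names are hypothetical, and the code overwrites any existing `MESA_DEBUG` value rather than merging flags.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def env_with_mesa_flush(base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy of the environment with MESA_DEBUG=flush set (overwrites any existing value)."""
    env = dict(os.environ if base is None else base)
    env["MESA_DEBUG"] = "flush"
    return env


def run_with_flush(argv: List[str]) -> int:
    """Run one command with MESA_DEBUG=flush in its environment; returns its exit code."""
    return subprocess.run(argv, env=env_with_mesa_flush()).returncode
```

For example, `run_with_flush(["wine", "game.exe"])`, where the executable name is a placeholder.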
16:39karolherbst: barteks2x: might happen
16:39karolherbst: if the GPU crashes, yes
16:39barteks2x: then it happened to me a ~ half an hour ago
16:40karolherbst: maybe, maybe not. could be anything
16:40barteks2x: the kernel panic has traces of nouveau in there
16:40karolherbst: I have some WIP patches to tackle that... it just needs more investigation though
16:40karolherbst: barteks2x: sure, but it could be any other issue inside nouveau as well ;)
16:41barteks2x: except I know what happened when starting MinecraftForge. Java became an unkillable, frozen process, and from there the system became extremely unstable
16:42karolherbst: uff, yeah, that sounds like multithreading issues... but sadly my branch doesn't seem to work all that much better with minecraft
16:42karolherbst: I might get back to it this week
16:43barteks2x: I think there is this one moment where it's uploading textures from one thread and doing rendering on another thread
16:44barteks2x: and good to see there is some progress on this
16:45barteks2x: it's been a while since anything was done on it (the last I remember was in 2016)
16:46karolherbst: anyway.. gotta go
17:13Hijiri: barteks2x: is this on the loading screen
17:13Hijiri: I worked around it by turning it off
17:13Hijiri: I think you can turn it off in the splash.properties config file
17:13barteks2x: yes, this is the loading screen. But I frequently switch the run directory (development environment) so I don't always have it disabled
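Since the run directory keeps changing, re-applying Hijiri's workaround can be scripted. A sketch, assuming Forge keeps the file at `config/splash.properties` inside the run directory and uses an `enabled=` key (as 1.12-era Forge did, as far as I know; check your version). The function names are hypothetical.

```python
from pathlib import Path


def set_splash_disabled(text: str) -> str:
    """Return splash.properties content with enabled=false (assumed key name)."""
    out = []
    found = False
    for line in text.splitlines():
        if line.strip().startswith("enabled="):
            out.append("enabled=false")
            found = True
        else:
            out.append(line)
    if not found:
        out.append("enabled=false")
    return "\n".join(out) + "\n"


def disable_forge_splash(run_dir: str) -> Path:
    """Rewrite (or create) <run_dir>/config/splash.properties with the splash disabled."""
    path = Path(run_dir) / "config" / "splash.properties"
    text = path.read_text() if path.exists() else ""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(set_splash_disabled(text))
    return path
```

Other keys in the file are left untouched; only the `enabled=` line is rewritten or appended.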
17:14barteks2x: and it only crashes sometimes, and this is the first time it took down the whole system with it
17:15karolherbst: mupuf: any idea when reator is ready to be used again?
17:15Hijiri: usually for me it would break X, but I'd be able to kill it and I would just have to start a new session
17:15Hijiri: hi karol, thanks for the info earlier
17:16Hijiri: Could turning off vsync affect the crash at all? I set my game to not use it because I thought it might fix some other issue (it didn't), and I don't think it broke things before that
17:16mupuf: karolherbst: only on demand
17:16barteks2x: for me it made the java process unkillable, mouse acceleration stopped working, keyboard key repeating on long press stopped working, DNS name resolution would hang the process and make it unkillable, and eventually the process manager froze, and then I got a kernel panic
17:16karolherbst: mupuf: that would be fine.. I just want to look into that sensor stuff
17:16mupuf:re-purposed it as a game console since it was not seeing much use
17:17mupuf: I'll plug the drive back tomorrow with the right gpu
17:17karolherbst: mupuf: do you know which ones are the good ones?
17:17mupuf: I'll check
17:17barteks2x: and vsync doesn't affect it
17:17karolherbst: ohh, I can check as well
17:18Hijiri: I meant my efz crash
17:18Hijiri: not minecraft
17:18karolherbst: mupuf: ahh, I know which one, the kepler titan :)
17:18mupuf: yeah, this one would have it
17:18mupuf: and the 690 also IIRC
17:19karolherbst: ohh, really?
17:19karolherbst: I wasn't sure about that one, but that would be one I'd check
17:19karolherbst: mupuf: but.. I was checking a 780 today and there was nothing... :/
17:20karolherbst: but I also think a 780 ti has it.... well
17:20karolherbst: the titan has it for sure
18:16Hijiri: well, I removed the no vsync environment variable, and efz hasn't crashed since
18:16Hijiri: I don't think it will help minecraft though, this is unrelated
22:09cosurgi: hm. does anybody know what the IRC channel for i915 xserver-xorg problems is?
22:09cosurgi:has some problems with the laptop detecting a connected projector.
23:34karolherbst: mupuf: please don't forget to prepare reator :)