00:10 chillfan: So far for my 940M (maxwell) I have found a way to cause a full lock up (without using nvboost). Regardless of clock setting used, I just needed to run xonotic from the phoronix test suite on the "ultra" setting (1920x1080). Kernel 4.10-rc2, Xubuntu 16.10
00:10 chillfan: sysrq doesn't work when it locks up
00:17 chillfan: with DRI_PRIME set to the nouveau card
01:07 chillfan: perhaps it's just xonotic though, still testing :)
01:11 imirkin: skeggsb: any advice on how to get more info out of the lockup for chillfan? how do you debug this stuff?
01:45 skeggsb: imirkin: usually ssh, or netconsole if that fails
01:51 imirkin: ... and he's gone
01:52 imirkin: skeggsb: you really just do everything with netconsole/ssh? impressive.
02:04 skeggsb: beyond error messages in dmesg, it's trial-and-error.. which, is less fun
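A minimal sketch of the netconsole route skeggsb mentions, for the case where the machine locks up too hard for ssh. The parameter layout follows the kernel's netconsole documentation; every address, port, interface name and MAC below is a made-up placeholder, and the helper name `netconsole_param` is hypothetical.

```shell
# Build the netconsole module parameter:
#   netconsole=<src-port>@<src-ip>/<dev>,<tgt-port>@<tgt-ip>/<tgt-mac>
# All values passed in below are placeholders for illustration only.
netconsole_param() {
    # args: src_port src_ip dev tgt_port tgt_ip tgt_mac
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Print the parameter for a hypothetical test box and log receiver.
netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55

# On the machine that locks up (as root), something like:
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)"
# On the receiving machine, listen for UDP on the target port, e.g. with
# netcat (option spelling differs between netcat variants):
#   nc -u -l 6666
```

Kernel messages emitted up to the moment of the lockup should then show up on the receiver, which helps when sysrq no longer responds.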
02:06 imirkin: skeggsb: btw, it seems like pre-fermi hdmi audio got broken at some point
02:06 imirkin: that point was long ago. somewhere in the 3.10 ... 4.4 range
02:06 imirkin: so that's 3 rewrites to pick from ;)
02:54 funfunctor: How do I use nvascan again exactly?
03:02 nyef: skeggsb: In nvkm/engine/disp/hdagt215.c, around line 62 (give or take), is an nvkm_wr32(device, 0x61c440 + soff, (i << 8) | args->v0.data[0]);. The [0] should be [i]. It's clearly a necessary change, but I have yet to determine if it is sufficient.
03:04 nyef: (Said change at least gets the ELD to come through correctly again.)
03:06 imirkin: nyef: send patches :)
03:07 nyef: Send where?
03:08 nyef: I _have_ patches.
03:08 nyef: That said, I still haven't heard any evidence that something else isn't wrong.
03:09 imirkin: nouveau@lists.freedesktop.org + dri-devel@lists.freedesktop.org
03:09 imirkin: subscribing first will prevent them from having to be manually approved
03:10 nyef: Are attachments okay, or should I find a non-gmail email client to use?
03:10 imirkin: you can use gmail to send them properly with git send-email
03:11 imirkin: nyef: this is what i use: https://hastebin.com/notuteyapa.pl
03:11 imirkin: (in .gitconfig)
03:11 nyef: And since this is a kernel thing, the nouveau list is more appropriate than the dri-devel list, right?
03:11 funfunctor: imirkin: nvascan 0xaddr 0xsize should be the correct invocation right?
03:12 imirkin: funfunctor: sounds right. note that this will perform writes to your device, so you have to use care.
03:12 nyef: Hrm! I should probably find a way to test that outside of a mailing-list context.
03:12 funfunctor: imirkin: yes it could brick the hw or smoke could come out I know! But thanks!
03:12 imirkin: nyef: dri-devel is hooked up to a patchwork instance, nouveau@ isn't. either or both lists are ok.
03:13 imirkin: nyef: yeah, just send it to yourself ;)
03:13 imirkin: the envelopesender thing is just coz i use a different email address than @gmail.com one
03:13 imirkin: normally that's not needed
03:13 imirkin: make sure to have a S-o-b line and all the usual kernel patch items
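A rough sketch of a Gmail setup for git send-email, along the lines imirkin describes. This is not the content of his paste, which is not reproduced here. The `sendemail.*` keys are standard git configuration; the address is a placeholder and the helper name `sendemail_section` is hypothetical.

```shell
# Print a [sendemail] section suitable for ~/.gitconfig.
# $1 is the Gmail address to authenticate as (placeholder below).
sendemail_section() {
    cat <<EOF
[sendemail]
	smtpserver = smtp.gmail.com
	smtpserverport = 587
	smtpencryption = tls
	smtpuser = $1
EOF
}

sendemail_section "you@gmail.com"

# Typical flow afterwards (shown as comments, nothing is sent here):
#   git commit -s                       # -s adds the Signed-off-by line
#   git format-patch -1                 # writes 0001-<subject>.patch
#   git send-email --to=you@gmail.com 0001-*.patch    # dry run to yourself
#   git send-email --to=nouveau@lists.freedesktop.org \
#       --cc=dri-devel@lists.freedesktop.org 0001-*.patch
```

Sending to your own address first, as suggested in the channel, is a cheap way to confirm the patch arrives inline and unmangled before it reaches the lists.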
03:14 funfunctor: imirkin: $ sudo ~/envytools/build/nva/nvascan 0xfea00000 0x10000
03:14 funfunctor: where BAR0 == 0xfea00000 and its size being 64K
03:14 funfunctor: It just says 'R' for everything?
03:14 imirkin: funfunctor: nvascan is only designed to work with nvidia stuff
03:14 imirkin: funfunctor: i think it always maps one of the BAR's
03:14 funfunctor: imirkin: I ported envytools to my device
03:14 imirkin: the offset would be relative to that bar, usually
03:14 imirkin: not absolute, since bar location isn't generally constant
03:15 funfunctor: imirkin: https://cgit.freedesktop.org/~funfunctor/envytools/commit/?h=decklink&id=04fb5b9d608338752413ff8fb849bf9042184928
03:15 funfunctor: ah could be the reason
03:15 funfunctor: imirkin: no the BAR is read from PCI conf space
03:16 imirkin: right, i know :)
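To illustrate the relative-versus-absolute point: the BAR base can be read from sysfs and subtracted from an absolute address to get the offset a BAR-relative tool expects. This is only a sketch; the helper names are hypothetical, and the sysfs path in the comment assumes the device sits at 0000:01:00.0 with BAR0 as the first line of its `resource` file.

```shell
# Size of a BAR from its start and end addresses (inclusive end),
# as listed per line in /sys/bus/pci/devices/<addr>/resource.
bar_size() {
    printf '0x%x\n' $(( $2 - $1 + 1 ))
}

# Offset of an absolute MMIO address relative to the BAR base.
# args: absolute_address bar_base
bar_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

# Example with the BAR0 base quoted in the channel (0xfea00000, 64K):
bar_size   0xfea00000 0xfea0ffff
bar_offset 0xfea00040 0xfea00000

# Reading the real values (first line of the file is BAR0):
#   read -r start end flags < /sys/bus/pci/devices/0000:01:00.0/resource
#   bar_size "$start" "$end"
```

With a 64K BAR0 at 0xfea00000, the scan range becomes offset 0 with size 0x10000 rather than the absolute address.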
03:16 nyef: Okay, patch sending isn't going to happen tonight, for various reasons.
03:17 funfunctor: imirkin: $ sudo ~/envytools/build/nva/nvascan 0 0x10000 just gives me a bunch of 0xfff's
03:17 imirkin: that means that all the bits are writable i guess
03:17 funfunctor: I think it caused the hardware to reset to its default state
03:17 imirkin: could be
03:18 funfunctor: well all the state that was there before got wiped out
03:18 imirkin: that's what nvascan does...
03:19 funfunctor: but I think it should restore the state after
03:19 funfunctor: [ 2399.677265] BlackmagicIO: WARNING: The device "DeckLink Duo 2" has firmware that is newer than the version shipped with the driver (device: 0xff driver: 0x14)
03:19 funfunctor: [ 2399.677285] BlackmagicIO: Enabled device "DeckLink Duo 2 (8)" x15/5 Gbps (ffffffff,510103) FW Date: ff-ff ff:ff
03:19 funfunctor: [ 2399.681376] BlackmagicIO: DeckLink Duo 2 (8) as blackmagic!io3 [0000:01:00.0]
03:19 funfunctor: oh dear
03:20 funfunctor: It may have wiped SPI flash
03:20 nyef: funfunctor: Weren't you planning to take a backup of that?
03:20 funfunctor: nyef: I did yes
03:21 nyef: Okay, so you're not completely sunk, then.
03:21 funfunctor: a hard reboot resolved it anyway
03:36 nyef: aplay -L lists two HDMI outputs, "hdmi:CARD=NVidia,DEV=0" and "hdmi:CARD=NVidia,DEV=1". Playing to them with -D results in silence. It's not the sound file, because it plays fine with no -D parameter, just over the built-in speaker rather than the TV.
03:37 imirkin: nyef: i think it's supposed to be like aplay -D hw:0,1
03:37 nyef: How do I know what numbers to use, then?
03:37 imirkin: oh wait. do aplay -l
03:38 imirkin: http://superuser.com/questions/53957/what-do-alsa-devices-like-hw0-0-mean-how-do-i-figure-out-which-to-use
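As a sketch of how the numbers come out of `aplay -l`: each hardware line carries a card index and a device index, and the ALSA name is `hw:<card>,<device>`. The helper name `alsa_hw_names` is hypothetical, and the sample line in the comment is illustrative rather than taken from nyef's machine.

```shell
# Turn `aplay -l` hardware lines into hw:<card>,<device> names.
# Expects lines shaped like:
#   card 0: NVidia [HDA NVidia], device 7: HDMI 1 [HDMI 1]
alsa_hw_names() {
    sed -n 's/^card \([0-9][0-9]*\):.*, device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Usage on a real system:
#   aplay -l | alsa_hw_names
#   aplay -D hw:0,7 test.wav
```

Lines that do not describe a card/device pair (headers, subdevice listings) are dropped by the `sed -n ... p` filter.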
03:40 nyef: Still no luck.
03:40 nyef: Are both devices supposed to b "HDMI 0 [HDMI 0]", or should one of them have a different number
03:40 nyef: ?
03:40 nyef: obviously can't type well tonight. /-:
03:44 imirkin: dunno, sorry
03:44 imirkin: i don't have hdmi
03:46 nyef: Hrm. There's some possible indication that one of them should be "HDMI 1", not "HDMI 0".
03:47 funfunctor: Can nvawatch consider a range instead of a specific address?
03:47 nyef: ... Which suggests that if I put a displayport->hdmi adaptor into the chain (thus using the mini-DP connector instead of the HDMI connector), it may work on both channels.
03:49 nyef: And... it just locked up.
03:51 nyef: Not even the magic sysrq seems to be working.
03:55 nyef: ... Okay, so, hw:0,7 corresponds with an HDMI cable plugged into the displayport interface.
03:55 nyef: ... And there are three ELDs, but only two HDMI devices?
03:56 imirkin: you're way beyond what i know about how this all works
03:57 nyef: Okay then.
04:04 funfunctor: I think perhaps a good addition to `nvapeekstat` is a flag that allows you to watch over the whole BAR and look for registers that change very very frequently?
04:05 funfunctor: to identify counters or whatever they are?
04:09 nyef: I guess the other question is, why do I have three ELDs but only two devices?
04:10 nyef: Or is one of them for the non-HDMI digital output?
04:18 nyef: ... Turns out that the "dynamic minors" option is important.
04:21 nyef: ... And turning off the audio infoframe doesn't seem to affect the sound output.
04:27 nyef: Ahh... There's the DP output, the HDMI-over-DP output, and the HDMI output. That'd explain the three ELDs.
04:28 nyef: And suggests something to try in terms of disambiguation.
07:24 funfunctor: mwk: Hi
07:24 funfunctor: I think i've found quite a number of things now but I need to discuss with someone
08:41 karolherbst: interesting, a fermi card without the speedo value
11:18 RSpliet: airlied: Could you chase whoever is necessary to get https://github.com/skeggsb/nouveau/commit/4cf443042eaaaf0520442d2580ac0e9a2d65828c upstream (and backported?) asap? It's a serious regression...
11:20 RSpliet: skeggsb: thanks for chasing that bug further
11:26 mwk: *sigh*
11:26 mwk: matching crazy hardware perfectly is never easy
11:27 mwk: has extracted all possible results of the rcp instruction and is now trying to figure out how to compute these
11:33 funfunctor: mwk: What interesting facts did you come up with about the number 0x40404040 ?
11:34 mwk: umm... none really?
11:35 funfunctor: dam ok
11:35 funfunctor: mwk: i've started to identify some registers now
11:36 funfunctor: mwk: I've noticed that most of the state seems to repeat itself over 0x200 block multiples?
11:36 funfunctor: is that usual?
11:37 funfunctor: mwk: like check this out https://paste.fedoraproject.org/523627/48396185/
11:41 funfunctor: it is this pattern I am trying to make sense of https://paste.fedoraproject.org/523628/14839620/
11:41 funfunctor: I don't understand the relevance of 0x404040..
11:49 RSpliet: funfunctor: I think I've seen that in low-end hardware before... could be just a matter of having a few "don't care" bits in the address translation
11:50 RSpliet: I mean, the "replicate over 0x200" thing
11:50 funfunctor: it seems like one of those 0x55AA tricks
11:50 funfunctor: oh?
11:51 airlied: RSpliet: chase skeggsb :-)
11:51 RSpliet: although 0x224 appears to differ from 0x20, so could just as well be a replicated resource
11:51 RSpliet: airlied: thanks, will try
11:51 RSpliet: he's a busy man though :-)
11:52 RSpliet: funfunctor: either way the 0x40404040 has too little context to make sense of it. What's interesting is that there's 10 of them...
11:52 RSpliet: two-and-a-half 32-bit registers
11:52 karolherbst: RSpliet: send email to linux-stable and CC ben and airlied
11:53 karolherbst: and the drm ML in CC
11:53 RSpliet: karolherbst: I think Ben's patches require manual labour before they can be merged into the kernel - some path adjustment I think
11:56 karolherbst: RSpliet: you have to adjust the path a little
11:56 karolherbst: it's no big deal
11:57 karolherbst: just prepend "drivers/gpu"
11:57 pmoreau: karolherbst: "drivers/gpu/" don’t miss the final "/" ;-)
11:58 karolherbst: right
11:58 RSpliet: karolherbst: of course it isn't, but it doesn't feel like my end decision to wrap it up and push it forward, no matter how trivial the patch seems.
11:58 karolherbst: I think skeggsb has a script for this, but not quite sure
12:26 RSpliet: funfunctor: I don't quite know what it even is you're REing, but for all I know it's "we have ten replicated resources, and bit 6 is the enable bit". Just looking at numbers hardly ever gets you enough information
12:26 Jeansf: funfunctor: all that is with LLVM is imo particularly rough code in the core, sorry to have been annoying you if you follow this, but tom stellard has managed it, i feel no joy to conflict with any of the LLVM folks
12:27 RSpliet: mmiotrace existing drivers to understand in what context these bits are touched, see what the mask is of these registers, have a specific goal you're searching for rather than trying to understand every bit
12:27 funfunctor: RSpliet: right that is what I am doing
12:28 RSpliet: good, sorry if I'm stating the obvious then :-)
12:28 funfunctor: RSpliet: the hw is a blackmagic decklink capture card
12:28 funfunctor: RSpliet: nar thats ok
12:32 funfunctor: RSpliet: I am looking at per 0x200 offsets from BAR0 since it looks to me like the state is replicated more or less https://paste.fedoraproject.org/523533/48393718/
12:32 funfunctor: with only a few exceptions
12:32 funfunctor: that state transition is interesting
12:33 RSpliet: It did look that way, but presumably a write in the 0x0-0x200 area is not reflected in the 0x200-0x400 area?
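One way to narrow this down before touching the hardware is to compare the first 0x200 block of a register dump against the next one and list only the offsets that differ. The sketch below assumes a hypothetical dump format of one `<offset> <value>` pair per line, with offsets written as `0x` plus three lowercase hex digits; the function name `diff_blocks` is made up for illustration.

```shell
# Compare registers in [0x000, 0x200) with their counterparts 0x200 higher.
# $1: dump file with lines like "0x020 067f300f" (assumed format).
diff_blocks() {
    dump=$1
    while read -r off val; do
        o=$(($off))
        # Only walk the first block; the second block is looked up below.
        [ "$o" -lt 512 ] || continue
        mirror=$(printf '0x%03x' $((o + 0x200)))
        # Force string comparison so hex-looking fields are never coerced.
        mval=$(awk -v m="$mirror" '($1 "") == (m "") { print $2 }' "$dump")
        [ "$val" = "$mval" ] || echo "$off: $val vs $mirror: ${mval:-missing}"
    done < "$dump"
}

# Usage:
#   diff_blocks regs.txt
```

This only shows where the two blocks disagree in a static dump. Telling address aliasing apart from a replicated resource still needs the write test RSpliet describes: write in the 0x0-0x200 area and see whether the 0x200-0x400 area changes.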
12:33 funfunctor: see line 9, i've identified that 067f300f is a register value and can toggle it from a method call to the vendor SDK
12:34 funfunctor: RSpliet: that's a good question I will try it tomorrow when I am next by the hw
13:32 pmoreau: stikonas: ping
15:25 nyef: Ah! Found out where I re-broke HDMI audio on my MCP89.
16:29 NanoSector: hmm, since kernel 4.10 I think my laptop uses a lot more power in idle
16:29 NanoSector: right now it's using 14W compared to ~6/7W on kernel 4.8
16:30 NanoSector: Nouveau also keeps suspending and resuming my GPU at seemingly random so I think that might be related?
16:32 pmoreau: Nouveau shouldn’t be resuming your GPU, unless told to.
16:33 NanoSector: pmoreau: dmesg shows nouveau resuming my GPU and then putting it to sleep a second or two later at random
16:33 NanoSector: I'm also getting ACPI errors on boot since kernel 4.10
16:33 karolherbst: there are various applications which do silly things
16:33 karolherbst: every lspci call will resume your gpu
16:33 NanoSector: :o
16:33 nyef: It... why?
16:34 karolherbst: because it reads the pci config space
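Since `lspci` wakes the device by reading its config space, checking the runtime PM state through sysfs is a gentler way to see whether the GPU is asleep. A small sketch follows; the function name is hypothetical, the PCI address in the comment is an assumption, and the `SYSFS_ROOT` override exists only so the function can be exercised without real hardware.

```shell
# Print the runtime PM state of a PCI device
# (e.g. "active" or "suspended") from sysfs.
# $1: PCI address such as 0000:01:00.0
gpu_runtime_status() {
    cat "${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/power/runtime_status"
}

# Usage on a real system:
#   gpu_runtime_status 0000:01:00.0
# Poll once a second to catch unexpected wakeups:
#   while sleep 1; do gpu_runtime_status 0000:01:00.0; done
```

Watching this while starting and closing applications is one way to spot which program triggers a resume.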
16:34 nyef: Or, better question, how?
16:34 NanoSector: well unless something in Plasma 5 is continuously forgetting my laptop's specs and needs to recheck I don't know what is
16:34 nyef: Eesh.
16:34 NanoSector: I'll upload my dmesg somewhere
16:34 NanoSector: well I'll wait a little actually
16:35 pmoreau: Have you tried reverting back to an older kernel while keeping the same userspace stack?
16:35 NanoSector: pmoreau: I think 4.8/4.9 was fine, I could check both tonight
16:36 NanoSector: I'm pretty sure 4.8 was fine since I was here diagnosing an issue where my GPU would never sleep with the new sleep method and it was eventually fixed causing powertop to report 7W again
16:36 pmoreau: But, since you tried 4.8, did you update Plasma 5 or some other tools?
16:36 NanoSector: yeah
16:37 NanoSector: I honestly don't know if it still happens on 4.8 so I'd need to recheck
16:37 pmoreau: If 4.9 was fine, it shouldn’t be too difficult (hopefully) to bisect.
16:37 NanoSector: I only remember this 'breaking' since 4.10
16:37 pmoreau: Though there was the atomic modesetting series…
16:37 NanoSector: hmm
16:37 NanoSector: launching QupZilla seems to trigger nouveau
16:38 NanoSector: yup that's it
16:38 pmoreau: :-D
16:38 NanoSector: I think QtWebEngine probes available acceleration methods
16:39 pmoreau: I would imagine
16:39 karolherbst: :(
16:39 karolherbst: why would they do that
16:39 NanoSector: still my laptop uses 14W even with QupZilla closed
16:40 pmoreau: Wait a bit for the card to go back to sleep, maybe?
16:40 NanoSector: I closed it more than a minute ago
16:40 NanoSector: I'll check if vgaswitcheroo reports it off
16:40 pmoreau: :-/
16:41 NanoSector: wait, vgaswitcheroo isn't in /sys/class
16:41 NanoSector: or in lsmod
16:41 pmoreau: /sys/kernel/debug
16:41 nyef: Time to check your /proc/config.gz to see if it's enabled at all?
16:41 pmoreau: cat /sys/kernel/debug/vgaswitcheroo/switch
16:41 pmoreau: (as root)
16:41 NanoSector: heh, it's there, thanks pmoreau
16:42 NanoSector: 1:DIS: :DynOff:0000:01:00.0
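For reference, the line above is colon-separated: an index, the GPU role (DIS or IGD), an active marker, the power state, and the PCI address. A sketch for pulling out just the power state, with a hypothetical helper name:

```shell
# Extract the power-state field (e.g. "DynOff", "DynPwr", "Pwr", "Off")
# from vgaswitcheroo switch lines read on stdin.
switch_state() {
    awk -F: '{ print $4 }'
}

# Usage (as root):
#   cat /sys/kernel/debug/vgaswitcheroo/switch | switch_state
```

"DynOff" here means the discrete GPU is under runtime power management and currently reported as powered down, which is why pmoreau expects the power draw to have dropped.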
16:42 pmoreau: Should be good then…
16:42 pmoreau: Still saying 14W?
16:42 NanoSector: yeah
16:42 pmoreau: :-(
16:42 karolherbst: it could be that something else is drawing more power or the gpu isn't turned off properly
16:42 NanoSector: dinner's up though, will play around tonight :)
16:43 pmoreau: Either l1k or Lekensteyn was looking at one case where the GPU was reported off, but was still consuming power
16:43 pmoreau: I *think* there was a bug report, let me check
16:45 pmoreau: :-D
16:46 pmoreau: The certificate for bugs.freedesktop.org expired today.
16:46 Lekensteyn: what?
16:46 pmoreau: Like 24 minutes ago
16:46 karolherbst: .....
16:46 karolherbst: I assume there is no new one? :D
16:46 karolherbst: of course not
16:47 karolherbst: ...
16:47 karolherbst: like this is the first time this happens
16:47 Lekensteyn: letsencrypt time?
16:48 Lekensteyn: NanoSector: if your gpu is waking at random times, it is time to patch libdrm
16:48 RSpliet: Hahaha, I wonder if NVIDIA intentionally tried to make a reference to the 1988 "The Titan Graphics Supercomputer architecture" ( http://ieeexplore.ieee.org/document/14344/?arnumber=14344 )
16:48 pmoreau: "This site uses HTTP Strict Transport Security (HSTS) to specify that Firefox may only connect to it securely. As a result, it is not possible to add an exception for this certificate." …
16:49 karolherbst: pmoreau: we have a phrase for that in Germany: "einmal mit profis arbeiten", which literally means: working with professionals
16:50 karolherbst: maybe you could add a "once" to that
16:50 Lekensteyn: oh they are already using LE, their cron is probably broken
16:50 pmoreau: I didn’t know "profis"
16:50 pmoreau: Lekensteyn: LE?
16:51 Lekensteyn: Let's Encrypt
16:51 Lekensteyn: I reported to #freedesktop now
16:52 pmoreau: Thanks
16:53 RSpliet: Who's coming to SHA2017?
16:54 pmoreau: Most likely not :-/
16:54 RSpliet: I'm unsure right now tbh, need to see where other activities go
16:55 karolherbst: RSpliet: I am most likely
16:55 RSpliet: but if I do, it'd be nice to organise a hackathon there ;-)
16:55 karolherbst: we should create a nouveau tent
16:55 karolherbst: any villages we could add ourselves to?
16:56 RSpliet: I'm likely to stay in the hypothetical "Hack In The Random 2600NL Data Box" village... who last year decided to sit inside the family corner for reasons of peace and quiet
16:57 RSpliet: but we can contemplate something nicer... if I go that is
17:24 nyef: Got both 3D output and working audio. Progress!
17:28 nyef: ... Working volume controls on the audio would be nice, but I suppose that's what pulseaudio is for.
17:31 Jeansf: nyef: yeah it ticks on top of some other api though, dunno why it's being used as default but, yeah i have HDMI controls in alsamixer and pulseaudio gui also has them
17:47 NanoSector: no it's definitely something from 4.10 as 4.9 has the 'normal' power usage of ~7W
17:47 NanoSector: <Lekensteyn> NanoSector: if your gpu is waking at random times, it is time to patch libdrm
17:47 NanoSector: missed that, sorry
17:48 NanoSector: it's not random but it's when i launch qupzilla, i just found
17:49 NanoSector: pmoreau: so 4.9 and 4.8 work fine, it's just 4.10
18:40 imirkin: nyef: hdmi audio doesn't have volume controls. you're supposed to have a volume control on whatever the output source is.
18:40 imirkin: nyef: separately, perhaps CEC is achievable on nvidia, in which case you can control your tv's volume
18:40 imirkin: [assuming your tv talks CEC as well]
18:40 imirkin: i'm not aware of anyone looking into this for nvidia hw.
18:42 imirkin: however recently linux has gained support for a CEC framework + hw support on some SoC
18:45 nyef: imirkin: Okay, that makes a certain amount of sense. And pa should be able to pre-scale the data before it hits the HDMI channel.
18:47 imirkin: nyef: alsa can too. pulseaudio has ~0 actual uses
18:47 imirkin: unfortunately someone decided to ship it by default in popular distros, so users end up having to deal with it
18:47 nyef: alsamixer doesn't seem to have any controls that affect the sound level other than a simple mute.
18:49 imirkin: that's right
18:49 imirkin: you have to set up a software thing in front of it
18:50 imirkin: nyef: http://alsa.opensrc.org/How_to_use_softvol_to_control_the_master_volume
18:50 imirkin: look at the softvol thing
18:50 nyef: Fair enough. I think that my next step is to try and get some patches ready for submission or at least review.
18:50 pmoreau: NanoSector: Well… I guess you now have the great privilege of bisecting the issue! :-D
18:50 imirkin: and/or dmix
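A minimal sketch of the softvol idea: define a PCM that scales samples in software and forwards them to the HDMI device. The `softvol` plugin and its `slave.pcm`/`control` fields are standard ALSA configuration; the PCM name, control name, helper name, and the card/device numbers are assumptions chosen for illustration.

```shell
# Print an asoundrc fragment defining a software volume control
# in front of hw:<card>,<device>.
# args: card device
softvol_asoundrc() {
    card=$1
    device=$2
    cat <<EOF
pcm.hdmi_softvol {
    type softvol
    slave.pcm "hw:${card},${device}"
    control {
        name "HDMI Softvol"
        card ${card}
    }
}
EOF
}

# Example for an HDMI output at hw:0,7 (device number is an assumption):
softvol_asoundrc 0 7

# To try it: append the output to ~/.asoundrc, then
#   aplay -D hdmi_softvol test.wav
```

The new control normally appears in alsamixer only after the PCM has been opened once.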
18:51 pmoreau: NanoSector: If you feel like it, of course :-)
18:51 NanoSector: pmoreau: but i iz noob :P
18:51 pmoreau: :-D
18:52 imirkin: NanoSector: time to lose the training wheels
18:52 NanoSector: noooo D:
18:54 NanoSector: pmoreau: honestly though I'm willing to provide debug data and stuff, but I don't know how to bisect issues (except for endlessly recompiling different revisions of the 4.10 kernel)
18:54 imirkin: nyef: nevermind - according to aplattner, desktop gpu's don't have the hw to support CEC: https://devtalk.nvidia.com/default/topic/642712/hdmi-cec-support-/
18:54 imirkin: NanoSector: search for 'git bisect' on the google machine
18:55 NanoSector: oh that's in git, derp
18:55 NanoSector: should've known
18:55 NanoSector: thanks
18:55 pmoreau: NanoSector: Bisecting would require recompiling different versions of the kernel, except if you do the bisection on the out-of-tree Nouveau repo, in which case fewer kernel compilations should be needed
18:56 NanoSector: which repo do i take? skeggsb's?
18:56 pmoreau: Yes
18:57 NanoSector: aight will see if i can get anywhere
18:57 pmoreau: Thanks! It should make it easier to find out what broke it.
18:57 RSpliet: NanoSector: the only special thing about bisect is that it cuts the search space in half for every iteration (by taking the middle git revision and ditching the irrelevant half on your feedback). So instead of compiling n kernels, you'll compile log2(n) kernels. Adds up if you don't know where to search for it :-)
18:58 NanoSector: ouch
18:59 RSpliet: (eg. if you think it's one of a thousand commits on nouveau, you'll have 10 compiles at most. Make sure to point git bisect to the nouveau dir to have it make better decisions on how to reduce the search space, you don't want it to pick irrelevant points :-) )
19:02 NanoSector: clones
19:07 NanoSector: RSpliet: ahh, thanks
19:07 RSpliet: (and again: definitely limit search to drivers/gpu/drm/nouveau only - otherwise you'd be searching through way more commits than necessary :-) )
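To make the arithmetic concrete, here is a small sketch that counts how many halvings it takes to get from n candidate commits down to one, followed by the usual bisect commands as comments. The function name is hypothetical, the count is an estimate of what git will actually ask for, and the tags in the comments are assumptions based on "4.9 good, 4.10 bad".

```shell
# Number of halving steps needed to narrow n candidates down to one,
# i.e. ceil(log2(n)) computed with integer arithmetic.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$((steps + 1))
    done
    echo "$steps"
}

bisect_steps 1000

# Typical session in a kernel tree (path limit is optional and only
# sensible if the regression is known to live in that directory):
#   git bisect start -- drivers/gpu/drm/nouveau
#   git bisect bad  v4.10-rc2     # assumed first known-bad tag
#   git bisect good v4.9          # last known-good tag
#   ... build, boot, test, then mark each step:
#   git bisect good    # or: git bisect bad
#   git bisect reset   # when finished
```

For a thousand candidate commits this gives ten build-and-test rounds, matching the figure quoted above.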
19:08 imirkin: NanoSector: is there anything to suggest it's a nouveau issue?
19:08 NanoSector: oh I'm looking at https://github.com/skeggsb/nouveau
19:08 NanoSector: imirkin: when my GPU is turned on under 4.9 I'm observing the same power usage
19:09 imirkin: ok, so you're sure it's nothing in, e.g., the pci subsystem
19:09 imirkin: or the acpi subsystem
19:09 imirkin: or who knows where
19:09 NanoSector: i have no clue
19:09 NanoSector: :P
19:09 imirkin: which is my point - unless you're damn sure it's inside nouveau, don't restrict it to just nouveau
19:09 NanoSector: 4.9 and up started spitting ACPI errors related to the GPU but 4.9 doesn't seem like it's affected
19:10 NanoSector: actually 1 is related to Intel GFX and one to NVIDIA
19:11 NanoSector: [ 1.209840] ACPI Error: Method parse/execution failed [\_SB.PCI0.RP05.PEGP.DD02._BCL] (Node ffff8802568e3690), AE_NOT_FOUND (20160831/psparse-543)
19:11 nyef: ... Looks like a backlight control problem?
19:11 NanoSector: but that works fine :x
19:11 NanoSector: [ 1.209824] ACPI Error: [\_SB_.PCI0.GFX0.DD02._BCL] Namespace lookup failure, AE_NOT_FOUND (20160831/psargs-359)
19:12 NanoSector: other one which is for Intel I think
19:13 nyef: Brightness Control Levels or something like that.
19:13 NanoSector: but anyway that's pretty much the reason I'm suspecting Nouveau, even more since it had more issues with powering down my GPU in the past
19:13 nyef: Part of the ACPI backlight driver.
19:14 nyef: was digging through some of this stuff for his own system two weeks ago.
19:14 NanoSector: is digging in the dark
20:06 airlied: imirkin: fyi msg chanserv quiet #nouveau Jeansf or stuff like that
20:06 imirkin: airlied: he's got other nicks
20:06 airlied: wow he went as far as using tor to get in
20:06 airlied: yeah quieting lets him think he is talking to channel for a while before he notices
20:06 NanoSector: imirkin: you just banned everyone with tor and sasl, I guess
20:06 imirkin: NanoSector: indeed i did.
20:07 NanoSector: intended? :P
20:07 imirkin: indeed it was.
20:07 airlied: seems like a fine plan, I doubt we have many ppl who use it
20:07 NanoSector: imirkin: indeed you did fine