00:02 Arbition: Looks like it causes journalctl to fail. I'm going to have to go in and get it in the middle of it. It doesn't lock up everything, so I might be able to get it out
00:04 Lekensteyn: try just dmesg
00:07 Arbition: Can't start new processes. Hopefully the file has been committed to the filesystem
00:07 Arbition: wait no, it's fine
00:08 Arbition: https://dpaste.de/DiPv
00:09 Arbition: paste has a 1 week expiry, copy if you need to keep it for reference
00:09 imirkin_: Arbition: do you have mutex debugging enabled? specifically ww mutex debugging?
00:10 Arbition: no idea
00:10 imirkin_: coz if you do, bad things will happen
00:10 Arbition: well it isn't a debug kernel, so it is whatever was distributed
00:10 imirkin_: uhhhh
00:10 imirkin_: it's a debug kernel
00:10 imirkin_: it has lockdep enabled
00:11 Arbition: I mean, they have a specific debug kernel package
00:11 Lekensteyn: is ww mutex debug bad? I think I have that enabled in my normal debug image
00:11 imirkin_: Lekensteyn: iirc it makes mutex acquires randomly fail
00:11 Lekensteyn: also get this lockdep warning, but it has been reported before with no progress
00:11 imirkin_: Lekensteyn: it's designed to debug the ww mutex logic itself
00:11 Arbition: though it is rawhide, so they might enable some debug stuff, and the debug kernel has more
00:11 Lekensteyn: ah ok, I don't have that random test
00:12 Arbition: I'll try it with a release distro. Might be a couple of days before I get that set up
00:12 Arbition: How do I check if that particular debug thing is enabled though?
00:12 Lekensteyn: anyway, can you wait for like 60 or 120 seconds and see if the lockup detector kicks in and prints a larger trace?
00:13 Arbition: oh right, yes more stuff has been printed
00:13 Lekensteyn: zgrep CONFIG_DEBUG_WW_MUTEX_SLOWPATH /proc/config.gz
00:13 Lekensteyn: should be not set
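The zgrep check above can also be sketched in Python. This is a minimal illustration, not part of the conversation: the helper names `config_state` and `read_running_config` are made up here, and it assumes the running kernel exposes `/proc/config.gz` (which needs CONFIG_IKCONFIG_PROC; some distros only ship a config file under /boot instead).

```python
import gzip
import re


def config_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    'not set' if it is explicitly disabled, or None if it is absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "not set"
        match = re.fullmatch(rf"{re.escape(option)}=(.*)", line)
        if match:
            return match.group(1)
    return None


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (assumes the file exists)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()
```

Calling `config_state(read_running_config(), "CONFIG_DEBUG_WW_MUTEX_SLOWPATH")` should give `"not set"` on a kernel where the slowpath debugging is disabled.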
00:14 imirkin_: arch/s390/configs/default_defconfig:CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
00:14 imirkin_: oh my
00:14 Lekensteyn: :p
00:14 Arbition: I was about to, but then the system fully locked up
00:14 imirkin_: mlankhorst: --^
00:14 imirkin_: a few defconfigs appear to have WW_MUTEX_SLOWPATH=y
00:15 Lekensteyn: DEBUG_WW_MUTEX_SLOWPATH "If you are a driver writer, enable it. If you are a distro, do not"
00:16 Lekensteyn: Arbition: hm, hopefully you could still sync the logs somewhere, then try sysrq-reisub
00:16 Arbition: I foolishly output it to a tmpfs
00:17 imirkin_: Arbition: ssh in and run 'dmesg -w'
00:17 Arbition: SSH was not running
00:17 Lekensteyn: hm, isn't this also suspicious? BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
00:17 Lekensteyn: and why is acpi_call being loaded?
00:18 Arbition: Oh, that'd be so I can monitor my battery
00:19 Lekensteyn: uhm, you cannot use /sys/class/power_supply/... or upower for that?
00:19 Arbition: It is specifically for tpacpi bat
00:19 Arbition: So I can set charge state thresholds
00:21 Lekensteyn: ok, is this lockup issue a regression btw?
00:21 Arbition: No it has never worked properly
00:21 Arbition: Think I first tried on kernel 4.4 or so
00:22 Arbition: Then again, I've been using rawhide for quite some time. I'll have to find time to use a release version
00:25 Lekensteyn: the resume failure does not always seem to happen (in the dmesg you provided)
00:25 Arbition: # CONFIG_DEBUG_WW_MUTEX_SLOWPATH is not set
00:25 Arbition: but CONFIG_DEBUG_MUTEXES=y
00:25 imirkin_: that's fine
00:26 imirkin_: it was that WW_MUTEX_SLOWPATH one which causes badness
00:26 imirkin_: and was enabled in rawhide for a while iirc
00:26 Arbition: I'm wondering if it inherits from CONFIG_DEBUG_MUTEXES for being not set
00:27 Arbition: Lekensteyn: I'll try and get a later debug this time
00:28 Arbition: err dmesg
00:30 Arbition: I enabled ssh this time, but can't log in, despite being able to use the graphical terminal at the moment
00:30 Arbition: ok 2 minutes have elapsed, I'll get this one up
00:35 Arbition: This dmesg doesn't have all the same kernel panic stuff, but it sure did lock up the same
00:35 Arbition: https://dpaste.de/FEqS
00:36 Arbition: wait no it happened much earlier
00:36 Arbition: which means I didn't initiate it I think
00:36 Arbition: I didn't get from boot to running DRI_PRIME=1 glxinfo in 15 seconds
00:37 Arbition: oh that's for btrfs anyway
00:37 Arbition: think I ran it at around 400 seconds
00:38 Arbition: Oh this might be why [ 489.726683] DMA-API: debugging out of memory - disabling
00:39 Arbition: or not?
00:39 Lekensteyn: that message was also visible in your previous dmesg, not sure if it is related to the issue or not
00:41 Lekensteyn: has the PCI device actually resumed or not? You can check that with lspci -H1 -s1:
00:41 Lekensteyn: if it shows nothing, then the PCI device is still off, otherwise it is on
00:42 Arbition: I'll trigger it again
00:42 Arbition: had to force reboot again
00:42 Arbition: that part is consistent at least
00:42 Lekensteyn: oh hm, isn't the problem happening at suspend rather than resume?
00:42 Arbition: No
00:42 Arbition: no suspend resume at all
00:42 Arbition: just running DRI_PRIME=1 glxinfo
00:43 Arbition: suspend resume usually works
00:43 Lekensteyn: after "waiting for kernel channels to go idle..." it waits 15 seconds before showing "failed to idle channel 1 [DRM]"
00:43 Lekensteyn: maybe some runtime pm ref is not being taken which results in the GPU falling asleep while it should not?
00:43 Arbition: I wouldn't know
00:44 Lekensteyn: can you reproduce the issue while you have while lspci -s1:;do sleep 4;done running?
00:45 Arbition: lspci keeps printing
00:45 Arbition: but glxinfo has not returned
00:46 Arbition: no sign of failed to idle channel 1 [DRM] yet
00:46 Lekensteyn: glxinfo not returning, is that a symptom of the issue?
00:46 Arbition: yes
00:47 Arbition: oh wait
00:47 Arbition: it just completed
00:47 Arbition: first time it has
00:47 Arbition: got some good dmesg output though
00:48 Arbition: as that happened
00:50 Arbition: shortly after that happened, my network activity failed, then the desktop stopped responding but the mouse cursor still moved
00:51 Arbition: The previous time the mouse froze too
00:54 Arbition: yeah this time that kick timeout happened [ 920.108753] nouveau 0000:01:00.0: fifo: channel 4 [glxinfo[3406]] kick timeout
00:54 Arbition: https://dpaste.de/v3C2
00:58 Arbition: 'gpfifogk104.c' mine is gk208, any potential issue there?
00:59 Lekensteyn: some code from older cards still applies to newer ones, would not worry about that number
00:59 Arbition: yeah thought that might be the case, though I'd flag it anyway just in case
00:59 Lekensteyn: you are also sure that it cannot be reproduced with runpm=0?
01:00 Arbition: yes, runpm=0 seems to work fine (ok not reclocking), glxinfo returns immediately, and I was actually running a game on it (i know it was using the GPU rather than the IGP because the processor package temperature was lower)
01:01 Arbition: I mentioned that because attempting to reclock causes a similar issue, but I haven't investigated that much
01:01 Lekensteyn: can you test another scenario: blacklist nouveau (such that it is not loaded at boot and you have time to enter some other commands)
01:02 Arbition: ok, is there a link to some instructions for that?
01:02 Lekensteyn: then: keep the while lspci loop and modprobe nouveau (without runpm=0)
01:02 Lekensteyn: finally try your glxinfo thing
01:03 Lekensteyn: if that still works (even with repeated trials), abort the while loop and wait for five seconds
01:03 Lekensteyn: then after it has runtime suspended, restart the while loop and try to reproduce again
01:03 Arbition: so modprobe.blacklist=nouveau on the kernel parameters will achieve this?
01:03 Lekensteyn: should be sufficient
01:03 Arbition: ok
01:08 Arbition: Oops that booted me to the desktop. Guess I'll have to run it in tmux or something
01:08 Arbition: To the login screen I mean
01:15 Arbition: Ok that is probably inconclusive
01:16 Arbition: So running it in tmux "unable to open display" because I am force logged out
01:17 Arbition: then I log back in, resume the terminal, and a single instance of glxinfo runs and blocks
01:17 Arbition: I think I'm going to have to start in text mode, and start x manually in some manner to bypass any watchdog which may be running on the display manager (gbm)
01:18 Arbition: gdm
01:18 Lekensteyn: I'll probably have some sleep, it's over 2am here
01:22 Arbition: Well, it isn't gdm, x just dies
01:24 Arbition: But yeah, it isn't executing more than once when modprobing it after boot
16:08 NanoSector: karolherbst, if you're around, I just ran what you said with the nouveau.debug=pmu=trace thing but it didn't add anything to my dmesg when running over ssh
16:09 NanoSector: i'll upload the dmesg
16:11 NanoSector: karolherbst, http://hastebin.com/uwuterixox.vbs
16:19 karolherbst: NanoSector: did you add it to your grub or as an option to nouveau?
16:19 NanoSector: at grub (or systemd-boot in my casE)
16:19 NanoSector: *case
16:20 NanoSector: like, it froze when it did that but nothing new appeared in my dmesg
17:28 karolherbst: NanoSector: mhh odd
17:28 karolherbst: NanoSector: try nouveau.debug=debug then
19:33 NanoSector: hm if i reclock with that it doesn't hang my system
19:33 NanoSector: at least not this time
19:35 imirkin_: NanoSector: you have to reclock while your gpu isn't suspended btw
19:36 NanoSector: i said nothing
19:36 NanoSector: :x
20:18 NanoSector: karolherbst, i have logs with nouveau.debug=debug
20:20 NanoSector: if i could just pastebin them
20:21 karolherbst: too big?
20:21 NanoSector: nah, phone keeps locking up
20:22 NanoSector: had to use phone for ssh access
20:22 NanoSector: "Something went wrong" I noticed :x
20:22 karolherbst: :(
20:23 NanoSector: bah i'll just throw the text file in google drive
20:24 NanoSector: karolherbst, https://drive.google.com/open?id=0B2F3P3d66S-1LWZnOFFMaG5Yd0U
20:24 NanoSector: sorry if there's duplicate dmesg's in there, I just hit the save transcript button in juicessh
20:25 karolherbst: okay, that seems helpful
20:25 karolherbst: okay, the pmu dies alright
20:25 karolherbst: NanoSector: apply this patch to your kernel/nouveau : https://github.com/karolherbst/nouveau/commit/d5998e334dc3bbb835990894712234ed029762e4
20:25 karolherbst: then it won't die at least
20:26 NanoSector: then I just replace the nouveau.ko.gz with the compiled nouveau.ko right?
20:26 karolherbst: uhhhh
20:26 karolherbst: wait a second
20:27 karolherbst: NanoSector: you can't change the clocks when your GPU is turned off
20:28 NanoSector: I asked that yesterday and you told me I didn't have to have anything running :x
20:28 karolherbst: mhhh yeah, that is my mistake then
20:28 karolherbst: nouveau.runpm=0
20:28 NanoSector: hehe no problem
20:29 karolherbst: I always have runpm=0 set, so I sometimes forget about it
20:29 NanoSector: aight added
20:29 NanoSector: oh, why's that?
20:29 karolherbst: it disables turning off the card
20:29 NanoSector: yeah but you want to always have it on, or do you use it all the time?
20:29 karolherbst: I use bbswitch manually for it
20:29 NanoSector: ohh i see
20:30 karolherbst: I need to turn off the gpu anyway when nouveau isn't loaded
20:30 karolherbst: so
20:30 karolherbst: to be able to switch to nvidia
20:30 NanoSector: well i'm rebooting with that parameter set as well, will fire another dmesg at you
20:30 NanoSector: ah
20:34 NanoSector: well, loads of FAN update requests
20:34 NanoSector: so that works :x
20:34 meridion: I'm currently running Gentoo with Linux 4.8.1 on Nyan Big Tegra K1, with Xorg 1.8
20:34 NanoSector: stack trace
20:35 meridion: and Weston 1.11.0
20:35 meridion: I would like some pointers on getting Nouveau to work on this system
20:35 karolherbst: meridion: ask gnurou
20:35 meridion: I've been told there is still no xorg driver for this system
20:36 karolherbst: meridion: there are patches
20:36 meridion: I wouldn't mind patching stuff myself
20:36 karolherbst: meridion: https://github.com/Gnurou/xserver/commits/gk20a
20:36 karolherbst: both latest patches
20:37 karolherbst: as you might guess, you need dri3 enabled
20:37 NanoSector: karolherbst, http://sprunge.us/Rhdj
20:37 meridion: alright, I'll have a look
20:37 NanoSector: it didn't crash this time, froze for a little bit
20:38 karolherbst: meridion: no clue if anything else is required, when in doubt ask gnurou aka alex courbot
20:38 NanoSector: I first tried echoing 07 to pstate and that's hanging right now
20:39 meridion: karolherbst: alright, I'll do that
20:39 meridion: sure would be nice if those patches eventually find there way upstream
20:39 meridion: their*
20:39 karolherbst: meridion: nope, they are evil hacks
20:40 meridion: cleaned ofc.
20:40 karolherbst: won't do it
20:40 NanoSector: if i check pstate now it's at 07 even though the echo 07 thing is hanging
20:40 meridion: who, Gnurou?
20:40 karolherbst: there is a lot of infrastructure to do on the kernel side and X and everywhere else, it's a painful process
20:40 karolherbst: NanoSector: your gpu is basically frozen
20:40 meridion: okay
20:41 NanoSector: karolherbst, so i put it in the microwave?
20:41 karolherbst: meridion: etnaviv devs are also needing this
20:41 karolherbst: NanoSector: :p
20:41 meridion: karolherbst: and what about Wayland? that is supposed to have a working nouveau back-end
20:41 NanoSector: echoing 0f to pstate now doesn't do anything and hangs as well
20:41 karolherbst: meridion: no idea about wayland
20:41 NanoSector: i think wayland uses kms
20:41 meridion: alright, no problem
20:41 karolherbst: NanoSector: well, I
20:42 karolherbst: NanoSector: 've got what I wanted
20:42 NanoSector: :D
20:42 karolherbst: the pmu script will help
20:42 karolherbst: "nouveau 0000:01:00.0: pmu: R[10f808] = 48000004" and later
20:42 NanoSector: do you still want me to try that patch you linked?
20:42 karolherbst: it won't help
20:42 karolherbst: it will just prevent some hangs if the pmu communication breaks
20:43 NanoSector: ah okay, so is there anything else I can do?
20:43 karolherbst: but if your gpu goes south, all bets are off anyway
20:43 karolherbst: NanoSector: nothing yet
20:43 karolherbst: NanoSector: until I have an idea what is odd
20:44 NanoSector: all right, you can poke/PM me anytime you need anything or have a new clue you want to share
20:44 karolherbst: won't take long I think, maybe it will
20:44 NanoSector: take your time :)
20:46 NanoSector: heh nouveau is still updating fan speed
20:46 NanoSector: i'll put this machine out of its misery *reboot*
20:46 karolherbst: I doubt that nouveau does it
20:46 karolherbst: more likely your EC does
20:46 NanoSector: I'm seeing "FAN update:" and "FAN target request" messages in dmesg -w
20:46 karolherbst: interesting
20:47 karolherbst: no write to 62c000 on nvidia
20:47 karolherbst: okay
20:47 karolherbst: that will be the issue
20:48 karolherbst: NanoSector: drm/nouveau/nvkm/subdev/fb/ramgk104.c
20:48 karolherbst: NanoSector: you have some if (nvkm_device_engine(ram->base.fb->subdev.device, NVKM_ENGINE_DISP)) ram_wr32(fuc, 0x62c000, 0x0f0f0000);
20:48 karolherbst: lines there
20:48 karolherbst: remove all 4 occurrences
20:48 karolherbst: I know that issue
20:48 karolherbst: I thought I fixed that, but meh
20:48 karolherbst: seems like we need another check
20:49 karolherbst: NanoSector: found all 4?
20:49 NanoSector: oh I just rebooted :x
20:49 NanoSector: what branch do I clone?
20:49 karolherbst: take whatever you have
20:49 NanoSector: I have no nouveau source code on my laptop right now
20:50 karolherbst: uhh
20:50 karolherbst: I thought you compile your kernel yourself or so
20:50 NanoSector: nope, stock Arch 4.8.10 kernel
20:50 karolherbst: I see
20:50 karolherbst: git clone https://github.com/karolherbst/nouveau.git -b stable_reclocking_kepler_v6
20:51 NanoSector: thanks
20:51 NanoSector: you released v6?
20:51 karolherbst: well, "released" I stalled the work, but I should finish it
20:51 karolherbst: the important bits are done though
20:51 karolherbst: and merged for 4.10
20:51 karolherbst: wait
20:52 karolherbst: NanoSector: I will create a branch for you
20:52 NanoSector: I can't find any bits of the code you mentioned
20:52 NanoSector: in that file
20:52 NanoSector: or i just can't type, derp
20:53 karolherbst: NanoSector: branch: NanoSector_test
20:53 karolherbst: I basically wrote the commit already: https://github.com/karolherbst/nouveau/commit/62287b62b30e7db4d3d4c8c6ea416c98673cbd92
20:54 NanoSector: cloned, thanks
20:54 karolherbst: silly issue :(
20:54 NanoSector: :( yeah
20:54 NanoSector: so now i just cd drm; make drm?
20:54 karolherbst: especially, because I wrote this patch: https://github.com/karolherbst/nouveau/commit/0361b076a8ca3bce20f64d0bb82405712548c07d
20:55 NanoSector: oh, you're only supposed to write there when there's actually a display to talk to?
20:55 karolherbst: maybe?
20:55 karolherbst: well 0x62c000 is part of PDISPLAY
20:55 karolherbst: even though my nvidia gpu doesn't have any display connectors at all, it still has to be written
20:55 karolherbst: so your bets are nearly as good as mine
20:55 NanoSector: heh I trust you know more about this stuff :P
20:56 karolherbst: there could be flags somewhere in the vbios
20:56 karolherbst: so...
20:56 NanoSector: you did have a copy of my vbios right?
20:56 karolherbst: yes
20:56 NanoSector: okay cool
20:56 karolherbst: I am sure skeggsb already knows some bits which might decide this
20:56 NanoSector: it's building btw
20:58 NanoSector: must be so hard, developing for GPUs without having any clue how they really do work :(
20:58 karolherbst: it's actually more fun than cheating from the specs
20:58 NanoSector: I suppose it's really fun to actually see your code work
21:00 NanoSector: okay installed the new compiled module, rebooting
21:01 mupuf: karolherbst: funny how the optimal period depends on the frequency of the PWM (the first number)
21:01 mupuf: http://pastebin.com/8WbRXZj6
21:01 karolherbst: :D
21:01 mupuf: but then it is not a linear error
21:02 mupuf: err
21:02 mupuf: I mean the difference between 4096 and 8192 is not half the one between 8192 and 16384
21:02 karolherbst: mupuf: funny how 1000 and 2500 are the same
21:03 mupuf: and 8192/16384 are also the same
21:03 karolherbst: well
21:03 karolherbst: they aren't
21:03 mupuf: no, but they are quite close
21:04 mupuf: and the values are noisy, FYI
21:04 karolherbst: I figured
21:05 mupuf: FYI, here is how I compute the period:
21:05 mupuf: period = pwm_freq * period_factor / 65536
21:05 mupuf: period_factor is the number from the pastebin
21:05 mupuf: and pwm_freq is obviously the first number
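The formula mupuf gives above can be written as a small helper. This is only a sketch of the arithmetic as stated in the chat: the function name `pwm_period` is made up here, and the units of the result are not stated in the conversation, so none are assumed.

```python
def pwm_period(pwm_freq, period_factor):
    """Compute the period as described in the chat:
    period = pwm_freq * period_factor / 65536.

    pwm_freq is the first number in the pastebin listing and
    period_factor is the per-frequency number next to it."""
    return pwm_freq * period_factor / 65536
```

For example, `pwm_period(4096, 16)` and `pwm_period(8192, 8)` both evaluate to 1.0; those inputs are arbitrary illustrations, not values from the pastebin.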
21:06 karolherbst: mhh
21:06 NanoSector: oh boy was scared for a minute
21:06 NanoSector: depmod -A didn't add the vfat module
21:06 NanoSector: so /boot wouldn't mount :x
21:07 NanoSector: but it works now :D
21:07 karolherbst: yeah well
21:07 karolherbst: that's the reason I always want to see what nouveau actually does :p
21:08 karolherbst: silly piece of frigging gpu that reg that is...
21:08 NanoSector: echo 07 > pstate goes through now
21:08 NanoSector: 0f as well
21:08 karolherbst: skeggsb: seems like there is another heuristic to disable the 62c000 writes
21:08 NanoSector: seems to clock fine now
21:08 NanoSector: shall I try running glxspheres?
21:08 karolherbst: NanoSector: of course
21:09 karolherbst: stupid reg
21:09 NanoSector: ok running glxspheres64 hangs the GPU
21:09 karolherbst: NanoSector: what does dmesg say?
21:10 NanoSector: [ 225.352441] nouveau 0000:01:00.0: gr: TRAP ch 4 [00ff7d3000 glxspheres64[12946]]
21:10 karolherbst: ohh well
21:10 NanoSector: 0.261413 frames/sec - 0.291737 Mpixels/sec
21:10 NanoSector: heh
21:10 karolherbst: could be context switching issues
21:10 karolherbst: no idea
21:10 karolherbst: never had those
21:11 karolherbst: odd though that it happens only when reclocked
21:11 karolherbst: maybe something else is odd, but it ain't memory related afaik
21:11 NanoSector: i think, haven't tried with default clocks
21:11 karolherbst: :D
21:11 NanoSector: but I was playing Portal Stories: Mel on default clocks on the default module
21:12 karolherbst: yeah, those hangs are quite random sometimes
21:12 NanoSector: and that worked
21:13 NanoSector: oh, the glxspheres window disappeared
21:14 karolherbst: sometimes nouveau manages to recover
21:15 NanoSector: nah not this time :P
21:15 NanoSector: but yeah glxspheres64 works on the default clocks
21:15 karolherbst: NanoSector: try to reclock to 07 while it is running
21:16 NanoSector: is fine
21:16 karolherbst: now try 0a
21:16 NanoSector: hang
21:16 karolherbst: oh dear
21:16 karolherbst: switch to 07
21:16 NanoSector: still hanging
21:16 NanoSector: it has reclocked though
21:17 NanoSector: pstate now shows 07
21:17 karolherbst: launch another glxspheres for fun
21:17 NanoSector: oh, dmesg stopped spamming traps
21:17 NanoSector: uh
21:18 karolherbst: okay, I think I know what issue this might be, odd
21:18 NanoSector: so
21:18 NanoSector: i launched another glxspheres64
21:18 NanoSector: it hung
21:18 NanoSector: then it unfroze
21:18 NanoSector: and now there's a black window staring at me
21:18 karolherbst: mhhh okay
21:18 NanoSector: [ 202.978081] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
21:18 NanoSector: [ 202.978088] nouveau 0000:01:00.0: fifo: gr engine fault on channel 4, recovering...
21:19 karolherbst: NanoSector: pstate shows something around 967 MHz?
21:19 NanoSector: hmm
21:19 karolherbst: NanoSector: last line
21:19 NanoSector: AC: core 405 MHz memory 810 MHz
21:19 karolherbst: ohh right
21:19 karolherbst: on 0a I meant
21:19 NanoSector: oh
21:19 NanoSector: pushing it to 0a
21:19 NanoSector: AC: core 966 MHz memory 1600 MHz
21:20 karolherbst: k
21:20 NanoSector: so that part works :P
21:20 NanoSector: progress!
21:20 karolherbst: sure
21:20 karolherbst: mhh
21:20 NanoSector: stay positive :D
21:20 karolherbst: okay
21:20 karolherbst: reboot and before launching anything echo 2 into boost
21:20 karolherbst: and 0a into pstate
21:20 NanoSector: there's no boost
21:21 karolherbst: uhhh
21:21 karolherbst: there should be
21:21 NanoSector: welp
21:21 NanoSector: clients gem_names name pstate vbios.rom
21:21 karolherbst: let me check
21:21 karolherbst: on which branch are you?
21:21 NanoSector: NanoSector_test
21:22 karolherbst: uhh crap
21:22 karolherbst: I used the wrong base
21:22 NanoSector: heh
21:22 NanoSector: it happens
21:22 NanoSector: I can manually change stable_reclocking_v6 if you want
21:23 karolherbst: git fetch origin; git reset --hard origin/NanoSector_test
21:23 NanoSector: done
21:23 karolherbst: recompile and so on
21:24 NanoSector: aight
21:24 karolherbst: okay, none of my reclocking fixes were on your last build
21:24 karolherbst: soooo
21:25 NanoSector: thanks for taking the time for looking at this btw, other projects could take an example :)
21:27 NanoSector: should be almost done
21:28 NanoSector: there we are
21:28 NanoSector: copying it in place and rebooting, brb
21:30 NanoSector: ok now there is a boost file
21:31 NanoSector: so 2 into boost and 0a into pstate?
21:31 NanoSector: ok that went through
21:32 NanoSector: oh, it's not happy
21:32 NanoSector: not at all
21:32 NanoSector: fifo: CHSW_ERROR 00000002
21:34 NanoSector: setting pstate 07 makes it even more unhappy
21:35 NanoSector: before trying to go to 07: http://sprunge.us/SKHi
21:35 NanoSector: after trying to go to 07: http://sprunge.us/JaKD
21:36 NanoSector: well, that's not very descriptive
21:39 karolherbst: not really
21:39 NanoSector: heh
21:39 NanoSector: figured there'd be more in the logs
21:39 karolherbst: mhh thinking
21:39 karolherbst: okay, I have an idea
21:41 NanoSector: hmm?
21:41 karolherbst: drm/nouveau/nvkm/subdev/clk/base.c
21:41 karolherbst: line 684
21:41 karolherbst: clk->base_khz = base.clock_mhz * 1000;
21:41 karolherbst: replace it with
21:41 karolherbst: clk->base_khz = 1777000;
21:42 NanoSector: changed
21:42 NanoSector: rebooting
21:43 NanoSector: ok back
21:44 NanoSector: the boost frequencies changed a bit
21:44 karolherbst: yeah
21:44 karolherbst: try to reclock without setting any boost things
21:44 NanoSector: aight
21:44 NanoSector: 0a?
21:44 karolherbst: yeah
21:45 NanoSector: hey, that works
21:45 karolherbst: yeah
21:45 NanoSector: 0f works as well
21:45 karolherbst: okay
21:45 karolherbst: nice
21:45 karolherbst: more evidence for my theory
21:45 NanoSector: :D
21:46 NanoSector: shall i change boost?
21:46 karolherbst: I think there is a frequency limit above which we actually have to do something differently
21:46 karolherbst: NanoSector: no, won't work
21:46 NanoSector: ah okay
21:46 karolherbst: well you can try, but it shouldn't work
21:46 karolherbst: the point is that after a certain frequency, we have to do something we don't do yet
21:46 NanoSector: yeah it goes back to that CHSW_ERROR thing
21:46 NanoSector: ohh okay
21:46 karolherbst: and this seems to vary between gpus as well
21:47 NanoSector: yeah you mentioned that my clocks were high
21:47 karolherbst: well
21:47 karolherbst: the boost 0 clock isn't that high
21:47 NanoSector: but boost 2 is?
21:47 karolherbst: well you won't reach the boost 2 clocks
21:47 karolherbst: maybe boost 1
21:47 NanoSector: oh i set boost 2
21:47 NanoSector: heh
21:47 karolherbst: yeah
21:47 karolherbst: somewhere between 1 and 2
21:48 karolherbst: most likely
21:48 NanoSector: if I set boost 2 the core clock goes to 1137 MHz, not the 1163 MHz it says
21:48 karolherbst: right
21:48 karolherbst: it is related to the voltage limits
21:48 karolherbst: 1163 requires more voltage than your hw can set
21:48 NanoSector: ah
21:48 karolherbst: basically
21:48 karolherbst: there is also some temperature magic going on
21:48 NanoSector: so the card could clock to that frequency but my laptop doesn't allow it
21:48 karolherbst: well, your gpu doesn't
21:49 karolherbst: or rather your vbios doesn't
21:49 NanoSector: oh i see
21:50 NanoSector: what's your theory by the way?
21:50 karolherbst: nouveau doesn't do something it needs to do
21:50 NanoSector: 99% certain i won't understand anyway, but worth a shot :P
21:50 NanoSector: oh that was your theory, okay
21:50 karolherbst: basically
21:50 NanoSector: hmm
21:50 NanoSector: so you need to figure out what that something is?
21:50 karolherbst: okay, correction: nouveau doesn't do something it needs to do on high clocks
21:50 karolherbst: yes
21:51 NanoSector: can i test something with the nvidia driver?
21:51 NanoSector: to provide a point of comparison
21:51 karolherbst: mind trying out with 1856000 ?
21:51 NanoSector: yeah sure
21:51 NanoSector: will have to leave in a bit though
21:53 karolherbst: no worries, I just want to know if it works with 1856000 or not
21:53 NanoSector: back
21:53 karolherbst: so that I somewhat know the limit
21:54 NanoSector: boost changed a bit again
21:54 NanoSector: 0a works
21:54 karolherbst: okay
21:55 NanoSector: 0f works
21:55 karolherbst: 928 MHz, interesting
21:55 karolherbst: I can get nvidia to clock mine up to 997 MHz, maybe I can figure something out myself
21:55 karolherbst: stock max is 862MHz though
21:59 NanoSector: just ran Portal Stories: Mel on nouveau and it ran fine, pretty fluent too
22:02 NanoSector: so that's a bunch of issues solved at least :) thanks
22:03 NanoSector: i'll have another gamble when you find anything new
22:03 NanoSector: anyway, heading out now, cya :)
22:12 karolherbst: mupuf: crap, I overclocked my GPU by 100MHz without changing voltage and no crash :(
22:13 mupuf: yes, so?
22:13 mupuf: try running heaven and we'll see :D
22:13 karolherbst: I run pixmark_piano
22:13 mupuf: and even then
22:13 mupuf: there is the voltage guard band, remember?
22:13 karolherbst: my voltage is like 1.0V
22:14 karolherbst: and on NanoSector's gpu the clock crashed the card, although 50MHz lower worked like a charm
22:14 karolherbst: this will be fun to figure out
22:15 karolherbst: I am really sure there is a gpu specific clock after which we have to do something special in the reclocking process
22:20 karolherbst: hihi
22:20 karolherbst: glxspheres64 crashed it now
22:23 karolherbst: mupuf: what do you mean by the voltage guard band though?
22:24 mupuf: karolherbst: you need to read up again on brownouts ;)
22:24 karolherbst: ohh that thing
22:25 karolherbst: anyway, I don't get my gpu
22:25 karolherbst: I went now 200MHz above my vbios max clock and most of the time everything is running just fine
22:26 karolherbst: I still have a buffer of 0.2V
22:27 karolherbst: nah, that is pointless to do on mine :/
22:28 karolherbst: mupuf: all your GPUs reclocked just fine with my patches, right?
22:28 karolherbst: the kepler and maxwell ones
22:29 mupuf: well, I have not tried them in some time
22:36 karolherbst: uhh interesting
22:36 karolherbst: on NanoSector's gpu, nvidia doesn't touch PDISPLAY at all
22:37 karolherbst: PDISPLAY+0x2004 => 0xbadf1300
22:37 karolherbst: uhh I think this helps
22:37 karolherbst: grepped PDISPLAY: https://gist.github.com/karolherbst/de3d1271a992583bdf389d881bbce792
22:38 karolherbst: on my GPU nvidia touches PDISPLAY many times
22:39 karolherbst: R 0x022500 0x00000200 PUNITS.HW_MISC_DISABLE => { PCOPY_MASK = 0x2 | UNK12 = 0 }
22:39 karolherbst: :)
22:39 karolherbst: ohhh
22:39 karolherbst: this is read only :O
22:39 karolherbst: !
22:39 karolherbst: so if 0x1 is set on 22500 -> PDISPLAY no touchy touchy
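The heuristic karolherbst proposes here can be sketched as two bit tests. This is an illustration of the idea only, not nouveau code: the function names are made up, and treating bit 0 of register 0x022500 as "display disabled" is the hypothesis from the chat, not a confirmed fact.

```python
HW_MISC_DISABLE = 0x022500    # register offset quoted in the chat
DISPLAY_DISABLE_BIT = 0x1     # hypothesis from the chat, unconfirmed


def display_disabled(hw_misc_disable_value):
    """True if bit 0 is set in a value read from HW_MISC_DISABLE,
    i.e. PDISPLAY should not be touched under the chat's hypothesis."""
    return bool(hw_misc_disable_value & DISPLAY_DISABLE_BIT)


def looks_like_bad_read(value):
    """True if a 32-bit register read has the 0xbadf.... pattern,
    as in the 0xbadf1300 value quoted for PDISPLAY+0x2004."""
    return (value >> 16) & 0xFFFF == 0xBADF
```

With the values quoted above, `display_disabled(0x00000200)` is False (bit 0 is clear in that read) and `looks_like_bad_read(0xbadf1300)` is True.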
22:41 karolherbst: nice
22:41 karolherbst: and all the regs aren't 0xbadf for me as well
22:42 imirkin_: karolherbst: we check that in nouveau ... in theory
22:42 karolherbst: imirkin_: seems like not in every case
22:42 karolherbst: or we don't check everywhere actually before touching those regs
22:43 karolherbst: I tried to fix it with this patch: https://github.com/karolherbst/nouveau/commit/0361b076a8ca3bce20f64d0bb82405712548c07d
22:43 imirkin_: ah right. i thought that went in
22:43 karolherbst: it actually went in
22:44 imirkin_: ah ok. well, i guess there's more
22:44 karolherbst: yes
22:44 imirkin_: but we TRY not to touch it ;)
22:44 karolherbst: :D
22:44 karolherbst: it helped somebody
22:44 karolherbst: so yeah, I think that 22500 0x1 bit might be a good flag to decide this
22:44 karolherbst: I will scan through the traces tomorrow
22:54 beep-eep: Hi there, I just installed a GTX 760 with nouveau. I would like to reclock it and have installed the most recent kernel to do so. I can set the pstate to 0f which will give me a core of 405-1228MHz and memory of 6008MHz however I am a little concerned that this may cause my card to overheat and there were a bunch of warnings talking about how this was "highly experimental" on the arch wiki. Is it safe to use
22:54 beep-eep: the highest state of 0f or would it be better to use something lower, say a state of 0a?
22:55 imirkin_: beep-eep: first off, which kernel did you install?
22:55 imirkin_: beep-eep: secondly, while it may be possible, i don't believe i'm aware of any hw permanently harmed as a result of our reclocking efforts.
22:56 imirkin_: the fan should spin up when it heats up
22:56 beep-eep: imirkin_: 4.8.10-gnu-1 Linux Libre
22:56 imirkin_: ok - that's missing some crucial patches to improve reclocking
22:56 imirkin_: chances are the reclock largely failed. check the "AC:" line
22:57 beep-eep: imirkin_: so should I compile the newest kernel from source? i.e. 4.9.x?
22:57 imirkin_: beep-eep: i'd recommend grabbing the drm-next branch
22:57 imirkin_: unfortunately those patches are only on their way to kernel 4.10
22:57 imirkin_: https://cgit.freedesktop.org/~airlied/linux/log/?h=drm-next
22:58 beep-eep: imirkin_: Right, thanks. I'll get on with compiling it.
22:58 imirkin_: but check, perhaps you got lucky
22:58 imirkin_: pastebin the pstate file?
22:59 beep-eep: imirkin_: Well, I did not reclock it yet.
22:59 beep-eep: imirkin_: And yeah, I'll throw it on a pastebin.
22:59 imirkin_: oh ok
22:59 imirkin_: be ready for the whole box to just hang, too
23:00 imirkin_: 0a is more likely to work properly
23:00 imirkin_: but all this stuff is very board-specific, unfortunately
23:01 beep-eep: imirkin_: https://0bin.net/paste/3EU8nZJGeGw0e0gZ#124PJ0enPHPMCCg0jBbva9KPej90YRhft0bTTpmgvZM
23:02 imirkin_: right. i meant after you tried reclocking ;)
23:02 imirkin_: the AC: line tells you the current state
23:02 beep-eep: imirkin_: right, I'll try 0a first then.
23:03 beep-eep: imirkin_: right, it flickered for a split second but seems to be fine now
23:04 imirkin_: that's expected
23:04 imirkin_: we don't operate the linebuffer properly =/
23:04 beep-eep: imirkin_: https://0bin.net/paste/IzqsVdQ24vkyQCcL#d+j5drVIe1ZptEXGoOtww0FKu6F5zs+g1gwhe1tN62x
23:05 imirkin_: cool, so that worked
23:05 imirkin_: you can compare the AC line to the 0a line
23:05 beep-eep: imirkin_: according to sensors, I am getting a slightly higher temperature approx. +4 Celsius
23:05 imirkin_: and see that they're the same
23:05 imirkin_: or same-enough
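The "compare the AC line to the 0a line" check can be sketched as a small parser. This is a rough illustration under assumptions: it only understands lines shaped like the ones quoted in this log (for example "AC: core 966 MHz memory 1600 MHz", optionally with a core range such as 405-1228), the function names and the 10 MHz tolerance for "same-enough" are made up, and a core clock anywhere inside the state's range counts as a match, so it cannot detect a core clock that stayed low within that range.

```python
import re

# Matches e.g. "0a: core 405-966 MHz memory 1600 MHz" or "AC: core 966 MHz memory 1600 MHz"
LINE_RE = re.compile(r"^(\w+):\s+core\s+(\d+)(?:-(\d+))?\s+MHz\s+memory\s+(\d+)\s+MHz")


def parse_pstate(text):
    """Map each state name (and 'AC') to its core range and memory clock in MHz."""
    states = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines in a shape this sketch does not know
        name, lo, hi, mem = m.groups()
        lo = int(lo)
        hi = int(hi) if hi else lo
        states[name] = {"core": (lo, hi), "memory": int(mem)}
    return states


def reclock_took_effect(text, wanted, tolerance_mhz=10):
    """Loose check that the AC line is 'same-enough' as the wanted state."""
    states = parse_pstate(text)
    if "AC" not in states or wanted not in states:
        return False
    current, target = states["AC"], states[wanted]
    core_now = current["core"][1]
    lo, hi = target["core"]
    core_ok = lo - tolerance_mhz <= core_now <= hi + tolerance_mhz
    mem_ok = abs(current["memory"] - target["memory"]) <= tolerance_mhz
    return core_ok and mem_ok
```

The text to pass in would be the contents of the pstate file that was pastebinned above; its exact path is not given in this log, so it is left to the caller.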
23:06 beep-eep: imirkin_: so should I try 0f or would it be better to compile the drm-next branch first?
23:07 imirkin_: you can try it. unclear what'll happen :)
23:07 beep-eep: imirkin_: well, here goes
23:08 beep-eep: imirkin_: Only the memory clock changed, no difference in core clock
23:08 imirkin_: yeah. probably because the voltage change failed
23:08 imirkin_: (you should see it in dmesg)
23:08 imirkin_: however the clock is still reasonably high
23:08 beep-eep: imirkin_: https://0bin.net/paste/bZaBXZ16ci4pjTjL#GG0cG733jNNAZYtNkqxX8gy9mPevcBhoBYCreUjUmvV
23:08 imirkin_: so you'll get a lot of the benefit
23:08 beep-eep: imirkin_: yeah
23:08 imirkin_: right
23:09 beep-eep: imirkin_: Just checked and yes it does say failed to raise voltage -22 in dmesg
23:11 imirkin_: if you build drm-next it should clock up the core clock as well
23:11 imirkin_: (but potentially not to the 1200mhz that you see)
23:12 beep-eep: imirkin_: that's still quite a significant increase in clock speed
23:12 imirkin_: yea
23:15 mupuf: imirkin_: can I use the option "call to a friend" for this period function?
23:15 mupuf: I am fucking close ... but I am obviously not doing the right thing
23:15 imirkin_: sure. just make sure to pick the right friend :)
23:17 mupuf: imirkin_: :p
23:18 mupuf: I will send an email to recap the current state of my understanding
23:27 fubu: ssij
23:36 fubu: xD
23:50 fubu: u