08:25Tom^: imirkin: would a soon dying 780ti be of any use for anyone of you guys? , entire screen flickers randomly from time to time when running gpu heavy tasks
08:25Tom^: it runs X and the games fine otherwise :P
08:27Tom^: just waiting for rx 580 to get in stock in the store and then i can post this card freely to any of you, if its of any worth.
09:01pmoreau: Tom^: Nooooo, not your legendary 780Ti that has the best (or second best?) results on Nouveau… :'( It is too young to die!
09:02Tom^: pmoreau: well its weird, the coreclocks sticks to like 619mhz then flickering, then everything goes back to normal 1097 or so. and runs fine for a while then BAM flicker and core clock drops
09:02Tom^: and then all of a sudden all is well for a long time. :D
09:02Tom^: doesnt produce any sort of errors or so, but its annoying as hell playing games with
09:03pmoreau: That is weird… unless GPU temp goes too high and one of the therm protection of the card kicks in?
09:04Tom^: its almost like it does, and then things flickers because of core not keeping up with showing stuff. but the temps is 61C at most. have had this uber custom cooler for 2? years.
09:04Tom^: because fans died on stock heh
09:04Tom^: perhaps some wonky broken sensor *shrug*
09:05pmoreau: Does that happen on the blob as well?
09:05Tom^: thats where it happens
09:06Tom^: i almost thought it was a cs:go issue from some cs update. then i noticed it did the same thing in unigine heaven :p
09:07Tom^: so yeah HW issue, that appeared the last week or so. im assuming gpu.
09:12pmoreau: Tom^: Give it to mupuf, he will put it in the micro-wave and repair it! :-D
09:13Tom^: sure thing, stores say they will get the rx580 around late july or early august, so i guess its gonna be a month or two of flickering :/
09:14karolherbst: Tom^: you could run some tests which color pixels depending on which SM they are calculated on
09:15Tom^: do what now with what? :D
09:15karolherbst: no clue
09:15karolherbst: I just know there are some nvidia examples for that
09:15karolherbst: does it also happen with nouveau by the way?
09:15Tom^: havent tested
09:16karolherbst: you should
09:16karolherbst: would be fun to know what happens on nouveau
09:16karolherbst: maybe we get pgraph errors we couldn't figure out what they are for or something
09:17Tom^: or all my freezing or stuttering in the past with nouveau has simply been this dying component
09:19karolherbst: Tom^: dmesg output might help
09:21Tom^: actually you just reminded me i still have nvidia-drm built into my initramfs, hence i dont think simply just removing it from the kernel cmdline disables it. and it apparently can cause various issues.
09:23karolherbst: blacklist it
09:24karolherbst: or something like this, dunno
09:55Tom^: yep didnt help, oh well time to see if nouveau borks on it later then. :p
10:21karolherbst: "while (out) delete out;" mhhhh why you wanna have something like that in the first place
11:35karolherbst: uhh, I know what beginners can also do: run scan-build and fix some of those issues
12:33Tom^: karolherbst: ey what happened with the boost file
12:33Tom^: karolherbst: or isnt that mainlined yet?
12:33karolherbst: Tom^: it isn't
12:33Tom^: ah hm
12:34karolherbst: you can boot with nouveau.config=NvBoost=2 though
12:34Tom^: will do
12:53Tom^: karolherbst: funny, cant reproduce it.
12:53Tom^: nouveau fixes hw issues!
13:01Tom^: my bets are on a wonky sensor, so blob starts throttling. which nouveau doesnt yet use.
13:08Tom^: things have progressed since 2016-09-10 http://i.imgur.com/Yblt4Ns.png ~1115 score, http://i.imgur.com/QYR0rnh.png now 1252
13:11Tom^: guess that will do then until august
13:24john_cephalopoda: How can I reclock my GPU again? There was something about pstate in /sys, but I forgot the kernel flags that are needed to enable it.
13:26Tom^: john_cephalopoda: which kernel?
13:26Tom^: john_cephalopoda: and yea which gpu too? not all gpus are reclockable, yet. :p
13:26john_cephalopoda: I reclocked it with an earlier kernel.
13:26john_cephalopoda: 01:00.0 VGA compatible controller: NVIDIA Corporation GK106 [GeForce GTX 645 OEM] (rev a1)
13:27Tom^: john_cephalopoda: cat /sys/kernel/debug/dri/0/pstate to get the available states; what it's currently running at is the AC: xxx line at the bottom
13:28karolherbst: Tom^: or doesn't hit a driver bug
13:28Tom^: john_cephalopoda: then simply just echo 0f | sudo tee /sys/kernel/debug/dri/0/pstate , for example.
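The two pstate commands above can be wrapped in a small helper. This is only a sketch: the function name `current_pstate` is made up for illustration, and the exact layout of the pstate file beyond the "AC:" line mentioned above is not assumed.

```shell
# Print the line describing the currently active clocks.
# Per the discussion above, this is the "AC:" line at the bottom of the file.
# The optional path argument is an assumption added so the helper can be
# pointed at a copy of the file.
current_pstate() {
    pstate_file=${1:-/sys/kernel/debug/dri/0/pstate}
    grep '^AC:' "$pstate_file"
}

# Switching the performance level needs root; "0f" is just the example
# value used in the chat, not a recommendation:
#   echo 0f | sudo tee /sys/kernel/debug/dri/0/pstate
```

Calling `current_pstate` before and after the echo is a quick way to see whether the reclock took effect.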
13:28Tom^: karolherbst: yeah well i havent changed anything in quite a while, and then it all of a sudden appeared
13:29Tom^: karolherbst: which made me believe cs:go or steam update. but since it happened in unigine-heaven too i kinda ruled software out
13:29karolherbst: or maybe we just don't use that broken part in nouveau
13:29karolherbst: who knows
13:29Tom^: do you throttle on temp?
13:29karolherbst: that's why boost=0 is default
13:29karolherbst: but as long as your GPU doesn't hit 100°C everything is fine
13:30Tom^: its at 67C running unigine maxed out.
13:30Tom^: this cooler is <3
13:30john_cephalopoda: Tom^: /sys/kernel/debug/dri doesn't exist for me.
13:30karolherbst: john_cephalopoda: is debugfs mounted?
13:30karolherbst: john_cephalopoda: also you need to be root
13:30john_cephalopoda: karolherbst: That's why I asked. Because I wasn't sure any more what to mount.
13:31karolherbst: john_cephalopoda: mount | grep debugfs
13:31john_cephalopoda: karolherbst: I know that it's not mounted.
13:31karolherbst: okay, well you need it mounted ;)
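A rough sketch of the debugfs check being discussed, assuming the usual /proc/mounts layout where the third field is the filesystem type. The helper name `debugfs_mounted` is hypothetical.

```shell
# Return 0 if a debugfs entry appears in the mounts table, 1 otherwise.
# The optional argument (default /proc/mounts) is only there so the check
# can be run against a saved copy of the table.
debugfs_mounted() {
    mounts_file=${1:-/proc/mounts}
    # Field 3 of each line is the filesystem type.
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "$mounts_file"
}

# Mount it by hand (as root) when it is missing:
#   debugfs_mounted || mount -t debugfs none /sys/kernel/debug
```

The mount only works if the running kernel was built with debugfs support, which is presumably why a kernel rebuild comes up next.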
13:45john_cephalopoda: Ok, added to fstab. Now compiling the kernel and then it should be there after the reboot.
14:22karolherbst: imirkin: was there anything you tried out regarding that use-after-free fence in those CTS tests?
14:25karolherbst: imirkin: there is something I didn't quite figure out: while running glcts it triggers, even when creating a trace. But when I play that trace it doesn't
14:28karolherbst: or something else
14:40karolherbst: imirkin: okay, running the CTS, the context is considered flushed after nvc0_state_validate, in glretrace it isn't. Any ideas why that is?
14:40karolherbst: although there are two calls to BCTX_REFN in the CTS which aren't there with glretrace before that
14:56imirkin_: if i did try something, i've long forgotten what it was
14:59karolherbst: I think I will set a memory watchpoint to figure out why that nvc0->state.flushed is different between glcts and glretrace
17:02karolherbst: Tom^: how does it go until now?
17:02Tom^: oh it works fine on nouveau
17:02karolherbst: do you still have the same issues in cs:go as before?
17:02Tom^: so im sticking here until it either burns or the stores get that rx 580
17:03Tom^: karolherbst: not really, the fps is a tad bit too low for 144hz in some places but only a little bit
17:03Tom^: well yeah xD
17:03karolherbst: complaining about not getting 144fps with nouveau
17:04karolherbst: but I was referring to those slowdowns on gun fire
17:04Tom^: heh yeah, nope.
17:04karolherbst: I guess it is good enough if it stays above 72fps at least all the time
17:04Tom^: its above 120 at all time
17:05Tom^: on low settings tho, but thats how its meant to be played anyways
17:05karolherbst: settings maxed out? or game settings :p
17:05Tom^: gamer! :p
17:05karolherbst: why did I even ask
17:06karolherbst: textures maxed out though I assume?
17:06karolherbst: I keep wondering why they even keep that setting, but then I forget that some GPUs are still low on mem
17:46mangix: karolherbst: any insight on this?: https://bugzilla.redhat.com/show_bug.cgi?id=1461632
17:47karolherbst: mangix: git bisect might help
17:48karolherbst: mangix: I think it would be indeed enough to just bisect the nouveau secboot related commits though
17:59mangix: hrm. can i use nouveau as a module to test this?
18:00imirkin_: wasn't there a bug filed on the fd.o tracker by you for this?
18:01mangix: linked bug report is mine.
18:01imirkin_: ok. i thought i saw a similar one on fd.o. but i can't find it.
18:13karolherbst: imirkin_: yeah, I thought so as well
18:13karolherbst: couldn't find it
18:15karolherbst: there is only this bug with the 970, but here it's a 980
18:15mangix: so how does building the out of tree module work? i reverted some of Gnurou's changes, ran make, make install, still fails
18:15karolherbst: I can test it on a gm206 here
18:15imirkin_: mangix: cd drm; make
18:15karolherbst: mangix: ain't that easy
18:16karolherbst: mangix: you are on 4.11?
18:16mangix: yeah, i disabled GDM
18:16karolherbst: use the master_4.11 branch
18:16karolherbst: but this also contains 4.12 and 4.13 stuff
18:16karolherbst: should be fine though
18:17mangix: so basically, make, sudo make install, and reboot, right?
18:17karolherbst: mangix: ohh, so I guess this only happens if rendering is done on the GPU, which makes sense kind of
18:17karolherbst: mangix: not really
18:17karolherbst: mangix: you should delete the original nouveau.ko file
18:17karolherbst: some had problems where the original module was picked up instead
18:17karolherbst: or rename it
18:18mangix: hrm ok
18:19mangix: it's a .xz file
18:19karolherbst: yeah, doesn't matter
18:19mangix: i renamed to .xz.bak. hope that works
18:20karolherbst: mangix: you mean those secboot commits from february, didn't you?
18:21karolherbst: anyhow, I try to get the same thing on the gm206 I have access to
18:23mangix: yeah. i'm assuming those are the cause
18:24karolherbst: mangix: does lspci hang your machine?
18:25karolherbst: ohhh right, here I have runpm issue on that machine, which got fixed
18:26mangix: lspci works fine
18:27karolherbst: yeah, I just have another issue here I am now too lazy to figure out, because it got already fixed later and I don't want to mess up that machine
18:27mangix: problem still exists
18:27karolherbst: mangix: on which commit are you?
18:27mangix: never changed branches
18:28karolherbst: with the problem still exists you mean the hang or that it doesn't compile?
18:29karolherbst: if it's the latter, please provide the output
18:30mangix: it compiles
18:30mangix: systemctl start gdm results in crashing
18:30karolherbst: okay, then you can start bisecting
18:31karolherbst: I'll check for a good starting commit
18:32karolherbst: mangix: do you know how git bisect works?
18:32mangix: copying that will be difficult with no DE
18:33karolherbst: you can always shorten commit hashes
18:33karolherbst: 7 chars are usually enough
18:33karolherbst: sometimes it works with 4, sometimes you need 8+
18:33karolherbst: fd345a61e should be enough
18:35mangix: alright, switched to desktop
18:35mangix: this should be easier
18:38mangix: kernel panic
18:38karolherbst: a different one?
18:39mangix: does git log show commits in order?
18:40karolherbst: did you build the fd345a61e commit?
18:40mangix: i did
18:40mangix: removed the .xz.bak file as well just to be sure
18:41mangix: something is weird though, it still shows gr: init failed
18:42mangix: i'm at 767abc
18:44mangix: is there a way to see if it's the module that's being loaded and not some kernel module?
18:45karolherbst: now it is getting ugly
18:45karolherbst: mangix: the master_4.11 branch's top commit also caused the hang, right?
18:45karolherbst: mangix: do you still have your 4.10 kernel installation?
18:46mangix: i do
18:46mangix: it's 4.10.17
18:47karolherbst: I totally forgot that my branch is for newer features and not for using older commits on newer kernels. For that I need to backport everything on top of 4.10, so that you can figure out what commit messes it up
18:47karolherbst: mangix: can you check if my master_4.10 branch causes the same issue?
18:48karolherbst: but then on top of 4.10
18:49mangix: oh build it on 4.10?
18:49karolherbst: mangix: yes
18:49karolherbst: mangix: my master_4.10 contains the features for 4.11 but it's compatible with a 4.10 kernel
18:49mangix: oh ok
18:51mangix: alright i think i've been doing this wrong
18:51mangix: nouveau .ko is in /usr/lib/modules, right?
18:52karolherbst: mangix: no clue, usually it is /lib/modules
18:52Tom^: /usr/lib/modules/4.11.2-1-zen/kernel/drivers/gpu/drm/nouveau/nouveau.ko.gz , here.
18:52karolherbst: but some distributions switch to /usr == /
18:52Tom^: isnt /lib a symlink on like all the distros nowadays?
18:53karolherbst: well, not on my machine :p
18:53mangix: now i have to wonder what booted
18:53karolherbst: uname -a
18:54mangix: i meant nouveau wise
18:54mangix: i removed the nouveau.ko.xz file but i still see the driver loaded
18:54mangix: now i have to wonder where it came from
18:56mangix: i'm guessing i'd need to regenerate it
18:57Tom^: depends if you have nouveau in it
18:58mangix: any way to find out?
18:58Tom^: lsinitcpio /boot/initramfs-linux.img | grep nouveau
18:59Tom^: i guess
19:00mangix: yep, it's there
19:07mangix: removed it
19:07mangix: lsmod shows no nouveau
19:07mangix: oh boy
19:08mangix: how do i insert the freshly compiled one?
19:09mangix: no such file or directory
19:09mangix: insmod nouveau that is
19:09karolherbst: you need to give the path to insmod
19:10mangix: where does the makefile install it to?
19:10karolherbst: but you should be able to modprobe it
19:10karolherbst: if you install via make install that is
19:11mangix: don't see it
19:11mangix: only module in extra is the one for my clevo
19:12karolherbst: if you are inside that drm directory: insmod nouveau/nouveau.ko
19:12mangix: oh i removed that
19:12karolherbst: I meant inside the repository
19:13mangix: no such file
19:13karolherbst: ohh, maybe it ends with .xz? no idea how the kernel config messes with it
19:14karolherbst: mhhh, maybe you need to recompile it
19:14mangix: i'm looking at the makefile, it doesn't list it
19:14mangix: make output
19:14mangix: stops at bin/nv_rs16
19:15karolherbst: you need to run make inside drm
19:15mangix: now that makes a lot more sense
19:25mangix: master_4.11 causes a weird visual glitch
19:26karolherbst: mangix: but does gdm run without problems?
19:27mangix: text is vertically squished in addition to being duplicated horizontally 4 times
19:27mangix: and no
19:27mangix: doesn't look like it
19:27karolherbst: that's a new one
19:27karolherbst: ahh okay
19:27karolherbst: best way to debug things is now to start from master_4.10 and check which commit breaks secboot for you
19:33mangix: same issue with master_4.10. this is one funny artifact
19:33mangix: want a picture?
19:36karolherbst: mangix: commit bc42f940e476b42ed90938b124e21a34b60584b2 shouldn't have your bug
19:37mangix: will test soon
19:37mangix: i can't read the text on the screen
19:38mangix: so glad i enabled ssh access
19:45mangix: karolherbst: that commit does not have my bug
19:49mangix: switched to commit prior to enabling GR
19:49mangix: i have a feeling it will work
20:01karolherbst: mangix: usually there is git bisect to figure out which commit broke stuff
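For reference, a minimal sketch of the git bisect workflow mentioned here. The wrapper name `bisect_range` is made up. The good end would be a commit known to work (bc42f940e476b42ed90938b124e21a34b60584b2 was reported good above) and the bad end any commit that shows the bug.

```shell
# Sketch only: start a bisect between a known-good and a known-bad commit.
# Usage: bisect_range <good> <bad>   (run inside the git checkout)
bisect_range() {
    good=$1
    bad=$2
    # "git bisect start <bad> <good>" marks both ends and checks out a
    # commit roughly halfway between them.
    git bisect start "$bad" "$good"
}

# After each build-and-boot test, record the result so git picks the
# next commit to try:
#   git bisect good    # this commit works
#   git bisect bad     # this commit shows the bug
# When git names the first bad commit, return to the original branch:
#   git bisect reset
```

Each round halves the remaining range, so even a few hundred commits need fewer than ten build-and-reboot cycles.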
20:11mangix: trying aa53e15839a527c2ce0edc497c0ee3e99000c417 now.
20:12mangix: makes me wonder if the reason this bug was never found was because i'm using a non-reference design on my GPU or because it has a custom VBIOS
20:21Tom^: karolherbst: things are working wonders, played 3 competitive games now. only a few places with bunch of smoke grenades etc where i dip down to like 90fps. but otherwise its crystal. weirdly enough it feels like the input latency is lower or something.
20:23karolherbst: mangix: maybe both, who knows
20:23karolherbst: Tom^: that makes sense
20:23Tom^: it does?
20:24karolherbst: Tom^: Nvidia doesn't integrate as nicely into all that kernel stuff as nouveau does, for legal reasons
20:24karolherbst: or maybe they didn't bother to switch over to new stuff
20:24karolherbst: with nvidia you also have no xrandr support really
20:25mangix: seems there are really two bugs
20:25karolherbst: mangix: yeah, most likely
20:26mangix: dmesg telling me that GR init failed also guarantees that GDM will not start properly
20:26mangix: but the function crash is another
20:26karolherbst: mupuf: I think we really need to work on QA a lot more, would be a good idea to get something working in this regard soon
20:27karolherbst: mangix: :( would be good to know what causes gr init to fail
20:28Lyude: we should try adding some stuff into igt
20:29karolherbst: Lyude: yeah
20:29mangix: karolherbst: working on it :)
20:29mupuf: karolherbst: of course!
20:29Lyude: if you guys want any help with that as well let me know, I have done a lot of igt stuff
20:29mupuf: Working on it hard for Intel, I will duplicate that for nouveau ASAP
20:30mupuf: I am finishing the multi-node support for ezbench
20:30mupuf: this way, we can control the work sent on multiple machines
20:30karolherbst: mupuf: would you mind to do an inventory for all the GPUs you have?
20:30mangix: one of Gnurou's commits
20:30mupuf: yeah, I will have to do that
20:30karolherbst: mangix: no surprise
20:30mupuf: I have a decent amount of GPUs :D
20:31karolherbst: mupuf: yeah :p
20:31karolherbst: mupuf: might also want to get insurance for that, we could even split the costs :p
20:31karolherbst: no idea how much of it will be covered by normal stuff
20:32karolherbst: not that we suddenly lose all those GPUs :(
20:32mupuf: I have a pretty decent home insurance already
20:32mupuf: but I guess I need to check again with them if they cover it
20:32karolherbst: please do
20:33mupuf: but don't worry about the cost, I offer it to the community
20:33karolherbst: hey, let me pay something as well! :p
20:35karolherbst: this is super odd....
20:35karolherbst: when I run that test with glcts I get some readPixels call, but with glretrace I don't
20:37karolherbst: but it's in the trace
20:42mangix: ah ha
20:43mupuf: karolherbst: you'll pay by having your own CI node ;)
20:43mupuf: and keeping it accessible
20:43mupuf: and paying for the electricity
20:43karolherbst: difficult right now, maybe end of year I get to set it up :D
20:43mangix: it's between 571701c2 and 9bad5ce
20:43mangix: the latter fails
20:44mangix: erm former fails
20:44mangix: latter works
20:45karolherbst: mupuf: ohhh wait, I have an idea where I could get hardware for that...
20:47karolherbst: mangix: huh, 9bad5ce doesn't exist
20:47mangix: 571701c20e7c14c7b8f887445406b0ef73ccf8d0 is the bad commit
20:47karolherbst: guess why one shouldn't write big commits like this
20:48mangix: nope. saw a similar issue in a out of tree wifi driver as well
20:48mangix: ran the codebase through some static code analyzer
20:49mangix: no idea what part broke
20:49karolherbst: this is gonna be fun
20:50mangix: also i meant 9bed5ce
20:50mangix: my vision failed me :)
20:50karolherbst: okay, I have an idea what's going on
20:50karolherbst: but mhhh
20:51karolherbst: basically secboot runs into a timeout, because something went wrong setting up the PMU, maybe
20:51mangix: i remember seeing that in dmesg
20:51mangix: while bisecting
20:55karolherbst: let me figure out what exactly went wrong
21:01mupuf: night guy
21:01karolherbst: he changed/renamed code
21:01karolherbst: so much for "This is a big commit, but it essentially moves things around (and split the nvkm_secboot structure into two, nvkm_secboot and nvkm_acr). Code semantics should not be affected."
21:02karolherbst: some structs are also completely gone
21:02karolherbst: this will be fun
21:08karolherbst: mangix: can you once load nouveau with debug=secboot=debug on the broken and the last working commit?
21:15mangix: prefixed with nouveau?
21:15karolherbst: depends on how you load the module
21:16mangix: i just boot and it loads
21:16karolherbst: then prefixed
21:17mangix: so nouveau.debug=secboot=debug ?
21:21mangix: on working
21:21mangix: acquired PMU falcon and released
21:25mangix: karolherbst: same thing
21:25karolherbst: mangix: well yeah, but I want the dmesg output
21:26mangix: give me a moment
21:27mangix: journalctl -b 0 -1 good or too verbose?
21:27karolherbst: only the kernel stuff
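One way to get only the kernel messages, sketched under the assumption that the relevant lines mention "nouveau". The filter name `nouveau_kernel_lines` is made up for illustration.

```shell
# Keep only log lines that mention nouveau (case-insensitive).
# Reads the log on stdin so it works with either log source below.
nouveau_kernel_lines() {
    grep -i 'nouveau'
}

# Kernel ring buffer:
#   dmesg | nouveau_kernel_lines
# Kernel messages of the current boot from the journal:
#   journalctl -k -b 0 | nouveau_kernel_lines
```

Redirecting the output to one file per tested commit makes it easier to diff the working and broken runs afterwards.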
21:30mangix: aaaand my laptop broke
21:31mangix: i have to remove the battery to get it back
21:31mangix: anyway i remember seeing fifo PMU FAULT or something similar
21:31karolherbst: I found a code change
21:33karolherbst: this would be tiring to debug seriously...
21:33karolherbst: mangix: do you think you might be able to set up a machine with SSH access where somebody could connect to?
21:33karolherbst: dmesg might shed some light
21:36karolherbst: ..... okay this part here is indeed wrong
21:36mangix: i could
21:37mangix: now i think it is the BIOS causing the problem
21:37mangix: the VBIOS i'm currently using is unlocked, meaning no throttling and i'm assuming higher voltage
21:40mangix: one's journald other is dmesg
21:42karolherbst: mangix: you need bf741f2cd043cb082aa856f19329448295da466a
21:42karolherbst: but let me check something
21:44karolherbst: it's not on 4.11
21:44mangix: it's on 4.10
21:44karolherbst: I meant the kernel
21:44mangix: that's what i've been testing
21:45karolherbst: I meant the kernel source, the 4.11 kernel tree doesn't have a fix which is needed for GPUs with 4G+ VRAM
21:45mangix: did not know there was such a thing
21:45karolherbst: but uhm
21:46karolherbst: mangix: are you sure you tested master_4.11?
21:46karolherbst: please test commit bf741f2cd043cb082aa856f19329448295da466a
21:46karolherbst: and check if you still get that timeout error
21:47mangix: on 4.11 now
21:48mangix: what was the issue with 4GB RAM btw?
21:48mangix: uhhh compile failed
21:54mangix: this is a newer commit that causes the crash in gf100_gr_init_ctxctl
22:01karolherbst: let me think
22:03karolherbst: mhh okay, this is a different issue indeed
22:03karolherbst: seems like there is a second secboot issue somewhere
22:05karolherbst: Lyude: didn't you have access to a maxwell2 GPU with 8GB vram or something?
22:07karolherbst: gnurou: do you think you might find some time to look into this?
22:09imirkin_: to be clear - there's no crash. just hits a WARN() statement
22:10imirkin_: because waiting for some register to change times out
22:10karolherbst1: imirkin_: do you refer to that secboot fail?
22:10karolherbst1: yeah well
22:10karolherbst1: it basically means: secboot failed
22:10imirkin_: there was just talk of crash.
22:11karolherbst: and it crashes later on, because the driver is in an odd state
22:11imirkin_: some people misread "stacktrace" as "crash", which is just not the case.
22:11karolherbst: I think
22:11karolherbst: yeah, might happen
22:11karolherbst: anyway, I guess there could be some more 4GB+ vram messup in the code somewhere
22:12imirkin_: the prior messup with weird memory partitions
22:12karolherbst: yeah, but unrelated to this one
22:13karolherbst: while gnurou reworked that secboot code he missed that the secboot image could be located at an address 32bit+, so he wrote a patch to fix this
22:13karolherbst: but somewhere else is still something wrong and sadly the commit responsible is like a +-1.5k change
22:19imirkin_: which commit? (link?)
22:22imirkin_: and you're pretty sure it's THAT commit causing issues?
22:23karolherbst: not quite
22:23karolherbst: there was a commit fixing an issue within this commit
22:23karolherbst: but no idea if the other issue is also caused by it
22:23karolherbst: but that fix is also not on the 4.11 kernel branch
22:25karolherbst: best would be if somebody could have access to a non working GPU and analyse that issue in depth
22:31mangix: will try debugging 4.11 now
22:59mangix: that failed. at least i know it was that commit that broke it.
23:01mangix: 571701c2 that is