06:35mlankhorst: announce sent
07:39cousin_luigi: thought of giving nouveau another try after his distro updated the kernel. But it still freezes after a few minutes on his gtx660. Is there anything he can do besides buying a new card?
07:56woshty: I took a look at debugfs pstate with my G105M: setting a different state manually only changes shader freq, memory stays the same. However I get about the same reduction in temperature as with using closed-source drivers, so I am happy.
08:02loa: cousin_luigi, there is a branch for nouveau where this is totally fixed.
08:03loa: i had exactly the same problem.
08:07RSpliet: cousin_luigi: some freeze-related fixes have been pushed to kernel 4.7 that made my Fermi and Kepler cards rock-solid... but a few issues remain open with several other Kepler cards.
08:09cousin_luigi: loa: I tried 4.7.4, what kernel version are you using? Or perhaps you built your own?
08:10cousin_luigi: RSpliet: Where can I find more details about that?
08:12loa: isn't full support possibly expected only in 4.9?
08:13cousin_luigi: Perhaps I should try 4.8rcsomething
08:15RSpliet: cousin_luigi: you are welcome to try. Make sure to obtain a log of your problems and compare them to this bug: https://bugs.freedesktop.org/show_bug.cgi?id=93629
08:16RSpliet: seems to be specific to NVE6, haven't encountered this (or something similar) on NVC1, NVCE, NVE4, NVE7, NVF0 since the fixes in 4.7
08:17cousin_luigi: RSpliet: I had tried about a year ago, but the system hangs, hard.
08:17cousin_luigi: RSpliet: So unless I try with a serial console I'm not sure what else I could do.
08:18RSpliet: cousin_luigi: you could try the "magic sysrq dance" to flush buffers to your harddrive prior to rebooting after a crash. Or netconsole if you have a second machine
08:19cousin_luigi: RSpliet: I have to admit I'm not familiar with either technique.
08:19RSpliet: but netconsole takes a little fiddling to get cracking, whereas magic sysrq can be enabled easily on most kernels
08:32cousin_luigi: RSpliet: I see. How can I check if the feature is compiled in from a running kernel?
08:32cousin_luigi: /boot/config ?
08:32RSpliet: that's one way
08:32RSpliet: distros usually compile it in but disable it
08:32RSpliet: don't know off the top of my head how, but... give it a good google
08:38cousin_luigi: RSpliet: CONFIG_MAGIC_SYSRQ=y CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE=0x1 <- I see this in the relevant config
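A minimal sketch of the config check discussed above, assuming the kernel config is a plain text file with one `KEY=value` option per line (as in a typical `/boot/config-*` file). The function name `sysrq_config` is hypothetical. It only reports what was compiled in; a distro can still change the value at runtime through `/proc/sys/kernel/sysrq`, which is not inspected here.

```python
def sysrq_config(config_text):
    """Return (compiled_in, default_mask) from the text of a kernel config.

    compiled_in  is True when CONFIG_MAGIC_SYSRQ=y is present.
    default_mask is CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE as an int,
                 or None when the option is absent.
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        options[key] = value

    compiled_in = options.get("CONFIG_MAGIC_SYSRQ") == "y"
    raw = options.get("CONFIG_MAGIC_SYSRQ_DEFAULT_ENABLE")
    # Base 0 lets int() accept both "0x1" and "1".
    default_mask = int(raw, 0) if raw is not None else None
    return compiled_in, default_mask
```

With the two options quoted above placed on separate lines, this returns `(True, 1)`, i.e. compiled in with a default enable value of 1.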
08:39ZeekHuge: Hi ! why is xsensors on my thinkpad T430 reporting nouveau's temperature to be 0 deg ?
08:42ZeekHuge: even though I can feel its hot, as compared to when I am switching it off, using bbswitch ..
08:44cousin_luigi: RSpliet: ok done. How do I see if that magic sysrq saved something?
09:32RSpliet: cousin_luigi: depends on whether logs are managed by flat files (/var/log/messages) or SystemD (journalctl)
09:33cousin_luigi: RSpliet: the latter, I believe
09:33RSpliet: google knows how to get the last log out
09:35cousin_luigi: RSpliet: well, journalctl -r doesn't show anything pertinent
09:40RSpliet: is that for the current or the previous boot?
09:40RSpliet: there's some flags for that
09:41cousin_luigi: RSpliet: It's via chroot so it should be the last thing it ever wrote to the log.
12:37karolherbst: skeggsb: do you know if we have any form of documentation for the REed PMU interfaces?
12:48rkantos: win 27
12:57cousin_luigi: I guess I'm doomed. I'll have to get a new card if I want to use wayland :/
13:02ahuillet: RSpliet : ping?
13:07RSpliet: ahuillet: pong :-)
13:07ahuillet: well I was writing half a novel in the finnish webapp. :)
13:08RSpliet: ahuillet: I wrote a small thesis back 15 minutes ago :-P but IRC might be an easier discussion platform
13:08ahuillet: so basically my point is VM just won't help for the bugs where the machine hangs
13:09RSpliet: ah, there it is
13:09ahuillet: and *if* it's qualitatively different than no-VM, then congratulations, you've found a virtualization bug, which isn't what you're after in mupuf's work
13:10ahuillet: as for how to hang the system... it's deceptively simple.
13:10ahuillet: some regression in the shader compiler gives you an infinite loop in a fragment shader and depending on a bunch of variables you're welcome to reboot. even with a VM.
13:11RSpliet: the reason why I would personally favour a VM-based approach is because it avoids messing with the bootloader... Nothing prevents the host system from having the watchdog configured (without special grub support)
13:13RSpliet: but with the grub-assisted approach you need to reconfigure grub from EzBench, and SystemD to bring back the EzBench interface to analyse the reason of a hang or crash if any. I'd feel more confident just messing with a VM and a hypervisor, which are much less critical components in my system :-)
13:14ahuillet: oh, on your own system, sure
13:14ahuillet: but you can actually master the grub and systemd stuff much better than the VM stuff
13:14RSpliet: Amazon and Google have gotten quite good at mastering all that VM stuff :-P
13:15ahuillet: with no HW graphics bugs whatsoever...
13:15RSpliet: that's not what I said
13:17RSpliet: if there are VM related graphics driver bugs they should be solved. Just like SystemD quirks or GRUB watchdog bugs in the other case. Or EzBench bugs, or libc, or...
13:18RSpliet: if there are hardware bugs in the GPU, the use of a VM over a native system shouldn't make a difference when using passthrough. If there's hardware bugs in the IOMMU leading to serious issues, I'd be disappointed because we've been doing virtual memory for such a long time already ;-)
14:25tml_: is it possible that trying to use OpenCL on Fedora 24 can end up wedging the graphics driver?
14:26tml_: how can I prevent f24 from even trying to provide OpenCL?
14:27tml_: I suspect OpenCL because the log lines I see when the graphics gets stuck mention a process name that might well be trying to use OpenCL
14:28tml_: (I don't sit by the machine's display all the time so I can't say exactly at what point it gets wedged. the log lines in question start with: kernel: nouveau 0000:03:00.0: fifo: CACHE_ERROR - ch 12 [cppunittester] subc 5 mthd 0180 data beef0301
14:29tml_: etc, TRAP_CCACHE, PGRAPH TLB flush idle timeout fail and lots of other messages)
15:20imirkin: tml_: you have a tesla gpu of some sort? i doubt it's coming from opencl - that stuff isn't hooked up.
15:21imirkin: that particular cache error is very odd - quite likely that the gpu already is in a messed up state
15:21tml_: remind me, how do I see what gpu it is?
15:22imirkin: lspci -nn -d 10de:
15:22tml_: 03:00.0 VGA compatible controller : NVIDIA Corporation GT218 [NVS 300] [10de:10d8] (rev a2)
15:23imirkin: right yeah. "some tesla gpu" :)
15:23imirkin: anyways, chances are there were messages *before* the cache error
15:25tml_: this perhaps? nouveau 0000:03:00.0: gr: PGRAPH TLB flush idle timeout fail
15:25imirkin: the very first one ;)
15:25imirkin: anyways, once you see TLB flush idle timeout fail, you're pretty much done
15:25imirkin: we don't know how to recover from that one
15:26tml_: I don't recall exactly at what times I rebooted today, so I am not sure which 'nouveau' message is the first
15:26imirkin: on boot, nouveau should print some init messages
15:26imirkin: you can use that to distinguish the boots
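A rough sketch of the "distinguish the boots" idea, assuming the kernel log has already been read into a list of strings. The helper names and the `boot_marker` parameter are hypothetical: the caller supplies whatever substring marks the start of a boot in their log (a nouveau init message, or the kernel's own first line), since the exact init text is not quoted here.

```python
def split_boots(lines, boot_marker):
    """Group log lines into one list per boot.

    A new group starts at every line containing boot_marker. Lines that
    appear before the first marker end up in a leading group of their own.
    """
    boots = []
    for line in lines:
        if boot_marker in line or not boots:
            boots.append([])
        boots[-1].append(line)
    return boots


def first_line_containing(boot, needle):
    """Return the first line of one boot that contains needle, or None."""
    return next((line for line in boot if needle in line), None)
```

Calling `first_line_containing(boot, "nouveau")` on each group then gives the earliest nouveau message of that boot, which is the one worth looking at before any later errors.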
15:26tml_: yeah, let me check closer
15:27tml_: that 'timeout' message might be related to what it does when shutting things down before reboot
15:36darlac: woshty: You should try enabling DDR2 reclocking on G98, it is stable on my G96 (GT220M)
15:37tml_: looking more carefully, I do think that the "nouveau 0000:03:00.0: fifo: CACHE_ERROR - ch 12 [cppunittester] subc 5 mthd 0180 data beef0301" is the first line since previous boot
15:53imirkin_: tml_: well, there must be some init lines too
15:55tml_: sure, lots of them...
15:55tml_: but those are "normal", right? or some of them in particular that I should look for?
15:55imirkin_: i dunno... can you just pastebin the lot?
15:56imirkin_: including messages after that cache error?
15:56imirkin_: skeggsb: cache error == ramht is messed up, right?
15:59imirkin_: may i ask wtf cppunittester is?
16:00tml_: a unit test program used when building LibreOffice
16:00tml_: "cppunit" is a C++ unit test framework
16:00imirkin_: oh right.
16:01tml_: and we have one unit test that exercises OpenCL, if available. unfortunately I don't know if it was while running that test that this happened
16:01imirkin_: well, do you have 'clinfo'?
16:01imirkin_: if so, that should confirm that opencl is not available :)
16:02tml_: sure; it claims 2 platforms, "OpenCL 1.1 Mesa 12.0.3" and "OpenCL 1.2 beignet 1.1.1 (git-9043d32)"
16:02imirkin_: uh oh
16:06imirkin_: well, beef0301 is the handle we use for the notifier class. i don't see why that'd error out.
16:06imirkin_: subchan 5 is the one with the M2MF object bound to it
16:06imirkin_: [not that these things are absolute truths, but that's how the mesa driver sets things up]
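For reference, the fields being discussed (channel, process, subchannel, method, data) can be pulled out of such a line mechanically. This is only a sketch: the pattern is derived from the single CACHE_ERROR line quoted in this log, so other nouveau messages may well be laid out differently, and `parse_fifo_error` is a made-up name.

```python
import re

# Pattern based solely on the one line quoted above (an assumption).
FIFO_ERROR = re.compile(
    r"fifo: (?P<error>\w+) - ch (?P<chan>\d+) \[(?P<proc>[^\]]*)\] "
    r"subc (?P<subc>\d+) mthd (?P<mthd>[0-9a-fA-F]+) data (?P<data>[0-9a-fA-F]+)"
)


def parse_fifo_error(line):
    """Return the fields of a fifo error line as a dict, or None if no match."""
    match = FIFO_ERROR.search(line)
    if match is None:
        return None
    return {
        "error": match.group("error"),
        "chan": int(match.group("chan")),
        "proc": match.group("proc"),
        "subc": int(match.group("subc")),
        # Method offset and data word are printed in hex without a prefix.
        "mthd": int(match.group("mthd"), 16),
        "data": int(match.group("data"), 16),
    }
```

For the line above this yields subchannel 5, method 0x0180 and data 0xbeef0301, which are the values being interpreted in the conversation.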
16:08imirkin_: anyways, your cppunittester thing appears to be trying to execute some draws
16:09tml_: note that I run the build in a ssh session with no X ;)
16:09imirkin_: otoh... it appears to be getting an error while trying to set the texture limits for geometry shaders.
16:09imirkin_: [which it does on init]
16:09imirkin_: so it's all very odd.
16:10imirkin_: sorry, i have no idea what happens
16:10tml_: ok, no problem. being a developer myself, I can imagine these things are hard to solve
16:10tml_: will try to check more carefully what is going on and if I figure out some useful details, come here again
16:11imirkin_: fwiw nouveau isn't really designed to work with applications like libreoffice and other apps that have recently decided it'd be clever to use GL for their regular rendering
16:12tml_: this is not happening during GL use, though, as I run the build of LO in an ssh session without X
16:12imirkin_: well, it's clearly running the GL init sequence
16:13imirkin_: or rather... the screen init sequence
16:13imirkin_: could be GL, CL, vdpau, xvmc, va-api
16:13tml_: ah yes, well that happens upon boot right?
16:13imirkin_: no, "screen" as in the mesa-internal gallium concept
16:13imirkin_: which is an object shared by all contexts created later
16:13tml_: ah. and that too is shown as being related to cppunittester? that would be weird
16:14tml_: I do run a GNOME session, but it is not in that where cppunittester is run
16:14imirkin_: could be that the CL thing is starting up
16:14imirkin_: which means it has to create a screen
16:14imirkin_: only to realize that it won't work
16:14imirkin_: and you're hitting some issue with doing that a lot of times
16:14imirkin_: and/or in parallel with other things
16:15tml_: ok. I have to go afk now, weekend calls, but I will return with more info if/when I find out
16:17imirkin_: running things in parallel in general on nouveau often leads to sadness
16:17imirkin_: we've improved that for GF100+ in ... 4.5 or 4.6 or so
16:17imirkin_: i suspect G80 still has some weaknesses (and not like GF100+ is the definition of perfection...)
20:21BrianA_MN: If my card would need the 340.96 module driver, but nouveau also shows that the card is supported, is the 325.15 firmware the right firmware to get video acceleration?
20:25imirkin_: BrianA_MN: which GPU do you have btw?
20:26BrianA_MN: The 8400GS / class NV98. So the 325.15 firmware is universal for all the GPUs in the freedesktop.org/nouveau where it shows EXTFW simply needs the firmware and not the full NVIDIA Binary Kernel/modules driver. That's cool
20:26imirkin_: well, the firmware might have had very minor changes in newer versions
20:26imirkin_: you could try extracting from 340.x and seeing what happens
20:27imirkin_: my script might extract it wrong though
20:27imirkin_: any updates would have been *extremely* minor though
20:27imirkin_: it's not like that stuff is under active development...
20:28imirkin_: also note that if you're using mesa 12.0, and you want vdpau to not be incredibly slow in presence of OSD, you'll want https://patchwork.freedesktop.org/patch/110569/
20:28imirkin_: and lastly, don't attempt using vdpau + opengl - it'll fail miserably.
20:29imirkin_: (at least if vdpau and GL are done from separate threads)
20:30BrianA_MN: Thanks. I'm not an advanced Slackware user, so I think you mean tar -xvf the 340.96.run file, then patch (how?). But I do have mesa 12.
20:30imirkin_: the patch is for mesa
20:31imirkin_: i haven't used slackware since the floppy install days, so i dunno anything about how it currently is setup
20:33BrianA_MN: There is a Slackbuild Package of the 325.15 firmware which should properly place it in the filesystem, I think it will address the mesa and OpenGL issues also. I'm currently running the NVIDIA-Proprietary blob, but it is causing some make issues with packaging. Some suggested going back to the nouveau default shipped with Slackware stable and then adding the firmware. I was confused by the version differences.
20:34BrianA_MN: Your information has addressed that. Thanks
20:40BrianA_MN: The power management for NV50 shows WIP does that mean it might be partially working?
20:40imirkin_: for G98 it's sorta working i think
20:41imirkin_: should be accessible via pstate
20:41imirkin_: memory reclock might not be hooked up for DDR2, i forget
20:42BrianA_MN: After reverting to nouveau and the 325.15 firmware how would I tell if the power is being managed?
20:42imirkin_: well, power won't actively be managed
20:42BrianA_MN: In particular I suspend my Slackware rather than shutdown each night
20:42imirkin_: it'll just allow you to futz with stuff
20:42imirkin_: ram or disk?
20:43imirkin_: ah ok. that's slightly better supported, i think
20:44BrianA_MN: Will the firmware install for both the 64 and 32 bit parts?
20:44imirkin_: the firmware runs on the gpu
20:44imirkin_: it couldn't care less what cpu you have
20:44BrianA_MN: Oh got it.
20:45imirkin_: [although the likelihood of things working well with a big endian cpu is ... quite low. but that's nouveau's fault, not the gpu firmware's]
20:45BrianA_MN: This makes things so much easier for me than the extra steps I have to go through for proprietary, I'll try it tonight
20:46BrianA_MN: Thanks again.
20:46imirkin_: good luck
21:09BrianA_MN: I extracted the NVIDIA-Linux-x86_64-340.96.run but don't see any firmware, and python2 extract_firmware.py errors out as there is no extract_firmware.py. So I guess I'll have to use the 325.15 firmware slackpkg.
21:10imirkin_: i only recognize 340.32
21:10imirkin_: easy enough to update the script to accept 340.96, although i dunno if it'll find the firmware properly or not
21:11imirkin_: haven't tested myself
21:12BrianA_MN: OK I see the script and assume i simply have to place in the same directory.
21:12BrianA_MN: as the .run file
21:12imirkin_: well, it's up to you to extract it
21:13BrianA_MN: on slackware the x86_64 uses /usr/lib64 and the x86 uses /usr/lib. In multilib system I have both libs. Will the firmware need to be installed in both libs?
21:13imirkin_: as i said before, the firmware is agnostic to such things
21:14imirkin_: it's loaded by the kernel driver
21:14BrianA_MN: but doesn't the kernel driver need to know where to look? Or does it look in both libs?
21:14imirkin_: it looks in /lib/firmware by default
21:14BrianA_MN: ok that makes sense.
21:15imirkin_: well, it looks wherever request_firmware() looks
21:15imirkin_: so if you have a usermode helper setup, it can do pretty much whatever
21:15imirkin_: but those are pretty advanced topics
21:16BrianA_MN: If I copy and paste the script can I edit it with emacs and then simply re-save. Or do I need to edit with a specific editor?
21:17imirkin_: not sure what the question is
21:17imirkin_: how would the script know which editor it's being edited in?
21:17BrianA_MN: not a python programmer so wasn't sure if the extract_firmware is straight bash or if it then needs to be compiled for python?
21:18imirkin_: extract_firmware.py is a python script
21:18BrianA_MN: ie not a proggrammer
21:18BrianA_MN: not even a good spellerr or typerr
21:18imirkin_: it's a text file, interpreted by the program called "python"
21:18BrianA_MN: OK. thanks you are patient
21:19imirkin_: (much as a bash script is a text file interpreted by the program called "bash")
22:51mupuf: RSpliet, ahuillet: grub-reboot is a one time thing. The next reboot goes back to the default entry
22:51mupuf: which is why it is perfect for our needs
22:52mupuf: just need to add watchdog support to grub so that we could reboot on another kernel if the kernel fails to boot
22:52mupuf: this is something that should be highly appreciated by distros
22:52mupuf: (update your kernel, reboot on it, but if it fails, select another kernel that is known to work)
22:52imirkin: mupuf: i think a lot of hw nowadays includes watchdogs
22:52imirkin: i mean - consumer hw
22:52mupuf: but grub has no support for them
22:53imirkin: yeah, but i think you can set it on kernel cmdline
22:53imirkin: which means that unless your kernel is MEGA fucked, it should work out
22:53imirkin: not sure, maybe not
22:53mupuf: not entirely untrue ;)
22:53imirkin: you might also be able to set it in bios
22:53mupuf: but yeah, as you said, maybe not
22:53mupuf: yeah, possible too, but then it is a bios-related thing you need to set
22:54mupuf: whereas I only want this watchdog when I am booting an ezbench kernel
22:54imirkin: heh. i guess i'm not 100% sure what problem you're trying to solve, but if you want something 100% reliable, consumer hw ain't gonna have it
22:55mupuf: sure ;)
22:56mupuf: the problem I want to fix: bisecting kernel issues, rebooting on a kernel that does not boot
22:56mupuf: and getting stuck there
22:56imirkin: on consumer hw? you need external power control
22:57mupuf: well, the watchdog is supposed to be handling this case
22:57mupuf: but yeah, external watchdogs are of course better
22:57mupuf: but hey, they are a pain to set up
22:58imirkin: "supposed to be"
22:58imirkin: on consumer hw, it's a bs watchdog
22:58imirkin: it has no teeth
23:04mupuf: imirkin: any experience with this?
23:04mupuf: I am really interested if you do have! Seriously :)
23:05imirkin: mmmm iirc i've played around with those watchdogs a bit
23:05imirkin: the iTCO ones
23:05mupuf: I have problems understanding how one could screw up a watchdog, but I am experienced enough to know that people fuck up everything
23:05imirkin: and they suck
23:05imirkin: the ones in supermicro mobos do the actual job
23:05imirkin: but that's server hw, not consumer hw
23:05imirkin: i think the iTCO ones aren't actual watchdogs - that's how they screw up
23:06imirkin: they just send an interrupt saying "please reboot the system now, that'd be awesome, thanks"
23:06mupuf: an interrupt to the firmware?
23:06imirkin: OS :)
23:06imirkin: anyways, i could be confused on the matter
23:06imirkin: but that's my understanding.
23:08mupuf: WHAT? :o
23:09imirkin: not all watchdogs are made the same
23:10imirkin: i guess that's my main point
23:10imirkin: if it's a legit hw watchdog that has teeth, then fine
23:10imirkin: if all it will do is "please reboot. if you don't, then you will force me to have to ask you politely again in a minute" - then ... :)
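For context, the userspace side of the Linux watchdog interface is small. The sketch below assumes the usual `/dev/watchdog` convention: any write resets the timer, and writing the character `V` just before closing asks the driver to disarm (the "magic close", which a driver may be configured to ignore). The function name and parameters are illustrative only, and whether the hardware behind the device actually forces a reset when the writes stop is exactly the open question in the discussion above.

```python
import time


def pet_watchdog(dev, rounds, interval_s, sleep=time.sleep):
    """Keep a watchdog device alive for a fixed number of rounds.

    dev is an already opened, writable binary file object, e.g. the result
    of open("/dev/watchdog", "wb", buffering=0) on a real system.
    """
    for _ in range(rounds):
        dev.write(b"\0")   # any write counts as a keepalive
        dev.flush()
        sleep(interval_s)  # must stay below the watchdog timeout
    dev.write(b"V")        # magic close: request disarm before closing
    dev.flush()
```

If the process stops writing (because the kernel or the machine hung), a watchdog "with teeth" resets the machine once its timeout expires; one without teeth may do nothing useful.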