11:50karolherbst: imirkin: so future 5.14 and 5.15 releases should fix the fermi bug, I just saw greg picking them up
18:41imirkin: karolherbst: cool, will check it out
21:46RSpliet: karolherbst: which fermi bug is this?
21:46karolherbst: RSpliet: hit on all fermis with just one CE engine
21:47RSpliet: Oh yes, I did read about that one
21:47karolherbst: RSpliet: ohh btw.. there is still this one PLL locking patch from you :D
21:47RSpliet: I should take some time to document the trouble with my GK107
21:47karolherbst: RSpliet: what troubles?
21:48RSpliet: Random lock-ups. Random freezes in gnome shell until I trigger the activities window (blindly, the cursor doesn't move either). Sometimes all solid-white surfaces turn either yellow or azure-blue...
21:48karolherbst: uhh.. sounds broken
21:49RSpliet: Yeah.... the stuff I put up with!
21:49RSpliet: Only because graphics cards would cost me a kidney these days
21:49RSpliet: The white surface bug is weird. Even the inside of the signal icon turned yellow or blue. Then again... if it's a vector graphic that might make sense.
21:50karolherbst: weird that you are seeing those issues though
21:51karolherbst: crazy you should contact the developers of that software you are using :p
21:51RSpliet: This machine has never been stable for me. It's had broken ram for ages, it's had trouble with the IOMMU...
21:53RSpliet: My wifi PCI board was causing trouble on the bus. I erm... yeah, not the best purchase ever. But I had fun reverse engineering those GT21x'es back when I still had hair
21:54karolherbst: interesting.... so if your hw is going bonkers not sure what we can do to help here :p
21:57RSpliet: Oh the best bit is that the magic sysrq combo alt+sysrq+b (for reboot) shuts down the machine rather than rebooting.
21:59RSpliet: Actually, I can probably file a kernel bug for that one quite easily.
21:59RSpliet: I suspect that's an ACPI issue.
21:59karolherbst: I suspect you might want to update your firmware first
21:59RSpliet: firmware updates :-D
21:59RSpliet: This is a motherboard from 2012
22:00karolherbst: just need to boot DOS
22:00RSpliet: They haven't released a firmware update since
22:00RSpliet: March 2015
22:01karolherbst: would be funny if it fixes some weirdo rebooting problem
22:01RSpliet: Obviously that's already been installed ;-)
22:03RSpliet: Mmmmh, the kernel bugzilla will tell me to file it with Fedora instead
22:03karolherbst: just check with an upstream kernel then :p
22:03RSpliet: One does not simply...
22:04karolherbst: it's actually not that difficult on fedora :p
22:04RSpliet: Well, at least since I replaced the CPU fan last year it doesn't overheat when recompiling a kernel
22:04karolherbst: but still wondering what those graphic issues are all about
22:04RSpliet: Turns out the stock fan on an AMD FX 6300 was a "bit rubbish"
22:05RSpliet: And that stock thermal paste disintegrates
22:05RSpliet: Yeah, I can dumpsterdive in journalctl for logs if you like
22:05RSpliet: All the other magic sysrq keys still work
22:06karolherbst: RSpliet: are you running with iommu enforced/enabled or something?
22:06RSpliet: Do I need to dive into the UEFI settings to find out? Or is there an easy sysfs/log thing I can check?
22:08karolherbst: I think you actually have to boot with iommu=force or something
22:09karolherbst: noforce is the default
22:09karolherbst: just wondering as you spoke about iommu related issues
22:09karolherbst: ehh actually soft is the default
22:09RSpliet: rhgb quiet ath9k.bt_ant_diversity=1 nouveau.debug=pmu=debug plymouth.splash-delay=0 acpi_enforce_resources=lax
22:09RSpliet: that's my kernel param line in the grub defaults file
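[Editor's sketch: one way to answer the "is there an easy thing I can check?" question is to look at the kernel command line for IOMMU-related parameters. The helper names below (`parse_cmdline`, `iommu_params`, `read_running_cmdline`) are made up for illustration and are not part of any tool mentioned in the chat. Note that the grub defaults file may differ from what the running kernel was actually booted with; on Linux, `/proc/cmdline` shows the latter.]

```python
# Hypothetical helpers: split a kernel command line into parameters and
# pick out anything that mentions "iommu".

def parse_cmdline(cmdline):
    """Return {parameter: value}; bare flags such as 'quiet' map to None."""
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'nouveau.debug=pmu=debug'
        # keeps 'pmu=debug' as its value.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params

def iommu_params(params):
    """Return only the parameters whose name contains 'iommu'."""
    return {k: v for k, v in params.items() if "iommu" in k.lower()}

def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as f:
        return f.read().strip()

# The parameter line quoted in the chat above.
CHAT_CMDLINE = ("rhgb quiet ath9k.bt_ant_diversity=1 nouveau.debug=pmu=debug "
                "plymouth.splash-delay=0 acpi_enforce_resources=lax")

if __name__ == "__main__":
    print(iommu_params(parse_cmdline(CHAT_CMDLINE)))
```

[For the line quoted in the chat, no parameter name contains "iommu", so nothing on it forces or disables the IOMMU explicitly.]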
22:11karolherbst: whatever that acpi_enforce_resources thing does
22:11karolherbst: but yeah...
22:11karolherbst: sounds okay
22:12RSpliet: I thought it was about not having the IOMMU driver trip up during boot, but when I search for it it's all about sensors.
22:12karolherbst: yeah no clue
22:12karolherbst: what do you have an iommu for anyway
22:13RSpliet: Turns out that the UEFI's fan speed control for the CPU fan also doesn't work and just has it run at max RPM. The solution is to plug it into one of the other fan headers and have a userspace daemon do it for me.
22:14karolherbst: no offense, but your system sounds like shit
22:14RSpliet: Wait 'till you smell it
22:14karolherbst: I just blame everything on your motherboard and you'd probably accept it anyway
22:15RSpliet: Yeah, you're pretty much right about all of that, although the nouveau issues feel like carefully broken command streams.
22:16karolherbst: so you had issues with other PCIe devices, do I remember correctly here? :p
22:16RSpliet: Oh, did I tell you that the SATA controller makes Samsung SSDs puke? Hans de Goede recently wrote up a quirk to disable NCQ on my motherboard for Samsung Evo SSDs
22:16karolherbst: please tell me more
22:17karolherbst: oh wow
22:17RSpliet: I got 99 comments but a...
22:18karolherbst: RSpliet: I kind of doubt that.. well.. kepler is broken somehow, but you never know
22:18RSpliet: truth be told, the fault seems to be on Samsung's end, but still
22:18karolherbst: did you upgrade to fedora 35 already?
22:18karolherbst: so I guess it was always like this or just started at some point?
22:19RSpliet: Well... previously my machine was too shit to run into these issues with the graphics card. It'd crash way before that
22:19karolherbst: maybe your GPU is broken too
22:19RSpliet: Yeah, I wouldn't 100% rule out a dud RAM chip by now.
22:21RSpliet: Oh here's one for you! https://paste.centos.org/view/2b809b68
22:21RSpliet: Wait... how is this august 29? I reboot this machine way more often than that
22:22karolherbst: ohhhhhh wait
22:22karolherbst: ahh nvm
22:22RSpliet: Oh I'm just a tool with journalctl
22:22karolherbst: we still have this weirdo fifo bug on kepler though
22:22karolherbst: so maybe our ctxsw firmware is busted
22:23karolherbst: maybe also mesa is broken in some weirdo way
22:23karolherbst: I think the most relevant question here is: does it work with the nvidia driver
22:26RSpliet: Recent kernel logs are clean as a whistle. Well, apart from one mention of
22:26RSpliet: nov 03 10:58:02 Tuvok kernel: nouveau 0000:01:00.0: fifo: CHSW_ERROR 00000004
22:26RSpliet: nov 03 10:58:03 Tuvok kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 
22:26RSpliet: On the 3rd of November
22:26karolherbst: ahh.. that's the weirdo ctxsw error we have...
22:27RSpliet: Bit suspicious though. Every log ends with nov 03 22:00:33 Tuvok systemd-shutdown: Sending SIGTERM to remaining processes...
22:27karolherbst: I saw that issue on a gk107 as well, or gk106 or something
22:27RSpliet: Even though half my shutdowns are definitely magic sysrq induced
22:29RSpliet: Ooo, I got one of those sched errors without the chsw_error
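[Editor's sketch: rather than dumpster-diving by hand, the fifo errors could be pulled out of saved journal text with a small script. This assumes the lines look exactly like the two quoted above; the regular expression and the function names (`fifo_events`, `unpaired_sched_errors`) are illustrative only, and nothing here interprets what the error codes mean.]

```python
import re

# Matches lines of the shape quoted in the chat, e.g.
#   "... kernel: nouveau 0000:01:00.0: fifo: CHSW_ERROR 00000004"
FIFO_RE = re.compile(
    r"nouveau (?P<dev>\S+): fifo: (?P<kind>CHSW_ERROR|SCHED_ERROR) (?P<code>[0-9a-fA-F]+)"
)

def fifo_events(log_text):
    """Return a list of (device, kind, code) tuples in log order."""
    events = []
    for line in log_text.splitlines():
        m = FIFO_RE.search(line)
        if m:
            events.append((m.group("dev"), m.group("kind"), m.group("code")))
    return events

def unpaired_sched_errors(events):
    """Return SCHED_ERROR events not immediately preceded by a CHSW_ERROR."""
    unpaired = []
    previous_kind = None
    for event in events:
        if event[1] == "SCHED_ERROR" and previous_kind != "CHSW_ERROR":
            unpaired.append(event)
        previous_kind = event[1]
    return unpaired

# The two log lines quoted in the chat.
SAMPLE = """nov 03 10:58:02 Tuvok kernel: nouveau 0000:01:00.0: fifo: CHSW_ERROR 00000004
nov 03 10:58:03 Tuvok kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08
"""

if __name__ == "__main__":
    events = fifo_events(SAMPLE)
    print(events)
    print(unpaired_sched_errors(events))
```

[Feeding it the saved kernel log text would list every such error, and `unpaired_sched_errors` picks out the "sched error without the chsw_error" case mentioned just above.]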
22:37karolherbst: RSpliet: anyway.. the point is, you might be hitting those weirdo firmware bugs or whatever it is
22:43RSpliet: Yes... I still have a headache from massaging that firmware back in 2016. They're a PITA to debug.
23:52karolherbst: RSpliet: well....... we could write a debugger
23:52karolherbst: afaik we know everything we need for that
23:53karolherbst: I even wrote a bash script once :D had an included disassembler and shit
23:53karolherbst: could even single step
23:54karolherbst: not sure how hard it would be to add a new target to gdb