03:12_171_: I want to start working on the nouveau driver. I read most of the IntroductoryCourse on the website, but I still have some questions! Is this the right place to ask them?
03:16pabs3: yes (I'm not a nouveau dev though)
03:19_171_: Okay, first I'd like to know if there's a good place where I can get some more up-to-date information. Most of the links on the website seem to be very old, so I found it hard to tell what was still relevant today and what may no longer apply.
03:21_171_: For example, I found this page (https://nouveau.freedesktop.org/NVC0_Firmware.html) which talks about extracting firmware from Maxwell cards, which is what I have, but I also have a bunch of Nvidia firmware installed already, so is extracting the firmware yourself still necessary?
03:23pabs3: re firmware, in some cases yes as the firmware is not redistributable
03:23pabs3: I expect updating the website would be a great way to contribute
03:24pabs3: I have zero idea if Maxwell cards need non-redistributable firmware though
03:24_171_: How would I find that out?
03:27pabs3: I guess wait for one of the devs, they would probably know or know how to find out
03:31_171_: I might work on the website, but I really want to figure out why nouveau doesn't work properly with my GPU first. It's an NV118 chip on a laptop. I started looking at the code but when I found that page, I thought maybe I need to extract some firmware myself to make it work, so now I'm not sure what to do...
03:31_171_: I guess I'll just wait for an answer here!
03:59pabs3: according to the CodeNames and FeatureMatrix it sounds like it should work: https://nouveau.freedesktop.org/FeatureMatrix.html https://nouveau.freedesktop.org/CodeNames.html
04:10HdkR: GM2xx and above need non-redistributable blobs
04:11pabs3: _171_: maybe start by posting here what "doesn't work properly" means, do you get some errors in dmesg? check out https://nouveau.freedesktop.org/TroubleShooting.html
04:11HdkR: Which Nvidia has released most to allow access to the 3D engine, but PMU is never unless you have Tegra in the product name
04:11HdkR: most families*
04:12pabs3: and my GT 740 has both the nouveau open source firmware, and non-redistributable firmware that is (mostly) not needed any more
04:13_171_: Okay, I'll try extracting the firmware and see if it gets any better, it probably can't hurt!
04:14_171_: I'll come back here if I still have questions.
04:15_171_: Also, the page talks about driver version 340.32 but the one that works for my GPU is version 460.32. Does that matter?
04:16HdkR: Your card seems to be GM1xx though, which shouldn't need it?
04:16_171_: Yes, it's GM108M
04:16HdkR: It uses that version because the extraction script doesn't understand newer drivers
04:16_171_: Yes, there's no extraction script for the newer one?
04:17_171_: ...or is all the firmware for my GM108M already available anyway?
04:18HdkR: GM1xx was before Nvidia started signing all the firmwares
04:18_171_: Oh, I see, so it's the signed firmware that's not redistributable.
04:19HdkR: Well, even the non-signed ones couldn't be distributed which is why Nouveau usually RE'd and generated their own
04:19HdkR: With signing you just...can't
04:20_171_: So then, the only point of failure would be nouveau itself and my system configuration, correct?
04:20_171_: Oh, I see.
04:20_171_: Does the nouveau firmware come with the kernel or is it a separate package?
04:21HdkR: They might be in https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
04:21HdkR: I don't remember though
04:22pabs3: IIRC the open nouveau firmware is likely generated by the kernel and or mesa at runtime rather than pre-built
04:22HdkR: That answers that
04:22_171_: Ah, okay. I have those installed already.
04:23HdkR: Your GPU is also one of the rarer ones, so Nouveau may have more quirks with it than one would expect
04:24_171_: Oh, then I guess I'll have to get to work!
04:25_171_: Is there any documentation on the kernel code besides just the code itself?
04:25HdkR: Not sure, you're exhausting my knowledge as a non-nouveau dev :)
04:26pabs3: I guess just what is on the wiki and in linux.git and mesa.git
04:26_171_: Alright, I'll look around. Thank you for answering my questions!
09:04ccr: meh. remembered to check the Power Macs at work, the working ones turned out to be way newer (2008) Xeon-based ones instead of G5, just using the same case style. :|
09:05HdkR: The cheese grater lasted for a while :D
09:07ccr: heh. admittedly the case style is quite cool. too bad it's way too custom to easily reuse for something (PSU mainly being a problem).
09:09HdkR: There are ways to mod it to fit standard components but it requires some metalworking skills
09:09ccr: I was just hoping that if they had been G5, I could've probably snatched one for helping test nouveau on BE platform
09:10ccr: (and AMD/ATI too perhaps, as they seemed to have Radeon XT2600s installed)
09:10HdkR: And of course, if you want a Mac case that supports ATX, might as well get the latest flashiest one right? https://www.dunecase.com/gallery.html
10:27ccr: heh. my simple OpenGL test program is faster on this NV50 using Zink over nouveau rather than nouveau GL directly.
10:27imirkin: that's a sad comment on nouveau... llvmpipe is faster than hw?
10:28ccr:rubs his head in wonder
10:28imirkin: (since zink uses vk, there's no other impl for it to use, presumably)
10:28ccr: well, it's lavapipe apparently
10:28imirkin: anyways, cpu rendering is definitely faster in many cases
10:28imirkin: that's why i think the whole push of drawing 2d gui elements with GL is asinine
10:29ccr: have to agree there
10:31ccr: nouveau: 23825 ms total for 497 total frames = 20.86 FPS average
10:31ccr: lavapipe: 12548 ms total for 615 total frames = 49.01 FPS average
10:31ccr: of course with nouveau the CPU is not pegged at high usage :P
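
The two averages quoted above can be re-derived from the totals ccr pasted. This is only a sketch of the arithmetic (frames divided by elapsed seconds); the function name is made up for illustration and is not part of ccr's test program.

```python
def average_fps(total_ms: float, frames: int) -> float:
    """Average frames per second over a run lasting total_ms milliseconds."""
    return frames / (total_ms / 1000.0)


# Totals as pasted in the channel.
nouveau_fps = average_fps(23825, 497)
lavapipe_fps = average_fps(12548, 615)

print(f"nouveau:  {nouveau_fps:.2f} FPS")
print(f"lavapipe: {lavapipe_fps:.2f} FPS")
print(f"lavapipe / nouveau: {lavapipe_fps / nouveau_fps:.2f}x")
```
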
10:32karolherbst: try drawing a desktop at 4K resolution :p
10:34ccr: but then again modesetting+glamor apparently sucks with that, https://gitlab.freedesktop.org/drm/intel/-/issues/2905#note_772518 .. maybe wayland does it better, dunno?
10:35ccr: not that the above has necessarily anything to do with cpu vs gpu but just glamor
12:18karolherbst: ccr: the biggest problem is, that GPU freq scheduling is just broken
12:18karolherbst: and CPU
12:19karolherbst: which explains 99% of the perf issues
12:20karolherbst: forcing max clocks all the time makes it all very smooth, at least for me
12:20karolherbst: but yeah, GL CPU overhead is quite huge
12:25karolherbst: imirkin: nouveau kills my wifi connection :/
12:25karolherbst: something is very wrong with our IRQ handling
12:26karolherbst: imirkin: well.. running 100k deqp tests single threaded
12:26karolherbst: but sometimes something waits too long
12:26karolherbst: and the wifi controller gets reset
12:26imirkin: ah. i thought you were going to say due to RF interference
12:26imirkin: that'd be funnier.
12:26karolherbst: yeah.. would be
12:27karolherbst: it might be a coincidence, but this happens mostly only when I do this deqp run :D
12:27imirkin: like some town in the UK which lost internet because some old TV would get turned on, which caused some sort of crazy interference.
12:27imirkin: (there was a BBC story about it)
12:29karolherbst: I've heard about trains killing power because of lacking isolation and stuff, but this is also quite fun
12:32karolherbst: but yeah.. I think we are blocking IRQ handling with our timeouts
12:32karolherbst: that's something we have to fix anyway
12:32karolherbst: it's super annoying
12:32karolherbst: we should get rid of all timeouts in those code paths
12:32karolherbst: and offload stuff to worker threads
12:33imirkin: certainly anything with a nvkm_msec thing needs to be in a worker thread, yea
12:33imirkin: and should trigger lockdep things if you try to use in atomic context
12:33karolherbst: imirkin: anyway, the annoying part is not that my wifi connection resets, but because my VPN goes down as well :D
12:33karolherbst: yeah.. 2FA auth
12:34imirkin: auth is annoying.
12:34imirkin: should just let everyone in
12:34imirkin: much simpler that way
12:34karolherbst: but my laptop has actually a smartcard reader
12:34karolherbst: maybe I could do something there
12:35karolherbst: much easier than having to type in some 6 digit code
12:35imirkin: oh, it's the stupid RSA thing?
12:35imirkin: RSA token
12:35karolherbst: we have tokens, but we also have TOTP
12:35imirkin: yeah, the RSA token is TOTP
12:35karolherbst: ahh yeah, but I have a physical token and a TOTP token on my phone
12:35imirkin: i mean, i guess not the same as the official TOTP presumably
12:36karolherbst: there are many different methods :)
12:36imirkin: but same idea
12:36imirkin: of something time-based
12:36karolherbst: but I guess I could have the token on my laptop and do some magic there, but that kind of defeats the purpose
12:36karolherbst: smartcard would be fun
12:36imirkin: at G we had these things where you typed in a code, and it would give you a token. had to use that to ssh in (this is before VPN was a thing ... now it's all VPN and I assume more seamless)
12:37karolherbst: we have a full SSO rollout
12:37karolherbst: so you log into the VPN, and do kerberos all the way
12:37karolherbst: even for ssh
12:37karolherbst: it's quite nice
12:38karolherbst: I think we even got rid of all services with manual kerberos login by now
12:39karolherbst: so even logins to third party services go through our SSO
14:41peetah: hello, I'm using a GeForce GTX 1060 6GB with the intel HD2000 gpu of the motherboard of a DELL optiplex 990, but I can't manage to quiet the fan of the nvidia card: pwm1_enable is set to -1 and trying to change it results in an Invalid argument error
14:41peetah: even when I disable the nvidia card completely with "echo 1 > remove", it disappears from lspci output but the fan is still on
14:42peetah: does someone have a clue about how to shutdown the fan or at least slow it down so that it can be less noisy ?
14:42imirkin: peetah: nouveau can't control the fans
14:42imirkin: they require signed firmware to control
14:42imirkin: and the firmware that nvidia provides only does automatic fan control
14:43peetah: that's bad :(
14:44imirkin: thank you for choosing nvidia. we appreciate you have a choice of gpu vendors, and you appear to have made the wrong one.
14:44peetah: did not choose :) I recycled an old tower from a friend
14:45RSpliet: not a chance that the 990 can be dropped into runpm?
14:45RSpliet: sorry, the NVIDIA GPU inside the Optiplex 990
14:45imirkin: yeah, figured that HD2000 and GTX 1060 was something odd going on...
14:45imirkin: HD2000 = sandybridge, i.e. 10yo
14:46RSpliet: Ah oh, nah if the displays are connected to the NVIDIA GPU, then runpm is a no-go :-D
14:46imirkin: also if it's not a laptop, then runpm is a no-go
14:46peetah: no, the display comes from the intel gpu
14:46ccr:is tempted to edit "he seems to have chosen ... poorly" meme with GPUs
14:47imirkin: ccr: well, i'm just recycling the thing from south park
14:47imirkin: where the plane is crashing
14:50peetah: what about shutting down the card completely as described above, but the fan continues to spin ?
14:50imirkin: peetah: hint: you're not shutting it down completely
14:50imirkin: you're just telling linux to forget it exists
14:50imirkin: linux forgot. but the card remembers!
14:51imirkin: i've heard stories of motherboards existing where you can actually shut down slot power to cards and remove them while the thing is running, but they don't exactly sell those at best buy
14:51peetah: so there is no way to cut the power of the pci express slot that hosts this card ?
14:51imirkin: i mean ... wire cutters / soldering iron?
14:52imirkin: or you could just unplug the card
14:52RSpliet: not while running presumably
14:52imirkin: what fun is that?!
14:52RSpliet: I'm a grinch at parties too
14:53peetah: I was hoping to be able to keep it in order to activate it only when required but it appears that removing it is the only solution left
14:53imirkin: laptops usually have the capability to turn off the GPU
14:53imirkin: they expose this stuff in ACPI, and you call some method to make it happen
14:54peetah: I already checked ACPI methods and there seems to be nothing relevant there
14:54RSpliet: peetah: you still have the option of installing the closed-source driver. nouveau on that GPU isn't much use in the first place if it's not going to drive displays
14:54imirkin: yeah, coz it's not a laptop
14:55peetah: RSpliet: don't want to :) I really do not have a real need for this card, it was only an option if nouveau was able to handle it correctly
14:56RSpliet: peetah: cool. Just wanted to make sure nobody overlooks the obvious :-)
14:57peetah: anyway, imirkin, RSpliet, thanks for your insights !
17:34_171_: Okay, I came here tomorrow talking about how I had problems with my GPU but I didn't specify what they were! So, here you go:
17:36imirkin: time-traveler, eh
17:36_171_: I have a PRIME laptop with an Intel integrated graphics chip (HD Graphics 5500) and an Nvidia GPU (GM108M). From what I can tell, only the Intel chip is connected to the display. If I try to start X with just the Nvidia card, nouveau segfaults and X crashes.
17:36_171_: I meant yesterday, sorry.
17:37_171_: So here are my problems.
17:37imirkin: why are you trying to do that? don't do that.
17:37_171_: I just did it for testing.
17:37imirkin: if there are no outputs on the primary gpu, then it's not great
17:38imirkin: you're much better off using the intel chip
17:38imirkin: and you can use DRI_PRIME to offload specific applications
17:38imirkin: GM108 should support reclocking, so you should be able to get half-decent perf out of it
17:39_171_: Whenever I boot, nouveau fills my screen with a bunch of errors, most of them have something to do with fifo operations it seems.
17:40_171_: Yes, that's what I've been doing.
17:40imirkin: there was someone who had an issue
17:40imirkin: with the GM108 coming up weirdly-clocked
17:40imirkin: and their issue was solved by forcing the reclock on start with nouveau.config=NvClkMode=7
17:40_171_: Also, anything I do with it is ungodly slow.
17:40imirkin: well, if you're getting those errors, that makes sense
17:41_171_: Okay, maybe I'll try that. Is that safe, though? I don't want to overheat it!
17:41imirkin: safe enough
17:41imirkin: 7 is probably the default mode anyways
17:41imirkin: you can check the list of available pstates
17:41imirkin: in /sys/kernel/debug/dri/1/pstate
17:42_171_: How do I tell which one is in use?
17:42imirkin: the last line
17:43imirkin: AC: ...
17:43imirkin: but the others are the 'defined' ones
17:43_171_: Oh, it says 0 MHz lol
17:43imirkin: that's coz it's off
17:43imirkin: if you run like glxgears on it
17:43imirkin: you should get the info
17:43_171_: Oh, okay.
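
As described above, the pstate file lists the defined states first and the state in use on the last line, which starts with "AC:". A minimal sketch of splitting the two apart follows. The sample text is made up for illustration (the real contents differ per card), and the function name is hypothetical; on a running system one would read the text from /sys/kernel/debug/dri/1/pstate as root instead.

```python
def split_pstate(text: str):
    """Split pstate text into (defined states, state currently in use).

    Assumes every line has the form '<label>: <description>' and that the
    state in use is the line labelled 'AC', as described in the chat.
    """
    defined = {}
    current = None
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not '<label>: ...'
        label, rest = label.strip(), rest.strip()
        if label == "AC":
            current = rest
        else:
            defined[label] = rest
    return defined, current


# Made-up sample; real files will show different labels and clocks.
SAMPLE = """07: core 405 MHz memory 810 MHz
0f: core 1000 MHz memory 2000 MHz
AC: core 405 MHz memory 810 MHz
"""

defined, current = split_pstate(SAMPLE)
print("defined:", defined)
print("in use: ", current)
```
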
17:45_171_: Yes, it goes to 405 MHz if I run something, but I don't think you understand what I mean when I say it's slow.
17:45imirkin: i understand what you mean.
17:45imirkin: try the thing i said.
17:45imirkin: add that to your kernel cmdline
17:45_171_: Just running trivial stuff like lspci or glxinfo takes a couple seconds.
17:45imirkin: or make sure that option is set some other way
17:45imirkin: well - it has to get turned on
17:45imirkin: since it auto-suspends after 5s iirc
17:46_171_: Yeah, I'll try.
17:46_171_: It also doesn't render anything.
17:46karolherbst: _171_: you need a compositor
17:46_171_: I have picom.
17:46imirkin: not with dri3
17:46imirkin: which you should have with intel i think
17:47karolherbst: ohh.. right, my mistake
17:47_171_: Is it enabled by default?
17:47karolherbst: _171_: the couple of seconds are because the GPU needs to be fully booted
17:47karolherbst: this takes a while
17:47karolherbst: but what do you mean by "it doesn't render anything"?
17:48_171_: Well, I started glxgears a few seconds ago and still nothing.
17:48_171_: It's a black window so far.
17:48karolherbst: okay, that's wrong then
17:49_171_: I tried to run a benchmarking program called glmark2 on it, and it would report 0 FPS.
17:49_171_: Also, my CPU usage sometimes spikes to 100% on whatever core it's running on.
17:50karolherbst: mhh, sounds like something goes very wrong
17:50karolherbst: mind pastebining the output of "dmesg"?
17:50_171_: Yeah, like right now glxgears is at 100% usage.
17:50imirkin: or you could do the thing i said.
17:50karolherbst: might be
17:51_171_: I have a bunch of dmesg logs saved, so I'll do that.
17:51karolherbst: but yeah.. we have those random GPUs which need to be reclocked once in order to work, but it is still unknown why that actually helps
17:52_171_: Okay, the specific model is GeForce 940M, if that helps.
17:52RSpliet: if vblank is disabled, glxgears is expected to approach 100% CPU. It's heavily CPU-bound.
17:52_171_: RSpliet it does that with anything I run, even glxinfo.
17:53karolherbst: yeah.. sounds like the GPU doesn't behave as expected
17:56_171_: my glxgears stdout gives me 1-3 fps with nothing rendered.
17:56RSpliet: which GPU, I wonder.
17:57RSpliet: If I try to use glxgears over prime, my dmesg gets absolutely littered
17:57imirkin: GM108, as per above
17:57imirkin: _171_: can you try the thing i said?
17:57_171_: Yeah, nouveau reports GM108 and lspci reports GM108M but I don't think it makes a difference.
17:57RSpliet: imirkin: https://paste.centos.org/view/f45bb7b3
17:57_171_: I will, I just want to post my dmesg logs first.
17:57RSpliet: littered I said
17:57imirkin: RSpliet: wtf did you do...
17:58_171_: This is a log from booting https://pastebin.com/QyhZMK0u
17:58imirkin: RSpliet: looks like a pcie link training issue ... or something
17:58RSpliet: imirkin: DRI_PRIME=1 vblank_mode=0 glxgears
17:58RSpliet: that's all I did
17:59_171_: This one I think is from starting X https://pastebin.com/WUtWSGkr
17:59_171_: It's also kind of what my dmesg looks like right now!
18:01_171_: Another one from booting I think https://pastebin.com/2eb14Xcs
18:02imirkin: RSpliet: ok. well it's some sort of PCI issue. you're getting error reports of physical problems.
18:02imirkin: RSpliet: if you don't like those logs, disable PCI AER
18:03imirkin: RSpliet: could be that we push the link to too high a rate
18:03imirkin: and the pci controller craps out mid-way
18:03_171_: Another one from booting https://pastebin.com/s7Sjcmxc
18:03_171_: Are you talking to me imirkin?
18:04imirkin: _171_: the only comment i have for you is to do the thing i said.
18:04imirkin: my comments were for RSpliet, as indicated by the "RSpliet:" prefix.
18:05_171_: Oh sorry, I didn't see.
18:05_171_: Last one, from booting also https://pastebin.com/kdA4yUrh
18:06imirkin: i keep repeating it, but seemingly to no avail. you probably don't want my help, which is fine too.
18:06_171_: I do, I just wanted to post these, I'll be rebooting now.
18:18_171_: Okay so I just rebooted with nouveau.config=NvClkMode=7
18:19_171_: I get less messages on boot, only one error line.
18:20_171_: I don't have it in the logs I just saved for some reason, so I'll come back to post it.
18:21_171_: My wireless chip started erroring out in the terminal, which I think someone talked about.
18:22_171_: Starting X takes longer, lspci still hangs, although I'm pretty sure that it's because the GPU takes time to boot, because when I do multiple lspci's in a row, only the first one hangs.
18:22_171_: Also, quitting out of X forced my computer to reboot??
18:23_171_: And now I've been running glxinfo for 2 or 3 minutes and still hasn't finished.
18:23imirkin: the wireless thing was due to interrupt handler taking too long
18:23imirkin: do you have a dmesg from this last boot?
18:23_171_: Yeah, that's what it seems like.
18:23_171_: The wireless one or the current one?
18:23imirkin: the current one
18:24imirkin: sounds like your issues don't match what this other person had though
18:24imirkin: oh, that person also disabled runpm
18:24imirkin: maybe try that as well?
18:24imirkin: nouveau.runpm=0 nouveau.config=NvClkMode=7
18:24_171_: Yes, I think I have the log, let me post it.
18:25_171_: Okay, I'll try that.
18:25imirkin: although iirc he needed the runpm=0 for the dumb reason that we don't respect NvClkMode when resuming from runpm or something like that?
18:26_171_: Here it is https://pastebin.com/N0cSBN0u
18:26imirkin: [ 7.970049] nouveau 0000:04:00.0: bus: MMIO read of 00000000 FAULT at 6013d4 [ IBUS ]
18:26imirkin: don't worry about that
18:26imirkin: that's normal.
18:27imirkin: it's a major pain to get rid of that error unfortunately
18:27imirkin: but it doesn't hurt anything
18:27_171_: Also the kernel name is because I wanted to do some nouveau development and compiled my own kernel for it, although it's mostly just a generic kernel with a bunch of modules disabled. Nothing that's nouveau-relevant was changed.
18:27_171_: What's that error for?
18:27imirkin: that's fine
18:28imirkin: it's some stupid access we do early in devinit to see if ... something
18:28_171_: To see what?
18:28imirkin: i talked to skeggsb about it a long time ago, we didn't agree on a way of fixing it
18:28imirkin: don't remember, trying to find it now
18:28imirkin: basically 0x3d4 io port is part of display, and GM108 doesn't have a display unit at all
18:28imirkin: hence the error
18:29_171_: Oh, okay.
18:29_171_: Alright I'll reboot again with no runpm.
18:31imirkin: the 6013d4 thing is this: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/-/issues/471#note_354200
18:33imirkin: hmmm ... although i don't see the lockvga in there anymore?
18:33imirkin: oh duh. it's in nvkm_devinit_preinit directly
18:44_171_: Okay, I just rebooted with runpm=0 and config=NvClkMode=7 and it works great!
18:45_171_: I don't get any errors in the terminal, X starts instantly and glxgears actually runs normally.
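
To confirm after a reboot that both options from above actually reached the kernel, one can look for them on the kernel command line. This is a small sketch: the sample command line is invented for illustration (only the two nouveau options come from the chat), and the function name is hypothetical. On a running system the string would come from /proc/cmdline.

```python
def nouveau_options(cmdline: str) -> dict:
    """Collect 'nouveau.<key>=<value>' tokens from a kernel command line."""
    opts = {}
    for token in cmdline.split():
        if token.startswith("nouveau."):
            # Split on the first '=' only, so 'config=NvClkMode=7'
            # becomes key 'config' with value 'NvClkMode=7'.
            key, _, value = token[len("nouveau."):].partition("=")
            opts[key] = value
    return opts


# Invented sample; on a real system use open("/proc/cmdline").read().
SAMPLE_CMDLINE = "root=/dev/sda1 ro quiet nouveau.runpm=0 nouveau.config=NvClkMode=7"

opts = nouveau_options(SAMPLE_CMDLINE)
print(opts)
print("runpm disabled:", opts.get("runpm") == "0")
print("reclock forced:", opts.get("config") == "NvClkMode=7")
```
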
18:45imirkin: see channel logs for an explanation of what the 6013d4 error is about
18:46imirkin: (see topic for link to channel logs)
18:47_171_: Okay. Thank you!
18:47_171_: Why did I have to do all that, though?
18:48RSpliet: imirkin: I just about never use this GPU in my laptop, not too bothered in day-to-day use
18:48RSpliet: it's labelled "940M"
18:48imirkin: _171_: the theory is that the bios doesn't bring up its memory in a stable state, so we have to perform a reclock
18:48RSpliet: And (with nouveau) it's about as fast as the Intel HDA GPU that this laptop also has
18:48RSpliet: e.g. the NVIDIA GPU is nothing more than a marketing gimmick :-P
18:48imirkin: RSpliet: certainly with the PCI errors
18:51_171_: Wait, the 940M is exactly what I have lol.
18:53_171_: imirkin I'm not sure what that means, why would you have to reclock for that?
18:53imirkin: _171_: reclock = set clocks
18:53imirkin: but also other memory configuration
18:54imirkin: if the onboard memory is configured incorrectly, that's going to lead to a lot of errors
18:57karolherbst: imirkin: skeggsb mentioned that it might be caused by us not doing a proper devinit and skipping stuff
18:57imirkin: could be
18:58karolherbst: sadly never had such a system available for proper testing