19:19 Lyude: karolherbst: think we should cc those pascal rpm fixes to stable? I should probably go test it a little more thoroughly on the laptops that we've got lying around, but I think if we can verify this is a safe change (and I'd be surprised if it wasn't tbh) it's probably stable material
19:19 karolherbst: probably
19:20 karolherbst: but again, I kind of prefer a fix inside the pci subsystem :p
19:20 Lyude: mm, once i've finished looking through all of these I'll do a quick peek to see if I can figure out where, if anywhere, the pci subsystem can/should do this
19:20 karolherbst: there is obviously something wrong on the pcie level
19:21 karolherbst: and most likely the GPU and the controller can't really communicate
19:21 karolherbst: Lyude: there might be something to reprobe the connection... but I didn't find anything to do that
19:21 karolherbst: maybe the hardware can, but it's not wired up?
19:21 karolherbst: dunno
19:23 Lyude: karolherbst: looking at the code in drm/nouveau/nvkm/subdev/pci/gk104.c, I think whether or not this is a bug in the pcie subsystem might depend on what the registers we use in gk104_pcie_max_speed() and gk104_pcie_set_link_speed() actually are (e.g. if they're part of standard PCIe config space, or really just some random mmio bits)
19:24 karolherbst: starting with kepler those are more or less random mmio bits
19:24 karolherbst: this was different with pre kepler
19:25 karolherbst: some still are within the pci config space though
19:25 karolherbst: Lyude: the pci config space is mapped at 0x88000
19:25 Lyude: mmmm yeah, I think you might be right then
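karolherbst says the GPU mirrors its PCI config space into BAR0 at 0x88000, which suggests a quick way to sort registers into "standard config space" versus "vendor MMIO bits". A minimal sketch, assuming the mirror covers the full 4 KiB PCIe extended config space; the window size and the helper names `cfg_to_mmio()` and `mmio_is_cfg_mirror()` are hypothetical, not nouveau code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* BAR0 offset at which the GPU mirrors its PCI config space (per the log). */
#define NV_PCI_CFG_MIRROR_BASE 0x088000u
/* Assumption: the mirror covers the full 4 KiB PCIe extended config space. */
#define NV_PCI_CFG_MIRROR_SIZE 0x1000u

/* Hypothetical helper: translate a config-space offset to its BAR0 alias. */
static uint32_t cfg_to_mmio(uint32_t cfg_off)
{
	assert(cfg_off < NV_PCI_CFG_MIRROR_SIZE);
	return NV_PCI_CFG_MIRROR_BASE + cfg_off;
}

/* Hypothetical helper: true if a BAR0 offset lies inside the (assumed)
 * mirror window, i.e. it aliases a config-space register rather than
 * being a vendor-specific MMIO register. */
static bool mmio_is_cfg_mirror(uint32_t mmio_off)
{
	return mmio_off >= NV_PCI_CFG_MIRROR_BASE &&
	       mmio_off < NV_PCI_CFG_MIRROR_BASE + NV_PCI_CFG_MIRROR_SIZE;
}
```

Feeding the offsets used by gk104_pcie_max_speed() and gk104_pcie_set_link_speed() through `mmio_is_cfg_mirror()` would show which bucket each one falls into, provided the window-size assumption holds.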
19:26 karolherbst: anyway, that's what nvidia is doing as well
19:26 karolherbst: and with reclocking disabled we don't even touch those except for the v1 -> v2 transition
19:26 karolherbst: and this is only important for tesla boards
19:26 karolherbst: might have seen it on fermi as well
19:26 karolherbst: (and on kepler, if you mess with the device prior to loading nvidia)
19:28 Lyude: ...hm, does the pcie subsystem even have functions to change the link rate?
19:28 Lyude: I see some to query it at least
19:30 karolherbst: dunno
19:30 Lyude: doesn't look like it does
19:30 karolherbst: I think it has
19:30 Lyude: or I'm not finding it with grep anyway
19:31 Lyude: karolherbst: what I'm the most curious about here is whether or not there actually is a standard way to change this, because if there isn't that's probably why the PCI subsystem doesn't do it on its own, maybe there's some device-specific magic here
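For reference, the PCIe spec does define a generic mechanism for changing the link rate, driven from the Downstream Port (e.g. the root port the GPU sits behind), not from the endpoint: write the Target Link Speed field in Link Control 2, set Retrain Link in Link Control, then wait for Link Training to clear in Link Status. The sketch below only computes the register values; the actual config accesses (in the kernel, e.g. pcie_capability_read_word()/pcie_capability_write_word()) are left out, and the helper names are hypothetical. Whether the PCI subsystem exposed a ready-made helper for this at the time is not settled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field masks from the PCIe spec (names follow include/uapi/linux/pci_regs.h). */
#define PCI_EXP_LNKCTL_RL   0x0020 /* Link Control: Retrain Link */
#define PCI_EXP_LNKSTA_LT   0x0800 /* Link Status: Link Training in progress */
#define PCI_EXP_LNKCTL2_TLS 0x000f /* Link Control 2: Target Link Speed */

/* Speed encodings: 1 = 2.5 GT/s, 2 = 5.0 GT/s, 3 = 8.0 GT/s, 4 = 16.0 GT/s */

/* Step 1: Link Control 2 value with the Target Link Speed field replaced. */
static uint16_t lnkctl2_set_target_speed(uint16_t lnkctl2, uint8_t speed)
{
	return (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) |
			  (speed & PCI_EXP_LNKCTL2_TLS));
}

/* Step 2: Link Control value that asks the downstream port to retrain. */
static uint16_t lnkctl_request_retrain(uint16_t lnkctl)
{
	return (uint16_t)(lnkctl | PCI_EXP_LNKCTL_RL);
}

/* Step 3: retraining has finished once Link Training reads back as 0. */
static bool lnksta_training_done(uint16_t lnksta)
{
	return !(lnksta & PCI_EXP_LNKSTA_LT);
}
```

Only the low four bits of Link Control 2 are touched in step 1, so any other fields in that register are preserved across the write.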
19:32 karolherbst: oh, but the speed isn't the actual problem here
19:32 karolherbst: I think something like this happens: controller assumes the link is at 2.5, because that's what the device used before suspending
19:32 karolherbst: now the device comes up and assumes 8.0
19:32 karolherbst: controller still thinks 2.5
19:32 karolherbst: => bad things happen
19:33 Lyude: karolherbst: mm, makes sense
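One way to test this theory is to compare the Current Link Speed that each end of the link reports in its Link Status register after resume, e.g. the LnkSta lines from `lspci -vv` for the root port and the GPU, or pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &val) in the kernel. A minimal sketch of the decode and comparison; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_LNKSTA_CLS 0x000f /* Link Status: Current Link Speed */

/* Decode Current Link Speed into tenths of GT/s; 0 means unknown/reserved. */
static unsigned int lnksta_speed_x10(uint16_t lnksta)
{
	switch (lnksta & PCI_EXP_LNKSTA_CLS) {
	case 1: return 25;  /* 2.5 GT/s */
	case 2: return 50;  /* 5.0 GT/s */
	case 3: return 80;  /* 8.0 GT/s */
	case 4: return 160; /* 16.0 GT/s */
	default: return 0;  /* reserved encoding, or an all-ones failed read */
	}
}

/* Both ends of a single link should agree on its current speed. */
static bool link_speeds_disagree(uint16_t port_lnksta, uint16_t gpu_lnksta)
{
	return lnksta_speed_x10(port_lnksta) != lnksta_speed_x10(gpu_lnksta);
}
```

If the GPU has already dropped off the bus, its config reads typically return all ones (0xffff), which this decodes as unknown rather than as a valid speed.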
19:33 karolherbst: anyway, I am no expert with PCIe, so that's why I would rather ask some PCIe people about that
19:33 Lyude: yeah, agreed
19:33 karolherbst: Lyude: the big issue is that the GPU itself changes the link
19:34 karolherbst: there is no driver interaction (except my workaround)
19:34 Lyude: karolherbst: that makes me curious of something
19:34 karolherbst: we have the signed devinit script inside the vbios
19:34 karolherbst: and a signed pmu image with a devinit script parser
19:34 Lyude: karolherbst: have you seen drivers/pci/pcie/bw_notification.c
19:34 karolherbst: and we launch that while doing devinit
19:34 karolherbst: Lyude: nope
19:36 karolherbst: Lyude: looks like it is new?
19:36 Lyude: karolherbst: well anyway, let's cc linux-pci@vger.kernel.org and Bjorn Helgaas <helgaas@kernel.org> on the next respin of this
19:37 karolherbst: yeah
19:39 karolherbst: Lyude: ahh, found the upstream bug again: https://bugzilla.kernel.org/show_bug.cgi?id=156341
19:39 karolherbst: the cc list alone :D
19:40 Lyude: phew
19:45 Lyude: karolherbst: alright, feel free to respin and add ccs
20:05 karolherbst: Lyude: checkpatch didn't complain about the bool field
20:05 karolherbst: maybe somebody was ranting on the mailing list without being able to back that up :p
20:10 Lyude: karolherbst: oh good
20:10 Lyude: karolherbst: because I was getting sick of using u8
20:10 karolherbst: :D
20:12 karolherbst: sent
22:16 karolherbst: Lyude: actually, I think the issue has to be the controller. Because you know, GPUs can kind of detect if the controller doesn't support a certain PCIe link speed and they automatically fall back to a lower one
22:16 karolherbst: although the mechanism here might be a different one
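The fallback described above is normally the job of link training: each side advertises the speeds it supports and the link comes up at the highest one both share. A simplified model using the Supported Link Speeds vector in Link Capabilities 2 (bits 7:1); it ignores equalization and signal-integrity downtraining, and ports that leave that vector at zero (older devices, where Max Link Speed in Link Capabilities applies instead). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Link Capabilities 2: Supported Link Speeds Vector, bits 7:1.
 * Bit 1 = 2.5 GT/s, bit 2 = 5.0 GT/s, bit 3 = 8.0 GT/s, bit 4 = 16.0 GT/s. */
#define LNKCAP2_SLS_MASK 0x000000feu

/* Return the highest speed encoding (1 = 2.5 GT/s, 2 = 5.0, 3 = 8.0, ...)
 * advertised by both ends, or 0 if they share none. */
static unsigned int highest_common_speed(uint32_t port_lnkcap2,
					 uint32_t dev_lnkcap2)
{
	uint32_t common = port_lnkcap2 & dev_lnkcap2 & LNKCAP2_SLS_MASK;
	unsigned int speed = 0;

	for (unsigned int bit = 1; bit <= 7; bit++)
		if (common & (1u << bit))
			speed = bit; /* bit N of the vector maps to encoding N */
	return speed;
}
```

For example, a root port advertising 2.5/5.0/8.0 GT/s (0x0e) and a GPU advertising up to 16.0 GT/s (0x1e) would train to encoding 3, i.e. 8.0 GT/s. The open question in the log is whether the resume path goes through this negotiation at all.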
22:17 Lyude: karolherbst: should be pretty easy to check, but is it actually likely that all of these different laptops are using basically the same PCI controller?
22:18 karolherbst: probably :D
22:19 Lyude: mhm, yeah
22:19 karolherbst: I mean they use the one from the Intel CPU
22:19 karolherbst: no?
22:19 karolherbst: that's usually what the GPU is connected to
22:19 Lyude: ah right, probably
22:20 Lyude: karolherbst: https://paste.fedoraproject.org/paste/WcNgVtncegWOlNGOFlvv3g
