00:03karolherbst: the device seems to be set to 8.0 pcie link speed all the time
00:03karolherbst: but nvidia doesn't set it through the pci config anyway
00:09Lekensteyn: 0x4 (bit 2) is "DVSC" (Dynamic voltage scaling?) https://github.com/Lekensteyn/acpi-stuff/blob/master/dsl/Dell_Inc.-XPS_15_9560/ssdt5.dsl#L793
00:10karolherbst: nouveau_op_dsm_muid in nouveau's code
00:10karolherbst: at least for the UUID
00:10Lekensteyn: yes the UUID is the Optimus DSM, function 0x1a is the "Optimus Capabilities" function
00:11karolherbst: so it's probing?
00:11Lekensteyn: ...on exit?
00:12Lekensteyn: it is done only once
00:12Lekensteyn: but if it is 1 it has side-effects (which may persist?)
00:12Lekensteyn: check line 813
00:13Lekensteyn: good catch, maybe this is something
00:14karolherbst: might be
00:14karolherbst: or it's totally irrelevant...
00:14karolherbst: but still, it is an odd call
00:14karolherbst: will play around with my script tomorrow
00:14karolherbst: this makes it soo easy to hack on that stuff
00:14karolherbst: and when the GPU disappears, I just system suspend/resume :)
00:14karolherbst: no broken kernel anymore
00:15Lekensteyn: nice :)
00:15Lekensteyn: I am sure many would be very happy if you finally nail down this issue :)
00:16karolherbst: Lekensteyn: that's what I have right now: https://gist.github.com/karolherbst/0d0c369a0c14dc092fcb0f5c854dd79c :)
00:16karolherbst: in case you need it as well
00:16karolherbst: I just need a better way of calling acpi methods
00:16karolherbst: but I've been told that with debug kernels you get some kernel interface
00:17karolherbst: also this bash eval $3="'$val'" trick is the best and most evil hack I've ever seen
00:17karolherbst: it's so awesome
00:17Lekensteyn: have you used the acpidbg interface?
00:18karolherbst: not yet
00:18karolherbst: it's disabled with a normal kernel
00:18karolherbst: and I am too lazy to reboot
00:19Lekensteyn: might be worth trying if you dislike your current ACPI call workflow https://www.kernel.org/doc/html/latest/firmware-guide/acpi/aml-debugger.html
00:19karolherbst: ohh I only dislike it because I need to compile some random kernel module and load it :p
00:20karolherbst: anyway, I would have to boot a debug kernel
00:20karolherbst: which is slightly more annoying
00:20karolherbst: a git clone I can just script away
00:20karolherbst: a reboot into a debug kernel not so much
00:21Lekensteyn: also, I just looked for the 0x1a calls, it is first called with 0x6 which sets DVSR and DVSR to 1. That does have side-effects in the _ON and _OFF calls!
00:21karolherbst: the big question
00:21karolherbst: how does this d3 pci config write come into play?
00:21karolherbst: keep in mind that putting the GPU into d1 or d2 causes the same issue
00:21karolherbst: and setting the link back to 8.0 "fixes" it :/
00:22Lekensteyn: not sure how that relates to this problem. Maybe Windows changes the link speed too?
00:22Lekensteyn: keep in mind that I only trace the pci config space, not some other mmio
00:22karolherbst: yeah.. maybe? dunno
00:22Lekensteyn: so maybe the Nvidia driver does more stuff
00:23karolherbst: keep in mind that I am talking with nvidia about this issue and make your assumptions about what we've talked about :p
00:24Lekensteyn: I would inquire whether this "dynamic voltage scaling" stuff is something that could be related
00:24karolherbst: right.. might be..
00:25karolherbst: Lekensteyn: there is one interesting thing though
00:25karolherbst: right now the GPU is dead on my system.. I think
00:25karolherbst: bridge: "LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)"
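The `LnkSta` line quoted above is the field compared throughout this log. A minimal sketch of pulling speed and width out of such a line, assuming the `lspci -vv` text layout seen here; `parse_lnksta` and its regular expression are hypothetical helpers, not part of any tool mentioned in the conversation:

```python
import re

# Matches lines such as:
#   LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
# The parenthesised notes are optional because not every lspci version prints them.
LNKSTA_RE = re.compile(
    r"LnkSta:\s*Speed\s+([\d.]+)GT/s(?:\s+\((\w+)\))?,"
    r"\s*Width\s+x(\d+)(?:\s+\((\w+)\))?"
)


def parse_lnksta(line):
    """Return speed/width information from an lspci LnkSta line, or None."""
    match = LNKSTA_RE.search(line)
    if match is None:
        return None
    return {
        "speed_gts": float(match.group(1)),
        "speed_note": match.group(2),
        "width": int(match.group(3)),
        "width_note": match.group(4),
    }


if __name__ == "__main__":
    print(parse_lnksta("LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)"))
```

Comparing the parsed dictionaries before and after a suspend cycle makes a "downgraded" link easy to spot in a script.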
00:26Lekensteyn: in the default config, it reads "VGAR" (a 2000 byte region) in _OFF which gets restored in _ON. But note that this happens after PGON (where the lockup usually happens)
00:26Lekensteyn: with DVSR=DVSR=1, this save/restore is not done
00:26karolherbst: _ON calls into PGON, no?
00:27Lekensteyn: oh hmm, this is \_SB.PCI0.PEG0.PEGP._ON
00:28karolherbst: honestly.. I don't think this is related.. if I can make a bad guess, I think what's actually happening on the hardware is, that the bridge wants to reestablish communication with a 2.5 link, although the GPU is set to 8.0 and something funky happens
00:28Lekensteyn: nvm I guess, this _ON is not called when power resources are in use
00:28karolherbst: like link training failing
00:28karolherbst: or something else
00:28karolherbst: for me, "_SB.PCI0.PEG0.PG00._OFF" is called :/
00:28karolherbst: but yeah
00:30Lekensteyn: yes, PG00._OFF is the power resource that is expected, PEGP._ON is the old mechanism that is indirectly called via _PS0 (and that is not used when power resources are in use)
00:30karolherbst: I am actually wondering if this old way still works :/
00:30karolherbst: no idea how to force the kernel to use it though
00:30karolherbst: but I think it shouldn't
00:31karolherbst: I am sure it causes the same fail
00:31Lekensteyn: pcie_port_pm=off would force the old method
00:31Lekensteyn: but the lockup existed before that
01:35karolherbst: this issue is just annoying
11:43karolherbst: Lekensteyn: how much work would it be for you to generate a new trace? Or do you have a trace where there are multiple cycles of runtime suspend+resume?
11:43karolherbst: I would like to get rid of the initial probing and just want to know what happens between regular cycles
11:45karolherbst: mhh, although I think the end of the trace might be actually enough for it.. I am just wondering if one of the older writes is important
12:27Tom^: does nouveau have a "zerovram" option as amdgpu does? or does it zero all vram allocations by default?
13:11karolherbst: Tom^: it doesn't zero it out
13:11karolherbst: but I didn't know amdgpu has this
13:12karolherbst: honestly, we should do that by default on every driver, otherwise it's a security issue
13:13karolherbst: and this should be done on free on a kernel level
13:16karolherbst: Tom^: why are you asking though?
13:22Tom^: karolherbst: because i have my theory of cs:go breaking because of it
13:22karolherbst: could be
13:22karolherbst: you mean like random bytes when you scope in or something?
13:23Tom^: karolherbst: yeah
13:24karolherbst: Tom^: I'd assume that a texture or fb is used uninitialized and we could probably zero those out from gallium behind a driconf flag or something
13:24karolherbst: do you know if radeonsi hits this as well?
13:24Tom^: karolherbst: https://cgit.freedesktop.org/mesa/mesa/commit/?id=49025292fbbf285d4473d2c20a83b6c533a79d45 yeah
13:26Tom^: technically if you ask me, its a broken game with a driver side workaround.
13:26Tom^: i mean why is it using uninitialized stuff :p
13:27karolherbst: because that's the sanest thing to expect
13:27karolherbst: that those are zero
13:28karolherbst: well, random might be an option, but usually those are just whatever was in vram (even from other processes)
13:28karolherbst: anyway... I'd like those things not to be driver-specific workarounds, but gallium core stuff.....
13:29karolherbst: imirkin: is there some libdrm flag we have to zero out vram allocations?
13:35karolherbst: Tom^: but nice that somebody figured that out.... but didn't bother to make a sane solution for it :/ *sigh*
13:35Tom^: karolherbst: yeah oh well :p
15:13imirkin: karolherbst: no
15:14imirkin: karolherbst: the kernel driver is meant to zero out such new allocations
15:14imirkin: but in practice, there's some spot that gets missed
15:15karolherbst: imirkin: but do we zero out stuff? I highly doubt we do it at all
15:15karolherbst: but of course the kernel is supposed to zero out, it's just that nobody does it
15:16imirkin: there's definitely logic to keep track of whether an alloc needs to get zero'd
15:16imirkin: but i think it doesn't happen for vram pages, only cpu pages?
15:16imirkin: iirc it's in ttm somewhere
15:16karolherbst: yeah, sounds about right. For CPU pages we have to
15:16karolherbst: otherwise we would get a ton of CVEs :p
15:17karolherbst: for VRAM nobody seems to care
15:17imirkin: i care
15:17imirkin: just ... not enough :)
15:18imirkin: iirc zerovram goes further than necessary for security though
15:18imirkin: it will zero out new allocations even if it's data that was already from the application
15:19imirkin: zerovram is to fix broken games, not to improve security
15:21karolherbst: imirkin: you mean, not far enough :p
15:21karolherbst: to be secure you want to zero on deallocation
15:21karolherbst: but yeah
15:21karolherbst: we want to implement it to fix such broken applications I think
15:21imirkin: well, to be secure you want to not ever provide one app's data to another
15:21imirkin: whether you do it on free or alloc doesn't really matter
15:22karolherbst: doing it on alloc has silly implications with VMs and such
15:22karolherbst: and allows weirdo side channel attacks
15:22imirkin: doing it on free has silly perf implications
15:22karolherbst: well, that's always the trade-off ;)
15:22karolherbst: but intel shows us that maybe we really can't do those trade-offs as this always fails
15:23karolherbst: anyway, if doing the zero out for some applications fixes them.. well, I guess we should add support for that
15:23imirkin: but like i said, zerovram goes further than is necessary for security
15:23imirkin: if an app does free / alloc and gets the same pages, zerovram will zero it, while that's not strictly necessary
15:24imirkin: or if there's some suballocated buffer, and a suballocation is reused. etc.
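The zero-on-alloc versus zero-on-free argument above can be illustrated with a toy page pool. This is only a sketch under stated assumptions: `ToyVram` is a hypothetical model with a LIFO free list, and it does not reflect how ttm or nouveau actually manage VRAM.

```python
class ToyVram:
    """Toy page pool used to compare clearing policies (not real driver code)."""

    PAGE_SIZE = 8

    def __init__(self, pages, zero_on_alloc=False, zero_on_free=False):
        # Assumption: memory starts out cleared, so only reuse can leak data.
        self.mem = [bytearray(self.PAGE_SIZE) for _ in range(pages)]
        self.free_list = list(range(pages))
        self.zero_on_alloc = zero_on_alloc
        self.zero_on_free = zero_on_free

    def _clear(self, page):
        self.mem[page][:] = bytes(self.PAGE_SIZE)

    def alloc(self):
        page = self.free_list.pop()  # LIFO: the most recently freed page is reused
        if self.zero_on_alloc:
            self._clear(page)
        return page

    def free(self, page):
        if self.zero_on_free:
            self._clear(page)
        self.free_list.append(page)


def leaks_between_apps(**policy):
    """Return True if a second 'app' can read what the first one wrote."""
    vram = ToyVram(pages=4, **policy)
    page_a = vram.alloc()
    vram.mem[page_a][:] = b"SECRET!!"  # app A writes, then releases the page
    vram.free(page_a)
    page_b = vram.alloc()              # app B gets the same page back
    return bytes(vram.mem[page_b]) == b"SECRET!!"


if __name__ == "__main__":
    print("no clearing:  ", leaks_between_apps())
    print("zero on alloc:", leaks_between_apps(zero_on_alloc=True))
    print("zero on free: ", leaks_between_apps(zero_on_free=True))
```

In this model either policy stops the cross-application leak, which matches imirkin's point. The difference is when the cost is paid and how long stale data sits in memory, which is what karolherbst objects to.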
15:24karolherbst: but it's still just a userspace decision, so it's completely irrelevant for security
15:24karolherbst: might be something a web browser might want to enable though
15:24imirkin: WebGL specifies allocations are zero'd
15:24imirkin: this caused issues for an ASTC test
15:24karolherbst: ahh, so they do it inside the webgl layer
15:24imirkin: since an all zero's ASTC texture is purple, not black :)
15:24karolherbst: good enough
15:25karolherbst: imirkin: but I was under the impression that we might have some flag for that inside nouveau :/ maybe I am indeed wrong
15:26imirkin: there's one in the kernel driver
15:26imirkin: in ttm iirc
15:26imirkin: and maybe there's one in the bo alloc uapi? i forget
15:27karolherbst: mhh, we have stuff like NVOBJ_FLAG_ZERO_ALLOC
15:28karolherbst: skeggsb should probably know what to do
15:49imirkin: yeah, sounds familiar
15:49imirkin: but again, this doesn't cover the subdivision case
15:49imirkin: zerovram i believe is meant to zero all api-level allocations
15:51karolherbst: imirkin: mhh, I don't think it does actually
15:51karolherbst: see #dri-devel and what bas said
15:51karolherbst: "<bnieuwenhuizen> radeonsi's option is also half useless as it does not apply to anything reused from our own suballocation/bo cache"
15:52karolherbst: I think the issue is that d3d guarantees that new allocations are 0ed or something
15:52karolherbst: and games just assume this behaviour in GL
15:52karolherbst: the code in radeonsi isn't all that big actually
15:52karolherbst: a flag set in the ws code
15:52karolherbst: and then some bit passed to the kernel
15:53imirkin: perhaps setting NVOBJ_FLAG_ZERO_ALLOC would be sufficient then
15:53karolherbst: yeah, maybe
15:53karolherbst: probably a good idea to just try it out and see if it changes anything
17:08Lekensteyn: karolherbst: at earliest I would be able to generate a trace next week. However, there are clear timestamp markers that separate boot, enable device (Fri, 24 Aug 2018 13:07:20), disable device (Fri, 24 Aug 2018 13:07:37)
17:09Lekensteyn: port device enters d3 (Fri, 24 Aug 2018 13:08:02), port device enters d3 completed (Fri, 24 Aug 2018 13:08:06), finally the machine is powered down
17:09karolherbst: Lekensteyn: yeah... it's just that I would like to know what one has to do between cycles
17:09karolherbst: to get rid of all the unimportant stuff
17:10karolherbst: I am sure something has to be set up in order for this to work
17:10karolherbst: and it might be hidden somewhere in the initialization code
17:24Lekensteyn: what theory would you like me to test?
17:28karolherbst: none really. I just would like to know what happens between suspends
17:37karolherbst: Lekensteyn: I am currently wondering if it's something we have to do on the bridge only or if the GPU has to be involved as well...
17:39karolherbst: btw, what is this "NET._PS0" ACPI call?
17:39karolherbst: I couldn't find it
17:44Lekensteyn: karolherbst: it is a hack, I repurposed the QEMU ACPI tables https://github.com/Lekensteyn/acpi-stuff/blob/master/d3test/fakedev.asl#L165
17:45karolherbst: ohh, I see
17:45Lekensteyn: again, this is emulated hardware, not the actual PCIe port on bare metal. So if there is something special that needs to be done with the PCIe port... good luck figuring that out
17:47karolherbst: Lekensteyn: maybe a different perspective. That _OFF method does cut off all power to the GPU device, right? What reasons could there be that a PCIe bridge controller isn't able to establish communication with a device in D0uninitialized state?
17:47karolherbst: maybe that putting the GPU into d3 triggers something on the bridge...
17:48karolherbst: I should probably dump the full pci config of both devices pre and post d3hot
18:01Lekensteyn: karolherbst: when the GPU device enters D3cold, all state is lost right? Then somehow not restoring the boot link speed presumably confuses the PCIe port?
18:05karolherbst: Lekensteyn: right, but why does it work if we keep the GPU in D0?
18:12Lekensteyn: what do you mean?
18:13karolherbst: Lekensteyn: there is this PM field in the PCI config, 0x64 on an nvidia GPU and the lower 2 bits control the PCI device state (D0-D3hot)
18:14karolherbst: so, if we put the GPU into D3 before invoking the ACPI _OFF method, the GPU doesn't come back
18:14karolherbst: if we skip that, the GPU comes back successfully
18:14karolherbst: "email@example.com:vfio_pci_write_config (0000:01:00.0, @0x64, 0xb, len=0x2) PM: PMCSR" in your log
18:15karolherbst: that's closely before the _OFF call
18:15karolherbst: 0x8 indicates no_soft_reset, but that's a read only bit afaik
18:15karolherbst: and means that you don't have to do a reset for d3hot -> d0
18:15karolherbst: or something
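The `0xb` written to PMCSR in the trace can be decoded by hand. A minimal sketch, assuming the standard PCI power management layout (PowerState in bits 1:0, No_Soft_Reset in bit 3); `decode_pmcsr` is a hypothetical helper name:

```python
# PowerState encoding in PMCSR bits 1:0.
POWER_STATES = {0: "D0", 1: "D1", 2: "D2", 3: "D3hot"}

PMCSR_POWER_STATE_MASK = 0x3  # bits 1:0
PMCSR_NO_SOFT_RESET = 0x8     # bit 3, read-only per karolherbst's recollection


def decode_pmcsr(value):
    """Split a PMCSR value into the two fields discussed in the log."""
    return {
        "power_state": POWER_STATES[value & PMCSR_POWER_STATE_MASK],
        "no_soft_reset": bool(value & PMCSR_NO_SOFT_RESET),
    }


if __name__ == "__main__":
    # Value from the traced write: vfio_pci_write_config(..., @0x64, 0xb, len=0x2)
    print(decode_pmcsr(0xB))
```

So `0xb` is the D3hot request (`0x3`) plus the No_Soft_Reset bit (`0x8`), which fits the description of the write that happens shortly before `_OFF`.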
18:15Lekensteyn: I think I also tested skipping writing PMCSR at some point, but it did not help?
18:16karolherbst: it should
18:16karolherbst: at least it does work for me
18:17Lekensteyn: putting the device in D3 might break some ACPI code that tries to access the config space
18:17karolherbst: but the config space is still available
18:17karolherbst: setpci still works in d3hot
18:18karolherbst: Lekensteyn: thing is, d1 shows the same behaviour
18:18karolherbst: so it doesn't really matter as long as it isn't d0
18:23Lekensteyn: hmm, but if Windows succeeds when setting PMCSR.PowerState=D3, then so should Linux...
18:25Lekensteyn: Daniel Drake (IIRC) from Endless at some point had a patch that changed the way some register (prefetchable memory base or something like that) was written. I wonder if it could somehow be related
18:25Lekensteyn: that also modifies the port instead of the GPU device
18:50karolherbst: Lekensteyn: yeah, I saw that patch
18:52karolherbst: Lekensteyn: but the thing is, the bridge never gets powered off
18:52karolherbst: it stays in d3hot
18:52karolherbst: so most of the state should be preserved actually
19:20karolherbst: ohh, interesting, there is indeed quite a lot of stuff different
19:21karolherbst: Lekensteyn: after putting the GPU into D3, a lot of registers on the bridge get different values
19:21karolherbst: on the GPU only the PMCSR one changes
19:22karolherbst: most of it might be just garbage though
19:31karolherbst: Lekensteyn: mhhh, okay, so lspci after the ACPI calls is indeed interesting
19:31karolherbst: after the _OFF call, the bridge has this: DevSta: CorrErr+
19:32karolherbst: and Status: NegoPending+
19:32imirkin: if only we knew what all these things meant
19:33karolherbst: after I turn the GPU on, the status doesn't change
19:33karolherbst: it is still NegoPending+
19:33karolherbst: on Capabilities: [100 v1] Virtual Channel
19:34karolherbst: mhhh... I'll try something funky
19:40karolherbst: that's wrong
19:40karolherbst: imirkin, Lekensteyn... okay, so the bridge still advertises a 2.5 link, although we know that's wrong as it would have to go up to 8.0 again...
19:41karolherbst: but.. sadly I have no idea how to force a speed on the bridge
19:41karolherbst: even for nouveau setting it is all outside the normal PCI mmio space :/
19:41karolherbst: (even on older gens)
19:57karolherbst: I am convinced that the bridge is the part doing something wrong
19:57karolherbst: not necessarily the kernel
21:01Lekensteyn: karolherbst: the link state is derived from the D state of connected components. The LTSSM (Link Training and Status State Machine) should start with link speed 2.5GT/s.
21:01Lekensteyn: you can probably change the available link speeds in the config space of the bridge, but that does not change the fact that the state machine starts in the Detect state with 2.5GT/s
21:02Lekensteyn: this is based on 4.2.6 Link Training and Status State Rules, PCIe base spec 3.0
21:04Lekensteyn: last time I looked in the Virtual Channel thing, it was not interesting
21:04karolherbst: mhh, I see
21:05karolherbst: Lekensteyn: but then I am wondering why putting the GPU into 8.0 link speed makes it work again
21:05Lekensteyn: aren't you changing the link speed from 8 back to 2.5? Or do I misremember your patch?
21:05karolherbst: I change it from 2.5 to 8.0 (default)
21:05karolherbst: devinit reduces it from 8.0 to 2.5
21:07Lekensteyn: is 8 the initial link speed at boot, before loading devinit?
21:07Lekensteyn: perhaps the bridge assumes 8GT/s and does not like the fact that devinit changes it to 2.5 under the hood. Need to check the state machine to see whether additional requirements exist
21:08karolherbst: what is odd is, that the kernel and lspci started to mark the speed as downgraded
21:08karolherbst: and if I rescan the bus after removing the device in 2.5 mode, dmesg warns about it
21:08karolherbst: "32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:01.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)"
21:09Lekensteyn: 126.016? that looks odd
21:10karolherbst: although.. that value is still a bit odd
21:11karolherbst: should be 252.08
21:11Lekensteyn: if that message is to be believed, the GPU is at 8GT/s, but the bridge caps it somehow
21:11karolherbst: no 126.016 is correct
21:11karolherbst: 252... is pcie 4
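The figures in that dmesg line can be reproduced from the line encoding overhead. A minimal sketch, assuming 8b/10b encoding up to 5 GT/s, 128b/130b above that, and truncating integer arithmetic per lane (the truncation is an assumption chosen because it reproduces the printed 126.016); `lane_mbps` and `link_mbps` are hypothetical names:

```python
def lane_mbps(gt_per_s):
    """Usable Mb/s of one lane after encoding overhead, truncated to an integer."""
    raw = int(gt_per_s * 1000)   # raw signalling rate in Mb/s
    if gt_per_s <= 5.0:
        return raw * 8 // 10     # 8b/10b encoding (2.5 and 5 GT/s)
    return raw * 128 // 130      # 128b/130b encoding (8 GT/s and above)


def link_mbps(gt_per_s, width):
    """Usable Mb/s of a link with the given number of lanes."""
    return lane_mbps(gt_per_s) * width


if __name__ == "__main__":
    for speed in (2.5, 8.0, 16.0):
        print(f"{speed} GT/s x16: {link_mbps(speed, 16) / 1000:.3f} Gb/s")
```

Under these assumptions a 2.5 GT/s x16 link gives 32.000 Gb/s and an 8 GT/s x16 link gives 126.016 Gb/s, matching the message. A value around 252 Gb/s needs 16 GT/s per lane, which is the PCIe 4 rate.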
21:11karolherbst: Lekensteyn: no, I think the bridge detects that the GPU can do 8.0, but the GPU only set the link to 2.5
21:12Lekensteyn: (the kernel message was added in https://git.kernel.org/linus/9e506a7b51474241f0c900e53e85512780275c05)
21:12karolherbst: or maybe the bridge really does weirdo stuff
21:12Lekensteyn: "limited by 2.5 GT/s x16 link at **0000:00:01.0**" that is your port right?
21:12karolherbst: uff, right
21:12karolherbst: that can't be right
21:12karolherbst: it's the GPU which limits it, not the port
21:13karolherbst: I'd guess that pcie_bandwidth_available is just wrong
21:14karolherbst: ahh, yeah..
21:14karolherbst: pcie_bandwidth_available is just stupid
21:14karolherbst: Lekensteyn: pcie_bandwidth_available iterates the tree until it finds a device with the expected bandwidth
21:15karolherbst: Lekensteyn: and I guess if all clients of a bridge are put into a slower mode, the bridge might just do so as well
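One way to see why the port rather than the GPU gets named is to model the walk up the tree. This is a simplified sketch, not the kernel's code: `Dev` and `limiting_device` are hypothetical, and the tie-breaking rule (an upstream device with equal bandwidth replaces the current limiter) is an assumption about how `pcie_bandwidth_available` behaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dev:
    name: str
    link_mbps: int                 # bandwidth derived from this device's LnkSta
    parent: Optional["Dev"] = None  # upstream bridge, None at the root


def limiting_device(dev):
    """Walk from dev towards the root and return (limiter, bandwidth)."""
    best = None
    limiter = None
    while dev is not None:
        # Assumption: '<=' means that on a tie the upstream device wins.
        if best is None or dev.link_mbps <= best:
            best = dev.link_mbps
            limiter = dev
        dev = dev.parent
    return limiter, best


if __name__ == "__main__":
    # Both ends of the same link report 2.5 GT/s x16 (32000 Mb/s).
    port = Dev("0000:00:01.0", 32000)
    gpu = Dev("0000:01:00.0", 32000, parent=port)
    limiter, bw = limiting_device(gpu)
    print(f"limited by {limiter.name} at {bw} Mb/s")
```

Since the GPU and the port sit on the same link, they report the same status. Under the tie rule assumed here the message ends up pointing at the port even when the GPU side set the speed.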
21:16imirkin: pmoreau: i think this one's right up your alley - https://bugs.freedesktop.org/show_bug.cgi?id=110973
21:16imirkin: weren't you working on gmux stuff? and macs?
21:16Lekensteyn: yes I think so, the link speed is dependent on the connected devices
21:19Lekensteyn: karolherbst: here are some logs for my Clevo P651RA laptop, the PCI config was obtained with RWEverywhere and converted to a format that lspci -F understands https://lekensteyn.nl/junk/infos.tar.gz
21:19Lekensteyn: with the following commands, the link status is always reported as 2.5GT/s ever since it reboots with a HDMI cable attached: for i in `ls -1tr *.txt`; do echo "$i"; lspci -F "$i" -nns:1 -vvv | grep GT; done
21:21karolherbst: Lekensteyn: but do you see that the device reports 8.0 with no driver?
21:22karolherbst: anyway, I am more than convinced that something is broken inside the bridge controller
21:23karolherbst: the question is just, how to fix it?
21:25Lekensteyn: with no driver it reports as 8GT/s. Without understanding the situation, fixing it would be a trial-and-error exercise at the moment
21:25karolherbst: maybe we have to ask intel engineers?
21:25karolherbst: asking nvidia wasn't really giving me anything helpful
21:30karolherbst: Lekensteyn: maybe that's also the reason why nothing shows up in your traces as the emulated bridge might not trigger any workarounds
21:33Lekensteyn: worth trying. I tried to dig through some errata, but I did not get concrete suggestions out of it
21:33karolherbst: I would expect something from the early win10 days
22:28karolherbst: Lekensteyn: do you know if there are some docs for the bridge controller?
22:37karolherbst: ohh, there is nice stuff in the spec docs
22:51karolherbst: Lekensteyn: fun.. reading it explains like 90% of all ACPI names
22:51karolherbst: now I know everything
22:59Lekensteyn: yeah, the various PCIe capabilities are standardized ones that are described in the PCIe spec
23:30karolherbst: Lekensteyn: errata KBL079
23:30karolherbst: "Attempts to Retrain a PCIe* Link May be Ignored"
23:31karolherbst: and the following also sound interesting
23:37Lekensteyn: same issue is also known as SKL020 https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/desktop-6th-gen-core-family-spec-update.pdf
23:38Lekensteyn: but if the port is in D0, then it should not be in L1 and the link training bit should be processed
23:40karolherbst: yeah... it's weird
23:40karolherbst: Lekensteyn: that NegoPending bit is interesting though
23:41karolherbst: why does silly browser disable copy&paste
23:41karolherbst: nvm then
23:55karolherbst: Lekensteyn: funky... I think I messed up the bridge for good
23:55karolherbst: system suspend/resume doesn't bring the GPU back anymore
23:55karolherbst: no idea if that's good or bad
23:56karolherbst: mhh, at least the bridge is now in "LnkSta: Speed 8GT/s (ok), Width x16 (ok)"
23:58karolherbst: anyway, I should sleep