00:33 karolherbst: Lyude: ping! I would like to add some tested-by to the mmiotrace patch before sending it to the ML
02:37 kherbst: \o/
02:41 karolherbst: https://gist.githubusercontent.com/karolherbst/b1fa7ce1b2a25eb3df59762d4114ac0c/raw/f242594f37c5c1a8a479bc6bd08083666b09bbed/gistfile1.txt
02:41 karolherbst: ..........
04:27 rhyskidd: karolherbst: given you're looking at Pascal PCI MSI bits, can I get your review on https://github.com/envytools/envytools/pull/106 ?
14:06 Orbstheorem: Hello, I'm using the nouveau driver on kernel 4.9.61, and I followed the instructions to use the outputs on my discrete GPU. It was working yesterday, but since then I have suspended and resumed my computer a couple of times. Today when I plugged in my computer the screen was not detected, though it works with another computer. Has anyone encountered such an issue?
14:14 Orbstheorem: I'd provide my dmesg, but it's full of wifi info
14:21 karolherbst: Orbstheorem: does the nouveau connector come up in xrandr?
14:28 Orbstheorem: Yes, I can see DP-... outputs
14:28 Orbstheorem: karolherbst: ^^
14:31 Orbstheorem: Which only show up after I run xrandr --setprovideroutputsource
14:34 karolherbst: mhh
14:34 karolherbst: well if it shows up, the display should work... in theory
15:17 tagr: rhyskidd, karolherbst: fwiw, the CYA_2 register was introduced on Fermi
15:26 karolherbst: tagr: any idea for what CYA stands for?
15:26 imirkin: cover-your-ass?
15:27 karolherbst: imirkin: I think there are better inappropriate terms to come up with in regards to MSI rearm
15:27 imirkin: i think it's the nvidia equivalent of the intel chicken registers
15:28 karolherbst: might be
15:29 karolherbst: but the next issue I have to take care of is this holy one: https://gist.githubusercontent.com/karolherbst/70133e07b6be5847fecedddca176ee78/raw/89a17e66e9ac38621de38d6f3a08de3e335ade9f/gistfile1.txt
15:30 imirkin: good luck =]
15:30 karolherbst: I am sure the falcon is just off
15:30 karolherbst: bad00100 is the value of like all gr regs
15:31 imirkin: ah yea
15:31 karolherbst: well, it is enabled in PMC.ENABLE though
15:31 karolherbst: so some other funkiness
15:31 karolherbst: this is the state the GPU is in after boot/resume
15:32 karolherbst: nouveau can't handle that at all except nvidia does the right stuff
15:32 karolherbst: I think I'll try to convince skeggsb to not use ptimer anymore for timeouts on the host....
15:32 karolherbst: because I don't see any benefit in this
15:33 karolherbst: and it just adds silly freezes and hangs and infinite loops
15:38 imirkin: probably block/clock gating
15:38 karolherbst: ohhhhh, right. let me check
15:38 imirkin: the ptimer stuff has host-side timeouts too iirc
15:41 karolherbst: I don't think so
15:41 karolherbst: I am sure all those nvkm_xsec macros only use the GPU timer
15:43 imirkin: hm, you're right
15:43 imirkin: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/include/nvkm/subdev/timer.h#L39
15:44 karolherbst: I had this discussion with skeggsb already
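The host-side timeout idea raised above can be sketched as follows. This is a minimal illustration in Python, not nouveau code: `poll_until` and its parameters are hypothetical names, and the only point is that the deadline comes from the host's monotonic clock, so the loop still ends when the device (and any timer on it) has stopped responding.

```python
import time

def poll_until(condition, timeout_s, interval_s=0.001,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it is true or the host clock passes the deadline.

    Returns True if the condition held, False on timeout. The deadline is
    taken from the host clock only, never from the device being polled.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)

# Hypothetical stand-in for a register read on a hung engine: it always
# returns the 0xbad00100 pattern mentioned in the log.
def read_hung_reg():
    return 0xbad00100
```

With a reader that never returns the expected value, `poll_until(lambda: read_hung_reg() == 0, 0.01)` gives up after roughly 10 ms of host time instead of spinning forever.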
15:46 karolherbst: imirkin: *sigh*, nouveau left the GPU in a state where even nvidia can't recover/load from
15:46 karolherbst: "NVRM: failed to copy vbios to system memory."
15:47 karolherbst: okay, even GPU_RESET doesn't work
16:21 rhyskidd: karolherbst: re: CYA_2 any particular documentation you'd want around there -- beyond the existing comment that ties this to Pascal-family MSI rearm?
16:21 rhyskidd: "write dummy value 0 to rearm PCI MSI on Pascal"
16:21 karolherbst: not really. would be nice to know what CYA actually stands for, but...
16:22 rhyskidd: not sure we're going to get that anytime soon ...
16:26 karolherbst: maybe not, maybe yes
16:27 rhyskidd: i see you're otherwise working on: https://bugs.freedesktop.org/show_bug.cgi?id=100228
16:30 karolherbst: mhh interesting
16:31 karolherbst: I am actually working on getting nouveau to work on my GPU here
16:34 mwk: karolherbst: cover your ass.
16:35 karolherbst: ... fine
16:35 mwk: seriously
16:35 mwk: it's an extra bit of functionality that's not supposed to be used, but is built in just in case there's a bug somewhere
16:35 karolherbst: :D
16:35 karolherbst: nvidia uses it quite often then
16:35 mwk: so they do.
16:35 karolherbst: yeah
16:36 mwk: well, all hw makers do
16:36 mwk: stuffing a lot of likely-unused functionality is cheaper than spinning a new mask
16:36 karolherbst: in my trace 44 times (X start, glxinfo, stop)
16:36 karolherbst: most likely
16:37 karolherbst: well, this reg is important though
16:37 mwk: and I'm going to bet it was a spare
16:37 mwk: and you were supposed to rearm MSI by some other means that just so turned out to be fucked up
16:37 karolherbst: I am not quite sure though
16:38 karolherbst: I think on pascal we have to use this one
16:38 karolherbst: it seems like they removed some legacy stuff there
16:38 karolherbst: and they actually moved to MSI-X
16:40 imirkin: either way, cya stands for "cover your ass"
17:01 rhyskidd: i think we have consensus ... :)
17:04 rhyskidd: have queued up the PSEC2 and PVDEC fields for rnndb per discussion here on wed
17:06 rhyskidd: will do the PTHERM temp sensor and a few other envytools cleanups later this afternoon
17:24 mwk: rhyskidd: is there any doubt about the enable/intr fields?
17:24 mwk: eh
17:24 mwk: just shit it
17:24 mwk: *ship it
19:11 rhyskidd: imirkin and yourself (i recall) were pretty confident on them. the additional supporting source is https://download.nvidia.com/open-gpu-doc/pascal/1/gp100-msi-intr.txt
19:33 rhyskidd: mwk: thanks, I'll ship those two enable/intr fields PRs
20:04 tagr: karolherbst: yeah, CYA is cover-your-ass
20:04 feep: not to confuse with CYOA~
20:05 tagr: as you guys mentioned, these are usually used for workaround kind of things
20:05 tagr: I've come across those on other blocks on Tegra, too
20:05 karolherbst: tagr: even on Pascal?
20:05 tagr: karolherbst: what do you mean?
20:05 karolherbst: or is it a workaround in the sense of "we always do that"
20:06 karolherbst: tagr: well, we just started to use that register on pascal, for reasons I don't know
20:06 tagr: karolherbst: well, once it's there, might as well keep it for software compatibility I guess
20:06 sooda: i'm guessing that all hw is full of cya bits that work around some new functionality just in case it won't work as designed. first time i saw that in hw manuals i was rather surprised to see that this really is an official acronym :D
20:06 karolherbst: I see. well prior pascal we use 0x88068 for rearm
20:07 karolherbst: but if somebody tells us to rather use the new one on supported chipsets, we might just do that...
20:08 karolherbst: there is a workaround on some chips, where we use the pci write kernel stuff instead of poking into the mmio register
20:08 tagr: karolherbst: I can look into that, but if 0x88068 works for you that's probably fine too
20:08 karolherbst: mhh interesting
20:08 karolherbst: on gf100 we use 88704 as well
20:09 karolherbst: but not on kepler and for example gf106
20:09 tagr: karolherbst: I thought I had seen a couple of occurrences of the 704 register being used in Nouveau, though with different values
20:09 karolherbst: ohh, on gf100 we write 0xff into it instead of 0x0
20:09 karolherbst: right
20:10 karolherbst: brb
20:34 RalfJ: Hi all -- what is the right place to report problems with sound via HDMI on an nvidia card? it seems the nouveau kernel module issue tracker is freedesktop, but that does not sound right for audio stuff...
20:36 karolherbst: RalfJ: let me guess, the audio HDMI device doesn't appear?
20:36 imirkin_: RalfJ: if you want to file a bug, xorg -> Driver/nouveau (yeah, unintuitive, don't worry about it)
20:36 imirkin_: RalfJ: but really you can just get help here, which will likely be faster.
20:36 karolherbst: imirkin_: we should fix that multi_fun bug... I am sure RalfJ has this issue
20:37 RalfJ: karolherbst: yeah, no audio via HDMI. I should add this is an optimus laptop.
20:37 imirkin_: RalfJ: lspci -nn -d 10de:
20:37 imirkin_: (pastebin the result of that command)
20:37 imirkin_: karolherbst: someone wrote the patch i was going to write, and it still didn't work
20:37 RalfJ: I found some scary "setpci ..." commands on the interwebs that are supposed to help when using the proprietary driver but didnt do anything but crash my laptop for me
20:38 karolherbst: RalfJ: sounds about the right thing
20:38 imirkin_: karolherbst: don't jump to conclusions. get info first :)
20:38 RalfJ: imirkin_: https://pastebin.com/hY13ptiF
20:38 karolherbst: imirkin_: :D right, but here I am quite sure what the issue is ;)
20:38 imirkin_: yep. ok. so it's definitely that same issue.
20:39 karolherbst: let me check something
20:39 RalfJ: (right now there is nothing plugged into the HDMI port, so the GPU is turned off. but lspci doesnt list any more "NVIDIA" devices even when it is turned on.)
20:39 imirkin_: RalfJ: you're supposed to also have an audio adapter in that output. the fact that you don't indicates it's the issue where the card doesn't come up as a multi-function card properly.
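The check described here (is there an audio function next to the GPU function?) can be sketched in Python. This assumes the usual `lspci -nn` line layout of `bus:dev.fn Class name [class]: ... [vendor:device]` and the standard PCI class code 0403 for audio devices. The sample lines and the `ffff`/`fffe` device IDs are made up for illustration and are not taken from the pastebin.

```python
import re

# Start of an `lspci -nn` line: "bus:dev.fn Class name [cccc]: ..."
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2})\.(?P<fn>[0-7]) .*?\[(?P<cls>[0-9a-f]{4})\]:"
)

AUDIO_CLASS = "0403"  # PCI class 04 (multimedia), subclass 03 (audio device)

def functions_by_slot(lspci_output):
    """Map 'bus:dev' -> {function number: class code} for each parsed line."""
    slots = {}
    for line in lspci_output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            slots.setdefault(m.group("slot"), {})[int(m.group("fn"))] = m.group("cls")
    return slots

def has_audio_function(lspci_output, slot):
    """True if any function in the given slot reports the audio class code."""
    return AUDIO_CLASS in functions_by_slot(lspci_output).get(slot, {}).values()

# Hypothetical sample output, for illustration only.
SAMPLE_GPU_ONLY = "01:00.0 3D controller [0302]: NVIDIA Corporation Example [10de:ffff] (rev a1)"
SAMPLE_WITH_AUDIO = SAMPLE_GPU_ONLY + "\n" + \
    "01:00.1 Audio device [0403]: NVIDIA Corporation Example Audio [10de:fffe] (rev a1)"
```

For the made-up samples, `has_audio_function(SAMPLE_GPU_ONLY, "01:00")` is false and `has_audio_function(SAMPLE_WITH_AUDIO, "01:00")` is true, which is the distinction being drawn in the log.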
20:40 karolherbst: which was the address again?
20:40 imirkin_: 488 or something?
20:40 RalfJ: imirkin_: half of these words dont mean much to me, but I think I am getting the gist :)
20:40 karolherbst: imirkin_: yeah
20:40 karolherbst: 0x88488
20:40 RalfJ: I found this online: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1377653/comments/19
20:40 RalfJ: but "rmmod nouveau" doesnt work so I could not run it
20:40 karolherbst: RalfJ: yeah, the setpci thing is correct
20:41 RalfJ: and it turns out nouveau doesnt like when I run sh -c 'echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove' and sh -c 'echo 1 > /sys/bus/pci/devices/0000:00:01.0/rescan' a
20:42 RalfJ: karolherbst: just running the setpci thing doesnt seem to change anything though
20:42 RalfJ: the lspci -nn output did not change
20:43 karolherbst: fun, on my pascal laptop I don't get anything either, but there are no ports wired to the nvidia GPU
20:43 karolherbst: but on my kepler, where there are no ports wired as well, I get the audio device
20:44 RalfJ: that ubuntu bug makes it sound like I need to trigger some form of rescan to see new PCI devices?
20:44 karolherbst: imirkin_: I could imagine that "3d Controllers" don't have that audio device for real... but I wouldn't trust this
20:44 karolherbst: RalfJ: right
20:44 karolherbst: and you need to remove the card as well
20:44 RalfJ: karolherbst: I could verify under windows whether audio works over that port at all
20:44 karolherbst: RalfJ: the remove/rescan part is the most important one here
20:44 RalfJ: I just assumed it would
20:45 imirkin_: karolherbst: nah, they do.
20:45 karolherbst: imirkin_: of course
20:45 RalfJ: karolherbst: I see. so the problem is adapting that to work with nouveau rather than the proprietary driver?
20:45 karolherbst: well my Pascal doesn't seem to have it, but 0x8800c stays empty anyway?
20:45 Lyude: karolherbst: yo!
20:45 Lyude: Sorry I've been off, thanksgiving
20:45 RalfJ: (If that helps I could compile and boot a kernel with some patches)
20:45 Lyude: karolherbst: yes the mmiotrace patch works
20:45 karolherbst: Lyude: no worries
20:45 karolherbst: Lyude: nice
20:46 karolherbst: Lyude: I just put a tested by you on there, good?
20:46 Lyude: yeah, I can review it as well if you want, don't forget to cc it to stable as well I suppose
20:46 karolherbst: RalfJ: well... that's kind of the problem, I doubt that the current state of linux allows us to do those workarounds
20:46 karolherbst: Lyude: try to review it ;)
20:46 karolherbst: reviewing includes understanding what the hell I did there, right?
20:48 Lyude: karolherbst: i read through the patch summary and what you said seemed to make sense
20:48 Lyude: but i will double check, will probably just give you my rb here though so I don't have to get up and find my keys
20:49 karolherbst: Lyude: well right, but pq gave me only an acked-by. He is the original author of that stuff and doesn't think he understands enough of it anymore to actually review things ;)
20:49 karolherbst: feel free to dig through it though
20:50 Lyude: ah; if they didn't probably best just to leave it on the mailing list
20:50 karolherbst: yeah, something like this
20:51 RalfJ: karolherbst: so is it possible to somehow do the setpci early enough during boot that it will have its effect?
20:51 karolherbst: RalfJ: grub
20:51 karolherbst: maybe
20:51 RalfJ: oh. *that* early.
20:51 karolherbst: exactly
20:52 karolherbst: we kind of need to do this before the kernel scans the bus
20:52 karolherbst: which happens quite early
20:52 karolherbst: kind of before we know there is a nvidia card to begin with
20:52 RalfJ: that sounds... funny. I mean somehow there must be a way this is intended to work?
20:53 karolherbst: mhh yeah well, we could remove the GPU from the bus in the nouveau driver...
20:53 karolherbst: never looked into how that is supposed to work
20:53 karolherbst: RalfJ: well you should be able to do those setpci and remove/rescan commands if X was stopped
20:54 karolherbst: try this first and check if something changes
20:54 karolherbst: and remove nouveau before that
20:54 RalfJ: karolherbst: can I remove/rescan without rmmod nou... oh nvm^^
20:54 RalfJ: is it important to have sth. in the HDMI port for these tests?
20:54 karolherbst: I doubt this
20:54 RalfJ: okay. brb then.
20:54 karolherbst: but maybe?
20:55 karolherbst: shouldn't hurt
20:55 karolherbst: sometimes it appears without doing that setpci thing if something is plugged... I think. Not quite sure, but we should just assume the worst case and go from there
20:57 RalfJ: karolherbst: I cant rmmod nouveau even after systemctl stop gdm3
20:57 RalfJ: it says its still in use
20:57 karolherbst: RalfJ: mhhh, odd
20:57 karolherbst: check the clients file
20:58 RalfJ: sorry, which file?
20:58 karolherbst: should be /sys/kernel/debug/dri/1/clients
20:58 karolherbst: or /sys/kernel/debug/dri/0/clients
20:58 karolherbst: depending on whether nvidia is 1 or 0
20:58 karolherbst: there is a name file as well
20:59 RalfJ: dri/1 contains dozens of "i915" files so I assume thats intel
20:59 karolherbst: k
20:59 karolherbst: then nouveau is 0 ;)
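Instead of guessing from the other files, the `name` file mentioned above can be read for each node. A minimal Python sketch, assuming each `name` file starts with the driver name followed by whitespace (for example `nouveau dev=...`). `find_dri_node` is a hypothetical helper, and reading debugfs normally requires root.

```python
from pathlib import Path

def find_dri_node(driver, debugfs_dri="/sys/kernel/debug/dri"):
    """Return the first DRI debugfs directory whose name file starts with driver.

    Returns None if no node matches. Assumes the first whitespace-separated
    field of <node>/name is the driver name.
    """
    for node in sorted(Path(debugfs_dri).iterdir()):
        name_file = node / "name"
        if name_file.is_file():
            fields = name_file.read_text().split()
            if fields and fields[0] == driver:
                return node
    return None
```

On the laptop discussed here, `find_dri_node("nouveau")` would be expected to point at `dri/0`, and the `clients` file inside that directory is the one to check.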
20:59 RalfJ: right now it lists 4 times systemd-logind as client. I guess I will try to stop that one too then.
20:59 karolherbst: mhh
20:59 RalfJ: brb.
21:00 karolherbst: you could just kill the sessions
21:02 tagr: don't you usually have to unbind the console in order for the DRM driver to decrease the module refcount to 0?
21:02 karolherbst: tagr: not on optimus
21:02 karolherbst: only i915 is bound
21:02 tagr: oh, right
21:04 tagr: does anyone have the lspci -vvv output of the GPU for these cases? what capability is at 0x488?
21:07 karolherbst: tagr: well we kind of know what is going on here. with 0x88488 bit 25 we can toggle the multi_fun bit 23 in 0x8800c. Thing is, on some GPUs this is disabled by default
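The two bits named here can be written out as masks. In a standard PCI configuration header, bit 23 of the dword at offset 0x0c is bit 7 of the Header Type byte at 0x0e, which is the multi-function flag, so this is consistent with 0x8800c being a mirror of config offset 0x0c (an inference from the addresses in the log, not something stated in it). The function names below are hypothetical and nothing here touches hardware.

```python
MULTI_FUN_ENABLE = 1 << 25   # bit 25 of 0x88488, per the log
HEADER_MULTI_FUN = 1 << 23   # bit 23 of 0x8800c, per the log

def with_multi_fun_enabled(reg_88488):
    """Return the 0x88488 value with the enable bit set, other bits unchanged."""
    return reg_88488 | MULTI_FUN_ENABLE

def reports_multi_function(reg_8800c):
    """True if the multi-function flag is set in a value read from 0x8800c."""
    return bool(reg_8800c & HEADER_MULTI_FUN)
```

As a plain number, the enable mask is 0x02000000 and the header flag is 0x00800000.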
21:07 imirkin_: tagr: 488 is in the "far" range
21:07 tagr: karolherbst: so this is some kind of vendor specific capability?
21:07 imirkin_: tagr: the main thing is the multifunc thing for pci
21:07 karolherbst: tagr: yeah, I think so
21:08 imirkin_: tagr: i think that for optimus, they don't want an extra audio controller to show up
21:08 imirkin_: so they turn it off
21:08 imirkin_: and then have special stuff happen when one is plugged in
21:08 tagr: I suppose this could maybe be implemented as some sort of fixup in PCI
21:09 imirkin_: yeah, tried that
21:09 imirkin_: even an early fixup isn't early enough for linux pci core
21:09 imirkin_: coz of the multifunction vs not handling
21:09 tagr: because it's already enumerated the bus, eh?
21:09 karolherbst: I think in the end the driver needs to be able to do this
21:09 imirkin_: i wanted to use pci hotplug
21:09 imirkin_: that clearly has to handle that stuff
21:09 imirkin_: but that's not what bjorn had suggested, at least initially
21:09 tagr: yeah, sounds reasonable
21:10 imirkin_: thing is, we don't want to hotplug the .0 function, only the .1 function
21:10 karolherbst: imirkin_: thing is: is every chip with pcie able to do hotplugging?
21:10 imirkin_: and in order for that to happen, linux has to think that multiple functions are a thing :)
21:10 imirkin_: karolherbst: at the linux api level -- sure!
21:10 karolherbst: ;)
21:10 imirkin_: i wouldn't necessarily go yanking stuff
21:11 imirkin_: esp when it's soldered on
21:11 karolherbst: right, but you know, if we can't rely on that anyway
21:12 imirkin_: we just need to convince that it's a multifunction device
21:12 imirkin_: the dude who did the patch tried it, but i don't think he read enough of the surrounding code
21:12 imirkin_: also i wonder why not treat everything as a multifunction (pcie) device
21:13 karolherbst: I think the point is, that the device kind of needs to report it or have it enabled internally
21:13 tagr: pci_scan_child_bus() looks like it might just do what's necessary
21:13 karolherbst: I could imagine that the GPU just goes like: "nah, don't have two devices" until you convince the GPU it has
21:13 imirkin_: either way, it'll need some messing around by someone who has the hardware
21:14 imirkin_: karolherbst: does your current laptop report the audio subfunction?
21:14 karolherbst: my kepler one does it after I poke that 88488 reg + remove/rescan
21:14 imirkin_: ok, well that one should be a good target for messing around with then
21:14 karolherbst: yeah
21:15 karolherbst: and I think it does have the audio device, because it is a MXM card
21:15 imirkin_: otoh you may be reluctant to if it's your main box :)
21:15 karolherbst: well the main issue on my main box is, that I don't have a boot loader or second boot option setup in uefi
21:16 karolherbst: so if I mess around too much with the kernel itself... annoying
21:18 RalfJ: karolherbst: after a reboot and if I prevent gdm3 from enabling the hdmi screen I can unload nouveau and run the remove, rescan. however, no new device appears.
21:18 RalfJ: also, probably unrelated, nouveau shows errors on dmesg when it loads: https://pastebin.com/YMxc4gt2
21:19 imirkin_: when do you do the setpci in that sequence?
21:19 RalfJ: imirkin_: https://pastebin.com/WLr76WQG
21:19 karolherbst: do it after rmmod nouveau
21:19 imirkin_: wrong order
21:20 imirkin_: ;)
21:21 karolherbst: and maybe "echo 1 > /sys/bus/pci/rescan" is preferred over doing it just on 01.0
21:21 karolherbst: dunno
21:22 RalfJ: imirkin_: karolherbst: like this https://pastebin.com/ET9dh95u ?
21:23 imirkin_: yes.
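One reading of the order discussed above (unload nouveau first, then setpci, then remove, then rescan) can be written down as a dry-run list. This Python sketch only builds the command strings and runs nothing. The default `setpci` argument is an assumption derived from "0x88488 bit 25", using setpci's `register.width=value:mask` form. It is not copied from the pastebins, and the device address is the one used earlier in the log.

```python
def multi_fun_steps(gpu="0000:01:00.0",
                    setpci_arg="0x488.l=0x2000000:0x2000000"):
    """Return the shell commands, in order, as strings (nothing is executed).

    gpu is the full PCI address with domain; setpci_arg is an assumed
    register write that sets bit 25 of config register 0x488.
    """
    short = gpu.split(":", 1)[1]  # strip the PCI domain for setpci -s
    return [
        "rmmod nouveau",
        f"setpci -s {short} {setpci_arg}",
        f"echo 1 > /sys/bus/pci/devices/{gpu}/remove",
        "echo 1 > /sys/bus/pci/rescan",
    ]
```

Printing the list with `print("\n".join(multi_fun_steps()))` gives something to compare against a pastebin before running anything as root.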
21:23 RalfJ: okay. I probably have to reboot again, it shows an <unknown> in the clients with the PID of a process that doesn't run any more...
21:23 tagr: actually, pci_scan_slot() should do it, in conjunction with pci_bus_add_device()
21:27 tagr: drivers/edac/i82875p_edac.c seems to be doing something similar
21:29 tagr: drivers/platform/x86/{asus-wmi.c,eepc-laptop.c} as well
21:29 RalfJ: imirkin_, karolherbst: still no luck
21:32 RalfJ: (and I still get the errors when nouveau is loaded. in fact I also get them during boot.)
21:37 karolherbst: tagr: interesting
21:40 RalfJ: I have to leave now; I summarized what we did at https://bugs.freedesktop.org/show_bug.cgi?id=103896
21:41 RalfJ: I will stay in this channel so if there are more things I should try, feel free to highlight me :)
21:43 RalfJ: also, thanks a TON for making this driver! I owe you lots of beer (or another beverage of your choice). :D the proprietary drivers dont really support reverse PRIME at all, so without nouveau, I couldnt use this machine
21:43 feep: yeah it's awesome :D
22:44 AndrewR: Hello ... Unfortunately I can confirm it was nouveau causing too much CPU load on kernel 4.14 :( Oprofile data shows a lot of time spent in drmmode_event_handler: https://pastebin.com/EGn5pQzr This CPU usage does not start with seamonkey on a fresh X server, but gradually builds up over time (up to 90% in 5 hours). I set Option DRI3 in the xorg.conf file; maybe with DRI2 this bug will disappear (will test after some sleep).
22:44 karolherbst: AndrewR: interesting
22:46 karolherbst: imirkin_: seems like we don't clean up drmmode_events correctly?
22:47 imirkin_: i wrote a patch which ... updated stuff
22:47 imirkin_: but perhaps nothing related to that
22:47 imirkin_: i wrote it for DP-MST handling
22:48 imirkin_: file a bug, i'll have a look
22:49 AndrewR: imirkin_, ok
23:09 imirkin_: AndrewR: thanks
23:09 imirkin_: sounds like 4.14 generates a ton more events
23:10 imirkin_: either that's a bug, or our handler is somehow shitty
23:11 AndrewR: https://bugs.freedesktop.org/show_bug.cgi?id=103897