21:10karolherbst: should maybe write a script to do this just every day
21:10bencoh: asking people for lspci?
21:10karolherbst: no, inviting dri-logger
21:11bencoh: oh :]
21:11karolherbst: seems like we don't have logs for the entire month
21:12karolherbst: bencoh: lspci -tvvnn as well please
21:12bencoh: to be honest I should have printed bridge->device in quirk_broken_nv_runpm(), but looks like I got lazy back then
21:12karolherbst: well.. lspci can tell us :p
21:12bencoh: yeah, I was actually wondering which one was connected to the nvidia card as well
21:14karolherbst: so it's PCI bridge : Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0151] (rev 09)
21:14karolherbst: which I guess kind of makes sense
21:15karolherbst: I guess it's the same one
21:15karolherbst: just for apple
21:15karolherbst: "00:01.0 PCI bridge : Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)" is the one blacklisted
21:15karolherbst: ohh.. newer gen
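The comparison above comes down to reading the `[vendor:device]` pair that `lspci -nn` prints for each bridge. A minimal sketch of that check, assuming a hypothetical `KNOWN_AFFECTED_BRIDGES` set that holds only the ID the log calls blacklisted (the real kernel quirk table is not reproduced here):

```python
import re

# Matches the "[vendor:device]" pair that `lspci -nn` appends to each entry.
PCI_ID = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")

# Hypothetical set for illustration: only the bridge named as blacklisted above.
KNOWN_AFFECTED_BRIDGES = {(0x8086, 0x1901)}


def parse_pci_id(lspci_line):
    """Return (vendor, device) as integers, or None if no ID pair is present."""
    match = PCI_ID.search(lspci_line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def is_known_affected(lspci_line):
    """True if the line's ID pair is in the illustrative set."""
    return parse_pci_id(lspci_line) in KNOWN_AFFECTED_BRIDGES
```

Fed the two bridge lines quoted above, this reports the 6th Gen bridge as known and the 3rd Gen one (`8086:0151`) as not yet listed.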
21:15karolherbst: bencoh: and the patch helps indeed I guess?
21:15bencoh: it does :)
21:16karolherbst: I hoped we wouldn't need to add another one, but maybe I just called for it by adding the switch :p
21:16bencoh: I suppose there are more bridges out there that show the same behavior
21:16bencoh: you definitely did :D
21:16karolherbst: but we tested others, and they were fine afaik
21:16karolherbst: it seems to be pretty much just the "main" bridge controller
21:17karolherbst: sometimes you have the GPU on one of the lower bridges
21:17bencoh: when I started investigating it I first went through git log thinking "maybe someone fixed that but not for my hw", and ... it certainly called for me
21:17karolherbst: those 1c.x ones
21:17karolherbst: and they are fine afaik
21:18karolherbst: bencoh: well.. at this point I think it would be safe to just include your id and send a patch out
21:18karolherbst: and cc stable
21:18bencoh: 0x0151 then?
21:19bencoh: alright, I'll need to test it just to make sure and send a patch
21:19karolherbst: I am still not sure if it's some bug in the kernel or some hw bug :/
21:19karolherbst: but keeping track of the devices where it does seem to help might help with something
21:19karolherbst: it does not surprise me that it happens on an apple device
21:19bencoh: don't nouveau patches hit some staging tree before reaching stable@ btw?
21:20karolherbst: bencoh: normally yes, but for this it's fine to send to dri-devel and stable directly
21:20bencoh: huhu, to be honest I'm surprised I got this laptop to work nicely at all
21:20karolherbst: and add my review-by
21:20karolherbst: "Reviewed-by: Karol Herbst <firstname.lastname@example.org>"
21:21bencoh: noted :)
21:21karolherbst: (well.. as long as the patch really only adds the id :p)
21:21bencoh: (haha, yeah)
21:21karolherbst: bencoh: the older gens are manageable
21:21karolherbst: I had a macbook pro from the.. sandy bridge era
21:21karolherbst: which I think is slightly older than yours
21:22bencoh: mine is ivy bridge, apparently
21:22karolherbst: and it was totally not working 5-6 years ago :p
21:22karolherbst: bencoh: ohh, I guess you are legacy booting, right?
21:22karolherbst: ohh no
21:23bencoh: by the way, since we're at it ... I need to bring the nvidia card on before suspending (mem state), otherwise it just oops (at resume iirc)
21:23karolherbst: of course not :p
21:23bencoh: no, I'm using rEFIt
21:23karolherbst: earlier you had to add grub hacks to even make it display on the intel gpu
21:23bencoh: (I bring it on using vgaswitcheroo)
21:23karolherbst: as it defaults to the discrete GPU when booting
21:23bencoh: well, I didn't had to, but I remember reading about that
21:24bencoh: have* to
21:24karolherbst: mine was a MacBookPro8,2
21:24bencoh: so err, is there any proper reason that the card needs to be enabled before suspend, or should that be taken care of by the driver?
21:25bencoh: (I'd tend to say the driver should attend to it, or at least handle resume properly nonetheless)
21:27karolherbst: bencoh: state needs to be saved
21:27karolherbst: you don't want to keep VRAM powered on while suspending, do you?
21:28bencoh: right, but I'd say the driver should still do it even if the card is "off"
21:28karolherbst: or is there an issue with resuming?
21:28bencoh: (not that I really know what it means for it to be off)
21:28bencoh: I think there is, I'll paste some logs
21:28karolherbst: well, no power -> bits become random over time
21:29karolherbst: I guess we could skip it if nothing uses the GPU...
21:29karolherbst: skeggsb: could we skip doing the full resume path if nothing ended up using the GPU on suspend?
21:29karolherbst: not that it matters...
21:29karolherbst: bencoh: apparently there are some (or all) firmwares which "lose" the GPU when it was powered off when suspending
21:30karolherbst: so we can't skip waking it up, but we could make suspend faster by skipping some work
21:31karolherbst: bencoh: that's without the patch, right?
21:31bencoh: for now it just crashes unless I power it on manually before entering suspend
21:31bencoh: karolherbst: that happens even with the patch
21:31karolherbst: huh.. strange
21:31bencoh: the patch prevents the d3hot/d3cold issue
21:31karolherbst: that's... weird
21:32bencoh: well, at least on this laptop that's how it is
21:32karolherbst: yeah... odd
21:33bencoh: I added echo ON > vgaswitcheroo to pm-suspend/systemd hooks, and it has been suspending fine since then
21:33bencoh: I still can't get hibernate to play nice as well, but ... I guess that'd be more tedious
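The workaround described here (writing ON to vgaswitcheroo before suspend) could be sketched as a small helper for a pre-suspend hook. This assumes the usual debugfs location of the control file and that the hook runs as root; the function name and the way it gets wired into pm-suspend or systemd are hypothetical, not bencoh's actual script:

```python
from pathlib import Path

# Usual debugfs location of the vga_switcheroo control file (assumes debugfs is mounted).
DEFAULT_SWITCH = Path("/sys/kernel/debug/vgaswitcheroo/switch")


def power_on_discrete_gpu(switch_path=DEFAULT_SWITCH):
    """Write ON to the switcheroo control file; return False if the file is absent."""
    switch_path = Path(switch_path)
    if not switch_path.exists():
        return False
    switch_path.write_text("ON\n")
    return True
```

Returning False rather than raising keeps the hook harmless on machines without a switchable GPU.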
21:34karolherbst: ehhh.. something is weird
21:35bencoh: karolherbst: you're referring to that part I suppose
21:35bencoh: Jul 2 05:00:46 alkaid kernel: [ 1153.126263] nouveau 0000:01:00.0: Refused to change power state, currently in D3
21:35karolherbst: might be worth printing the kernel stack when that happens?
21:35karolherbst: the message is somewhere in the pci code
21:35bencoh: I'd need to build it with kgdb on then
21:36karolherbst: there is a function.. wait
21:36bencoh: well, we already have a stack trace
21:36karolherbst: yeah.. but that's somewhere else :p
21:36bencoh: oh, you mean adding dump_stack() to that code
21:36karolherbst: just right after the message
21:36robi: who would be the best person to send a patch?
21:36bencoh: the Refused to change power state, currently in D3 one, you mean
21:36robi: (for review)
21:37karolherbst: robi: the nouveau mailing list
21:37robi: got it
21:37karolherbst: bencoh: yes
21:37bencoh: I just realized this doesn't happen *during* resume, but during ON > vgaswitcheroo *after* resume, by the way
21:37bencoh: as if suspending it while it's off broke something
21:37karolherbst: bencoh: yeah... I was already wondering :p
21:38karolherbst: bencoh: ahh, yeah
21:38karolherbst: can't do that :p
21:38karolherbst: the GPU needs to be on while suspending
21:38bencoh: yeah, the question is still why doesn't the driver take care of the needed part
21:38bencoh: (during suspend/resume)
21:38karolherbst: it does normally
21:38karolherbst: I think...
21:38karolherbst: maybe somebody messed with it
21:38bencoh: hmm ... looks like it doesn't do everything that's needed on this laptop at least :D
21:39bencoh: ah, could be as well :-)
21:39karolherbst: uhm.. wait
21:40bencoh: I'll be afk for a moment, coming back in 20~30mn :)
21:58karolherbst: HdkR: https://gitlab.freedesktop.org/karolherbst/nouveau_ci/-/tree/master/
21:59HdkR: 404'd, I guess private project?
21:59karolherbst: are you logged in?
22:00HdkR: I am
22:01karolherbst: GitLab's interface is just terrible when it comes to adding names
22:01karolherbst: "ryan" just lists all bryans as well...
22:01karolherbst: and if you put a space, it's like an or
22:02karolherbst: ahh "ryan hou" is the magic string for you :p
22:02HdkR: I've been mistaken for bryan many times by humans, I think it's matching behaviour :P
22:02karolherbst: done, now you can see it :p
22:03karolherbst: the python script is terrible though
22:03karolherbst: too much os.system and sudo :D
22:03karolherbst: _but_ it does set everything up
22:04karolherbst: dnsmasq as DHCP and TFTP server, laptop finds it and does load the netboot.xyz binaries, which then load the menu via https :)
22:04karolherbst: now I just need to provide my own menu + ipxe files
22:05karolherbst: still having some hardcoded interfaces in the dnsmasq.conf .. oh well
22:05karolherbst: that's for later
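One way to avoid hardcoded interfaces in dnsmasq.conf is to render the file from parameters. The sketch below is not the script from the nouveau_ci repository; it only uses standard dnsmasq options (`interface`, `bind-interfaces`, `dhcp-range`, `enable-tftp`, `tftp-root`, `dhcp-boot`), and every value passed in the usage line (interface name, address range, TFTP root, boot file name) is an assumption for illustration:

```python
def render_dnsmasq_conf(interface, dhcp_start, dhcp_end, tftp_root, boot_file):
    """Return dnsmasq.conf text for a combined DHCP + TFTP boot server."""
    lines = [
        f"interface={interface}",                    # serve only on this NIC
        "bind-interfaces",                           # do not bind the wildcard address
        f"dhcp-range={dhcp_start},{dhcp_end},12h",   # lease pool and lease time
        "enable-tftp",                               # built-in TFTP server
        f"tftp-root={tftp_root}",                    # directory holding the boot binaries
        f"dhcp-boot={boot_file}",                    # file name handed to the client
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_dnsmasq_conf("eth1", "192.168.50.10", "192.168.50.50", "/srv/tftp", "netboot.xyz.kpxe")` yields a config whose interface comes from the caller instead of being baked into the file.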
22:06HdkR: Everything is rough on the first iteration
22:06HdkR: w/e :D
22:06karolherbst: heh.. apparently my internet doesn't work anymore
22:06HdkR: weird, I have "access" but it isn't showing me a file structure? wtf are you doing gitlab
22:06karolherbst: ahh seems like only ipv4 is bonkers
22:07karolherbst: ohh wait.. this happened earlier
22:07karolherbst: something screws up my network (it's me probably?)
22:07karolherbst: needed to reboot but any network access essentially blocks forever
22:07karolherbst: it's weird
22:08karolherbst: and some stuff still works
22:09HdkR: The boot starts taking down your entire network, or the device's networking is broken?
22:09karolherbst: it just appears to start randomly
22:09karolherbst: I think my second network interface dies in a weird way
22:10karolherbst: ahh yeah.. lsusb hangs as well
22:11karolherbst: HdkR: device path of hell: ../../devices/pci0000:00/0000:00:1d.6/0000:06:00.0/0000:07:02.0/0000:3e:00.0/usb4/4-1/4-1.4/4-1.4:1.0/net/enp62s0u1u4
22:12karolherbst: ahh yeah..
22:12karolherbst: it's definitely the device
22:12karolherbst: or kernel
22:12HdkR: Something not interacting nicely with that USB device at the very least
22:13karolherbst: I blame TB actually...
22:13karolherbst: I just removed the device, but it's still there
22:13karolherbst: "[18231.513344] pcieport 0000:07:00.0: can't change power state from D3cold to D0 (config space inaccessible)" :D
22:13karolherbst: yeah.. lol
22:14robi: karolherbst: that path looks like something you might get if you daisychain 5 usb hubs together
22:14karolherbst: I wouldn't be surprised if the commit nouveau broke with is involved here as well
22:14karolherbst: robi: nope
22:14karolherbst: USB-C hub on my TB port :p
22:15karolherbst: it's... weird
22:15karolherbst: TB isn't USB-C directly.. so in order to accept USB-C devices, it has to spawn a fake USB controller
22:15karolherbst: 00:1d.6 is a PCIe port, a normal one
22:15karolherbst: 0000:06:00.0 is a spawned TB bridge
22:16karolherbst: 0000:07:02.0 is .. another spawned bridge? (now, that's ridiculous)
22:16karolherbst: 0000:3e:00.0 is the USB 3.1 controller from the TB controller
22:16karolherbst: and then you get the USB stuff
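The walk through that sysfs path can be reproduced mechanically by classifying each component by its naming pattern. A rough sketch, assuming the usual sysfs naming (domain:bus:device.function for PCI, bus-port[.port] for USB devices, with a :config.interface suffix for interfaces); the function names are made up for illustration:

```python
import re

PCI_ROOT = re.compile(r"^pci[0-9a-f]{4}:[0-9a-f]{2}$")
PCI_FUNCTION = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")
USB_ROOT_HUB = re.compile(r"^usb\d+$")
USB_INTERFACE = re.compile(r"^\d+-\d+(\.\d+)*:\d+\.\d+$")
USB_DEVICE = re.compile(r"^\d+-\d+(\.\d+)*$")


def classify(component):
    """Label one sysfs path component by its naming pattern."""
    if PCI_ROOT.match(component):
        return "pci root"
    if PCI_FUNCTION.match(component):
        return "pci function"
    if USB_ROOT_HUB.match(component):
        return "usb root hub"
    if USB_INTERFACE.match(component):
        return "usb interface"
    if USB_DEVICE.match(component):
        return "usb device"
    return "other"


def describe(path):
    """Return (component, kind) pairs, skipping empty and relative parts."""
    return [(c, classify(c)) for c in path.split("/") if c not in ("", ".", "..")]
```

Run on the path pasted above, the "pci function" entries are exactly the four devices listed (the root port, the two bridges, and the USB 3.1 controller), followed by the USB hops.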
22:17HdkR: Thunderbolt is amazing in so many ways...
22:17karolherbst: ahhh ACPI throws some errors in dmesg as well
22:17karolherbst: and here I thought runpm enabled all the way is a good idea :D
22:26karolherbst: reboot it is then
22:35karolherbst: HdkR: can you see the code now?
22:36HdkR: ah, now I can
22:36karolherbst: weird.. seems like being a guest isn't enough :p
22:39HdkR: ah okay, I see how this works
22:39karolherbst: there are a few things I don't like :p
22:40karolherbst: but in the end I want it to auto install :)
22:40karolherbst: but that's for next week
22:41HdkR: How does Netboot know what OSes are available?
22:41karolherbst: it has a full https stack...
22:41karolherbst: so.. it downloads the menu and everything :D
22:41karolherbst: it's quite insane
22:42HdkR: Oh, so its "Installers->Operating Systems" menu actually just downloads the media from the web without configuration?
22:42karolherbst: you can provide your own files though
22:42karolherbst: but then it still downloads the images
22:43karolherbst: depending on the ipxe file
22:43karolherbst: it's quite cool actually
22:44HdkR: Yea, that's really neat. I'm definitely going to toy with this once I get my hands on a new ARM device :D
22:45karolherbst: it doesn't do arm though
22:45HdkR: womp womp
22:45karolherbst: well.. it's firmware code :p
22:46karolherbst: but that stuff even comes with a node web frontend if you want to, and you can deploy new ipxe files while the service is running
22:46karolherbst: and if an arm device supports any of that I am sure you could also support it there
22:46karolherbst: just... the ipxe files are quite x86 centric
22:47karolherbst: (I skipped the node stuff obviously)
22:49HdkR: Hm, I'll have to check how this ProX supports ipxe boot then
22:49HdkR: Maybe the network bits just don't work past that point
22:50karolherbst: it probably just downloads the kernel and initrd file directly
22:50karolherbst: normally the bootloader supports BOOTP and/or TFTP
22:51karolherbst: PXE is weird...
22:51karolherbst: but I am sure arm devices don't do PXE
22:52HdkR: I'll check its boot options in a bit
22:53HdkR: Wacky microsoft product
22:53karolherbst: ohh wait.. I do the same, just instead of using closed source PXE I use netboot.xyz :D
22:54karolherbst: still needs to understand how all of that works
22:55karolherbst: ahh.. there is "Syslinux PXELINUX" and gPXE/iPXE as open source alternatives
23:00karolherbst: oh well
23:00karolherbst: let's see what works reliably enough :D