17:23 karolherbst: imirkin_: random thought about the vertex buffer thing: normally I would expect some TRAP or WARP error or anything if we didn't get things right, but this is a ctxsw timeout, so maybe it's just something trivial like we freed the memory already, but the context still points to it and the firmware wants to save state, but can't because that's just invalid memory right now? Or something stupid like that
17:24 imirkin_: the context would only save the pointer in that case
17:24 imirkin_: not read its contents
17:24 imirkin_: but yeah, could be some pre-fetch thing
17:38 karolherbst: mhhh, now I got the GPU to crash for real
17:40 imirkin_: yay!
23:30 Lyude: is there a way to unload nouveau in such a way that we don't shut off the GPU?
23:31 Lyude: or rather-don't clear whatever context we had on the GPU
23:31 imirkin_: no
23:31 Lyude: hm.
23:34 imirkin_: why are you unloading nouveau?
23:34 imirkin_: leaving it loaded is a great way of not touching the context ;)
23:35 Lyude: imirkin_: tldr I'm debugging a very evil machine that we confirmed a while back was not power cycling the nv GPU on reboots (yes. really.)
23:36 Lyude: so i'm trying to get the GPU set up, yank nouveau without shutting the GPU down, then see if I can manually get it to turn "off" enough with acpi methods that it's not in a bizarre still-on state when we boot. if I can do that, that means there is actually a way for us to fix this
23:37 gnarface: that might be a known limitation of the actual hardware Lyude
23:38 Lyude: gnarface: i beg your pardon?
23:38 imirkin_: Lyude: well you can check with that register....
23:38 imirkin_: hold on
23:38 gnarface: NVidia's own documentation for upgrading drivers demands a full shutdown, power off at power supply, wait for cooldown, then cold boot.
23:38 Lyude: gnarface: no that's not it
23:38 gnarface: though i haven't run into a problem disobeying that since AGP cards
23:38 Lyude: oh-
23:38 imirkin_: Lyude: 0x2240c -- check bit 1
23:39 imirkin_: i.e. the second bit
23:39 Lyude: gimme a quick sec to get envytools on this
23:39 imirkin_: that will be set to 1 if the bios has run
23:39 imirkin_: (on fermi+)
23:40 imirkin_: and 0 if it hasn't :)
23:40 imirkin_: so if you e.g. suspend and resume, it'll be 0 again
23:40 imirkin_: https://github.com/skeggsb/nouveau/blob/master/drm/nouveau/nvkm/subdev/devinit/gf100.c#L93
23:42 Lyude: imirkin_: ahhh, that is useful to know, I guess I can use that to know if I've managed to get the GPU to turn off before the reboot
23:45 imirkin_: there's some evidence that a handful of fermi+ vbios's don't actually set that bit
23:45 imirkin_: but i think i've only seen like two ... maybe GF106's or something
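
The register check imirkin_ describes comes down to testing one bit. Below is a minimal sketch in Python, assuming the 32-bit value of 0x2240c has already been read from the card by some other means (for example with envytools' nvapeek, which Lyude mentions installing). The names DEVINIT_STATUS_REG, DEVINIT_DONE_MASK and devinit_has_run are made up for illustration and are not nouveau or envytools identifiers.

```python
# Hypothetical names for illustration; only the address and bit come from the log.
DEVINIT_STATUS_REG = 0x2240C   # register imirkin_ points at (Fermi and later)
DEVINIT_DONE_MASK = 1 << 1     # "bit 1", i.e. the second bit, mask 0x2


def devinit_has_run(reg_value: int) -> bool:
    """Return True if bit 1 is set in a value read from 0x2240c.

    Per the log: 1 means the VBIOS init has run, 0 means it has not
    (for example after a suspend/resume cycle, or a real power cycle).
    """
    return bool(reg_value & DEVINIT_DONE_MASK)


if __name__ == "__main__":
    # Example values only, not readings from real hardware.
    for value in (0x00000000, 0x00000002, 0x00000001):
        state = "has run" if devinit_has_run(value) else "has not run"
        print(f"0x{DEVINIT_STATUS_REG:05x} = 0x{value:08x}: devinit {state}")
```

For Lyude's experiment, a result of False after the ACPI poking would suggest the GPU really lost its state. As imirkin_ notes, a few Fermi+ VBIOSes may never set the bit, so a 0 on those boards proves nothing by itself.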