18:02imirkin: anholt: recovery is a sore topic, unfortunately
18:02glennk: POST-traumatic stress?
18:03imirkin: glennk: you're on fire today
18:19ajax: i feel like i've asked about this before (recovery and why it's bad) but i don't remember the answer
18:21imirkin: we don't really have a reliable way to go back to a good state
18:22imirkin: like when tlb flush fails, that means the whole memory thing is messed up
18:22imirkin: how do you fix that? dunno
18:22imirkin: with laptops, we could start getting clever and force it to get turned off
18:22imirkin: but no such thing with desktop parts
18:29ajax: well. we can post secondary cards, right? so we have to have at least that much ability to run the boot bytecode.
18:30ajax: is the problem that we don't know which engines need resetting in which order?
18:41ajax: i guess that's a subset of "go back to a good state"
18:51imirkin: ajax: double-post isn't good, unfortunately
18:51imirkin: post takes it out of "reset" state
18:52imirkin: running post on an already-posted board can lead to trouble
18:52imirkin: we take great care to try to avoid that
18:55ajax: i feel like characterising that trouble might be instructive but i'm also def not volunteering right now
18:57imirkin: yeah ........
18:58imirkin: i think one little bit that you might not appreciate is that there are many thousands of SKUs. each one has its own init sequence (in the vbios), so what each one does and how each reacts to this may be different.
18:58imirkin: not to mention different generations/etc
18:58imirkin: we have enough other problems, so we haven't _really_ cared much about what init does
18:58imirkin: as long as it works :)(
18:59ajax: fair enough. and certainly relying on reset is worse than just not effing up your page tables in the first place.
18:59ajax: or whatever it is that's making tlb flush die
18:59imirkin: yeah, sadly the path of least resistance is always "don't hang"
19:00imirkin: rather than "figure out how to make reset work"
19:00imirkin: tlb flush dying is just a quick indicator of complete failure. if that doesn't work, then nothing works. but it's probably not the thing that's broken directly.
19:00imirkin: just a side-effect of "memory went for a walk"
19:01imirkin: "be back after lunch"
19:01imirkin: anyways, would *love* any serious attention being paid to this stuff
19:02imirkin: as you likely know all too well, it's a lot of hard thankless work, even when you have docs