06:28npnth: I've got an SDL program I've written that, when exiting, causes my machine to lock up in ways that I can't really predict.
06:29npnth: Sometimes the machine is accessible over ssh, and `kill -9' restores functionality. Sometimes (-mainline kernel as of this morning) the machine is completely inaccessible, and does not respond to any input.
06:30npnth: Sometimes (the nouveau 4.15 branch) the machine responds to mouse movement, but is otherwise dead.
06:31npnth: I'm not positive that this has anything to do with nouveau, but I don't think a program in userspace should be able to mess up the machine this badly, and since it displays graphics, nouveau is my first stop.
06:32npnth: I have the various _DEBUG options turned on in my kernel config for nouveau, but the manner of the lockup makes it pretty hard to get information out of dmesg.
06:33npnth: Is there anything I might be able to try that would get some more information out of nouveau, given that I'll have to hard kill the machine immediately after triggering this behavior?
06:55imirkin: npnth: what, related to graphics, is this program doing?
06:58npnth: imirkin: It's not doing too much: it allows clicking and dragging graphs. http://repo.or.cz/clav.git , and everything should be concentrated in ui-sdl.c, if you want to look at it.
07:00npnth: For example, grepping for SDL_ gives only these calls: https://ptpb.pw/AgIH
07:01npnth: Since the issue arises on exit, the problem is probably somewhere near ui_teardown().
07:28imirkin: and do you have a dmesg from when the thing dies?
07:29npnth: imirkin: I have one thing which may be relevant.
07:30npnth: If I have 'dmesg > out.txt' running as it dies, it doesn't end up getting written to disk (I think): there's nothing relevant.
07:30npnth: But if I'm connected over ssh, there's enough time for some data to be transferred. It looks very generic and doesn't mention nouveau at all. Let me grab it.
07:30imirkin: 'dmesg -w'
07:30imirkin: will stream it over ssh
07:30npnth: Yeah, sorry
07:31npnth: I was using dmesg -w in all the things I just said.
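A minimal sketch of the "stream dmesg over ssh" idea above, assuming the log is captured on a second machine so it survives the lockup. The function name `stream_to_file` and the host and file names in the usage comment are hypothetical; only `ssh` and `dmesg -w` come from the discussion.

```python
import os
import subprocess


def stream_to_file(cmd, path):
    """Run cmd and append each output line to path, syncing after every line.

    Syncing per line means that whatever arrived before the remote
    machine died is already on the local disk.
    """
    count = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(path, "a") as out:
        for line in proc.stdout:
            out.write(line)
            out.flush()
            os.fsync(out.fileno())
            count += 1
    return count


# Hypothetical usage, run on the machine that is NOT going to lock up:
#   stream_to_file(["ssh", "user@crashing-box", "dmesg", "-w"], "dmesg.log")
```

The call blocks until the ssh connection drops, which here is the expected outcome once the remote machine stops responding.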
07:31imirkin: anyways, i'm off to sleep
07:31npnth: https://ptpb.pw/CvSD <-- what showed up over ssh.
07:32npnth: Okay, see you
07:32imirkin: RAX: 6b6b6b6b6b6b6b63
07:32imirkin: that's the poison value
07:33imirkin: "there's something wrong somewhere"
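For context on the "poison value": when slab poisoning is enabled, the Linux slab allocator fills freed objects with the byte 0x6b (`POISON_FREE`), so a register full of 0x6b bytes suggests a pointer was read from already-freed memory. The sketch below only illustrates that pattern check; the helper names and the threshold of 6 bytes are assumptions for illustration, not anything the kernel provides.

```python
POISON_FREE = 0x6B  # byte written over freed slab objects when poisoning is on


def poison_byte_count(value, poison=POISON_FREE):
    """Count how many of the 8 bytes of a 64-bit value equal the poison byte."""
    return sum(1 for b in value.to_bytes(8, "little") if b == poison)


def looks_poisoned(value, threshold=6):
    """Heuristic (assumed threshold): most bytes of the value are poison."""
    return poison_byte_count(value) >= threshold


rax = 0x6B6B6B6B6B6B6B63  # the RAX value quoted from the oops
full = int.from_bytes(bytes([POISON_FREE]) * 8, "little")

print(poison_byte_count(rax))  # 7
print(looks_poisoned(rax))     # True
print(full - rax)              # 8
```

The difference of 8 from the all-poison pattern is consistent with a small offset being applied to a pointer loaded from freed memory, though the oops alone does not prove that.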
07:33npnth: Oh, nice.
07:33imirkin: try turning on KASAN
07:33imirkin: or ... SLUB
07:33imirkin: i forget
07:33imirkin: some acronym
07:33imirkin: which looks for screwups like this
07:33npnth: I'll look into those, thank you very much. That's exactly what I was hoping for.
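The two acronyms being reached for above are most likely KASAN (`CONFIG_KASAN`) and SLUB debugging (`CONFIG_SLUB_DEBUG`, switched on by default with `CONFIG_SLUB_DEBUG_ON` or at boot with a `slub_debug=` parameter such as `slub_debug=FZP`). A small sketch for checking which of these a kernel `.config` already enables; the function name and the sample text are made up for illustration.

```python
import re

WANTED = ("CONFIG_KASAN", "CONFIG_SLUB_DEBUG", "CONFIG_SLUB_DEBUG_ON")


def enabled_options(config_text, wanted=WANTED):
    """Return {option: bool} for whether each option is set to y in a .config."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return {name: name in enabled for name in wanted}


# Made-up sample; in practice read the real file, e.g. open(".config").read()
sample = """\
CONFIG_SLUB_DEBUG=y
# CONFIG_SLUB_DEBUG_ON is not set
CONFIG_KASAN=y
"""

print(enabled_options(sample))
# {'CONFIG_KASAN': True, 'CONFIG_SLUB_DEBUG': True, 'CONFIG_SLUB_DEBUG_ON': False}
```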
17:57Hijiri: what card can I expect the most nouveau performance with?
17:59imirkin: GTX 780 Ti
17:59imirkin: (or the titan which is basically the same)
18:00imirkin: any high-end kepler will generally have reasonable results
18:02Hijiri: are they still expensive because of SLI
18:03Hijiri: oh, nevermind, I was looking in the wrong place
18:04imirkin: depends what one's line for "expensive" is
18:04imirkin: you should be able to get like a 660 or 670 for under $100
18:44karolherbst: imirkin: because I am pretty much done with those P50 issues, I would like to take a look at those issues we encounter with plasma and so on. Do you know a super reliable way to trigger those things? Is there some work in progress or some comments about the issues somewhere?
18:44karolherbst: don't really want to start from 0 fixing those things
18:45imirkin: i've never encountered the issue
18:45imirkin: because i've never used plasma or kde-anything
18:45imirkin: i just know people report it
18:45imirkin: so i mostly just repeat what people report so they all know they're not alone
18:46karolherbst: okay, I think you and skeggsb had some WIP patches for stuff?
19:03cyndis: karolherbst: fyi, not sure if you saw it, but I sent the mesa patch to fix GP10B
19:11imirkin: karolherbst: don't think i have anything relevant
19:11imirkin: cyndis: does the kernel have any gp10b support atm?
19:11imirkin: oh, so it does.
19:12imirkin: and indeed it only exposes c0c0
19:16cyndis: thank you
19:19imirkin: if there are other issues, send more patches
19:19cyndis: at least kmscube is now working
19:19imirkin: btw, dunno what level of involvement you're looking for, but there are some unresolved bugs on maxwell+ with cache flushes
19:19imirkin: (well, i assume cache flushes. i don't really know what the problem is.)
19:21cyndis: if it affects gm20b, maybe i could take a look at some point. i do, though, have way too many different tegra features/bugs to work on :)
19:21cyndis: (ah, or gp10b)
19:22imirkin: i assume it would, but i don't have access to either
19:22imirkin: it shows up in xonotic with floors becoming transparent sometimes
19:22imirkin: (and unigine valley with random weirdness, but that may be harder to run on arm. i can provide traces though.)
19:23cyndis: i'm not sure if we can even run X11 without crashes atm because the tegra mesa renderonly driver is very WIP
19:23imirkin: but if you've got bigger fish to fry, i totally get it
19:23imirkin: just pointing it out in case you're tired of twiddling your thumbs
19:23cyndis: sure :p
19:23cyndis: i'll keep it in mind
19:24imirkin: unfortunately i don't really know where to start with that bug. forcing flushes a lot "fixes" it, which means it's some lack of understanding of the caching that goes on in the gpu
19:25imirkin: (or a cache we're not aware of, that didn't exist on kepler, for example)
19:27cyndis: i'll ask one of our gpu driver guys if they can think of anything when i next bump into them
19:28imirkin: yeah, i mean if i could just ask someone who understands the hardware, and esp understands how it differs from kepler, i'm sure it'd be like a "oh, you're supposed to flush X" and be done with it
19:28imirkin: of course there's also the possibility that we're supposed to do it on kepler too, but get lucky :)
19:29karolherbst: cyndis: yeah, I saw
19:31imirkin: karolherbst: no objections to my maxwell bindless patches i presume? i tested them a bit, seemed to not totally be broken
19:31karolherbst: imirkin: last time I tested them nothing broke
19:31karolherbst: I can do that again on monday
19:31karolherbst: or tomorrow
19:32imirkin: nah, i wouldn't worry about it
19:32karolherbst: I think half of the piglit tests passed
19:32karolherbst: and I think DOW also ran
19:33imirkin: hm? all of them passed
19:33imirkin: [for me]
19:33imirkin: and i fixed the imageSize bits too
19:33karolherbst: maybe you have new patches?
19:33imirkin: since like 2 weeks ago? yes
21:33rhyskidd: mupuf: noticed a simple missing spot or two where POWER TOPOLOGY from the nv BIT documentation can be added
21:34rhyskidd: (am going through my own vbios trying to drive down the number of unknown tables now)
23:16npnth: imirkin: Regarding the issue I mentioned last night, I rebuilt my kernel (-mainline) with KASAN, and triggered the issue again.
23:16npnth: It didn't kill the machine this particular time, and I got the following in dmesg: https://ptpb.pw/mZyB .
23:18npnth: This does mention nouveau in the trace, so I figured I'd show it around. I'm not sure what it means, but I can poke around with it if you think that's worth it.
23:19npnth: Also, I think KASAN might be preventing the hard crash, so I might be able to whittle it down to a MWE. Not sure.
23:24npnth: Hm. After invoking it more times, I'm not seeing any further hits in dmesg. Perhaps KASAN is removing redundant calls, perhaps it's not as reliable as I thought, perhaps it's first-time only, &c...
23:48imirkin: npnth: no, it does mention nouveau
23:49imirkin: curious, though
23:49imirkin: skeggsb: --^
23:55imirkin: [oh, and that's exactly what you said. i read it as "doesn't"]