06:28 npnth: I've got an SDL program I've written that, when exiting, causes my machine to lock up in ways that I can't really predict.
06:29 npnth: Sometimes the machine is accessible over ssh, and `kill -9' restores functionality. Sometimes (-mainline kernel as of this morning) the machine is completely inaccessible, and does not respond to any input.
06:30 npnth: Sometimes (the nouveau 4.15 branch) the machine responds to mouse movement, but is otherwise dead.
06:31 npnth: I'm not positive that this has anything to do with nouveau, but I don't think a program in userspace should be able to mess up the machine this badly, and since it displays graphics, nouveau is my first stop.
06:32 npnth: I have the various _DEBUG options turned on in my kernel config for nouveau, but the manner of the lockup makes it pretty hard to get information out of dmesg.
06:33 npnth: Is there anything I might be able to try that would get some more information out of nouveau, given that I'll have to hard kill the machine immediately after triggering this behavior?
06:55 imirkin: npnth: what, related to graphics, is this program doing?
06:58 npnth: imirkin: It's not doing too much: it allows clicking and dragging graphs. http://repo.or.cz/clav.git , and everything should be concentrated in ui-sdl.c, if you want to look at it.
07:00 npnth: For example, grepping for SDL_ gives only these calls: https://ptpb.pw/AgIH
07:01 npnth: Since the issue arises on exit, the problem is probably somewhere near ui_teardown().
07:28 imirkin: and do you have a dmesg from when the thing dies?
07:29 npnth: imirkin: I have one thing which may be relevant.
07:30 npnth: If I have 'dmesg > out.txt' running as it dies, it doesn't end up getting written to disk (I think): there's nothing relevant.
07:30 npnth: But if I'm connected over ssh, there's enough time for some data to be transferred. It looks very generic and doesn't mention nouveau at all. Let me grab it.
07:30 imirkin: 'dmesg -w'
07:30 imirkin: will stream it over ssh
07:30 npnth: Yeah, sorry
07:31 npnth: I was using dmesg -w in all the things I just said.
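A rough sketch of the capture approach discussed above: run `dmesg -w` on the failing machine over ssh and force every received line to disk on the observing machine, so whatever arrived before the lockup survives. The helper name `stream_to_file`, the host `user@testbox`, and the log filename are hypothetical placeholders, not anything from the conversation.

```python
import os
import subprocess


def stream_to_file(cmd, path):
    """Run cmd and append each line of its stdout to path.

    Every line is flushed and fsync'ed so that it is on disk even if
    the producer (for example a remote `dmesg -w`) dies abruptly.
    Returns the number of lines written.
    """
    count = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            open(path, "ab") as log:
        for line in proc.stdout:
            log.write(line)
            log.flush()
            os.fsync(log.fileno())
            count += 1
    return count


# Hypothetical usage, run on a second machine (host and file names are assumptions):
# stream_to_file(["ssh", "user@testbox", "dmesg", "-w"], "dmesg-testbox.log")
```

The per-line fsync is slow, but for a kernel log that stops after a few dozen lines the cost is negligible compared with losing the tail of the trace.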
07:31 imirkin: anyways, i'm off to sleep
07:31 npnth: https://ptpb.pw/CvSD <-- what showed up over ssh.
07:32 npnth: Okay, see you
07:32 imirkin: RAX: 6b6b6b6b6b6b6b63
07:32 imirkin: that's the poison value
07:33 imirkin: "there's something wrong somewhere"
07:33 npnth: Oh, nice.
07:33 imirkin: try turning on KASAN
07:33 imirkin: or ... SLUB
07:33 imirkin: i forget
07:33 imirkin: some acronym
07:33 imirkin: which looks for screwups like this
07:33 npnth: I'll look into those, thank you very much. That's exactly what I was hoping for.
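For context on the "poison value" remark: the kernel's slab allocator fills freed objects with the byte 0x6b when poisoning is enabled (`POISON_FREE` in include/linux/poison.h), so a register holding a word of 0x6b bytes plus or minus a small offset suggests a pointer read from freed memory. A minimal Python sketch of that check follows; the function names and the 4096-byte offset window are illustrative assumptions, not kernel definitions.

```python
POISON_FREE = 0x6B  # byte written into freed slab objects (include/linux/poison.h)


def poison_word(width_bytes: int = 8) -> int:
    """Return an integer whose every byte is POISON_FREE."""
    return int.from_bytes(bytes([POISON_FREE]) * width_bytes, "little")


def looks_like_freed_slab_pointer(value: int, max_offset: int = 4096) -> bool:
    """Heuristic: is value within max_offset of the all-0x6b pattern?

    max_offset is an arbitrary window (an assumption) meant to cover a
    struct-member offset added to or subtracted from a poisoned pointer.
    """
    return abs(value - poison_word()) <= max_offset


rax = 0x6B6B6B6B6B6B6B63  # value quoted from the oops above
print(looks_like_freed_slab_pointer(rax))  # True
print(hex(poison_word() - rax))            # 0x8
```

The difference of 8 is consistent with code subtracting a small offset from a pointer loaded out of an already-freed object, which is the kind of use-after-free that KASAN is designed to report.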
17:57 Hijiri: what card can I expect the most nouveau performance with?
17:59 imirkin: GTX 780 Ti
17:59 Hijiri: thanks
17:59 imirkin: (or the titan which is basically the same)
18:00 imirkin: any high-end kepler will generally have reasonable results
18:02 Hijiri: are they still expensive because of SLI?
18:03 Hijiri: oh, nevermind, I was looking in the wrong place
18:04 imirkin: depends what one's line for "expensive" is
18:04 imirkin: you should be able to get like a 660 or 670 for under $100
18:44 karolherbst: imirkin: because I am pretty much done with those P50 issues, I would like to take a look at those issues we encounter with plasma and so on. Do you know a super reliable way to trigger those things? Is there some work in progress or some comments about the issues somewhere?
18:44 karolherbst: don't really want to start from 0 fixing those things
18:45 imirkin: i've never encountered the issue
18:45 imirkin: because i've never used plasma or kde-anything
18:45 imirkin: i just know people report it
18:45 imirkin: so i mostly just repeat what people report so they all know they're not alone
18:46 karolherbst: okay, I think you and skeggsb had some WIP patches for stuff?
19:03 cyndis: karolherbst: fyi, not sure if you saw it, but I sent the mesa patch to fix GP10B
19:11 imirkin: karolherbst: don't think i have anything relevant
19:11 imirkin: cyndis: does the kernel have any gp10b support atm?
19:11 imirkin: oh, so it does.
19:12 imirkin: and indeed it only exposes c0c0
19:16 cyndis: yep
19:16 imirkin: pushed
19:16 cyndis: thank you
19:19 imirkin: if there are other issues, send more patches
19:19 cyndis: sure
19:19 cyndis: at least kmscube is now working
19:19 imirkin: btw, dunno what level of involvement you're looking for, but there are some unresolved bugs on maxwell+ with cache flushes
19:19 imirkin: (well, i assume cache flushes. i don't really know what the problem is.)
19:21 cyndis: if it affects gm20b, maybe i could take a look at some point. i do, though, have way too many different tegra features/bugs to work on :)
19:21 cyndis: (ah, or gp10b)
19:22 imirkin: i assume it would, but i don't have access to either
19:22 imirkin: it shows up in xonotic with floors becoming transparent sometimes
19:22 imirkin: (and unigine valley with random weirdness, but that may be harder to run on arm. i can provide traces though.)
19:23 cyndis: i'm not sure if we can even run X11 without crashes atm because the tegra mesa renderonly driver is very WIP
19:23 imirkin: but if you've got bigger fish to fry, i totally get it
19:23 imirkin: just pointing it out in case you're tired of twiddling your thumbs
19:23 cyndis: sure :p
19:23 cyndis: i'll keep it in mind
19:24 imirkin: unfortunately i don't really know where to start with that bug. forcing flushes a lot "fixes" it, which means it's some lack of understanding of the caching that goes on in the gpu
19:25 imirkin: (or a cache we're not aware of, that didn't exist on kepler, for example)
19:27 cyndis: i'll ask one of our gpu driver guys if they can think of anything when i next bump into them
19:28 imirkin: yeah, i mean if i could just ask someone who understands the hardware, and esp understands how it differs from kepler, i'm sure it'd be like a "oh, you're supposed to flush X" and be done with it
19:28 imirkin: of course there's also the possibility that we're supposed to do it on kepler too, but get lucky :)
19:29 karolherbst: cyndis: yeah, I saw
19:31 imirkin: karolherbst: no objections to my maxwell bindless patches i presume? i tested them a bit, seemed to not totally be broken
19:31 karolherbst: imirkin: last time I tested them nothing broke
19:31 karolherbst: I can do that again on monday
19:31 karolherbst: or tomorrow
19:32 imirkin: nah, i wouldn't worry about it
19:32 karolherbst: okay
19:32 karolherbst: I think half of the piglit tests passed
19:32 karolherbst: and I think DOW also ran
19:33 imirkin: hm? all of them passed
19:33 imirkin: [for me]
19:33 imirkin: and i fixed the imageSize bits too
19:33 karolherbst: ohh
19:33 karolherbst: maybe you have new patches?
19:33 imirkin: since like 2 weeks ago? yes
19:33 karolherbst: ahh
21:33 rhyskidd: mupuf: noticed a simple missing spot or two where POWER TOPOLOGY from the nv BIT documentation can be added
21:34 rhyskidd: (am going through my own vbios trying to drive down the number of unknown tables now)
23:16 npnth: imirkin: Regarding the issue I mentioned last night, I rebuilt my kernel (-mainline) with KASAN, and triggered the issue again.
23:16 npnth: It didn't kill the machine this particular time, and I got the following in dmesg: https://ptpb.pw/mZyB .
23:18 npnth: This does mention nouveau in the trace, so I figured I'd show it around. I'm not sure what it means, but I can poke around with it if you think that's worth it.
23:19 npnth: Also, I think KASAN might be preventing the hard crash, so I might be able to whittle it down to a MWE. Not sure.
23:24 npnth: Hm. After invoking it more times, I'm not seeing any further hits in dmesg. Perhaps KASAN is removing redundant calls, perhaps it's not as reliable as I thought, perhaps it's first-time only, &c...
23:48 imirkin: npnth: no, it does mention nouveau
23:49 imirkin: curious, though
23:49 imirkin: skeggsb: --^
23:55 imirkin: [oh, and that's exactly what you said. i read it as "doesn't"]