02:18 psaisanas: Hi, i have a Quadro FX5500 (NV49 Class) gpu and it seems to be displaying some artifacts when rendering more than one OpenGL context on the screen. Is it possible to use the old context switching firmware extracted from the old nouveau_firmware package with the current nouveau driver to test?
02:26 imirkin: i don't think so
02:29 psaisanas: Thanks for replying. Well i have set up the Quadro FX5500 with the Nvidia-304.125 blob drivers and created an mmio trace dump. Would it be useful to upload for such an old gpu?
02:30 imirkin: FX5500 is a NV3x iirc...
02:30 psaisanas: No, NV49
02:30 psaisanas: Quadro FX5500
02:30 imirkin: oh, they made weirdo names
02:30 imirkin: ya
02:30 imirkin: GeForce FX5500 is a NV34
02:31 imirkin: but they kept the Quadro FX line into the NV4x series. fun!
02:31 imirkin: is it a single application that uses 2 contexts, or 2 separate applications?
02:34 psaisanas: single application, i.e. if using the gnome 3 desktop and two 3d effects are happening at the same time it starts to produce artifacts. or even running glxgears with the desktop will start to produce curruption... hard to describe!
02:35 psaisanas: corruption! oops!
02:37 imirkin: so... multiple applications
02:37 imirkin: i assume that gnome-shell just uses a single context
02:37 imirkin: and i definitely know that glxgears uses a single context
02:39 psaisanas: not sure...
02:40 imirkin: anyway... this isn't necessarily context switching issues... could be plain ol' resource mismanagement of sorts
02:43 psaisanas: Would an mmio trace dump perhaps help if it were to be uploaded in this instance?
02:44 imirkin: not sure
02:44 imirkin: can you describe the corruption?
02:48 psaisanas: Sometimes like part of the rendering on the edges of the screen that should be on the right hand side of the screen is placed on the left hand side of the screen. The corrupted edges of the screen will only flash if there is some other gl activity going on. But most of the screen in the middle is displayed correctly. Hope it makes some sense....
02:51 imirkin: what happens if you e.g. have a couple of glxgears windows running side-by-side
02:55 psaisanas: the same display corruption as if there was one glxgears instance. Interestingly, if i use the xfce desktop there is no display corruption as there are no 3d effects going on in the background. but as soon as i start glxgears it does go crazy...
02:56 imirkin: are you using moderately modern software versions?
02:56 imirkin: not a whole lot of dev going on there, but... there's some really old stuff out there too
02:59 psaisanas: xorg edgers and oibaf ppa repos updated today on ubuntu 14.04 on x86_64. Unfortunately i'm not in the building with the problematic machine to give you exact versions
02:59 imirkin: no, that's perfectly recent enough
03:00 imirkin: was just making sure you weren't using like mesa 7 or something
03:01 mlankhorst: skeggsb: could you cc stable when you do fixes, and mention affected commit ids when they introduce regressions? :P
03:02 imirkin: +1
03:03 psaisanas: Actually i also got a Powermac G5 quad that has a Quadro FX4500. Id love to get 3d working on that with something newer than mesa 8... But thats a battle for another time
03:11 imirkin: i think benh had patches to make nv4x work on ppc...
03:13 imirkin: the main sticking point was 64k pages iirc
03:13 imirkin: with 4k pages it's mostly fine
03:14 psaisanas: I can get nouveau working on ppc64 as long as the kernel is using pagesizes of 4k and not using msi interrupts. MSI interrupts get assigned but it sort of hangs. Disabling msi interrupts by default gets it working again.
03:15 psaisanas: No the main issue is when running glxinfo it cant find a compatible mode, i think its a big endian issue or something.
03:16 imirkin: yeah
03:16 imirkin: it _may_ have been fixed with the latest code
03:16 imirkin: but i'm fairly sure that mesa 9.0 and maybe 9.1 should work totally fine
03:17 imirkin: after that there was some partial fixing of be which left things in a broken state. there's been further fixing, but i'm not sure of the status
03:21 psaisanas: using glamoregl accel will get glx gears up and running with mesa 10.3 or 10.4. it falls back to software rendering and the colours are all wrong.
03:25 psaisanas: Anyway, i'll send the mmio trace for the quadro fx5500 tomorrow if it helps.
04:51 mlankhorst: tagr_: could bugs occur when you have multiple VM entries that point to the same VRAM address?
06:48 Scall: pmoreau: here is the netconsole log: https://www.dropbox.com/s/zwfiojbtjf69lyi/netconsole-debug_level_6?dl=0
06:48 Scall: The weird thing is that adding "debug" to the kernel command line the system doesn't *completely* freeze as before (but it still does if I remove "debug"). So, with "debug", I _can_:
06:49 Scall: * Move the mouse pointer, reboot with Alt-SysRq-REISUB (I couldn't do any of these without "debug")
06:49 Scall: but, still with "debug", I _can't_:
06:49 Scall: * Use the mouse pointer to do anything (clicks on any area of the desktop don't have effect; all the GUI is frozen), use the keyboard to go in a TTY.
06:49 Scall: The software I've used is the same as the last time I sent you the other two logs, nothing has been updated since then. Of course I've added "debug" to make netconsole work, as suggested in this page: https://wiki.ubuntu.com/Kernel/Netconsole
06:49 Scall: imirkin: I've got a log with "nouveau.config=NvMSI=0" as well, not sure if it can be useful, but here it is: https://www.dropbox.com/s/9pxkxpyzrcta0ml/netconsole-debug_level_6-msi_off?dl=0
07:05 Scall: Another detail: without "debug" it freezes as soon as I try to _resize_ the "glxgears" window, with "debug" it freezes as soon as I _start_ glxgears (I can't even see the gears in the window, because the opened window is entirely black; without "debug" I could see the gears, though). And it is always so, I've rebooted several times to test.
08:41 pmoreau: Scall: There aren't many errors in the logs, apart from some PBUS mmio write errors. It seems to be stuck in an infinite loop starting from line 4529 (in the netconsole file) - I don't know why - but I could be wrong.
08:50 Scall: pmoreau: Oh, I see. If there is any other useful information I can send you, or if a "spam" (7) debug level could help, just let me know ;)
09:34 pmoreau: Scall: I can't do much more than looking at the files and trying to interpret them: I have little knowledge about how Mesa / X interact with the driver and I unfortunately don't have time to learn about it.
09:35 pmoreau: Scall: Hopefully someone else will have time to have a better look at it.
12:19 pmoreau: Which engines have an influence over the EVO engine?
12:20 pmoreau: And is there a way to trace EVO commands?
12:20 imirkin: you should be able to trace evo commands by doing a mmt trace of X
12:20 mlankhorst: isn't evo mostly display?
12:21 pmoreau: I copy/pasted every PDISPLAY writes from the blob, but apart from making my screen yellow then white, it had no effects upon the original issue. :D
12:21 pmoreau: So I was looking at something else
12:21 pmoreau: imirkin: And using dedma upon the trace or some other tool?
12:23 pmoreau: I'll follow the wiki :-)
12:23 imirkin: pmoreau: demmt... although check with joi
12:23 imirkin: and/or RSpliet
12:25 pmoreau: Ok
12:34 pmoreau: Is it "viable" to do both a mmiotrace and a mmt trace at the same time?
12:34 imirkin: ya
12:34 imirkin: don't expect to set any speed records
12:34 pmoreau: ;D
12:47 joi: (demmt can't decode evo commands)
12:50 pmoreau: Are they displayed in some text format like a mmiotrace or do they remain in some binary format?
12:52 joi: pmoreau: fyi, if you want to see mmiotrace and mmt in sync, you can direct demmt output to mmiotrace (there's "sync mode" in mmt/demmt)
12:56 pmoreau: joi: Nice! :-)
12:56 pmoreau: Is it the -s flag of demmt?
12:56 joi: yes
12:57 pmoreau: So do I need to launch demmt on-the-fly along with mmt or can it be done offline?
12:58 joi: there's no way to sync them offline
12:59 pmoreau: Just wanted to be sure
13:02 RSpliet: but they're easy enough to follow manually
13:20 pmoreau: The log file stays empty. I checked I was tracing the right Xorg binary and that I wasn't suid'ed. As for the command, I'm using the one from the wiki, except with glxgears replaced by Xorg.
13:30 joi: pmoreau: paste full command, please
13:30 pmoreau: Tracing glxgears gives some results.
13:31 pmoreau: joi: `valgrind --tool=mmt --mmt-trace-file=/dev/nvidia0 --mmt-trace-file=/dev/nvidia1 --mmt-trace-file=/dev/nvidiactl --mmt-trace-nvidia-ioctls --log-file=file-bin.log Xorg`
13:32 joi: looks good
13:58 pmoreau: Silly me... I was tracing the wrong binary... --"
14:38 pmoreau: joi: Seems to work really well, thanks a lot!
14:56 ariscop: having trouble with an 870m, same fault as this guy https://bbs.archlinux.org/viewtopic.php?id=185962
14:57 ariscop: says mmio read fault, then any attempt to use it via DRI_PRIME kills x, and sometimes panics and/or appears to kill the pci bus
14:58 imirkin: ariscop: what gpu is the 870M? can you pastebin your dmesg?
14:59 ariscop: http://ix.io/gfC
15:00 ariscop: can't get a dmesg from when it faults unfortunately
15:00 ariscop: same behavior since 3.14, i'm on 3.18 now
15:00 imirkin: and you're concerned about "[ 3713.031769] nouveau E[ PBUS][0000:01:00.0] MMIO read of 0x00000000 FAULT at 0x500c30 [ IBUS ]"?
15:00 imirkin: i don't think it's really a problem...
15:00 ariscop: imirkin, i'm more concerned with attempting to use it killing the machine
15:01 imirkin: yeah :) is it a thinkpad?
15:01 ariscop: "EUROCOM RACER 3A", clevo rebadge
15:02 imirkin: the reason i ask is that some people were experiencing very odd hangs when enabling the nvidia gpu in like one model of thinkpad with a certain vbios
15:02 imirkin: it was like the W530 or something
15:02 imirkin: seemed very specific to that setup
15:02 ariscop: well i'm not entirely sure what BOOT0 is, but it's the same address on mine as that other guy's forum post
15:03 ariscop: could be unique to this model
15:03 imirkin: no, that just says it's a GK104
15:03 imirkin: and the issue was more with the machine's bios or acpi tables or... something
15:05 ariscop: imirkin, any ideas how i might go about debugging this?
15:05 imirkin: "kills X" -- what does that mean?
15:05 imirkin: X exits?
15:06 ariscop: X segfaults
15:06 imirkin: awesome -- figure out what the segfault is
15:06 imirkin: btw, you have an odd quantity of ram, i think there was a fix
15:06 imirkin: that affected situations that had... weird amounts of ram
15:07 imirkin: (more like uneven memory partitions, but that's hard to tell from what's output in the logs)
15:42 ariscop: new errors http://ix.io/gfF
15:42 ariscop: behaves differently when you load at boot apparently
15:43 imirkin: neat
15:43 imirkin: grctx template channel unload timeout
15:43 imirkin: never seen that before
15:43 imirkin: skeggsb: --^
15:43 imirkin: ariscop: this won't be good for your battery, but try booting with nouveau.runpm=0
15:44 ariscop: this thing lasts like 3 hours running tf2, should be fine
15:44 ariscop: brb
15:47 ariscop: that did something http://ix.io/gfG
15:47 imirkin: oh, infinite sadness
15:47 ariscop: :(
15:48 skeggsb: that smells of the "gr not powered up" issue
15:48 imirkin: skeggsb: yeah, smells of it, but a ton of people have that issue now after your fix went in
15:49 imirkin: skeggsb: and for some of them, changing some subtle timing thing in the kernel can cause it to randomly work, or work more often
15:49 imirkin: (something with rcu intervals or something else random like that)
15:49 skeggsb: yeah, i noticed those posts on the bug
15:49 ariscop: the joys of timing bugs
15:50 imirkin: airlied: didn't i have some sort of suggestion to move something above something else in that logic? i know i'm being vague, but it was a long time ago... some write was happening before it was powered up... or something
15:50 imirkin: er
15:50 imirkin: skeggsb: --^
15:51 skeggsb: yes, i vaguely recall that discussion, i also *think* we decided that wouldn't help
15:51 ariscop: i'll see what loading it a bit after boot does
15:53 imirkin: skeggsb: i think it _didn't_ help, but it did get rid of some error messages or something
15:53 imirkin: oh right. moving it from init to ctor
15:55 ariscop: http://ix.io/gfI doesn't seem to complain after it's been up for a bit
15:55 imirkin: ariscop: it's non-deterministic at this point...
15:58 ariscop: http://ix.io/gfJ nevermind, though x seems to actually restart now
15:58 ariscop: idle channel 0xcccc0000?
15:59 imirkin: ariscop: you mean "failure idling channel"?
15:59 imirkin: that means that the gpu got stuck
16:00 ariscop: nouveau E[Xorg[2252]] failed to idle channel 0xcccc0000 [Xorg[2252]], 0xcccc0000 just seems odd
16:05 ariscop: imirkin, is there a bug tracker i can watch?
16:06 pmoreau: ariscop: Our bug tracker is at bugs.freedesktop.org
16:09 imirkin: ariscop: search for "nve4 hub_init"
16:09 ariscop: actually getting a few similar results
16:09 pmoreau: The ioctls are really interesting. The blob creates one NVRM_DISP_MASTER per card then destroys them almost immediately. Once both cards are initialised it creates a new NVRM_DISP_MASTER and the other channels on the main card.
16:09 ariscop: 87942 has the same x crash as me
16:10 pmoreau: But the most interesting is that it creates two NVRM_DEVICE for the main card (MCP79) with two different versions of NVRM_FIFO_IB. The first one gets version g82 and the other one g80. I don't get the point.
16:11 imirkin: pmoreau: welcome to tracing a clean, well-maintained driver
16:12 pmoreau: imirkin: ;)
16:12 imirkin: it's like asking why loading an object requires downloading the entire database when you use an orm... helper function calls helper function calls helper function...
16:13 pmoreau: I don't want to guess how much hair was torn out or how many computers were broken by the first people who worked on Nouveau!
16:15 pmoreau: Eh! Wait! They created a third NVRM_DEVICE for the MCP79! \o/
16:16 imirkin: coz three's a party
16:17 pmoreau: I hope the G96 doesn't stay alone though :/