The irregular Nouveau-Development companion
Issue for July 21st
1. Intro
Hello again, this is issue number 24 of our TiNDC. Development speed is really picking up now, with ahuillet, darktama, pmdata, pq and even marcheu making a bonus appearance more and more often (he claims that he will be back in full force in about 1-2 weeks).
If the speed keeps up, I am pondering renaming the TiNDC to "The daily Nouveau Companion".
Better this way than scrambling for any news. At this point, I would like to thank you for your interest in our project. Just 36 hours after the publication of the last issue, 1750 hits had been registered; 24 hours later we had roughly 2650 hits. 10 days later we are now at ~5200 hits!
2. The Current status
Darktama pushed his changes to the DRM and DDX (nv50-branch) in order to get preliminary G8x support working. Changes were needed because object handling on 8x00 cards differs: objects now use 64 bit addresses instead of 32 bit ones (and are thus bigger), there is a better abstraction and separation between OpenGL contexts with 64 different contexts / FIFOs available, and the GPU is able to protect VRAM from CPU access.
Before Darktama pushed his changes, though, he had tested the idea of a "compatibility mode", that is, running G8x with NV4x objects and commands. Unfortunately, that didn't work at all, as the card complained loudly.
So the DRM needed to be able to handle the new objects, which resulted in Darktama's patches. Tests on a second G84 were successful: the 2D display was accelerated via the EXA copy and EXA solid routines. Please use "MigrationHeuristic greedy" and be prepared to encounter the console switching problems mentioned in earlier issues when going back to the text console.
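For reference, the option mentioned above is an EXA option that goes into the Device section of xorg.conf. The fragment below is only a minimal sketch: the identifier "Card0" is a placeholder, and any other entries in your existing Device section are assumed to stay as they are.

```
Section "Device"
    # "Card0" is a placeholder; keep whatever identifier you already use
    Identifier "Card0"
    Driver     "nouveau"
    # EXA pixmap migration policy recommended for G8x testing
    Option     "MigrationHeuristic" "greedy"
EndSection
```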
Unfortunately those patches didn't go well with Ahuillet's DRM patches, which he hadn't pushed at that time. So he had to learn git conflict management the hard way.
The next topic on Darktama's todo list is the un-hardcoding of the NV5x / G8x PRAMIN setup. He claims to understand the functionality now and wants to prove it with further patches. The first of those patches has already gone in, with more to come.
Now back to Ahuillet's DMA problems: stillunknown helped a lot by testing various combinations of DRM and DDX versions. First ahuillet got PCIGART working for nouveau, but the test results coming in showed various levels of "success". Everything turned up, from DMA hangs (NV43, PCIe, 64 bit CPU) to a sped-up EXA (for cards < NV50) to a slower Xv (DMA compared to a simple memcpy(), which does not make much sense).
Confusion spread among the ranks of the developers and testers, but higher beings from the project of X came to the rescue, pointing out some inconsistencies in the DRM programming (like using virt_to_bus(), which is not a good idea on PPC or x86_64). IDR did some cleanup work, which allowed PPC to survive X startup, but he had to give up due to time limitations.
Benh came in, picking up where IDR had left off. After two days of part-time hacking, he got PPC to a somewhat working state. There are a bunch of rendering issues (incorrectly rendered fonts), and a composite window manager trying to display shadows resulted in blitting errors. (Thread: http://lists.freedesktop.org/archives/nouveau/2007-July/000200.html).
Preliminary tests showed that DMA (PCI or AGP) sped up EXA a lot, but that it was slower for Xv, in that it took more CPU time and more real time to display a frame of video.
The next task accomplished by ahuillet, jb17some and p0g was to benchmark Xv in order to find out the bottlenecks and hopefully improve Xv DMA. It doesn't matter much if Xv DMA is a little slower than CPU copy, as long as it frees the CPU for other things, e.g. decoding the next video frame.
jb17some helped out by oprofiling nouveau_drv.so for those use cases. It showed that most of the time was spent in NvPutImage() and in NvWaitNotifier(). All other functions were nearly unnoticeable.
When jb17some provided source code annotated by oprofile (or its tools), it gave the final clue: the driver was obviously busy waiting on the "DMA transfer finished notifier".
A few hours later ahuillet got his oprofile working too and confirmed jb17some's findings: most of the time was spent busy waiting for the notifier. And because the memcpy() didn't wait for the notifier at all, it was much faster than PCI DMA. PCI DMA in turn turned out to be slower than AGP, in that more time was spent busy waiting for the DMA notifier.
Going back to oprofiling now showed about 50% of the time spent in copy_from_user() / copy_to_user(). As nouveau shouldn't enter kernel space very often and not "with heaps of data", as marcheu put it, these results caused puzzlement. Oprofile's callgraph showed that this was related to internal X server functions. EDIT: This was the result of a bad test; the copy_*_user calls can be avoided by using shared memory.
So, as a quick test, the wait was nuked and testing ensued. The result: PCI was twice as fast as before and AGP got about 10% faster. Still, memcpy() led the pack and PCI DMA produced rendering corruption (expected, as the sync point was deliberately nuked). Perhaps we should point out that Xv is still not for the "normal user", as it is not able to display more than one video at once without rendering corruption...
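The effect described above can be illustrated with a toy model. The following Python sketch is not taken from the driver: FakeNotifier, busy_wait and cpu_cost are hypothetical names, and all cost figures are invented units chosen only to show the shape of the problem. It assumes that a DMA upload costs the CPU a small setup fee plus every poll spent spinning on the notifier, while a plain memcpy() costs the CPU only the copy itself.

```python
class FakeNotifier:
    """Hypothetical stand-in for a 'DMA transfer finished' notifier.

    It reports completion only after `polls_needed` unsuccessful checks,
    which models a transfer that is still in flight.
    """

    def __init__(self, polls_needed):
        self.remaining = polls_needed

    def done(self):
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


def busy_wait(notifier):
    """Spin until the notifier signals completion; return the wasted polls."""
    spins = 0
    while not notifier.done():
        spins += 1
    return spins


def cpu_cost(setup, transfer_polls, wait=True):
    """CPU cost (arbitrary units) of one DMA upload in this toy model."""
    spins = busy_wait(FakeNotifier(transfer_polls)) if wait else 0
    return setup + spins


if __name__ == "__main__":
    memcpy_cost = 40  # invented figure: the CPU copies the frame itself
    agp = cpu_cost(setup=5, transfer_polls=50)
    pci = cpu_cost(setup=5, transfer_polls=120)  # slower bus, longer spin
    no_wait = cpu_cost(setup=5, transfer_polls=120, wait=False)
    print(memcpy_cost, agp, pci, no_wait)
```

In this model the slower bus is penalised only because the CPU spins longer on the notifier. Skipping the wait removes that cost, but it also removes the synchronisation point, so it is not a fix by itself.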
So much confusion, but it can be summed up as follows:
- on NV50 and later cards Xv doesn't work yet
- EXA is sped up by either AGP or PCI DMA
- DMA transfer notifiers and the busy waiting on them make DMA slow
Darktama was not slacking either: he merged nv50-branch and the randr-1.2 branch in DDX git, so that there is no further need for him to merge in changes from two other branches (randr-1.2 and master). So if you want to use an NV50 / G8x card, please use the randr-1.2 branch; nv50-branch is now obsolete.
Apart from merging, Darktama added a few patches to randr-1.2 which should increase performance on NV50 / G8x for values other than "greedy" for "MigrationHeuristic".
hughsie tried to get the latest nouveau working on FC7. It wasn't easy, as the in-kernel DRM and DRM git are currently out of sync. Still, he managed to get it working and found that it was working well (according to gtkperf) (http://hughsient.livejournal.com/29989.html). He promised to get a newer version into FC7 once DRM gets synced with the kernel version.
3. Help needed
We would like owners of 8800 cards to test our current driver and report back to us. As we currently have "only" two G84 cards available for development and testing, feedback from users with this hardware would be very welcome. Please note: use the randr-1.2 branch and report back to Darktama.
And we need MMioTraces for NV41, NV42, NV44, NV45, NV47, NV48 and NV4C. Please make yourself known in our channel in case you can help.
If you don't mind, please do test ahuillet's patches at git://people.freedesktop.org/~ahuillet/xf86-video-nouveau and give him feedback. However, be prepared for problems, misfeatures and crashes, as this is definitely a work in progress!