The irregular Nouveau-Development companion
Issue for August 8th
1. Intro
A little more than two weeks have passed and I was blackmailed by the main developers into writing yet another TiNDC (if you don't write, we don't code). So for the greater good of the community, here we go: TiNDC 25.
Ah, before we start, I want to thank ahuillet who once again took the time to edit my rough draft regarding the Xv topics. But now...
2. The Current status
There was much discussion about how to get rid of the busy wait on the DMA notifier. After feedback from darktama, ajax, pq and marcheu, a consensus emerged to avoid the TTM for now and implement a scheme with two ioctl()s:
- Allocate new fence number (ever-increasing uint64_t)
- Wait for fence number #,
with the system assuming that transfers complete in order. As the fence number would have been per-FIFO, this would have worked.
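To make the proposed two-call scheme a bit more tangible, here is a minimal sketch in Python. It is a toy model under stated assumptions, not the nouveau DRM interface: the class `FifoFences` and its methods `allocate`, `signal` and `wait` are hypothetical names, and `signal` stands in for whatever the kernel would do once the card reports a finished transfer. The point it illustrates is that the waiter sleeps until it is notified instead of busy-waiting.

```python
import threading


class FifoFences:
    """Toy model of per-FIFO fences (hypothetical, not the real DRM code)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._last_allocated = 0  # highest fence number handed out so far
        self._last_completed = 0  # highest fence number the card has reached

    def allocate(self):
        """First call: hand out a new, ever-increasing fence number.

        A real implementation would use a uint64_t; Python integers
        are unbounded, so no wrap-around handling is shown here.
        """
        with self._cond:
            self._last_allocated += 1
            return self._last_allocated

    def signal(self, fence):
        """The card reached `fence` (an interrupt handler in a real driver)."""
        with self._cond:
            if fence > self._last_allocated:
                raise ValueError("fence was never allocated")
            # Transfers are assumed to complete in order, so reaching
            # `fence` implies that all lower fence numbers are done too.
            self._last_completed = max(self._last_completed, fence)
            self._cond.notify_all()

    def wait(self, fence, timeout=None):
        """Second call: sleep until `fence` is reached; False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._last_completed >= fence, timeout
            )
```

Because of the in-order assumption, a single counter per FIFO is enough: waiting for fence number N only has to compare N against the last completed number.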
Because the TTM doesn't cater exactly to our needs (e.g. the management of FIFOs from user space), it looked more and more as if nouveau would use only selected TTM functionality while implementing other parts privately.
As the TTM needs to work on a wide range of cards, it caters even to the dumbest cards of all. And because NVidia cards already implement most of this functionality in hardware, much of the TTM code is actually overhead for us.
This decision didn't go down well with airlied (DRM maintainer), who wants only one memory manager (TTM, revised if needed) in the tree. The discussion went on in #dri-devel (http://people.freedesktop.org/~cbrill/dri-log/index.php?date=2007-07-26) with valid points exchanged on both sides. The final verdict is that we will migrate to TTM. However, TTM isn't stable yet, isn't fully used in current drivers (at most two drivers use TTM or are working with it: Radeon and Intel) and lacks functionality nouveau needs. So we will work with the TTM guys to get TTM adapted to our needs, while adding stopgap measures for now to get nouveau working.
Avoiding this minefield of "political" debate, ahuillet switched his priorities and tried to implement double buffering. Not quite 24 hours later, he requested testers.
To make matters a little more complex, darktama had been working on integrating TTM into nouveau. He prepared a patch and sent it to dri-devel for review, as it modified some parts of the TTM to better suit our needs. While he was waiting for feedback, the discussion mentioned above broke out.
Still, work on Xv continued, with p0g writing some tests to find out more about how the card handles YV12 image data.
After some staring at p0g's results, p0g, pq, ahuillet and marcheu came to the conclusion that the native format was NV12 (that's an encoding / display format, not an NVidia card; see www.fourcc.org/, link "YUV Formats"). Furthermore, the data to be displayed was obviously uploaded to the card via the FIFO and not via a DMA transfer. So more tests were done to find out why, and what kind of performance advantage that would give us. Results are still pending.
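For readers who don't have the two layouts in their head: YV12 stores a full-resolution Y plane followed by a quarter-size V plane and a quarter-size U plane, while NV12 stores the same Y plane followed by a single plane of interleaved U/V byte pairs. The sketch below repacks one into the other. It is only an illustration: the function name `yv12_to_nv12` is made up, the frame is assumed to be tightly packed (no line pitch or padding), and it says nothing about how the card or the driver actually does this.

```python
def yv12_to_nv12(frame: bytes, width: int, height: int) -> bytes:
    """Repack a tightly packed YV12 frame into NV12 (illustrative only)."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even for 4:2:0 data")

    y_size = width * height
    c_size = (width // 2) * (height // 2)  # size of one chroma plane
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("frame size does not match width/height")

    # YV12 plane order: Y, then V, then U.
    y = frame[:y_size]
    v = frame[y_size:y_size + c_size]
    u = frame[y_size + c_size:]

    # NV12 keeps the Y plane and interleaves chroma as U0 V0 U1 V1 ...
    uv = bytearray(2 * c_size)
    uv[0::2] = u
    uv[1::2] = v
    return y + bytes(uv)
```

The Y plane is untouched and only the chroma samples are shuffled, which is why the conversion is cheap compared to a colour space conversion.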
Another interesting topic within Xv is the overlay object. The initial Xv work functioned but didn't yield as much of a performance gain as expected, so both p0g and ahuillet tried to reverse engineer the data sent to the card when the overlay is used. Using valgrind-mmt traces, they identified the object and found out how to control most of its attributes.
The overlay object is not a normal object like NV30_TCL_PRIMITIVE_3D or the PCI / AGP objects mentioned in earlier issues. It is a software object, which means that if you write a value to an offset within this object, the card generates an interrupt which is handled by the kernel module. The kernel module has to look at the value written and execute the function assigned to it. That function then writes / manages the MMIO registers which deliver the requested functionality on screen. As that was not what ahuillet was looking for / expecting, work moved on to other topics.
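The control flow of such a software object can be sketched roughly as follows. This is a toy model, not driver code: the class `SoftwareObject`, the offsets `OVERLAY_ENABLE` and `OVERLAY_POSITION`, the register names and the bit packing are all invented for illustration. The handler table is keyed by the offset written to, which is an assumption about how the dispatch would be organised.

```python
class SoftwareObject:
    """Toy model of a software object (all names are hypothetical)."""

    def __init__(self):
        self.methods = {}  # offset within the object -> handler function
        self.mmio = {}     # stand-in for the card's MMIO registers

    def register(self, offset, handler):
        self.methods[offset] = handler

    def fifo_write(self, offset, value):
        """A write to the object: the card does nothing itself but interrupt."""
        self.interrupt(offset, value)

    def interrupt(self, offset, value):
        """Kernel side: look up the function and let it program the MMIO."""
        handler = self.methods.get(offset)
        if handler is None:
            raise ValueError(f"no handler for offset {offset:#x}")
        handler(self.mmio, value)


# Hypothetical offsets, for illustration only.
OVERLAY_ENABLE = 0x100
OVERLAY_POSITION = 0x104


def set_enable(mmio, value):
    mmio["overlay_enable"] = value & 1


def set_position(mmio, value):
    # Assumed packing: x in the low 16 bits, y in the high 16 bits.
    mmio["overlay_x"] = value & 0xFFFF
    mmio["overlay_y"] = value >> 16


overlay = SoftwareObject()
overlay.register(OVERLAY_ENABLE, set_enable)
overlay.register(OVERLAY_POSITION, set_position)
```

The sketch also shows the cost of this design: every single write takes a round trip through an interrupt and the kernel module before anything changes on screen.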
But from the work on the overlay (involving MMIO traces), marcheu and ahuillet concluded that we could support native YV12 (actually NV12, but the conversion is easy...) by simply setting the registers to the same values as the overlay does. It turned out to be not so easy. Setting the two bits in question did have an effect, and one of them made ahuillet think that we had hit the jackpot. But debugging incorrectly rendered images isn't that easy, so it took several hours to become even 80% sure that this bit was not a "native NV12" bit but some YUV->RGB conversion parameter instead.
So currently we can say: for all cards <NV50 we should have a working Xv implementation which performs slightly better than the old version. By the way, marcheu and ahuillet did some benchmarks, and the interesting thing is that Xv is quite fast for a given frame, but when mplayer displays frames it also outputs some counters on the screen. That triggers some 2D operations which in turn sync everything, including the Xv transfer.
Next topic, renouveau: pmdata is still trying to split renouveau into a dumper and a parser. He has succeeded in creating dumps containing only values from the parsed memory and later running them through a parser which prints out the dump as we know it now. Still missing, though, is the XML framework.
Another important problem was bisected by pq, which wasn't easy at all: when viewing 2 videos via Xv on nouveau, the DMA queue would sooner or later hang (after 10 minutes at the latest). As nv didn't exhibit this problem, he traced it back to a combined commit in DDX and DRM (https://bugs.freedesktop.org/show_bug.cgi?id=11820).
During the weekend airlied did some work on PPC. After some problems due to an out-of-date nouveau_drm.h, he got glxgears working again on his G5. The color is still wrong (black), but that's much better than not working at all.
Additionally, airlied got renouveau working on MacOS X, so that we can gather information about handling NVidia cards on PPC too.
"I probbably need to simplify large parts of renouveau with an axe..." |
In addition to ahuillet's work on Xv, jb17bsome did some work on XvMC. He is quite certain that he has found out how data is transferred to the card and how the data is modified before the final upload. A more up-to-date summary of his findings is available in the wiki at http://nouveau.freedesktop.org/wiki/jb17bsome.
Marcheu and Darktama did some fixes too: they bumped the DRM to version 0.0.10, which removes the burden of subchannel management from the driver writer and hands it to the DRM. The DRM also got its own channel, which is currently unused. However, it is needed for upcoming fixes to get TTM working with nouveau. And that concludes the story from above about whether or not to use TTM.
- "A channel is just another word for a FIFO plus its graphical context. Each of the FIFOs has 8 subchannels, which are slots that hold a command to be executed by the card. The slots are executed in sequence."
Now why does the DRM need its own FIFO? Well, in the future the TTM will manage the card's memory. The philosophy of the TTM is that the DRM owns the pieces of memory and places them as it wants, so it also needs to be able to move things around, for example in order to avoid memory fragmentation. Thus you need to be able to do DMA, and for that you need the DMA object we talked about in the last issues. And for that we need a FIFO.
Next on Marcheu's TODO list is to fix the "upload to screen" functionality on NV3x and then extend the NV40 EXA functionality to NV3x. After that, work on dual head is probably coming up. The order in which features get implemented is based in part on interest and in part on the progress of the TTM (see above). Currently the status of the TTM makes work on 3D hard, so "easier" parts like 2D performance get worked on first.
However, using multiple channels requires working context switches, and some cards (NV1x and NV2x) have problems such as DMA queue hangs during startup of X. Further work is needed (and testers are welcome!), but it seems that context switching on those cards isn't fully working.
And as is often the case when everything is broken, matc comes to the rescue. This time he noticed that we didn't correctly init the interrupts on NV3x cards (well, we do at first, but subsequent init procedures need to disable them again and we don't re-enable them after that).
3. Help needed
Ahuillet asks for more testers of his Xv code. Apart from compiling and installing nouveau, this only involves watching videos and giving him feedback.
We would like owners of 8800 cards to test our current driver and report back to us. As we currently have "only" two G84 cards for development and testing available, feedback from users with this hardware would be very welcome. Please note: Use the randr-1.2 branch and report back to Darktama.
As noted above, we need MMioTraces for NV41, NV42, NV44, NV45, NV47, NV48 and NV4C. Please make yourself known in our channel if you have one of these cards available.
Finally, a correction to the access numbers published last week: I was wrong there, as my statistics only show absolute accesses and not references. So the access numbers were nearly 3 times too high! Currently we are at roughly 2200 hits on issue #23.