The irregular Nouveau Development Companion #40

1. Issue for October 29th, 2008

Greetings everyone, the long awaited TiNDC #40 is finally here.

It has been a very long time since the last TiNDC: over five months. The Nouveau wiki has also been fairly quiet, because no shiny new features have progressed to the level where we would like the general public to test them (and, a little bit, because we forgot to update it). This does not mean that the project is dead, not at all. The difficult parts take time, and changes in the DRI/DRM development process model and memory manager designs are not exactly pushing us forward. With our beloved koala_BR hijacked by real life, the public output of the project has been very low. Getting frustrated with the silence, I (pq) decided to write something, but I do not plan to become a regular TiNDC writer. So, here is a review of the current state of things, although unfortunately I cannot cover everything from the past five months. Thanks to gQuigs for writing the Google Summer of Code section. Enjoy!

Topics:

Highlight: NV50 2D is now on par with the rest! For details, see Short Topics.

1.1. The Nouveau Source Code Repositories

The Nouveau project uses several git repositories and many branches. In these quiet times, the best place to monitor Nouveau's progress is to watch the repositories. Here is a short introduction to most of them.

1.2. Renouveau, Mmiotrace and User Contributed Dumps

We know that the list at http://people.freedesktop.org/~jpakkane/ren/ has not really been updated for some time, even though the web page is regenerated. The automatic script that fetches Renouveau dumps from gmail and puts them online broke when freedesktop.org disabled all DSA keys due to the famous security bug in Debian's OpenSSL. No Renouveau dumps have been lost, though: they are still available to the developers in gmail.

Sometimes people drop by and ask if they should make a Renouveau dump with their specific card. At this point we are not burning to get new dumps, but if you can make one, please do. We would prefer dumps for 9000-series and GTX200-based cards, so basically anything that is newer than the 8000-series. On the other hand, mmiotraces from all card generations are warmly welcome. At the time of writing there are 165 emails in the mmiotrace dump gmail account, totaling 1.5 GB of compressed dumps.

Pmdata has improved Renouveau over the year, introducing an XML-based database of graphics commands for each card generation. Renouveau has gone through a major change because of that: dumping and interpreting the dump are now two separate steps. This means that we do not need to dump again when a new command is identified. The command is added to the database, and existing dumps are reinterpreted. Pmdata has also written some more tests for Renouveau, and taking advantage of those does require new dumps.
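The benefit of separating dumping from interpreting can be pictured with a small sketch. Everything in it is hypothetical: the XML layout, the load_command_db and interpret helpers, and the method offsets and names are invented for illustration and do not reflect Renouveau's real database format. The raw dump is stored once, and only the database changes between the two interpretations.

```python
import xml.etree.ElementTree as ET

# Hypothetical command databases; the schema and the entries are made up.
DB_V1 = """
<commands generation="NV40">
  <command offset="0x0100" name="NOP"/>
</commands>
"""

# Later revision of the same database, with one newly identified command.
DB_V2 = """
<commands generation="NV40">
  <command offset="0x0100" name="NOP"/>
  <command offset="0x0200" name="EXAMPLE_CLIP_HORIZONTAL"/>
</commands>
"""

def load_command_db(xml_text):
    """Build a mapping from method offset to symbolic name."""
    root = ET.fromstring(xml_text)
    return {int(c.get("offset"), 16): c.get("name") for c in root.iter("command")}

def interpret(raw_dump, db):
    """Turn stored (offset, value) pairs into readable lines."""
    lines = []
    for offset, value in raw_dump:
        name = db.get(offset, "UNKNOWN_%04x" % offset)
        lines.append("%s = 0x%08x" % (name, value))
    return lines

# The raw dump is captured once and never touched again.
RAW_DUMP = [(0x0100, 0x0), (0x0200, 0x02800000)]

before = interpret(RAW_DUMP, load_command_db(DB_V1))
after = interpret(RAW_DUMP, load_command_db(DB_V2))
```

With the first database the second command shows up as unknown; with the second database the same stored dump yields a symbolic name, without anyone having to run the dumper again.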

After a long time, pq has managed to get Mmiotrace into the mainline kernel. The first version of in-tree Mmiotrace is in 2.6.27, and in 2.6.28 it will be as fully functional as the out-of-tree version. If you use these versions, use the documentation that comes with the kernel. The instructions in the wiki are for the out-of-tree versions of Mmiotrace. If you can choose, 2.6.28 is the preferred version.

1.3. Displays and Suspend

Malc0 has been working on perfecting the Randr 1.2 support and fixing bugs on the display mode setting front. He has also started work on suspend support, both suspend to disk and suspend to RAM. The current state of suspend is that there is a fairly good chance of it working on pre-NV50 hardware, but NV50 and later cards will not work. Various cards from NV05 to NV40 have been reported to work.

To try suspend, you will need to patch the DDX. If you suspend to RAM, then on resume you need to POST the card by hand (or by scripts) before switching back to X. The suspend support is still a sort of a hack, and more kernel work is needed. In the far future, when kernel mode setting becomes reality, suspending should become clean and robust.

In other news, malc0 has changed the Randr 1.2 model on pre-NV50 cards to be connector based instead of encoder based. (Think of connectors as physical connectors, and encoders as signal sources that are routed to connectors.) So now you configure which connector you want active, a change that should also ease the migration to the kernel mode setting world. Malc0 has also been tuning the video BIOS parser, which is essential for Randr 1.2 mode setting and for initializing the card.
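The connector and encoder relationship described above can be sketched as a small data model. This is only an illustration: the class names, the fields, the activate helper and the output names are all hypothetical and are not taken from the Nouveau DDX.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Encoder:
    name: str  # a signal source, for example a DAC or a TMDS transmitter

@dataclass
class Connector:
    name: str                          # a physical socket on the card
    possible_encoders: List[Encoder]   # encoders that can be routed to it
    active_encoder: Optional[Encoder] = None

def activate(connector, busy):
    """Route the first free, compatible encoder to the requested connector."""
    for enc in connector.possible_encoders:
        if enc.name not in busy:
            connector.active_encoder = enc
            busy.add(enc.name)
            return enc
    return None  # every compatible encoder is already in use

dac = Encoder("DAC-0")
tmds = Encoder("TMDS-0")
vga = Connector("VGA-0", [dac])
dvi = Connector("DVI-I-0", [tmds, dac])
```

In a connector based model the user asks for "VGA-0" or "DVI-I-0", and choosing the encoder behind it is left to the driver.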

1.4. Kernel Mode Setting

Kernel mode setting (KMS) means moving the display mode setting from X server drivers into the kernel. This is a general development direction for Linux graphics, and some of the end-user visible effects will be flickerless boot, faster switching between virtual terminals and X sessions, high resolution virtual terminals, and the ability to see critical kernel messages even while running X. From the developer side, KMS should finally end the bloody battle between various kernel and user space drivers over the control of the graphics hardware, and make the design a lot cleaner and easier to understand.

Stillunknown has made a prototype implementation of KMS for his NV50 class card to make sure the KMS API is workable before it is set in stone. The KMS work is led by projects other than Nouveau, and indeed there were issues, which are hopefully fixed now. Stillunknown plans to get back to it when we have a memory manager, and also sends thanks to Luc Verhaegen for giving some good ideas about modesetting in general and the need for abstraction. This is what stillunknown himself has to say about things:

Stillunknown has also taken a look at the modesetting equivalent of a fifo. It works much like a graphics fifo, only without multiple objects. He has looked at it several times, but getting it to work has proved to be a non-trivial effort. Interesting registers, including the PUT and GET registers, have been isolated, but the contents of some other registers unfortunately remain a mystery. It even seems that more than one fifo is available, although exactly how many is unknown. This functionality is not expected to work in the foreseeable future, but that is not a disaster, because the hardware offers indirect access via two MMIO registers. It is possible, however, that a functional tiled framebuffer depends on this fifo, but until proven otherwise this remains a guess. (Note: the lack of a tiled framebuffer is the reason why a compositing manager is needed for NV50. The 3D engine cannot render to a linear buffer, and the compositing manager ensures that windows have a backbuffer that can be rendered to.) For the moment it is very useful to know that an mmiotrace dump does not tell everything about modesetting; valgrind-mmt can be used instead to trace userspace fifos.
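For readers unfamiliar with PUT and GET registers, the general idea of a command fifo can be modelled as a ring buffer. The sketch below is a generic toy model, not a description of the NV50 display fifo, whose details are still unknown as noted above. The CommandFifo class and its size are assumptions made purely for illustration.

```python
class CommandFifo:
    """Toy ring buffer: the producer advances PUT, the consumer advances GET."""

    def __init__(self, size):
        self.ring = [0] * size
        self.size = size
        self.put = 0  # next slot the producer (the driver) will write
        self.get = 0  # next slot the consumer (the hardware) will read

    def free_slots(self):
        # One slot always stays empty so that put == get means "empty".
        return (self.get - self.put - 1) % self.size

    def push(self, word):
        if self.free_slots() == 0:
            return False  # full: the producer must wait for GET to advance
        self.ring[self.put] = word
        self.put = (self.put + 1) % self.size
        return True

    def pop(self):
        if self.get == self.put:
            return None  # empty: the consumer has caught up with PUT
        word = self.ring[self.get]
        self.get = (self.get + 1) % self.size
        return word
```

In this model the driver only ever writes PUT and the hardware only ever writes GET, which is why isolating those two registers is the first step towards driving such a fifo.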

1.5. Gallium3D Progress

Some time ago there was the Mesa DRI driver model, where drivers were implemented directly between the OpenGL API and the hardware. This made the drivers big, complex and redundant. The Nouveau 3D driver was started for that old model, and it evolved fairly far in that most card generations were able to run at least glxgears. Then came the Gallium3D infrastructure, and the project was set back miles on 3D support. However, the Gallium3D model is far better than the old model. The user API, for instance OpenGL, is abstracted away and the drivers only need to implement a single core API, making the hardware drivers small and clean. Well, this statement is a simplification, but you can read about Gallium3D design at http://www.tungstengraphics.com/technologies/gallium3d.html .

As you might know, NV40 Gallium3D is currently the most advanced part of Nouveau, and some people claim to have played Quake 3 Arena with it. Do not jump for joy yet, because the mantra still holds: Gallium3D is not supported yet, and we do not want bug reports about it, unless of course the bug report has a patch attached that fixes the problem. Otherwise, trying to test it will likely lead to trouble, and we really do not want to waste developer time or nerves on discussing something that is known to be broken. On the other hand, if you plan to contribute code, then come and talk with us right away!

Pmdata has been developing NV30 Gallium3D after Marcheu started it. Pmdata followed what was happening with NV40 Gallium3D and made similar changes, because the Gallium3D APIs were still a bit in flux and somewhat uncharted territory. Pmdata records his progress on his personal wiki page, so you should check that for news and images: http://nouveau.freedesktop.org/wiki/PatriceMandin . Judging from the xmoto screen capture, geometry processing is working, and there are even textures, although the textures are swizzled when they should not be (or vice versa). Uploading properly swizzled textures has proven to be a little harder than he first thought. He has also made an nv30_demo program, which pokes the card directly to try out rendering commands.
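To give an idea of what swizzling does to a texture, here is a minimal sketch. It assumes that "swizzled" means plain Z-order (Morton order) on a square, power-of-two texture, which is the usual meaning of the term but may not match the exact layout every card expects. The function names are made up for this example.

```python
def morton_index(x, y, bits=16):
    """Interleave the bits of x and y: x in even positions, y in odd ones."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index

def swizzle(linear, size):
    """Reorder a row-major size x size texel list into Z-order.

    size must be a power of two so that every Morton index fits the list.
    """
    out = [None] * (size * size)
    for y in range(size):
        for x in range(size):
            out[morton_index(x, y)] = linear[y * size + x]
    return out

# A 4x4 texture whose texels are numbered in row-major order.
texels = list(range(16))
swizzled = swizzle(texels, 4)
```

If a texture is uploaded in one layout while the card reads it in the other, the texels land in the wrong places, which shows up as scrambled blocks in the rendered image.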

Development on NV10 Gallium has been quiet for some time, and so has NV04. It is interesting to recall that the NV04-NV20 families do not have real fragment shaders, and the NV04-NV10 families do not have vertex shaders, but Gallium3D is built on the assumption that shaders do exist. Marcheu has investigated how these fixed pipeline cards could be used in Gallium3D, and it seems possible, but he has yet to make up his mind about which approach is preferable. It could mean changes throughout the whole Gallium3D stack, or not.

Someone has yet to start the NV20 Gallium3D work; there is currently nothing. It can be bootstrapped by copying in the NV10 Gallium bits and adding the NV30 Gallium vertex program bits.

Darktama has started NV50 Gallium3D, but it does not do anything useful at all, yet. For instance, textures do not work. After doing NV50 2D work, he says he now has a much better view on how to implement things, and will get back to it when he can.

Marcheu has recently been working with LLVM (http://llvm.org/), which should optimize shader programs to the max. However, his work on vertex programs has been with x86 LLVM, specifically the SSE instruction set, and not with GPU instruction sets. Why would one want to do vertex processing on the CPU? The answer is two-fold: early cards do not support vertex programs, or their fixed vertex pipelines are not worth the trouble to use, since modern CPUs do vertex processing faster. The trouble with using the fixed vertex pipelines is the adaptation of Gallium3D, so if that can be skipped, so much the better. CPU vertex processing is also required when a vertex shader is so complex that it cannot be realized on a GPU that does support vertex shaders. On the other hand, Gallium3D must be adapted to make use of fixed fragment pipelines, since there is no point in trying to do fragment processing on the CPU: it would not be fast. Marcheu says the LLVM/x86-SSE vertex processing works, but the software fragment pipeline (a.k.a. softpipe FP) he has to use at this point is a nest of bugs.

The major bottleneck in Nouveau's Gallium3D development is the lack of developer time. Granted, the current simple memory manager easily runs out of memory and falls over, but it is still enough to try and implement almost all 3D functionality.

1.6. Memory Management

The DRM has historically had a plethora of simple memory managers, and Nouveau currently has one as well. However, a full-featured memory manager is required for efficient use of resources, and previously this was supposed to be TTM (Translation Table Maps). Darktama had been working to that end when Intel developers came up with GEM (Graphics Execution Manager), after trying for a year to get TTM going well enough on integrated Intel graphics. In the process the TTM user API was removed, and now Nouveau and also Radeon are going to use GEM. Actually, GEM is little more than an API, and it needs a backend, so TTM will still be used internally. Darktama says that not much work was wasted in the transition, since TTM is still around.

Darktama is practically our memory manager developer, and he puts all the time he can afford into making Nouveau use GEM (and TTM). The "ng" branch in the DDX git repository is a part of his GEM playground, where he is trying things out and figuring out a proper design. Other parts of that playground are the "ng" branch of http://cgit.freedesktop.org/~darktama/drm/ , which is based on the DRM branch "modesetting-gem", and the "gallium-0.2-ng" branch of the nouveau/mesa repository. Darktama does not yet know whether that work will ever be merged into "master", or whether it will need to be rewritten once he has learnt what needs to be done. He has worked with NV40 and says that, apart from rough edges, it is already working "fairly OK". Performance is not at the same level as with the simple memory manager in "master", but the new work should solve the various out-of-memory errors the current Gallium triggers. It also makes the 3D code interact well with the 2D code, in that moving a 3D application around on screen does not leave trashy trails. All in all, darktama is fairly happy with it, but it is definitely not ready to be merged into Nouveau mainline.

1.7. Fruits of The Google Summer of Code 2008

Ymanton has been working on video decoding, especially XvMC, done via shader instructions, and creating the reference implementation with Gallium3D as his Google Summer of Code (GSoC) project. Ymanton comments:

Since the summer is indeed over, school has retaken most of ymanton's time, but he is still trying to determine why decoding is not as fast as expected. "Current consensus is that it's because we're using linear textures instead of tiled, so I've been trying to figure out how to get tiled textures and the related DMA functionality working." To do that, we need to look at how we do DMA in Xv and the 2D driver, and how we do it on NV50.

Ymanton wrote a simple OpenGL program that goes through the same steps as our decoding process and tested it on the blob. The blob "tears through 720p" at around 60 fps so we know the hardware (NV40) is capable of doing this.

It has been tested not to crash on NV30, but without shader support it will not actually display anything. Ymanton does not have an NV50 card, so he has not tested that, and he does not believe it is finished enough yet.

Ymanton's Gallium3D video decoder currently gets 18-20 fps with the Nouveau driver on NV40. Xv is still the better option for now: because the decoder does not reach 24 fps, CPU usage spikes. 1080p has memory issues and will be worked on once 720p achieves reasonable performance.
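To put the frame rates above into perspective, they can be converted into a time budget per frame. This is plain arithmetic on the numbers quoted in this section (18-20 fps now, 24 fps needed, around 60 fps on the blob), not a measurement, and the helper name is made up.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

target = frame_budget_ms(24)    # budget per frame for 24 fps playback
current = frame_budget_ms(18)   # time per frame at the low end of 18-20 fps
blob = frame_budget_ms(60)      # time per frame the blob achieves on 720p
overrun = current - target      # how far each frame is over the 24 fps budget
```

At 24 fps each frame has to be finished in roughly 41.7 ms, while 18 fps corresponds to roughly 55.6 ms per frame, so the decoder is about 14 ms per frame over budget at the low end. The blob needs only about 16.7 ms per frame, which is why we know the hardware itself is not the limit.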

1.8. Short Topics

That's it for this time, folks! Thank you for your continued interest, and please, turn some stones and try to find us a couple more developers, will you? :-)



2013-03-24 13:16