If you do not understand the terms mentioned here, head to IntroductoryCourse.
The card generations
First of all, there are a lot of different card models, grouped into 8 card generations for our purposes. The cards are identified by a few different names: the marketing name ("GeForce 8800 GTX"), the code name that nvidia uses in their documentation ("G80"), and the real code number ("NV50"). The code number is embedded directly in hardware, and it's the important thing for identifying your card for programming purposes. For the first cards, roughly up to NV3x, nvidia's code name was equal to the real code number of the card; later the two diverged. Note that the same real code number can correspond to several marketing names: this means that these cards use the same GPU core, but differ in some other characteristic, like bus size, memory size, number of working shaders/ROPs, etc.
Generation name | Model code numbers | Commercial names | Notes |
NV01 | | Diamond Edge 3D | The first nv card ever. Had 3d drawing engine using quadratic curves instead of polygons. Nobody used it, nobody bought it... don't really bother. Not supported by nouveau. |
NV02 | | ? | Quadratic curves taken even further, never really completed. A bad joke. |
NV03 | | RIVA 128 | First serious, polygon-based 3d card. Can do DX5. Very rare now. Not supported by nouveau. |
NV04 | NV04, NV05 | RIVA TNT, TNT2 | Introduced DMA FIFOs, enough hardware 3d support to do DX6 (multitex and stencil), and objects as we know them. First card supported by nouveau. |
NV10 | NV10, NV11, NV15, NV17, NV18 | GeForce 256, GeForce 2, GeForce 4 MX | Introduced hardware TCL |
NV20 | NV20, NV25, NV28, NV2A | GeForce 3, GeForce 4 Ti | First shaders |
NV30 | NV30, NV31, NV34, NV35, NV36 | GeForce FX | First serious shaders |
NV40 | NV4x, NV67 | GeForce 6xxx, GeForce 7xxx | Only shaders now, fixed-function removed |
NV50 | NV50, NV8x, NV9x, NVAx | GeForce 8xxx, GeForce 9xxx, GeForce 1xx, GeForce 2xx | Unified shader architecture, CUDA support. Major pieces of card architecture redone, a lot of things changed. Only one of the old objects survived. |
PCI BARs
NV cards have 2 or 3 BARs, all of them memory spaces:
- BAR 0: control registers. 16MB in size. It is divided into several areas, one for each of the functional blocks of the card.
- BAR 1: VRAM. On pre-NV50, corresponds directly to the available VRAM on the card. On NV50, gets remapped through the VM engine.
- BAR 2: PRAMIN. This BAR exists only on NV40 and newer and gives you access to the whole of PRAMIN. On NV50, gets remapped through the VM engine.
The functional blocks
The control BAR contains several regions of MMIO registers, corresponding roughly to functional units of the card:
Address | Name | Description |
0x000000-0x000fff (NV04-NV40), 0x000000-0x001fff (NV50) | PMC | Master Control. Contains registers that apply to the card as a whole, like master interrupt enable/status, card version, etc. |
0x001000-0x001fff (NV04-NV40), 0x088000-0x088fff (NV50) | PBUS | Bus-related configuration. Simply an alias for PCI configuration space. |
0x002000-0x0037ff | PFIFO | Responsible for queueing of commands to PGRAPH and possibly other execution engines. |
0x003800-0x003fff (NV04-NV30), 0x090000-0x090fff (NV40-NV50) | PFIFO_CACHE1 | Another part of PFIFO. |
0x008000-0x008fff (NV10-original NV40) | PVIDEO | Video overlay. |
0x009000-0x009fff | PTIMER | A timer. Can count time and generate timer interrupts. |
0x00d000-0x00dfff | PTV | FILLME |
0x00e000-0x00efff (NV50 only) | PCONNECTOR | FILLME |
0x0c0000-0x0c0fff | PRMVIO | Mapped to VGA registers: 0x3c2, 0x3c3, 0x3c4, 0x3c5, 0x3ce, 0x3cf |
0x100000-0x100fff | PFB | FrameBuffer control. Controls the VRAM configuration. |
0x101000-0x101fff | PEXTDEV | Interfacing with external devices, reading straps |
0x300000-0x30ffff | PROM | Contains a copy of Video BIOS |
0x400000-0x40ffff | PGRAPH | The graphics engine. Does all acceleration, including 3d, 2d, CUDA, and even memory blits. |
0x600000-0x600fff | PCRTC0 | CRTC setup |
0x601000-0x601fff | PRMCIO | CRTC setup. Mapped to VGA registers: 0x3c0, 0x3c1, 0x3c2, 0x3d4, 0x3d5, 0x3da |
0x610000-0x61ffff (NV50 only) | PDISPLAY | NV50 modesetting FIFO setup |
0x640000-0x64ffff (NV50 only) | PDISPLAY_USER | NV50 modesetting FIFO submission |
0x680000-0x680fff | PRAMDAC | RAMDAC setup. On NV04, also contains overlay control. |
0x681000-0x681fff | PRMDIO | RAMDAC setup. Mapped to VGA registers: 0x3c6, 0x3c7, 0x3c8, 0x3c9. |
0x700000-0x7fffff (NV04 and up), 0xc00000-0xcfffff (NV03) | PRAMIN | Instance memory, see below |
0x800000-0x8fffff (NV04), 0x800000-0x9fffff (NV10-NV30), 0x800000-0x81ffff (NV40), 0xc00000-0xcfffff (NV50) | FIFO | PFIFO user submission interface |
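As a rough illustration of how this map can be used, the sketch below looks up which functional block a BAR 0 offset falls into. It covers only a subset of the table: the rows whose range carries no generation qualifier. The names `BAR0_BLOCKS` and `block_for_offset` are made up for this example and do not come from any driver.

```python
# Subset of the BAR 0 map above: only rows with no generation qualifier.
# (start, end inclusive, block name)
BAR0_BLOCKS = [
    (0x002000, 0x0037ff, "PFIFO"),
    (0x009000, 0x009fff, "PTIMER"),
    (0x0c0000, 0x0c0fff, "PRMVIO"),
    (0x100000, 0x100fff, "PFB"),
    (0x101000, 0x101fff, "PEXTDEV"),
    (0x300000, 0x30ffff, "PROM"),
    (0x400000, 0x40ffff, "PGRAPH"),
    (0x600000, 0x600fff, "PCRTC0"),
    (0x601000, 0x601fff, "PRMCIO"),
    (0x680000, 0x680fff, "PRAMDAC"),
    (0x681000, 0x681fff, "PRMDIO"),
]


def block_for_offset(offset):
    """Return the block name for a BAR 0 offset, or None if it is not in the subset."""
    for start, end, name in BAR0_BLOCKS:
        if start <= offset <= end:
            return name
    return None
```

A complete lookup would also need the card generation as input, since PMC, PBUS, PFIFO_CACHE1, PRAMIN and FIFO move between generations.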
PMC
PMC is Master Control, a block of registers used for things that don't fit anywhere else or that apply to the whole card.
Address | Name | Description |
0x0000 | PMC_BOOT_0 | Reports card chipset and stepping |
0x0004 (NV10+) | PMC_BOOT_1 | Selects big/little endian mode for the card |
0x0100 | PMC_INTR_0 | Shows which functional units have pending IRQ |
0x0140 | PMC_INTR_EN_0 | Selects which functional units can cause IRQs |
0x0160 | PMC_INTR_READ_0 | ??? |
0x0200 | PMC_ENABLE | Enables other functional units |
0x1540 (NV40) | (not named yet) | Enables/disables individual vertex/pixel shader units |
0x1540 (NV50) | (not named yet) | Enables/disables MPs, TPs, and ROPs |
0x15f0 (NV40 and up) | PMC_BACKLIGHT | Controls backlight on laptops |
0x1700 (NV40) | ??? | ??? |
0x1704 (NV40) | ??? | ??? |
0x1708 (NV40) | ??? | ??? |
0x170c (NV40) | ??? | ??? |
0x1700 (NV50) | PMC_BAR0_PRAMIN | Physical VRAM address of window that PRAMIN points to, shifted right by 16 bits. |
0x1704 (NV50) | ??? | ??? |
0x1708 (NV50) | ??? | ??? |
0x170c (NV50) | ??? | ??? |
0x1710 (NV50) | ??? | ??? |
0x1900-0x191c (NV50) | ??? | ??? |
PMC_BOOT_0
This tells you what GPU you have. For maximum enjoyment, it has a different format for NV01-NV03, NV04-NV05 and NV10+. Yay.
If bits 24-27 are non-0, you have NV10 or better, with the following format:
- Bits 0-7: card stepping.
- Bits 16-19: card minor revision?
- Bits 20-27: The chipset code number. The important one that goes after NV.
Otherwise, if bits 12-15 are non-0, you have one of the NV04 family cards:
- Bits 12-15: Always 4.
- Bits 16-19: Minor card revision.
- Bits 20-23: Major card revision. 0 is NV04, non-0 are NV05+.
- Bits 24-27: Manufacturer. 0: nvidia. Thankfully no companies other than nvidia make nvidia GPUs, or we'd be royally screwed when figuring out which BOOT_0 format to use.
- Bits 28-31: Foundry. 0: SGS, 1: Helios, 2: TSMC.
In other cases, you have NV01-NV03.
- Bits 12-15: Always 0
- Bits 16-19: Card architecture. 1 is NV01, 2 is NV02, 3 is NV03.
- Bits 24-27: Manufacturer. 0: nvidia.
- Bits 28-31: Foundry. 0: SGS, 1: Helios, 2: TSMC.
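The three layouts above can be decoded along the following lines. This is only a sketch of the bit-field rules as listed; the function name `decode_pmc_boot_0` and the dictionary keys are invented for the example, and the values in the usage note are synthetic, not dumps from real cards.

```python
def decode_pmc_boot_0(value):
    """Decode a 32-bit PMC_BOOT_0 value using the three layouts listed above."""

    def bits(lo, hi):
        # Extract the inclusive bit range lo..hi from value.
        return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

    if bits(24, 27) != 0:
        # NV10 or better.
        return {
            "format": "NV10+",
            "chipset": bits(20, 27),      # the number that goes after "NV"
            "minor_revision": bits(16, 19),
            "stepping": bits(0, 7),
        }
    if bits(12, 15) != 0:
        # NV04 family. Manufacturer is necessarily 0 here, because a
        # non-zero value in bits 24-27 was taken as NV10+ above.
        return {
            "format": "NV04",
            "major_revision": bits(20, 23),  # 0 is NV04, non-0 is NV05+
            "minor_revision": bits(16, 19),
            "manufacturer": bits(24, 27),
            "foundry": bits(28, 31),
        }
    return {
        "format": "NV01-NV03",
        "architecture": bits(16, 19),  # 1 is NV01, 2 is NV02, 3 is NV03
        "manufacturer": bits(24, 27),
        "foundry": bits(28, 31),
    }
```

For instance, a made-up value of 0x050000a1 decodes as the NV10+ format with chipset 0x50 and stepping 0xa1.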
PMC_BOOT_1
Available on NV10 and later. Endian switch. Write 0x00000001 in your favorite endianness to switch the card to this endianness. This switch, if set to big-endian mode, causes all accesses to the MMIO BAR to be byte-swapped [xor all byte addresses with 3], except for areas containing 8-bit legacy VGA registers. This includes PRAMIN accesses. Accesses to the framebuffer BAR aren't affected.
When read, returns 0 if in little-endian mode, 0x01000001 if in big-endian mode.
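To make the "xor all byte addresses with 3" rule concrete, the sketch below applies it to the four bytes of one 32-bit word and interprets the readback value described above. Both helper names are hypothetical, and this models only the addressing rule, not real hardware access.

```python
def swap_word_bytes(word_bytes):
    """Return the 4 bytes of one word as seen when each byte address is XORed with 3."""
    if len(word_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    return bytes(word_bytes[i ^ 3] for i in range(4))


def is_big_endian_mode(boot_1_readback):
    """Interpret a PMC_BOOT_1 read: 0 is little-endian, 0x01000001 is big-endian."""
    if boot_1_readback == 0x01000001:
        return True
    if boot_1_readback == 0:
        return False
    raise ValueError("unexpected PMC_BOOT_1 value")
```

XORing a byte index with 3 maps 0, 1, 2, 3 to 3, 2, 1, 0, so the rule amounts to reversing the byte order within each aligned 32-bit word.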
TODO: Check NV40 PRAMIN BAR and NV50. Check how to byte-swap PFIFO pushbuffers.
PMC_INTR and PMC_INTR_EN
The functional blocks of the card can report interrupts for various reasons. PMC_INTR reports which of them are currently reporting interrupts. After you service an interrupt, you have to write 1 to the corresponding PMC_INTR bit to clear its status.
The bit assignments for PMC_INTR:
- bit 8: PFIFO
- bit 12: PGRAPH
- bit 20: PTIMER
- bit 24: PCRTC0
- bit 25: PCRTC1
- bit 26: PDISPLAY (NV50 only)
The PMC_INTR_EN register enables/disables reporting of interrupts via the PCI IRQ line. Write 1 to enable interrupts, 0 to disable.
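A minimal sketch of working with the PMC_INTR bit assignments: one helper lists the units with a pending interrupt in a register value, the other builds the value to write back to clear their status. The names `PMC_INTR_BITS`, `pending_units` and `ack_mask` are made up for the example, and no real register access is performed.

```python
# Bit assignments from the list above. PDISPLAY exists on NV50 only.
PMC_INTR_BITS = {
    8: "PFIFO",
    12: "PGRAPH",
    20: "PTIMER",
    24: "PCRTC0",
    25: "PCRTC1",
    26: "PDISPLAY",
}


def pending_units(intr_value):
    """Return the names of units whose bit is set in a PMC_INTR value, lowest bit first."""
    return [name for bit, name in sorted(PMC_INTR_BITS.items())
            if intr_value & (1 << bit)]


def ack_mask(unit_names):
    """Build the value to write to PMC_INTR to clear the status of the given units."""
    by_name = {name: bit for bit, name in PMC_INTR_BITS.items()}
    mask = 0
    for name in unit_names:
        mask |= 1 << by_name[name]
    return mask
```

For example, a value of 0x00101100 has bits 8, 12 and 20 set, so `pending_units` reports PFIFO, PGRAPH and PTIMER.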
PMC_ENABLE
Used to turn the functional units on/off.
- bit 8: PFIFO
- bit 12: PGRAPH
- bit 20: PFB
- bit 24: PCRTC0
- bit 25: PCRTC1
- bit 26: PDISPLAY (NV50 only)
- bit 28: PVIDEO
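Changing a single unit in PMC_ENABLE is a read-modify-write on the register value. The sketch below computes the new value only. It assumes that a set bit means the unit is on, which the list above implies but does not state outright, and the names `PMC_ENABLE_BITS` and `with_unit_enabled` are hypothetical.

```python
# Bit assignments from the list above. PDISPLAY exists on NV50 only.
PMC_ENABLE_BITS = {
    "PFIFO": 8,
    "PGRAPH": 12,
    "PFB": 20,
    "PCRTC0": 24,
    "PCRTC1": 25,
    "PDISPLAY": 26,
    "PVIDEO": 28,
}


def with_unit_enabled(current_value, unit, enabled):
    """Return current_value with the unit's bit set (enabled) or cleared (disabled).

    Assumption: a set bit turns the unit on. All other bits are preserved.
    """
    mask = 1 << PMC_ENABLE_BITS[unit]
    if enabled:
        return current_value | mask
    return current_value & ~mask & 0xffffffff
```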
The VRAM and the RAMIN
Video memory is used to store all kinds of data used by the card: scanout buffers, video input buffers, render targets, textures, pixmaps, vertex buffers, shader code, command buffers, as well as some management objects. Video memory is split into normal memory and RAMIN, also known as instance memory. RAMIN contains the card management objects (usually accessible only to the kernel); normal memory is for objects that normal applications access.
Stuff stored in RAMIN: RAMHT, RAMRO, RAMFC, DMA/graph objects (see NvObjectTypes), PGRAPH context (NV20+), channel setup and page tables (NV50+). Also, VBIOS copies its own image to the first 64kiB of PRAMIN when it boots (or to the area initially pointed to by PMC+0x1700, in case of NV50).
NV03 - NV40
On NV03 through NV40: RAMIN starts from the end of VRAM, and is simply the last 1MB (or 32MB on NV40) of memory, except addressed in reverse. The unit of reversal is 16 bytes on NV04 and NV10, 512kiB on NV40. No idea on other cards. This means that address 0x12345 in RAMIN on an NV04 card with 32MB of memory is actually address 0x1fedcb5 in VRAM. Or, if you prefer a formula:
real VRAM address = VRAM_size - (ramin_address - (ramin_address % reversal_unit_size)) - reversal_unit_size + (ramin_address % reversal_unit_size)
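The formula translates directly into code. The sketch below is a plain transcription of it; the function name `ramin_to_vram` is invented for the example, and the reversal unit has to be supplied by the caller, since it is only known for some cards.

```python
def ramin_to_vram(ramin_address, vram_size, reversal_unit_size):
    """Map a RAMIN address to a real VRAM address on cards with reversed RAMIN.

    Whole reversal units are counted back from the end of VRAM, while the
    offset inside a unit keeps its normal (forward) direction.
    """
    offset_in_unit = ramin_address % reversal_unit_size
    unit_start = ramin_address - offset_in_unit
    return vram_size - unit_start - reversal_unit_size + offset_in_unit
```

With 32MB of VRAM and a 16-byte reversal unit (the NV04 case), RAMIN address 0 maps to 0x1fffff0, the start of the last 16 bytes of VRAM, and RAMIN address 0x12345 maps to 0x1fedcb5.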
The PRAMIN window at 0x700000-0x7fffff (or 0xc00000-0xcfffff on NV03, presumably) in BAR 0 is mapped to the first 1MB of RAMIN, and uses RAMIN addresses (that is, reversed with respect to real VRAM addresses).
On NV40, BAR 2 is also mapped to RAMIN using RAMIN addresses, and is 32MB long. Before NV40, RAMIN addresses used by other pieces of the card max out at 1MB. No idea if you're allowed to go past BAR 2 size on NV40.
BAR 1 is simply mapped to the whole VRAM. If VRAM is >256MB, only the first 256MB are accessible directly by the CPU, and you're basically out of luck.
NV50
On NV50, on the other hand, things are completely different. RAMIN is no longer a separate area of memory, and the management objects previously in RAMIN can now be placed anywhere in VRAM. Address reversal is gone too.
NV50 introduces virtual memory: the addresses that the GPU uses are virtual, and the memory management unit maps them to actual physical addresses in VRAM by page-based address translation. This remapping applies both to GPU accesses from the graphics engine (with per-channel page tables) and to PCI BARs 1 and 2 (with another set of page tables). The only difference between RAMIN objects and normal objects is that RAMIN objects are specified by their physical addresses directly.
However, the PRAMIN mapping in BAR 0 can now be used to access any part of video memory by its physical address: it's used as a 1MB window to physical memory, and its base can be set to any 64kB-aligned address using PMC register 0x1700 (see above).
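As a sketch of how the window could be aimed at a given physical address: pick the 64kB-aligned base at or below the address, shift it right by 16 bits to get the value for PMC register 0x1700, and add the remainder to the start of the PRAMIN area in BAR 0. The function name `pramin_window_for` and the constant names are invented for the example, and nothing here touches hardware.

```python
PRAMIN_BAR0_BASE = 0x700000   # start of the PRAMIN window in BAR 0
PRAMIN_WINDOW_SIZE = 0x100000  # 1MB window
PMC_BAR0_PRAMIN = 0x1700      # PMC register holding the window base >> 16


def pramin_window_for(phys_addr):
    """Return (value for PMC 0x1700, BAR 0 offset) to reach phys_addr on NV50.

    The window base is the 64kB-aligned address at or below phys_addr,
    so the target always lands in the first 64kB of the 1MB window.
    """
    window_base = phys_addr & ~0xffff
    reg_value = window_base >> 16
    bar0_offset = PRAMIN_BAR0_BASE + (phys_addr - window_base)
    return reg_value, bar0_offset
```

For a made-up physical address of 0x12345678, this gives a register value of 0x1234 and a BAR 0 offset of 0x705678. Anything up to `PRAMIN_WINDOW_SIZE` bytes past the chosen base is reachable without moving the window again.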