Surface Layouts

A surface's layout refers to how pixels are mapped to memory locations. This page discusses the different layouts supported by Nvidia hardware and how to deal with them. Some of the details are educated guesses and may or may not be accurate.

Nvidia hardware generally supports three types of layout (that we know of):

Linear
Tiled
Swizzled

Linear

Linear surfaces are the simplest layout to understand. Pixels are arranged horizontally, one after the other, in rows. Linear surfaces have good spatial locality in one dimension (the x-axis), but not the other: moving 1 pixel along the y-axis produces a relatively large address displacement, and the wider the surface is, the larger the displacement. Rows may have some padding at the end for alignment purposes.

Addressing:

bpp = bytes per pixel
S_w = surface width (including padding)
Pixel(x, y) = ( y * S_w + x ) * bpp

[Figure: linear layout, 32x32 image]
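
The linear addressing formula above can be sketched in Python as follows. This is only an illustration of the arithmetic; the function name linear_offset and the example values (a 32-pixel-wide surface at 4 bytes per pixel) are assumptions made for the example, not anything taken from the hardware.

```python
def linear_offset(x, y, surface_width, bpp):
    """Byte offset of pixel (x, y) in a linear surface.

    surface_width is the row length in pixels, padding included (S_w).
    bpp is the number of bytes per pixel.
    """
    return (y * surface_width + x) * bpp


# Hypothetical example surface: 32 pixels wide, 4 bytes per pixel.
# One step along x moves by bpp bytes; one step along y moves by a whole row.
step_x = linear_offset(1, 0, 32, 4) - linear_offset(0, 0, 32, 4)
step_y = linear_offset(0, 1, 32, 4) - linear_offset(0, 0, 32, 4)
```

Comparing step_x with step_y shows the locality problem described above: the y step grows with the surface width, while the x step does not.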

Tiled

Tiled surfaces are similar to linear surfaces, except that instead of the entire surface being stored linearly, each tile is stored linearly, and the tiles themselves are arranged horizontally, one after the other, in rows. Tiled surfaces give more spatial locality, making better use of caches and prefetching. This is because a tile is not nearly as wide as the entire surface, so moving 1 pixel along the y-axis within a tile produces a much smaller (and constant) displacement. However, crossing tile boundaries (especially along the y-axis) produces larger displacements. The typical tile size on Nvidia hardware is 16x16 pixels. Tiled surface dimensions must be multiples of the tile dimensions.

Addressing:

T_w = tile width
T_h = tile height
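T_s = T_w * T_h (pixels per tile)
T_x = x / T_w (integer division; tile column)
T_y = y / T_h (integer division; tile row)
T_px = x % T_w (x position within the tile)
T_py = y % T_h (y position within the tile)
S_tw = S_w / T_w (surface width in tiles)
Tile(x, y) = ( T_y * S_tw + T_x ) * T_s
Pixel(x, y) = ( Tile(x, y) + T_py * T_w + T_px ) * bpp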

[Figure: tiled layout, 32x32 image, 16x16 tiles]
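
As a rough sketch, the tiled addressing formula above translates to Python like this. The function name tiled_offset and the 16x16 default tile size are assumptions for the example (16x16 is only the typical size mentioned above), and the variable names map onto the symbols T_x, T_px, S_tw and so on as noted in the comments.

```python
def tiled_offset(x, y, surface_width, bpp, tile_w=16, tile_h=16):
    """Byte offset of pixel (x, y) in a tiled surface.

    surface_width is in pixels and must be a multiple of tile_w.
    """
    if surface_width % tile_w != 0:
        raise ValueError("surface width must be a multiple of the tile width")
    tile_size = tile_w * tile_h              # T_s, pixels per tile
    tile_x, in_x = divmod(x, tile_w)         # T_x, T_px
    tile_y, in_y = divmod(y, tile_h)         # T_y, T_py
    tiles_per_row = surface_width // tile_w  # S_tw
    tile_base = (tile_y * tiles_per_row + tile_x) * tile_size  # Tile(x, y)
    return (tile_base + in_y * tile_w + in_x) * bpp


# Hypothetical example surface: 32 pixels wide, 4 bytes per pixel.
inside_tile_step = tiled_offset(0, 1, 32, 4) - tiled_offset(0, 0, 32, 4)
across_tile_step = tiled_offset(0, 16, 32, 4) - tiled_offset(0, 15, 32, 4)
```

Here inside_tile_step depends only on the tile width, not on the surface width, while across_tile_step is the larger jump taken when a y step crosses a tile boundary.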

Swizzled

Swizzled surfaces are similar to tiled surfaces, but offer even more spatial locality due to their recursive nature. Displacement along either axis starts out very small and gets progressively larger. See the Wikipedia article on the Z-order curve for images of the access pattern and a more formal discussion. Swizzled surface dimensions must be powers of two.

Note that this has little to do with shuffling the components of a vector register in shader programs, which is also called swizzling. The term is probably used here because we are shuffling the bits of the pixel x/y coordinates to generate addresses.

Addressing:

SwizzleBits(x, y) = y_n x_n y_(n-1) x_(n-1) ... y_1 x_1 y_0 x_0
Pixel(x, y) = SwizzleBits(x, y) * bpp

[Figure: swizzled layout, 32x32 image]
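
The bit interleaving above can be sketched in Python as shown here. The names swizzle_bits and swizzled_offset are made up for this example. The sketch assumes a square surface, so that x and y contribute the same number of bits; this page does not say how non-square swizzled surfaces are addressed, so that case is deliberately left out.

```python
def swizzle_bits(x, y, bits):
    """Interleave the low `bits` bits of x and y as ... y1 x1 y0 x0."""
    result = 0
    for i in range(bits):
        result |= ((x >> i) & 1) << (2 * i)      # x_i goes to even bit 2i
        result |= ((y >> i) & 1) << (2 * i + 1)  # y_i goes to odd bit 2i+1
    return result


def swizzled_offset(x, y, size, bpp):
    """Byte offset of pixel (x, y) in a size x size swizzled surface.

    Assumption: the surface is square and size is a power of two.
    """
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if not (0 <= x < size and 0 <= y < size):
        raise ValueError("pixel coordinates out of range")
    bits = size.bit_length() - 1  # log2(size)
    return swizzle_bits(x, y, bits) * bpp
```

Stepping through small coordinates shows the pattern described above: (1, 0), (0, 1) and (1, 1) land at pixel indices 1, 2 and 3, so the first steps along either axis are tiny, and the displacement doubles in size each time a higher coordinate bit flips.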

NV04-NV40

Individual surfaces referenced by the 3D engine have a format which can be swizzled or not swizzled. NV_SCALED_IMAGE_FROM_MEMORY can be used to swizzle an image while copying it, although it seems to have a size limit of 1024x1024; larger surfaces need to be copied in chunks. In general, swizzling large surfaces this way seems to be slow compared to rendering to a swizzled surface using the 3D engine.
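
Splitting a large surface into chunks that respect the apparent 1024x1024 limit could look something like the sketch below. The helper name copy_chunks is hypothetical, the limit is only the observed value mentioned above, and how each rectangle is actually submitted to the hardware is not shown.

```python
def copy_chunks(width, height, max_dim=1024):
    """Yield (x, y, w, h) rectangles covering a width x height surface.

    No rectangle is larger than max_dim x max_dim. max_dim defaults to
    the observed (not confirmed) 1024x1024 limit.
    """
    for y in range(0, height, max_dim):
        for x in range(0, width, max_dim):
            yield (x, y, min(max_dim, width - x), min(max_dim, height - y))


# A hypothetical 2048x1024 surface splits into two 1024x1024 chunks.
chunks = list(copy_chunks(2048, 1024))
```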

Tiled surfaces, on the other hand, seem to be set up completely differently. There appear to be MTRR-like registers in the MMIO space, where you can specify an offset+length pair to mark a region of memory as tiled. The number of registers available depends on the chip. Unlike on some other hardware, tiled memory regions don't appear linear to the CPU: when we dump the blob's framebuffer, it is obviously tiled. All hardware units reading from or writing to memory probably know to check the tile registers, so most likely you can DMA or render to/from tiled memory without doing anything more than usual. (What do you get if you copy+swizzle a texture to a tiled region?)

Swizzled surfaces are probably only useful as textures and render targets, not as back or front buffers. This is partly because their dimensions have to be powers of two (POT), which most textures already are, and partly because they probably can't be used as scanout buffers.