GL_NV_vertex_program3


Name


    NV_vertex_program3

Name Strings


    GL_NV_vertex_program3

Contact


    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Status


    Shipping.

Version


    Last Modified Date:         $Date: 2004/05/17 $
    NVIDIA Revision:            1

Number


    306

Dependencies


    ARB_vertex_program is required.
NV_vertex_program2_option is required.

Overview


    This extension, like the NV_vertex_program2_option extension,
provides additional vertex program functionality to extend the
standard ARB_vertex_program language and execution environment.
ARB programs wishing to use this added functionality need only add:

OPTION NV_vertex_program3;

to the beginning of their vertex programs.

New functionality provided by this extension, above and beyond that
already provided by the NV_vertex_program2_option extension, includes:

* texture lookups in vertex programs,

* ability to push and pop address registers on the stack,

* address register-relative addressing for vertex attribute and
result arrays, and

* a second four-component condition code.
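
As a rough illustration of how the option line and a vertex texture lookup
might fit together, the sketch below holds a hypothetical vertex program in a
Python string and checks that the OPTION statement directly follows the
program header. The program text (a simple displacement along the normal),
the name VERTEX_PROGRAM, and the helper leading_options are illustrative
assumptions, not part of this specification.

```python
# Hypothetical vertex program source; the displacement logic is illustrative only.
VERTEX_PROGRAM = """!!ARBvp1.0
OPTION NV_vertex_program3;
TEMP height, pos;
# Sample the 2D texture bound to vertex texture image unit 0.
TEX height, vertex.texcoord[0], texture[0], 2D;
# Displace the vertex along its normal, leaving w unchanged.
MOV pos, vertex.position;
MAD pos.xyz, vertex.normal, height.x, pos;
# Transform to clip space with the tracked modelview-projection matrix.
DP4 result.position.x, state.matrix.mvp.row[0], pos;
DP4 result.position.y, state.matrix.mvp.row[1], pos;
DP4 result.position.z, state.matrix.mvp.row[2], pos;
DP4 result.position.w, state.matrix.mvp.row[3], pos;
END
"""


def leading_options(source):
    """Return the OPTION names found between the header and the first other statement."""
    lines = [ln.strip() for ln in source.splitlines()]
    # Drop blank lines and whole-line comments (assumed to start with '#').
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines or lines[0] != "!!ARBvp1.0":
        raise ValueError("missing !!ARBvp1.0 header")
    options = []
    for ln in lines[1:]:
        if not ln.startswith("OPTION "):
            break
        options.append(ln[len("OPTION "):].rstrip(";").strip())
    return options


print(leading_options(VERTEX_PROGRAM))
```

In an application, a string like this would typically be passed to
glProgramStringARB with the GL_VERTEX_PROGRAM_ARB target, as for any other
ARB vertex program.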

Issues


    Should we provide a separate "!!VP3.0" program type, like the
"!!VP2.0" type defined in NV_vertex_program2?

RESOLVED: No. Since ARB_vertex_program has been fully defined
(it wasn't in the !!VP2.0 time-frame), we will simply define
language extensions to !!ARBvp1.0 that expose new functionality.
The NV_vertex_program2_option specification followed this same
pattern for the NV3X family (GeForce FX, Quadro FX).

Should this be called "NV_vertex_program3_option"?

RESOLVED: No. The similar extension to !!ARBvp1.0 called
"NV_vertex_program2_option" got that name only because the simpler
"NV_vertex_program2" name had already been used.

Is there a limit on the number of texture units that can be accessed
by a vertex program?

RESOLVED: Yes -- same as MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB from
the ARB_vertex_shader extension.

Since vertices don't have screen space partial derivatives, how is
the LOD used for texture accesses defined?

RESOLVED: The TXL instruction allows a program to explicitly
set an LOD; the LOD for all other texture instructions is zero.
The texture LOD biases specified in the texture object and the texture
environment still apply to all vertex texture lookups.

New Procedures and Functions


    None.

New Tokens


    Accepted by the <pname> parameter of GetBooleanv, GetIntegerv,
GetFloatv, and GetDoublev:

MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB 0x8B4C


Additions to Chapter 2 of the OpenGL 1.4 Specification (OpenGL Operation)


    Modify Section 2.14.2, Vertex Program Grammar and Restrictions

(mostly add to existing grammar rules, as extended by
NV_vertex_program2_option)

<optionName> ::= "NV_vertex_program3"

<instruction> ::= <TexInstruction>

<ALUInstruction> ::= <ASTACKop_instruction>

<TexInstruction> ::= <TEXop_instruction>

<ASTACKop_instruction> ::= <ASTACKop> <instOperandAddrVNS>

<ASTACKop> ::= "PUSHA"
| "POPA"

<TEXop_instruction> ::= <TEXop> <instResult> "," <instOperandV> ","
<texTarget>

<TEXop> ::= "TEX"
| "TXP"
| "TXB"
| "TXL"

<texTarget> ::= <texImageUnit> "," <texTargetType>

<texImageUnit> ::= "texture" <optTexImageUnitNum>

<optTexImageUnitNum> ::= /* empty */
| "[" <texImageUnitNum> "]"

<texImageUnitNum> ::= <integer>
/*[0,MAX_TEXTURE_IMAGE_UNITS_ARB-1]*/

<texTargetType> ::= "1D"
| "2D"
| "3D"
| "CUBE"
| "RECT"

<attribVtxBasic> ::= "texcoord" "[" <arrayMemRel> "]"
| "attrib" "[" <arrayMemRel> "]"

<resultVtxBasic> ::= "texcoord" "[" <arrayMemRel> "]"

<ccMaskRule> ::= "EQ0"
| "GE0"
| "GT0"
| "LE0"
| "LT0"
| "NE0"
| "TR0"
| "FL0"
| "EQ1"
| "GE1"
| "GT1"
| "LE1"
| "LT1"
| "NE1"
| "TR1"
| "FL1"

(modify description of reserved identifiers)

... The following strings are reserved keywords and may not be used
as identifiers:

ABS, ADD, ADDRESS, ALIAS, ARA, ARL, ARR, ATTRIB, BRA, CAL, COS,
DP3, DP4, DPH, DST, END, EX2, EXP, FLR, FRC, LG2, LIT, LOG, MAD,
MAX, MIN, MOV, MUL, OPTION, OUTPUT, PARAM, POPA, POW, PUSHA, RCC,
RCP, RET, RSQ, SEQ, SFL, SGE, SGT, SIN, SLE, SLT, SNE, SUB, SSG,
STR, SWZ, TEMP, TEX, TXB, TXL, TXP, XPD, program, result, state,
and vertex.

Modify Section 2.14.3.1, Vertex Attributes

(add new bindings to binding table)

Vertex Attribute Binding  Components  Underlying State
------------------------  ----------  --------------------------------
...
vertex.texcoord[A+n]      (s,t,r,q)   indexed texture coordinate
vertex.attrib[A+n]        (x,y,z,w)   indexed generic vertex attribute

If a vertex attribute binding matches "vertex.texcoord[A+n]", where
"A" is a component of an address register (Section 2.14.3.5), a
texture coordinate number <c> is computed by adding the current
value of the address register component and <n>. The "x", "y",
"z", and "w" components of the vertex attribute variable are
filled with the "s", "t", "r", and "q" components, respectively,
of the vertex texture coordinates for texture unit <c>. If <c>
is negative or greater than or equal to MAX_TEXTURE_COORDS_ARB,
the vertex attribute variable is undefined.

If a vertex attribute binding matches "vertex.attrib[A+n]", where
"A" is a component of an address register (Section 2.14.3.5), a
vertex attribute number <a> is computed by adding the current value
of the address register component and <n>. The "x", "y", "z", and
"w" components of the vertex attribute variable are filled with the
"x", "y", "z", and "w" components, respectively, of generic vertex
attribute <a>. If <a> is negative or greater than or equal to
MAX_VERTEX_ATTRIBS_ARB, the vertex attribute variable is undefined.
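
The index computation described above can be sketched as follows. This is a
minimal model, not driver behavior: the limit of 16 stands in for
MAX_VERTEX_ATTRIBS_ARB (an assumed value that a real application would
query), the helper name fetch_relative is hypothetical, and None is used only
to mark the case the specification leaves undefined.

```python
MAX_VERTEX_ATTRIBS = 16  # assumed value; query MAX_VERTEX_ATTRIBS_ARB in practice


def fetch_relative(array, address_component, offset, limit):
    """Model a read of array[A+n]; return None when the index is out of range."""
    index = address_component + offset
    if index < 0 or index >= limit:
        # The specification leaves the attribute variable undefined here.
        return None
    return array[index]


# Hypothetical attribute data: attribute i holds (i, 0, 0, 1).
attribs = [(float(i), 0.0, 0.0, 1.0) for i in range(MAX_VERTEX_ATTRIBS)]

# vertex.attrib[A0.x+2] with A0.x = 3 selects generic attribute 5.
print(fetch_relative(attribs, 3, 2, MAX_VERTEX_ATTRIBS))
# A negative or too-large index is modeled as undefined.
print(fetch_relative(attribs, -4, 2, MAX_VERTEX_ATTRIBS))
```

The same computation applies to vertex.texcoord[A+n], with
MAX_TEXTURE_COORDS_ARB as the limit.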

Modify Section 2.14.3.4, Vertex Program Results

(add new binding to binding table)

Binding                        Components  Description
-----------------------------  ----------  ----------------------------
...
result.texcoord[A+n]           (s,t,r,q)   indexed texture coordinate

If a result variable binding matches "result.texcoord[A+n]", where "A"
is a component of an address register (Section 2.14.3.5), a texture
coordinate number <c> is computed by adding the current value of
the address register component and <n>. Updates to the "x", "y",
"z", and "w" components of the result variable set the "s", "t",
"r" and "q" components, respectively, of the transformed vertex's
texture coordinates for texture unit <c>. If <c> is negative or
greater than or equal to MAX_TEXTURE_COORDS_ARB, the effects of
updates to the result variable are undefined and may overwrite
other program results.

Modify Section 2.14.3.X, Condition Code Registers (added in
NV_vertex_program2_option)

The vertex program condition code registers are two four-component
vectors, called CC0 and CC1. Each component of these registers is one
of four enumerated values: GT (greater than), EQ (equal), LT (less
than), or UN (unordered). The condition code registers can be used
to mask writes to registers and to evaluate conditional branches.

Most vertex program instructions can optionally update one of the
two condition code registers. When a vertex program instruction
updates a condition code register, a condition code component is set
to LT if the corresponding component of the result is less than zero,
EQ if it is equal to zero, GT if it is greater than zero, and UN if
it is NaN (not a number).

The condition code registers are initialized to vectors of EQ values
each time a vertex program executes.

Modify Section 2.14.4, Vertex Program Execution Environment

(modify instruction table) There are forty-eight vertex program
instructions. Vertex program instructions may have up to eight
variants, including a suffix of "C" or "C0" to allow an update of
condition code register zero (section 2.14.3.X), a suffix of "C1"
to allow an update of condition code register one, and a suffix of
"_SAT" to clamp the result vector components to the range [0,1].
For example, the eight forms of the "ADD" instruction are "ADD",
"ADDC", "ADDC0", "ADDC1", "ADD_SAT", "ADDC_SAT", "ADDC0_SAT", and
"ADDC1_SAT". The instructions and their respective input and output
parameters are summarized in Table X.5.

            Modifiers
Instruction C S  Inputs  Output  Description
----------- - -  ------  ------  --------------------------------
ABS         X X  v       v       absolute value
ADD         X X  v,v     v       add
ARA         X -  a       a       address register add
ARL         X -  s       a       address register load
ARR         X -  v       a       address register load (round)
BRA         - -  c       -       branch
CAL         - -  c       -       subroutine call
COS         X X  s       ssss    cosine
DP3         X X  v,v     ssss    3-component dot product
DP4         X X  v,v     ssss    4-component dot product
DPH         X X  v,v     ssss    homogeneous dot product
DST         X X  v,v     v       distance vector
EX2         X X  s       ssss    exponential base 2
EXP         X X  s       v       exponential base 2 (approximate)
FLR         X X  v       v       floor
FRC         X X  v       v       fraction
LG2         X X  s       ssss    logarithm base 2
LIT         X X  v       v       compute light coefficients
LOG         X X  s       v       logarithm base 2 (approximate)
MAD         X X  v,v,v   v       multiply and add
MAX         X X  v,v     v       maximum
MIN         X X  v,v     v       minimum
MOV         X X  v       v       move
MUL         X X  v,v     v       multiply
POPA        - -  -       a       pop address register
POW         X X  s,s     ssss    exponentiate
PUSHA       - -  a       -       push address register
RCC         X X  s       ssss    reciprocal (clamped)
RCP         X X  s       ssss    reciprocal
RET         - -  c       -       subroutine return
RSQ         X X  s       ssss    reciprocal square root
SEQ         X X  v,v     v       set on equal
SFL         X X  v,v     v       set on false
SGE         X X  v,v     v       set on greater than or equal
SGT         X X  v,v     v       set on greater than
SIN         X X  s       ssss    sine
SLE         X X  v,v     v       set on less than or equal
SLT         X X  v,v     v       set on less than
SNE         X X  v,v     v       set on not equal
SSG         X X  v       v       set sign
STR         X X  v,v     v       set on true
SUB         X X  v,v     v       subtract
SWZ         X X  v       v       extended swizzle
TEX         X X  v       v       texture lookup
TXB         X X  v       v       texture lookup with LOD bias
TXL         X X  v       v       texture lookup with explicit LOD
TXP         X X  v       v       projective texture lookup
XPD         X X  v,v     v       cross product

Table X.5: Summary of vertex program instructions. The columns
"C" and "S" indicate whether the "C", "C0", and "C1" condition code
update modifiers, and the "_SAT" saturation modifiers, respectively,
are supported for the opcode. "v" indicates a floating-point vector
input or output, "s" indicates a floating-point scalar input,
"ssss" indicates a scalar output replicated across a 4-component
result vector, "a" indicates a vector address register, and "c"
indicates a condition code test.
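
The way the "C" and "S" columns combine into opcode variants can be sketched
as below. The helper name instruction_variants and its two flags are
hypothetical; they simply mirror the "C" and "S" columns of Table X.5.

```python
def instruction_variants(opcode, supports_cc=True, supports_sat=True):
    """List the spellings of an opcode given its "C" and "S" table entries."""
    cc_suffixes = ["", "C", "C0", "C1"] if supports_cc else [""]
    sat_suffixes = ["", "_SAT"] if supports_sat else [""]
    # The condition code suffix comes before the saturation suffix.
    return [opcode + cc + sat for sat in sat_suffixes for cc in cc_suffixes]


print(instruction_variants("ADD"))
print(instruction_variants("ARA", supports_sat=False))
print(instruction_variants("BRA", supports_cc=False, supports_sat=False))
```

An opcode marked "X X" yields eight spellings, one marked "X -" yields four,
and one marked "- -" has only its base form.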

Rewrite Section 2.14.4.3, Vertex Program Destination Register Update

A vertex program instruction can optionally clamp the results of
a floating-point result vector to the range [0,1]. The components
of the result vector are clamped to [0,1] if the saturation suffix
"_SAT" is present in the instruction.

Most vertex program instructions write a 4-component result vector to
a single temporary or vertex result register. Writes to individual
components of the destination register are controlled by individual
component write masks specified as part of the instruction.

The component write mask is specified by the <optionalMask> rule
found in the <maskedDstReg> rule. If the optional mask is "",
all components are enabled. Otherwise, the optional mask names
the individual components to enable. The characters "x", "y",
"z", and "w" match the x, y, z, and w components respectively.
For example, an optional mask of ".xzw" indicates that the x, z,
and w components should be enabled for writing but the y component
should not. The grammar requires that the destination register mask
components must be listed in "xyzw" order. The condition code write
mask is specified by the <ccMask> rule found in the <instResultCC>
and <instResultAddrCC> rules. If a condition code mask is present,
the selected condition code register is loaded and swizzled according
to the swizzle codes specified by <swizzleSuffix>. Each component of
the swizzled condition code is tested according to the rule given by
<ccMaskRule>. <ccMaskRule> may have the values "EQ", "NE", "LT", "GE",
"LE", or "GT",
which mean to enable writes if the corresponding condition code field
evaluates to equal, not equal, less than, greater than or equal, less
than or equal, or greater than, respectively. Comparisons involving
condition codes of "UN" (unordered) evaluate to true for "NE" and
false otherwise. For example, if the condition code is (GT,LT,EQ,GT)
and the condition code mask is "(NE.zyxw)", the swizzle operation
will load (EQ,LT,GT,GT) and the mask will thus enable writes on
the y, z, and w components. In addition, "TR" always enables writes
and "FL" always disables writes, regardless of the condition code.
If the condition code mask is empty, it is treated as "(TR)".
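
The swizzle-then-test behavior of the condition code mask can be modeled with
a short sketch. The function names swizzle, cc_passes, and write_enables are
hypothetical; the rules themselves follow the text above, including the
treatment of "UN" as true only for "NE".

```python
COMPONENTS = "xyzw"


def swizzle(cc, pattern):
    """Reorder a 4-component condition code according to a swizzle such as 'zyxw'."""
    return tuple(cc[COMPONENTS.index(ch)] for ch in pattern)


def cc_passes(rule, field):
    """Evaluate one condition code field against a mask rule."""
    outcomes = {
        "EQ": field == "EQ",
        "NE": field != "EQ",          # true for LT, GT, and UN
        "LT": field == "LT",
        "GE": field in ("GT", "EQ"),
        "LE": field in ("LT", "EQ"),
        "GT": field == "GT",
        "TR": True,
        "FL": False,
        "": True,                     # an empty mask is treated as "(TR)"
    }
    return outcomes[rule]


def write_enables(cc, rule, pattern="xyzw"):
    """Return per-component write enables for the (x, y, z, w) destination components."""
    return tuple(cc_passes(rule, field) for field in swizzle(cc, pattern))


# The example from the text: condition code (GT,LT,EQ,GT) with mask "(NE.zyxw)".
print(swizzle(("GT", "LT", "EQ", "GT"), "zyxw"))
print(write_enables(("GT", "LT", "EQ", "GT"), "NE", "zyxw"))
```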

Each component of the destination register is updated with the result
of the vertex program instruction if and only if the component is
enabled for writes by both the component write mask and the condition
code write mask. Otherwise, the component of the destination register
remains unchanged.

A vertex program instruction can also optionally update a condition
code register. The selected condition code register is updated if a
condition code update suffix ("C", "C0", or "C1") is present in the
instruction. The instruction "ADDC" will update condition code register
zero; the otherwise equivalent instruction "ADD" will not update either
register. If condition code updates
are enabled, each component of the destination register enabled
for writes is compared to zero. The corresponding component of
the condition code is set to "LT", "EQ", or "GT", if the written
component is less than, equal to, or greater than zero, respectively.
Condition code components are set to "UN" if the written component is
NaN (not a number). Values of -0.0 and +0.0 both evaluate to "EQ".
If a component of the destination register is not enabled for writes,
the corresponding condition code component is also unchanged.

In the following example code,

# R1 = (-2, 0, 2, NaN)      R0                     CC
MOVC R0, R1;              # ( -2,   0,   2, NaN)   (LT,EQ,GT,UN)
MOVC R0.xyz, R1.yzwx;     # (  0,   2, NaN, NaN)   (EQ,GT,UN,UN)
MOVC R0 (NE), R1.zywx;    # (  0,   0, NaN,  -2)   (EQ,EQ,UN,LT)

the first instruction writes (-2,0,2,NaN) to R0 and updates the
condition code to (LT,EQ,GT,UN). In the second instruction, only the
"x", "y", and "z" components of R0 and the condition code are updated,
so R0 ends up with (0,2,NaN,NaN) and the condition code ends up with
(EQ,GT,UN,UN). In the third instruction, the condition code mask
disables writes to the x component (its condition code field is "EQ"),
so R0 ends up with (0,0,NaN,-2) and the condition code ends up with
(EQ,EQ,UN,LT).

The following pseudocode illustrates the process of writing a
result vector to the destination register. In the pseudocode,
"instrSaturate" is TRUE if and only if result saturation is
enabled, "instrMask" refers to the component write mask given by
the <optWriteMask> rule. "ccMaskRule" refers to the condition code
mask rule given by <ccMask> and "updatecc" is TRUE if and only if
condition code updates are enabled. "result", "destination", and "cc"
refer to the result vector, the register selected by <dstRegister>
and the condition code, respectively. Condition codes do not exist
in the VP1 execution environment.

boolean TestCC(CondCode field) {
switch (ccMaskRule) {
case "EQ": return (field == "EQ");
case "NE": return (field != "EQ");
case "LT": return (field == "LT");
case "GE": return (field == "GT" || field == "EQ");
case "LE": return (field == "LT" || field == "EQ");
case "GT": return (field == "GT");
case "TR": return TRUE;
case "FL": return FALSE;
case "": return TRUE;
}
}

enum GenerateCC(float value) {
if (isnan(value)) {
return UN;
} else if (value < 0) {
return LT;
} else if (value == 0) {
return EQ;
} else {
return GT;
}
}

void UpdateDestination(floatVec destination, floatVec result)
{
floatVec merged;
ccVec mergedCC;

// Clamp result components to [0,1] if requested in the instruction.
if (instrSaturate) {
if (result.x < 0) result.x = 0;
else if (result.x > 1) result.x = 1;
if (result.y < 0) result.y = 0;
else if (result.y > 1) result.y = 1;
if (result.z < 0) result.z = 0;
else if (result.z > 1) result.z = 1;
if (result.w < 0) result.w = 0;
else if (result.w > 1) result.w = 1;
}

// Merge the converted result into the destination register, under
// control of the compile- and run-time write masks.
merged = destination;
mergedCC = cc;
if (instrMask.x && TestCC(cc.c***)) {
merged.x = result.x;
if (updatecc) mergedCC.x = GenerateCC(result.x);
}
if (instrMask.y && TestCC(cc.*c**)) {
merged.y = result.y;
if (updatecc) mergedCC.y = GenerateCC(result.y);
}
if (instrMask.z && TestCC(cc.**c*)) {
merged.z = result.z;
if (updatecc) mergedCC.z = GenerateCC(result.z);
}
if (instrMask.w && TestCC(cc.***c)) {
merged.w = result.w;
if (updatecc) mergedCC.w = GenerateCC(result.w);
}

// Write out the new destination register and condition code.
destination = merged;
cc = mergedCC;
}

While this rule describes floating-point results, the same logic
applies to the integer results generated by the ARA, ARL, and ARR
instructions.
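
A runnable model of the destination update can help check the rules against
the three-instruction MOVC example given earlier in this section. The sketch
below is an illustration under stated assumptions, not the specification's
own pseudocode: it models a single condition code register (the caller would
choose CC0 or CC1), tests NaN with math.isnan, and all function and parameter
names are hypothetical.

```python
import math

COMPONENTS = "xyzw"


def generate_cc(value):
    """Map a written component to LT, EQ, GT, or UN (for NaN)."""
    if math.isnan(value):
        return "UN"
    if value < 0:
        return "LT"
    if value == 0:
        return "EQ"   # both -0.0 and +0.0 compare equal to zero
    return "GT"


def cc_passes(rule, field):
    """Evaluate one condition code field against a mask rule."""
    outcomes = {
        "EQ": field == "EQ", "NE": field != "EQ", "LT": field == "LT",
        "GE": field in ("GT", "EQ"), "LE": field in ("LT", "EQ"),
        "GT": field == "GT", "TR": True, "FL": False, "": True,
    }
    return outcomes[rule]


def swizzle(vec, pattern):
    return [vec[COMPONENTS.index(ch)] for ch in pattern]


def clamp01(value):
    if value < 0:
        return 0.0
    if value > 1:
        return 1.0
    return value      # NaN fails both comparisons and passes through


def update_destination(dest, cc, result, write_mask="xyzw", cc_rule="",
                       cc_swizzle="xyzw", update_cc=False, saturate=False):
    """Return the new (destination, condition code) pair after one instruction."""
    if saturate:
        result = [clamp01(v) for v in result]
    tested = swizzle(cc, cc_swizzle)
    new_dest, new_cc = list(dest), list(cc)
    for i, name in enumerate(COMPONENTS):
        if name in write_mask and cc_passes(cc_rule, tested[i]):
            new_dest[i] = result[i]
            if update_cc:
                new_cc[i] = generate_cc(result[i])
    return new_dest, new_cc


# Replay the MOVC example with R1 = (-2, 0, 2, NaN).
r1 = [-2.0, 0.0, 2.0, float("nan")]
r0, cc = [0.0] * 4, ["EQ"] * 4
r0, cc = update_destination(r0, cc, swizzle(r1, "xyzw"), update_cc=True)
r0, cc = update_destination(r0, cc, swizzle(r1, "yzwx"), write_mask="xyz",
                            update_cc=True)
r0, cc = update_destination(r0, cc, swizzle(r1, "zywx"), cc_rule="NE",
                            update_cc=True)
print(r0, cc)
```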

Add to Section 2.14.4.5, Vertex Program Options

Section 2.14.4.5.3, NV_vertex_program3 Program Option

If a vertex program specifies the "NV_vertex_program3" option, the
ARB_vertex_program grammar and execution environment are extended
to take advantage of all the features of the "NV_vertex_program2"
option, plus the following features:

* several new instructions:

* POPA -- pop address register off stack
* PUSHA -- push address register onto stack
* TEX -- texture lookup
* TXB -- texture lookup w/LOD bias
* TXL -- texture lookup w/explicit LOD
* TXP -- projective texture lookup

* address register-relative addressing for vertex texture
coordinate and generic attribute arrays,

* address register-relative addressing for vertex texture
coordinate result array, and

* a second four-component condition code.

Add to Section 2.14.5, Vertex Program Instruction Set

Section 2.14.5.43, POPA: Pop Address Register Stack

The POPA instruction generates an integer result vector by popping
an entry off of the call stack.

if (callStackDepth <= 0) {
terminate vertex program;
} else {
callStackDepth--;
if (callStack[callStackDepth] is an address register) {
iresult = callStack[callStackDepth];
} else {
terminate vertex program;
}
}

In the pseudocode, <callStackDepth> is the current depth of the call
stack and <callStack> is an array holding the call stack.

The vertex program terminates abnormally if it executes a POPA
instruction when the call stack is empty, or when the entry at the
top of the call stack is not an address register pushed by PUSHA.

Section 2.14.5.44, PUSHA: Push Address Register Stack

The PUSHA instruction pushes the address register operand onto the
call stack, which is also used for subroutine calls. The PUSHA
instruction does not generate a result vector.

tmp = AddrVectorLoad(op0);
if (callStackDepth >= MAX_PROGRAM_CALL_STACK_DEPTH_NV) {
terminate vertex program;
} else {
callStack[callStackDepth] = tmp;
callStackDepth++;
}

In the pseudocode, <callStackDepth> is the current depth of the call
stack and <callStack> is an array holding the call stack.

The vertex program terminates abnormally if it executes a PUSHA
instruction when the call stack is full.

Component swizzling is not supported when the operand is loaded.
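
Because PUSHA and POPA share the call stack with subroutine calls, the
interaction between the two kinds of entries can be sketched as below. This
is a simplified model: the depth limit of 4 stands in for
MAX_PROGRAM_CALL_STACK_DEPTH_NV (an assumed value), the cal method is a
hypothetical stand-in for the return address pushed by a subroutine call, and
abnormal termination is represented by a Python exception.

```python
MAX_CALL_STACK_DEPTH = 4  # assumed; query MAX_PROGRAM_CALL_STACK_DEPTH_NV in practice


class ProgramTerminated(Exception):
    """Raised where the text says the vertex program terminates abnormally."""


class CallStack:
    def __init__(self, max_depth=MAX_CALL_STACK_DEPTH):
        self.entries = []
        self.max_depth = max_depth

    def _push(self, kind, value):
        if len(self.entries) >= self.max_depth:
            raise ProgramTerminated("call stack is full")
        self.entries.append((kind, value))

    def pusha(self, address_register):
        """PUSHA: push a copy of the four-component address register."""
        self._push("address", tuple(address_register))

    def cal(self, return_address):
        """Hypothetical model of the entry pushed by a subroutine call."""
        self._push("return", return_address)

    def popa(self):
        """POPA: pop an address register, or terminate on a mismatched entry."""
        if not self.entries:
            raise ProgramTerminated("POPA on an empty call stack")
        kind, value = self.entries.pop()
        if kind != "address":
            raise ProgramTerminated("top of stack was not pushed by PUSHA")
        return value


stack = CallStack()
stack.pusha((1, 2, 3, 4))
print(stack.popa())
```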

Section 2.14.5.45, TEX: Texture Lookup

The TEX instruction uses the single vector operand to perform a
lookup in the specified texture map, yielding a 4-component result
vector containing filtered texel values. The (s,t,r,q) coordinates
used for the texture lookup are (x,y,z,1), where x, y, and z are
components of the vector operand.

tmp = VectorLoad(op0);
result = TextureSample(tmp.x, tmp.y, tmp.z, 1.0, 0.0, unit, target);

where <unit> and <target> are the texture image unit number and
target type, matching the <texImageUnitNum> and <texTargetType>
grammar rules.

The resulting sample is mapped to RGBA as described in Table 3.21,
and the R, G, B, and A values are written to the x, y, z, and w
components, respectively, of the result vector.

Since partial derivatives of the texture coordinates are not defined,
the base LOD value for vertex texture lookups is defined to be
zero. The value of lambda' used in equation 3.16 will be simply
clamp(texobj_bias + texunit_bias).

Section 2.14.5.46, TXB: Texture Lookup (With LOD Bias)

The TXB instruction uses the single vector operand to perform a
lookup in the specified texture map, yielding a 4-component result
vector containing filtered texel values. The (s,t,r,q) coordinates
used for the texture lookup are (x,y,z,1), where x, y, and z are
components of the vector operand. The w component of the operand
is used as an additional LOD bias.

tmp = VectorLoad(op0);
result = TextureSample(tmp.x, tmp.y, tmp.z, 1.0, tmp.w, unit, target);

where <unit> and <target> are the texture image unit number and
target type, matching the <texImageUnitNum> and <texTargetType>
grammar rules.

The resulting sample is mapped to RGBA as described in Table 3.21,
and the R, G, B, and A values are written to the x, y, z, and w
components, respectively, of the result vector.

Since partial derivatives of the texture coordinates are not defined,
the base LOD value for vertex texture lookups is defined to be
zero. The value of lambda' used in equation 3.16 will be simply
clamp(texobj_bias + texunit_bias + tmp.w).

Since the base LOD value is zero, the TXB instruction is completely
equivalent to the TXL instruction, where the w component contains
an explicit base LOD value.

Section 2.14.5.47, TXL: Texture Lookup (With Explicit LOD)

The TXL instruction uses the single vector operand to perform a
lookup in the specified texture map, yielding a 4-component result
vector containing filtered texel values. The (s,t,r,q) coordinates
used for the texture lookup are (x,y,z,1), where x, y, and z are
components of the vector operand. The w component of the operand
is used as the base LOD for the texture lookup.

tmp = VectorLoad(op0);
result = TextureSampleLOD(tmp.x, tmp.y, tmp.z, 1.0, tmp.w, unit, target);

where <unit> and <target> are the texture image unit number and
target type, matching the <texImageUnitNum> and <texTargetType>
grammar rules.

The resulting sample is mapped to RGBA as described in Table 3.21,
and the R, G, B, and A values are written to the x, y, z, and w
components, respectively, of the result vector.

The value of lambda' used in equation 3.16 will be simply tmp.w +
clamp(texobj_bias + texunit_bias), where tmp.w is the base LOD.
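
The lambda' expressions given for the texture instructions can be compared
side by side with a small sketch. It assumes that clamp() limits the bias sum
to the range [-max_bias, +max_bias], where max_bias plays the role of the
implementation's maximum LOD bias; that range and the names clamp_bias and
vertex_lod are assumptions made for illustration.

```python
def clamp_bias(bias, max_bias):
    """Clamp a bias sum to [-max_bias, +max_bias] (assumed clamp range)."""
    return max(-max_bias, min(max_bias, bias))


def vertex_lod(op, w, texobj_bias, texunit_bias, max_bias):
    """Return lambda' for a vertex texture instruction, following the formulas in the text."""
    if op in ("TEX", "TXP"):
        # Base LOD is zero; only the two bias terms contribute.
        return clamp_bias(texobj_bias + texunit_bias, max_bias)
    if op == "TXB":
        # The operand's w component is an additional bias inside the clamp.
        return clamp_bias(texobj_bias + texunit_bias + w, max_bias)
    if op == "TXL":
        # The operand's w component is the base LOD, outside the clamp.
        return w + clamp_bias(texobj_bias + texunit_bias, max_bias)
    raise ValueError("not a texture instruction: " + op)


for op in ("TEX", "TXB", "TXL"):
    print(op, vertex_lod(op, w=3.0, texobj_bias=0.5, texunit_bias=0.25, max_bias=2.0))
```

Under this assumed clamp, the TXB and TXL expressions produce the same value
only while the summed bias stays inside the clamp range.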

Section 2.14.5.48, TXP: Texture Lookup (Projective)

The TXP instruction uses the single vector operand to perform a
lookup in the specified texture map, yielding a 4-component result
vector containing filtered texel values. The (s,t,r,q) coordinates
used for the texture lookup are (x,y,z,w), where x, y, z, and w are
the four components of the vector operand.

tmp = VectorLoad(op0);
result = TextureSample(tmp.x, tmp.y, tmp.z, tmp.w, 0.0, unit, target);

where <unit> and <target> are the texture image unit number and
target type, matching the <texImageUnitNum> and <texTargetType>
grammar rules.

The resulting sample is mapped to RGBA as described in Table 3.21,
and the R, G, B, and A values are written to the x, y, z, and w
components, respectively, of the result vector.

Since partial derivatives of the texture coordinates are not defined,
the base LOD value for vertex texture lookups is defined to be
zero. The value of lambda' used in equation 3.16 will be simply
clamp(texobj_bias + texunit_bias).
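
The difference between TXP and the other texture instructions lies in the q
coordinate. The sketch below builds the (s,t,r,q) tuple for each instruction
as described in the text, then applies a division by q. The division step is
an assumption based on the usual OpenGL treatment of projective texture
coordinates; this section does not spell it out. The function names are
hypothetical.

```python
def lookup_coords(op, operand):
    """Return the (s, t, r, q) coordinates an instruction passes to the lookup."""
    x, y, z, w = operand
    if op == "TXP":
        return (x, y, z, w)
    if op in ("TEX", "TXB", "TXL"):
        return (x, y, z, 1.0)
    raise ValueError("not a texture instruction: " + op)


def project(coords):
    """Divide s, t, and r by q (assumed projective step)."""
    s, t, r, q = coords
    return (s / q, t / q, r / q)


operand = (2.0, 4.0, 6.0, 2.0)
print(project(lookup_coords("TEX", operand)))
print(project(lookup_coords("TXP", operand)))
```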

Additions to Chapter 3 of the OpenGL 1.4 Specification (Rasterization)


    None.

Additions to Chapter 4 of the OpenGL 1.4 Specification (Per-Fragment Operations and the Frame Buffer)


    None.

Additions to Chapter 5 of the OpenGL 1.4 Specification (Special Functions)


    None.

Additions to Chapter 6 of the OpenGL 1.4 Specification (State and State Requests)


    None.

Additions to Appendix A of the OpenGL 1.4 Specification (Invariance)


    None.

Additions to the AGL/GLX/WGL Specifications


    None.

Dependencies on ARB_vertex_program


    ARB_vertex_program is required.

This specification and NV_vertex_program2_option are based on a
modified version of the grammar published in the ARB_vertex_program
specification. This modified grammar includes a few structural
changes to better accommodate new functionality from this and
other extensions, but should be functionally equivalent to the
ARB_vertex_program grammar. See NV_vertex_program2_option for
details on the base grammar.

Dependencies on NV_vertex_program2_option


    NV_vertex_program2_option is required.  

If the NV_vertex_program3 program option is specified, all
the functionality described in both this extension and the
NV_vertex_program2_option specification is available.

Errors


    None.

New State


    None.

Revision History


    None


Page generated on Sun Nov 20 18:40:21 2005