Instructions

The basic unit of computation in NIR is the instruction. An instruction can be one of the various types listed below. Each instruction type is a derived class of nir_instr. Instructions occur in basic blocks; each basic block consists of a list of instructions which is executed from beginning to end.
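Passes typically walk these lists with NIR's iterator macros. Here's a minimal sketch, assuming a nir_block *block is in scope (the macro's argument order has varied across NIR versions, and the consider_* helpers are hypothetical):

nir_foreach_instr(instr, block) {
   switch (instr->type) {
   case nir_instr_type_alu:
      /* downcast from the base nir_instr to the derived class */
      consider_alu(nir_instr_as_alu(instr));
      break;
   case nir_instr_type_intrinsic:
      consider_intrinsic(nir_instr_as_intrinsic(instr));
      break;
   default:
      break;
   }
}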

ALU Instructions

ALU instructions represent simple operations, such as addition, multiplication, comparison, etc., that take a certain number of arguments and return a result that only depends on the arguments. A good rule of thumb is that only things which can be constant folded should be ALU operations. If it can’t be constant folded, then it should probably be an intrinsic instead.

ALU operations are typeless, meaning that they're only defined to convert a certain bit pattern input to another bit pattern output. intBitsToFloat() and friends are implicit. Boolean true is defined to be ~0 (NIR_TRUE) and false is defined to be 0 (NIR_FALSE).
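Concretely, nir.h defines these boolean constants as plain bit patterns, along these lines:

#define NIR_FALSE 0u
#define NIR_TRUE (~0u)

Defining true as all ones means a boolean can be used directly as a bit mask: iand of a boolean and an arbitrary value yields either that value or zero.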

Each ALU instruction has an opcode, which is a member of an enum (nir_op) that describes what it does as well as how many arguments it takes. Associated with each opcode is an info structure (nir_op_info), which shows how many arguments the opcode takes as well as information such as whether the opcode is commutative (op a b == op b a) or associative ((op (op a b) c) == (op a (op b c))). The info structure for each opcode may be accessed through a global array called nir_op_infos that’s indexed by the opcode.
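For example, a pass that wants to know whether it may swap the two sources of an ALU instruction could consult the table like this (a sketch; alu is a nir_alu_instr *, and swap_alu_sources is a hypothetical helper):

const nir_op_info *info = &nir_op_infos[alu->op];

if (info->num_inputs == 2 &&
    (info->algebraic_properties & NIR_OP_IS_COMMUTATIVE)) {
   /* op a b == op b a, so the sources may be exchanged */
   swap_alu_sources(alu);
}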

Even though ALU operations are typeless, each opcode also has an “ALU type” which can be floating-point, boolean, integer, or unsigned integer. The ALU type mainly helps backends which use the absolute value, negate, and saturate modifiers (normally not used by core NIR): there’s some generic infrastructure in NIR which will fold iabs and ineg operations into integer sources, as well as fabs and fneg for floating-point sources, although most core NIR optimizations will assume that they are kept separate. In addition, if an operation takes a boolean argument, then the argument may be assumed to be either NIR_TRUE or NIR_FALSE, and if an operation’s result has a boolean type, then it may produce only NIR_TRUE or NIR_FALSE.

ALU opcodes also have a notion of size, i.e. the number of components. ALU opcodes are either non-per-component, in which case the destination and each of the arguments are explicitly sized, or per-component. Per-component opcodes have the destination size, as well as at least one of the argument sizes, set to 0. The sources with their size set to 0 are known as the per-component sources. Conceptually, for per-component instructions, the destination is computed by looping over each component and computing some function which depends only on the matching component of the per-component sources, as well as possibly all the components of the non-per-component sources. In pseudocode:

for each component "comp":
    dest.comp = some_func(per_comp_src1.comp, per_comp_src2.comp, ...,
                          non_per_comp_src)
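A dot product is a good example of a non-per-component operation: it produces a single value that depends on all the components of both sources, so every size in its info entry is given explicitly. A sketch of what the entry for a 4-component dot product (nir_op_fdot4) looks like, following the fmul entry shown below (the exact entry in the generated nir_opcodes.c may differ slightly):

{
   .name = "fdot4",
   .num_inputs = 2,
   .output_size = 1,               /* one scalar result */
   .output_type = nir_type_float,
   .input_sizes = {
      4, 4                         /* each source always has 4 components */
   },
   .input_types = {
      nir_type_float, nir_type_float
   },
   .algebraic_properties =
      NIR_OP_IS_COMMUTATIVE
},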

Both the info table entries and the enum values are generated from a Python script called nir_opcodes.py which, when imported, creates an opcodes list which contains objects of the Opcode class. Inside nir_opcodes.py, opcodes are created using the opcode function, which constructs the object and adds it to the list, or using various helper functions which call opcode. For example, the following line in nir_opcodes.py:

binop("fmul", tfloat, commutative + associative, "src0 * src1")

creates a declaration of a nir_op_fmul member of the nir_op enum, which is defined in the generated file nir_opcodes.h, as well as the following entry in the nir_op_infos array (defined in nir_opcodes.c):

{
   .name = "fmul",
   .num_inputs = 2,
   .output_size = 0,
   .output_type = nir_type_float,
   .input_sizes = {
      0, 0
   },
   .input_types = {
      nir_type_float, nir_type_float
   },
   .algebraic_properties =
      NIR_OP_IS_COMMUTATIVE | NIR_OP_IS_ASSOCIATIVE
},

The src0 * src1 part of the definition isn’t just documentation; it’s actually used to generate code that can constant fold the operation. Currently, every ALU operation must have a description of how it should be constant-folded. This makes documenting the operation (including any corner cases) much simpler in most cases, and it obviates the need to deal with per-component and non-per-component subtleties: the pseudocode above is implemented for you, and all you have to do is write some_func. In this case, the definition of fmul also creates the following code in nir_constant_expressions.c:

static nir_const_value
evaluate_fmul(unsigned num_components, nir_const_value *_src)
{
   nir_const_value _dst_val = { { {0, 0, 0, 0} } };

   for (unsigned _i = 0; _i < num_components; _i++) {
      float src0 = _src[0].f[_i];
      float src1 = _src[1].f[_i];

      float dst = src0 * src1;

      _dst_val.f[_i] = dst;
   }

   return _dst_val;
}

as well as the following case in nir_eval_const_opcode:

case nir_op_fmul: {
   return evaluate_fmul(num_components, src);
   break;
}

For more information on the format of the constant expression strings, see the documentation for the Opcode class in nir_opcodes.py.

Intrinsic Instructions

Intrinsics are like the stateful sidekicks to ALU instructions; they mainly include various kinds of loads and stores, as well as execution barriers. Similar to ALU instructions, there is an enum of opcodes (nir_intrinsic_op) as well as a table containing information for each opcode (nir_intrinsic_infos). Intrinsics may or may not have a destination, and they may also include one or more constant indices (integers). Also similar to ALU instructions, both destinations and sources have a size that’s part of the opcode, and both may be made per-component by setting their size to 0, in which case the size is obtained from the num_components field of the instruction. Finally, intrinsics may include one or more variable dereferences, although these are usually lowered away before they reach the driver.
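As with ALU instructions, passes consult the info table through the opcode; a minimal sketch (intrin is a nir_intrinsic_instr *, and process_dest is a hypothetical helper):

const nir_intrinsic_info *info = &nir_intrinsic_infos[intrin->intrinsic];

if (info->has_dest) {
   /* a dest_components of 0 means the destination is per-component, and
    * its size comes from the instruction's num_components field */
   unsigned num_comps = info->dest_components != 0
                      ? info->dest_components
                      : intrin->num_components;
   process_dest(intrin, num_comps);
}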

Unlike ALU instructions, which can be freely reordered and deleted as long as they still produce the same result and satisfy the constraints imposed by SSA form, intrinsics have a few rules regarding how they may be reordered. Currently, these rules are rather conservative, but it’s expected that they’ll get more refined in the future. There are two flags that are part of nir_intrinsic_infos: NIR_INTRINSIC_CAN_REORDER and NIR_INTRINSIC_CAN_DELETE. If an intrinsic can be reordered, then it can be reordered with respect to any other instruction; to prevent two intrinsics from being reordered with respect to each other, neither may have “can reorder” set. If an intrinsic can be deleted, then its only dependencies are on whatever uses its result, and if it’s unused then it can be deleted. For example, if two intrinsic opcodes are for reading and writing a common resource, then the store opcode should have neither flag set, and the load opcode should have only the “can delete” flag set. Note that under this scheme load instructions can’t be reordered with respect to each other, and loads and stores can’t be reordered with respect to other loads/stores even when the resources involved don’t alias; this is a deficiency of the model, which is expected to change when more sophisticated analyses are implemented.
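A pass would check these flags along the following lines (a sketch; note that in upstream nir.h the deletion flag is spelled NIR_INTRINSIC_CAN_ELIMINATE):

static bool
intrinsic_can_reorder(const nir_intrinsic_instr *intrin)
{
   return (nir_intrinsic_infos[intrin->intrinsic].flags &
           NIR_INTRINSIC_CAN_REORDER) != 0;
}

static bool
intrinsic_can_delete(const nir_intrinsic_instr *intrin)
{
   /* only safe when the intrinsic's result is unused */
   return (nir_intrinsic_infos[intrin->intrinsic].flags &
           NIR_INTRINSIC_CAN_DELETE) != 0;
}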

Two especially important intrinsics are load_var and store_var, through which all loads and stores to variables occur. In core NIR, most accesses to variables (besides accesses to textures and buffers) happen through these instructions, although they can be lowered to loads/stores of registers, inputs, outputs, etc. with actual indices before they reach the backend.

Unlike ALU instructions, intrinsics haven’t yet been converted to the new Python way of specifying opcodes. Instead, intrinsic opcodes are defined in a header file, nir_intrinsics.h, which consists of a series of INTRINSIC macro invocations. nir_intrinsics.h is included twice: once in nir.h to create the nir_intrinsic_op enum, and again in nir_intrinsics.c to create the nir_intrinsic_infos array. For example, here’s the definition of the store_var intrinsic:

INTRINSIC(store_var, 1, ARR(0), false, 0, 1, 0, 0)

This says that store_var has one source of size 0 (and thus is per-component), has no destination, one variable, no indices, and no semantic flags (it can’t be reordered and can’t be deleted). It creates the nir_intrinsic_store_var enum member, as well as the corresponding entry in nir_intrinsic_infos.
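Lining the arguments up against the macro’s parameters makes the encoding easier to read (parameter names sketched from nir_intrinsics.h; the exact spellings may differ):

INTRINSIC(store_var, /* name */
          1,         /* num_srcs */
          ARR(0),    /* src_components: 0 means per-component */
          false,     /* has_dest */
          0,         /* dest_components */
          1,         /* num_variables */
          0,         /* num_indices */
          0)         /* flags: neither reorderable nor deletable */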

Call Instructions

Call instructions in NIR are pretty simple. They contain a pointer to the overload that they reference. Arguments are passed through dereferences, which may be copied from, copied to, or both, depending on whether the matching parameter in the overload is an input, an output, or both. In addition, there’s a return dereference (NULL for functions with a void return type) which gets overwritten with the return value of the function.

Jump Instructions

A jump instruction in NIR is a break, a continue, or a return. Returns don’t include a value; functions that return a value instead fill out a specially designated return variable. For more information, see Control Flow.

Texture Instructions

Even though texture instructions could be supported as intrinsics, the vast number of combinations means that doing so is practically impossible. Instead, NIR has a dedicated texture instruction. There’s still an array of sources, except that each source also has a type associated with it. There are various source types, each corresponding to a piece of information that the different texture operations require, and there can be at most one source of each type. In addition, there are several texture operations:

  • nir_texop_tex: normal texture lookup.
  • nir_texop_txb: texture lookup with LOD bias.
  • nir_texop_txl: texture lookup with explicit LOD.
  • nir_texop_txd: texture lookup with partial derivatives.
  • nir_texop_txf: texel fetch with explicit LOD.
  • nir_texop_txf_ms: multisample texture fetch.
  • nir_texop_txs: query texture size.
  • nir_texop_lod: texture LOD query.
  • nir_texop_tg4: texture gather.
  • nir_texop_query_levels: texture levels query.

It’s assumed that frontends will only insert the source types that are needed given the sampler type and the operation.
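Backends therefore search the source array for the types they need; a minimal sketch (tex is a nir_tex_instr *):

/* Returns the index of the source with the given type, or -1 if the
 * frontend didn't supply one. */
static int
find_tex_src(const nir_tex_instr *tex, nir_tex_src_type type)
{
   for (unsigned i = 0; i < tex->num_srcs; i++) {
      if (tex->src[i].src_type == type)
         return i;
   }
   return -1;
}

Upstream NIR grew an equivalent helper, nir_tex_instr_src_index(), for exactly this pattern.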

As with a lot of other resources, there are two ways to represent a sampler in NIR: as a variable dereference, or as an index into a single flat array. When using an index, various information (the sampler’s type, whether it’s a cube or array sampler, etc.) is stored in the texture instruction itself, so that backends which need it still have that information in the lowered form.

Constant-Load Instructions

This instruction creates a constant SSA value. Note that loading a constant directly into a register isn’t supported; instead, use a constant-load instruction followed by a move to the register.
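A minimal sketch of materializing a scalar constant and inserting it before an existing instruction (assuming the early constructor that takes just a shader and a component count; later versions also take a bit size, and some_instr stands in for an existing instruction):

nir_load_const_instr *load = nir_load_const_instr_create(shader, 1);
load->value.f[0] = 1.0f;  /* just the bit pattern 0x3f800000; ops are typeless */
nir_instr_insert_before(&some_instr->instr, &load->instr);
/* the new SSA value is load->def */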

Undef Instructions

Creates an undefined SSA value. At each use of the value, each of the bits may be assumed to be whatever the implementation or optimization passes deem convenient. It’s similar in semantics to a register that’s read before it’s written.
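Creating one looks much like a constant load (again assuming the early constructor signature):

nir_ssa_undef_instr *undef = nir_ssa_undef_instr_create(shader, 1);
/* undef->def is the resulting SSA value; its bits may be anything */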

Phi Instructions

From Instructions.h in LLVM:

// PHINode - The PHINode class is used to represent the magical mystical PHI
// node, that can not exist in nature, but can be synthesized in a computer
// scientist's overactive imagination.

Phi nodes contain a list of sources matched to predecessor blocks; there must be exactly one source for each predecessor block. Conceptually, when a given predecessor block branches to the block with the phi node, the source corresponding to that predecessor block is copied to the destination of the phi node. If there’s more than one phi node in a block, then this process happens in parallel. Phi nodes must be at the beginning of a block, i.e. each block must consist of zero or more phi instructions followed by any non-phi instructions.
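The parallel-copy semantics matter when phis read each other’s destinations. For example, a loop that swaps two values on every iteration looks roughly like this in NIR’s textual form (names and formatting approximate); because both copies happen simultaneously, the second phi’s back-edge source reads the old value of ssa_5, not the one just written:

block block_2:
   vec1 ssa_5 = phi block_1: ssa_1, block_3: ssa_6
   vec1 ssa_6 = phi block_1: ssa_2, block_3: ssa_5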

Parallel Copy Instructions

Copies a list of registers or SSA values to another list of registers or SSA values in parallel. Only used internally by the from-SSA pass.