Control Flow

Background

Traditional Compilers

In most IRs, a function consists of basic blocks (often shortened to just blocks), which are sequences of instructions that always execute linearly from beginning to end, plus the edges that connect them. Each basic block ends with a jump or branch that transfers execution either unconditionally to one basic block or conditionally to two (or sometimes more) blocks. Together, the basic blocks and edges make up the Control Flow Graph, or CFG. For example, this snippet of code:

while (foo) {
    ...
    if (baz) {
        ...
    } else {
        ...
    }
    if (bar) {
        ...
    }
}
...

could be translated to a CFG that looks like:

digraph {

    post [label="..."];

    /* force 'post' to be at the bottom*/
    {rank="sink" post}

    header [label="foo?"];
    block1 [label="...\nbaz?"];
    then [label="...\nbar?"];
    else [label="...\nbar?"];
    block2 [label="...\nfoo?"];

    header -> block1;
    header -> post;
    block1 -> then;
    block1 -> else;
    then -> header;
    else -> header;
    then -> block2;
    else -> block2;
    block2 -> block1;
    block2 -> post;
}

Note that the CFG is a little different from the original code, since it’s been optimized somewhat; for example, we’ve folded the second if into the then and else branches of the first. This is a good thing for traditional, scalar hardware, but as we’ll see, these kinds of optimizations are usually unnecessary and sometimes harmful for GPUs. However, this is the standard model that most of the compiler literature assumes.

GPUs

A unique feature of most GPUs is that they are designed to run many different instances of the same program in lock step, in order to reduce the size of the scheduling hardware by sharing it between many different “cores.” When control flow diverges, meaning that two different instances (fragments, vertices, etc.) branch in different directions, the GPU takes both sides of the branch. For example, if thread 1 and thread 2 are both currently in block A, and thread 1 wants to branch to block B while thread 2 wants to branch to block C, then the GPU first branches to block B with thread 1 enabled and thread 2 disabled. When execution reaches a predefined “merge block,” the GPU jumps back to block C with the set of enabled threads flipped and runs until the merge block is reached again, at which point control flow has re-converged and both thread 1 and thread 2 can be enabled.
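
To make this execution model concrete, here is a small, purely illustrative C sketch (not from the original text, and not how any real GPU is implemented) that simulates two threads running an if statement under an execution mask:

#include <stdbool.h>
#include <stdio.h>

#define NUM_THREADS 2

/* Simulates mask-based execution of:
 *
 *     if (cond) { x = 1; } else { x = 2; }
 *
 * Both sides of the branch are visited; the mask decides which
 * threads actually commit their results.
 */
int main(void)
{
    bool cond[NUM_THREADS] = { true, false };   /* thread 1 vs. thread 2 */
    int x[NUM_THREADS] = { 0, 0 };
    bool enabled[NUM_THREADS];

    /* "then" side: enable only the threads whose condition is true */
    for (int t = 0; t < NUM_THREADS; t++) {
        enabled[t] = cond[t];
        if (enabled[t])
            x[t] = 1;
    }

    /* "else" side: flip the enabled threads and run the other block */
    for (int t = 0; t < NUM_THREADS; t++) {
        enabled[t] = !cond[t];
        if (enabled[t])
            x[t] = 2;
    }

    /* merge block: control flow has re-converged, all threads enabled */
    for (int t = 0; t < NUM_THREADS; t++)
        printf("thread %d: x = %d\n", t, x[t]);

    return 0;
}

Real hardware keeps the mask in dedicated state and runs the lanes in parallel; the point is only that both sides of the branch are visited, with the mask deciding which lanes take effect.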

Although most GPUs do something like this internally, the details of how it works can vary greatly from vendor to vendor and generation to generation. Some GPUs, such as the Mali T6xx series, give each instance a separate program counter and don’t keep track of divergence at all! All Intel chips have instructions such as IF, ELSE, ENDIF, and WHILE, and on all current generations they’re faster than the branch-with-merge instructions described above (although they do a similar thing under the hood). Also, the rules for when control flow re-converges can vary based on the implementation as well as on choices made in the backend.

There’s one place besides modifying control flow where these details matter: cross-channel operations. There are a few operations GPUs can perform in which separate instances (sometimes also called channels or lanes) share information. One important example is taking a screen-space derivative in a fragment shader, where the value of the input is exchanged between the fragments in a 2x2 block. Derivatives can be taken either explicitly, or implicitly in order to calculate the appropriate level of detail when sampling from a texture. Source languages such as GLSL require that these derivatives be taken in uniform control flow, meaning that every channel must be enabled or disabled together, so that the inputs to the derivative are well-defined, and most backends take advantage of this guarantee. Therefore, optimizations in any intermediate IR must respect this invariant, which means the IR must have a good idea of when control flow diverges. For example, in the following snippet:

vec4 color = texture(tex, coordinate * 2.0);
if (foo) {
    /* the only time we ever use 'color' */
    output = color;
}

we can only push the texture fetch into the if statement if foo is uniform, meaning that it takes the same value for every fragment, since the texture fetch takes an implicit derivative.

The NIR Control Flow Model

In order to support many different backends, as well as to maintain the structured control-flow information provided by source languages such as GLSL and required by some backends such as Intel’s, NIR uses a control flow model that explicitly contains structured control flow elements such as loops and if statements. This approach also gives a clear model of when control flow converges and diverges that’s guaranteed to be supported by every GPU.

Nevertheless, there’s still the need to support the vast existing literature that takes basic blocks as fundamental, so NIR includes basic blocks as a primitive as well. Control flow in NIR consists of a control flow tree whose elements are if statements and infinite loops, and whose leaves are basic blocks. In addition, the successors of each block, which are the blocks that a given block branches to, and the predecessors of each block, which are the blocks that branch to a given block, are always kept up-to-date. Finally, there’s the start block, which is the first block in the function, and the end block, which is a fake basic block that return statements point to. Together, the start block, the end block, and the graph created by the successors and predecessors form the control flow graph that complements the control flow tree. For example, this:

void main(void) {
    if (foo)
        return;

    while (bar) {
        if (baz)
            continue;

        ...
    }
}

would become, in NIR:

digraph {
    clusterrank="local";
    subgraph cluster_main {
        style="solid";
        label="main";
        start [label="(start)"];
        start -> then1_block;
        start -> else1_block;

        subgraph cluster_if1 {
            style="filled";
            fillcolor="lightgrey";
            label="if (foo)";

            subgraph cluster_then1 {
                label="then";
                fillcolor="white";

                then1_block [label="return"];
            }

            subgraph cluster_else1 {
                label="else";
                fillcolor="white";

                else1_block [label="(empty)"];
            }
        }

        pre_loop [label="(empty)"];
        else1_block -> pre_loop;
        pre_loop -> loop_header;

        subgraph cluster_loop {
            style="filled";
            fillcolor="lightgrey";
            label="loop";

            loop_header [label="(empty)"];
            then3_block -> loop_header [constraint=false];
            loop_header -> then2_block;
            loop_header -> else2_block;

            subgraph cluster_if2 {
                fillcolor="white";
                label="if (!bar)";

                subgraph cluster_then2 {
                    fillcolor="lightgrey";
                    label="then";

                    then2_block [label="break"];
                }

                subgraph cluster_else2 {
                    fillcolor="lightgrey";
                    label="else";

                    else2_block [label="(empty)"];
                }
            }

            loop_middle_block [label="(empty)"];
            else2_block -> loop_middle_block;
            loop_middle_block -> then3_block;
            loop_middle_block -> else3_block;

            subgraph cluster_if3 {
                fillcolor="white";
                label="if (baz)";

                subgraph cluster_then3 {
                    fillcolor="lightgrey";
                    label="then";

                    then3_block [label="continue"];
                }

                subgraph cluster_else3 {
                    fillcolor="lightgrey";
                    label="else";

                    else3_block [label="(empty)"];
                }
            }

            loop_end_block [label="...", rank="max"];
            else3_block -> loop_end_block;
            loop_end_block -> loop_header [constraint=false];
        }

        post_loop [label="(empty)"];
        then2_block -> post_loop;
        loop_end_block -> post_loop [style="invis"];

        post_loop -> end_block;
        then1_block -> end_block;

        end_block [label="(end)"];
    }
}

where the if statements and loops are represented by boxes and the basic blocks are represented by ovals. One thing that may be initially surprising is that if statements always have at least one empty basic block in the “else” portion; that is, if-then statements are turned into if-then-else statements. This helps optimizations that push operations into if statements, since there could be a statement that only ever executes when the condition is false, and adding the empty block creates a place to which those statements can be moved. At the basic block level, creating the empty block removes a critical edge, which is an edge from a block with more than one successor to a block with more than one predecessor. Consider this if-then statement:

if (foo) {
    bar = 1;
}
...

and its basic block representation:

digraph {
    pre [label="foo?"];
    then [label="bar = 1;"];
    post [label="..."];

    pre -> then;
    pre -> post [color="red"];
    then -> post;
}

The red edge is a critical edge, since it’s one of two edges leaving the first block and one of two edges entering the last block. Before running optimizations like Partial Redundancy Elimination (PRE) and Global Code Motion (GCM), whose aim is to move code into less frequently executed paths, most compilers will split the critical edge by inserting an empty basic block:

digraph {
    pre [label="foo?"];
    then [label="bar = 1;"];
    else [label="(empty)"];
    post [label="..."];

    pre -> then;
    pre -> else;
    then -> post;
    else -> post;
}

However, in basic-block-focused compilers, keeping critical edges split all the time would interfere with other optimizations that aim to reduce the number of jumps that have to be executed. But because NIR keeps control flow structured, those sorts of optimizations are either done very differently or not done at all, and therefore it makes sense to always keep critical edges split. For the same reason, NIR doesn’t have “predicated break” or “predicated continue” instructions, which most GPUs support: they would add critical edges to the CFG and prevent the compiler from placing code so that it executes only when the break or continue is taken. In both cases, it’s easy enough for the backend to remove the extra blocks later if necessary.
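
As a concrete illustration of what the explicit break block buys us (a hypothetical C sketch, not taken from the original text):

/* With an explicit break block, the statement marked below has a home:
 * a basic block that executes only when the early exit is taken.  With a
 * predicated break folded into the preceding block, no such block would
 * exist, so the compiler would have nowhere to sink the statement without
 * executing it on every iteration.
 */
int find_first_negative(const int *data, int count)
{
    int result = -1;
    for (int i = 0; i < count; i++) {
        if (data[i] < 0) {
            result = i;   /* executes only when we break out early */
            break;
        }
    }
    return result;
}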

We’ve now explained why most of the extra empty basic blocks were inserted in the example NIR control flow, but there’s still one left: the empty block between the first if statement and the loop, so that the then and else branches jump to the empty block and then to the first block of the loop instead of jumping directly into the loop. Clearly, it isn’t there to remove a critical edge, so why insert it? Well, imagine that there was a statement in the loop that we determined to be loop-invariant, so that we could move it outside the loop, but that was used inside the loop, so we couldn’t move it after the loop. The empty block before the loop then comes in handy as a place to move it. Just as splitting critical edges helps optimizations such as PRE, inserting these so-called padding blocks before and after each loop helps optimizations that perform Loop-Invariant Code Motion (LICM), including GCM.
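
As a source-level illustration of the kind of motion a padding block makes possible (a hypothetical C sketch, not taken from the original text):

/* Before LICM: 'base * scale' is loop-invariant, but its result is used
 * inside the loop, so it can't be moved after the loop.
 */
int sum_before(const int *data, int count, int base, int scale)
{
    int sum = 0;
    for (int i = 0; i < count; i++) {
        int offset = base * scale;   /* recomputed on every iteration */
        sum += data[i] + offset;
    }
    return sum;
}

/* After LICM: the invariant computation lands in the (previously empty)
 * padding block immediately before the loop.
 */
int sum_after(const int *data, int count, int base, int scale)
{
    int offset = base * scale;       /* now lives in the pre-loop block */
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += data[i] + offset;
    return sum;
}

In NIR, the hoisted computation would land in the padding block that precedes the loop in the control flow list.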

Putting it Together

We can put all the rules we’ve created into a guide for constructing the control flow tree. To do this, we’ll need a few different data types:

  • A control flow node (often shortened to “cf node” and defined as nir_cf_node in nir.h) is the base class for everything in the control flow tree. It can be a loop, an if statement, or a basic block.
  • A control flow list (often shortened to “cf list”) is a list of control flow nodes that corresponds to a series of statements in GLSL. It’s used to represent the bodies of functions and loops, as well as the then and else branches of if statements. In NIR, it’s implemented as an intrusive linked list.
  • An if statement (defined as nir_if) contains a condition as well as a control flow list for each of the then and else branches.
  • A loop (defined as nir_loop) is an infinite loop (the only way to exit is through break statements). It only contains a control flow list representing the body.
  • A basic block, in addition to its previous definition, is now a leaf of the control-flow tree. In NIR, basic blocks are defined in a structure called nir_block.

as well as two rules which, together, cover both the if-then-else and loop-padding situations: a control flow list must begin and end with a basic block, and it must contain exactly one block between each pair of non-block control flow nodes (i.e. loops or if statements). That is, control flow lists must look like:

block
loop/if
block
loop/if
...
loop/if
block

and they must consist of at least one (possibly empty) basic block. Finally, there is a class of instructions called “jump instructions,” defined as nir_jump_instr in nir.h, which is how breaks, continues, and returns are represented in NIR. Note that “multilevel breaks” and “multilevel continues,” i.e. jumping out to a loop other than the innermost one, are currently not supported, although they may be in the future. There can be at most one jump instruction per basic block, and it must be at the end of the block.

If you aren’t sure, you should go and convince yourself that the example NIR control flow given earlier satisfies all these rules, in addition to being free of critical edges.
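
As a rough cross-check, the list-structure rules can also be expressed in code. The following C sketch uses the nir_cf_node type and the nir_cf_node_block enum value from nir.h, but the exec_list walking and field names are illustrative assumptions; consult nir.h for the real definitions:

#include <stdbool.h>
#include "nir.h"

/* Checks that a single control flow list begins and ends with a block,
 * contains at least one block, and alternates between blocks and
 * non-block nodes (ifs and loops).  Assumes nir.h pulls in the exec_list
 * helpers and that nir_cf_node embeds its exec_node in a field named
 * 'node', as in the upstream headers.
 */
static bool
validate_cf_list(struct exec_list *cf_list)
{
    bool prev_was_block = false;
    bool seen_any = false;

    foreach_list_typed(nir_cf_node, cf_node, node, cf_list) {
        bool is_block = (cf_node->type == nir_cf_node_block);

        if (!seen_any && !is_block)
            return false;        /* must begin with a basic block */
        if (seen_any && is_block == prev_was_block)
            return false;        /* blocks and non-blocks must alternate */

        prev_was_block = is_block;
        seen_any = true;
    }

    /* must contain at least one block and must end with a block */
    return seen_any && prev_was_block;
}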

Modifying Control Flow

We’ve seen that there are two complementary ways of describing control flow in NIR, the control flow tree and the control flow graph, which contain redundant information. To ease the burden of keeping both forms up-to-date, core NIR provides a number of helpers for modifying control flow. They allow you to manipulate the program as if it consisted of a series of statements, like in GLSL, while “under the hood” they guarantee that the control flow tree stays in the correct form and that the successors and predecessors of the basic blocks involved are updated. Currently, these functions include:

  • nir_cf_node_insert_before
  • nir_cf_node_insert_after
  • nir_cf_node_insert_begin
  • nir_cf_node_insert_end
  • nir_cf_node_remove

For details see nir.h.
