In the last installment of this series we started to look at the AST and the symbol table by examining the compiler dumps of these two data structures. In this chapter we are going to explore a bit more the AST for the control flow statements.

Statement descriptor

At the end of the previous installment we learnt about the Statement Descriptor (STD). STDs allow us to order the ASTs that represent statements (the ordering is required to make sure we don't alter the semantics of the program). An AST node that represents a statement is linked to an STD and each STD is linked back to that AST. An STD includes extra information about what other STD precede or follow it.

We can dump all the STDs passing -Hq,5,2 to flang. Because the second argument is a bitmask, we can use -Hq,5,3 to print both the symtab and the STD (they will be emitted in this order).

For instance, the function of the last chapter

test.f90
1
2
3
4
5
FUNCTION ADD(A, B)
    INTEGER :: A, B, ADD

    ADD = A + B
END

has the following STD information

std:    1.  lineno:4      label:0    ()  ast:16   
      add = a + b
std:    2.  lineno:5      label:0    ()  ast:17   
      end

The field std is the identifier of the STD (the number of the identifier itself does not convey any order). lineno represents the line of the statement. The field label is used for labeled statements, see later. Finally ast is the AST identifier. The AST identifier can be matched in the AST dump (that we can obtain passing -Hq,4,256 to flang).

...
assign      hshlk/std:    1  type:integer opt=(0,0)
aptr:   16  dest:   15  src:   14

end         hshlk/std:    2 opt=(0,0)
aptr:   17
...

You may be wondering where have the two first statements gone. Well, the truth is that, a compiler only needs to emit code for anything that may have a side-effect (or visible behaviour). The non-executable statements by themselves won't cause any code be emitted (they will impact it, of course): the raw material here are the executable statements.

Equipped with this tools now we can look a bit more with detail some of the control flow statements of Fortran.

Examples of control flow in Fortran

Due to the long story of Fortran, the language has many colourful features that nowadays are regarded as obsolete or just weird. From a point of view of the evolution of programming languages (PL), Fortran is very interesting. Having been there for so long means that we can see the different trends an ideas in PL design impacting the evolution of the language.

If statement

Consider the following function that computes the absolute value of A

test.f90
1
2
3
4
5
6
FUNCTION MY_ABS(A)
    INTEGER :: A

    IF (A < 0) A = -A
    MY_ABS = A
END

The statement in line 4 is called an if-statement and it "gates" the execution of the statement A = -A, to the truth value of the expression A < 0.

std:    1.  lineno:4      label:0    ()  ast:14   
      if (a .lt. 0) then
std:    2.  lineno:4      label:0    ()  ast:16   
      a = -a
std:    3.  lineno:4      label:0    ()  ast:17   
      endif
std:    4.  lineno:5      label:0    ()  ast:19   
      my_abs = a
std:    5.  lineno:6      label:0    ()  ast:20   
      end
constant    hshlk/std:    0  type:integer opt=(0,0)
aptr:    2  sptr:  609 (0)
...
ident       hshlk/std:    0  type:integer  alias:    0  callfg:0 opt=(0,0)
aptr:   12  sptr:  625 (a)

binop       hshlk/std:    0  type:logical  alias:    0  callfg:0 opt=(0,0)
aptr:   13  lop :   12  rop:    2  optype:23

if-then     hshlk/std:    1 opt=(0,0)
aptr:   14  ifexpr:   13

unaryop     hshlk/std:    0  type:integer  alias:    0  callfg:0 opt=(0,0)
aptr:   15  lop :   12  optype:2

assign      hshlk/std:    2  type:integer opt=(0,0)
aptr:   16  dest:   12  src:   15

endif       hshlk/std:    3 opt=(0,0)
aptr:   17

ident       hshlk/std:    0  type:integer  alias:    0  callfg:0 opt=(0,0)
aptr:   18  sptr:  626 (my_abs)

assign      hshlk/std:    4  type:integer opt=(0,0)
aptr:   19  dest:   18  src:   12

end         hshlk/std:    5 opt=(0,0)
aptr:   20

From the two dumps above we can see that the if-statement is represented using an AST (14) of type if-then. The field ifexpr represents the control expression of the if-statement. In this case it is the AST 13 which is a binary operation, binop, (of type 23 representing the "lower than" operation) with expression type logical. The left operand is a reference to the variable a, represented in the AST 12. As we saw in the previous chapter, references to variables are represented with a tree type ident. The right operand is just the constant 0, represented in the AST 2 of type constant.

The conditionally executed statement A = -A is an assignment-statement represented in the AST 16. Note that the destination is the same AST 2 we used in left operand of the AST 13 above: rarely there is a need to repeat references to the same variable.

The AST 17 represents the END IF statement. It may come as a surprising thing to you, but flang is not nesting the control flow structures in any way at this level. So to represent the nesting, flang sees itself forced to create regions which are naturally delimited in Fortran using many kinds of end-statements (in this case a end-if-statement). Because of the parser and semantic checks, these regions will be well-formed and will honour the proper nesting, but nothing in the in-memory representation is explicitly forcing this.

If-then construct

Consider the following function that does an addition or a subtraction depending on a logical parameter.

test.f90
1
2
3
4
5
6
7
8
9
10
FUNCTION ADD_OR_SUB(L, A, B)
    LOGICAL :: L
    INTEGER :: A, B, ADD_OR_SUB

    IF (L) THEN
        ADD_OR_SUB = A + B
    ELSE
        ADD_OR_SUB = A - B
    END IF
END
std:    1.  lineno:5      label:0    ()  ast:13   
      if (l) then
std:    2.  lineno:6      label:0    ()  ast:18   
      add_or_sub = a + b
std:    3.  lineno:7      label:0    ()  ast:19   
      else
std:    4.  lineno:8      label:0    ()  ast:21   
      add_or_sub = a - b
std:    5.  lineno:9      label:0    ()  ast:22   
      endif
std:    6.  lineno:10     label:0    ()  ast:23   
      end
...
if-then     hshlk/std:    1 opt=(0,0)
aptr:   13  ifexpr:   12
...
else        hshlk/std:    3 opt=(0,0)
aptr:   19
...
endif       hshlk/std:    5 opt=(0,0)
aptr:   22

Again we need to delimit the regions corresponding to the "then" and the "else" via specific statements. This actually means that flang lowers the if-statement of the previous section as it would lower an if-construct with a single statement in the then part and no else part.

Let's see what happens a more complex example:

test.f90
1
2
3
4
5
6
7
8
9
10
11
12
13
SUBROUTINE SUB(L1, L2, A, B)
    LOGICAL :: L1, L2
    INTEGER :: A, B
    IF (L1) THEN
        A = 1
        B = 3
    ELSE
        IF (L2) THEN
            A = 2
            B = 4
        END IF
    END IF
END SUBROUTINE SUB

The STDs look like this

std:    1.  lineno:4      label:0    ()  ast:13   
      if (l1) then
std:    2.  lineno:5      label:0    ()  ast:15   
      a = 1
std:    3.  lineno:6      label:0    ()  ast:18   
      b = 3
std:    4.  lineno:7      label:0    ()  ast:19   
      else
std:    5.  lineno:8      label:0    ()  ast:21   
      if (l2) then
std:    6.  lineno:9      label:0    ()  ast:23   
      a = 2
std:    7.  lineno:10     label:0    ()  ast:25   
      b = 4
std:    8.  lineno:11     label:0    ()  ast:26   
      endif
std:    9.  lineno:12     label:0    ()  ast:27   
      endif
std:   10.  lineno:13     label:0    ()  ast:28   
      end

We can tally the control flow STDs with their ASTs. This time I have indented the dump to make the nesting explicit.

...
if-then     hshlk/std:    1 opt=(0,0)
aptr:   13  ifexpr:   12
   ...
else        hshlk/std:    4 opt=(0,0)
aptr:   19
   ...
   if-then     hshlk/std:    5 opt=(0,0)
   aptr:   21  ifexpr:   20
      ...
   endif       hshlk/std:    8 opt=(0,0)
   aptr:   26
endif       hshlk/std:    9 opt=(0,0)
aptr:   27

Do-while-statement

Fortran does not have a dedicated while statement but instead it is a special case of the do-statement. Consider the following program that computes the greater common divisor using the Euclid algorithm.

test.f90
1
2
3
4
5
6
7
8
9
10
11
12
FUNCTION GCD(A, B)
    INTEGER :: A, B, GCD
    INTEGER :: T

    DO WHILE (B /= 0)
        T = B
        B = MOD(A, B)
        A = T
    END DO

    GCD = A
END FUNCTION GCD

Let's dump the STD to see what has happened now:

std:    1.  lineno:5      label:629  (%L99999)  ast:12   
%L99999:
      continue
std:    2.  lineno:5      label:0    ()  ast:15   
      if (b .ne. 0) then
std:    3.  lineno:6      label:0    ()  ast:17   
      t = b
std:    4.  lineno:7      label:0    ()  ast:21   
      b = mod(a,b)
std:    5.  lineno:8      label:0    ()  ast:22   
      a = t
std:    6.  lineno:9      label:0    ()  ast:23   
      goto 99999
std:    7.  lineno:9      label:0    ()  ast:25   
      endif
std:    8.  lineno:11     label:0    ()  ast:27   
      gcd = a
std:    9.  lineno:12     label:0    ()  ast:28   
      end

Take a moment to appreciate what flang did here. It introduced an artificial continue-statement (CONTINUE). The execution of a continue-statement has no effect but flang attached an artificial label %L99999 to it. Now we see that our DO WHILE has been replaced by an if-then-statement. The then part of this if-then-statement executes the body of the DO WHILE. Because this is a loop, flang has introduced a GOTO statement back to the label of the continue-statement that it emitted before the loop body.

So flang has lowered the input above into something like this.

test.f90
1
2
3
4
5
6
7
8
9
10
11
12
13
14
FUNCTION GCD(A, B)
    INTEGER :: A, B, GCD
    INTEGER :: T

    99999: CONTINUE
    IF (B /= 0) THEN
       T = B
       B = MOD(A, B)
       A = T
       GOTO 99999
    END IF

    GCD = A
END FUNCTION GCD

This kind of early lowerings come with some tradeoffs. The upside of lowering early is that the construct can be supported using the existing infrastructure for more general constructs. This way, the rest of the compiler does not need to be taught about the new construct. The downside is that sometimes these constructs can be lowered into executable code in a more efficient way. By lowering to existing, more general, constructs we may lose this ability or we are forced to invest some effort to recover the lost information. Also, tooling that needs to work at the source-level (e.g. refactoring tools) may be hindered by these lowerings as now the AST is not representing with enough fidelity what was in the input, so rewriting based on the AST is not possible or it will be impacted by this kind of lowering. A way to mitigate this problem is to annotate in the lowered construct extra information that can be used by source-level processing tools or the code generation phases of the compiler.

Do construct

One of the most important constructs of Fortran is the do-construct. Along with the assignment statement it is probably one of the oldest statements in Fortran. There are two forms of the do-construct: the non-block do-construct and the block one. The block do-construct is delimited by a do-statement and an end-do-statement.

DO I = 1, N
   ...
END DO

The non-block one is delimited using a first do-statement including a label and any statement with the same label. This was a design sensible in the 60s but it is now regarded as deprecated.

DO 50 I = 1, N
  ...
50 <last-statement-of-body-of-loop-I>

The fun part of non-block constructs is that the label can be shared with more than one loop in a nested way.

DO 50 I = 1, N
  ...
  DO 50 J = 1, M
  ...
50 <last-statement-of-body-of-loop-J-but-also-ends-body-of-loop-I>

Because this syntax is a bit awkward in the current world of structured programming, it is not uncommon to use a labeled CONTINUE statement to as the last-statement of a non-block do-construct (in the unlikely case we're forced to use one).

DO 50 I = 1, N
  ...
50 CONTINUE

The good news are that flang will lower the non-block loops into block ones. Consider the following program that computes the sum of integers 1 to N.

test.f90
1
2
3
4
5
6
7
8
9
10
11
12
FUNCTION SUMMATION(N)
    IMPLICIT NONE
    INTEGER :: SUMMATION
    INTEGER :: N
    INTEGER :: I, S

    S = 0
    DO 50 I = 1, N
    50 S = S + I

    SUMMATION = S
END FUNCTION SUMMATION
std:    1.  lineno:7      label:0    ()  ast:13   
      s = 0
std:    2.  lineno:8      label:0    ()  ast:17   
      do i = 1, n
std:    3.  lineno:9      label:629  (.L00050)  ast:19   
.L00050:
      s = s + i
std:    4.  lineno:9      label:0    ()  ast:20   
      enddo
std:    5.  lineno:11     label:0    ()  ast:22   
      summation = s
std:    6.  lineno:12     label:0    ()  ast:23   
      end

The do-construct is a very high-level construct: a loop like DO var = start, end, stride iterates M times where M = (end - start + stride) / stride (if M is negative or zero no iterations happen). Before starting the first iteration, var will be assigned the value of start, and at the end of each iterationvar is incremented with the value of stride (which if not specified is 1 and it can be a negative value).

Technically in the Fortran standard, DO WHILE and DO belong to the same do-construct with different loop-control parts. It just makes sense, though, to consider them separately here.

"Do forever"

There is a form of the do-construct that may never end. In practice, this is just a dated form of the DO WHILE because any sensible program written in Fortran will eventually terminate: this means there will be some check inside the loop body that will make the loop terminate. We can rewrite the Euclid algorithm above using this form. The EXIT statement causes the termination of the innermost enclosing loop.

test.f90
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
FUNCTION GCD(A, B)
    INTEGER :: A, B, GCD
    INTEGER :: T

    IF (B /= 0) THEN
      DO
          T = B
          B = MOD(A, B)
          A = T
          IF (B /= 0) EXIT
      END DO
    END IF

    GCD = A
END FUNCTION GCD
std:    1.  lineno:5      label:0    ()  ast:14   
      if (b .ne. 0) then
std:    2.  lineno:6      label:629  (%L99999)  ast:15   
%L99999:
      continue
std:    3.  lineno:6      label:0    ()  ast:17   
      if (.true.) then
std:    4.  lineno:7      label:0    ()  ast:19   
      t = b
std:    5.  lineno:8      label:0    ()  ast:23   
      b = mod(a,b)
std:    6.  lineno:9      label:0    ()  ast:24   
      a = t
std:    7.  lineno:10     label:0    ()  ast:25   
      if (b .ne. 0) then
std:    8.  lineno:10     label:0    ()  ast:26   
      goto 99998
std:    9.  lineno:10     label:0    ()  ast:28   
      endif
std:   10.  lineno:11     label:0    ()  ast:29   
      goto 99999
std:   11.  lineno:11     label:0    ()  ast:31   
      endif
std:   12.  lineno:11     label:631  (%L99998)  ast:32   
%L99998:
      continue
std:   13.  lineno:12     label:0    ()  ast:33   
      endif
std:   14.  lineno:14     label:0    ()  ast:35   
      gcd = a
std:   15.  lineno:15     label:0    ()  ast:36   
      end

If we look at the STDs of the code above we see that the EXIT statement is lowered to a GOTO branch to a statement right after the loop (in this case the STD 12). Note also how the DO "forever" is just implemented as if the user had written DO WHILE (.TRUE.) (check STDs 2, 3 and 10).

Curtailing do constructs

It is possible to "curtail" an iteration of a loop using the CYCLE statement. For instance, we use that statement to sum only the positive numbers of a given array.

test.f90
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
FUNCTION POSITIVE_SUMMATION(N, A)
    IMPLICIT NONE
    INTEGER :: POSITIVE_SUMMATION
    INTEGER :: N
    INTEGER :: A(N)
    INTEGER :: S, I

    S = 0
    DO I = 1, N
        IF (A(I) < 0) CYCLE
        S = S + A(I)
    END DO

    POSITIVE_SUMMATION = S
END FUNCTION POSITIVE_SUMMATION

Flang does the obvious and the cycle becomes a GOTO to an artifically-introduced last statement of the do-construct. Again the continue-statement comes handy to add new statements with no effect only to be able to attach a label.

std:    1.  lineno:8      label:0    ()  ast:16   
      s = 0
std:    2.  lineno:9      label:0    ()  ast:18   
      do i = 1, n
std:    3.  lineno:10     label:0    ()  ast:23   
      if (a(i) .lt. 0) then
std:    4.  lineno:10     label:0    ()  ast:24   
      goto 99999
std:    5.  lineno:10     label:0    ()  ast:26   
      endif
std:    6.  lineno:11     label:0    ()  ast:28   
      s = s + a(i)
std:    7.  lineno:11     label:631  (%L99999)  ast:29   
%L99999:
      continue
std:    8.  lineno:12     label:0    ()  ast:30   
      enddo
std:    9.  lineno:14     label:0    ()  ast:32   
      positive_summation = s
std:   10.  lineno:15     label:0    ()  ast:33   
      end

Under the hood

We've seen several examples of how these statements are represented in the AST and its STDs. Let's dive a bit deeper to see when this happens in flang. We have to understand we are at relatively early steps in the compilation pipeline, the parser is checking our syntax and semantics and if it is correct it will build the appropriate trees.

For this post I think it is interesting to look how a DO WHILE is currently lowered. Because this is an involved process, we will focus only on the specific construct itself.

DO WHILE (B /= 0)
   T = B
   B = MOD(A, B)
   A = T
END DO

We are handling semantic actions. We know from previous chapters that they are split in several files depending on the specific construct recognised by the parser. In the case of Fortran executable statements, most of them are found in the file tools/flang1/flang1exe/semant3.c.

We will first encounter a DO token. Fortran allows some constructs to have non-numeric labels. These labels precede the keyword (DO in this case) and are followed by a colon. In this case there isn't any construct-name, so flang just remembers this fact.

tools/flang1/flang1exe/semant3.c
2896
2897
2898
2899
2900
2901
  /*
   *	<do construct> ::= DO |
   */
  case DO_CONSTRUCT1:
    named_construct = 0;
    break;

We mentioned earlier about non-block constructs, for those a label would follow the DO token. In this case no label follows it, so flang just remembers this fact too.

tools/flang1/flang1exe/semant3.c
2887
2888
2889
2890
2891
2892
  /*
   *      <do begin> ::= <do construct>
   */
  case DO_BEGIN2:
    do_label = 0;
    break;

What may follow a <do begin> in the flang parser parlance is called a <loop control> so we will recognise that first. A <loop control> includes several syntaxes but in this case our will be <dowhile> <etmp lp> <expression> ). So we need to first recognise a <dowhile>.

tools/flang1/flang1exe/semant3.c
3108
3109
3110
3111
3112
3113
3114
3115
3116
3117
3118
3119
3120
3121
3122
3123
3124
3125
  /*
   *	<dowhile> ::= <opt comma> WHILE
   */
  case DOWHILE1:
    NEED_DOIF(doif, DI_DOWHILE);
    DI_DO_LABEL(doif) = do_label;
    doinfo = get_doinfo(1);
    doinfo->index_var = 0; /* marks doinfo for a DOWHILE */
    DI_DOINFO(doif) = doinfo;
    DI_NAME(doif) = named_construct;
    if (scn.currlab)
      DI_TOP_LABEL(doif) = scn.currlab;
    else
      scn.currlab = DI_TOP_LABEL(doif) = getlab();
    ast = mk_stmt(A_CONTINUE, 0);
    (void)add_stmt(ast);
    direct_loop_enter();
    break;

Things have suddenly become much more complicated. NEED_DOIF macro makes sure that the doif variable will contain a fresh new nesting information for the do-construct that we are about to handle. It is called doif because flang uses it both for do- and if-constructs. In the existing style of flang, it is just an identifier that will be used in other macros (in this case prefixed with DI_) to associate information to the do-if nesting. And you can see that in the DI_DO_LABEL where we just keep the do_label we noted earlier (there is none in our input so do_label is just 0 here).

Function get_doinfo returns a DOINFO pointer. This is used for the specific bits of the do-construct we have encountered. If you wonder about the 1 argument, this is used as the area to get the memory (flang has 30 memory areas and each one has some specific usage and behaviour, area 1 is used for doinfo, but flang creates DOINFOs in other areas too). Setting doinfo->index_var to zero is used to mean that this do-construct is a DO WHILE. We link then the current doif identifier with the doinfo using DI_DOINFO. We also link the doif with the construct name (which there was none much like the label) using DI_NAME.

Now the code checks if the current statement had a label. Before the DO token and even before the construct-name, this do-statement, like any other Fortran statement, can have a label. If it doesn't, and in our case it doesn't, we just get a "compiler created" (i.e. artificial) label by calling getlab. We then remember it as the top level label of this doif using DI_TOP_LABEL. We also bluntly set the label identifier of the scanner, I'm pretty sure there is an (obscure) reason to do this.

Now we create the CONTINUE statement we saw flang emitted above! The function direct_loop_enter is called to state that we are entering a loop, there is some bookkeeping related to directives (e.g. OpenMP or OpenACC) that is required. This function takes care of it. In our case this function is mostly a no-operation.

Great, we semantically handled a <dowhile>. Now the next item is <etmp lp>. This is a bit of an odd non-terminal that means "expression temporary and left parenthesis". Because this is not strictly related to statements, it is found in semant2.c.

tools/flang1/flang1exe/semant2.c
531
532
533
534
535
536
537
  /*
   *	<etmp lp> ::= (
   */
  case ETMP_LP1:
    sem.use_etmps = TRUE;
    sem.etmp_list = NULL;
    break;

Basically this is bookkeeping used by the parser to mean "we may have to allocate temporary expressions". After that an <expression> will follow, let's ignore that and presume everything works as expected and then a closing parenthesis. At this point we have recognised the <loop control> of a DO WHILE.

tools/flang1/flang1exe/semant3.c
3084
3085
3086
3087
3088
3089
3090
3091
3092
  /*
   *	<loop control> ::= <dowhile> <etmp lp> <expression> ) |
   */
  case LOOP_CONTROL2:
    ast2 = gen_logical_if_expr(RHS(3));
    ast = mk_stmt(A_IFTHEN, 0);
    A_IFEXPRP(ast, ast2);
    SST_ASTP(LHS, ast);
    break;

The function gen_logical_if_expr basically gathers the logical expression found in the parentheses (synthesized during the semantic checking of <expression> above). Now we create the if-then statement using mk_stmt (the second argument is the type which there is none for a control statement like this one, so we just set it to 0). This gives us an incomplete if-then statement node so we set its IFEXPR attribute using the macro A_IFEXPRP with the logical expression. Finally, new statement is now set as the AST attribute of <loop control> itself, this is done by SST_ASTP.

Now a <loop control> has been recognized so we're done with the <control stmt>

tools/flang1/flang1exe/semant3.c
1832
1833
1834
1835
1836
1837
  /*
   *      <control stmt> ::= <do begin> <loop control> |
   */
  case CONTROL_STMT7:
    SST_ASTP(LHS, SST_ASTG(RHS(2)));
    break;

The code above just keeps the AST of <loop control> (in our case an A_IFTHEN) in <control stmt>. These are almost top level-statements from the point of view of the parser, except for a few things.

tools/flang1/flang1exe/semant2.c
1305
1306
1307
1308
1309
1310
1311
  /*
   *      <statement> ::= <nii> <nim> <control stmt> |
   */
  case STATEMENT6:
    prevphase = sem.pgphase;
    SST_ASTP(LHS, SST_ASTG(RHS(3)));
    goto executable_shared;

If you wonder about <nii> and <nim> above, they just mean "not inside interface" and "not inside module" (certainly a DO WHILE can't appear directly in those contexts). This semantic action just forwards the AST in control <control stmt> into the <statement> itself. Finally executable_shared performs common stuff for statements, including adding it to the STD list.

tools/flang1/flang1exe/semant2.c
1554
1555
1556
1557
1558
1559
1560
1561
1562
1563
1564
  executable_shared:
    sem.pgphase = PHASE_EXEC;
    sem.temps_reset = FALSE;
  /* fall thru to 'statement_shared' */

  statement_shared:

    if ((ast = SST_ASTG(LHS))) {
      (void)add_stmt(ast);
      SST_ASTG(LHS) = 0;
    }

Ok, so what has happened is that now we have just added an if-then-statement in the STD. Then the statements making up the body of the DO WHILE will be added too and eventually we will reach the end-do-statement.

tools/flang1/flang1exe/semant3.c
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
  /*
   *      <control stmt> ::= ENDDO <construct name> |
   */
  case CONTROL_STMT9:
  share_do:
    SST_ASTP(LHS, 0);
    if (sem.doif_depth <= 0) {
      error(104, ERR_Severe, gbl.lineno, "- mismatched ENDDO", CNULL);
      (void)add_stmt(mk_stmt(A_ENDDO, 0));
      break;
    }

Here the construct name can be empty (and in our input it is). First we check if doif_depth is not zero: if it is then this is a stray end-do-statement that does not follow any if-then-statement or do-statement.

tools/flang1/flang1exe/semant3.c
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
    doif = sem.doif_depth;
    if (scn.currlab && DI_DO_LABEL(doif) == scn.currlab)
      /*
       * the enddo is labeled and the label matches the do label.
       */
      ;
    else if ((DI_ID(doif) != DI_DO && DI_ID(doif) != DI_DOWHILE &&
              DI_ID(doif) != DI_DOCONCURRENT) || DI_DO_LABEL(doif)) {
      error(104, 3, gbl.lineno, "- mismatched ENDDO", CNULL);
      SST_ASTP(LHS, 0);
      break;
    }
    if (DI_NAME(doif) != named_construct)
      err307("DO [CONCURRENT|WHILE] and ENDDO", DI_NAME(doif),
              named_construct);

Now we check if this end-do-statement actually finishes a do-construct: it could well be that the input is an if-then-statement followed by and end-do-statement. The doif data structure is used for if-then-statement and do-statement, so we need to check if this ends a (plain) DO, a DO WHILE or a DO CONCURRENT. In our case it will be a DO WHILE. Note that in our input DI_DO_LABEL(doif) is going to be zero (and no label can be zero in Fortran) so the first check won't trigger. Note also that the first check may look a bit weak but it is correct because DI_DO_LABEL only concerns to non-block do-constructs (and never to if-then-statements). After this check we verify that if the END DO contains a construct name it matches the one we used in the do-construct, we didn't use any in our input so this branch will be trivially satisfied (both DI_NAME(doif) and named_construct will be 0).

tools/flang1/flang1exe/semant3.c
1889
    do_end(doinfo);

The function do_end does a lot of things which are required upon completion of a do-contruct.

tools/flang1/flang1exe/semutil.c
6000
6001
6002
6003
6004
6005
6006
    case DI_DOWHILE:
      ast = mk_stmt(A_GOTO, 0);
      A_L1P(ast, mk_label(DI_TOP_LABEL(orig_doif)));
      RFCNTI(DI_TOP_LABEL(orig_doif));
      (void)add_stmt(ast);
      (void)add_stmt(mk_stmt(A_ENDIF, 0));
      break;

For the particular case of our DO WHILE, we emit the GOTO statement (A_GOTO) that will branch back to the DI_TOP_LABEL of this do-construct. The macro A_L1P is used to set the label of the GOTO (the attribute is called A_L1 because some statements have more than one label-reference, like the arithmetic if-statement that has three of them). Note also how we increment the reference counter of the top label (haven't verified it but looks like flang will later remove labels that are defined-but-not-used, so we have to prevent that). The GOTO statement is added to the STD. Finally we create an end-if-statement and we also add it to the STD. This way we keep the AST well-formed by nesting everything as expected.

Summary

This has been another long post but we can summarise it in:

  • Some constructs are lowered into sequences of if-construct and goto-statements.
  • Structured programming structures are represented in a flattened way using the natural delimiter statements of Fortran. The in-memory representation does not provide nesting of any form and the proper nesting must be explicitly represented and maintained.
  • The do-construct that is not a DO WHILE is lowered preserving the original loop-control part.

In the next chapter we will go beyond the AST, and related intermediate representation, and we will start looking at what happens after the initial AST has been formed.