Think In Geek

In geek we trust

# Walk-through flang – Part 7

In previous chapters we saw how the input source was lexed, parsed and semantically analysed and we looked at how the symbols and data types are represented. But we haven’t looked at what happens once the semantic analysis finishes. In this installment we’re going to talk about the AST.

## Abstract Syntax Tree

Fortran has two kinds of statements: non-executable and executable. The former declare properties about entities of the program and we saw in the previous chapter that these entities or symbols are represented in a symbol table. There is usually not much more to do beyond registering and remembering the attributes implied by a non-executable statement.

Executable statements, on the other hand, describe the computation that our Fortran program has to do. So we need a mechanism to represent that computation. At some point we will use this representation to generate a program that does exactly the intended computation by the Fortran source. A common way to represent this is using an AST, or abstract syntax tree.

ASTs are called abstract because they do not represent all the lexical and syntactical details of the source. Instead, they represent the fundamental parts of the language that are going to be relevant for the computation. The flang AST does exactly that.

## Example

Lets consider this very simple function below

```test.f901 2 3 4 5 FUNCTION ADD(A, B) INTEGER :: A, B, ADD   ADD = A + B END```

### Symbol table

In the last chapter we talked about the symbol table but we didn’t consider any specific example neither we attempted to dump any of the symbol tables. Now it is a good opportunity to do this.

The flang driver allows us to pass debug flags to `flang1` using `-Hq,x,y` (and to `flang2` using `-Mq,x,y`). The numbers `x` and `y` are documented in the flang documentation (file `coding.html`). We can dump the symbol table doing `-Hq,5,1`

`\$ flang -c test.f90 -Hq,5,1`

When we request debug information, flang will create a file named `<name>.qbf` for every given `<name>.f90` compiled. In our case `test.qdbf`.

```test.qdbfF90 PARSER begins {flang after parser   add integer entry sptr: 624 hashlk: 0 nmptr: 5015 dtype: 6 scope: 0 lineno: 1 enclfunc: 0 dcld: 1 ccsym: 0 entstd: 0 entnum: 0 endline: 5 symlk: 1 paramct: 2 dpdsc: 2 funcline: 1 bihnum: 0 fval: 627 pure: 0 impure: 0 recur:0 adjarr:0 aftent:0 assumshp:0 private:0 gsame: 0 altname: 0 Parameters: sptr = 625, a sptr = 626, b   a integer variable sptr: 625 hashlk: 0 nmptr: 5019 dtype: 6 scope: 624 lineno: 1 enclfunc: 0 dcld:1 ccsym:0 save:0 ref:0 dinit:0 vol:0 ptrv:0 cvlen:0 address: 0 sc:3(SC_DUMMY) symlk: 1 midnum: 0 socptr: 0 autobj: 0 addrtkn:0 eqv:0 hccsym:0 alloc:0 arg:0 seq:1 nml:0 assn:0 private:0 sdsc: 0 ptroff: 0 descr: 0 altname:0 optarg:0 intent:INOUT   b integer variable sptr: 626 hashlk: 0 nmptr: 5021 dtype: 6 scope: 624 lineno: 1 enclfunc: 0 dcld:1 ccsym:0 save:0 ref:0 dinit:0 vol:0 ptrv:0 cvlen:0 address: 0 sc:3(SC_DUMMY) symlk: 1 midnum: 0 socptr: 0 autobj: 0 addrtkn:0 eqv:0 hccsym:0 alloc:0 arg:0 seq:1 nml:0 assn:0 private:0 sdsc: 0 ptroff: 0 descr: 0 altname:0 optarg:0 intent:INOUT   add integer variable sptr: 627 hashlk: 0 nmptr: 5015 dtype: 6 scope: 624 lineno: 0 enclfunc: 0 dcld:1 ccsym:0 save:0 ref:0 dinit:0 vol:0 ptrv:0 cvlen:0 address: 0 sc:3(SC_DUMMY) symlk: 1 midnum: 0 socptr: 0 autobj: 0 addrtkn:0 eqv:0 hccsym:0 alloc:0 arg:0 seq:1 nml:0 assn:1 private:0 sdsc: 0 ptroff: 0 descr: 0 altname:0 result optarg:0 intent:INOUT ...```

The dump of the table is not super easy to read but basically we see four entries. Each entry starts with the name of the symbol and the kind of symbol. The next lines are several attributes of the symbol. The set of attributes dumped changes for each kind of symbol.

In our example we only have two kinds of symbols: a first `add` which is an entry (which is the way flang names a `FUNCTION`) then `a`, `b` and another `add`. The second `add` is there because it represents the result-name of the `FUNCTION` (had we used RESULT(myname) in the FUNCTION statement we would not have had repeated symbol names). In general repeated names are not a problem (sometimes it happens in Fortran) but it in this case, note that the scope of the variables is 624 which is the id of the symbol. The id of symbols that entail scopes like `FUNCTION`, `SUBROUTINE`, `MODULE` is used to define a scoping relationship. Also note that the variables are marked to have a storage-class of SC_DUMMY. In Fortran parlance a dummy-argument is a formal parameter.

### AST

Ok, now we have seen what flang knows (or at least shows in the dumps) about the symbols. Let’s see the AST it generates. We can get the AST using `-Hq,4,256`. It is possible to combine several `-Hq` flags so we can see at the same time several internal dumps.

`\$ flang -c test.f90 -Hq,5,1 -Hq,4,256`
```test.qdbf... AST Table   null hshlk/std: 0 opt=(0,0) aptr: 1 <null_ast>   constant hshlk/std: 0 type:integer opt=(0,0) aptr: 2 sptr: 609 (0)   constant hshlk/std: 0 type:integer opt=(0,0) aptr: 3 sptr: 611 (1)   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 4 sptr: 1 (..sqrt)   unaryop hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 5 lop : 4 optype:28 ptr0   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 6 sptr: 2 (.sqrt)   unaryop hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 7 lop : 6 optype:28 ptr1   ident hshlk/std: 0 type:character*1 alias: 0 callfg:0 opt=(0,0) aptr: 8 sptr: 3 (.dsqrt)   unaryop hshlk/std: 0 type:character*1 alias: 0 callfg:0 opt=(0,0) aptr: 9 lop : 8 optype:28 ptr0c   constant hshlk/std: 0 type:integer*8 opt=(0,0) aptr: 10 sptr: 610 (0)   constant hshlk/std: 0 type:integer*8 opt=(0,0) aptr: 11 sptr: 612 (1)   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 12 sptr: 625 (a)   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 13 sptr: 626 (b)   binop hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 14 lop : 12 rop: 13 optype:1   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 15 sptr: 627 (add)   assign hshlk/std: 1 type:integer opt=(0,0) aptr: 16 dest: 15 src: 14   end hshlk/std: 2 opt=(0,0) aptr: 17 ...```

For reasons I have not looked into them yet, the AST contains references to a couple of preregistered symbols `.sqrt` and `.dsqrt` (they look like to related to the corresponding Fortran intrinsics) and a couple of constants 0 and 1. Why no other preregistered symbols or constants appear is a bit puzzling to me. That said, the interesting bits of this dump are the following

```test.qdbf... AST Table   ...   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 12 sptr: 625 (a)   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 13 sptr: 626 (b)   binop hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 14 lop : 12 rop: 13 optype:1   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 15 sptr: 627 (add)   assign hshlk/std: 1 type:integer opt=(0,0) aptr: 16 dest: 15 src: 14   end hshlk/std: 2 opt=(0,0) aptr: 17 ...```

An AST is made of nodes. A node represents some computation. Sometimes the computation is almost no computation, like just a constant or a reference to a variable. Some other times the computation is compound of other computations. This is, other nodes. Recall that in line 4 our Fortran program is doing

```4 ADD = A + B```

It is possible to relate that statement to the AST dump above. Let’s see, that statement is in Fortran parlance an assignment statement. Flang represents it using a node for assignments.

```assign hshlk/std: 1 type:integer opt=(0,0) aptr: 16 dest: 15 src: 14```

`assign` is the kind of the node, in this case this node represents an assignment. The `type` of the operation is integer. `aptr` means the `id` of this node (remember that these id, even if just numbers, are unrelated to the id’s used for symbols or data types and the precise id number could be the same). Now we have the operands of this node, which are other computations, i.e. other AST nodes. In this case the destination (`dest`) is encoded in the node 15 and the source (`src`) in the node 14. What are the destination and the source of an assignment? They are the left and right-hand side of the assignment, respectively.

The left-hand side is `ADD` in our Fortran program. Should be encoded in the tree 15.

```ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 15 sptr: 627 (add)```

Indeed, we could check the symbol table above and see that 627 is exactly the result name `add`. So good so far.

The right-hand side of our assignment is `A + B` and it is encoded in the tree 14.

```binop hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 14 lop : 12 rop: 13 optype:1```

This node is a bit more interesting because it is a binary operation (`binop`) (again of type `integer`). The left and right-hand sides of a binary operations are named by flang as `lop` and `rop` respectively. We can check the corresponding AST nodes.

```ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 12 sptr: 625 (a)   ident hshlk/std: 0 type:integer alias: 0 callfg:0 opt=(0,0) aptr: 13 sptr: 626 (b)```

Looks sensible.

## Kinds and attributes of AST

Flang has about 160 ASTs. They are defined in the file `tools/flang1/utils/ast/ast.n`. This file is in a troff-like syntax which is not particularly readable. The form is relatively simple and we can extract information using grep.

```\$ grep -A2 "\.ST" tools/flang1/utils/ast/ast.n
.SI null
This AST type represents the AST at location 0 in the AST table.
Generally, the pointer (index) value of zero will not appear in
--
.SI ident LVAL EXPR
This node is a leaf node in the AST and represents an
identifer which is fully resolved (declared).
--
.SI constant EXPR
.lp
Leaf node representing a constant.
--
.SI label
.lp
Leaf node representing a label.
--
.SI binop EXPR
.lp
Binary Operator node.
--
.SI unaryop EXPR
.lp
Unary Operator node.
...
```

File `tools/flang1/utils/ast/ast.n` is also used to generate a similar one in the build directory named `tools/flang1/utils/ast/ast.out.n`. The latter
includes extra information of the memory layout of the nodes. However the relevant attributes should already be documented in the former. Theoretically it should be possible to generate Sphinx documentation from this file (like it is already done with other similar files) but this does not seem to be implemented yet.

For each `.SM` entry in `ast.n`, look at the following entries `.SI`, `.SE`, `.FL` and `.OV`.

Entry Meaning Accessor
`.SM NAME` Kind of AST node: `BINOP`, `UNOP`, `ID`, etc.. `A_TYPEG(ast)` will return `NAME`.
`A_TYPEP(ast, NAME)` is possible but less common.
`.SI name` Textual description of the node: for debugging only `astb.atypes[A_TYPEG(i)]`
`.FL NAME` Name of a flag (boolean value) of this node. `A_NAMEG(ast)`
`A_NAMEP(ast, bool-value)`
`.SE NAME` Name of an attribute of this AST, usually returning an id of another AST or data type.

`A_NAMEG(ast)`
`A_NAMEP(ast, id)`
`.OV NAME` Used for attributes of integer nature that are not an id of an entity (like a kind or a hash link).

`A_NAMEG(ast)`
`A_NAMEP(ast, value)`

For instance, AST nodes that represent a binary operation have this information, according to `ast.n`.

• Kind of node: `BINOP`
• `CALLFG` (flag). Set if a function reference appears in the left and/or right operand. (for instance in `2.3 + SQRT(1.2)`)
• `DTYPE`. Data type of the result of the operation
• `LOP`. AST pointer to left (first) operand
• `OPTYPE`. Type of operator (see `OP_` macros in `ast.h`)
• `ROP`. AST pointer to right (second) operand
• `ALIAS`. If node evaluates to a constant, this field locates the constant node representing this value

### The gory details

If you are really interested on seeing how an AST is stored, just take a look at `tools/flang1/utils/ast/ast.h` in the build directory. You will see that there is a global variable `astb `which represents the global block that stores all the ASTs. A single AST node is represented using a struct of the type `AST`.

The names of the nodes are stored in `astb.atypes`, indexed by the type of the node (i.e. `A_TYPEG(ast)`).

The field `astb.attr` stores some attributes intended to simplify some checks.

```/* AST attributes: for fast AST checking -- astb.attr is a table indexed * by A_<type> */ #define __A_LVAL 1 #define __A_EXPR 2   #define A_ISLVAL(a) (astb.attr[a]&__A_LVAL) #define A_ISEXPR(a) (astb.attr[a]&__A_EXPR)```

The `astb` global variable also stores other info. We will discuss some of that extra info later in this chapter.

## Creation of an AST

ASTs are created in many places of the compiler but semantic analysis is going to create the initial set that will be used by further analysis and transformations in the compiler.

### Low level creation

To create an AST node we can use the function `new_node`. This is a low level function that will return a new node with the given kind.

```ast.cint new_node(int type) { int nd;   nd = STG_NEXT(astb); if (nd > MAXAST || astb.stg_base == NULL) errfatal(7); A_TYPEP(nd, type); return nd; }```

This function basically allocates a new ast in `astb`. The macro `STG_NEXT` expands to the call `stg_next((STG *)&name.stg_base, 1)`. The function `stg_next` is a bit too long to explain today but basically reserves a number of data and returns the first element allocated, as an integer of course. It is a generic function but as it is applied to `astb` it is used to allocate ASTs.

Due to implementation details elsewhere there is a maximum number of ASTs we can allocate, so we check this. The number is fairly large 0x0400_0000 (and also gives us 5 extra bits in the top of the identifier if we ever need them).

`#define MAXAST 67108864`

Next thing this function does it setting the kind of the node. The kind of the node is called `TYPE` in flang. `A_TYPEP` is the setter we need to use.

### Constructors

While `new_node` will give us a node it is pretty bare. Most of the time our nodes have essential information that we will always need to create them correctly. For instance a binary operation always will have the kind of operation performed, the left hand side, the right hand side and the type of the operation. As such flang provides many helpers that ease creating those
nodes. Their names are of the form mk_*.

Flang seems to distinguish, conventionally, the creation of statement ASTs from other ASTs. So there is a generic `mk_stmt`. The code uses this to signal the AST is actually a statement. As a special element it receives a `DTYPE` to state the data type of the statement. Statements where there is not an actual data computed, like control flow statements, are created with a data type of 0.

```ast.cint mk_stmt(int stmt_type, DTYPE dtype) { int ast;   ast = new_node(stmt_type); if (dtype) A_DTYPEP(ast, dtype); return ast; }```

Some statements have dedicated constuctors like `mk_assn_stmt`, that is used to create an assignment statement. You can see the data type, the destination (the left hand side of the assignment) and the source (the right hand side). `dest` and `source` are ASTs (although not made obvious in the prototype).

```int mk_assn_stmt(int dest, int source, DTYPE dtype) { int ast; ast = mk_stmt(A_ASN, dtype); A_DESTP(ast, dest); A_SRCP(ast, source); return ast; }```

## Statement descriptor

Flang ASTs are going to represent two different things:

• Expressions
• Statements

Expressions will have a type and denote a computation which mostly revolves about evaluating the expression to determine its value. Statements are instead executed, rather than evaluated. Or put differently, they are evaluated just for their side-effects. Because statements have side-effects the order in which we execute them may be important.

Sometimes it does not matter because there is no way to tell. For instance in the example below, it does not matter that `A = 3` is executed after or before the first PRINT statement, we can’t simply tell the order. But we can’t run the second `PRINT` before the first `PRINT` or execute the assignment after the second `PRINT` as that would be an observably different execution of the program.

```A = 3 PRINT *, "HI" PRINT *, A```

Because not all ASTs are statements, flang decouples information that is strictly for statements in another structure called the Statement Descriptor (STD). It is an attribute of an AST that represents a statement and it is accessible via `A_STDP` and `A_STDG` as expected.

The STD data structure is relatively simple. The most relevant attributes are the the following.

Attribute Meaning Accessor
`ast` The AST that is described by this STD. Contains an AST identifier `STD_AST(std)`
`next` The STD that follows this STD. This is a STD identifier `STD_NEXT(std)`
`prev` The STD that precedes this STD. This is a STD identifier `STD_PREV(std)`
`label` The label symbol of this statement if it has one. This is a SPTR `STD_LABEL(std)`
`lineno` The line number of this statement `STD_LINENO(std)`
`findex` This is the file index. It identifies the file that contains the statement. It changes because of `INCLUDE` lines and preprocessor `#include` directives. The identifier is a FIH (File Information Header) that we haven’t discussed yet. `STD_FINDEX(std)`

### Creation of an STD

To create an STD for a an AST the function `mk_std` can be used. This will only link (bidirectionally) the STD and the AST but we still need to place the statement itself.

In the early stages of the compiler, the semantic actions of the parser are responsible for most of the creations of ASTs and STDs in the compiler. Because the creation of statements often happens in the order of the program there is a convenient function `add_stmt` that will do a `mk_std` to the new AST and then place the STD as the last one. Sometimes AST of statements have to be placed before or after other STDs and this can be achieved with helpers like `add_stmt_after` and `add_stmt_before`.

The file `ast.c` contains many other functions useful to manipulate placement of statements. Note that where the code expects an statement it means an identifier of an STD, not an AST.

## Summary

In this chapter we have seen the structure that flang uses to represent the computations performed by our Fortran program.

• The computation is represented using an AST. An AST is a data structure formed by AST nodes.
• There are about 160 different kinds of AST nodes. Each kind represents some different form of computation, ranging from simpler ones like just references to variables or constants to more complex ones like binary operands and statements.
• Each kind of node has different attributes. Each attribute has its own accessors. Some of these attributes are ASTs, SPTRs or DTYPEs.
• Some ASTs represent Fortran statements, where the order is important. The extra information of statements is decoupled from regular AST in a data structure called STD.
• STDs concern about ordering of the fortran statements.

In the next installment of this series we will look at ASTs a bit more, in particular we have not considered statements that change the control flow.

### 2 thoughts on “Walk-through flang – Part 7”

• Rajesh Kumar Singh says:

Hello,

Thanks a lot for publishing this full if knowledge series about flang compiler. I have gone through all your posts on flang and I am looking for your next post on this topic.

Regards
Rajesh

• Roger Ferrer Ibáñez says:

Thanks for the support Rajesh!

This site uses Akismet to reduce spam. Learn how your comment data is processed.