Walk-through flang – Part 7
In previous chapters we saw how the input source was lexed, parsed and semantically analysed and we looked at how the symbols and data types are represented. But we haven't looked at what happens once the semantic analysis finishes. In this installment we're going to talk about the AST.
Abstract Syntax Tree
Fortran has two kinds of statements: non-executable and executable. The former declare properties about entities of the program and we saw in the previous chapter that these entities or symbols are represented in a symbol table. There is usually not much more to do beyond registering and remembering the attributes implied by a non-executable statement.
Executable statements, on the other hand, describe the computation that our Fortran program has to do. So we need a mechanism to represent that computation. At some point we will use this representation to generate a program that does exactly the intended computation by the Fortran source. A common way to represent this is using an AST, or abstract syntax tree.
ASTs are called abstract because they do not represent all the lexical and syntactical details of the source. Instead, they represent the fundamental parts of the language that are going to be relevant for the computation. The flang AST does exactly that.
Example
Lets consider this very simple function below
1
2
3
4
5
FUNCTION ADD(A, B)
INTEGER :: A, B, ADD
ADD = A + B
END
Symbol table
In the last chapter we talked about the symbol table but we didn't consider any specific example neither we attempted to dump any of the symbol tables. Now it is a good opportunity to do this.
The flang driver allows us to pass debug flags to flang1
using -Hq,x,y
(and to flang2
using -Mq,x,y
). The numbers x
and y
are documented in the flang documentation (file coding.html
). We can dump the symbol table doing -Hq,5,1
When we request debug information, flang will create a file named <name>.qbf
for every given <name>.f90
compiled. In our case test.qdbf
.
</code> The dump of the table is not super easy to read but basically we see four entries. Each entry starts with the name of the symbol and the kind of symbol. The next lines are several attributes of the symbol. The set of attributes dumped changes for each kind of symbol.
In our example we only have two kinds of symbols: a first add
which is an entry (which is the way flang names a FUNCTION
) then a
, b
and another add
. The second add
is there because it represents the result-name of the FUNCTION
(had we used RESULT(myname) in the FUNCTION statement we would not have had repeated symbol names). In general repeated names are not a problem (sometimes it happens in Fortran) but it in this case, note that the scope of the variables is 624 which is the id of the symbol. The id of symbols that entail scopes like FUNCTION
, SUBROUTINE
, MODULE
is used to define a scoping relationship. Also note that the variables are marked to have a storage-class of SC_DUMMY. In Fortran parlance a dummy-argument is a formal parameter.
AST
Ok, now we have seen what flang knows (or at least shows in the dumps) about the symbols. Let's see the AST it generates. We can get the AST using -Hq,4,256
. It is possible to combine several -Hq
flags so we can see at the same time several internal dumps.
For reasons I have not looked into them yet, the AST contains references to a couple of preregistered symbols .sqrt
and .dsqrt
(they look like to related to the corresponding Fortran intrinsics) and a couple of constants 0 and 1. Why no other preregistered symbols or constants appear is a bit puzzling to me. That said, the interesting bits of this dump are the following
An AST is made of nodes. A node represents some computation. Sometimes the computation is almost no computation, like just a constant or a reference to a variable. Some other times the computation is compound of other computations. This is, other nodes. Recall that in line 4 our Fortran program is doing
4
ADD = A + B
It is possible to relate that statement to the AST dump above. Let's see, that statement is in Fortran parlance an assignment statement. Flang represents it using a node for assignments.
assign
is the kind of the node, in this case this node represents an assignment. The type
of the operation is integer. aptr
means the id
of this node (remember that these id, even if just numbers, are unrelated to the id's used for symbols or data types and the precise id number could be the same). Now we have the operands of this node, which are other computations, i.e. other AST nodes. In this case the destination (dest
) is encoded in the node 15 and the source (src
) in the node 14. What are the destination and the source of an assignment? They are the left and right-hand side of the assignment, respectively.
The left-hand side is ADD
in our Fortran program. Should be encoded in the tree 15.
Indeed, we could check the symbol table above and see that 627 is exactly the result name add
. So good so far.
The right-hand side of our assignment is A + B
and it is encoded in the tree 14.
This node is a bit more interesting because it is a binary operation (binop
) (again of type integer
). The left and right-hand sides of a binary operations are named by flang as lop
and rop
respectively. We can check the corresponding AST nodes.
Looks sensible.
Kinds and attributes of AST
Flang has about 160 ASTs. They are defined in the file tools/flang1/utils/ast/ast.n
. This file is in a troff-like syntax which is not particularly readable. The form is relatively simple and we can extract information using grep.
File tools/flang1/utils/ast/ast.n
is also used to generate a similar one in the build directory named tools/flang1/utils/ast/ast.out.n
. The latter
includes extra information of the memory layout of the nodes. However the relevant attributes should already be documented in the former. Theoretically it should be possible to generate Sphinx documentation from this file (like it is already done with other similar files) but this does not seem to be implemented yet.
For each .SM
entry in ast.n
, look at the following entries .SI
, .SE
, .FL
and .OV
.
Entry | Meaning | Accessor | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
.SM NAME |
Kind of AST node: BINOP , UNOP , ID , etc.. |
A_TYPEG(ast) will return NAME .A_TYPEP(ast, NAME) is possible but less common. |
|||||||||||||||||||||
.SI name |
Textual description of the node: for debugging only | astb.atypes[A_TYPEG(i)]
</tr>
| |||||||||||||||||||||
.FL NAME |
Name of a flag (boolean value) of this node. | A_NAMEG(ast) A_NAMEP(ast, bool-value) |
|||||||||||||||||||||
.SE NAME |
Name of an attribute of this AST, usually returning an id of another AST or data type. | A_NAMEG(ast) A_NAMEP(ast, id) |
</tr>
|||||||||||||||||||||
.OV NAME |
Used for attributes of integer nature that are not an id of an entity (like a kind or a hash link). | A_NAMEG(ast) A_NAMEP(ast, value) |
</tr>
</tbody>
</table>
Attribute | Meaning | Accessor |
---|---|---|
ast |
The AST that is described by this STD. Contains an AST identifier | STD_AST(std) |
next |
The STD that follows this STD. This is a STD identifier | STD_NEXT(std) |
prev |
The STD that precedes this STD. This is a STD identifier | STD_PREV(std) |
label |
The label symbol of this statement if it has one. This is a SPTR | STD_LABEL(std) |
lineno |
The line number of this statement | STD_LINENO(std) |
findex |
This is the file index. It identifies the file that contains the statement. It changes because of INCLUDE lines and preprocessor #include directives. The identifier is a FIH (File Information Header) that we haven't discussed yet. |
STD_FINDEX(std) |
Creation of an STD
To create an STD for a an AST the function mk_std
can be used. This will only link (bidirectionally) the STD and the AST but we still need to place the statement itself.
In the early stages of the compiler, the semantic actions of the parser are responsible for most of the creations of ASTs and STDs in the compiler. Because the creation of statements often happens in the order of the program there is a convenient function add_stmt
that will do a mk_std
to the new AST and then place the STD as the last one. Sometimes AST of statements have to be placed before or after other STDs and this can be achieved with helpers like add_stmt_after
and add_stmt_before
.
The file ast.c
contains many other functions useful to manipulate placement of statements. Note that where the code expects an statement it means an identifier of an STD, not an AST.
Summary
In this chapter we have seen the structure that flang uses to represent the computations performed by our Fortran program.
- The computation is represented using an AST. An AST is a data structure formed by AST nodes.
- There are about 160 different kinds of AST nodes. Each kind represents some different form of computation, ranging from simpler ones like just references to variables or constants to more complex ones like binary operands and statements.
- Each kind of node has different attributes. Each attribute has its own accessors. Some of these attributes are ASTs, SPTRs or DTYPEs.
- Some ASTs represent Fortran statements, where the order is important. The extra information of statements is decoupled from regular AST in a data structure called STD.
- STDs concern about ordering of the fortran statements.
In the next installment of this series we will look at ASTs a bit more, in particular we have not considered statements that change the control flow.