In the last installment of this series we saw how to verify that the sequence of tokens of the input is syntactically valid. Today we will see what we need to give it meaning.
Semantics concerns to the meaning of the program. This means that our sequence of tokens, once they follow some syntax, have meaning in the context of the programming language. In part 1 we gave a more or less abstract semantics of tiny. Now, as compiler writers, it is up to us to materialize such semantics in an implementation that fulfills it.
If you recall part 2, the final goal of our front end is creating a GENERIC tree and handing it to the rest of the compiler. Let's talk about bit more about GENERIC trees.
GENERIC trees are represented using the type
tree. A tree can be a
NULL_TREE or point to an actual tree. Each tree has a tree code that is specified at the moment of creation. Given a tree we can use the macro
TREE_CODE to get the tree code. Most trees, but not all, have a location that we can obtain using the macro
EXPR_LOC, if it does not have location it will return
Trees are created using macros
build5. The first parameter of each
buildN macro is the tree code and the remaining
N arguments are trees, called the operands. As an alternative
build5_loc can be used instead to create a tree along with a location. The location goes in the first argument and the remaining arguments are the same as in
Despite their name, GENERIC trees do not collectively form a tree but a graph. This happens because it is not an error that a tree appears as the operand of two or more trees.
Each tree of a specific tree code may have associated several attributes. These attributes are accessed using macros. Most of these macros expand in a way that can be used to set the attribute to the tree. So given a tree
t, an attribute can be queried doing
SOME_TREE_PROPERTY(t) and can be set doing
SOME_TREE_PROPERTY(t) = property. These attributes are of different nature, sometimes are other trees, sometimes are boolean values (zero or nonzero), etc.
GENERIC trees are used to represent many aspects of a program but there are three important classes of trees: declarations, expressions and types.
Declarations are used to tell the compiler about the existence of something. Variables go into a tree with code
VAR_DECL. Labels of the program (used for
gotos) go into a
LABEL_DECL. tiny does not have functions explicitly but if we declare a function, it goes into a
FUNCTION_DECL and each of its parameters would be represented using
Expressions represent trees that can be evaluated. There are a lot of tree codes related to expressions that we will see later. One distinguished node,
error_mark_node, will be used as a marker for erroneous trees that may appear during semantic analysis. Given a tree
t, the macro
error_operand_p(t) returns true if
Finally, types represent data types. They are represented as trees because most type systems have a recursive structure that fits well in a graph-like structure like GENERIC. Type trees are heavily reused in GENERIC. In tiny we will need tree types for int, float, boolean and strings. Expressions and declarations have type and it can be accessed using
GENERIC is an intermediate representation that is heavily biased towards a C model of execution (like a relatively high-level assembler). The reason is that GCC was originally a C compiler that later on was extended to support other programming languages. Imperative programming languages, like tiny, fit relatively well in GENERIC. Other programming languages, like functional ones, do not fit so well in GENERIC and a front end for such languages likely uses its own representation that ends being lowered to GENERIC.
Tiny is so simple that we can use GENERIC trees almost directly. Almost, because not all GENERIC trees may have locations so we will pair a tree and a location, to make sure we have a location. Getting the GENERIC tree is, then, as simple as requesting the tree member of the pair. We want to have location in all trees for diagnostic purposes.
In order to ease using GENERIC trees, we will use a
Tree class (mind the uppercase) that will be a very thin wrapper to
A GENERIC tree is actually a pointer, so comparison by identity is possible. For simplicity, let's teach
Tree to do identity comparisons as well.
For convenience we will also wrap the creation of
Trees into a set of
build_tree overloaded functions.
In the definition of tiny we also talked about a stack of mappings from identifiers to values that we collectively called the scope. Note that the mappings in the scope, as defined in the tiny definition, are a dynamic entity so the exact value of the mapping will likely not be known at compile time. That said, the mapping itself must exist. We will represent this mapping in a class called
SymbolMapping. It will map identifiers (i.e. strings) to
SymbolPtrs (later on we will see what is a
As you can see it is a very thin wrapper to a map of strings to Symbol (for this reason sometimes a structure like this is called a symbol table).
SymbolMapping::insert adds a new
Symbol into the map using its name as the key. It also checks that the name is not being added twice: this is not possible in tiny.
SymbolMapping::get returns the mapped
Symbol for the given string. Since it may happen that there is no such mapping this function may return a nul
Scope is, as we said, a stack of
We can manage the current symbol mapping using
Scope::pop_scope(). The former will be used when we need a fresh mapping (as it will happen when handling
Scope::get_current_mapping returns the current mapping (i.e. the one that was created in the last push_scope that has not been popped yet).
Scope::lookup is used to get the last mapping for a given string (or null if there is no such mapping).
We have to traverse the stack from the top (end of the
MapStack) to the bottom (beginning of the
MapStack), so we use a
reverse_iterator for this.
Scope::pop_scope have obvious implementations.
We will use the class
Symbol to represent a named entity of a tiny program. So far the only named entities we have in tiny are variables. Other languages may have types, constants and functions in their set of entities with names.
Symbol class would be used as well for such entities.
There will be a single Symbol object for each named instance, so this class is mostly used by reference. Similar to what we did with tokens in part 3, we will define
const_SymbolPtr as smart pointers. We have already used
SymbolPtr in classes
Tiny is so simple that we only need to keep the name of a symbol (something slightly redundant since GENERIC will have the name somewhere as well) and the associated
VAR_DECL tree. In a language with other kind of symbols we would probably want to keep the kind of the symbol and we would probably store other kind of
gcc-src/gcc/tiny directory now looks like this.
Today we will stop here. We have seen the objects that will be required for the semantic analysis itself. In the next part we will change the parser to generate GENERIC trees that will represent the semantics of our program.
That's all for today.