A tiny GCC front end – Part 11
Our tiny language features a few types:
string and arrays of those types. We can even declare new type names based on other types but it still missing a record type. Today we will address this.
Before we added arrays to tiny, the value of our variables was simple, atomic, non-structured at all. When we introduced arrays we let a single variable denote several values, but all the values have to be of the same type. We are missing the possibility of representing several values of different types: we need a record type (or tuple type).
Record types are constructed using an ordered set of types. We will need to refer the elements of such set in some way. We could access the field using some index derived from its order (e.g. the first element could be indexed using a zero, the second element using one). Usually, though, the elements of the record are given names, so the name can be used to refer to the element. While both approaches are feasible using names is probably easier for the programmer. In tiny we will use names.
Similar to arrays, we will need special syntax to define a record type. A record type is made of a sequence of pairs of names and types that we will call fields. First let's see how to declare a field.
〈field-declaration〉 → 〈identifier〉
A field declaration has the same syntax as a variable/type declaration but without an initial
type. A keyword is not needed because a field declaration will always appear inside a record type.
Recall that * means the previous element of the language repeated zero or more times
For instance we can declare a variable with a record type made of two floats x and y, like this.
In chapter 10 we introduced type declarations, so we can declare a
point type of record type.
This way we can now declare several points without having to repeat the record type. There is a reason for this that we will discuss below.
We need a way to access the field of a record type inside an expression or in the left hand side of an assignment. We will add a new 〈primary〉 expression.
〈field-access〉 → 〈primary〉
For instance, fields of
p2 can be accessed like this.
We still need to clarify the priority between an 〈array-element〉 and a 〈field-access〉. The following expressions (assuming they are valid given appropiate types for the variable
a and the field
are to be interpreted like
A record type is a type the values of which is the cartesian product of the values that can be represented by the fields of a record type. A record type with n fields named φ0, φ1, …, φn-1 where each field φi has an associated type τi will be able to represent a value (ε0, ε1, …, εn-1) where each εi is a value of the type τi. Given a value of record type, we can select a single value of it, in our case using its name.
Two values of record type are the same only if they come from the same declaration. This means that
p2 below have different types because their record types come from different declarations, even if their sequences of fields are the same.
p2 are the same, because their record type comes from the same declaration.
This kind of type equivalence is called
equivalence by name in contrast to
equivalence by structure. Both have advantages and drawbacks but most programming languages choose the former.
As we discussed in chapter 10, in a type declaration, we cannot use the type being declared in the type part. So this will be invalid.
The name of each field must be unique inside a record. It is possible to have a field with record type.
An expression of the form 〈primary〉
. 〈identifier〉 is a field access. The primary expression must have record type and 〈identifier〉 must be the name of a field of that record type. The type of a field access is the 〈type〉 as the type of the corresponding field declaration. A field access can be used as the left hand side operator of an assignment and can be used as the operand of a
Now that we have a specification of this extension, we can start implementing it.
We need to recognize two new tokens: a new keyword
record and the dot operator. So we add bot to our set of tokens.
Our existing machinery will handle
record, so only the dot must be tokenized. Given the current specification, the dot is relatively simple as long as it is not followed by a number, a
. in the code will be the token
DOT. This restriction makes sense as we want
.1 to be a
FLOAT_LITERAL not a
DOT followed by an
Parse a record type
To parse a record type we first have to be able to parse a field declaration. It is pretty straightforward. GENERIC represents field declarations using a
FIELD_DECL tree which simply has the name of the field and its type. We also have to make sure to mark the field addressable otherwise the
read statement will not work on fields. Note also that we pass a vector of field names so we can diagnose repeated field names (I'm using a vector because the number of fields is usually small and it does not pay to use a more sophisticated data structure).
Now that we can parse a field declaration, let's parse a record type. First lets extend
parse_type so it forwards to
parse_record when it finds the token
Parsing a record type is not particularly complex. Once we have skipped the
record keyword we keep parsing field declarations until we find an
end keyword. A record type in GENERIC is represented using a
RECORD_TYPE tree, so we will have to create first an empty
RECORD_TYPE tree. Field declarations must have their
DECL_CONTEXT set to this
RECORD_TYPE (so they know of which record type they are fields). The set of fields in a
RECORD_TYPE is chained using
TREE_CHAIN. The code simply remembers the first field and the last one so it can chain each field with the previous one. Finally the first field is used to set the
TYPE_FIELDS attribute of the
RECORD_TYPE. At this point we also need to request to GCC to lay out this type. The reason is that a
RECORD_TYPE will have to be represented in memory in a way that can hold all the field values, the function
layout_type makes sure each field gets the appropiate location in the record type.
Parse a field access
Parsing a field access is done by handling the dot as a binary operator with very high priority. So we assign it a high left binding power.
We will use a convenience function
is_record_type with the obvious meaning.
In GENERIC a field access is represented with a tree of kind
COMPONENT_REF, where the first tree is an tree expression of record type and the second tree is a
FIELD_DECL. Parsing a field access involves just checking that the left expression has a record type and the dot is followed by an identifier that is the name of a field of that record type. Recall that the list of fields of a
RECORD_TYPE is available in the
TYPE_FIELDS attribute. We traverse each
FIELD_DECL chaining through
TREE_CHAIN. Like all other declarations in GENERIC, a
FIELD_DECL has a
DECL_NAME which contains an attribute
IDENTIFIER_POINTER where we will find the name of the field. If we do not find a field with the given name, then this is an error, otherwise we create a tree COMPONENT_REF using the left tree (that we checked it is of record type) and the appropiate
Finally we must update
parse_expression_naming_variable because a COMPONENT_REF tree also names a variable. This way we can put it in the left hand side of an assignment or as the operand of a
And we are done. Let's try a simple program.
That's all for today