In the last installment we saw how flang splits the input into tokens. Once the tokens have been identified, we need to parse them.

Top-level parser routines

The top-level parser routine in flang1 is called, not unexpectedly, parser. It is invoked by main, once per program unit in the input file, right after all the initialization has happened.

Parsing, as we mentioned in the last chapter, happens in two passes. The first one gets the tokens from the user code and, as a side effect, generates an intermediate file. This intermediate file is stored in the global variable astb.astfil. Contrary to what the name suggests, it does not contain an AST per se but a simplified rendition of the tokens found in the user code. The second pass gets the tokens from this intermediate file.

The first pass is identified by the global variable sem.which_pass being set to 0. After a few more initialization steps, parser calls _parser, where all the interesting stuff happens.

Function _parser is basically a big loop that parses one statement at a time. In the code below LOOP is just a macro that expands to while(TRUE). It starts by reading the first token of the statement. Recall that get_token will either tokenize (scan) the user code or get the token from the intermediate file, depending on which parsing pass we are in (barring errors, we can expect the two passes to obtain the same sequence of tokens).

tools/flang1/flang1exe/parser.c
 static void
 _parser(void)
 {
   /* ... omitted lines ... */
   LOOP
   {
     parse_init();
     /* loop once for each token in current Fortran stmt: */
     tkntyp = get_token(&ctknval); /* first token of a statement */
     /* ... omitted lines ... */
   } /* end foreach statement LOOP */
 }

Parsing is commonly used to mean language recognition that also associates some actions, usually called semantic actions, with the parts of the language as they are recognized. If our recognizer could only say whether the input is a valid Fortran program, it would be of little use for a compiler. We need to be able to react to the parts of the language because the designers have assigned meaning (semantics) to them. And this is what the Fortran parser has to do as well.

The grammar

The syntax of programming languages can be described using a grammar, and Fortran is no exception. A grammar in this context is basically a set of rules (called productions) of the form Variable ::= Sequence-of-Tokens-and-Variables. For instance, we can define the form of an assignment-statement in Fortran using the following rule.

<assignment> ::= <var ref> = <expression>

In the rule above for <assignment>, = is just a token, while <var ref> and <expression> are references to other rules.

At this point we can unveil where the Fortran grammar used by flang is defined: the file flang/tools/flang1/utils/prstab/gram.txt. This file does not define a grammar for the whole Fortran language; it does so at the level of a statement. That is, the grammar faithfully describes the form of each statement but does not describe the overall form of the program unit or constructs composed of several statements (like an if-construct or a do-construct). The syntax of those will have to be handled elsewhere.

The top-level rule of the grammar is <stmt>, because as stated above the grammar only describes the syntax of a single statement.

flang/tools/flang1/utils/prstab/gram.txt
<stmt> ::= <stbeg> <statement> <stend>

<stbeg> ::=

<stend> ::=

<statement> ::= <prog title> |
                <nii> <nim> <entry statement> |
                <declaration> |
                <nii> <nim> <simple stmt> |
                <nii> <nim> <GOTO stmt> |
                <nii> <nim> <control stmt> |
                <nii> <nim> <format stmt> |
                <null stmt> |
                <end> <end stmt> |
                <empty file> |
                INCLUDE <quoted string> |
                <nii> <nim> OPTIONS |
                <nis> <nii> CONTAINS |
                <directive>

# ... many more lines omitted ...

As you can see, a rule may have several variants or alternatives. These alternatives are numbered starting from one; that is, <statement> ::= <prog title> is alternative 1 and <nii> <nim> <entry statement> is alternative 2. This numbering scheme will be relevant when we talk about semantic analysis later.

You may be wondering how the tokens used in the grammar (like CONTAINS above) are related to the internal tokens used by the scanner (i.e. TK_CONTAINS). The relationship is defined in the file flang/tools/flang1/utils/prstab/gram.tki.

flang/tools/flang1/utils/prstab/gram.tki
END   TK_EOL
<END stmt> TK_ENDSTMT
<empty file> TK_EMPTYFILE
<id name>  TK_IDENT
<letter>   TK_LETTER
<integer>  TK_ICON
<int kind const>  TK_K_ICON
<real>     TK_RCON
<double>   TK_DCON
<fmtstr>   TK_FMTSTR
<quad>   TK_QCON
<complex>  TK_CCON
# ... a few lines omitted ...
'//' TK_CONCAT
'(/' TK_ACB
'/)' TK_ACE
(      TK_LPAREN
)      TK_RPAREN
+      TK_PLUS
-      TK_MINUS
*      TK_STAR

# ... many more lines omitted ...

A few files are generated at build time from these two files via the tool prstab, found in tools/flang1/utils/prstab (we are going to ignore that tool because it is written in a style that makes it frankly unfathomable; we may go through it some day, but not in the near future).

  • gramtk.h is a bunch of defines for the tokens used by the scanner.
  • gramtk.h
    #define TK_EOL                                 321
    #define TK_ENDSTMT                              38
    #define TK_EMPTYFILE                            52
    #define TK_IDENT                                56
    #define TK_LETTER                               63
    // ... and many more ...
  • gramnt.h is a bunch of defines for the rules of the grammar. Here nt means non-terminal, as tokens are often called terminals (they cannot generate anything). These identifiers are not used directly by the code, but the parser, as we will see below, needs to represent them internally.
  • gramsm.h is a bunch of defines for the different alternatives of the rules of the grammar; as mentioned above, the different alternatives of a rule are numbered starting from 1.
  • gramsm.h
    #define SYSTEM_GOAL_SYMBOL1                      0
    #define STMT1                                    1
    #define STBEG1                                   2
    #define STEND1                                   3
    #define STATEMENT1                               4
    #define STATEMENT2                               5
    #define STATEMENT3                               6
    #define STATEMENT4                               7
    #define STATEMENT5                               8
    #define STATEMENT6                               9
    #define STATEMENT7                              10
    #define STATEMENT8                              11
    #define STATEMENT9                              12
    #define STATEMENT10                             13
    #define STATEMENT11                             14
    #define STATEMENT12                             15
    #define STATEMENT13                             16
    #define STATEMENT14                             17
    #define III1                                    18
    // ... and many more ...
  • gramdf.h is a set of tables used by the parsing algorithm that we will discuss below.

Parser algorithm: LR(1)

As I mentioned above, the parser uses a grammar to parse each statement. In computer science, grammars are generative devices, i.e. they define the set of words that ultimately form the language. What we really want, though, is, given a grammar that generates Fortran statements, an algorithm that recognizes Fortran statements. And not only that: we also want the recognizer to allow us to associate semantic actions to the different parts of the statement (i.e. to parse properly).
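
For instance, with the <assignment> rule from before, plus the simplified rules <var ref> ::= A and <expression> ::= 1, the grammar generates the statement A = 1 by repeatedly replacing a rule with one of its alternatives:

<assignment> ⇒ <var ref> = <expression>
             ⇒ A = <expression>
             ⇒ A = 1

A parser has to reconstruct this derivation from the tokens alone.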

There is an abundant literature on algorithms that, given a grammar, can in many cases derive a recognizer for the language the grammar generates. In the particular case of flang, the algorithm used is a bottom-up parser. In a bottom-up parser we read tokens but do not immediately react to them until we identify that they form a part of the language (this step is called a reduction). For instance, suppose we see the following sequence of tokens (end-of-statement is represented with ↲).

A = 1↲

we know that after the token ↲ we have seen an assignment-statement and we want to react accordingly. A compiler will typically create some internal representation meaning assign 1 to the variable A (how it does this is not important yet). Recognizing this fact means that we can reduce A = 1 to <assignment>.

The rule <assignment> used in the examples is not exactly the one used by flang. The real one includes a couple more elements that I omitted for the sake of the explanation.

This reduction process happens as the input is read. Advancing through the input is called a shift (which explains why bottom-up parsers are often called shift/reduce parsers). In the example below ↓ marks the current position in the input, which starts at the beginning of the statement and ends after the end-of-statement. I am using a simplified form of what really happens in the flang grammar, but it should be enough to understand the process.

↓A=1↲                             [shift]
A↓=1↲                             [reduce: <var ref> ::= A]
<var ref>↓=1↲                     [shift]
<var ref>=↓1↲                     [shift]
<var ref>=1↓↲                     [reduce: <integer constant> ::= 1]
<var ref>=<integer constant>↓↲    [reduce: <expression> ::= <integer constant>]
<var ref>=<expression>↓↲          [reduce: <assignment> ::= <var ref> = <expression>]
<assignment>↓↲                    [shift]
<assignment>↲↓                    [end of parsing]

The key in a bottom-up algorithm is knowing whether to shift or to reduce and, when reducing, which rule of the grammar to use. Once a reduction has been chosen, all the elements of the right hand side of the rule (recall it is a sequence of tokens and rules) will appear immediately before (to the left of) the ↓, so we just replace them all with the left hand side of the rule used for the reduction. The exact details are beyond the scope of this post, but suffice it to say for now that the algorithm used by flang is LR(1) (you may have heard of LALR(1) because of yacc/bison; LR(1) is just a more general version of it).
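
To make the shift/reduce mechanics concrete, below is a minimal, self-contained LR(1) driver in C for the simplified assignment grammar above (merging <integer constant> and <expression> into a single rule). Everything here is hand-built for illustration: the table encoding, state numbers and names are my own inventions, not flang's; flang's tables are generated by prstab, as we will see.

#include <stdio.h>

/* Toy grammar:
 *   rule 1: <assignment> ::= <var ref> = <expression>   (length 3)
 *   rule 2: <var ref>    ::= id                         (length 1)
 *   rule 3: <expression> ::= int                        (length 1)
 */
enum { TK_ID, TK_EQ, TK_INT, TK_EOL, NTOKENS };   /* terminals */
enum { NT_ASSIGN, NT_VARREF, NT_EXPR, NNONTERM }; /* rules (non-terminals) */

#define SHIFT(n) (n)     /* positive entry: shift and go to state n */
#define REDUCE(n) (-(n)) /* negative entry: reduce by rule n */
#define ACCEPT 100
#define ERROR 0

/* action[state][token]: hand-built LR(1) action table */
static const int action[7][NTOKENS] = {
  /*           id         =          int        eol       */
  /* 0 */ { SHIFT(1),  ERROR,     ERROR,     ERROR     },
  /* 1 */ { ERROR,     REDUCE(2), ERROR,     ERROR     },
  /* 2 */ { ERROR,     SHIFT(4),  ERROR,     ERROR     },
  /* 3 */ { ERROR,     ERROR,     ERROR,     ACCEPT    },
  /* 4 */ { ERROR,     ERROR,     SHIFT(5),  ERROR     },
  /* 5 */ { ERROR,     ERROR,     ERROR,     REDUCE(3) },
  /* 6 */ { ERROR,     ERROR,     ERROR,     REDUCE(1) },
};

/* goto_tab[state][rule]: state reached after "reading" a whole rule */
static const int goto_tab[7][NNONTERM] = {
  /*        ASSIGN VARREF EXPR */
  /* 0 */ { 3,     2,     0 },
  /* 1 */ { 0,     0,     0 },
  /* 2 */ { 0,     0,     0 },
  /* 3 */ { 0,     0,     0 },
  /* 4 */ { 0,     0,     6 },
  /* 5 */ { 0,     0,     0 },
  /* 6 */ { 0,     0,     0 },
};

static const int len[4] = { 0, 3, 1, 1 };  /* right hand side length per rule */
static const int lhs[4] = { 0, NT_ASSIGN, NT_VARREF, NT_EXPR }; /* lhs per rule */

int main(void)
{
  int input[] = { TK_ID, TK_EQ, TK_INT, TK_EOL }; /* A = 1 <end-of-statement> */
  int pstack[32]; /* stack of automaton states */
  int top = 0, pos = 0;
  pstack[0] = 0;  /* initial state */
  for (;;) {
    int act = action[pstack[top]][input[pos]];
    if (act == ACCEPT) {
      puts("accepted");
      return 0;
    } else if (act > 0) {    /* shift: push the new state, consume the token */
      pstack[++top] = act;
      pos++;
    } else if (act < 0) {    /* reduce: pop the rhs, then goto on the lhs */
      int r = -act;
      top -= len[r];
      pstack[top + 1] = goto_tab[pstack[top]][lhs[r]];
      top++;
      printf("reduce by rule %d\n", r);
    } else {
      puts("syntax error");
      return 1;
    }
  }
}

If you compile and run it, it prints the reductions in the same order as the trace above, followed by accepted. The tables len and lhs here play the same role as the flang tables of the same name that we will meet below.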

So _parser uses the LR(1) algorithm to parse each statement. We can now flesh out the function _parser a bit more.

tools/flang1/flang1exe/parser.c
 static void
 _parser(void)
 {
   /* ... omitted lines ... */
   LOOP
   {
     parse_init();
     /* loop once for each token in current Fortran stmt: */
     tkntyp = get_token(&ctknval); /* first token of a statement */
     /* ... omitted lines ... */
     LOOP
     {

        /* Determine if there is a reduction to be done otherwise just shift */
    
       tkntyp = get_token(&ctknval); /* next token in the statement */  
     } /* end foreach token LOOP */  
   } /* end foreach statement LOOP */
 }

The complicated part above is determining whether there is a reduction to be done. To do this the LR(1) algorithm uses a finite automaton that represents all the possible things that can appear before the ↓. The current state is kept in the variable cstate. The initial state for parsing is state 1, and it is reset for every statement in the function parse_init (along with other data used by the LR(1) algorithm itself).

Note that a rule with alternatives like <rule> ::= A | B | C | … actually implies several possible reductions: <rule> ::= A, <rule> ::= B, <rule> ::= C, etc. So the set of all possible reductions in the parser is the set of all the alternatives of all the rules, encoded in the defines of gramsm.h above (e.g. STATEMENT1, STATEMENT2, etc.).

In each state of the LR(1) automaton there is a number of potential reductions (I repeat it here: each reduction is one rule alternative of the grammar) that can be done. The table fred tells us the set of those reductions: it is a table of pairs (start, end) that delimit the potential reductions associated with the current state. The 1 in LR(1) means that we use one token of look ahead to know exactly which of the reductions is to be done.

tools/flang1/flang1exe/parser.c
start = fred[cstate];       
end = fred[cstate + 1] - 1; 
if (start > end)            
  break; /* no reduction */ 
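
To make the indexing concrete, here is a hypothetical layout (all the numbers are invented for illustration; they are not flang's actual table contents):

/* Suppose state 7 has two potential reductions and state 8 has none:
 *
 *   fred[7] = 40,  fred[8] = 42,  fred[9] = 42
 *
 * For cstate == 7: start = 40, end = 41, i.e. the candidate reductions
 * are the ones described by entries 40 and 41 (of the nset table, below).
 * For cstate == 8: start = 42, end = 41, so start > end: no reduction.
 */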

Now the parser needs to check each reduction against the current token (in tkntyp), so it loops over each possible reduction.

tools/flang1/flang1exe/parser.c
for (i = start; i <= end; i++) { // for each reduction
  jstart = lset[nset[i]];        
  jend = lset[nset[i] + 1] - 1;

The table nset maps a reduction alternative to a look ahead set. I'm not exactly sure, but I think this indirection allows different reduction alternatives to share the same set of look ahead tokens (it looks plausible that different reductions in different states end up using the same look ahead set). With that look ahead set we can query the lset table. This is again a table of pairs (like fred above), indexed by the look ahead set, that denotes a range of tokens in the table of look ahead tokens ls (more on this table below). The code then does a binary search for the token tkntyp in the look ahead set of the current reduction.

tools/flang1/flang1exe/parser.c
 while (jstart <= jend) {  // for each token    
   ptr = (jstart + jend) >> 1; 
   t = ls[ptr];                
   if (t == tkntyp)            
     goto perform_reduction;   
   if (t < tkntyp)             
     jstart = ptr + 1;         
   else                        
     jend = ptr - 1;           
 } // for each token
} // for each reduction
break; /* no reduction found */  

The code above uses the table ls, which is a sequence of sorted sequences of token identifiers. The pair in lset points us to the start and end of one of those sorted sequences, and that range is used for the binary search itself. If the token is found, the parser jumps to perform_reduction. Otherwise it tries the remaining reductions, and if no reduction is found it proceeds to do a shift.
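
Continuing the invented numbers from the fred example above, the whole chain of lookups for cstate == 7 would be:

/* Candidate reductions: entries i = 40 and 41.  Suppose nset[40] == 5,
 * i.e. entry 40 uses look ahead set number 5, and
 *
 *   lset[5] = 130,  lset[6] = 134
 *
 * Then jstart = 130 and jend = 133: ls[130..133] is one sorted run of
 * token identifiers, and the binary search decides whether tkntyp is
 * among them, i.e. whether reduction 40 is the one to perform.
 */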

A non-obvious precondition here is that the grammar fulfills the requirements of LR(1), which means that in a given state it is never ambiguous whether to shift or to reduce, nor which of two different reductions to apply. This is why the code can eagerly assume that the first reduction found is the right one to execute. Conversely, if no reduction is possible the only thing left to do is shift.

Perform a reduction

Performing a reduction entails replacing a number of elements before ↓ (the right hand side of the reduced rule) with its left hand side. So we keep in rednum the reduction identifier (the look ahead is not relevant anymore here). The table prod maps the current reduction to a rule alternative as defined above in gramsm.h (e.g. STATEMENT1, STATEMENT2, etc.). We also need to keep the current token in a global variable to communicate with other parts of the code that we will see below.

tools/flang1/flang1exe/parser.c
perform_reduction:    
  rednum = prod[i];   
  sem.tkntyp = tkntyp;

One of the nice things about the LR(1) algorithm is that a simple stack suffices to represent the sequence of elements before ↓: shifts push new values onto the stack and a reduction pops zero or more values from it. flang1 actually uses two stacks (both with the same size): pstack, where the states of the LR(1) automaton are stored, and sst, where the semantic values are stored (more on this later). The variable stktop tracks the top of the two stacks. The table len gives the number of elements on the right hand side of each rule alternative, i.e. the length of the current reduction. For the example above, going from <var ref> = <expression>↓ to <assignment>↓, the length of the <assignment> reduction is 3.

tools/flang1/flang1exe/parser.c
newtop = stktop - len[rednum] + 1;
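
For instance, suppose the entries for <var ref>, = and <expression> sit in stack slots 5, 6 and 7, so that stktop is 7. With len[rednum] equal to 3, newtop is 7 - 3 + 1 = 5: the slot of the first right hand side element, which is exactly where the single entry for <assignment> will end up.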

Now that we have identified some part of the language, a semantic action can be run.

tools/flang1/flang1exe/parser.c
if (rednum < SEM2)                                 
  p_semant1[sem.which_pass](rednum, &sst[newtop]); 
else if (rednum < SEM3)                            
  p_semant2[sem.which_pass](rednum, &sst[newtop]); 
else if (rednum < SEM4)                            
  p_semant3[sem.which_pass](rednum, &sst[newtop]); 
else if (rednum < SEM5)                            
  p_semantio[sem.which_pass](rednum, &sst[newtop]);
else                                               
  p_semsmp[sem.which_pass](rednum, &sst[newtop]);  

Flang classifies the reductions into 5 sets depending on their number. Each set is delimited in the grammar via the special separator .B (which possibly means block). Variables p_semant1, p_semant2, p_semant3, p_semantio and p_semsmp are arrays of two function pointers, one per parsing pass (first or second). Each pointer points to a semantic function that is given the rule alternative and the new top of the stack (before the rule of the reduction is pushed onto it).

After the semantic action has been executed (which is not strictly part of the LR(1) algorithm itself, just a process that fits it very conveniently so we can actually parse), we have to update the LR(1) automaton to the state that results from having applied the reduction. The table lhs returns the rule identifier (the one that contains the particular rule alternative rednum) as defined in gramnt.h above. Function next_state computes the next state given the current state (in pstack) and the next symbol. For shifts, as we will see below, this symbol is a token, but for reductions it is a rule, because we have in effect just read a whole rule (that is, something the rule can generate).
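
Conceptually, next_state is the transition function of the automaton; in the toy driver earlier it corresponds to indexing goto_tab (for rules) or the shift entries of action (for tokens). Its contract can be summarized like this (the real implementation reads the tables generated into gramdf.h; this is only a description of its behaviour):

/* next_state(s, X) == t        if the automaton has a transition s --X--> t
 * next_state(s, X) == NOSTATE  if there is no valid transition
 * where X is either a token (for a shift) or a rule (after a reduction).
 */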

tools/flang1/flang1exe/parser.c
nstate = next_state(pstack[newtop - 1], (int)lhs[rednum]);
if (nstate == NOSTATE)
  goto issue_error;
else {
  cstate = nstate;
  pstack[newtop] = nstate;
  stktop = newtop;
}

If the transition is invalid for the current state and symbol, next_state returns NOSTATE which means a syntactic error. This is handled later.

Shift

Regardless of whether a reduction has happened or not, a shift always happens next. Shifting is much simpler because in practice it just means updating the LR(1) automaton to the new state.

tools/flang1/flang1exe/parser.c
nstate = next_state(cstate, tkntyp);

At this point the parser has to handle a syntactic infelicity in the flang grammar: complex constants. Complex constants in Fortran 90 are of the form (part, part) where part is either a constant name (like FOO) or a signed integer or real constant. So complex constants like (-1.2, +3.4) are valid but (-1.2 + 3.4, 3.4) is not. Flang, though, accepts the latter case as an extension, perhaps because it is not feasible to constrain the grammar for this case (the result might not be a valid LR(1) grammar) or because it is too difficult to make the scanner handle a complex constant as a single token. The net result is that in some cases a comma (,) that does not have a valid transition may have one if a complex comma is used instead.

flang/tools/flang1/utils/prstab/gram.txt
<elp> ::= (
<constant> ::=  # several other rules
   <elp> <expression> <cmplx comma> <expression> )

So the parser gives a second chance to a troubled comma.

tools/flang1/flang1exe/parser.c
if (nstate == NOSTATE) {
  /* tpr 535
   * the grammar cannot be modified to support complex
   * constants of the form '( const-expr , const-expr )' but
   * can modified if a special token is returned for ',' (i.e.,
   * a "complex comma").
   * if a syntax error occurs when the current token is a comma,
   * check if a "complex comma" is legal; if so, continue
   * by parsing as if we have a complex constant, and semant
   * will determine if the real & imag parts are constants.
   */
  if (tkntyp == TK_COMMA) {
    nstate = next_state(cstate, TK_CMPLXCOMMA);
    if (nstate != NOSTATE) {
      if (DBGBIT(1, 1))
        fprintf(gbl.dbgfil, ">>> comma changed to complex comma %d\n",
                gbl.lineno);
      goto read_trans;
    }
  }

Otherwise a NOSTATE at this point just means a syntax error that must be handled (note that the issue_error label below is inside the block of the if (nstate == NOSTATE)).

tools/flang1/flang1exe/parser.c
issue_error:
  error(34, 3, gbl.lineno, prettytoken(tkntyp, ctknval), CNULL);
  sem.psfunc = FALSE; /* allow no stmt func defs */
  break;
}

The variable sem.psfunc is used for a deprecated feature in Fortran called the statement-function-statement, which we can ignore for now. It is enough to know that statement functions look like assignments, so they have to be handled very carefully. I am not sure, but the parser probably disables them at this point for sanity later.

If the transition is valid we just grow the stacks, allocating more space for them if needed. We store the new state and the semantic value of the current token, and finally move to the new state by updating cstate.

tools/flang1/flang1exe/parser.c
stktop++;
if (stktop >= sst_size) {
  sst_size += SST_SIZE;
  pstack = (PSTACK *)sccrelal((char *)pstack,
                              ((UINT)((sst_size) * sizeof(PSTACK))));
  sst = (SST *)sccrelal((char *)sst, ((UINT)((sst_size) * sizeof(SST))));
  assert(pstack != NULL, "parser:stack ovflw", stktop, 4);
  assert(sst != NULL, "parser:stack ovflw", stktop, 4);
}
pstack[stktop] = nstate;
SST_SYMP(&sst[stktop], ctknval);
cstate = nstate;                                                                          

At this point we may have to end the parsing of the current statement, when we reach an end of line. The flag endflg is set to 1 if we have just finished the current program unit, in which case the parsing ends (main will reinvoke parser to parse the next program unit). The code also checks whether we have reached the end of the file while the program unit has not ended yet: this is a severe error, reported using errsev. The extra check for !scn.multiple_stmts is there because gbl.eof_flag may have been set while reading a line with more than one statement (separated by ;); it will have been unset by the time the scanner moves on to the last statement of the line, though.

tools/flang1/flang1exe/parser.c
if (tkntyp == TK_EOL) {
  if (endflg == 1)
    goto parse_done;
  if (!scn.multiple_stmts && gbl.eof_flag) {
    errsev(22);
    sem.mod_cnt = 0;
    sem.mod_sym = 0;
    sem.submod_sym = 0;
    goto parse_done;
  }
  break;
}

Semantic actions

When the parser syntactically recognizes a part of the program, we want to be able to run some semantic actions on that part. These semantic actions include extra checks that cannot be expressed in the grammar (e.g. diagnosing a redeclared variable) and also doing something useful with those parts of the language, like creating an intermediate representation that will later be used to generate code.

Semantic actions are executed when a reduction happens. Conceptually it is like calling a function to which we pass the semantic values of the right hand side of the reduced rule. The result of the semantic function is another semantic value, which becomes the semantic value of the left hand side of the reduced rule.

In practice there is not one function per reduction. Recall the dispatch code we saw above:

tools/flang1/flang1exe/parser.c
if (rednum < SEM2)                                 
  p_semant1[sem.which_pass](rednum, &sst[newtop]); 
else if (rednum < SEM3)                            
  p_semant2[sem.which_pass](rednum, &sst[newtop]); 
else if (rednum < SEM4)                            
  p_semant3[sem.which_pass](rednum, &sst[newtop]); 
else if (rednum < SEM5)                            
  p_semantio[sem.which_pass](rednum, &sst[newtop]);
else                                               
  p_semsmp[sem.which_pass](rednum, &sst[newtop]);  

They are defined earlier in parser.c like this. Note that the first element of p_semant2 is NULL.

tools/flang1/flang1exe/parser.c
/*  Function pointers indexed by sem.which_pass */
static void (*p_semant1[])(int, SST *) = {semant1, semant1};
static void (*p_semant2[])(int, SST *) = {NULL, semant2};
static void (*p_semant3[])(int, SST *) = {psemant3, semant3};
static void (*p_semantio[])(int, SST *) = {psemantio, semantio};
static void (*p_semsmp[])(int, SST *) = {psemsmp, semsmp};

Also, right after reading the first token of the statement (before entering the loop for the rest of the tokens of the statement), p_semant1 and p_semant2 are updated (it is not clear why p_semant1[0] needs any fixing, because apparently it is not modified anywhere else in the code).

tools/flang1/flang1exe/parser.c
if (sem.which_pass == 0) {
    /*
     * For the first pass, we want to semantically analyze expressions
     * only if they occur within a declaration statement.  Also, there
     * are semantic actions shared by the semantic analysis routines
     * for declaration and executable statements; for the first
     * pass, we only want these actions to be performed only if they
     * occur within the declarations statements.
     */
    if (is_declaration(tkntyp) ||
            (tkntyp == TK_GENERIC && sem.type_mode == 2) /* generic tbp */
       ) {
        p_semant1[0] = semant1;
        p_semant2[0] = semant2;
    } else {
        p_semant2[0] = psemant2;
    }
    gbl.nowarn = FALSE;
} else {
    if (is_declaration(tkntyp)) {
        /* warnings issued in first pass for declarations */
        gbl.nowarn = TRUE;
    } else {
        gbl.nowarn = FALSE;
    }
}

The special psemant2 pointer is just a function that does nothing. It is used during the first parsing pass for all the statements that are not declarations (i.e. executable statements like an assignment or an if-statement starting an if-construct). The functions psemant3, psemantio and psemsmp, used above in the definitions of p_semant3, p_semantio and p_semsmp, play the same role.

tools/flang1/flang1exe/psemant2.c
void
psemant2(int rednum, SST *top)
{
  SST_ASTP(LHS, 0);
}

These functions all receive the reduction (rule alternative) as the first parameter and the stack of semantic values as the second one, always called top (or otherwise some macros will stop working! LHS above is just an alias of top defined by a macro). They are defined in files of the same name (except semant1, which is defined in semant.c). These routines can access the semantic values of the right hand side of the reduction using the macro RHS(n), where n starts from 1.

Typically these routines have a giant switch statement on the value of rednum to handle each reduction specifically. Recall that rednum will be one of the values defined in gramsm.h above.
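
Schematically, a semantic routine looks like the sketch below. The shape (the signature and the switch) is real, as are rednum, STATEMENT1 and the RHS/LHS/SST_ASTP macros mentioned in this post; semantN and the case body are invented for illustration:

static void
semantN(int rednum, SST *top) /* semantN is a made-up name */
{
  switch (rednum) {
  case STATEMENT1: /* <statement> ::= <prog title> */
    /* compute the semantic value of the left hand side from the right
       hand side values, accessed with RHS(1), RHS(2), ... */
    SST_ASTP(LHS, 0); /* hypothetical action: record that no AST results */
    break;
  /* ... one case per rule alternative handled by this routine ... */
  }
}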

The semantic value itself is a generic structure that allows encoding several types of data.
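
Something along these lines, as a purely hypothetical illustration (flang's actual SST has different, and more, fields):

/* NOT flang's SST: a made-up tagged union to illustrate the idea. */
typedef struct {
  int id;        /* discriminates what the union below holds */
  union {
    int sptr;    /* e.g. a symbol table entry */
    int ast;     /* e.g. an AST node */
    int cval;    /* e.g. an integer constant value */
  } u;
} SemanticValueSketch;

We will see in more detail how these routines work in the next chapter.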

Parsing example

It is possible to see the parser in action by passing the flags -Hq,0,1 -Hq,1,1 -Hq,2,1. Consider the following silly program:

test.f90
PROGRAM MAIN
    X = 1
    X = 2.3; PRINT *, X
END PROGRAM MAIN
$ flang -c test.f90 -Hq,0,1 -Hq,1,1 -Hq,2,1

The first statement PROGRAM MAIN is parsed with the following debug output.

F90 PARSER begins
-----  First Parse  -----

tkntyp: PROGRAM tknval: 0 lineno: 1 
     prod(   2) <stbeg> ::=
tkntyp: <id name> tknval: 0 (main) lineno: 1 
tkntyp: END tknval: 0 lineno: 1 
     prod(  33) <id> ::= <id name>
     prod(  45) <routine id> ::= PROGRAM <id>
     prod(  23) <prog title> ::= <routine id> |
     prod(   4) <statement> ::= <prog title>  |
     prod(   3) <stend> ::=
     prod(   1) <stmt> ::= <stbeg> <statement> <stend>

Lines starting with tkntyp show the token returned by the scanner as requested by the parser. Every line of the form prod( n) is a reduction (shifts are not shown in the debug output because they are usually uninteresting: nothing happens other than updating the LR(1) automaton and the stacks). END is a special token that means end-of-statement and as such is not encoded in the rules.

As you can see, many rules are reduced even though they have empty right hand sides. These are called nullary reductions and their sole purpose is to cause a side effect (e.g. changes in the internal semantic state or emitting diagnostics) when the semantic routines are called. Nullary productions are reduced in the right order by the parser once a valid token is found. You can also see that the parser may need to reduce several times before advancing to the next token. This happens here once the token END has been scanned.
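
In the trace notation from before, a nullary reduction such as <stbeg> ::= consumes no input at all:

↓PROGRAM MAIN↲                [reduce: <stbeg> ::=]
<stbeg>↓PROGRAM MAIN↲         [shift]

The look ahead token (PROGRAM here) is what tells the parser that reducing <stbeg> is valid at this point.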

The next statement X = 1 performs a similar process.

tkntyp: <id name> tknval: 0 (x) lineno: 2 
     prod(   2) <stbeg> ::=
     prod(  19) <nii> ::=
     prod(  20) <nim> ::=
     prod( 765) <psfunc> ::=
tkntyp: = tknval: 0 lineno: 2 
     prod(  33) <id> ::= <id name>
     prod(  32) <ident> ::= <id>
     prod( 627) <var ref> ::=    <ident>  |
     prod( 765) <psfunc> ::=
tkntyp: <integer> tknval: 1 lineno: 2 
tkntyp: END tknval: 0 lineno: 2 
     prod( 665) <constant> ::=   <integer>  |
     prod( 597) <primary> ::= <constant> |
     prod( 584) <expression> ::= <primary>   |
     prod( 764) <assignment> ::= <psfunc> <var ref> <psfunc> = <expression>
     prod( 742) <simple stmt> ::= <assignment> |
     prod(   7) <statement> ::= <nii> <nim> <simple stmt> |
     prod(   3) <stend> ::=
     prod(   1) <stmt> ::= <stbeg> <statement> <stend>

Except for the nullary reductions, the process looks quite similar to the reduction process shown above when we talked about parsing. As I mentioned, the nullary productions are used by the semantic process. Here nii means "not inside interface" (you certainly cannot have an assignment inside an INTERFACE construct) and nim means "not inside module" (an assignment is not valid in the specification part of a MODULE). The semantic actions of these nullary reductions diagnose those cases.

A similar process happens with the rest of the statements: the line X = 2.3; PRINT *, X has two statements so you will see two parses. One for X = 2.3 and another for PRINT *, X.

At some point you will see

-----  Second Parse  -----

And the parser will apparently parse exactly the same statements as in the first pass with the same tokens. This time, though, the semantic actions called are different.

Wrap-up

  • The top-level parser process is invoked for each program unit.
  • Each statement of the program unit is then parsed individually using an LR(1) parsing algorithm.
  • The LR(1) parsing algorithm is generated using the grammar in the file gram.txt and the tokens in gram.tki. The tables required by the algorithm itself are generated by the tool prstab. This grammar describes the syntax of a single statement.
  • When the LR(1) parser reduces the input to a rule it invokes a semantic function.
  • Different semantic functions may be used in the first and second passes of the parser.
  • The LR(1) parser only parses a single statement at a time. Thus, something else (spoiler: the semantic functions) will have to make sure that the sequence of statements makes sense for a Fortran program.

In the next chapter we will see in more detail the semantic functions and all the semantic information used by the parser itself.