Compilation of array expressions in Fortran

As I stated in my previous post, Fortran 90 improved the array capabilities of Fortran. Today we will discuss what are the challenges when compiling array expressions.

Array expressions

A new feature of Fortran 90 is the possibility of expressing array-wise calculations. So it is possible to do this.

REAL :: A(N), B(N), C(N)
 A = 1.0
 B = 2.0
 C = A + B

The first two assignments are correct because Fortran 90 allows promoting (conforming in the Standard parlance) a scalar to an array of some rank. So they set all the N elements of A and B to 1 and 2 respectively.

The third assignment is maybe more interesting. C is assigned the sum of the N elements of A and B. So each C(I) will have the value of A(I) + B(I).

Not only intrinsic operations can be used in the context of an array expressions. A procedure (i.e. a function or subroutine) that is ELEMENTAL can be called with array arguments, all of the same rank or promoteable to it. If the procedure is a function, it will return an array as well. Many intrinsics are defined as ELEMENTAL and the user can define new ELEMENTAL functions. There are a few reasonable restrictions for ELEMENTAL functions: their arguments and return must be scalars and should not have side effects, so they can be coherently extended to any rank and the order of application of the function should not be relevant. Used judiciously, this can end in very compact and expressive code.

INTEGER :: A(N), B(N), SAD
 IF (ANY(A /= B)) THEN
   PRINT *, "Some value in A is different to the corresponding in B"
 END IF
 ! Sum of absolute differences
 SAD = SUM(ABS(A - B))

In the examples above the arrays have rank 1, but nothing prevents to use this for arrays of higher rank. Note though, that a lower-ranked array is not conformed to a higher-ranked array, only scalars can be conformed this way. This means that all arrays involved in the array expression will always have the same rank. Not only this, they must have the same SIZE for each rank. This means that the number of elements of each rank should match.

REAL :: M1(N, M), M2(N, M), M3(N, M)
 ! Addition of two matrices 
 M3 = M1 + M2

Also array sections can be used liberally in the expression as long as the sizes of the ranks match.

INTEGER :: A(N), B(N), C(N)

 A = A(N:1:-1)
  ! Reverses A

 C(1:2) = A(3:4) + B(5:4:-1)
  ! Does
  !   C(1) = A(3) + B(5)
  !   C(2) = A(4) + B(4)

 C(1:4:2) = A(3:10:4) + B(5:4:-1)
  ! Does
  !   C(1) = A(3) + B(5)
  !   C(3) = A(7) + B(4)

Finally all this can be applied to pointers as well.

Compiling array expressions

Seems obvious at first that an array expression like

A = B + C

should be ideally compiled as something like

DO I = 1, N
  A(I) = B(I) + C(I)
END DO

And functionally we want something like that, except that it does not always work. If you check the example above where we reversed an array doing A = A(N:1:-1) because doing

DO I = 1, N
  A(I) = A(N - I + 1)
END DO

is simply wrong.

A way that will always work will be doing first the right hand side of the assignment and keeping it into a temporary, and then move from the temporary to the left hand side. Retaking A = B + C a way could be

DO I = 1, N
  TMP(I) = B(I) + C(I)
END DO
DO I = 1, N
  A(I) = TMP(I)
END DO

Which looks wasteful. First because given that the sizes of the rank will match, the two loops will do the same number of iterations for each operand and operation. So we might want to compile this like:

DO I = 1, N
  TMP(I) = B(I) + C(I)
  A(I) = TMP(I)
END DO

This, though, still looks off, basically because in this case we do not need TMP at all.

DO I = 1, N
  A(I) = B(I) + C(I)
END DO

This is exactly our intuitive first approach.

Unfortunately this is not always correct. In the reversal of the array, though, we cannot fuse the two loops.

DO I = 1, N
  TMP(I) = A(N - I + 1)
END DO
DO I = 1, N
  A(I) = TMP(I)
END DO

Note also that this way of reversing an array is rather uninspired as it would be enough iterating up to N/2 and then just doing A(I) = A(N - I + 1).

Dependencies

We have seen that we may need to introduce a temporary in order to evaluate the right hand side of an array assignment. The question is, can we avoid this and directly write to the result? We have seen that there are cases where this is feasible, can we tell which ones?

Removing TMP means that we will update each element of the left hand side. This is OK as long nothing in the right hand side must evaluate to any of the values written in the right hand side. That would create a dependence and we cannot violate them.

Dependences between array accesses depend on two elements: first the array object subscripted. If the two array objects are clearly different there will never be any dependence.

INTEGER :: A(10), B(10), C(10)
 C = A + B   ! No dependencies as A, B and C
             ! are different arrays.

If the two objects are the same, or can potentially be the same (more on this below), then there may be a potential dependency. A conservative and fast (in compilation time) is assume that in this case the dependence exists. This may be overly conservative but makes for a simple implementation.

INTEGER :: A(10), B(10)
A = 1
A(1:5) = A(1:5) + A(2:6)   ! (*)

B = 1
B(2:6) = B(2:6) + B(1:5)   ! (**)

While this is simple approach is safe there is still some room for improvement. If you check the assignment in (*), we are in practice doing A(I) = A(I) + A(I+1), with I = 1..5. This means that it there is no dependence between A(I) and A(I+1). Compare this case with the assignment in (**). Here in practice we are doing B(I) = B(I) + B(I-1) with I = 2..6. This second case would be reading a value already updated in an earlier iteration. But, note that if instead of I = 2..6 we do I = 6..2 this would work!

So if the two objects may potentially be the same, we still may want to analyze the indexes because the regions of the array are updated in an order that does not violate dependences. This kind of dependency based optimizations have been extensively studied in the textbook by Allen and Kennedy. We will not discuss them here, just let’s get the idea that these loop optimizations are useful in this case.

Aliasing

Before we attempt to analyze the indexes we may want to tell if two objects may refer to the same data in memory. When two names can refer the same data we say that they alias. Fortran makes relatively easy to tell when two objects may or may not alias. We use “may” here because in some times we need to assume they may alias even if they do not in practice. We may not have enough information at this point.

Given two names, they obviously may alias if they have the same name. If they have different names and neither is a POINTER then they do not alias unless both appear in the same EQUIVALENCE set. If only one is POINTER and the other cannot be a TARGET of the pointer, then they do not alias. Otherwise we have to assume they may alias. This includes cases where we operate two POINTERs or a POINTER and a potential TARGET of the first.

An important feature of Fortran is that parameters that are not POINTER never alias with other parameters. This is also valid for arrays.

More complicated expressions

An array assignment can be more complicated but only in the right hand side. The left hand side is more or less regular in practical terms. For instance consider

INTEGER :: A(10), B(10), C(10), D(10)

D = A * B + C

Note that now we have two subexpressions, the one with T ← A * B and the one with T + C. But this is where the complexity ends. Due to the nature of the right hand side, which cannot have side-effects to the existing variables, it is safe to reuse T all the time. Like this.

INTEGER :: A(10), B(10), C(10), D(10)
INTEGER :: TMP(10)

DO I = 1, 10
  TMP(I) = A(I) * B(I)
  TMP(I) = TMP(I) + C(I)
  D(I) = TMP(I)
END DO

Note that we may be tempted to scalarize the TMP(I) variable and this would be possible in the case above.

INTEGER :: A(10), B(10), C(10), D(10)
INTEGER :: TMP

DO I = 1, 10
  TMP = A(I) * B(I)
  TMP = TMP + C(I)
  D(I) = TMP
END DO

Again, though, we already know this is not something that can be done in general.

INTEGER :: A(10), B(10), C(10), D(10)

D = A * B + C + D(10:1:-1)

Here we are forced to evaluate the whole D(10:1:-1) first.

INTEGER :: A(10), B(10), C(10), D(10)
INTEGER :: TMP(10), TMP2

DO I = 1, 10
  TMP(I) = D(11 - I)
END DO
DO I = 1, 10
  TMP2 = A(I) * B(I)
  TMP2 = TMP2 + C(I)
  TMP2 = TMP2 + TMP(I)
  D(I) = TMP2
END DO

Non-assignment array expressions

If the array expression does not involve any assignment, some of the complexity is simplified. For instance in the expression above ANY(A /= B) this can be modeled as an assignment to a temporary. That temporary does not alias with any other variable, so this can be implemented with a single loop, even in the presence of complex array sections.

LOGICAL :: TMP(N), TMP2
DO I = 1, N
  TMP(I) = A(I) /= B(I)
END DO
TMP2 = ANY(TMP)
IF (TMP2) THEN
  ...

Who says that Fortran is not an interesting language for compilers? :)