Think In Geek

In geek we trust

Introduction to the gfortran array descriptor

With the approval of Fortran 90, its array capabilities were largely improved. While still far from languages like APL, the extra functionality required a rethinking of the concept array in Fortran. This led to the need for array descriptors in the language.

A zoo of arrays

Many programming languages have a concept of array, as a very essential kind of aggregated data type where all the elements of the aggregation are of the same type and can be accessed by some indexing mechanism usually an integer expression. Beyond that basic functionality, languages differ.

In pre-Fortran 90 arrays were pretty simple and their size could only be a compile-time constant or an expression of the parameters of the functions. This is, there were not dynamic arrays of any kind in Fortran 77. This changed in Fortran 90 with the introduction of 4 kinds of arrays:

  • Explicit-shape. These are the traditional arrays in Fortran 77. They have a dimension specification where all the elements are of the form U or L:U. These expressions are either integer constants or computed using the parameters of the current function (in Fortran a parameter of the function is called a dummy argument).
    INTEGER :: A(10)  ! Array of 10 integer elements from 1 to 10
    INTEGER :: B(0:9) ! Array of 10 integer elements from 0 to 9
    INTEGER :: M(20, 0:19) ! Array (of rank 2) of 400 integer elements.
                           ! First rank is indexed from 1 to 20
                           ! Second rank is indexed from 0 to 19
     
    SUBROUTINE SUB(N, M)
      INTEGER :: N
      INTEGER :: D(N, M + 1) ! Array of rank 2 of N * (M+1) elements
                             ! First rank from 1 to N
                             ! Second rank from 1 to M + 1
    END SUBROUTINE SUB
  • Assumed-size. In these arrays we omit the upper bound of the last rank and we use * instead. This means that we cannot really know the extent of such rank at all. This form of array is only valid for dummy arguments (i.e. arrays that are parameters of the function).
    SUBROUTINE SUB(A, B, M)
      INTEGER :: A(*)      ! Rank 1 array with lower bound 1 but of unknown size
      INTEGER :: B(0:*)    ! Rank 1 array with lower bound 0 but of unknown size
      INTEGER :: M(20, *)  ! Rank 2 array
                           ! First rank is indexed from 1 to 20
                           ! Second rank is indexed from 1 but has an unknown upper bound
    END SUBROUTINE SUB

    These arrays are conceptually equivalent to C ones as parameters. For instance, the last one is in practice equivalent to a parameter declaration int c[][20].

  • Assumed-shape. In this case all dimensions specifications lack an upper bound but a lower bound is allowed, otherwise assumed 1. This is only valid for dummy arguments again. Its size will be inferred from the actual argument associated during the function call.
    SUBROUTINE SUB(A, B, M)
      INTEGER :: A(:)      ! Rank 1, with lower bound 1 and upper bound only known at runtime
      INTEGER :: B(0:)     ! Rank 1, with lower bound 0 and upper bound only known at runtime
      INTEGER :: M(:, 0:)  ! Rank 2 array
                           ! First rank has lower bound 1 but upper bound only known at runtime
                           ! Second rank has lower bound 0 but upper bound only known at runtime
    END SUBROUTINE SUB
  • Deferred-shape. In this case all dimensions specifications lack lower and upper bound. This is only valid for arrays that are ALLOCATABLE or POINTER. Note that in some cases these may look like assumed-shape. Their sizes and bounds are inferred from argument association or after an ALLOCATE statement.
    INTEGER, ALLOCATABLE :: A(:)
    INTEGER, POINTER :: M(:, :)

    While POINTER and ALLOCATABLE seem the same thing they represent different ideas. POINTER arrays are references to other arrays (including anonymous ones obtained through ALLOCATE). An ALLOCATABLE array means an array that if allocated, its size is determined at runtime.

As you can see both assumed-shape and deferred-shape raise the bar of complexity of arrays. The reason is that we have to be able to compute their sizes at runtime. Fortran 90 introduces LBOUND, UBOUND and SIZE intrinsics that allow us to query the boundaries and number of elements of an array, or one of its dimensions. For explicit-shape either these values are constant, so they can be computed by the compiler, or must depend on parameters so we still can relatively easy compute those in runtime without too much effort. But for assumed-shape and deferred-shape, this property becomes so general that we need to have the boundaries of the array somewhere.

A sensible place to do this is in the array itself.

Dope vectors

Historically, arrays are convenient because they can be just represented with an address to the first element. All other information that is required is usually available at compile time (type size, size of the dimension if they are constant, etc). For arrays whose size is non-constant, like the explicit-shape arrays D above, it is a minor complication but not too hard basically because the size of the array does not change. Assumed-size are like explicit-shape except that their full size and last-rank upper bound is unknown (this restriction pervades all the language in inconvenient ways but is not a technical problem per se).

Now consider assumed-shape and deferred-shape. How can we tell the sizes of the array in runtime? Well, as we said above we will need to store this in the array itself. This is usually called a dope vector or an array descriptor. The name suggests that the array itself is not any longer just an address to the first element but some more information. Another way to regard these descriptors are fat pointers. This is, pointers that encode more than an address.

In order to give some latitude to implementors, standards do not fully specify all the details and as such array descriptors are not mandated anywhere in the standard (although probably noted as a possible solution). This means that every Fortran 90 implementation can choose the array descriptor it wants. This also means that Fortran code in one compiler is not interoperable with other Fortran compilers. While this may sound terrible, it is usually not a concern due to the niche status of Fortran. But sometimes is, so the Chasm project can help in this case.

gfortran array descriptor

GNU Fortran, also called gfortran, is the Fortran 90/95/2003/2008… implementation available in the GNU Compiler Collection. Its array descriptor looks like this in C++ (adapted from the gfortran sources).

struct descriptor_dimension
{
  ptrdiff_t stride;
  ptrdiff_t lower_bound;
  ptrdiff_t upper_bound;
};
 
template <int Rank, typename Type>
struct descriptor {
  Type *base_addr;
  size_t offset;
  ptrdiff_t dtype;
  descriptor_dimension dim[Rank];
};

The descriptor keeps the address where the array starts in base_addr field. In this case Type will never be an array and it will always be a scalar type or a derived type (structs in Fortran parlance). We will see later the meaning of the offset field. Field dtype encodes data type information described below. Finally, for each rank (i.e. dimension), the descriptor encodes a stride, we will see later why it is needed, the lower bound and upper bound of the rank.

Field dtype encodes three things: the rank of the array, the element type and the size of the element type. It is currently laid like this (because the code suggest it will have to change for Fortran 2008).

Bits 63/31..6 5..3 2..0
Meaning size of the element type element type rank of the array

The element type can take the following values

Fortran type Value
INTEGER 1
LOGICAL 2
REAL 3
COMPLEX 4
Any derived type 5
CHARACTER 6

This descriptor is used both for assumed-shape and deferred-shape arrays.

Creation of the descriptor

Consider the following Fortran code

SUBROUTINE SUB()
    INTEGER, ALLOCATABLE :: A(:, :)
 
    ALLOCATE(A(-1:5, 2:9))
END SUBROUTINE SUB

Here gfortran will allocate in the stack a local variable A for a descriptor of rank 2 and then it initializes it with the proper values. First let’s see the SIZE of this array: (5 – (-1) + 1) * (9 – 2 + 1) = 56. So we need to allocate 56 integers. The address to this storage is stored in base_address. The code would look like this in C++

descriptor<2, signed int> A;
A.base_address = new signed int[56];

We need to remember the boundaries.

// rank 1
A.dim[0].lower_bound = -1;
A.dim[0].upper_bound = 5;
// rank 2
A.dim[1].lower_bound = 2;
A.dim[1].upper_bound = 9;

The field stride, states how many elements we skip effectively every time we increment the index of that rank. For rank 1 is always easy, only 1 element is skipped.

A.dim[0].stride = 1;

For rank 2, we skip all the elements of rank 1. In this case 7 (there are 7 elements ranging from -1 to 5, recall that both ends are included).

A.dim[1].stride = 7;

If our array had rank 3, the stride of rank 3 would account the stride of rank 2 and rank 1, so it would be 56 (= 7 * 8).

Also we need to store the appropriate info in dtype.

// rank
A.dtype = 2;
//  element type (INTEGER == 1)
A.dtype |= (1 << 3);
// size of this INTEGER, by default 4
A.dtype |= (4 << 6);

Of course, all the above can be simplified as

A.dtype = 266;

We are still missing the offset field in our array descriptor. The purpose of this field will be obvious when we index elements using the descriptor. It contains a value that compensates for lower bounds being other than zero. In our case, our lower bounds are -1 and 2. The element (i, j) of our array will be the i + j * 7 (this 7 is the stride of rank 1!). For the lower bound elements, this is when i = -1 and j = 2 we want the previous expression evaluate to zero. Thus we want i + j * 7 + offset = 0. This is offset = -(i + j * 7), substituting the value of i and j by -1 and 2 respectively we have that offset = -13.

A.offset = -13;

And now we have our array descriptor fully initialized.

In general

For an ALLOCATE(X(l1:u1, l2:u2, ..., ln:un) the descriptor is initialized like this.

size = (  (u1 - l1 + 1)
        * (u2 - l2 + 1)
        * ...
        * (un - ln + 1) )
X.base_address = new T[ size ];

// rank 1
X.dim[0].lower_bound = l1;
X.dim[0].upper_bound = u1;
X.dim[0].stride = 1;
// rank 2
X.dim[1].lower_bound = l2;
X.dim[1].upper_bound = u2;
X.dim[1].stride = (u1 - l1 + 1) * X.dim[0].stride;
...
// rank n
X.dim[n-1].lower_bound = ln;
X.dim[n-1].upper_bound = un;
X.dim[n-1].stride = (un-1 - ln-1 + 1) * X.dim[n-2].stride;

X.dtype = n | (element_type << 3) | (element_size << 6)

X.offset = - (  l1 * X.dim[0].stride
              + l1 * X.dim[1].stride
              + ...
              + ln * X.dim[n - 1].stride );

Accessing elements using the descriptor

Now that our descriptor is fully initialized, we will want to access element A(i, j). We need to compute its address in memory. But first we need to locate the element (i, j) of our array. We need to compute its linear index. The linear index basically tells us, starting from 0, what is the position of the element inside the array.

In general this means that given an array X of rank n with descriptor, the linear index of X(i1, i2, ..., in) will be

li = (i1 - X.dim[0].lower_bound) * X.dim[0].stride
   + (i2 - X.dim[1].lower_bound) * X.dim[1].stride
   + ...
   + (in - X.dim[n-1].lower_bound) * X.dim[n-1].stride;

We can distribute the multiplication of the stride to get

li = i1 * X.dim[0].stride - X.dim[0].lower_bound * X.dim[0].stride
   + i2 * X.dim[1].stride - X.dim[1].lower_bound * X.dim[1].stride
   + ...
   + in * X.dim[n-1].stride - X.dim[n-1].lower_bound * X.dim[n-1].stride;

and then regroup additions and subtractions

li = i1 * X.dim[0].stride
   + i2 * X.dim[1].stride
   + ...
   + in * X.dim[n-1].stride

   - X.dim[0].lower_bound * X.dim[0].stride
   - X.dim[1].lower_bound * X.dim[1].stride
   - ...
   - X.dim[n-1].lower_bound * X.dim[n-1].stride;

Recall that offset field above? In general it will be

   X.offset = - (  X.dim[0].lower_bound * X.dim[0].stride
                 + X.dim[1].lower_bound * X.dim[1].stride
                 + ...
                 + X.dim[n-1].lower_bound * X.dim[n-1].stride );

if we distribute the minus sign

   X.offset = - X.dim[0].lower_bound * X.dim[0].stride
              - X.dim[1].lower_bound * X.dim[1].stride
              - ...
              - X.dim[n-1].lower_bound * X.dim[n-1].stride;

and this is exactly what we have above. So the linear index can be rewritten as

li = i1 * X.dim[0].stride
   + i2 * X.dim[1].stride
   + ...
   + in * X.dim[n-1].stride
   + X.offset;

Ok, now we have the linear index, but we still need to compute the address of the element. The linear index is very useful because we know each element in the array takes the same amount of storage in bytes, so we only have to multiply it by the element size (es). The element size is found encoded in the field dtype.

es = (X.dtype >> 6);

So we only have to scale the linear index by the number of bytes of each element.

ob = es * li;

This offset in bytes (ob) still has to be added to the base address to get the element address (ea).

ea = X.base_address + ob;

And ea contains the address of the element.

Other operations that involve creation of array descriptors

Besides the ALLOCATE statement, a Fortran compiler may need to create descriptors in other cases. For instance when passing an array to a function that expects an assumed-shape array or during pointer assignment. In fact pointer assignment is rather cool because it allows to remap the indices of an array. Consider for instance the following example.

INTEGER, TARGET :: A(10, 10)
INTEGER, POINTER :: P
 
P => A(3:5, 2:8)

Here what we are doing is kind of a window of 3×7 elements over A. Elements are mapped like this.

P(1, 1) => A(3, 2)
P(2, 1) => A(4, 2)
P(3, 1) => A(5, 2)
P(1, 2) => A(3, 3)
P(2, 2) => A(4, 3)
P(3, 2) => A(5, 3)
...
P(1, 7) => A(3, 8)
P(2, 7) => A(4, 8)
P(3, 7) => A(5, 8)

How is the array descriptor of P initialized in this case? First the base address of the array will not be set to the base address of A but to the element A(3, 2).

P.base_address = <<address of A(3, 2)>>

The lower and upper bound of the first rank is 1 and 3 respectively. The stride, as usually in the first rank is 1.

P.dim[0].lower_bound = 1;
P.dim[0].upper_bound = 3;
P.dim[0].stride = 1;

In the second rank, lower bound and upper bound is 1 to 7. Here the stride is 10. Why is it 10, because between two consecutive indexes of the second rank there still are 10 elements.

P.dim[1].lower_bound = 1;
P.dim[1].upper_bound = 7;
P.dim[1].stride = 10;

Field dtype will contain 266 again because this is a rank 2 array of INTEGERs of 4 bytes each.

Field offset will be -11, this is exactly -( 1 + 1 * 10 ).

P.offset = -11;

All this works because P is actually just a skewed version of A but with shorter bounds. This is why the base address is not A(1, 1) but A(3, 2) and everything else, but the bounds of course, is the same.

Now consider the same case but we change the pointer assignment to be

P => A(3:5:2, 2:8)

Now the mapping looks like this

P(1, 1) => A(3, 2)
P(2, 1) => A(5, 2)
P(1, 2) => A(3, 3)
P(2, 2) => A(5, 3)
...
P(1, 7) => A(3, 8)

How to achieve this? Note first that the rank 1 is now skipping every other element. This is equivalent to a stride of 2 in rank 1. Also note that now the offset has to compensate for this extra multiplication. So instead of -11 it is -12? Why, check the formula for the offset and you will see that we are multiplying the lower bound of each rank per its stride. So – ( 1 * 2 + 1 * 10 ) = -12.

P.dim[0].lbound = 1;
P.dim[0].ubound = 2;
P.dim[0].stride = 2;
P.dim[1].lbound = 1;
P.dim[1].ubound = 7;
P.dim[1].stride = 10;
p.offset = -12;

Yet another example

P => A(3:5:2, 2:8:3)

Now the mapping is

P(1, 1) => A(3, 2)
P(2, 1) => A(5, 2)
P(1, 2) => A(3, 5)
P(2, 2) => A(5, 5)
P(1, 3) => A(3, 8)
P(2, 3) => A(5, 8)

and the required array descriptor is the following

P.dim[0].lbound = 1;
P.dim[0].ubound = 2;
P.dim[0].stride = 2;
P.dim[1].lbound = 1;
P.dim[1].ubound = 3;
P.dim[1].stride = 30;
P.offset = -32;

Check the stride for the second rank. Now it is 30 because we are going in steps of three, so the original stride was 10 now it is 30. Obviously this requires also adjusting the offset.

Nice, isn’t it? 🙂

Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn

Leave a Reply

Your email address will not be published. Required fields are marked *