Think In Geek

In geek we trust

When an array is not an array

The C programming language comes with its own set of warts if we closely examine its syntax and semantics. One of the oddities that puzzles most people is the fact that there are no parameters of array types in C. This fact, though, does not prevent one using the array syntax in a parameter.

Anatomy of a declaration

At the top level, C syntax is just a sequence of declarations. They can either be function definitions or (proper) declarations.

int a_function_definition(int x)
{
  return x + 1;
}
int a_proper_declaration = 4;
enum E { V = 1, W = 2 } another_proper_declaration;
struct A { int x; }; /* yet another proper declaration */

A proper declaration has, conceptually, two parts: the specifier sequence and the declarator.

The specifier are things like typedef, int, signed, unsigned, long, char, const, volatile, inline, register, typedef-name, struct struct-name, union union-name, and a few others. Not all can be used at the same time and the standard clearly specifies which sequences are valid and their meaning. For instance const int long is the same as const long int and while long double (equivalent to double long) is correct, long float is not.

After the specifier sequence there may be a declarator. There may not be a declarator if we are also declaring a type (e.g. int; is wrong but struct A { int x; }; is right). A declarator names the entity being declared (if there is a declarator but it does not have a name, the declarator is called abstract). A declarator in C can have the following forms (here something of the form <X> means that is formed using the rule X and <Xopt> means that it is optional)

declarator := <simple-declarator
           := * <qualifieropt> <declarator>

simple-declarator := <identifier>
                  := <simple-declarator> [ <expressionopt> ]
                  := <simple-declarator> ( <parameter-declaration-listopt> )
                  := ( <declarator> )

qualifier := const
          := volatile

The interpretation of these rules seems a bit odd at first. It follows the rule of the spiral.

   T i       -- declares entity i of type T
   T * cv i  -- declares entity i of type cv pointer to T
   T i[e]    -- declares entity i of type array e of T
   T i( P )  -- declares entity i of type function ( P ) returning T

The declared entity can be a variable, a function (if the type is function), a parameter (when the declaration appears inside P above) or a typedef-name (when the specifiers of the declaration start with typedef).

To make things more concrete, here there are some examples of declarations.

int x;           /* variable 'x' of type int */
int *px;         /* variable 'px' of type pointer to int */
int *const cpx;  /* variable 'cpx' of type const pointer to int */
const int *pcx;  /* variable 'pcx' of type pointer to const int */
const int *const pcpx; /* variable 'pcpx' of type const pointer to const int */
 
int ax[10];      /* variable 'ax' of type array 10 of int */
 
int m[10][20];     /* variable 'm' of type array 10 of array 20 of int */
int c(int, float); /* function 'c' of type (int, float) returning int */
 
typedef signed long *T; /* typedef T of type pointer to signed long */
 
int (*pa)[20];     /* variable 'pa' of type pointer to array 20 of int */
int *ap[20];       /* variable 'ap' of type array 20 of pointer to int */

A family of declarators

In fact, declarators in declarations are rarely as simple as shown above. In a proper declaration either at top level or inside a function body, a declaration does not feature a single declarator but a list (i.e. a comma-separated sequence) of init-declarators. An init-declarator is just a declarator plus an optional initializer. An initializer is just a = followed by an expression or a braced-list.

int x,              /* no initializer */
    y = 3,          /* initializer is = 3 */
    z[2] = {1, 2};  /* initializer is = { 1, 2 } */

Inside a struct definition, a declaration declares a field of the struct by using a list of member-declarators. A member-declarator is a normal declarator (that should not declare a function) but allows bitfields.

struct C
{
   int *x,      /* declares field 'x' of type pointer to int */
       y:2;     /* declares bitfield 'y' of length 2 and type int */
};

Finally a declarator of function type contains a (possibly empty) parameter-declaration list (represented above as P). A parameter-declaration only allows a declarator (no comma-separated declarators) or no declarator (i.e. an abstract declarator) if we are not declaring a function definition.

void f1(int*,           /* abstract, read it like int *D1 */
       float (*)[10]    /* abstract, read it like float (*D2)[10] */
      );
void f2(float a, b);    /* probably you meant void f2(float a, float b); */

Parameter declarations

Let’s now dive in parameter-declarations. We already know they can be abstract but they also feature a few more properties. The first one is that the top-level qualifiers (const and volatile) of a parameter-declaration is discarded. This means that, from an external point of view (from the point of view of the caller) these two declarations are the same.

void f1(const float p1, const int p2);
void f1(float p1, int p2);

Note that this makes sense because parameters are passed by value in C, so from the caller point of view there is no difference to pass a value to an int parameter or to an const int parameter. This does not mean that the parameter simply lost its top-level qualifier (it did not) its just that inside the P of the function type the qualifier will be dropped.

void f(const int c)   // "externally" looks like 'void f(int c)'
{
  c = 3; /* ERROR: cannot assign to constant object 'c' */
}
void f(int c); /* OK: this is a (redundant) declaration of the f above */

Another feature of parameter-declarations is that there is an adjustment of the type of the declaration as follows:

  • If the parameter-declaration is of type “function (P) returning R” it is adjusted to be “pointer to function type (P) returning R”

    void foo(int bar(float, double));
    // becomes
    void foo(int (*bar)(float, double));

    This makes sense in C because we cannot pass a function (here a function value would mean passing the instructions themselves!) to another function. Only a pointer to a function. No one seems to have problems with this case, though.

  • If the parameter-declaration has type “array N of T”, it is adjusted to be a “pointer to T”

    void foo(int c[10]);
    void bar(int m[10][20]);
    // become
    void foo(int *c);
    void bar(int (*m)[20]);

    This also makes sense in C because we cannot pass an array to a function. So we demote the array to a pointer. The only thing we can pass is a pointer (usually to the first element of the array we wanted to pass). This seemingly inoffensive change is where all the fuss starts.

Array objects and array values

C defines an object as an entity in the memory of the program. Objects are manipulated using expressions. An expression has a type and a cathegory. There are two cathegories of expressions: those that simply yield a value (like 1, 'a', 2.3f or x+1, etc.) called (for historical reasons) rvalues and those that refer to an object (like x, *p, a[1], s.x, m[1][2], s.a[3], etc.) called (also for historical reasons) lvalues.

The naked truth in C is that there are no values of array type (of neither cathegory). Only array objects.

What does this mean? This means that we can declare an array but we will never be able to observe it as a whole thing. Well, only in one case, an array shows its array nature: when you try to assign to the whole array. This is not allowed.

int a[10], b[10];
 
a = b; /* ERROR: 'a' is not a modifiable lvalue */

In all cases, though, an array will never denote an array value so it will have to denote some value of another type. It denotes an rvalue of pointer to the element of the array.

int a[10];
a;  /* rvalue of pointer to int, the address of a[0] */

This conversion from array to pointer is conceptually the same that happens in a parameter-declaration. This is on purpose, of course.

Back to a parameter declaration, it means that even if you declare a parameter with an array type it will always be a pointer. This is, a parameter of pointer type. So, this is valid.

void f(int c[10])
{
   c = NULL;       /* probably dumb but OK */
}

Consequences will never be the same

So, if our array actually becomes a pointer, and because of that we lose the number of elements of the original array type, a C compiler cannot reliably diagnose anything based on the array declaration. It has become more of a comment than anything useful.

void f(int c[1])  /* read this as "void f(int *c)" */
{
  c[1] = 3;
}
void g()
{
  int w[2];
  f(w);     // this is fine
}

An attempt to fix things a bit

A function receiving an array never receives an array but a pointer. So in C99 the static keyword (applied to the size of an array declarator) can be used to assert that there will be at least that number of elements. This is not very useful in the example above.

void f(int c[static 1])  /* note the 'static' */
{
  c[1] = 3;
}

But it can be useful in a few cases.

void f(int c[static 10]) { ... }
void g(int *w)
{
  f(NULL); /* ERROR: argument 'NULL' will never be an address to an int [10] */
 
  int m[5];
  f(m);    /* ERROR: argument 'm' will never be an address to an int [10] */
 
  f(w);    /* OK: I do not know anything about w */
 
  w = m;
  f(w);    /* WARNING if the compiler does some data-flow analysis, OK otherwise */
}

Pointers to arrays

The Standard C does not preclude pointers to arrays, rarely used because of their funky declarators.

int a[10]; // 'a' is an array 10 of int
int *p; // 'p' is a pointer to int
int (*pa)[10]; // 'pa' is a pointer to array 10 of int
 
p = a;
pa = &a;
 
// The four statements below have the same effect
a[1] = 3;
p[1] = 3;
(*pa)[1] = 3;
pa[0][1] = 3;

As you can see, using pointers to arrays is awful compared to using a plain array or a pointer.

The funny thing with that is that pointers to arrays are not arrays, so they do not lose their array size in parameter declarations.

void f(int (*pa)[10]);
 
void g()
{
  int a[10];
  g(&a); // OK
  g(a);  // Incompatible types: argument is 'int*' but parameter is 'int (*)[10]'
 
  int b[11];
  g(&b); // Incompatible types: argument is 'int (*)[11]' but parameter is 'int (*)[10]'
}

Note that in the example above, the arguments &a and a have different types (int (*)[10] vs int*) but their value, would phyisically be the same. This is, the following assertion holds.

#include <stdint.h>
#include <assert.h>
void g()
{
   int a[10];
   assert((intptr_t)a == (intptr_t)&a);
}

Also note that, derreferencing (or doing a zero subscript) of a pointer to array is actually a no-op in terms of instructions (there is a conversion in the abstract point of view, though).

int a[10];
int (*pa)[10] = &a;
 
(*pa)[1] = 3;
pa[0][1] = 3;

In the example above, the expression (*pa)[1] is decomposed into expressions pa, *pa, 1, and (*pa)[1]. pa is an “lvalue of type pointer to aray 10 of int”. *pa should be a “lvalue of type array 10 of int”, but we already said that such values do not exist, so *pa is actually an “rvalue of type pointer to int” (this conversion is a no-op). 1 is an “rvalue of int type”. So, (*pa)[1] is an “lvalue of int”. A similar argument goes for p[0][1].

In fact, multidimensional arrays in parameter declarations become pointers to array (with the leftmost size out).

void f(int m[10][20]);
// this is the same as
void f(int (*m)[20]);

Discussion

While arrays are an essential part of C (and C++ as well) they are always second-class citizens. The fact that no array values can be denoted likely stems from the origins of C as a system programming language, relatively close to the assembler level, where arrays values do not, in general, exist as such.

This behaviour is, from a programming language design point of view, inconsistent with structs which do not suffer this problem: it is possible to generate values of struct type, pass them by value to a function, return them from functions and assign them (as a whole). None of them is possible with arrays. Instead, arrays happen to be systematically downgraded in C into pointers.

In my opinion I think there is little hope for arrays in the C world. It is unlikely that we ever see some sort of valued-arrays added in the Standard. After 40 years, tons of codes have been written aware of the fact that arrays in C do not express array values. Adding such feature would increase the, already nontrivial, complexity of the language just for the sake of consistency. As sad as it sounds, now it is too late to amend the language and only careful programming practices and tools can minimize the impact of this misdesign.

Share on FacebookShare on Google+Tweet about this on TwitterShare on LinkedIn

, , , ,

Leave a Reply

Your email address will not be published. Required fields are marked *