When an array is not an array
The C programming language comes with its own set of warts, as we discover if we closely examine its syntax and semantics. One of the oddities that puzzles most people is the fact that there are no parameters of array type in C. This fact, though, does not prevent one from using the array syntax in a parameter.
Anatomy of a declaration
At the top level, C syntax is just a sequence of declarations. They can either be function definitions or (proper) declarations.
A proper declaration has, conceptually, two parts: the specifier sequence and the declarator.
The specifiers are things like typedef, int, signed, unsigned, long, char, const, volatile, inline, register, a typedef-name, struct struct-name, union union-name, and a few others. Not all of them can be used at the same time, and the standard clearly specifies which sequences are valid and what they mean. For instance, const int long is the same as const long int, and while long double (equivalent to double long) is correct, long float is not.
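After the specifier sequence there may be a declarator. The declarator can be omitted only when the declaration also declares a type (e.g. int; is wrong but struct A { int x; }; is right). A declarator names the entity being declared (if there is a declarator but it does not have a name, the declarator is called abstract). A declarator D in C can have the following forms (here something of the form <X> means that it is formed using the rule X, and <Xopt> means that it is optional):
- <identifier>: the name being declared (absent in an abstract declarator)
- * <type-qualifiersopt> <D>: a pointer declarator
- <D>[<expressionopt>]: an array declarator
- <D>(<P>): a function declarator, where P is the parameter-declaration list
- (<D>): a parenthesized declarator, used for grouping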
The interpretation of these rules seems a bit odd at first: it follows the so-called spiral rule.
The declared entity can be a variable, a function (if the type is a function type), a parameter (when the declaration appears inside P above) or a typedef-name (when the specifiers of the declaration include typedef).
To make things more concrete, here are some examples of declarations.
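The sketch below lists a few file-scope declarations, one per declarator form; all the names are made up for illustration, and each comment reads the declarator from the name outwards.

```c
int x;                        /* x: int */
int *p;                       /* p: pointer to int */
int a[10];                    /* a: array 10 of int */
int f(int, char *);           /* f: function (int, pointer to char) returning int */
int (*fp)(int, char *);       /* fp: pointer to function (int, pointer to char) returning int */
int *table[3];                /* table: array 3 of pointer to int */
int (*pa)[10];                /* pa: pointer to array 10 of int */
const char *const name = "c"; /* name: const pointer to const char */
typedef int row_t[4];         /* row_t: typedef-name for "array 4 of int" */
row_t grid[2];                /* grid: array 2 of array 4 of int */
```

Note how the parentheses in fp and pa are what turn "function returning pointer" and "array of pointers" into "pointer to function" and "pointer to array".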
A family of declarators
In fact, declarators in declarations are rarely as simple as shown above. In a proper declaration, either at top level or inside a function body, a declaration does not feature a single declarator but a list (i.e. a comma-separated sequence) of init-declarators. An init-declarator is just a declarator plus an optional initializer. An initializer is just an = followed by an expression or a brace-enclosed list.
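For instance, a single declaration can carry several init-declarators, each with its own optional initializer (the names here are hypothetical). The specifier int is shared, but the * and [] belong to each individual declarator:

```c
/* One specifier sequence (int), four init-declarators. */
int n = 3, *np = &n, v[3] = {1, 2, 3}, w;

/* The * is part of p1's declarator only: p1 is a pointer to int, p2 is an int. */
int *p1, p2;
```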
Inside a struct definition, a declaration declares fields of the struct by using a list of member-declarators. A member-declarator is a normal declarator (which must not declare a function) that can also specify a bit-field width.
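A small sketch of member-declarators, using a hypothetical struct flags: bit-field widths are allowed, and so is a pointer to function, but a member of function type is not.

```c
struct flags {
    unsigned int ready : 1, mode : 3; /* two member-declarators with bit-field widths */
    int *next, count;                 /* pointer to int and int, as in ordinary declarations */
    int (*callback)(int);             /* a pointer to function is allowed... */
    /* int handler(int);                 ...but a member of function type is not */
};
```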
Finally, a declarator of function type contains a (possibly empty) parameter-declaration list (represented above as P). A parameter-declaration allows only one declarator (no comma-separated declarators), and, if we are not writing a function definition, that declarator may be abstract or omitted entirely.
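As an illustration (with a made-up function scale), a prototype can use unnamed parameters, while the definition names each one with exactly one declarator:

```c
/* Prototype: both parameter-declarations have no name (abstract). */
double scale(double, const int *);

/* Definition: one named declarator per parameter-declaration.
   A list such as (double factor, value) would not be valid. */
double scale(double factor, const int *value)
{
    return factor * *value;
}
```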
Parameter declarations
Let's now dive into parameter-declarations. We already know they can be abstract, but they also feature a few more properties. The first one is that the top-level qualifiers (const and volatile) of a parameter-declaration are discarded from the function type. This means that, from an external point of view (from the point of view of the caller), void f(int x); and void f(const int x); declare the same function.
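A minimal sketch, with a hypothetical function twice: both declarations below name the same function type, int (int), so they can coexist, and a plain int (*)(int) pointer can hold the function.

```c
int twice(int x);
int twice(const int x);   /* compatible redeclaration: the top-level const is not part of the type */

int twice(const int x)
{
    return 2 * x;
}
```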
Note that this makes sense because parameters are passed by value in C, so from the caller's point of view there is no difference between passing a value to an int parameter or to a const int parameter. This does not mean that the parameter simply loses its top-level qualifier (it does not: inside the function body it is still const); it is just that the qualifier is dropped from the parameter list P of the function type.
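The following sketch (clamp10 is a made-up name) shows both sides: callers see int (int), while inside the body the parameter keeps its const.

```c
int clamp10(int x);        /* what callers see: int (int) */

int clamp10(const int x)   /* the definition may still say const int */
{
    /* x = 10;  would be rejected here: inside the body x is a const int */
    return x > 10 ? 10 : x;
}
```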
Another feature of parameter-declarations is that there is an adjustment of the type of the declaration as follows:
- If the parameter-declaration has type "function (P) returning R", it is adjusted to be "pointer to function (P) returning R".
  This makes sense in C because we cannot pass a function to another function (here a function value would mean passing the instructions themselves!), only a pointer to a function. No one seems to have problems with this case, though.
- If the parameter-declaration has type "array N of T", it is adjusted to be "pointer to T".
  This also makes sense in C because we cannot pass an array to a function, so the array is demoted to a pointer. The only thing we can pass is a pointer (usually to the first element of the array we wanted to pass). This seemingly inoffensive change is where all the fuss starts.
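One way to observe both adjustments, assuming a C11 compiler for _Generic, is to look at the type of the address of each parameter: &g and &a are not converted any further, so they reveal the adjusted types. The helper names are hypothetical.

```c
int negate(int v) { return -v; }

int fn_param_kind(int g(int))      /* adjusted to: int (*g)(int) */
{
    return _Generic(&g,
                    int (**)(int): 1,  /* g is a pointer to function */
                    default: 0);
}

int array_param_kind(int a[10])    /* adjusted to: int *a */
{
    return _Generic(&a,
                    int **: 1,         /* a is a pointer to int */
                    int (*)[10]: 2,    /* a would be a real array */
                    default: 0);
}
```

Both functions return 1 when called with negate and with any int array, respectively; applying the same _Generic to the address of a genuine int[10] object would select 2 instead.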
Array objects and array values
C defines an object as an entity in the memory of the program. Objects are manipulated using expressions. An expression has a type and a category. There are two categories of expressions: those that simply yield a value (like 1, 'a', 2.3f or x+1), called (for historical reasons) rvalues, and those that refer to an object (like x, *p, a[1], s.x, m[1][2] or s.a[3]), called (also for historical reasons) lvalues.
The naked truth in C is that there are no values of array type (of either category), only array objects.
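What does this mean? It means that we can declare an array, but we will almost never be able to observe it as a whole. An array expression keeps its array type only when it is the operand of sizeof or of the unary & operator (or when a string literal initializes an array). The other place where an array shows its array nature is when you try to assign to the whole array: this is not allowed.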
In every other context, an array never denotes an array value, so it has to denote a value of some other type: an rvalue of type pointer to the first element of the array.
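A small sketch of the conversion, with hypothetical helper names: in a return statement the array becomes a pointer to a[0], whereas sizeof sees the whole array unless we force the conversion with a + 0.

```c
#include <stddef.h>

int a[10];

/* Here a is converted to an rvalue of type int *, pointing to a[0]. */
int *decayed(void)
{
    return a;
}

size_t array_bytes(void)   { return sizeof a; }        /* no conversion: 10 * sizeof(int) */
size_t pointer_bytes(void) { return sizeof (a + 0); }  /* a + 0 forces the conversion */

/* a = decayed();  would be rejected: an array cannot be assigned as a whole */
```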
This conversion from array to pointer is conceptually the same as the one that happens in a parameter-declaration. This is on purpose, of course.
Back to parameter declarations: even if you declare a parameter with an array type, it will always be a pointer, that is, a parameter of pointer type. So, for instance, assigning to or incrementing such a parameter is valid.
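For example, in this sketch (sum is a made-up name) the parameter a is written with array syntax, yet a++ compiles because a is really an int *; the same expression on a real array would be an error.

```c
int sum(int a[10], int n)
{
    int total = 0;
    while (n-- > 0)
        total += *a++;   /* increments the pointer parameter */
    return total;
}
```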
Consequences will never be the same
So, since our array actually becomes a pointer, the number of elements of the original array type is lost, and a C compiler cannot reliably diagnose anything based on the array declaration. The size has become more of a comment than anything useful.
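To see that the size is only a comment, consider two hypothetical functions whose parameters are declared with different bounds: after adjustment both have exactly the same type, int (int *), so either can be stored in the same function pointer without a cast.

```c
int first10(int a[10]) { return a[0]; }
int first3(int a[3])   { return a[0]; }

typedef int (*reader)(int *);   /* pointer to function (int *) returning int */

reader pick(int use_ten)
{
    return use_ten ? first10 : first3;   /* same type: no cast needed */
}
```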
An attempt to fix things a bit
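A function declared to receive an array never receives an array, only a pointer. C99 lets the keyword static appear inside the brackets of an array parameter declarator (as in int a[static 10]) to assert that the argument points to the first element of an array with at least that number of elements. This does not turn the parameter back into an array, so it does not help much with the loss of the size described above.
But it can be useful in a few cases: the argument cannot be a null pointer, and some compilers (Clang, for instance) can warn about calls that visibly violate the stated minimum size.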
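A minimal sketch, using a hypothetical sum4: the caller promises at least four ints, which is exactly what the body reads. Passing a shorter array or a null pointer is undefined behaviour rather than a required compile error.

```c
/* values must point to the first of at least 4 ints (C99 "static" bound). */
int sum4(const int values[static 4])
{
    return values[0] + values[1] + values[2] + values[3];
}
```

Note that the parameter is still adjusted to const int *; the static bound is a promise about the argument, not part of the parameter type.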
Pointers to arrays
Standard C does not preclude pointers to arrays, although they are rarely used because of their funky declarators.
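A short sketch of such a declarator (the names are invented): the parentheses are mandatory, because without them the same tokens declare an array of pointers.

```c
int a[10];
int (*pa)[10] = &a;   /* pa: pointer to array 10 of int */
int *ap[10];          /* without parentheses: ap is an array 10 of pointer to int */

int third(void)
{
    return (*pa)[2];  /* dereference first, then subscript */
}
```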
Using pointers to arrays is awful compared to using a plain array or a pointer.
The funny thing is that pointers to arrays are not arrays, so they do not lose their array size in parameter declarations.
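Here is a sketch with two hypothetical functions: f takes a pointer to array 10 of int, so the bound survives in its parameter type, while g takes a plain pointer.

```c
int f(int (*pa)[10])
{
    /* *pa has array type and sizeof does not convert it, so the bound is visible */
    return (int)(sizeof *pa / sizeof (*pa)[0]);
}

int g(int *p)
{
    return p[1];
}
```

With int a[10], the calls would be f(&a) and g(a). Passing the address of an int[5] to f is a constraint violation (incompatible pointer types), so a conforming compiler must diagnose it.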
Note that for an array int a[10], the expressions &a and a have different types (int (*)[10] vs int *), but their values would physically be the same. That is, the assertion (void *)&a == (void *)a holds.
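A small self-check (check_same_address is a made-up name) confirms that both expressions hold the same address, while pointer arithmetic on them moves by different amounts.

```c
#include <assert.h>

int a[10];

void check_same_address(void)
{
    assert((void *)&a == (void *)a);              /* same address, different types */
    assert((void *)(&a + 1) == (void *)(a + 10)); /* &a + 1 skips the whole array */
}
```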
Also note that dereferencing (or applying a zero subscript to) a pointer to array is actually a no-op in terms of instructions (there is a conversion from the abstract point of view, though).
Consider the expression (*pa)[1], where pa is a pointer to array 10 of int. It is decomposed into the expressions pa, *pa, 1 and (*pa)[1]. pa is an "lvalue of type pointer to array 10 of int". *pa should be an "lvalue of type array 10 of int", but we already said that such values do not exist, so *pa is actually an "rvalue of type pointer to int" (this conversion is a no-op). 1 is an "rvalue of type int". So, (*pa)[1] is an "lvalue of type int". A similar argument goes for pa[0][1].
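The decomposition can be checked with a short sketch (the helper names are invented): (*pa)[1] and pa[0][1] reach the same int, and since the result is an lvalue it can also be assigned.

```c
int a[10] = {0, 42};
int (*pa)[10] = &a;

int via_deref(void)     { return (*pa)[1]; }  /* *pa converts to int *, then [1] */
int via_subscript(void) { return pa[0][1]; }  /* pa[0] is *(pa + 0), i.e. *pa */
int *address_of(void)   { return &(*pa)[1]; } /* (*pa)[1] is an lvalue: a[1] */
```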
In fact, multidimensional arrays in parameter declarations become pointers to arrays (the leftmost size is dropped).
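A sketch of that adjustment, assuming C11 for _Generic and using hypothetical names: the parameter written as int m[3][4] really has type int (*)[4], so a plain pointer to rows of 4 ints can be passed as well.

```c
int corner(int m[3][4])               /* adjusted to: int (*m)[4] */
{
    return m[2][3];
}

int param_is_row_pointer(int m[3][4])
{
    return _Generic(&m, int (**)[4]: 1, default: 0);
}
```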
Discussion
While arrays are an essential part of C (and C++ as well), they are always second-class citizens. The fact that no array values can be denoted likely stems from the origins of C as a systems programming language, relatively close to the assembler level, where array values do not, in general, exist as such.
This behaviour is, from a programming language design point of view, inconsistent with structs, which do not suffer this problem: it is possible to generate values of struct type, pass them by value to a function, return them from functions and assign them (as a whole). None of these is possible with arrays. Instead, arrays are systematically downgraded in C into pointers.
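The contrast is easy to see with a common workaround (not something the C language offers for arrays themselves): wrapping an array in a struct. In this sketch, with the hypothetical struct vec10, the array is copied along with the struct on assignment, on the call and on return.

```c
struct vec10 {
    int e[10];
};

struct vec10 doubled(struct vec10 v)   /* v is a copy of the caller's struct */
{
    for (int i = 0; i < 10; i++)
        v.e[i] *= 2;
    return v;                          /* returned by value, array included */
}
```

The caller's struct is left untouched, which is exactly the value semantics that a bare array parameter cannot have.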
In my opinion, there is little hope for arrays in the C world. It is unlikely that we will ever see some sort of array values added to the Standard. After 40 years, tons of code have been written with the knowledge that arrays in C do not express array values. Adding such a feature would increase the already nontrivial complexity of the language just for the sake of consistency. As sad as it sounds, it is now too late to amend the language, and only careful programming practices and tools can minimize the impact of this misdesign.