Introduction

C++ is a language derived from C, so in essence all problems at link time boil down to declaring something but not defining it.

Declaring something in C++ means bringing the entity into existence in the program, so it can be used after the declaration point. Defining something means giving a complete description of the entity itself. You can declare a class or a function, and it means this class and this function do exist. But to completely describe a class and a function you have to define them. A class definition provides a list of base classes of that class, a list of members (data members and member functions) of that class, etc. A function definition provides the executable code of that function. All definitions are declarations but not all declarations are definitions.

// Defines variable 'x'
int x;
// Declares variable 'y'
extern int y;
// Declares class 'A'
struct A;
// Declares function 'f(int)'
void f(int);

// Defines class 'A'
struct A
{
    // Declares member function 'A::g(float)'
    void g(float);

    // Defines member function 'A::h(char)'
    void h(char) 
    { 
      // Code
    }

    // Defines data member 'A::x'
    int x;

    // Declares static data member 'A::y'
    static int y;
};

// Defines function 'f(int)'
void f(int)
{
 // Code
}

// Defines member function 'A::g(float)'
void A::g(float)
{
 // Code
}

// Defines static data member 'A::y'
int A::y;

C++, in contrast to C, sticks strictly to the One Definition Rule, which states that an entity can be defined at most once in an entire program. Of course, this may not be completely true depending on your own definition of "entity": function templates, when instantiated by the compiler, can end up defined in more than one file of the program, and some magic happens so this does not become a problem.
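
To make that exception a bit more concrete, here is a minimal sketch. The names twice and answer are made up for illustration, and I am assuming both definitions live in a header that several .cpp files include.

```cpp
// Imagine these definitions live in a header included by several .cpp files.
// Every file that includes the header sees both definitions.

// A function template: each file that calls twice<int> may instantiate it,
// so the same function can be emitted in more than one object file.
template <typename T>
T twice(T value)
{
    return value + value;
}

// An inline function: it may be defined in every file that uses it,
// as long as all the definitions are identical.
inline int answer()
{
    return 42;
}

// A plain function (neither inline nor a template) defined in the same header
// would typically fail to link with a "multiple definition" error as soon as
// two files included the header.
```

The linker keeps a single copy of each duplicated function, which is the "magic" mentioned above.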

Anyway, C++ brings its own set of linking issues which may fool even the most experienced C++ developer.

Static data members are only declared inside the class specifier

Some might argue that this is one of the most common sources of linking issues when using C++. Truth be told, static data members are just global variables in disguise, so most people will avoid them. However, there are cases where a static data member may come in handy, for instance when implementing the singleton pattern.

The problem is that, although usual (nonstatic) data members are defined when they are declared inside a class (like 'int x;' in line 23 of the example above), static data members are only declared. Thus in line 26 of the example above ('static int y;') A::y is only being declared. Its actual definition is given in line 42 ('int A::y;'). The actual definition of a static data member goes in the implementation file (usually a .cpp or .cc file).

So the usual case goes like this: you realize you need a static data member. You add it to the class. Your code compiles fine but does not link. In fact, 'A::y', the static data member you just added, is reported as undefined. How can this be?

Now you know the reason.

Why is this issue hit so often? Well, there are three reasons. A historical one: early versions of C++ compilers allowed this. A quirk in the C++ language itself: const static data members of integral or enumeration type can be declared and initialized inside the class itself, which in most uses makes a separate definition unnecessary (strictly speaking this is still only a declaration, and a definition is required if, for instance, the address of the member is taken). And finally, a linguistic one: in Java and C#, static fields are declared like any other field plus a static specifier.
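
The quirk about const integral members is easier to see in code. This is only a sketch: the class Limits, its members and the function free_slots are names I made up for illustration.

```cpp
struct Limits
{
    // Declared and initialized inside the class. Using it as a plain
    // constant value does not require any further definition.
    static const int max_users = 16;

    // Only declared: this one always needs a definition elsewhere.
    static int current_users;
};

// These two lines would go in the implementation file.
// Definition of the ordinary static data member.
int Limits::current_users = 0;
// Definition of the const member (note: no initializer here). It is only
// required if the member is used as an object, for instance when its
// address is taken or a reference is bound to it.
const int Limits::max_users;

int free_slots()
{
    // Here max_users is used only for its value.
    return Limits::max_users - Limits::current_users;
}
```

If you remove the definition of Limits::current_users, the undefined reference described above should come back as soon as something uses that member.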

// -- Header file
class MySingleton
{
public:
    static MySingleton& getInstance()
    {
        if (singleton_ == 0)
            singleton_ = new MySingleton;
        return *singleton_;
    }
private:
    // Usual private constructor
    MySingleton() { }
    // Declaration
    static MySingleton *singleton_;
};

// -- Implementation file
// Definition
MySingleton* MySingleton::singleton_ = 0;
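
If the only purpose of the static data member is to hold the singleton, one possible alternative is a static local variable, which needs no separate definition in the implementation file. This is a sketch of that variation, not a replacement for the example above; LocalSingleton and nextId are hypothetical names.

```cpp
class LocalSingleton
{
public:
    static LocalSingleton& getInstance()
    {
        // A static local variable is defined right here, so there is no
        // out-of-class definition to forget. It is created the first time
        // the function is called.
        static LocalSingleton instance;
        return instance;
    }

    // Small piece of state so we can observe that the instance is shared.
    int nextId() { return ++last_id_; }

private:
    // Usual private constructor
    LocalSingleton() : last_id_(0) { }

    int last_id_;
};
```

Every call to getInstance returns a reference to the same object, so the counter keeps growing no matter which reference is used.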

Not all headers are created equal

The usual myth is that C++ is a superset of C. Well, it looks like a superset of C, but they are actually two different languages. That said, they share so many things that interfacing C++ and C is pretty straightforward, in particular when the former must call the latter (the opposite may be a bit more challenging).

Thus, it is not unusual to see a C++ program #include C header files. Chances are that the headers of your operating system are written in C. Being able to #include a C header and use the entities declared in it is one of the strengths of C++. And this is the source of our second problem.

Remember that in C++ functions may be overloaded. This means that we can use the same name when declaring two functions in the same scope as long as they have different enough parameter types.

// Declaration of 'f(int)'
void f(int);
// Declaration of 'f(float)'
void f(float);
// Redeclaration of 'f(int)' since, in a parameter, 'const int' cannot
// be distinguished from 'int'
void f(const int);

It may not be obvious, but at the lower levels (object files and the linker) we cannot give the two functions declared above the same name f. So the compiler crafts an artificial name for f(int) and another one for f(float) (this is called a decorated name or a mangled name). For instance, they could be f_1_int and f_1_float (here 1 would mean the number of parameters). The C++ compiler internally uses these names when generating code, and the lower levels just see two different names.
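
The names f_1_int and f_1_float are invented. As a more realistic sketch, the comments below show the names used by compilers that follow the Itanium C++ ABI (g++ is one of them). Other compilers use different schemes, so take the exact spelling as an assumption about your toolchain. I made the functions return int only so that the two overloads can be told apart.

```cpp
// With the Itanium C++ ABI this function gets the symbol '_Z1fi':
// '_Z' is the prefix, '1f' is the name and its length, 'i' stands for int.
int f(int)
{
    return 1;
}

// Same scheme, but 'f' stands for float: the symbol is '_Z1ff'.
int f(float)
{
    return 2;
}
```

Running a tool like nm on the object file should list both symbols, and a demangler such as c++filt turns them back into f(int) and f(float).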

But C does not have overloading, so we run into a problem here. Since the functions declared in a C header cannot be overloaded, a C compiler generates code using the plain (undecorated) name of each function. If our C++ compiler always used a decorated name, there would be an unresolved symbol. The C++ compiler cannot tell if a declaration is C or C++. Or can it?

Good news: it can. You can specify the linkage of declarations in the code. By default linkage is C++, so overloading works as described above. When you want to #include a C header, you have to tell the C++ compiler that the linkage of its declarations is C, not C++. Most of the time you will find these lines at the beginning and at the end of a C header intended to be used from C++.

// Remember this is a C header so protect ourselves when this is compiled using C
#ifdef __cplusplus 
// This 'extern "C"' syntax is only valid in C++, not in C.
extern "C" { 
// From now everything has C linkage. 
#endif

/* Library declarations in C */

#ifdef __cplusplus 
// Close the brace opened above
}
#endif
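
The same extern "C" specifier can be applied to a single declaration or definition in C++ code. The following sketch uses made-up names (c_style_length and length) to contrast both linkages in one file.

```cpp
#include <cstring>

// C linkage: the symbol in the object file is the plain name
// 'c_style_length', so C code could call this function too.
// A second function with C linkage and the same name is not allowed.
extern "C" int c_style_length(const char* text)
{
    return static_cast<int>(std::strlen(text));
}

// C++ linkage (the default): these two overloads get different
// decorated names, so they can coexist.
int length(const char* text)
{
    return c_style_length(text);
}

int length(char)
{
    return 1;
}
```

Calling a function with C linkage from C++ looks exactly like any other call; only the symbol name the linker sees changes.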

Virtual member functions and virtual tables

Finally, one of the most confusing link errors, in my opinion, when using a C++ compiler: unresolved references to a virtual table.

Virtual member functions are, in C++ parlance, the polymorphic methods of other programming languages (like Java). Virtual member functions can be overridden by derived classes (descendant classes), thus, when called, they must be dispatched dynamically.

struct A
{
    virtual void vmf(float*);
    virtual void vmf2(float*);
};
struct B : A
{
    virtual void vmf(float*);
    virtual void vmf3(float*);
};

// Outside the class, 'virtual' is not repeated and the return type is required
void B::vmf(float*)
{
    // Code
}

void g(A* pa, float *pfl)
{
  // Dynamic dispatch 
  // we don't really know if A::vmf or B::vmf will be called
  pa->vmf(pfl);

  // Static call to A::vmf since we qualified the function being called
  pa->A::vmf(pfl);

  B b;
  // Static call to B::vmf, no doubts here since the dynamic type (in memory)
  // of 'b' and its declared type must be the same
  b.vmf(pfl);

  A& ra(*pa);
  // Dynamic dispatch again
  ra.vmf(pfl);
}

Dynamic dispatch is implemented using a virtual method table (or vtable). Every class with virtual member functions (called a dynamic class) has a vtable. This vtable is a sequence of addresses of member functions. Every virtual member function is assigned an index in this table, and the address stored at that index points to the function implementing the virtual member function for that class. For instance, class A above has two virtual member functions, vmf and vmf2. The vtable of A will then have two entries, 0 and 1, pointing to the functions A::vmf and A::vmf2 respectively. The vtable of B will have three entries, 0, 1 and 2, pointing to the functions B::vmf, A::vmf2 and B::vmf3 respectively.

Every object of a dynamic class has a hidden data member (called the virtual pointer) that points to the vtable of its class. When C++ specifies that a call goes through dynamic dispatch (in C++ parlance, a call to the final overrider), we do not call any function directly. Instead, through this hidden data member, we reach the vtable and, using the index of the virtual member function being called, we retrieve the entry containing the address of the real function. Then this address is used in an indirect call.
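
The mechanism just described can be imitated by hand with plain function pointers. This is only a rough model: every name in it (Object, Slot, vtable_A, vtable_B, call_slot and so on) is made up, and real vtables are laid out by the compiler and usually carry extra entries that I ignore here.

```cpp
#include <string>

struct Object;

// Every slot of our hand-made vtable is a pointer to a function that
// receives the object (the role of 'this') and returns its own name.
typedef std::string (*Slot)(Object*);

struct Object
{
    // The hidden data member: the virtual pointer.
    const Slot* vptr;
};

std::string A_vmf(Object*)  { return "A::vmf"; }
std::string A_vmf2(Object*) { return "A::vmf2"; }
std::string B_vmf(Object*)  { return "B::vmf"; }
std::string B_vmf3(Object*) { return "B::vmf3"; }

// Index 0 is vmf, index 1 is vmf2, index 2 is vmf3 (only in B).
const Slot vtable_A[] = { A_vmf, A_vmf2 };
const Slot vtable_B[] = { B_vmf, A_vmf2, B_vmf3 };

// The "constructors" store the address of the vtable in the object.
Object make_A() { Object o = { vtable_A }; return o; }
Object make_B() { Object o = { vtable_B }; return o; }

// Dynamic dispatch: reach the vtable through the virtual pointer,
// pick the entry by index and make an indirect call.
std::string call_slot(Object* obj, int index)
{
    return obj->vptr[index](obj);
}
```

Note how the "constructors" need the address of a vtable. That detail is what matters in the link errors discussed in this part of the article.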

Since both the virtual table and the virtual pointer are hidden from the eyes of the developer, sometimes errors in our code may cause link errors.

The compiler does not emit a vtable

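This may not apply to all C++ compilers, but usually a C++ compiler only emits the vtable of a class where it finds the definition of a virtual member function of that class. For instance, compilers that follow the Itanium C++ ABI (g++ among them) emit it in the file that defines the first virtual member function of the class that is neither pure nor defined inside the class (the so-called key function). Note that virtual member function definitions for a given class may be scattered in several files. When a class has no such function, the vtable may be emitted in several files; magic happens again, so more than one definition of the vtable of a given class does not become a problem at link time.

But what if you forget to define all the virtual functions? This may look contrived, but in my experience it does happen by accident. The problem lies in the error at link time, which is really confusing.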

struct A
{
    int x_;
    A(int x) : x_(x) { }

    // We forget to define A::foo
    virtual void foo();
};

void quux(A* a)
{
    // Dynamic dispatch
    a->foo();
}

int main(int argc, char * argv[])
{
    A a(3);
    quux(&a);
}

If you compile and link this with g++ (I use -g since it improves link error messages by using the debugging information), you get the following error:

$ g++ -o prova test.cc -g
/tmp/ccl71r2A.o: In function `A':
test.cc:4: undefined reference to `vtable for A'
collect2: ld returned 1 exit status

But line 4 is the constructor. You see now how confusing this message is, don't you? What is going on?

Well, everything makes sense if we remember the hidden data member I mentioned above, the virtual pointer. As a data member of the class, it must be initialized in the constructor. It is initialized with the address of the virtual table of A. But the virtual table of A was not emitted, since we did not define any of its virtual member functions. Thus, there is an unresolved reference to the virtual table.
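
The fix is simply to define the missing function. Here is a sketch of the corrected class; the only liberty I took is making foo return int, so the result of the dynamic call can be observed.

```cpp
struct A
{
    int x_;
    A(int x) : x_(x) { }

    // Declared here (returning int in this sketch)...
    virtual int foo();
};

// ...and this time defined. With g++ this definition is also what
// makes the compiler emit the vtable for A in this file.
int A::foo()
{
    return x_;
}

int quux(A* a)
{
    // Dynamic dispatch
    return a->foo();
}
```

With the definition in place the constructor can find the vtable, and the link error should go away.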

Missing virtual member functions in base classes

Remember that the vtable contains entries for all the virtual member functions of the base classes. The vtable is statically initialized, that is, the compiler "hardcodes" the address of each entry in the data section of the generated code. What if we forget to define a virtual member function of a base class?

Consider this example:

struct A
{
    int x_;
    A(int x) : x_(x) { }

    virtual void foo();
    // We forget to define A::foo2
    virtual void foo2();
};

void A::foo() 
{
    // Definition of A::foo
}

struct B : A 
{
    B(int x) : A(x) { }

    virtual void foo() 
    { 
        // Definition of B::foo
    }
};

void quux(A* a)
{
    a->foo();
}

int main(int argc, char * argv[])
{
    B b(3);
    quux(&b);
}

If we compile and link with g++, we get:

/tmp/cc4t9NG3.o:(.rodata._ZTV1B[vtable for B]+0xc): undefined reference to `A::foo2()'
/tmp/cc4t9NG3.o:(.rodata._ZTV1A[vtable for A]+0xc): undefined reference to `A::foo2()'
collect2: ld returned 1 exit status

This happens because vtables of A and B refer to A::foo2, but we forgot to define it. Fortunately, now the error message is easier to grasp: some function is missing.
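
Besides defining A::foo2, another way out is available when the base class is not meant to provide an implementation at all: declare the function pure virtual. The sketch below uses made-up names (Base and Derived) and functions returning int so the calls can be checked. As far as I know, with g++ the vtable entry of a pure virtual function points to a runtime helper instead of a missing function, so nothing is left unresolved.

```cpp
struct Base
{
    virtual ~Base() { }

    virtual int foo() { return 1; }

    // Pure virtual: Base promises no definition, so the vtable of Base
    // does not need the address of Base::foo2. The price is that
    // Base becomes abstract and cannot be instantiated on its own.
    virtual int foo2() = 0;
};

struct Derived : Base
{
    virtual int foo() { return 2; }

    // Derived must override foo2 to become a concrete class.
    virtual int foo2() { return 20; }
};

int call_both(Base* b)
{
    // Both calls go through dynamic dispatch.
    return b->foo() + b->foo2();
}
```

Which fix is right depends on the design: if objects of the base class must be created, the function needs a real definition.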

Obviously, C++ can cause many more link errors, but I think the ones shown here are quite common, and the error messages related to them are quite confusing.