Posts by Roger Ferrer Ibáñez

  • Subtleties with loops

    A common task in imperative programming languages is writing a loop. A loop that can terminate requires a way to check the terminating condition and a way to repeatedly execute some part of the code. These two mechanisms exists in many forms: from the crudest approach of using an if and a goto (that must jump backwards in the code) to higher-level structured constructs like for and while ending in very high-level constructs built around higher-order functions in for_each-like constructs and more recently, in the context of GPU programming, the idea of a kernel function instantiated over a n-dimensional domain (where typically n ≤ 3 but most of the time n = 1).

    These more advanced mechanisms make writing loops a commonplace task and typically regarded as uneventful. Yet, there are situations when things get subtler than we would like.

    Read on →

  • Mitigate runaway processes

    Sometimes I find myself running testsuites that typically, in order to make the most of the several cores available in the system, spawn many processes so the tests can run in parallel. This allows running the testsuites much faster.

    One side-effect, though, of these mechanisms is that they may not be able to handle correctly cancellation, say pressing Ctrl-C.

    Today we are going to see a way to mitigate this problem using systemd-run.

    Read on →

  • Graphical notifications for long-running tasks

    In my dayjob I often have to perform long-running tasks that do not require constant attention (e.g. compiling a compiler) on Linux systems. When this happens, it is unavoidable to context switch to other tasks even if experts advice against it. Turns out that compilation scrolls are not always very interesting.

    I would like to be able to resume working on the original task as soon as possible. So the idea is to receive a notification when the task ends.

    Read on →

  • Writing GObjects in C++

    In the last post I discussed about how glibmm, the wrapper of the GLib library exposes GObjects and we finished about a rationale about why one would want to write full-fledged GObjects in C++.

    Today we are exploring this venue and observing some of the pain points we are going to face.

    Read on →

  • Wrapping GObjects in C++

    GObject is the foundational dynamic type system implemented on top of the C language that is used by many other libraries like GLib, GTK and many other components, most of them part of the GNOME desktop environment stack.

    I’ve been lately wrapping a C library that uses GObject for C++ and I learned about some of the challenges.

    Read on →

  • OpenSSH as a SOCKS server

    Sometimes we are given access via ssh to nodes that do not have, for policy or technical reasons, access to the internet (i.e. they cannot make outbound connections). Depending on the policies, we may be able to open reverse SSH tunnels, so things are not so bad.

    Recently I discovered that OpenSSH comes with a SOCKS proxy server integrated. This is probably a well known feature of OpenSSH but I thought it was interesting to share how it can be used.

    Read on →

  • Distributed compilation in a cluster

    In software development there is an unavoidable trend in which applications become larger and more complex. For compiled programming languages one of the consequences is that their compilation takes longer.

    Today I want to talk about using distcc to speed C/C++ compilation using different nodes in a scientific cluster.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 9

    I think we have enough pieces of machinery working already that we can start with the most exciting part of this journey: autovectorisation!

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 8

    In the last installment I mentioned we could start looking at enabling the vectoriser in the compiler. However when I did that I realised some benchmarks were giving weird results. I had made a mistake with copies, so let’s remediate this.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 7

    We finished the last installment of this series mentioning that the compiler cannot copy, load or store a vector. Today we will address this.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 6

    There is an issue we have mentioned several times in earlier installments: the value of the vector length at function boundaries. This is, when entering or leaving a function. We will address this question today.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 5

    In the last installment we completed all the code generation step. However the whole process is still a bit suboptimal. Today we are going to see how we can improve this.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 4

    In the last chapter we devised a way to tame the issue with fpscr. Today we are going to complete the code generation bits that we are still missing so we can start emitting code.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 3

    In the last chapter we modelled the storage in form of pairs and quadruples of registers that we will use for vectors of double and single precision, respectively.

    But before we can do anything we need to deal with fpscr.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 2

    In the previous installment we discussed a bit how to generate code using the vector feature of the CPU of the Raspberry Pi 1.

    Let’s start hacking LLVM.

    Read on →

  • Fun with vectors in the Raspberry Pi 1 - Part 1

    Long ago, we saw that the Raspberry Pi 1 has vector computation capabilities. However to the best of my knowledge no compiler attempted to exploit the vector capability in general.

    I think we are going to have some fun in trying to fix this.

    Read on →

  • RAII, locks and clang-tidy

    A colleague of mine spent some time chasing a bug in a C++ library related to concurrency.

    At the end it all boiled down to a silly declaration that wasn’t one.

    Read on →

  • Process-wide information and Linux key management

    I believe this is not a very common scenario, but sometimes one has to develop libraries whose scope is the whole process. In such a situation, we may need to identify if a process has already loaded another copy of the library.

    Read on →

  • Forgotten memories (2)

    So in the first part of this experiment we saw a simple strategy to spill general-purpose registers into floating-point registers implemented in the RISC-V backend of LLVM.

    In this chapter, let’s see the results and some other interesting facts learnt during the process.

    Read on →

  • Forgotten memories (1)

    Most compiler infrastructures that target register machines do it by using the concept of virtual registers. In their intermediate representations instructions use virtual registers to represent their operands.

    Because hardware is finite, these virtual registers must be mapped to physical registers at some point. The compiler does this in a process called register allocation.

    Being physical registers finite, it may happen that not all the virtual registers used by the program can be held in physical registers at the same time. When this happens, the compiler must emit spill code. Spill code stores a value in a memory (spill) and loads it later, often close to the point of use (reload).

    The memory used for spill code is commonly the function stack. However nothing prevents us from using other kinds of “memories” as long as we can guarantee that nobody is going to use them. This is exactly the kind of experiment we will do today: we’re going to spill general-purpose registers into floating-point registers.

    Read on →

  • Create your own local domain and DHCP with dnsmasq

    Back in 2007, Bernat explained how to set up our own domain name using ISC BIND and ISC DHCP. You can’t go wrong with those servers but maybe you prefer something more straightforward. I present here a simpler alternative built on top of dnsmasq which is an integrated DNS and DHCP.

    Read on →

  • Using SSH Certificates

    Password-based authentication has a number of drawbacks, so many services (such as github) use SSH keys to authenticate. However distributing the keys over several nodes (be virtual machines or single-board computers such as Raspberry Pi) doesn’t scale over the number of nodes and users.

    Luckily, OpenSSH implementation of SSH supports a certificate-based mechanism. This mechanism may help reducing the complexity of users trusting SSH hosts and hosts trusting SSH users.

    Read on →

  • Fortran and modules

    Recently the committee that is preparing the next standard of C++, known as C++20, approved the inclusion of modules. Modules are good™ but they pose some interesting challenges to implementors and users. In this post I will ruminate a bit about what challenges have impacted Fortran.

    Read on →

  • Walk-through flang – Part 8

    In the last installment of this series we started to look at the AST and the symbol table by examining the compiler dumps of these two data structures. In this chapter we are going to explore a bit more the AST for the control flow statements.

    Read on →

  • Walk-through flang – Part 7

    In previous chapters we saw how the input source was lexed, parsed and semantically analysed and we looked at how the symbols and data types are represented. But we haven't looked at what happens once the semantic analysis finishes. In this installment we're going to talk about the AST.

    Read on →

  • Walk-through flang – Part 6

    At this point we should have a reasonable picture of how flang parses and generates semantic information. So now it is time to explore with more detail what is actually synthesized and how it can be used later in the compiler. In this chapter we are going to see the symbol table.

    Read on →

  • A very simple memory pool in C++11

    I’ve been implementing an algorithm that works on a graph. That algorithm needs to create and destroy lots of nodes and edges to keep track of the algorithm state correctly. The algorithm also needs to be fast in order to be competitive against a similar algorithm that uses sets of stacks instead of graphs. Profiles show that memory allocations are impacting negatively the performance, so maybe a memory pool can help.

    Read on →

  • Exploring AArch64 assembler – Chapter 9

    In chapter 6 we saw conditional branches and we ended commenting that they can be used to implement higher control constructs. In this chapter we will see a few of them.

    Read on →

  • Walk-through flang – Part 5

    In the previous installment of this series we saw how flang parses the statements using an LR(1) algorithm. As the parser recognized the parts of the statements it invokes semantic actions. Today we’re going to talk more about them.

    Read on →

  • Walk-through flang – Part 4

    In the last installment we saw how flang splits the input in tokens. Once we have the tokens identified we need to parse them.

    Read on →

  • Walk-through flang – Part 3

    In the last chapter we saw how the driver handles the compilation and how it invokes flang1 and flang2. In this chapter we are going to start with flang1.

    Read on →

  • Walk-through flang – Part 2

    In the previous installment of this series we saw basically how to install flang and we ran a simple smoke test. In this post we will see a high level overview of what happens when we compile a Fortran program using flang. We will also compare it with what usually happens with clang.

    Read on →

  • Walk-through flang – Part 1

    Flang is an open source project to create a Fortran compiler for LLVM. It is based on NVIDIA/PGI Fortran and it has been released under Apache License 2.0. In this series we will do a walk-through the code of this compiler and how it has been integrated in the existing LLVM infrastructure.

    Read on →

  • 10 years of Think In Geek

    10 years ago Bernat (brafales) started this blog. This is still his blog though he is a bit busy these days and he cannot publish as much as he wants.

    All in all, without his initiative this blog would not exist. As would not exist many of the posts I published here since 2012, when Bernat insisted that I should collaborate. I guess his persistence has somehow paid off :)

    Consider this post a very small celebration of those 10 years. Hopefully we will be able to celebrate many more years.

  • Compilers as a memory error detectors

    This is a small anecdote of something that happened to me the other day.

    Read on →

  • Exploring AArch64 assembler – Chapter 8

    In the last chapter we saw how to call a function. We mentioned a special memory called the stack but we did not delve into it. Let’s see in this chapter how we can use the stack and why it is important in function calls.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 27

    We saw in the previous chapter what is the process required to build a program from different compilation units. This process happened before we obtained the final program. The question is, can this process happen when the program runs? This is, is it possible to dynamically link a program?

    Read on →

  • Whose is this optimization?

    Today we will toy around a very simple optimization in clang and discuss a bit about separation of concerns when optimizing code.

    Read on →

  • Exploring AArch64 assembler – Chapter 7

    In the previous installment of this series we saw how to alter the sequencing of our programs. Today we will see how we can reuse instructions by means of branches. Let's talk about functions.

    Read on →

  • Compilation of array expressions in Fortran

    As I stated in my previous post, Fortran 90 improved the array capabilities of Fortran. Today we will discuss what are the challenges when compiling array expressions.

    Read on →

  • Introduction to the gfortran array descriptor

    With the approval of Fortran 90, its array capabilities were largely improved. While still far from languages like APL, the extra functionality required a rethinking of the concept array in Fortran. This led to the need for array descriptors in the language.

    Read on →

  • How (not) to write a C++ front end – Part 3

    In the previous installment we talked about the parsing technology we used, which looks like the canonical academic approach to parsing. In this chapter we will see some dificulties we encountered along the years.

    Read on →

  • Exploring AArch64 assembler – Chapter 6

    So far we know how to do some computations and access memory. Today we will learn how to alter the control flow of our program.

    Read on →

  • Exploring AArch64 assembler – Chapter 5

    In this chapter we will see how we can access the memory in AArch64.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 26

    In this chapter we will talk about a fascinating step that is required to create a program, even when using assembler. Today we will talk about linking.

    Read on →

  • Exploring AArch64 assembler – Chapter 4

    In this chapter we will see some instructions that will allow us to compute things.

    Read on →

  • Exploring AArch64 assembler – Chapter 3

    In the last chapter we saw that instructions may have register operands and immediate operands. We also mentioned that mixing 32-bit and 64-bit register was not allowed. Today we will talk a bit more about register operands.

    Read on →

  • Exploring AArch64 assembler – Chapter 2

    In the first installment of this series we did a very first simple program. In this chapter we will continue learning a bit more about AArch64.

    Read on →

  • How (not) to write a C++ front end – Part 2

    In the previous installment I gave some context about the existence of Mercurium as a tool. In this chapter we will start digging into the parsing technology used.

    Read on →

  • How (not) to write a C++ front end – Part 1

    As part of the work I did in my previous employer, we had to develop a C++ front end. This is never an easy task so I will use this series to share some experiences while developing it.

    Read on →

  • Exploring AArch64 assembler – Chapter 1

    AArch64 is a new 64 bit mode that is part of the ARMv8 architecture presented in 2011 by ARM. It has been progressively been deployed in smartphones and servers. So I think it is a good moment to learn a bit more about the assembler of this architecture.

    Read on →

  • A tiny GCC front end – Part 11

    Our tiny language features a few types: int, float, bool, string and arrays of those types. We can even declare new type names based on other types but it still missing a record type. Today we will address this.

    Read on →

  • A tiny GCC front end – Part 10

    Today we will add a relatively simple feature that will be very useful for a future extension: type declarations.

    Read on →

  • A tiny GCC front end – Part 9

    Today we will do something relatively easy: let's add a way to declare boolean variables and express boolean literals.

    Read on →

  • A tiny GCC front end – Part 8

    Now that we have the basic language set implemented we can consider adding new features to it. Today we will add arrays.

    Read on →

  • A tiny GCC front end – Part 7

    In this part we will complete the missing statements from part 6 and finish our front end.

    Read on →

  • A tiny GCC front end – Part 6

    In part 5 we described the objects that we will need to semantically analyze a tiny program. In current part we will extend the parser of part 4 to do the semantic analysis and create the GENERIC trees.

    Read on →

  • A tiny GCC front end – Part 5

    In the last installment of this series we saw how to verify that the sequence of tokens of the input is syntactically valid. Today we will see what we need to give it meaning.

    Read on →

  • A tiny GCC front end – Part 4

    Now that we have a stream of tokens we can start performing syntactic analysis.

    Read on →

  • A tiny GCC front end – Part 3

    Now that the minimal infrastructure is already set, we can start with the implementation of our tiny front end. Today we will talk about the lexer.

    Read on →

  • A tiny GCC front end – Part 2

    The previous installment of this series was all about the syntax and the semantics of the tiny language. In this chapter we will start implementing a front end for tiny in GCC. The journey will be long but rewarding. Let's get started.

    Read on →

  • A tiny GCC front end – Part 1

    In this series we will see the process of adding a new front end for a very simple language in GCC. If you, like me, marvel at the magic of compilers then these posts may be for you.

    Read on →

  • Toying with GCC JIT – Part 3

    In the last two parts of this series we've seen how to use GCC JIT and how to apply it to a simple regular expression matcher. But an open question remains, has it been worth the effort? This part will try to figure it out.

    Read on →

  • Toying with GCC JIT – Part 2

    One of the quintessential UNIX tools is the grep tool. The global regular expression print is a tool that prints lines of text file that match a given regular expression. In this post we will apply JIT compilation to a very simple regular expression matcher by Rob Pike.

    Read on →

  • Toying with GCC JIT – Part 1

    A just-in-time (JIT) compiler is a compiler that in contrast to the usual compilers is not run ahead-of-time, i.e. before running the actual program, but during the program itself.

    Read on →

  • A simple plugin for GCC – Part 3

    In the last chapter, we extended our plugin so it was possible to visualize the control flow graph of a function. In this chapter we will reach our goal to warn unused results of function calls.

    Read on →

  • A simple plugin for GCC – Part 2

    In the last post we set up everything in order to write a GCC plugin. Before we can go on we need to understand a bit how GCC compiles your files.

    Read on →

  • A simple plugin for GCC – Part 1

    GCC's C and C++ compilers provide several extensions to address several programming needs not covered in the standards. One of these is the warn_unused_result attribute. This attribute warns us that we are discarding the result of a function. Unfortunately, for C++ it does not always work as expected.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 25

    In chapter 13 we saw VFPv2 and the fact that it allows vectorial operations on floating-point numbers. You may be wondering if such a similar feature exists for integers. The answer is yes although in a more limited way.

    Read on →

  • When an array is not an array

    The C programming language comes with its own set of warts if we closely examine its syntax and semantics. One of the oddities that puzzles most people is the fact that there are no parameters of array types in C. This fact, though, does not prevent one using the array syntax in a parameter.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 24

    Today we will continue with nested functions.

    Read on →

  • Read DVDs with bogus permissions in Ubuntu

    What if your DVD recorder sets bogus permissions to your DVDs?

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 23

    Today we will see what happens when we nest a function inside another. It seems a harmless thing to do but it happens to come with its own dose of interesting details.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 22

    Several times in previous chapters we have talked about ARM as an architecture that has several features aimed at embedding systems. In embedded systems memory is scarce and expensive, so designs that help reduce the memory footprint are very welcome. Today we will see another of these features: the Thumb instruction set.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 21

    We already know that ARM is a 32-bit architecture: general purpose registers are 32-bit wide and addresses in memory are 32-bit numbers. The natural integer size for an architecture is usually called a word and in ARM is obviously 32-bit integers. Sometimes, though, we need to deal with subword data: integers of size smaller than 32 bits.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 20

    Today we will see how to make indirect calls.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 19

    So far our small assembler programs have output messages using printf and some of them have read input using scanf. These two functions are implemented in the C library, so they are more or less supported in any environment supporting the C language. But how does a program actually communicate with the world?

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 18

    In this chapter we will delve a bit more into the stack.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 17

    In chapter 10 we saw the basics to call a function. In this chapter we will cover more topics related to functions.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 16

    We saw in chapters 6 and 12 several control structures but we left out a usual one: the switch also known as select/case. In this chapter we will see how we can implement it in ARM assembler.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 15

    It may be suprising, but the ARMv6 architecture does not provide an integer division instruction while it does have a floating point instruction in VFPv2. In this chapter we will see usual ways to workaround this limitation with different techniques that can be used in specific scenarios involving divisions.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 14

    In chapter 13 we saw the basic elements of VFPv2, the floating point subarchitecture of ARMv6. In this chapter we will implement a floating point matrix multiply using VFPv2.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 13

    So far, all examples have dealt with integer values. But processors would be rather limited if they were only able to work with integer values. Fortunately they can work with floating point numbers. In this chapter we will see how we can use the floating point facilities of our Raspberry Pi.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 12

    We saw in chapter 6 some simple schemes to implement usual structured programming constructs like if-then-else and loops. In this chapter we will revisit these constructs and exploit a feature of the ARM instruction set that we have not learnt yet.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 11

    Several times, in earlier chapters, I stated that the ARM architecture was designed with the embedded world in mind. Although the cost of the memory is everyday lower, it still may account as an important part of the budget of an embedded system. The ARM instruction set has several features meant to reduce the impact of code size. One of the features which helps in such approach is predication.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 10

    In chapter 9 we were introduced to functions and we saw that they have to follow a number of conventions in order to play nice with other functions. We also briefly mentioned the stack, as an area of memory owned solely by the function. In this chapter we will go in depth with the stack and why it is important for functions.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 9

    In previous chapters we learnt the foundations of ARM assembler: registers, some arithmetic operations, loads and stores and branches. Now it is time to put everything together and add another level of abstraction to our assembler skills: functions.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 8

    In the previous chapter we saw that the second operand of most arithmetic instructions can use a shift operator which allows us to shift and rotate bits. In this chapter we will continue learning the available indexing modes of ARM instructions. This time we will focus on load and store instructions.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 7

    ARM architecture has been for long targeted at embedded systems. Embedded systems usually end being used in massively manufactured products (dishwashers, mobile phones, TV sets, etc). In this context margins are very tight so a designer will always try to spare as much components as possible (a cent saved in hundreds of thousands or even millions of appliances may pay off). One relatively expensive component is memory although every day memory is less and less expensive. Anyway, in constrained memory environments being able to save memory is good and ARM instruction set was designed with this goal in mind. It will take us several chapters to learn all of these techniques, today we will start with one feature usually named shifted operand.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 6

    Control structures

    In the previous chapter we learnt branch instructions. They are really powerful tools because they allow us to express control structures. Structured programming is an important milestone in better computing engineering (a foundational one, but nonetheless an important one). So being able to map usual structured programming constructs in assembler, in our processor, is a Good Thing™.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 5

    Branching

    Until now our small assembler programs execute one instruction after the other. If our ARM processor were only able to run this way it would be of limited use. It could not react to existing conditions which may require different sequences of instructions. This is the purpose of the branch instructions.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 4

    As we advance learning the foundations of ARM assembler, our examples will become longer. Since it is easy to make mistakes, I think it is worth learning how to use GNU Debugger gdb to debug assembler. If you develop C/C++ in Linux and never used gdb, shame on you. If you know gdb this small chapter will explain you how to debug assembler directly.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 3

    We saw in chapter 1 and chapter 2 that we can move values to registers (using mov instruction) and add two registers (using add instruction). If our processor were only able to work on registers it would be rather limited.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 2

    Registers

    At its core, a processor in a computer is nothing but a powerful calculator. Calculations can only be carried using values stored in very tiny memories called registers. The ARM processor in a Raspberry Pi has 16 integer registers and 32 floating point registers. A processor uses these registers to perform integer computations and floating point computations, respectively. We will put floating registers aside for now and eventually we will get back to them in a future installment. Let’s focus on the integer registers.

    Read on →

  • ARM assembler in Raspberry Pi – Chapter 1

    In my opinion, it is much more beneficial learning a high level language than a specific architecture assembler. But I fancied learning some ARM assembler just for fun since I know some 386 assembler. The idea is not to become a master but understand some of the details of what happens underneath.

    Read on →

  • Common linking issues in C++

    Introduction

    C++ is a language derived from C, so in essence all problems at link time boil down at declaring stuff but not defining it.

    Declaring something in C++ means bringing the entity into existence in the program, so it can be used after the declaration point. Defining something means giving a complete description of the entity itself. You can declare a class or a function, and it means this class and this function do exist. But to completely describe a class and a function you have to define them. A class definition provides a list of base classes of that class, a list of members (data members and member functions) of that class, etc. A function definition provides the executable code of that function. All definitions are declarations but not all declarations are definitions.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    
    // Defines variable 'x'
    int x;
    // Declares variable 'y'
    extern int y;
    // Declares class 'A'
    struct A;
    // Declares function 'f(int)'
    void f(int);
    
    // Defines class 'A'
    struct A
    {
        // Declares member function 'A::g(float)'
        void g(float);
    
        // Defines member function 'A::h(char)'
        void h(char) 
        { 
          // Code
        }
    
        // Defines data member 'A::x'
        int x;
    
        // Declares static data member 'A::y'
        static int y;
    };
    
    // Defines  function'f(int)'
    void f(int)
    {
     // Code
    }
    
    // Defines member function 'A::g(float)'
    void A::g(float)
    {
     // Code
    }
    
    // Defines static data member 'A::y'
    int A::y;
    

    C++, in contrast to C, strongly sticks to the One Definition Rule which states that entities can be defined at most once in an entire program. Of course this may not be completely true depending your own the definition of "entity": template functions when instantiated by the compiler can be defined more than once in the program, and some magic happens so this does not become a problem.

    Anyway, C++ brings its own set of linking issues which may fool even the most experienced C++ developer.

    Static data members are only declared inside the class specifier

    Some might argue that this is one of the most common source of linking issues when using C++. Truth be told, static data members are just global variables in disguise so most people will avoid them. However, there are cases where a static data member may come in handy, for instance when implementing the singleton pattern.

    The problem lies that, although usual (nonstatic) data members are defined when they are declared inside a class (like in line 23 of the example above), static data members are only declared. Thus in line 26 of the example above A::y is only being declared. Its actual definition is given in line 42. The actual definition of a static data member will go in the implementation file (usually a .cpp or .cc file).

    So the usual case goes like this: you realize you need a static data member. You add it to the class. Your code compiles fine but does not link. In fact 'A::y', the static data member you just added is undefined? How can this be?

    Now you know the reason.

    What is the reason this issue is hit so many times? Well, there are three reasons. A historical one, where early versions of C++ compilers allowed this. A quirk in the C++ language itself where const integral and enumerator static data members can be declared and initialized in the class itself (thus defining them as well). And finally, a linguistic issue, since in Java and C# static fields are declared like any other fields plus a static specifier.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    
    // -- Header file
    class MySingleton
    {
    public:
        static MySingleton& getInstance()
        {
            if (singleton_ == 0)
                singleton_ = new MySingleton;
            return *singleton_;
        }
    private:
        // Usual private constructor
        MySingleton() { }
        // Declaration
        static MySingleton *singleton_;
    };
    
    // -- Implementation file
    // Definition
    MySingleton* MySingleton::singleton_ = 0;
    

    Not all headers are created equal

    The usual myth is that C++ is a superset of C. Well, it looks like as a superset of C but they are actually two different languages. That said, they share so many thinks that interfacing C++ and C is pretty straightforward, in particular when the former must call the latter (the opposite may be a bit more challenging).

    Thus, it is not unsual to see that a C++ program #includes C header files. Chances are that the headers of your operating system will be in C. Being able to #include a C header and using the entities declared in it is one of the strengths of C++. And this is the source of our second problem.

    Remember that in C++ functions may be overloaded. This means that we can use the same name when declaring two functions in the same scope as long as they have different enough parameter types.

    1
    2
    3
    4
    5
    6
    7
    
    // Declaration of 'f(int)'
    void f(int);
    // Declaration of 'f(float)'
    void f(float);
    // Redeclaration of 'f(int)' since, in a parameter, 'const int' cannot
    // be distinguished from 'int'
    void f(const int);
    

    It may be non obvious, but we cannot give these two functions declared above the same f name. So the compiler crafts an artificial name for f(int) and f(float) (this is called a decorated name or a mangled name). For instance they could be f_1_int and f_1_float (here 1 would mean the number of parameters). The C++ compiler will internally use these names when generating code and the lower levels will just see two diferent names.

    But overloading cannot be applied to C. Thus we run into a problem here. If we #include C headers, the names of these functions cannot be overloaded thus a C compiler will generated code using the (undecorated) name of the function. If our C++ compiler always uses a decorated name, there will be an unresolved symbol. The C++ compiler cannot tell if this is C or C++. Can it?

    Good news, it can. You can define the linkage of declarations in the code. By default linkage is C++ so overload works as described above. When you want to #include a C header, you will have to tell the C++ compiler that the linkage of the declarations is C, not C++. Most of the time you will find these lines in the beginning of a C header intended to be used from C++.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    
    // Remember this is a C header so protect ourselves when this is compiled using C
    #ifdef __cplusplus 
    // This 'extern "C"' syntax is only valid in C++, not in C.
    extern "C" { 
    // From now everything has C linkage. 
    #endif
    
    /* Library declarations in C */
    
    #ifdef __cplusplus 
    // Close the brace opened above
    }
    #endif
    

    Virtual member functions and virtual tables

    Finally one of the, in my opinion, most confusing link errors when using a C++ compilers: virtual table unresolved references.

    Virtual member functions are, in C++ parlance, polymorphic methods of other programming languages (like Java). Virtual member functions can be overridden by derived classes (descendant classes) thus when called, they must be dispatched dinamically.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    
    struct A
    {
        virtual void vmf(float*);
        virtual void vmf2(float*);
    };
    struct B : A
    {
        virtual void vmf(float*);
        virtual void vmf3(float*);
    };
    
    virtual B::vmf(float*)
    {
        // Code
    }
    
    void g(A* pa, float *pfl)
    {
      // Dynamic dispatch 
      // we don't really know if A::vmf or B::vmf will be called
      pa->vmf(pfl);
    
      // Static call to A::vmf since we qualified the function being called
      pa->A::vmf(pfl);
    
      B b;
      // Static call to B::vmf, no doubts here since the dynamic type (in memory)
      // of 'b' and its declared type must be the same
      b.vmf(pfl);
    
      A& ra(*pa);
      // Dynamic dispatch again
      ra.vmf(pfl);
    }
    

    Dynamic dispatch is implemented using a virtual method table (or vtable). Every class with virtual methods (called a dynamic class) has a vtable. This vtable is a sequence of addresses to member functions. Every virtual member function is assigned an index in this table and the addresses points to the function implementing the virtual member function for that class. For instance class A above has two member functions vmf and vmf2. The vtable of A, then will have two entries, 0 and 1, and will point to the functions A::vmf and A::vmf2 respectively. The vtable of B will have three entries, 0, 1, 2, that will point to functions B::vmf, A::vmf2 and B::vmf3 respectively.

    Every object of a dynamic class has a hidden data member (called the virtual pointer) that points to the vtable of its class. When C++ specifies that a call goes through dynamic dispatch (in C++ parlance, a call to the ultimate overrider), we do not call directly any function but instead, through this hidden data member, we reach the vtable and using the index of the virtual member function being called, we retrieve the entry containing the addresses to the real function. Then this addresses is used in an indirect call.

    Since both the virtual table and the virtual pointer are hidden from the eyes of the developer, sometimes errors in our code may cause link errors.

    The compiler does not emit a vtable

    This may not apply to all C++ compilers, but usually a C++ compiler only emits a vtable when it finds a definition of a virtual member function. Note that virtual member function definitions for a given class may be scattered in several files. Magic happens again so more than one definition of the vtable of a given class in several files does not become a problem at link time.

    But, what if you forget to define all virtual functions? This may look contrived but in my experience this may happen by accident. The problem lies on the error at link time, which is really confusing.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    
    struct A
    {
        int x_;
        A(int x) : x_(x) { }
    
        // We forget to define A::foo
        virtual void foo();
    };
    
    void quux(A* a)
    {
        // Dynamic dispatch
        a->foo();
    }
    
    int main(int argc, char * argv[])
    {
        A a(3);
        quux(&a);
    }
    

    If you compile and link this with g++ (I use -g since it improves link error messages by using the debugging information).

    $ g++ -o prova test.cc -g
    /tmp/ccl71r2A.o: In function `A':
    test.cc:4: undefined reference to `vtable for A'
    collect2: ld returned 1 exit status

    But the line 4 is the constructor. You see now how confusing this message is, don't you? What is going on?

    Well, everything makes sense if we remember that hidden data member I mentioned above, the virtual pointer. As a data member of a class it must be initialized in the constructor. It is initialized with the address of the virtual table of A. But the virtual table of A was not emitted since we forgot to define all virtual member functions. Thus, unresolved reference for the virtual table.

    Missing virtual member functions in base classes

    Remember that the vtable contains entries for all the virtual member functions of the base tables. The vtable is statically initialized (this is, the compiler "hardcodes" in the generated code, in the data section) the addresses of each entry. What if we forget to define a virtual member function of a base class?

    Consider this example

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    
    struct A
    {
        int x_;
        A(int x) : x_(x) { }
    
        virtual void foo();
        // We forget to define A::foo2
        virtual void foo2();
    };
    
    void A::foo() 
    {
        // Definition of A::foo
    }
    
    struct B : A 
    {
        B(int x) : A(x) { }
    
        virtual void foo() 
        { 
            // Definition of B::foo
        }
    };
    
    void quux(A* a)
    {
        a->foo();
    }
    
    int main(int argc, char * argv[])
    {
        B b(3);
        quux(&b);
    }
    

    If we compile and link with g++ we get

    /tmp/cc4t9NG3.o:(.rodata._ZTV1B[vtable for B]+0xc): undefined reference to `A::foo2()'
    /tmp/cc4t9NG3.o:(.rodata._ZTV1A[vtable for A]+0xc): undefined reference to `A::foo2()'
    collect2: ld returned 1 exit status

    This happens because vtables of A and B refer to A::foo2, but we forgot to define it. Fortunately, now the error message is easier to grasp: some function is missing.

    Obviously, many more link errors caused by C++ exist, but I think the ones shown here are quite common and the error messages related to them are quite confusing.

  • Crazy stuff in C++ (1)

    Introduction

    C++ is a controversial language: you love it or you hate it. As always, knowing better about something allows one to make better arguments for or against that thing. This is what this series is about. Here I’ll explain some of the less known (except for C++ expert programmers, of course) features of C++.

    Let’s start with templates, which account for such a huge part of the language.

    Templates

    Everyone knows that C++ has templates. Templates is an implementation of the «I have an algorithm that can be used with many different types» idea. This idea is called generic programming and is a pretty powerful one. This is why it is present in almost all modern languages.

    Back to C++. C++ defines two kinds of templates: classes templates and function templates. Class templates define an infinite set of classes while function templates define an infinite set of functions. The elements of these sets of classes or functions are called specializations. Every template has a template-name which will be used to name a specific specialization.

    Template declarations

    Consider these two declarations

    1
    2
    
    template <typename T>
    struct my_list { ... }
    
    1
    2
    
    template <typename T>
    void max(T a, T b) { return a > b ? a : b; }
    

    These are template declarations. The first one declares a class template and its template-name is my_list, the second one defines a function template and its template-name is max. A template declaration is just a declaration preceded with something without an official name that starts with template <…>, I will call it the template header (but note that this name is not blessed by the C++ standard at all, it just makes sense to me call it like this).

    The template header defines what are the parameters of the template class. These are called the template parameters. A type-template parameter, like that T shown above, is a “type variable”. This is the most usual template parameter as it allows to parameterize the declaration over one or more type variables. C++, though, has two more kinds of template parameters: nontype-template parameters and (the funny named) template-template parameter. A nontype-template parameter allows us to parameterize the declaration over a (compile-time) constant integer value. Here “integer value” is a very broad term: of course it includes all integers, but also enum values (enumerators) and addresses of (statically allocated) variables and functions. A template-template parameter allows us to parameterize a declaration over another class template with appropiate template parameters.

    1
    2
    
    template <typename T, int N> // N is a nontype-template parameter
    struct my_fixed_array { };
    
    1
    2
    
    template <template <typename T> MyContainer> // MyContainer is a template-template parameter
    struct adaptor { };
    

    Specializations

    I said above that a class template or function template defines an infinite set of classes or function and that each element of that set was called a specialization. There is a specialization for every possible value that a template parameter can have. Such values are not bounded thus there is an infinite number of specializations (well, we could argue that constant integer values are finite in the language, but types are clearly not finite).

    We give value to template parameters of a template by means of template arguments. These template arguments always appear in what is called a template-id. A template-id is just the template-name followed by a list of template-arguments enclosed in < and >.

    1
    2
    
    my_list<int> l;// Here T has value int, we will write it as T ← int
    max<float>(3.4f, 5.6f); // T ← float
    

    Primary template and partial specializations

    When we first declare a class template or a function template, such declaration defines the primary template.

    1
    2
    
    template <typename T>
    struct my_list { ... };
    
    1
    2
    
    template <typename T>
    void swap(T& a, T& b);
    

    Class templates (but not function templates!) can have an extra set of template declarations called partial specializations. A partial specialization looks like a normal class template declaration but the template-name is now a template-id where the template-arguments partially specialize the given template parameters.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    
    // 1) Partial specialization for "pointer to (any) P" type
    template <typename P>
    struct my_list<P*> { }; 
    
    // 2) Partial specialization for "array of (any) Size of (any) Element"
    template <typename Element, int Size>
    struct my_list<Element[Size]> { };
    
    // 3) Partial specialization for "pointer to function with two parameters 
    // Arg1 and Arg2 returning Ret"
    template <typename Ret, typename Arg1, typename Arg2>
    struct my_list<Ret (*)(Arg1, Arg2)>;
    

    A C++ compiler will always pick the partial specialization (if any) that is “closer” to the one requested in the template arguments. If no partial specialization matches, the primary template is chosen instead. The exact algorithm is not important here.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    
    my_list<int> l0; // will pick the primary template T ← int
    
    my_list<int*> l1; // will pick partial specialization 1) 
    // where P ← int (note that respect to the primary template this is T ← int*)
    
    my_list<int[10]> l2; // will pick partial specialization 2) 
    // where Element ← int and Size ← 10
    
    my_list<int (*)(float, double)> l3; // will pick partial specialization 3) 
    // where Ret ← int, Arg1 ← float and Arg2 ← double
    

    I think this is enough for today regarding C++ templates. More craziness to come. Stay tuned.