ARM assembler in Raspberry Pi

In this chapter we will talk about a fascinating step that is required to create a program, even when using assembler. Today we will talk about linking.

Linkers, the magic between symbols and addresses

Linkers are an essential yet often forgotten tool. Their main job is sticking together all the pieces that form our program in such a way that it can be executed. The fundamental work of a linker is binding symbolic names to addresses (i.e. physical names). This process is conceptually simple but it is full of interesting details. Linking is a necessary step when separate compilation is used.

Separate compilation and modules

Modules are a mechanism by which programming languages let their users split programs into different logical parts. Modularization requires some amount of support from the tools that implement the programming language. Separate compilation is a mechanism to achieve this. In C, a program may be decomposed into several source files. Usually compiling a C source file generates an object file, thus several source files will lead to several object files. These object files are combined using a linker. The linker generates the final program.

ELF

Given that several tools manipulate object files (compilers, assemblers, linkers) a common format comes in handy. There are a few formats available for this purpose like COFF, Mach-O or ELF. In the UNIX world (including Linux) the most popular format is ELF (Executable and Linking Format). This format is used for object files (called relocatable objects, we will see below why), shared objects (dynamic libraries) and executables (the program itself).

For a linker, an ELF relocatable file is a collection of sections. Sections represent a contiguous chunk of data (which can be anything: instructions, initial values of global variables, debug information, etc). Each section has a name and attributes like whether it has to be allocated in memory, loaded from the image (i.e. the file that contains the program), whether it can be executed, whether it is writable, its size and alignment, etc.
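The three kinds of ELF file mentioned above can be told apart by the e_type field of the ELF header. The following Python sketch reads that field. The helper name elf_kind and the hand-made header are only for illustration; with a real file you would pass its first bytes instead.

```python
import struct

# e_type values defined by the generic ELF specification.
ELF_TYPES = {1: "relocatable object", 2: "executable", 3: "shared object"}

def elf_kind(header: bytes) -> str:
    """Classify an ELF file from (at least) the first 18 bytes of its header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] is EI_DATA: 1 means little endian, 2 means big endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field right after the 16 bytes of e_ident.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return ELF_TYPES.get(e_type, "other")

# Hypothetical, hand-made header: 32-bit, little endian, e_type = 1,
# e_machine = 40 (ARM). A real header is longer than this.
fake_header = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9) + struct.pack("<HH", 1, 40)
print(elf_kind(fake_header))
```

For an actual object file, something like elf_kind(open("main.o", "rb").read(18)) should work. Note that position independent executables also use the same e_type as shared objects.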

Labels as symbolic names

When we use global variables we have to use the following schema:

.data
var: .word 42
.text
func:
    /* ... */
    ldr r0, addr_of_var  /* r0 ← &var */
    ldr r0, [r0]         /* r0 ← *r0 */
    /* ... */
addr_of_var : .word var

The reason is that in ARM instructions we cannot encode the full 32-bit address of a variable inside an instruction. So it makes sense to keep the address in a place, in this case in addr_of_var, where it is easy to find from the current instruction. In the case shown above, the assembler replaces the usage of addr_of_var with something like this:

   ldr r0, [pc, #offset]

This means: load the value found at the given offset from the current instruction. The assembler computes the right offset here so we do not have to. This is a valid approach because addr_of_var is found in the same section as the instruction, so the distance between them is fixed and known to the assembler. In our code it is located after the instructions, and it happens to be close enough in memory. This addressing mode can encode any 12-bit offset (plus a sign bit) so anything within 4096 bytes (i.e. within 1024 instructions) is addressable this way.
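As a rough illustration of what the assembler computes here, this Python sketch derives the offset of a pc-relative load. It assumes ARM (not Thumb) code, where reading pc yields the address of the current instruction plus 8. The function name is made up, and the sample addresses are the ones that appear in the main.o listing later in this chapter.

```python
def pc_relative_offset(instr_addr: int, target_addr: int) -> int:
    """Offset to encode in 'ldr rX, [pc, #offset]' to reach target_addr."""
    pc = instr_addr + 8          # in ARM mode pc reads as instruction + 8
    offset = target_addr - pc
    # 12 bits of magnitude plus a sign (add/subtract) bit.
    if not -4095 <= offset <= 4095:
        raise ValueError("target too far away for a pc-relative load")
    return offset

print(pc_relative_offset(0x00, 0x28))  # 32
print(pc_relative_offset(0x08, 0x2C))  # 28
print(pc_relative_offset(0x14, 0x30))  # 20
```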

But the question that remains is, what does the assembler put in the location designated by addr_of_var? We have written .word var but what does this mean? The assembler should emit the address of var, but at this point its address is unknown. So the assembler can only emit partial information at this point. This information will be completed later.

An example

Let’s consider a more complex example to see this process in action. Consider the following code that takes two global variables and adds them into a result variable. Then we call a function, that we will write in another file. This function will increment the result variable by one. The result variable has to be accessible from the other file, so we will have to mark it as global (similar to what we do with main).

/* main.s */
.data

one_var : .word 42
another_var : .word 66

.globl result_var             /* mark result_var as global */
result_var : .word 0

.text

.globl main
main:
    ldr r0, addr_one_var      /* r0 ← &one_var */
    ldr r0, [r0]              /* r0 ← *r0 */
    ldr r1, addr_another_var  /* r1 ← &another_var */
    ldr r1, [r1]              /* r1 ← *r1 */
    add r0, r0, r1            /* r0 ← r0 + r1 */
    ldr r1, addr_result       /* r1 ← &result */
    str r0, [r1]              /* *r1 ← r0 */
    bl inc_result             /* call to inc_result */
    mov r0, #0                /* r0 ← 0 */
    bx lr                     /* return */
   

addr_one_var  : .word one_var
addr_another_var  : .word another_var
addr_result  : .word result_var

Let’s create an object file. Recall that an object file is an intermediate file that is used before we create the final program. Once created, we can use objdump -d to see the code contained in this object file. (Using -march=armv6 avoids emitting some legacy information that would only confuse the exposition.)

$ as -march=armv6 -o main.o main.s      # creates object file main.o

Relocations

We said above that the assembler does not know the final value and instead may put some partial information (e.g. the offsets from .data). It also annotates that some fix up is required here. This fix up is called a relocation. We can read the relocations using flags -dr of objdump.

$ objdump -dr main.o

main.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <main>:
   0:	e59f0020 	ldr	r0, [pc, #32]	; 28 <addr_one_var>
   4:	e5900000 	ldr	r0, [r0]
   8:	e59f101c 	ldr	r1, [pc, #28]	; 2c <addr_another_var>
   c:	e5911000 	ldr	r1, [r1]
  10:	e0800001 	add	r0, r0, r1
  14:	e59f1014 	ldr	r1, [pc, #20]	; 30 <addr_result>
  18:	e5810000 	str	r0, [r1]
  1c:	ebfffffe 	bl	0 <inc_result>
			1c: R_ARM_CALL	inc_result
  20:	e3a00000 	mov	r0, #0
  24:	e12fff1e 	bx	lr

00000028 <addr_one_var>:
  28:	00000000 	.word	0x00000000
			28: R_ARM_ABS32	.data

0000002c <addr_another_var>:
  2c:	00000004 	.word	0x00000004
			2c: R_ARM_ABS32	.data

00000030 <addr_result>:
  30:	00000000 	.word	0x00000000
			30: R_ARM_ABS32	result_var

Relocations are rendered in the output above as

			OFFSET: TYPE	VALUE

They are also printed right after the point they affect.

OFFSET is the offset inside the section of the bytes that will need fixing up (in this case all of them are inside .text). TYPE is the kind of relocation. The kind of relocation determines which bytes are fixed up and how. VALUE is a symbolic entity for which we have to figure out the physical address. It can be a real symbol, like inc_result and result_var, or a section name like .data.

In the current list, there is a relocation at .text+1c so we can call the actual inc_result. The other two relocations in .text+28, .text+2c are the relocations required to access .data. These relocations could have as VALUE the symbols one_var and another_var respectively but GNU as seems to prefer to represent them as offsets relative to .data section. Finally .text+30 refers to the global symbol result_var.
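If you want to collect these annotations programmatically, a small Python sketch like the one below can pick them out of the objdump -dr output. The regular expression is only my reading of the layout printed above, not an official description of the format, and parse_relocation is a made-up name.

```python
import re

# Matches lines such as "\t\t\t1c: R_ARM_CALL\tinc_result".
RELOC_LINE = re.compile(r"^\s*([0-9a-f]+):\s+(R_ARM_\w+)\s+(\S+)\s*$")

def parse_relocation(line: str):
    """Return (offset, type, value) for a relocation line, else None."""
    m = RELOC_LINE.match(line)
    if m is None:
        return None
    offset, kind, value = m.groups()
    return int(offset, 16), kind, value

print(parse_relocation("\t\t\t1c: R_ARM_CALL\tinc_result"))
print(parse_relocation("\t\t\t28: R_ARM_ABS32\t.data"))
# Ordinary disassembly lines are not relocations, so they yield None.
print(parse_relocation("  1c:\tebfffffe \tbl\t0 <inc_result>"))
```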

Every relocation kind is defined in terms of a few parameters: S is the address of the symbol referred to by the relocation (the VALUE above), P is the address of the place being fixed up (the OFFSET plus the address of the section itself) and A (for addend) is the value that the assembler has left in place. In our example, for R_ARM_ABS32 it is the value of the .word, and for R_ARM_CALL it is a set of bits in the bl instruction itself. Using these parameters, each relocation kind has an associated operation. Relocations of kind R_ARM_ABS32 compute S + A. Relocations of kind R_ARM_CALL compute (S + A) - P.

Due to Thumb, ARM relocations have an extra parameter T that has the value 1 if the symbol S is a Thumb function, 0 otherwise. This is not the case for our examples, so I have omitted T in the description of the relocations above.
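The two operations can be written down as a minimal Python sketch. It ignores T, as the text does, and performs no range checking. The function names and the addresses used in the example calls are made up for illustration; they are not taken from our program.

```python
MASK32 = 0xFFFFFFFF

def r_arm_abs32(s: int, a: int) -> int:
    """S + A, truncated to the 32 bits of the word being fixed up."""
    return (s + a) & MASK32

def r_arm_call(s: int, a: int, p: int) -> int:
    """(S + A) - P, a signed distance in bytes relative to the place P."""
    return (s + a) - p

# Hypothetical values: a symbol at 0x20000 referenced with addend 4,
# and a call from place 0x8000 to a function at 0x9000 with addend 0.
print(hex(r_arm_abs32(0x20000, 4)))    # 0x20004
print(r_arm_call(0x9000, 0, 0x8000))   # 4096
```

The absolute relocation does not depend on where the fixed-up word lives, while the call relocation does: that is why P only shows up in the second operation.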

Before we can see the result computed by the linker, we have to define inc_result, otherwise linking will fail. This function will increment the value of result_var (whose storage is defined in the first file, main.s).

/* inc_result.s */
.text

.globl inc_result
inc_result:
    ldr r1, addr_result  /* r1 ← &result */
    ldr r0, [r1]         /* r0 ← *r1 */
    add r0, r0, #1       /* r0 ← r0 + 1 */
    str r0, [r1]         /* *r1 ← r0 */
    bx lr                /* return */

addr_result  : .word result_var

Let’s check the relocations as well.

$ as -march=armv6 -o inc_result.o inc_result.s
$ objdump -dr inc_result.o

inc_result.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <inc_result>:
   0:	e59f100c 	ldr	r1, [pc, #12]	; 14 <addr_result>
   4:	e5910000 	ldr	r0, [r1]
   8:	e2800001 	add	r0, r0, #1
   c:	e5810000 	str	r0, [r1]
  10:	e12fff1e 	bx	lr

00000014 <addr_result>:
  14:	00000000 	.word	0x00000000
			14: R_ARM_ABS32	result_var

We can see that it has a relocation for result_var as expected.

Now we can combine the two object files to generate an executable binary.

$ gcc -o test.exe main.o inc_result.o

And check the contents of the file. Our program will include a few functions from the C library that we can ignore.

$ objdump -d test.exe

...
00008390 <main>:
    8390:       e59f0020        ldr     r0, [pc, #32]   ; 83b8 <addr_one_var>
    8394:       e5900000        ldr     r0, [r0]
    8398:       e59f101c        ldr     r1, [pc, #28]   ; 83bc <addr_another_var>
    839c:       e5911000        ldr     r1, [r1]
    83a0:       e0800001        add     r0, r0, r1
    83a4:       e59f1014        ldr     r1, [pc, #20]   ; 83c0 <addr_result>
    83a8:       e5810000        str     r0, [r1]
    83ac:       eb000004        bl      83c4 <inc_result>
    83b0:       e3a00000        mov     r0, #0
    83b4:       e12fff1e        bx      lr

000083b8 <addr_one_var>:
    83b8:       00010578        .word   0x00010578

000083bc <addr_another_var>:
    83bc:       0001057c        .word   0x0001057c

000083c0 <addr_result>:
    83c0:       00010580        .word   0x00010580

000083c4 <inc_result>:
    83c4:       e59f100c        ldr     r1, [pc, #12]   ; 83d8 <addr_result>
    83c8:       e5910000        ldr     r0, [r1]
    83cc:       e2800001        add     r0, r0, #1
    83d0:       e5810000        str     r0, [r1]
    83d4:       e12fff1e        bx      lr

000083d8 <addr_result>:
    83d8:       00010580        .word   0x00010580

...

From the output above we can observe that addr_one_var contains address 0x00010578, addr_another_var contains 0x0001057c and addr_result contains 0x00010580. The last one appears twice because both files main.s and inc_result.s refer to result_var, so each one needs to keep its address somewhere. Note that in both cases it contains the same address.

Let’s start with the relocations of addr_one_var, addr_another_var and addr_result. These three relocations were R_ARM_ABS32 so their operation is S + A. For the first two, S is the address of section .data, which can be determined with objdump -h (plus flag -w to make it a bit more readable). A file may contain many sections so I will omit the uninteresting ones.

$ objdump -hw test.exe

test.exe:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn  Flags
...
 13 .text         0000015c  000082e4  000082e4  000002e4  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
...
 23 .data         00000014  00010570  00010570  00000570  2**2  CONTENTS, ALLOC, LOAD, DATA
...

Column VMA defines the address of the section. In our case .data is located at 00010570. And our variables are found at 0x00010578, 0x0001057c and 0x00010580. These are offsets 8, 12 and 16 respectively from the beginning of .data. The linker has laid out some other variables in this section before ours. We can see this by asking the linker to print a map of the generated executable.

$ gcc -o test.exe main.o inc_result.o -Wl,--print-map > map.txt
$ cat map.txt

314  .data           0x00010570       0x14
315                  0x00010570                PROVIDE (__data_start, .)
316   *(.data .data.* .gnu.linkonce.d.*)
317   .data          0x00010570        0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crt1.o
318                  0x00010570                data_start
319                  0x00010570                __data_start
320   .data          0x00010574        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crti.o
321   .data          0x00010574        0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtbegin.o
322                  0x00010574                __dso_handle
323   .data          0x00010578        0xc main.o
324                  0x00010580                result_var
325   .data          0x00010584        0x0 inc_result.o
326   .data          0x00010584        0x0 /usr/lib/arm-linux-gnueabihf/libc_nonshared.a(elf-init.oS)
327   .data          0x00010584        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtend.o
328   .data          0x00010584        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/cr

If you check lines 317 to 322, you will see that the final .data section of our program (which indeed starts at 0x00010570, as we checked above) includes 4 bytes from crt1.o for the symbol data_start (and its alias __data_start). File crtbegin.o has also contributed a symbol, __dso_handle. These global symbols come from the C library. Of our variables, only result_var appears here because it is a global symbol; the others are not global symbols. Their storage, though, is accounted for in line 323: all of them together take 0xc bytes (i.e. 12 bytes, since there are 3 variables of 4 bytes each).
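The layout shown in the map can be reproduced with a few lines of Python. This is a simplified sketch: it just places each .data contribution after the previous one and ignores alignment, which is harmless here because every size is a multiple of 4. The sizes come from the map above, while the helper lay_out and the per-variable offsets (our three .word values in the order they are defined in main.s) are my own bookkeeping.

```python
DATA_START = 0x00010570

# (input file, size of its .data contribution), in link order.
contributions = [
    ("crt1.o", 0x4),
    ("crti.o", 0x0),
    ("crtbegin.o", 0x4),
    ("main.o", 0xC),
    ("inc_result.o", 0x0),
]

def lay_out(start, parts):
    """Place each contribution right after the previous one."""
    placed = {}
    addr = start
    for name, size in parts:
        placed[name] = addr
        addr += size
    return placed, addr

placed, end = lay_out(DATA_START, contributions)
print(hex(placed["main.o"]))      # 0x10578
print(hex(end - DATA_START))      # 0x14, the size of the final .data

# Offsets of our variables inside the .data of main.o.
for name, offset in [("one_var", 0), ("another_var", 4), ("result_var", 8)]:
    print(name, hex(placed["main.o"] + offset))
```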

So with this info we can infer what has happened: variable one_var is at address 0x00010578, variable another_var is at 0x0001057c and variable result_var is at 0x00010580. If you check the result of objdump -d test.exe above you will see these very addresses:

000083b8 <addr_one_var>:
    83b8:       00010578        .word   0x00010578

000083bc <addr_another_var>:
    83bc:       0001057c        .word   0x0001057c

000083c0 <addr_result>:
    83c0:       00010580        .word   0x00010580
...
000083d8 <addr_result>:
    83d8:       00010580        .word   0x00010580

What about the call to inc_result?

    83ac:       eb000004        bl      83c4

This one is a bit more involved. Recall that the relocation operation is (S + A) - P. Here S is 0x000083c4 and P is 0x000083ac, so the distance between the bl and its target is 24 bytes (0x83c4 - 0x83ac is 24 in decimal). A is the addend that the assembler left in the bl of main.o: the low 24 bits of ebfffffe are fffffe, that is -2 words or -8 bytes. Thus (S + A) - P is 24 - 8 = 16 bytes. Instruction bl encodes this offset shifted 2 bits to the right, so eb000004 encodes 16 / 4 = 4. The addend of -8 is there because, when the instruction runs, pc points to the current instruction plus 8 bytes: the branch goes to pc + 16, which is the current instruction + 24 bytes. Exactly what we wanted.
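We can double check this decoding with a short Python sketch. It assumes an ARM (not Thumb) bl, whose low 24 bits hold a signed offset counted in words and relative to pc, which reads as the address of the instruction plus 8. The name decode_bl is made up.

```python
def decode_bl(instr: int, address: int) -> int:
    """Return the target of the ARM bl instruction 'instr' located at 'address'."""
    imm24 = instr & 0x00FFFFFF
    if imm24 & 0x00800000:       # sign-extend the 24-bit field
        imm24 -= 1 << 24
    offset = imm24 * 4           # the field counts words, not bytes
    return address + 8 + offset  # pc is two instructions ahead

print(hex(decode_bl(0xEB000004, 0x83AC)))  # 0x83c4
```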

...
    83ac:       eb000004        bl      83c4 <inc_result>
    83b0:       e3a00000        mov     r0, #0
    83b4:       e12fff1e        bx      lr

000083b8 <addr_one_var>:
    83b8:       00010578        .word   0x00010578

000083bc <addr_another_var>:
    83bc:       0001057c        .word   0x0001057c

000083c0 <addr_result>:
    83c0:       00010580        .word   0x00010580

000083c4 <inc_result>:
    83c4:       e59f100c        ldr     r1, [pc, #12]   ; 83d8 <addr_result>

...

More information

Linkers are a bit arcane because they must deal with the lowest-level parts of code. So sometimes it is hard to find good resources on them.

Ian Lance Taylor, author of gold, wrote a very nice essay on linkers in 20 chapters. If you want a book, Linkers & Loaders is not a bad one. The ELF standard is actually defined in two parts, a generic one and a processor specific one, including one for ARM.

That's all for today.