ARM assembler in Raspberry Pi – Chapter 26
In this chapter we will talk about a fascinating step that is required to create a program, even when using assembler. Today we will talk about linking.
Linkers, the magic between symbols and addresses
Linkers are an essential yet often forgotten tool. Their main job is sticking together all the pieces that form our program so that it can be executed. The fundamental work of a linker is binding symbolic names to addresses (i.e. physical names). This process is conceptually simple but it is full of interesting details. Linking is a necessary step when separate compilation is used.
Separate compilation and modules
Modules are a mechanism by which programming languages let their users split programs into different logical parts. Modularization requires some amount of support from the tools that implement the programming language. Separate compilation is a mechanism to achieve this. In C, a program may be decomposed into several source files. Usually compiling a C source file generates an object file, thus several source files will lead to several object files. These object files are combined using a linker. The linker generates the final program.
Given that several tools manipulate object files (compilers, assemblers, linkers), a common format comes in handy. There are a few formats available for this purpose, like COFF, Mach-O or ELF. In the UNIX world (including Linux) the most popular format is ELF (Executable and Linking Format). This format is used for object files (called relocatable objects, we will see below why), shared objects (dynamic libraries) and executables (the program itself).
For a linker, an ELF relocatable file is a collection of sections. Sections represent a contiguous chunk of data (which can be anything: instructions, initial values of global variables, debug information, etc). Each section has a name and attributes like whether it has to be allocated in memory, loaded from the image (i.e. the file that contains the program), whether it can be executed, whether it is writable, its size and alignment, etc.
Labels as symbolic names
When we use global variables we have to use the following schema:
    .data
    var: .word 42
    .text
    func:
        /* ... */
        ldr r0, addr_of_var  /* r0 ← &var */
        ldr r0, [r0]         /* r0 ← *r0 */
        /* ... */
    addr_of_var : .word var
The reason is that in ARM we cannot encode the full 32-bit address of a variable inside an instruction. So it makes sense to keep the address in a place, in this case in addr_of_var, that is easy to find from the current instruction. In the case shown above, the assembler replaces the usage of addr_of_var with something like this:

    ldr r0, [pc, #offset]

This means: load the value found at the given offset from pc (which reads as the address of the current instruction plus 8). The assembler computes the right offset here so we do not have to. This is a valid approach because addr_of_var is found in the same section as the instruction, so it will be laid out after the instructions and its distance to them is already known when assembling. It also happens that it is close enough in memory. This addressing mode can encode any 12-bit offset (plus a sign bit), so anything within 4096 bytes (i.e. within 1024 instructions) is addressable this way.
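As a rough illustration of the arithmetic the assembler does for us here, consider the following Python sketch. The helper name literal_offset and the sample addresses are made up for the example; the only facts it relies on are that pc reads as the address of the current instruction plus 8 and that the offset field is 12 bits plus a sign.

```python
def literal_offset(insn_addr, literal_addr):
    """Offset to place in `ldr rX, [pc, #offset]` (hypothetical helper)."""
    pc = insn_addr + 8            # in ARM state pc reads as current instruction + 8
    offset = literal_addr - pc
    if abs(offset) > 0xFFF:       # 12-bit magnitude plus a sign (add/subtract) bit
        raise ValueError("literal is too far away for a pc-relative ldr")
    return offset

# An instruction at offset 0x0 of a section loading a word kept at offset 0x28
print(literal_offset(0x0, 0x28))   # 32
```

If the word were too far away, the sketch raises an error, which is roughly what the assembler would complain about as well.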
But the question that remains is: what does the assembler put in the location designated by addr_of_var? We have written .word var, but what does this mean? The assembler should emit the address of var, but at this point its address is unknown. So the assembler can only emit partial information at this point. This information will be completed later.
Let’s consider a more complex example to see this process in action. Consider the following code that takes two global variables and adds them into a result variable. Then we call a function that we will write in another file. This function will increment the result variable by one. The result variable has to be accessible from the other file, so we will have to mark it as global (similar to what we do with main).
    /* main.s */
    .data

    one_var : .word 42
    another_var : .word 66

    .globl result_var             /* mark result_var as global */
    result_var : .word 0

    .text

    .globl main
    main:
        ldr r0, addr_one_var      /* r0 ← &one_var */
        ldr r0, [r0]              /* r0 ← *r0 */
        ldr r1, addr_another_var  /* r1 ← &another_var */
        ldr r1, [r1]              /* r1 ← *r1 */
        add r0, r0, r1            /* r0 ← r0 + r1 */
        ldr r1, addr_result       /* r1 ← &result */
        str r0, [r1]              /* *r1 ← r0 */
        bl inc_result             /* call to inc_result */
        mov r0, #0                /* r0 ← 0 */
        bx lr                     /* return */

    addr_one_var : .word one_var
    addr_another_var : .word another_var
    addr_result : .word result_var
Let’s create an object file. Recall that an object file is an intermediate file that is used before we create the final program. Once created, we can use objdump -d to see the code contained in this object file. (The use of -march=armv6 avoids emitting some legacy information that would be confusing for the sake of the exposition.)
$ as -march=armv6 -o main.o main.s # creates object file main.o
We said above that the assembler does not know the final value and instead may put some partial information (e.g. the offsets from .data). It also annotates that some fix-up is required here. This fix-up is called a relocation. We can read the relocations using flag -r of objdump (combined here with -d).
$ objdump -dr main.o
    main.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <main>:
       0:   e59f0020    ldr r0, [pc, #32]   ; 28 <addr_one_var>
       4:   e5900000    ldr r0, [r0]
       8:   e59f101c    ldr r1, [pc, #28]   ; 2c <addr_another_var>
       c:   e5911000    ldr r1, [r1]
      10:   e0800001    add r0, r0, r1
      14:   e59f1014    ldr r1, [pc, #20]   ; 30 <addr_result>
      18:   e5810000    str r0, [r1]
      1c:   ebfffffe    bl  0 <inc_result>
                1c: R_ARM_CALL  inc_result
      20:   e3a00000    mov r0, #0
      24:   e12fff1e    bx  lr

    00000028 <addr_one_var>:
      28:   00000000    .word   0x00000000
                28: R_ARM_ABS32 .data

    0000002c <addr_another_var>:
      2c:   00000004    .word   0x00000004
                2c: R_ARM_ABS32 .data

    00000030 <addr_result>:
      30:   00000000    .word   0x00000000
                30: R_ARM_ABS32 result_var
Relocations are rendered in the output above as

    OFFSET: TYPE VALUE

They are also printed right after the point they affect.

OFFSET is the offset inside the section for the bytes that will need fixing up (in this case all of them inside .text).
TYPE is the kind of relocation. The kind of relocation determines which bytes are fixed up and how.
VALUE is a symbolic entity for which we have to figure out the physical address. It can be a real symbol, like result_var, or a section name like .data.
In the current list, there is a relocation at .text+1c so we can call the actual inc_result. The other two relocations, at .text+28 and .text+2c, are the relocations required to access .data. These relocations could have as VALUE the symbols one_var and another_var respectively, but GNU as seems to prefer to represent them as offsets relative to the .data section. Finally, the relocation at .text+30 refers to the global symbol result_var.
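To make the OFFSET: TYPE VALUE shape a bit more tangible, here is a small Python sketch that splits the relocation lines of the listing into their three parts. The regular expression and the function name parse_relocations are assumptions made for this example; they only cope with lines shaped like the ones printed above, not with everything objdump can emit.

```python
import re

# Relocation lines copied from the objdump -dr listing of main.o
listing = """\
1c: R_ARM_CALL inc_result
28: R_ARM_ABS32 .data
2c: R_ARM_ABS32 .data
30: R_ARM_ABS32 result_var
"""

# OFFSET (hexadecimal), TYPE and VALUE, separated by whitespace
RELOC = re.compile(r"^\s*([0-9a-f]+):\s+(R_ARM_\w+)\s+(\S+)\s*$")

def parse_relocations(text):
    """Return (offset, type, value) tuples for lines shaped like OFFSET: TYPE VALUE."""
    relocs = []
    for line in text.splitlines():
        m = RELOC.match(line)
        if m:
            relocs.append((int(m.group(1), 16), m.group(2), m.group(3)))
    return relocs

for offset, kind, value in parse_relocations(listing):
    print(f".text+{offset:#x}: {kind} against {value}")
```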
Every relocation kind is defined in terms of a few parameters:

S is the address of the symbol referred to by the relocation (the VALUE above),
P is the address of the place (the OFFSET plus the address of the section itself),
A (for addend) is the value that the assembler has left in place. In our example, for R_ARM_ABS32 it is the value of the .word, and for R_ARM_CALL it is a set of bits in the bl instruction itself.

Using these parameters, each relocation has a related operation. Relocations of kind R_ARM_ABS32 do an operation S + A. Relocations of kind R_ARM_CALL do an operation (S + A) – P.
Note: ARM relocations have an additional parameter T that has the value 1 if the symbol S is a Thumb function, 0 otherwise. This is not the case for our examples, so I have omitted T in the description of the relocations above.
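The two operations can be written down as a couple of tiny Python functions. This is only a sketch of the formulas: the function names are mine, the sample numbers are made up, and T is included with a default of 0 so that it matches the simplified descriptions used in this chapter.

```python
MASK32 = 0xFFFFFFFF

def r_arm_abs32(S, A, T=0):
    """R_ARM_ABS32: (S + A) | T, truncated to 32 bits."""
    return ((S + A) | T) & MASK32

def r_arm_call(S, A, P, T=0):
    """R_ARM_CALL: ((S + A) | T) - P, as a signed byte offset."""
    return ((S + A) | T) - P

# Made-up numbers, only to exercise the formulas
print(hex(r_arm_abs32(S=0x20000, A=4)))       # 0x20004
print(r_arm_call(S=0x9000, A=-8, P=0x8FF0))   # 8
```

Note that the result of R_ARM_ABS32 is a full 32-bit word, while the result of R_ARM_CALL is a distance that still has to be packed into the bl instruction.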
Before we can see the result computed by the linker, we will define inc_result, otherwise linking will fail. This function will increment the value of result_var (whose storage is defined in the first file, main.s).
    /* inc_result.s */
    .text

    .globl inc_result
    inc_result:
        ldr r1, addr_result  /* r1 ← &result */
        ldr r0, [r1]         /* r0 ← *r1 */
        add r0, r0, #1       /* r0 ← r0 + 1 */
        str r0, [r1]         /* *r1 ← r0 */
        bx lr                /* return */

    addr_result : .word result_var
Let’s check the relocations as well.
$ as -march=armv6 -o inc_result.o inc_result.s
$ objdump -dr inc_result.o

    inc_result.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <inc_result>:
       0:   e59f100c    ldr r1, [pc, #12]   ; 14 <addr_result>
       4:   e5910000    ldr r0, [r1]
       8:   e2800001    add r0, r0, #1
       c:   e5810000    str r0, [r1]
      10:   e12fff1e    bx  lr

    00000014 <addr_result>:
      14:   00000000    .word   0x00000000
                14: R_ARM_ABS32 result_var
We can see that it has a relocation for result_var as expected.
Now we can combine the two object files to generate an executable binary.
$ gcc -o test.exe main.o inc_result.o
And check the contents of the file. Our program will include a few functions from the C library that we can ignore.
$ objdump -d test.exe
    ...
    00008390 <main>:
        8390:   e59f0020    ldr r0, [pc, #32]   ; 83b8 <addr_one_var>
        8394:   e5900000    ldr r0, [r0]
        8398:   e59f101c    ldr r1, [pc, #28]   ; 83bc <addr_another_var>
        839c:   e5911000    ldr r1, [r1]
        83a0:   e0800001    add r0, r0, r1
        83a4:   e59f1014    ldr r1, [pc, #20]   ; 83c0 <addr_result>
        83a8:   e5810000    str r0, [r1]
        83ac:   eb000004    bl  83c4 <inc_result>
        83b0:   e3a00000    mov r0, #0
        83b4:   e12fff1e    bx  lr

    000083b8 <addr_one_var>:
        83b8:   00010578    .word   0x00010578

    000083bc <addr_another_var>:
        83bc:   0001057c    .word   0x0001057c

    000083c0 <addr_result>:
        83c0:   00010580    .word   0x00010580

    000083c4 <inc_result>:
        83c4:   e59f100c    ldr r1, [pc, #12]   ; 83d8 <addr_result>
        83c8:   e5910000    ldr r0, [r1]
        83cc:   e2800001    add r0, r0, #1
        83d0:   e5810000    str r0, [r1]
        83d4:   e12fff1e    bx  lr

    000083d8 <addr_result>:
        83d8:   00010580    .word   0x00010580
    ...
From the output above we can observe that addr_one_var holds address 0x00010578, addr_another_var holds address 0x0001057c and addr_result holds address 0x00010580. The last one appears twice, but this is because both files main.s and inc_result.s refer to it, so each of them needs to keep the address somewhere. Note that in both cases it contains the same address.
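Let’s start with the relocations of addr_one_var, addr_another_var and addr_result. These three relocations were R_ARM_ABS32 so their operation is S + A. For the first two, S is the address at which the .data section of main.o ends up inside the final .data section; for the third, S is the address of result_var. The address of the final .data section can be determined with objdump -h (plus flag -w to make it a bit more readable). A file may contain many sections so I will omit the uninteresting ones.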
$ objdump -hw test.exe
    test.exe:     file format elf32-littlearm

    Sections:
    Idx Name      Size      VMA       LMA       File off  Algn  Flags
    ...
     13 .text     0000015c  000082e4  000082e4  000002e4  2**2  CONTENTS, ALLOC, LOAD, READONLY, CODE
    ...
     23 .data     00000014  00010570  00010570  00000570  2**2  CONTENTS, ALLOC, LOAD, DATA
    ...
VMA defines the address of the section. In our case .data is located at 0x00010570. And our variables are found at 0x00010578, 0x0001057c and 0x00010580. These are offsets 8, 12 and 16 respectively from the beginning of .data. The linker has laid out some other variables in this section before ours. We can see this by asking the linker to print a map of the generated executable.
$ gcc -o test.exe main.o inc_result.o -Wl,--print-map > map.txt
$ cat map.txt
    314  .data           0x00010570       0x14
    315                  0x00010570                PROVIDE (__data_start, .)
    316   *(.data .data.* .gnu.linkonce.d.*)
    317   .data          0x00010570        0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crt1.o
    318                  0x00010570                data_start
    319                  0x00010570                __data_start
    320   .data          0x00010574        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crti.o
    321   .data          0x00010574        0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtbegin.o
    322                  0x00010574                __dso_handle
    323   .data          0x00010578        0xc main.o
    324                  0x00010580                result_var
    325   .data          0x00010584        0x0 inc_result.o
    326   .data          0x00010584        0x0 /usr/lib/arm-linux-gnueabihf/libc_nonshared.a(elf-init.oS)
    327   .data          0x00010584        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtend.o
    328   .data          0x00010584        0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/cr
If you check lines 317 to 322, you will see that the final .data section (that effectively starts at 0x00010570, as we checked above) of our program includes 4 bytes from crt1.o for the symbol data_start (and its alias __data_start). crtbegin.o has also contributed a symbol, __dso_handle. These global symbols come from the C runtime files that gcc links in. From main.o, only the symbol result_var appears here because it is a global symbol; our other variables are not global symbols. The storage, though, is accounted for all of them in line 323. They take 0xc bytes (i.e. 12 bytes, because there are 3 variables of 4 bytes each).
So with this info we can infer what has happened: variable one_var is at address 0x00010578, variable another_var is at 0x0001057c and variable result_var is at 0x00010580. If you check the result of objdump -d test.exe above you will see that

    000083b8 <addr_one_var>:
        83b8:   00010578    .word   0x00010578

    000083bc <addr_another_var>:
        83bc:   0001057c    .word   0x0001057c

    000083c0 <addr_result>:
        83c0:   00010580    .word   0x00010580
    ...
    000083d8 <addr_result>:
        83d8:   00010580    .word   0x00010580
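We can double check the S + A arithmetic with a few lines of Python. The addresses are copied from the linker map and from the objdump listings of this chapter; the variable names are just labels chosen for the sketch.

```python
# Addresses taken from the linker map and the objdump listings shown above
main_data = 0x00010578      # where main.o's .data contribution was placed
result_var = 0x00010580     # address of the global symbol result_var

# (name, S, A): A is the word the assembler left in main.o
relocs = [
    ("addr_one_var",     main_data,  0x0),
    ("addr_another_var", main_data,  0x4),
    ("addr_result",      result_var, 0x0),
]

for name, S, A in relocs:
    # R_ARM_ABS32 is S + A (T is 0 here, there is no Thumb code)
    print(f"{name}: {S + A:#010x}")
```

The three values printed are the ones stored at 83b8, 83bc and 83c0 in the final executable.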
What about the call to inc_result?

    83ac:   eb000004    bl  83c4 <inc_result>

This one is a bit more involved. Recall that the relocation operation is (S + A) - P. Here P is 0x000083ac and S is 0x000083c4, so the distance between them is 24 bytes (83c4 – 83ac is 24 in decimal). The addend A is -8: the assembler left ebfffffe in the object file, whose offset field is -2 words. So (S + A) - P is 16. Instruction bl encodes the offset by shifting it 2 bits to the right, so 16 is stored as 4, which is what we see in eb000004. Recall that the current pc points to the current instruction plus 8 bytes, so this instruction is exactly telling us to jump 16 + 8 = 24 bytes ahead of the current instruction. Exactly what we wanted.
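The same computation can be replayed in Python. This is a sketch under the assumptions already used in the text (a plain ARM bl with a signed 24-bit word offset, and pc reading as the current instruction plus 8); the helper name bl_offset is mine.

```python
def bl_offset(insn):
    """Byte offset encoded in the low 24 bits of an ARM bl instruction."""
    imm24 = insn & 0x00FFFFFF
    if imm24 & 0x00800000:          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return imm24 << 2               # the field counts words, not bytes

P = 0x000083AC                      # address of the bl in test.exe
S = 0x000083C4                      # address of inc_result
A = bl_offset(0xEBFFFFFE)           # addend left by the assembler in main.o

reloc = (S + A) - P                 # R_ARM_CALL
print(A, reloc, reloc >> 2)         # -8 16 4

# What the processor does with the linked instruction eb000004
target = P + 8 + bl_offset(0xEB000004)
print(hex(target))                  # 0x83c4
```

The addend of -8 left by the assembler is what compensates for pc being 8 bytes ahead, which is why the linker does not need to know anything about that detail.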
    ...
        83ac:   eb000004    bl  83c4 <inc_result>
        83b0:   e3a00000    mov r0, #0
        83b4:   e12fff1e    bx  lr

    000083b8 <addr_one_var>:
        83b8:   00010578    .word   0x00010578

    000083bc <addr_another_var>:
        83bc:   0001057c    .word   0x0001057c

    000083c0 <addr_result>:
        83c0:   00010580    .word   0x00010580

    000083c4 <inc_result>:
        83c4:   e59f100c    ldr r1, [pc, #12]   ; 83d8 <addr_result>
    ...
Linkers are a bit of arcana because they must deal with the lowest level parts of code. So sometimes it is hard to find good resources on them.
Ian Lance Taylor, author of gold, wrote a very nice essay on linkers in 20 chapters. If you want a book, Linkers & Loaders is not a bad one. The ELF standard is actually defined in two parts, a generic one and a processor-specific one, including one for ARM.
That’s all for today.