ARM assembler in Raspberry Pi – Chapter 26
In this chapter we will talk about a fascinating step that is required to create a program, even when using assembler: linking.
Linkers, the magic between symbols and addresses
Linkers are an essential yet often forgotten tool. Their main job is sticking together all the pieces that form our program so that it can be executed. The fundamental work of a linker is binding symbolic names to addresses (i.e. physical names). This process is conceptually simple but it is full of interesting details. Linking is a necessary step when separate compilation is used.
Separate compilation and modules
Modules are a mechanism by which programming languages let their users split programs into different logical parts. Modularization requires some amount of support from the tools that implement the programming language. Separate compilation is a mechanism to achieve this. In C, a program may be decomposed into several source files. Usually compiling a C source file generates an object file, thus several source files will lead to several object files. These object files are combined using a linker. The linker generates the final program.
ELF
Given that several tools manipulate object files (compilers, assemblers, linkers) a common format comes in handy. There are a few formats available for this purpose, like COFF, Mach-O or ELF. In the UNIX world (including Linux) the most popular format is ELF (Executable and Linking Format). This format is used for object files (called relocatable objects, we will see below why), shared objects (dynamic libraries) and executables (the program itself).
For a linker, an ELF relocatable file is a collection of sections. Sections represent a contiguous chunk of data (which can be anything: instructions, initial values of global variables, debug information, etc). Each section has a name and attributes like whether it has to be allocated in memory, loaded from the image (i.e. the file that contains the program), whether it can be executed, whether it is writable, its size and alignment, etc.
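To make the three kinds of ELF file a bit more concrete, here is a minimal Python sketch that classifies a file from the first bytes of its header. The function name and the hand-built header are made up for illustration; the field layout and the numeric values (e_type 1, 2 and 3, and 40 for ARM) are the ones from the ELF specification. It only handles 32-bit little-endian files, which is an assumption of this sketch.

```python
import struct

# e_type values from the generic ELF specification.
ELF_TYPES = {1: "relocatable object", 2: "executable", 3: "shared object"}
EM_ARM = 40  # e_machine value for ARM


def describe_elf32_header(header: bytes) -> str:
    """Classify a little-endian ELF32 file from its first 20 bytes.

    Hypothetical helper written for this example, not part of any tool.
    """
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1 or header[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    # e_ident takes 16 bytes; e_type and e_machine are the next two
    # 16-bit fields.
    e_type, e_machine = struct.unpack_from("<HH", header, 16)
    kind = ELF_TYPES.get(e_type, "other")
    machine = "ARM" if e_machine == EM_ARM else "machine %d" % e_machine
    return "%s for %s" % (kind, machine)


# Hand-built header: magic, 32-bit class, little-endian, version 1,
# padding up to 16 bytes, then e_type = 1 (relocatable) and e_machine = 40.
fake_object = (b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
               + struct.pack("<HH", 1, EM_ARM))
print(describe_elf32_header(fake_object))
```

On a real file you would pass the first 20 bytes read from disk instead of the hand-built header.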
Labels as symbolic names
When we use global variables we have to use the following schema:
.data
var: .word 42
.text
func:
    /* ... */
    ldr r0, addr_of_var  /* r0 ← &var */
    ldr r0, [r0]         /* r0 ← *r0 */
    /* ... */
addr_of_var: .word var
The reason is that in ARM we cannot encode the full 32-bit address of a variable inside an instruction. So it makes sense to keep the address in a place, in this case addr_of_var, that is easy to find from the current instruction. In the case shown above, the assembler replaces the use of addr_of_var with something like this:
ldr r0, [pc, #offset]
This means: load the value found at the given offset from the current instruction. The assembler computes the right offset here so we do not have to. This is a valid approach because addr_of_var is found in the same section as the instruction. This means that it will for sure be located after the instructions. It also happens to be close enough in memory. This addressing mode can encode any 12-bit offset (plus a sign bit), so anything within 4096 bytes (i.e. within 1024 instructions) is addressable this way.
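The computation the assembler does for us can be sketched in a few lines of Python. This is only an illustration: the function name and the two addresses are hypothetical, and it assumes ARM (not Thumb) state, where the pc reads as the address of the current instruction plus 8 bytes.

```python
def ldr_literal_offset(instr_addr: int, literal_addr: int) -> int:
    """Offset to use in `ldr rX, [pc, #offset]` (ARM state).

    Hypothetical helper: it mimics what the assembler computes.
    """
    pc = instr_addr + 8  # pc reads as the current instruction plus 8 bytes
    offset = literal_addr - pc
    # A 12-bit magnitude plus a sign bit covers -4095..4095.
    if not -4095 <= offset <= 4095:
        raise ValueError("literal is too far away for a 12-bit offset")
    return offset


# Made-up layout: the ldr sits at 0x1000 and addr_of_var at 0x1010.
print(ldr_literal_offset(0x1000, 0x1010))  # prints 8
```

If the literal were farther away than the 12-bit offset allows, the sketch raises an error, which is roughly what the assembler would report as well.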
But the question that remains is: what does the assembler put in the location designated by addr_of_var? We have written .word var, but what does this mean? The assembler should emit the address of var, but at this point its address is unknown. So the assembler can only emit partial information at this point. This information will be completed later.
An example
Let’s consider a more complex example to see this process in action. Consider the following code that takes two global variables and adds them into a result variable. Then we call a function that we will write in another file. This function will increment the result variable by one. The result variable has to be accessible from the other file, so we will have to mark it as global (similar to what we do with main).
Let’s create an object file. Recall that an object file is an intermediate file that is used before we create the final program. Once created, we can use objdump -d to see the code contained in this object file. (Using -march=armv6 avoids emitting some legacy information that would only confuse the exposition.)
Relocations
We said above that the assembler does not know the final value and instead may put some partial information (e.g. the offsets from .data). It also notes that some fix-up is required here. This fix-up is called a relocation. We can read the relocations using the flags -dr of objdump.
Relocations are rendered in the output above as lines of the form OFFSET: TYPE VALUE. They are also printed right after the point they affect.
OFFSET is the offset inside the section of the bytes that will need fixing up (in this case all of them are inside .text). TYPE is the kind of relocation, which determines which bytes are fixed up and how. VALUE is a symbolic entity whose physical address we have to figure out. It can be a real symbol, like inc_result and result_var, or a section name like .data.
In the current list, there is a relocation at .text+1c so that we can call the actual inc_result. The other two relocations, at .text+28 and .text+2c, are the relocations required to access .data. These relocations could have as VALUE the symbols one_var and another_var respectively, but GNU as seems to prefer to represent them as offsets relative to the .data section. Finally, .text+30 refers to the global symbol result_var.
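As a small illustration of how these entries can be read mechanically, here is a Python sketch that parses relocation lines into (offset, type, value) records. The four sample lines are reconstructed from the description above, and the exact spacing of the real objdump output is an assumption; the class and function names are made up for this example.

```python
from typing import List, NamedTuple


class Relocation(NamedTuple):
    offset: int  # offset inside the section of the bytes to fix up
    kind: str    # relocation type, e.g. R_ARM_ABS32
    value: str   # symbol or section name the relocation refers to


def parse_relocations(text: str) -> List[Relocation]:
    """Parse lines of the assumed form 'OFFSET: TYPE VALUE'."""
    relocations = []
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like an ARM relocation entry.
        if len(fields) != 3 or not fields[0].endswith(":"):
            continue
        if not fields[1].startswith("R_ARM_"):
            continue
        offset = int(fields[0][:-1], 16)  # offsets are printed in hex
        relocations.append(Relocation(offset, fields[1], fields[2]))
    return relocations


# Sample reconstructed from the text, not copied from a real objdump run.
listing = """
1c: R_ARM_CALL inc_result
28: R_ARM_ABS32 .data
2c: R_ARM_ABS32 .data
30: R_ARM_ABS32 result_var
"""
for reloc in parse_relocations(listing):
    print(".text+%x %s %s" % reloc)
```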
Every relocation kind is defined in terms of a few parameters: S is the address of the symbol referred to by the relocation (the VALUE above), P is the address of the place (the OFFSET plus the address of the section itself), and A (for addend) is the value that the assembler has left in place. In our example, for R_ARM_ABS32 it is the value of the .word; for R_ARM_CALL it is a set of bits in the bl instruction itself. Using these parameters, each relocation kind has an associated operation. Relocations of kind R_ARM_ABS32 perform the operation S + A. Relocations of kind R_ARM_CALL perform the operation (S + A) - P.
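The two operations are simple enough to write down as a Python sketch. The function names are made up, the input values are hypothetical and chosen only to exercise the formulas, and the sketch assumes ARM (non-Thumb) symbols.

```python
MASK32 = 0xFFFFFFFF  # relocated words are 32 bits wide


def r_arm_abs32(S: int, A: int) -> int:
    """R_ARM_ABS32: the fixed-up word is S + A."""
    return (S + A) & MASK32


def r_arm_call(S: int, A: int, P: int) -> int:
    """R_ARM_CALL: the byte displacement is (S + A) - P."""
    return (S + A) - P


# Hypothetical values, not taken from any real object file.
print(hex(r_arm_abs32(S=0x20000, A=4)))      # prints 0x20004
print(r_arm_call(S=0x8100, A=0, P=0x80F0))   # prints 16
```

Note that the second function returns a displacement in bytes; how that displacement ends up encoded inside the bl instruction is a separate step.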
Strictly speaking, the ARM ELF specification defines these operations as (S + A) | T and ((S + A) | T) - P, where T has the value 1 if the symbol S is a Thumb function and 0 otherwise. This is not the case in our examples, so I have omitted T in the description of the relocations above.
Before we can see the result computed by the linker, we will define inc_result, otherwise linking will fail. This function will increment the variable whose address is kept in addr_result (its storage is defined in the first file, main.s).
Let’s check the relocations as well.
We can see that it has a relocation for result_var, as expected.
Now we can combine the two object files to generate an executable binary.
And check the contents of the file. Our program will include a few functions from the C library that we can ignore.
From the output above we can observe that addr_one_var holds the address 0x00010578, addr_another_var holds the address 0x0001057c and addr_result holds the address 0x00010580. The last one appears twice, but this is because both files main.s and inc_result.s refer to it, so each needs to keep the address somewhere. Note that in both cases it contains the same address.
Let’s start with the relocations of addr_one_var, addr_another_var and addr_result. These three relocations were R_ARM_ABS32, so their operation is S + A. S is the address of section .data, which can also be determined with objdump -h (plus the flag -w to make it a bit more readable). A file may contain many sections so I will omit the uninteresting ones.
Column VMA gives the address of the section. In our case .data is located at 0x00010570, and our variables are found at 0x00010578, 0x0001057c and 0x00010580. These are offsets 8, 12 and 16 respectively from the beginning of .data. The linker has laid out some other variables in this section before ours. We can see this by asking the linker to print a map of the generated executable.
314 .data 0x00010570 0x14
315 0x00010570 PROVIDE (__data_start, .)
316 *(.data .data.* .gnu.linkonce.d.*)
317 .data 0x00010570 0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crt1.o
318 0x00010570 data_start
319 0x00010570 __data_start
320 .data 0x00010574 0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crti.o
321 .data 0x00010574 0x4 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtbegin.o
322 0x00010574 __dso_handle
323 .data 0x00010578 0xc main.o
324 0x00010580 result_var
325 .data 0x00010584 0x0 inc_result.o
326 .data 0x00010584 0x0 /usr/lib/arm-linux-gnueabihf/libc_nonshared.a(elf-init.oS)
327 .data 0x00010584 0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/crtend.o
328 .data 0x00010584 0x0 /usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/cr
If you check lines 317 to 322, you will see that the final .data section of our program (which indeed starts at 0x00010570, as we checked above) includes 4 bytes from crt1.o for the symbol data_start (and its alias __data_start). File crtbegin.o has also contributed a symbol, __dso_handle. These global symbols come from the C library. Of our variables, only result_var appears here, because it is a global symbol; our other variables are not global symbols. The storage for all of them, though, is accounted for in line 323. They take 0xc bytes (i.e. 12 bytes: 3 variables of 4 bytes each).
So with this info we can infer what has happened: variable one_var is at address 0x00010578, variable another_var is at 0x0001057c and variable result_var is at 0x00010580. If you check the result of objdump -d test.exe above you will see that these are exactly the addresses stored in addr_one_var, addr_another_var and addr_result.
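The layout reported by the linker map can be replayed with a little arithmetic. The following Python sketch places each .data contribution right after the previous one, using the sizes from the map, and then derives the addresses of our three variables. It ignores alignment padding and assumes the variables are defined in main.s in the order one_var, another_var, result_var; both are assumptions of the sketch, and the function name is made up.

```python
DATA_START = 0x00010570  # address of .data in the linker map

# (input file, size in bytes) in the order listed by the linker map.
contributions = [
    ("crt1.o", 0x4),
    ("crti.o", 0x0),
    ("crtbegin.o", 0x4),
    ("main.o", 0xC),
    ("inc_result.o", 0x0),
]


def layout(start, chunks):
    """Place each chunk right after the previous one (no padding)."""
    addresses, cursor = {}, start
    for name, size in chunks:
        addresses[name] = cursor
        cursor += size
    return addresses


placed = layout(DATA_START, contributions)
main_data = placed["main.o"]

# Assumed definition order of the three 4-byte variables in main.s.
variables = {}
for index, name in enumerate(["one_var", "another_var", "result_var"]):
    variables[name] = main_data + 4 * index
    print("%-12s 0x%08x (.data + %d)"
          % (name, variables[name], variables[name] - DATA_START))
```

The address computed for result_var matches the one the map prints for that symbol, and the three offsets from the start of .data come out as 8, 12 and 16.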
What about the call to inc_result?
This one is a bit more involved. Recall that the relocation operation is (S + A) - P. Here A is 0, P is 0x000083ac and S is 0x000083c4. So the relocation has to define an offset of 24 bytes (0x83c4 - 0x83ac is 24 in decimal). Instruction bl encodes the offset shifted 2 bits to the right, so the offset encoded in eb000004 is 16 (4 shifted 2 bits to the left). Recall that the current pc points to the current instruction plus 8 bytes, so this instruction is telling us to jump 16 + 8 = 24 bytes past the bl itself. Exactly what we wanted.
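We can double check this arithmetic by decoding the instruction ourselves. The Python sketch below extracts the 24-bit field of an ARM-state bl, sign-extends it, scales it to bytes and adds the pc bias of 8. The function name is hypothetical; the addresses and the encoding are the ones discussed above.

```python
def decode_bl_target(instr_addr: int, encoding: int) -> int:
    """Return the branch target of an ARM-state bl instruction."""
    imm24 = encoding & 0x00FFFFFF   # low 24 bits hold the offset field
    if imm24 & 0x00800000:          # sign-extend the 24-bit field
        imm24 -= 1 << 24
    offset = imm24 << 2             # the field counts words, not bytes
    return instr_addr + 8 + offset  # pc is the current instruction plus 8


P = 0x000083AC  # address of the bl instruction
S = 0x000083C4  # address of inc_result

target = decode_bl_target(P, 0xEB000004)
print(hex(target))   # prints 0x83c4
print(target - P)    # prints 24
```

The decoded target is the address of inc_result, 24 bytes past the bl, which agrees with the reasoning in the text.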
More information
Linkers are a bit of arcana because they must deal with the lowest-level parts of code. So sometimes it is hard to find good resources on them.
Ian Lance Taylor, author of gold, wrote a very nice essay on linkers in 20 chapters. If you want a book, Linkers & Loaders is not a bad one. The ELF standard is actually defined in two parts, a generic one and a processor-specific one, including one for ARM.
That's all for today.