Think In Geek

In geek we trust

# Exploring AArch64 assembler – Chapter 7

In the previous installment of this series we saw how to alter the sequencing of our programs. Today we will see how we can reuse instructions by means of branches. Let’s talk about functions.

## Routines

In the process of solving a problem with a computer, we will find that some steps are performed repeatedly. These steps, probably part of an algorithm, can be encoded using instructions, which means that we may end up with many sequences of instructions that serve the same purpose. What if we could factor these instructions out into a single place and use them when needed? This is the fundamental idea behind a routine. We rarely use the word routine nowadays, and most programming languages use other names like functions, procedures, subroutines, methods, lambda expressions, etc. There are differences between them, but all of them embody the idea of reusing code. I will use the common term function.

## Using a function

Functions can be handled like values (e.g. like an integer value), and there are a few operations we can do with them. At this level, though, we will only care about two of them: getting the address of the function, which we will usually do trivially by using a label, and calling the function, which is the interesting bit.

A function is a sequence of instructions that we will reuse. We will identify a function by the address of its first instruction. Most of the time a label will designate the function since, recall, a label is an address.
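To make the idea that a function is identified by an address concrete, here is a minimal sketch that takes the address of a function through its label and then calls it through a register. It assumes two instructions this chapter does not otherwise use: `adr`, which computes a PC-relative address into a register, and `blr`, which branches to the address held in a register and, like `bl`, sets `x30` to the return address. The function name `answer` is hypothetical, the result travels in `w0` following the convention described later in this chapter, and `x30` is kept in a global variable, the same trick the hello example uses. It assumes GNU as on AArch64 Linux, built like the hello program (for instance with `gcc -no-pie`).

```asm
    .data
    .balign 8
keep_x30: .dword 0               // storage for x30, as in the hello example

    .text
/* answer: a tiny (hypothetical) function that returns 42 in w0 */
answer:
    mov w0, #42                  // w0 ← 42
    ret                          // branch back to the address in x30

    .globl main
main:
    ldr x1, addr_keep_x30        // x1 ← &keep_x30
    str x30, [x1]                // *keep_x30 ← x30 (blr will overwrite x30)

    adr x2, answer               // x2 ← address of answer: the label is just an address
    blr x2                       // call the function whose address is in x2; x30 ← return address

    ldr x1, addr_keep_x30        // x1 ← &keep_x30
    ldr x30, [x1]                // x30 ← *keep_x30
    ret                          // w0 is still 42, so the exit status of the program is 42

    .balign 8
addr_keep_x30: .dword keep_x30
```

Running the program and checking its exit status (`echo $?` in the shell) should show the value returned by `answer`.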

### Calling a function

Calling a function in practice means passing some data to the function and then branching to its address. When the function ends, it branches back to the caller.

A call to a function is, at its core, a branch, but it is a special branch that is so frequent that it is worth devoting an instruction to it. In AArch64 this instruction is `bl`, which stands for branch with link. It is an unconditional branch that, in addition to branching, sets `x30` to the address of the instruction that follows the `bl`. Recall that `x30` is a general-purpose register, but here we are giving it a special meaning: it contains the address the function must branch back to when it ends. For historical reasons, when `x30` is used for this purpose it is called the link register.

To return from a function, then, all we have to do is branch to the address in `x30`. The instruction `br` unconditionally branches to the address stored in a register, so a way to call a function and just return from it is the following.

```
.text
my_function:
    br x30

caller:
    bl my_function
    // more instructions ...
```

But returning from a function is also a very common operation, so rather than writing `br x30` we can just write `ret`, which by default branches to the address in `x30`.

```
.text
my_function:
    ret

caller:
    bl my_function
    // more instructions ...
```

## Parameter passing

OK, so we know the basics of calling a function and returning from it. But this only lets us reuse sequences of instructions that always do the same thing. Often we will want to parameterize the behaviour of the function: for instance, we may want a function that computes the average of two given numbers. This means that when calling the function we need a mechanism to pass those parameters to it.

At this point we can consider several approaches. A first approach is to use global variables: the caller first stores the values there, and the function reads them and leaves the result in some other global variable. This works, kind of, and some early programming languages worked this way. The problem is that this mechanism composes badly: it rules out recursion and, even worse, it does not work in multithreaded environments. In a modern setting this approach is rarely used.

Another alternative is to use a private area of memory reserved for calling functions and give it a stack discipline: we can only put or remove things at its top (though we can access the i-th element below the top of the stack). Before calling a function we put the arguments on top of the stack. The function can access the top of the stack and the elements below it to retrieve the parameters. The result can also be put on the stack; for instance, the function can replace the parameters with the result value, so the caller only has to check the top of the stack again. This technique works well for recursion and also for multithreading. It is used in some programming languages and in architectures with a severe shortage of registers (like 32-bit x86).

A hybrid approach, the one commonly used in RISC architectures, devotes a few registers to parameter passing and, if we run out of them, falls back to a stack-like approach like the one described above. This works well because most function calls involve only a few parameters, usually fewer than four or so. And given that AArch64 has around 30 general-purpose registers, it makes sense to devote a few of them to parameter passing. Which ones? Well, this is a matter of convention, and as a convention it must be agreed upon first.

We could use any convention (even make our own), but AArch64 already has one, described in the Procedure Call Standard for the ARM 64-bit Architecture (the PCS, for short). That is a very long document with lots of details. For the purpose of this series we will simplify the convention as follows:

• Registers `x0` to `x7` are used to pass parameters and return values. The called function (the callee) may freely modify these registers, so the caller cannot assume anything about their contents after the call, even for those not used for parameter passing or for the returned value. In practice this means that these registers are caller-saved.
• Registers `x8` to `x18` are temporary registers for every function. As such, no assumption can be made about their values upon returning from a function. In practice these registers are also caller-saved.
• Registers `x19` to `x28` are registers that, if used by a function, must have their values saved and then restored before returning from that function. These registers are known as callee-saved.
• We already know that register `x30` is the link register, and its value must be preserved until the function uses the `ret` instruction to return to the caller.

This means that we can pass up to 8 parameters in registers `x0` to `x7`. If we are passing a 64-bit integer or an address, we use the corresponding `x` register (`x0` for the first parameter, `x1` for the second, and so on). For 32-bit integers we use the corresponding `w` register (we won't bother packing two 32-bit integers into a single 64-bit register).
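The `average` function mentioned earlier makes a good first exercise of this convention. The following is a minimal sketch, assuming GNU as on AArch64 Linux and a build like the hello program (for instance with `gcc -no-pie`); the names `average` and `format` are illustrative. `average` receives its two 32-bit parameters in `w0` and `w1` and returns its result in `w0`. Then `main` passes two parameters to the C library's `printf`: the address of the format string in `x0` and the integer in `w1` (this relies on Linux passing variadic integer arguments in registers like any other argument; Apple platforms place them on the stack instead). As in the hello program, `x30` is kept in a global variable.

```asm
    .data
    .balign 8
format:   .asciz "Average: %d\n"
    .balign 8
keep_x30: .dword 0

    .text
    .globl printf

/* average(w0 = a, w1 = b) returns (a + b) / 2 in w0.
   Sketch only: the sum may overflow, and asr rounds towards minus infinity. */
average:
    add w0, w0, w1               // w0 ← a + b
    asr w0, w0, #1               // w0 ← w0 / 2 (arithmetic shift right by 1)
    ret                          // the result is returned in w0

    .globl main
main:
    ldr x1, addr_keep_x30        // x1 ← &keep_x30
    str x30, [x1]                // *keep_x30 ← x30

    mov w0, #7                   // first parameter: a = 7
    mov w1, #3                   // second parameter: b = 3
    bl average                   // w0 ← average(7, 3)

    mov w1, w0                   // second parameter of printf: the average
    ldr x0, addr_format          // first parameter of printf: &format
    bl printf                    // print the result

    ldr x1, addr_keep_x30        // x1 ← &keep_x30
    ldr x30, [x1]                // x30 ← *keep_x30
    mov w0, #0                   // return 0 from main
    ret

    .balign 8
addr_keep_x30: .dword keep_x30
addr_format:   .dword format
```

Note the order in `main`: `w1` must be set from `w0` before `x0` is overwritten with the address of the format string. The program should print `Average: 5`.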

What if we have to pass more than 8 parameters? And how do we preserve registers `x19` to `x28` and, more importantly, `x30`? In these cases we will have to use the stack, but we will leave that for another chapter. In this chapter we will use a global variable to temporarily store `x30`.

## Say hello!

OK, equipped with this knowledge, we can now start doing some interesting examples. As a starter for today, we will say hello. We can use the function `puts` of the C library for this. This function receives a single parameter: the address of a null-terminated string.

```
 1  .data
 2
 3  .balign 8
 4  /* This is the greeting message */
 5  say_hello: .asciz "Hello world!"
 6
 7  .balign 8
 8  /* We need to keep x30 otherwise we will not be able to return from main! */
 9  keep_x30: .dword 0
10
11  .text
12
13  /* We are going to call a C-library puts function */
14  .globl puts
15
16  .globl main
17  main:
18      ldr x0, addr_keep_x30    // x0 ← &keep_x30 [64]
19      str x30, [x0]            // *keep_x30 ← x30 [64]
20
21      ldr x0, addr_say_hello   // x0 ← &say_hello [64]
22      bl puts                  // call puts
23
24      ldr x0, addr_keep_x30    // x0 ← &keep_x30 [64]
25      ldr x30, [x0]            // x30 ← *keep_x30 [64]
26
27      mov w0, #0               // w0 ← 0
28      ret                      // return
29
30  addr_keep_x30:  .dword keep_x30
31  addr_say_hello: .dword say_hello
```

In line 3 we make sure the next data emitted by the assembler is aligned to an address that is a multiple of 8 bytes (64 bits). In this case we want the assembler to emit a null-terminated string with the characters `Hello world!`, and we use the directive `.asciz` for this (line 5). In order to use this string later, we set the label `say_hello`, which will be the address of the string.

Since we will not see how to use the stack in this chapter, we need somewhere else to save the value of `x30`, so we allocate some storage for it. Again we want this storage to be aligned to 8 bytes, so we use a `.balign` directive (line 7). Then we define the storage itself and label it `keep_x30`, so we can refer to it later. As we know, the `.dword` directive emits the specified integer value as a 64-bit integer. That's all for the data section of our small program.

In line 14 we say that we are going to use the symbol `puts`. This is the name of a function defined in the C library, so we use `.globl` to state that it is a global symbol (as opposed to a private one). As we already know, we need to do the same for `main` (line 16).

Now check lines 30 and 31. Here we define storage that will contain the addresses of `keep_x30` and `say_hello`. As you recall from chapter 5, this is because we need to keep the addresses close to the load instruction.

Now back to line 18: here we load into `x0` the address of `keep_x30`. We then use this address to store the register `x30` (line 19).

Now that we have saved `x30`, we can call `puts`. First we need to prepare the call following the convention described above. `puts` is a C library function that receives only the address of a null-terminated buffer of bytes, which is precisely what `say_hello` is. Since `puts` receives the address, not the contents themselves, we use `addr_say_hello`, where that address is stored. As described above, the first parameter is passed in `x0`, so we just load the address of `say_hello` from `addr_say_hello` into `x0` (line 21).

Now that everything is in place, we call `puts` in line 22. When `puts` returns, our program continues at the instruction following the call, in line 24. Here we simply restore the value of `x30`, because the `bl` in line 22 overwrote it: we load the address of `keep_x30` again and use it to load the saved value back into `x30` (lines 24-25). Now everything is ready to return, so we set `w0` to 0 (line 27), which becomes the return value of `main`, and we return using `ret` (line 28).

If we try to run this program we will be greeted.

```
$ ./hello
Hello world!
```

Yay! 🙂
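Since `puts` follows the same convention, registers `x0` to `x18` may hold anything after it returns. A minimal sketch of a function that prints the same string twice shows why callee-saved registers are useful: it keeps the string address in `x19`, which `puts` must preserve across the call, and, because `x19` is callee-saved for its own caller too, it first saves the caller's `x19` and restores it before returning. Lacking a stack, it uses global variables, just as the hello program does for `x30`. The names `print_twice`, `message`, `keep_x19`, `keep_x30_pt` and `keep_x30_main` are hypothetical, and the same build assumptions as the hello program apply (for instance `gcc -no-pie` on AArch64 Linux).

```asm
    .data
    .balign 8
message:       .asciz "Hello again!"
    .balign 8
keep_x19:      .dword 0          // print_twice keeps the caller's x19 here
keep_x30_pt:   .dword 0          // print_twice keeps its own return address here
keep_x30_main: .dword 0          // main keeps its return address here

    .text
    .globl puts

/* print_twice(x0 = address of a null-terminated string): prints it twice */
print_twice:
    ldr x1, addr_keep_x30_pt     // x1 ← &keep_x30_pt
    str x30, [x1]                // save our return address: bl puts overwrites x30
    ldr x1, addr_keep_x19        // x1 ← &keep_x19
    str x19, [x1]                // x19 is callee-saved: keep the caller's value

    mov x19, x0                  // x19 ← string address; puts must preserve it
    bl puts                      // may clobber x0-x18
    mov x0, x19                  // x0 ← string address again
    bl puts

    ldr x1, addr_keep_x19        // x1 ← &keep_x19
    ldr x19, [x1]                // restore the caller's x19
    ldr x1, addr_keep_x30_pt     // x1 ← &keep_x30_pt
    ldr x30, [x1]                // restore our return address
    ret

    .globl main
main:
    ldr x1, addr_keep_x30_main   // x1 ← &keep_x30_main
    str x30, [x1]                // *keep_x30_main ← x30

    ldr x0, addr_message         // first parameter: &message
    bl print_twice

    ldr x1, addr_keep_x30_main   // x1 ← &keep_x30_main
    ldr x30, [x1]                // x30 ← *keep_x30_main
    mov w0, #0                   // return 0 from main
    ret

    .balign 8
addr_message:       .dword message
addr_keep_x19:      .dword keep_x19
addr_keep_x30_pt:   .dword keep_x30_pt
addr_keep_x30_main: .dword keep_x30_main
```

The program should print `Hello again!` twice. Like any scheme based on global variables, this sketch is not reentrant: a recursive call to `print_twice` would overwrite `keep_x19` and `keep_x30_pt`, which is one more reason to learn how to use the stack.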

This is all for today!

### 2 thoughts on “Exploring AArch64 assembler – Chapter 7”

• Linus Fernandes says:

```
$ ./hello
CANNOT LINK EXECUTABLE "./hello": text relocations (DT_TEXTREL) found in 64-bit ELF file "/data/data/com.termux/files/home/LearnAssembly/hello"
Aborted
```

I get the above error while running the program.

This is my build file.
```
#! /usr/bin/env bash
# display usage
[ $# -eq 0 ] && { echo "Usage: $0 "; exit 1; }
set +e
`rm -f $1.exe $1 $1.o`
`as -o $1.o $1.s`
[ -e $1.o ] && { file $1.o; }
`gcc -s -o $1.exe $1.o -fpic`
`ld -s -o $1 -pie --dynamic-linker /system/bin/linker64 /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o $1.o -lc -lgcc -ldl /data/data/com.termux/files/usr/lib/crtend_android.o`
[ -e $1.exe ] && { file $1.exe; nohup ./$1.exe; }
[ -e $1 ] && { file $1; nohup ./$1; }
set -e
```

• Roger Ferrer Ibáñez says:

Hi Linus,

Android is a platform that uses PIC (position-independent code), so what the linker is trying to tell you is that you have a relocation (i.e. something that the linker or the dynamic linker, aka the loader, has to “fix”) in the part of the program that corresponds to executable code (called “text” for historical reasons).

Having relocations in code is bad because it means your code is not actually position independent: the loader will have to amend the instructions in memory so their addresses are resolved. This slows down application startup and also precludes sharing memory (the code you have loaded cannot be shared between processes).

For example, to access a 32-bit variable called `my_global_var` in a position-independent way, you can do this:

```
my_global_var:
    .word 44

foo:
    adrp x0, :got:my_global_var
    // x0 ← pc-relative-offset-to-GOT + high-part-GOT-entry-my_global_var

    ldr  x0, [x0, #:got_lo12:my_global_var]
    // x0 ← *(x0 + low-part-GOT-entry-my_global_var)
    // now x0 has the address of my_global_var

    ldr  w1, [x0]
    // w1 ← *x0
    // Now w1 has the value of my_global_var
    ...
```
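Applied to the hello program of this chapter, a minimal sketch of the same idea follows (assuming GNU as; not verified here on Android/Termux). The literal pool entries `addr_keep_x30` and `addr_say_hello` stored absolute addresses inside the text section, which is most likely what produced the text relocations. Because `say_hello` and `keep_x30` are defined in the program itself, this sketch addresses them directly with PC-relative `adrp` plus `add :lo12:` pairs instead of going through the GOT, so no absolute address ends up in the code.

```asm
    .data
    .balign 8
say_hello: .asciz "Hello world!"
    .balign 8
keep_x30:  .dword 0

    .text
    .globl puts
    .globl main
main:
    adrp x0, keep_x30                // x0 ← 4 KiB page containing keep_x30 (PC-relative)
    add  x0, x0, :lo12:keep_x30      // x0 ← &keep_x30
    str  x30, [x0]                   // *keep_x30 ← x30

    adrp x0, say_hello               // x0 ← 4 KiB page containing say_hello
    add  x0, x0, :lo12:say_hello     // x0 ← &say_hello
    bl   puts                        // call puts (through the PLT in a PIE)

    adrp x0, keep_x30
    add  x0, x0, :lo12:keep_x30      // x0 ← &keep_x30
    ldr  x30, [x0]                   // x30 ← *keep_x30

    mov  w0, #0                      // w0 ← 0
    ret
```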

I guess at some point I should write a chapter about PIC addressing in AArch64.

Kind regards.
