ARM assembler in Raspberry Pi

In this chapter we will delve a bit more into the stack.

Local data

Most of our examples involving data stored in memory (in contrast to data stored in registers) have used global variables. Global variables are global names, i.e. addresses of the memory that we use through labels. These addresses, somehow, pre-exist before the program runs. This is because we define them when defining the program itself.

Sometimes, though, we may want data stored in memory the existence of which is not tied to the program existence but to the dynamic activation of a function. You may recall from previous chapters, that the stack allows us to store data the lifetime of which is the same as the dynamic activation of a function. This is where we will store local variables, which in contrast to global variables, only exist because the function they belong has been dynamically activated (i.e. called/invoked).

In chapter 17 we passed a very big array through the stack in order to pass the array by value. This will lead us to the conclusion that, somehow, parameters act as local data, in particular when they are passed through the stack.

The frame pointer

In ARM, we have plenty of general-purpose registers (up to 16, albeit some of them with very narrow semantics, so actually about 12 are actually useable as general-purpose) and the AAPCS forces us to use registers for the 4 first parameters (r0 to r3, note how this is consistent with the fact that these 4 registers are caller-saved while all other registers are callee-saved). Other architectures, like 386, have a lower number of general purpose registers (about 6) and the usual approach when passing data to functions always involves the stack. This is so because with such a small number of registers, passing parameters through registers, would force the caller to save them, usually in the stack or some other memory, which in turn will usually require at least another register for indexing! By using the stack a few more registers are easily available.

Up to this point one might wonder why we don't always pass everything through the stack and forget about registers r0 to r3. Well, passing through registers is going to be faster as we do not have to mess with loads and stores in the memory. In addition, most functions receive just a few parameters, or at least not much more than 4, so it makes sense to exploit this feature.

But then a problem arises, what if we are passing parameters through the stack and at the same time we have local variables. Both entities will be stored in the stack. How can we deal with the two sources of data which happen to be stored in the same memory area?

Here is where the concept of frame pointer appears. A frame pointer is a sort of marker in the stack that we will use to tell apart local variables from parameters. I want to emphasize the fact that a frame register is almost always unnecessary and one can always devise ways to avoid it. That said, a frame pointer gives us a consistent solution to access local data and parameters in the stack. Of course, most good things come with a price, and the frame pointer is not an exception: we need to use a register for it. Sometimes this restriction may be unacceptable so we can, almost always, get rid of the frame pointer.

Due to its optional nature, the frame pointer is not specified nor mandated by the AAPCS. That said, the usual approach is using register r11. As an extension (apparently undocumented, as far as I have been able to tell) we can use the name fp which is far more informative than just r11. Nothing enforces this choice, we can use any other register as frame pointer. Since we will use fp (i.e. r11) we will have to refrain ourselves from using r11 for any other purpose.

Dynamic link of the activation record

Activation record is a fancy name to specify the context of a called function. This is, the local data and parameters (if passed through the stack) of that function. When a function is written using a frame pointer some bookkeeping is required to correctly maintain the activation record.

First lets examine the typical structure of a function.

1
2
3
4
5
6
function:
  /* Keep callee-saved registers */
  push {r4, lr} /* Keep the callee saved registers */
  ... /* code of the function */
  pop {r4, lr}  /* Restore the callee saved registers */
  bx lr         /* Return from the function */

Now let's modify the function to use a frame pointer (in the code snippet below do not mind the r5 register that only appears here to keep the stack 8-byte aligned).

1
2
3
4
5
6
7
8
9
10
11
function:
  /* Keep callee-saved registers */
  push {r4, r5, fp, lr} /* Keep the callee saved registers.
                           We added r5 to keep the stack 8-byte aligned
                           but the important thing here is fp */
  mov fp, sp            /* fp ← sp. Keep dynamic link in fp */
  ... /* code of the function */
  mov sp, fp            /* sp ← fp. Restore dynamic link in fp */
  pop {r4, r5, fp, lr}  /* Restore the callee saved registers.
                           This will restore fp as well */
  bx lr         /* Return from the function */

Focus on instructions at line 6 and 8. In line 6 we keep the address of the top of the stack in fp. In line 8 we restore the value of the stack using the value kept in fp. Now you should see why I said that the frame pointer is usually unnecessary: if the sp register does not change between lines 6 and 8, having a frame pointer will be pointless, why should we restore a register that didn't change?

Let's assume for now that the frame pointer is going to be useful. What we did in instruction line 6 is setting the dynamic link. The stack and registers will look like this after we have set it.

As you can see, the fp register will point to the top of the stack. But note that in the stack we have the value of the old fp (the value of the fp in the function that called us). If we assume that our caller also uses a frame pointer, then the fp we kept in the stack of the callee points to the top of the stack when our caller was called.

But still this looks useless because both registers fp and sp in the current function point to the same position in the stack.

Let's proceed with the example, make sure you check line 7.

1
2
3
4
5
6
7
8
9
10
11
12
function:
  /* Keep callee-saved registers */
  push {r4, r5, fp, lr} /* Keep the callee saved registers.
                           We added r5 to keep the stack 8-byte aligned
                           but the important thing here is fp */
  mov fp, sp            /* fp ← sp. Keep dynamic link in fp */
  sub sp, sp, #8        /* Enlarge the stack by 8 bytes */
  ... /* code of the function */
  mov sp, fp            /* sp ← fp. Restore dynamic link in fp */
  pop {r4, r5, fp, lr}  /* Restore the callee saved registers.
                           This will restore fp as well */
  bx lr         /* Return from the function */

Now, after line 7, the stack and registers will look like this.

Can you see the range of data from sp to fp? This is the local data of our function. We will keep local variables of a function in this space when using a frame pointer. We simply have to allocate stack space by decreasing the value of sp (and ensuring it is 8-byte aligned per AAPCS requirements).

Now consider the instruction mov sp, fp near the end of the function. What it does is leaving the state of the registers just like before we enlarged the stack (before the sub sp, sp, #8). And voilà, we have freed all the stack our function was using. A bonus of this approach is that it does not require keeping anywhere the amount of bytes we reserved in the stack. Neat, isn't it?

What about parameters passed in the stack?

A player is still missing in our frame pointer approach: parameters passed through the stack. Let's assume that our function may receive parameters in the stack and we have enlarged the stack by subtracting sp. The whole picture looks like this.

path4864

I want you to note that I just lied a bit in the two first figures. In them, the old fp pointer kept in the stack pointed to the top of the stack of the caller. Not exactly, it will point to the base of the local data of the caller, exactly like happens with the fp register in the current function.

Indexing through the frame pointer

When we are using a frame pointer a nice property (that maybe you have already deduced from the figures above) holds: local data is always at lower addresses than the address pointed by fp while parameters passed in the stack (if any) will always be at higher addresses than the one pointed by fp. It must be possible to access both kinds of local data through fp.

In the following example we will use a function that receives an integer by reference (i.e. an address to an integer) and then squares that integer.

void sq(int *c)
{
  (*c) = (*c) * (*c);
}

You may be wondering why the function sq has a parameter by reference (should not it be easier to return a value?), but bear with me for now. We can (should?) implement sq without using a frame pointer due to its simplicity.

1
2
3
4
5
sq: 
  ldr r1, [r0]   /* r1 ← (*r0) */
  mul r1, r1, r1 /* r1 ← r1 * r1 */
  str r1, [r0]   /* (*r0) ← r1 */
  bx lr          /* Return from the function */

Now consider the following function that returns the sum of the squares of its five parameters. It uses the function sq defined above.

int sq_sum5(int a, int b, int c, int d, int e)
{
  sq(&a);
  sq(&b);
  sq(&c);
  sq(&d);
  sq(&e);
  return a + b + c + d + e;
}

Parameters a, b, c and d will be passed through registers r0, r1, r2, and r3 respectively. The parameter e will be passed through the stack. The function sq, though, expects a reference, i.e. an address, to an integer and registers do not have an address. This means we will have to allocate temporary local storage for these registers. At least one integer will have to be allocated in the stack in order to be able to call sq but for simplicity we will allocate four of them.

This time we will use a frame pointer to access both the local storage and the parameter e.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
sq_sum5:
  push {fp, lr}         /* Keep fp and all callee-saved registers. */
  mov fp, sp            /* Set the dynamic link */

  sub sp, sp, #16      /* sp ← sp - 16. Allocate space for 4 integers in the stack */
  /* Keep parameters in the stack */
  str r0, [fp, #-16]    /* *(fp - 16) ← r0 */
  str r1, [fp, #-12]    /* *(fp - 12) ← r1 */
  str r2, [fp, #-8]     /* *(fp - 8) ← r2 */
  str r3, [fp, #-4]     /* *(fp - 4) ← r3 */

  /* At this point the stack looks like this
     | Value  |  Address(es)
     +--------+-----------------------
     |   r0   |  [fp, #-16], [sp]
     |   r1   |  [fp, #-12], [sp, #4]
     |   r2   |  [fp, #-8],  [sp, #8]
     |   r3   |  [fp, #-4],  [sp, #12]
     |   fp   |  [fp],       [sp, #16]
     |   lr   |  [fp, #4],   [sp, #20]
     |   e    |  [fp, #8],   [sp, #24]
     v
   Higher
   addresses
  */

  sub r0, fp, #16    /* r0 ← fp - 16 */
  bl sq              /* call sq(&a); */
  sub r0, fp, #12    /* r0 ← fp - 12 */
  bl sq              /* call sq(&b); */
  sub r0, fp, #8     /* r0 ← fp - 8 */
  bl sq              /* call sq(&c); */
  sub r0, fp, #4     /* r0 ← fp - 4 */
  bl sq              /* call sq(&d) */
  add r0, fp, #8     /* r0 ← fp + 8 */
  bl sq              /* call sq(&e) */

  ldr r0, [fp, #-16] /* r0 ← *(fp - 16). Loads a into r0 */
  ldr r1, [fp, #-12] /* r1 ← *(fp - 12). Loads b into r1 */
  add r0, r0, r1     /* r0 ← r0 + r1 */
  ldr r1, [fp, #-8]  /* r1 ← *(fp - 8). Loads c into r1 */
  add r0, r0, r1     /* r0 ← r0 + r1 */
  ldr r1, [fp, #-4]  /* r1 ← *(fp - 4). Loads d into r1 */
  add r0, r0, r1     /* r0 ← r0 + r1 */
  ldr r1, [fp, #8]   /* r1 ← *(fp + 8). Loads e into r1 */
  add r0, r0, r1     /* r0 ← r0 + r1 */

  mov sp, fp         /* Undo the dynamic link */
  pop {fp, lr}       /* Restore fp and callee-saved registers */
  bx lr              /* Return from the function */

As you can see, we first store all parameters (but e) in the local storage. This means that we need to enlarge the stack enough, as usual, by subtracting sp (line 5). Once we have the storage then we can do the actual store by using the fp register (lines 7 to 10). Note the usage of negative offsets, because local data will always be in lower addresses than the address in fp. As mentioned above, the parameter e does not have to be stored because it is already in the stack, in a positive offset from fp (i.e. at a higher address than the address in fp).

Note that, in this example, the frame pointer is not indispensable as we could have used sp to access all the required data (see the representation of the stack in lines 12 to 21).

In order to call sq we have to pass the addresses of the several integers, so we compute the address by subtracting fp the proper offset and storing it in r0, which will be used for passing the first (and only) parameter of sq (lines 27 to 36). See how, to pass the address of e, we just compute an address with a positive offset (line 35). Finally we add the values by loading them again in r0 and r1 and using r0 to accumulate the additions (lines 38 to 46).

An example program that calls sq_sum5(1, 2, 3, 4, 5) looks like this.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
/* squares.s */
.data

.align 4
message: .asciz "Sum of 1^2 + 2^2 + 3^2 + 4^2 + 5^2 is %d\n"

.text

sq:
  <<defined above>>

sq_sum5:
  <<defined above>>

.globl main

main:
    push {r4, lr}          /* Keep callee-saved registers */

    /* Prepare the call to sq_sum5 */
    mov r0, #1             /* Parameter a ← 1 */
    mov r1, #2             /* Parameter b ← 2 */
    mov r2, #3             /* Parameter c ← 3 */
    mov r3, #4             /* Parameter d ← 4 */

    /* Parameter e goes through the stack,
       so it requires enlarging the stack */
    mov r4, #5             /* r4 ← 5 */
    sub sp, sp, #8         /* Enlarge the stack 8 bytes,
                              we will use only the
                              topmost 4 bytes */
    str r4, [sp]           /* Parameter e ← 5 */
    bl sq_sum5             /* call sq_sum5(1, 2, 3, 4, 5) */
    add sp, sp, #8         /* Shrink back the stack */

    /* Prepare the call to printf */
    mov r1, r0             /* The result of sq_sum5 */
    ldr r0, address_of_message
    bl printf              /* Call printf */

    pop {r4, lr}           /* Restore callee-saved registers */
    bx lr


address_of_message: .word message

$ ./square 
Sum of 1^2 + 2^2 + 3^2 + 4^2 + 5^2 is 55

That's all for today.