Think In Geek

In geek we trust

ARM assembler in Raspberry Pi – Chapter 9

In previous chapters we learnt the foundations of ARM assembler: registers, some arithmetic operations, loads and stores and branches. Now it is time to put everything together and add another level of abstraction to our assembler skills: functions.

Why functions?

Functions are a way to reuse code. If we have some code that will be needed more than once, being able to reuse it is a Good Thing™. This way, we only have to ensure once that the code being reused is correct. If we repeated the code, we would have to verify that it is correct at every point where it appears. This clearly does not scale. Functions can also take parameters. This way we not only reuse code, we can also use it in several ways by passing different parameters. All this magic, though, comes at a price. A function must be a well-behaved citizen.

Do’s and don’ts of a function

Assembler gives us a lot of power. But with a lot of power also comes a lot of responsibility. We can break lots of things in assembler, because we are at a very low level. A single error and nasty things may happen. In order to make all functions behave in the same way, every environment has conventions that dictate how a function must behave. Since we are on a Raspberry Pi running Linux we will use the AAPCS, the Procedure Call Standard for the ARM Architecture (chances are that other ARM operating systems like RISC OS or Windows RT follow it). You may find this document on the ARM documentation website, but I will try to summarize it in this chapter.

New special named registers

When discussing branches we learnt that r15 was also called pc, and we never called it r15 again. Well, from now on let’s call r14 lr and r13 sp. lr stands for link register: it holds the address of the instruction following the instruction that called us (we will see later what this means). sp stands for stack pointer. The stack is an area of memory owned only by the current function; the sp register stores the top address of that stack. For now, let’s put the stack aside. We will get back to it in the next chapter.

Passing parameters

Functions can receive parameters. The first 4 parameters must be stored, sequentially, in the registers r0, r1, r2 and r3. You may be wondering how to pass more than 4 parameters. We can, of course, but we need to use the stack, which we will discuss in the next chapter. Until then, we will only pass up to 4 parameters.
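
As a minimal sketch of this rule, the following program passes four parameters to a function called sum4. Both sum4 and the file name are made up for this illustration. The call (bl), the return (bx lr), the result in r0 and the trick of keeping lr in a variable are all explained later in this chapter.

```asm
/* -- sum4.s (hypothetical example, not part of the original series) */
.data

.balign 4
return: .word 0

.text

/* sum4: hypothetical function that returns r0 + r1 + r2 + r3 */
sum4:
    add r0, r0, r1                /* r0 ← r0 + r1 */
    add r0, r0, r2                /* r0 ← r0 + r2 */
    add r0, r0, r3                /* r0 ← r0 + r3 */
    bx lr                         /* return to the caller */

.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */

    mov r0, #1                    /* first parameter */
    mov r1, #2                    /* second parameter */
    mov r2, #3                    /* third parameter */
    mov r3, #4                    /* fourth parameter */
    bl sum4                       /* r0 ← 1 + 2 + 3 + 4 */

    ldr r1, address_of_return     /* r1 ← &return */
    ldr lr, [r1]                  /* lr ← *r1 */
    bx lr                         /* return from main, r0 is the error code */
address_of_return: .word return
```

If you assemble and run it, echo $? should print 10.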

Well behaved functions

A function must adhere, at least, to the following rules if we want it to be AAPCS compliant.

  • A function should not make any assumption on the contents of the cpsr. So, at the entry of a function condition codes N, Z, C and V are unknown.
  • A function can freely modify registers r0, r1, r2 and r3.
  • A function cannot assume anything on the contents of r0, r1, r2 and r3 unless they are playing the role of a parameter.
  • A function can freely modify lr but the value upon entering the function will be needed when leaving the function (so such value must be kept somewhere).
  • A function can modify all the remaining registers as long as their values are restored upon leaving the function. This includes sp and registers r4 to r11.
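    This means that, after calling a function, we have to assume that registers r0, r1, r2, r3 and lr have been overwritten. Strictly speaking, the AAPCS also lets a call overwrite r12 and the condition codes in cpsr, so do not rely on those across a call either.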

Calling a function

There are two ways to call a function. If the function is statically known (meaning we know exactly which function must be called) we will use bl label. That label must be a label defined in the .text section. This is called a direct (or immediate) call. We may do indirect calls by first storing the address of the function into a register and then using blx Rsource1.

In both cases the behaviour is as follows: the address of the function (immediately encoded in the bl or using the value of the register in blx) is stored in pc. The address of the instruction following the bl or blx instruction is kept in lr.
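
The two kinds of call can be sketched as follows. The function answer and the label address_of_answer are names invented for this example. The program calls answer twice, first directly with bl and then indirectly with blx through r2.

```asm
/* -- indirect.s (hypothetical example, not part of the original series) */
.data

.balign 4
return: .word 0

.text

/* answer: hypothetical function that just returns 42 */
answer:
    mov r0, #42                   /* r0 ← 42 */
    bx lr                         /* return to the caller */

.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */

    bl answer                     /* direct call */
                                  /* lr ← address of next instruction */

    ldr r2, address_of_answer     /* r2 ← &answer */
    blx r2                        /* indirect call: pc ← r2 */
                                  /* lr ← address of next instruction */

    ldr r1, address_of_return     /* r1 ← &return */
    ldr lr, [r1]                  /* lr ← *r1 */
    bx lr                         /* return from main */
address_of_return: .word return
address_of_answer: .word answer
```

Both calls leave 42 in r0, so the error code of this program should be 42.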

Leaving a function

A well behaved function, as stated above, will have to keep the initial value of lr somewhere. When leaving the function, we will retrieve that value and put it in some register (it can be lr again but this is not mandatory). Then we will bx Rsource1 (we could use blx as well but the latter would update lr which is useless here).
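
To illustrate that the return address does not have to go back into lr, here is a small sketch where main returns with bx r2. The function nothing is a made-up placeholder that only exists so that main has a call overwriting lr.

```asm
/* -- leave.s (hypothetical example, not part of the original series) */
.data

.balign 4
return: .word 0

.text

/* nothing: hypothetical function that returns immediately */
nothing:
    bx lr                         /* lr was never modified, use it as is */

.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */

    bl nothing                    /* this call overwrites lr */

    mov r0, #0                    /* error code 0 */
    ldr r1, address_of_return     /* r1 ← &return */
    ldr r2, [r1]                  /* r2 ← *r1, the original lr */
    bx r2                         /* return from main using r2 */
address_of_return: .word return
```

Note that r1 is loaded again after the call: nothing is allowed to modify r0 to r3, so main cannot assume r1 still holds the address.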

Returning data from functions

Functions must use r0 for data that fits in 32 bits (or less). That is, C types char, short, int, long (and float, though we have not seen floating point yet) will be returned in r0. Basic 64-bit types, like C types long long and double, will be returned in r1 and r0. Any other data is returned through the stack unless it is 32 bits or less, in which case it is returned in r0.
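
A short sketch of a 64-bit integer result, such as a C long long, follows. It assumes the little-endian setup that Linux uses on the Raspberry Pi, where r0 holds the lower 32 bits and r1 the upper 32 bits. The function big_value is invented for this illustration.

```asm
/* -- ret64.s (hypothetical example, not part of the original series) */
.data

.balign 4
return: .word 0

.text

/* big_value: hypothetical function returning 0x0000000100000002 */
big_value:
    mov r0, #2                    /* r0 ← lower 32 bits */
    mov r1, #1                    /* r1 ← upper 32 bits */
    bx lr                         /* return to the caller */

.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */

    bl big_value                  /* result in r1 (upper) and r0 (lower) */
    add r0, r0, r1                /* r0 ← 2 + 1, just to use both halves */

    ldr r1, address_of_return     /* r1 ← &return */
    ldr lr, [r1]                  /* lr ← *r1 */
    bx lr                         /* return from main */
address_of_return: .word return
```

The error code of this program should be 3, the sum of both halves.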

In the examples in previous chapters we returned the error code of the program in r0. This now makes sense. C’s main returns an int, which is used as the value of the error code of our program.

Hello world

Usually this is the first program you write in any high level programming language. In our case we had to learn lots of things first. Anyway, here it is. A “Hello world” in ARM assembler.

(Note to experts: since we will not discuss the stack until the next chapter, this code may look very dumb to you)

/* -- hello01.s */
.data
 
greeting:
 .asciz "Hello world"
 
.balign 4
return: .word 0
 
.text
 
.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */
 
    ldr r0, address_of_greeting   /* r0 ← &greeting */
                                  /* First parameter of puts */
 
    bl puts                       /* Call to puts */
                                  /* lr ← address of next instruction */
 
    ldr r1, address_of_return     /* r1 ← &return */
    ldr lr, [r1]                  /* lr ← *r1 */
    bx lr                         /* return from main */
address_of_greeting: .word greeting
address_of_return: .word return
 
/* External */
.global puts

We are going to call the puts function. This function is defined in the C library and has the prototype int puts(const char*). It receives, as its first parameter, the address of a C string (that is, a sequence of bytes where no byte but the last is zero). When executed, it writes that string to stdout (so by default it should appear on our terminal). Finally, it returns a nonnegative value on success; with the C library used here that value is the number of bytes written.

We start by defining the label greeting in the .data section, in lines 4 and 5. This label is the address of our greeting message. GNU as provides a convenient .asciz directive for that purpose. This directive emits as many bytes as needed to represent the string, plus the final zero byte. We could have used the .ascii directive instead, as long as we explicitly added the final zero byte.
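
As a small data-section sketch of that last remark, these two alternatives should emit the same bytes as the .asciz line of hello01.s. The labels greeting_a and greeting_b are only there to tell the two variants apart.

```asm
.data

/* Variant 1: .ascii followed by an explicit zero byte */
greeting_a:
 .ascii "Hello world"
 .byte 0

/* Variant 2: the zero byte written as an escape inside the string */
greeting_b:
 .ascii "Hello world\0"
```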

After the bytes of the greeting message, we make sure the next label will be 4-byte aligned and we define a return label in line 8. In that label we will keep the value that lr has when main starts. As stated above, this is a requirement for a well behaved function: being able to get back the value lr had upon entering. So we make some room for it.

The first two instructions of our main function, lines 14 and 15, keep the value of lr in the return variable defined above. Then in line 17 we prepare the arguments for the call to puts. We load the address of the greeting message into the r0 register. This register will hold the first (actually the only) parameter of puts. Then in line 20 we call the function. Recall that bl will set lr to the address of the instruction following it (this is the instruction in line 23). This is the reason why we copied the value of lr into a variable at the beginning of the main function: it was going to be overwritten by bl.

Ok, puts runs and the message is printed on the stdout. Time to get the initial value of lr so we can return successfully from main. Then we return.

Is our main function well behaved? Yes, it keeps and gets back lr to leave. It only modifies r0 and r1. We can assume that puts is well behaved as well, so everything should work fine. Plus the bonus of seeing how many bytes have been written to the output.

$ ./hello01 
Hello world
$ echo $?
12

Note that “Hello world” is just 11 bytes (the final zero is not counted as it just plays the role of a finishing byte) but the program returns 12. This is because puts always adds a newline byte, which accounts for that extra byte.

Real interaction!

Now that we have the power of calling functions we can glue them together. Let’s call printf and scanf to read a number and then print it back to the standard output.

/* -- printf01.s */
.data
 
/* First message */
.balign 4
message1: .asciz "Hey, type a number: "
 
/* Second message */
.balign 4
message2: .asciz "I read the number %d\n"
 
/* Format pattern for scanf */
.balign 4
scan_pattern : .asciz "%d"
 
/* Where scanf will store the number read */
.balign 4
number_read: .word 0
 
.balign 4
return: .word 0
 
.text
 
.global main
main:
    ldr r1, address_of_return        /* r1 ← &return */
    str lr, [r1]                     /* *r1 ← lr */
 
    ldr r0, address_of_message1      /* r0 ← &message1 */
    bl printf                        /* call to printf */
 
    ldr r0, address_of_scan_pattern  /* r0 ← &scan_pattern */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    bl scanf                         /* call to scanf */
 
    ldr r0, address_of_message2      /* r0 ← &message2 */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    ldr r1, [r1]                     /* r1 ← *r1 */
    bl printf                        /* call to printf */
 
    ldr r0, address_of_number_read   /* r0 ← &number_read */
    ldr r0, [r0]                     /* r0 ← *r0 */
 
    ldr lr, address_of_return        /* lr ← &return */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from main using lr */
address_of_message1 : .word message1
address_of_message2 : .word message2
address_of_scan_pattern : .word scan_pattern
address_of_number_read : .word number_read
address_of_return : .word return
 
/* External */
.global printf
.global scanf

In this example we will ask the user to type a number and then we will print it back. We also return the number in the error code, so we can check twice that everything goes as expected. For the error code check, make sure your number is between 0 and 255 (otherwise the error code will show only its lower 8 bits).

$ ./printf01 
Hey, type a number: 123↴
I read the number 123
$ ./printf01 ; echo $?
Hey, type a number: 124↴
I read the number 124
124
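
To see the 8-bit limit in isolation, here is a tiny sketch that returns 300 from main. The file name is made up. Since main calls no other function, lr is never overwritten and there is no need to keep it.

```asm
/* -- exit300.s (hypothetical example, not part of the original series) */
.text

.global main
main:
    mov r0, #300                  /* r0 ← 300, that is 256 + 44 */
    bx lr                         /* return from main */
```

Only the lower 8 bits of 300 survive, so echo $? should print 44 after running it.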

Our first function

Let’s define our first function. Let’s extend the previous example so that it multiplies the number by 5.

.balign 4
return2: .word 0
 
.text
 
/*
mult_by_5 function
*/
mult_by_5: 
    ldr r1, address_of_return2       /* r1 ← &return2 */
    str lr, [r1]                     /* *r1 ← lr */
 
    add r0, r0, r0, LSL #2           /* r0 ← r0 + 4*r0 */
 
    ldr lr, address_of_return2       /* lr ← &return2 */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from mult_by_5 using lr */
address_of_return2 : .word return2

This function needs another “return” variable like the one main uses, but only for the sake of the example. Actually, this function does not call any other function. In that case it does not need to keep lr, as no bl or blx instruction is going to modify it. If the function wanted to use lr as the general purpose register r14, keeping its value would still be mandatory.

As you can see, once the function has computed the value, it is enough to leave it in r0. In this case it was pretty easy and a single instruction was enough.
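
Since mult_by_5 calls nothing, a shorter version that never touches lr should work just as well. The following is a sketch of that variant inside a small test program. The file name and the value 7 are arbitrary choices for this illustration.

```asm
/* -- leaf.s (hypothetical example, not part of the original series) */
.data

.balign 4
return: .word 0

.text

/* mult_by_5 without keeping lr: no bl or blx inside, so lr stays intact */
mult_by_5:
    add r0, r0, r0, LSL #2        /* r0 ← r0 + 4*r0, that is 5*r0 */
    bx lr                         /* return to the caller */

.global main
main:
    ldr r1, address_of_return     /* r1 ← &return */
    str lr, [r1]                  /* *r1 ← lr */

    mov r0, #7                    /* first parameter */
    bl mult_by_5                  /* r0 ← 5 * 7 */

    ldr r1, address_of_return     /* r1 ← &return */
    ldr lr, [r1]                  /* lr ← *r1 */
    bx lr                         /* return from main */
address_of_return: .word return
```

The error code should be 35. main still has to keep lr, because its own bl overwrites it.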

The whole example follows.

/* -- printf02.s */
.data
 
/* First message */
.balign 4
message1: .asciz "Hey, type a number: "
 
/* Second message */
.balign 4
message2: .asciz "%d times 5 is %d\n"
 
/* Format pattern for scanf */
.balign 4
scan_pattern : .asciz "%d"
 
/* Where scanf will store the number read */
.balign 4
number_read: .word 0
 
.balign 4
return: .word 0
 
.balign 4
return2: .word 0
 
.text
 
/*
mult_by_5 function
*/
mult_by_5: 
    ldr r1, address_of_return2       /* r1 ← &return2 */
    str lr, [r1]                     /* *r1 ← lr */
 
    add r0, r0, r0, LSL #2           /* r0 ← r0 + 4*r0 */
 
    ldr lr, address_of_return2       /* lr ← &return2 */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from mult_by_5 using lr */
address_of_return2 : .word return2
 
.global main
main:
    ldr r1, address_of_return        /* r1 ← &return */
    str lr, [r1]                     /* *r1 ← lr */
 
    ldr r0, address_of_message1      /* r0 ← &message1 */
    bl printf                        /* call to printf */
 
    ldr r0, address_of_scan_pattern  /* r0 ← &scan_pattern */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    bl scanf                         /* call to scanf */
 
    ldr r0, address_of_number_read   /* r0 ← &number_read */
    ldr r0, [r0]                     /* r0 ← *r0 */
    bl mult_by_5
 
    mov r2, r0                       /* r2 ← r0 */
    ldr r1, address_of_number_read   /* r1 ← &number_read */
    ldr r1, [r1]                     /* r1 ← *r1 */
    ldr r0, address_of_message2      /* r0 ← &message2 */
    bl printf                        /* call to printf */
 
    ldr lr, address_of_return        /* lr ← &return */
    ldr lr, [lr]                     /* lr ← *lr */
    bx lr                            /* return from main using lr */
address_of_message1 : .word message1
address_of_message2 : .word message2
address_of_scan_pattern : .word scan_pattern
address_of_number_read : .word number_read
address_of_return : .word return
 
/* External */
.global printf
.global scanf

I want you to notice lines 58 to 62. There we prepare the call to printf, which receives three parameters: the format and the two integers referenced in the format. We want the first integer to be the number entered by the user. The second one will be that same number multiplied by 5. After the call to mult_by_5, r0 contains the number entered by the user multiplied by 5. We want it to be the third parameter, so we move it to r2. Then we load the value of the number entered by the user into r1. Finally we load into r0 the address of the format message of printf. Note that the order in which we prepare the arguments of a call is irrelevant as long as the values are correct at the point of the call. Here we know that we will have to overwrite r0, so for convenience we first copy r0 to r2.

$ ./printf02
Hey, type a number: 1234↴
1234 times 5 is 6170

That’s all for today.


18 thoughts on “ARM assembler in Raspberry Pi – Chapter 9”

  • Pablo says:

    Great post! Is there any way to get the length of an array that was previously created, or to add elements and extend the array?

    Thanks for all the info.

    • rferrer says:

      Well, it depends on what you understand by an “array” in this context and what you mean by “previously created”.

      • Pablo says:

        Let’s suppose I have [1,2,3]. In any high level language, there’s a command called “length” or something similar that returns the number of elements in the array, which will be 3 for the structure I described previously.

        Is it possible to make the same thing on assembly?

        Thanks for the response 🙂

        • rferrer says:

          Yes, it must be possible, otherwise it would not be doable at all 😉

          If the size of your array is statically defined (that is, it is known when you assemble the code), you can use GNU assembler features which may be useful. For instance, the following is a simple case that computes the number of elements of an array of 4-byte integers.

          .data

          array: .word 0x1, 0x2, 0x3, 0x4
          end_of_array :

          .globl main
          .text

          main:
          /* This is r1 ← 4 */
          mov r1, #(end_of_array - array) / 4
          ...

          This works because we subtract array (the address of the first element of the array) from end_of_array (the address past the last element of the array). This gives us a value in bytes, so we divide it by 4 (each integer is 4 bytes). Note that this happens at assemble time (or compilation time). So there is no real code emitted here: the assembler just computes a constant value and uses it in place of the whole expression. If it is not able to compute a constant value, this is an error.

          I suggest you read the GNU as manual, in particular section 5, about expressions.

          If your array size is dynamic, then everything is more complicated. Your array will be, in a straightforward approach, a pair of numbers: the address of the array itself and the number of elements. The “length” operation is just reading the latter. How the former would be used is beyond the scope of this tutorial as it may involve either upper bounded memory (your array may be up to N items) or dynamic memory (malloc, free, etc).

          Kind regards,

  • Another errata, “cpsr” instead “cspr”

  • Stellan says:

    Hi is there a way to create a function that is not included in the assembled code if it’s not called somewhere?

    • Roger Ferrer Ibáñez says:

      Hi Stellan,

      yes, there is a technique, but it requires two things: a) putting each function inside its own “.text.nnn” section, and b) telling the linker to remove unused sections. A very small example follows.

      /* test.s */
      .text
      
      .section .text.foo
      foo:
          bx lr
      
      .section .text.main
      .globl main
      main:
          mov r0, #0
          bx lr
      

      Compile as shown below.

      $ gcc -c test.s
      $ gcc -o test test.o -Wl,--gc-sections,--print-gc-sections
      /usr/bin/ld: Removing unused section '.rodata.cst4' in file '/usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crt1.o'
      /usr/bin/ld: Removing unused section '.data' in file '/usr/lib/gcc/arm-linux-gnueabihf/4.6/../../../arm-linux-gnueabihf/crt1.o'
      /usr/bin/ld: Removing unused section '.data' in file '/usr/lib/gcc/arm-linux-gnueabihf/4.6/crtbegin.o'
      /usr/bin/ld: Removing unused section '.text.foo' in file 'test.o'
      

      -Wl is used to tell gcc to pass-through a comma-separated list of flags (without further processing) to the linker. The linker flag --gc-sections tells the linker to garbage collect unused sections. The linker flag --print-gc-sections just reports the list of sections removed. Note that the C library has some sections that the linker considers unused, but note the last line .text.foo which corresponds to our unused foo function.

      Kind regards

  • Mayank says:

    If I use a loop like the one below between the “bl printf” calls, I get some invalid output. I know it is caused by bad handling of lr, but I could not resolve it. Please let me know the proper handling of lr when calling functions. Also, does lr get updated by “beq ..” type instructions?

    .data
    .balign 4
    print_Statement:
    .asciz "Variable Print: %d\n"

    .balign 4
    myvar:
    .word 2
    .balign 4
    myarr:
    .skip 4
    .balign 4
    addr_return:
    .word 0

    .text
    .balign 4
    .global main
    main:
    ldr r1, add
    ldr r2,arr
    ldr r1,[r1]
    mov r3,#10
    ldr r0,add_ret
    str lr,[r0]
    ldr r0,print_patt
    mov r1,r3
    bl printf
    loop:
    cmp r3,#8
    beq endLoop
    ldr r0,print_patt
    mov r1,r3
    bl printf
    sub r3,r3,#1
    b loop
    endLoop:
    ldr r0,add_ret
    ldr lr,[r0]
    bx lr
    print_patt: .word print_Statement
    add: .word myvar
    arr: .word myarr
    add_ret: .word addr_return

    .global printf

    • Roger Ferrer Ibáñez says:

      Hi,

      I haven’t checked in much detail but I think your problem is that printf is modifying r3. Recall that registers r0 to r3 can be freely modified by the callee, so if you’re keeping something important there, either back it up elsewhere or use another register. Recall that registers r4 to r11 and sp must be preserved by the callee, so you know that their contents after the printf call are the same as they were before the call.

      Concerning whether b (and all its conditional versions beq, bne, …) modify lr. No, they don’t. Only bl and blx modify lr.

      Kind regards,

      • Mayank says:

        But printf is an external function and is just used for printing the value stored in r3 coming from the main function. So I am not modifying anything in r3, as printf is an external function. r3 is modified only in main.

        • Roger Ferrer Ibáñez says:

          I see your point but the AAPCS clearly states that a function may freely modify registers r0 to r3, so you must assume that after the call to printf, registers r0 to r3 have got their values clobbered.

          • Mayank says:

            But my problem is that I am getting an infinite loop here and printing: “Variable print: -1”.
            So how do you explain this infinite loop here? If I am just removing the loop statement then I am getting the correct output of r3 so printf has nothing to do with this.
