ARM assembler in Raspberry Pi

Several times in previous chapters we have described ARM as an architecture with several features aimed at embedded systems. In embedded systems memory is scarce and expensive, so designs that help reduce the memory footprint are very welcome. Today we will see another of these features: the Thumb instruction set.

The Thumb instruction set

In previous installments we have been working with the ARMv6 instruction set (the one implemented in the Raspberry Pi). In this instruction set, all instructions are 32 bits wide, so every instruction takes 4 bytes. This has been a common design since the arrival of RISC processors. That said, in some scenarios such an encoding is overkill in terms of memory consumption: many platforms are very simple and rarely need all the features provided by the instruction set. If only they could use a subset of the original instruction set that can be encoded in a smaller number of bits!

This is what the Thumb instruction set is all about. It is a re-encoded subset of the ARM instructions that takes only 16 bits per instruction. This means that we will have to give up some instructions. As a benefit our code density is higher: in the best case our programs fit in half the space, although some operations need more Thumb instructions than ARM ones, so in practice the saving is usually smaller.

Support of Thumb in Raspbian

While the processor of the Raspberry Pi properly supports Thumb, some of the software support it needs is unfortunately not provided by Raspbian. This means that we will be able to write some snippets in Thumb but in general this is not supported (if you try to use Thumb for a full C program you will end up with a "sorry, unimplemented" message from the compiler).

Instructions

Thumb provides about 45 instructions (out of about 115 in ARMv6). The narrower 16-bit encoding means that we will be more limited in what we can do in our code. Registers are split into two sets: low registers, r0 to r7, and high registers, r8 to r15. Most instructions can only work with low registers, and the few that accept high registers do so with restrictions.
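As a rough sketch of the low/high split, the Thumb fragment below touches a high register using only the few instruction forms that accept one in ARMv6 Thumb: the register mov, the two-operand register add, cmp and bx. The label high_regs_demo is made up for this illustration, and r12 is used only because AAPCS treats it as a scratch register.

```asm
/* Sketch only: high_regs_demo is a hypothetical label,
   not part of the programs of this chapter */
.text

.code 16     /* Thumb */
.align 2

high_regs_demo:
    mov r12, r0    /* r12 ← r0: a register mov may involve a high register */
    add r0, r12    /* r0 ← r0 + r12, so r0 is now twice its original value */
    cmp r0, r12    /* update the flags with r0 - r12: cmp accepts a high register too */
    bx lr          /* return: bx accepts any register */
```

Forms that carry an immediate, such as mov r12, #1, should be rejected by the assembler in Thumb mode, because those encodings only have room to name r0 to r7.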

Also, Thumb instructions cannot be predicated. Recall that almost every ARM instruction can be made conditional depending on the flags in the cpsr register. This is not the case in Thumb where only the branch instruction is conditional.
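To make the difference concrete, here is a minimal sketch that computes the signed maximum of r0 and r1 in both instruction sets. The labels max_arm, max_thumb and max_thumb_done are hypothetical names chosen for this example only.

```asm
/* Sketch only: max_arm and max_thumb are hypothetical labels */
.text

.code 32     /* ARM */
.align 2

max_arm:                 /* r0 ← max(r0, r1) */
    cmp r0, r1           /* update the flags with r0 - r1 */
    movlt r0, r1         /* predicated: r0 ← r1 only if r0 < r1 */
    bx lr                /* return */

.code 16     /* Thumb */
.align 2

max_thumb:               /* r0 ← max(r0, r1) */
    cmp r0, r1           /* update the flags with r0 - r1 */
    bge max_thumb_done   /* only the branch can be conditional */
    mov r0, r1           /* r0 ← r1, reached only if r0 < r1 */
max_thumb_done:
    bx lr                /* return */
```

The ARM version needs no branch at all, while the Thumb version has to jump over the instruction it wants to skip.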

Mixing ARM and Thumb is only possible at function level: a function must be wholly ARM or wholly Thumb, it cannot mix the two instruction sets. Recall that our Raspbian system does not support Thumb, so at some point we will have to jump from ARM code to Thumb code. This is done using the blx instruction, which is available in both instruction sets. It behaves like the bl instruction we use for function calls but also changes the state of the processor from ARM to Thumb (or from Thumb to ARM).

We also have to tell the assembler which portions of our assembler code are Thumb and which are ARM. Since by default the assembler expects ARM, we will have to switch to Thumb at some point.

From ARM to Thumb

Let's start with a very simple program returning an error code of 2 set in Thumb.

/* thumb-first.s */
.text

.code 16     /* Here we say we will use Thumb */
.align 2     /* Align to a 4-byte (2^2) boundary; Thumb instructions need only 2 */

thumb_function:
    mov r0, #2   /* r0 ← 2 */
    bx lr        /* return */
    
.code 32     /* Here we say we will use ARM */
.align 4     /* Align to a 16-byte (2^4) boundary; ARM instructions need only 4 */

.globl main
main:
    push {r4, lr}
    
    blx thumb_function /* From ARM to Thumb we use blx */

    pop {r4, lr}
    bx lr

Thumb instructions in our thumb_function actually resemble ARM instructions. In fact most of the time there will not be much difference. As stated above, Thumb instructions are more limited in features than their ARM counterparts.

If we run the program, it does what we expect.

$ ./thumb-first; echo $?
2

How can we tell our program actually mixes ARM and Thumb? We can use objdump -d to dump the instructions of our thumb-first.o file.

$ objdump  -d thumb-first.o 

thumb-first.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <thumb_function>:
   0:	2002      	movs	r0, #2
   2:	4770      	bx	lr
   4:	e1a00000 	nop			; (mov r0, r0)
   8:	e1a00000 	nop			; (mov r0, r0)
   c:	e1a00000 	nop			; (mov r0, r0)

00000010 <main>:
  10:	e92d4010 	push	{r4, lr}
  14:	fafffff9 	blx	0 <thumb_function>
  18:	e8bd4010 	pop	{r4, lr}
  1c:	e12fff1e 	bx	lr

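Check thumb_function: each of its two instructions is encoded in just two bytes (the instruction bx lr is at offset 2 from mov r0, #2). Note that objdump shows the latter as movs r0, #2, because the Thumb move of an immediate always updates the flags. Compare this to the instructions in main: each one is at offset 4 from its predecessor. Note also that the assembler added some padding after thumb_function in the form of nops (which are never executed): in GNU as for ARM, .align 4 requests a 2^4 = 16-byte boundary, so main starts at offset 0x10.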

Calling functions in Thumb

In Thumb we want to follow the AAPCS convention as we do in ARM mode, but some oddities appear. Consider the following snippet where thumb_function_1 calls thumb_function_2.

.code 16     /* Here we say we will use Thumb */
.align 2     /* Align to a 4-byte (2^2) boundary; Thumb instructions need only 2 */
thumb_function_2:
    /* Do something here */
    bx lr

thumb_function_1:
    push {r4, lr}
    bl thumb_function_2
    pop {r4, lr}    /* ERROR: cannot use lr in pop  in Thumb mode */
    bx lr

Unfortunately, this will be rejected by the assembler. If you recall from chapter 10, in ARM push and pop are mnemonics for stmdb sp! and ldmia sp!, respectively. But in Thumb mode push and pop are instructions in their own right and so they are more limited: push can only use low registers and lr, pop can only use low registers and pc. Otherwise, these two instructions behave almost the same as the ARM mnemonics. So, you are now probably wondering why there are these two special cases for lr and pc. This is the trick: in Thumb mode pop {pc} is equivalent to popping a value val from the stack and then doing bx val. So the two-instruction sequence pop {r4, lr} followed by bx lr becomes simply pop {r4, pc}.

So, our code will look like this.

/* thumb-call.s */
.text

.code 16     /* Here we say we will use Thumb */
.align 2     /* Align to a 4-byte (2^2) boundary; Thumb instructions need only 2 */

thumb_function_2:
    mov r0, #2
    bx lr   /* A leaf Thumb function (i.e. a function that does not call
               any other function so it did not have to keep lr in the stack)
               returns using "bx lr" */

thumb_function_1:
    push {r4, lr}
    bl thumb_function_2 /* From Thumb to Thumb we use bl */
    pop {r4, pc}  /* This is how we return from a non-leaf Thumb function */

.code 32     /* Here we say we will use ARM */
.align 4     /* Align to a 16-byte (2^4) boundary; ARM instructions need only 4 */
.globl main
main:
    push {r4, lr}

    blx thumb_function_1 /* From ARM to Thumb we use blx */

    pop {r4, lr}
    bx lr
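A side effect of the push and pop restriction is that a callee-saved high register such as r8 cannot be pushed directly in ARMv6 Thumb. One possible workaround, sketched here under the assumption that a function wants r8 as a temporary, is to copy it through a low register. The label thumb_uses_r8 is hypothetical, and r5 is saved only so that the stack stays 8-byte aligned, as AAPCS requires at function calls.

```asm
/* Sketch only: thumb_uses_r8 is a hypothetical function */
.text

.code 16     /* Thumb */
.align 2

thumb_uses_r8:
    push {r4, r5, lr}   /* keep r4, r5 and lr (12 bytes) */
    mov r4, r8          /* r4 ← r8: high registers cannot appear in push */
    push {r4}           /* keep the old r8 (16 bytes in total) */

    mov r8, r0          /* r8 ← r0: use r8 as a temporary */
    add r0, r8          /* r0 ← r0 + r8 */

    pop {r4}            /* r4 ← old value of r8 */
    mov r8, r4          /* restore r8 */
    pop {r4, r5, pc}    /* restore r4 and r5, and return */
```

The extra mov instructions are the price of reaching the high registers, which is one reason Thumb code tends to stay within r0 to r7.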

From Thumb to ARM

Finally, we may want to call an ARM function from Thumb. As long as we stick to AAPCS everything should work correctly. The Thumb instruction to call an ARM function is again blx. The following is an example of a small program that says "Hello world" four times by calling printf, a function in the C library that in Raspbian is of course implemented using ARM instructions.

/* thumb-first.s */

.data
message: .asciz "Hello world %d\n"

.text        /* Switch back to the text section so the code is not emitted into .data */
    
.code 16     /* Here we say we will use Thumb */
.align 2     /* Align to a 4-byte (2^2) boundary; Thumb instructions need only 2 */
thumb_function:
    push {r4, lr}         /* keep r4 and lr in the stack */
    mov r4, #0            /* r4 ← 0 */
    b check_loop          /* unconditional branch to check_loop */
    loop:        
       /* prepare the call to printf */
       ldr r0, addr_of_message  /* r0 ← &message */
       mov r1, r4               /* r1 ← r4 */
       blx printf               /* From Thumb to ARM we use blx.
                                   printf is a function
                                   in the C library that is implemented
                                   using ARM instructions */
       add r4, r4, #1           /* r4 ← r4 + 1 */
    check_loop:
       cmp r4, #4               /* compute r4 - 4 and update the cpsr */
       blt loop                 /* if the cpsr means that r4 is lower than 4 
                                   then branch to loop */

    pop {r4, pc}          /* restore registers and return from Thumb function */
.align 4     /* The word below must be at least 4-byte aligned to be loaded with ldr */
addr_of_message: .word message
    
.code 32     /* Here we say we will use ARM */
.align 4     /* Align to a 16-byte (2^4) boundary; ARM instructions need only 4 */
.globl main
main:  
    push {r4, lr}      /* keep r4 and lr in the stack */
    blx thumb_function /* from ARM to Thumb we use blx  */       
    pop {r4, lr}       /* restore registers */
    bx lr              /* return */
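The same Thumb to ARM call can also be tried without the C library. The following is a minimal sketch in which the ARM callee lives in the same file; the names arm_add_one and thumb_caller are made up for this illustration. It relies on the fact that bx lr in the ARM function returns to whatever state the caller was in, because blx records that state in bit 0 of lr.

```asm
/* Sketch only: arm_add_one and thumb_caller are hypothetical names */
.text

.code 32     /* ARM */
.align 2

arm_add_one:
    add r0, r0, #1       /* r0 ← r0 + 1 */
    bx lr                /* return to the caller, whatever its instruction set */

.code 16     /* Thumb */
.align 2

thumb_caller:
    push {r4, lr}        /* keep r4 and lr in the stack */
    mov r0, #41          /* r0 ← 41 */
    blx arm_add_one      /* From Thumb to ARM we use blx */
    pop {r4, pc}         /* restore r4 and return from the Thumb function */

.code 32     /* ARM */
.align 2

.globl main
main:
    push {r4, lr}        /* keep r4 and lr in the stack */
    blx thumb_caller     /* From ARM to Thumb we use blx */
    pop {r4, lr}         /* restore registers */
    bx lr                /* return: r0 should hold 41 + 1 */
```

If the calls behave as described in this chapter, the value left in r0 (and so the error code of the program) would be 42.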

To know more

In the next installments we will go back to ARM, so if you are interested in Thumb, you may want to check the Thumb 16-bit Instruction Set Quick Reference Card provided by ARM. When checking that card, be aware that the processor of the Raspberry Pi only implements ARMv6T, not ARMv6T2.

That's all for today.