ARM assembler in Raspberry Pi

Branching

Until now our small assembler programs execute one instruction after the other. If our ARM processor were only able to run this way it would be of limited use. It could not react to existing conditions which may require different sequences of instructions. This is the purpose of the branch instructions.

A special register

In chapter 2 we learnt that our Raspberry Pi ARM processor has 16 integer general purpose registers and we also said that some of them play special roles in our program. I deliberately ignored which registers were special as it was not relevant at that time.

But now it is relevant, at least for register r15. This register is very special, so special it has also another name: pc. It is unlikely that you see it used as r15 since it is confusing (although correct from the point of view of the ARM architecture). From now we will only use pc to name it.

What does pc stand for? pc means program counter. This name, the origins of which are in the dawn of computing, means little to nothing nowadays. In general the pc register (also called ip, instruction pointer, in other architectures like 386 or x86_64) contains the address of the next instruction going to be executed.

When the ARM processor executes an instruction, two things may happen at the end of its execution. If the instruction does not modify pc (and most instructions do not), pc is just incremented by 4 (like if we did add pc, pc, #4). Why 4? Because in ARM, instructions are 32 bit wide, so there are 4 bytes between every instruction. If the instruction modifies pc then the new value for pc is used.

Once the processor has fully executed an instruction then it uses the value in the pc as the address for the next instruction to execute. This way, an instruction that does not modify the pc will be followed by the next contiguous instruction in memory (since it has been automatically increased by 4). This is called implicit sequencing of instructions: after one has run, usually the next one in memory runs. But if an instruction does modify the pc, for instance to a value other than pc + 4, then we can be running another instruction of the program. This process of changing the value of pc is called branching. In ARM this done using branch instructions.

Unconditional branches

You can tell the processor to branch unconditionally by using the instruction b (for branch) and a label. Consider the following program.

1
2
3
4
5
6
7
8
9
/* -- branch01.s */
.text
.global main
main:
    mov r0, #2 /* r0 ← 2 */
    b end      /* branch to 'end' */
    mov r0, #3 /* r0 ← 3 */
end:
    bx lr

If you execute this program you will see that it returns an error code of 2.

$ ./branch01 ; echo $?
2

What happened is that instruction b end branched (modifying the pc) to the instruction at the label end, which is bx lr, the instruction we run at the end of our program. This way the instruction mov r0, #3 has not actually been run at all (the processor never reached that instruction).

At this point the unconditional branch instruction b may look a bit useless. It is not the case. In fact this instruction is essential in some contexts, in particular when linked with conditional branching. But before we can talk about conditional branching we need to talk about conditions.

Conditional branches

If our processor were only able to branch just because, it would not be very useful. It is much more useful to branch when some condition is met. So a processor should be able to evaluate some sort of conditions.

Before continuing, we need to unveil another register called cpsr (for Current Program Status Register). This register is a bit special and directly modifying it is out of the scope of this chapter. That said, it keeps some values that can be read and updated when executing an instruction. The values of that register include four condition code flags called N (negative), Z (zero), C (carry) and V (overflow). These four condition code flags are usually read by branch instructions. Arithmetic instructions and special testing and comparison instruction can update these condition codes too if requested.

The semantics of these four condition codes in instructions updating the cpsr are roughly the following

N will be enabled if the result of the instruction yields a negative number. Disabled otherwise.
Z will be enabled if the result of the instruction yields a zero value. Disabled if nonzero.
C will be enabled if the result of the instruction yields a value that requires a 33rd bit to be fully represented. For instance an addition that overflows the 32 bit range of integers. There is a special case for C and subtractions where a non-borrowing subtraction enables it, disabled otherwise: subtracting a larger number to a smaller one enables C, but it will be disabled if the subtraction is done the other way round.
V will be enabled if the result of the instruction yields a value that cannot be represented in 32 bits two's complement.

So we have all the needed pieces to perform branches conditionally. But first, let's start comparing two values. We use the instruction cmp for this purpose.

cmp r1, r2 /* updates cpsr doing "r1 - r2", but r1 and r2 are not modified */

This instruction subtracts to the value in the first register the value in the second register. Examples of what could happen in the snippet above?

If r2 had a value (strictly) greater than r1 then N would be enabled because r1-r2 would yield a negative result.
If r1 and r2 had the same value, then Z would be enabled because r1-r2 would be zero.
If r1 was 1 and r2 was 0 then r1-r2 would not borrow, so in this case C would be enabled. If the values were swapped (r1 was 0 and r2 was 1) then C would be disabled because the subtraction does borrow.
If r1 was 2147483647 (the largest positive integer in 32 bit two's complement) and r2 was -1 then r1-r2 would be 2147483648 but such number cannot be represented in 32 bit two's complement, so V would be enabled to signal this.

How can we use these flags to represent useful conditions for our programs?

EQ (equal) When Z is enabled (Z is 1)
NE (not equal). When Z is disabled. (Z is 0)
GE (greater or equal than, in two's complement). When both V and N are enabled or disabled (V is N)
LT (lower than, in two's complement). This is the opposite of GE, so when V and N are not both enabled or disabled (V is not N)
GT (greather than, in two's complement). When Z is disabled and N and V are both enabled or disabled (Z is 0, N is V)
LE (lower or equal than, in two's complement). When Z is enabled or if not that, N and V are both enabled or disabled (Z is 1. If Z is not 1 then N is V)
MI (minus/negative) When N is enabled (N is 1)
PL (plus/positive or zero) When N is disabled (N is 0)
VS (overflow set) When V is enabled (V is 1)
VC (overflow clear) When V is disabled (V is 0)
HI (higher) When C is enabled and Z is disabled (C is 1 and Z is 0)
LS (lower or same) When C is disabled or Z is enabled (C is 0 or Z is 1)
CS/HS (carry set/higher or same) When C is enabled (C is 1)
CC/LO (carry clear/lower) When C is disabled (C is 0)

These conditions can be combined to our b instruction to generate new instructions. This way, beq will branch only if Z is 1. If the condition of a conditional branch is not met, then the branch is ignored and the next instruction will be run. It is the programmer task to make sure that the condition codes are properly set prior a conditional branch.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
/* -- compare01.s */
.text
.global main
main:
    mov r1, #2       /* r1 ← 2 */
    mov r2, #2       /* r2 ← 2 */
    cmp r1, r2       /* update cpsr condition codes with the value of r1-r2 */
    beq case_equal   /* branch to case_equal only if Z = 1 */
case_different :
    mov r0, #2       /* r0 ← 2 */
    b end            /* branch to end */
case_equal:
    mov r0, #1       /* r0 ← 1 */
end:
    bx lr

If you run this program it will return an error code of 1 because both r1 and r2 have the same value. Now change mov r1, #2 in line 5 to be mov r1, #3 and the returned error code should be 2. Note that case_different we do not want to run the case_equal instructions, thus we have to branch to end (otherwise the error code would always be 1).

That's all for today.