# ARM assembler in Raspberry Pi – Chapter 6

## Control structures

In the previous chapter we learnt branch instructions. They are really powerful tools because they allow us to express control structures. *Structured programming* is an important milestone in better computing engineering (a foundational one, but nonetheless an important one). So being able to map usual structured programming constructs in assembler, in our processor, is a Good Thing™.

## If, then, else

Well, this one is a basic one, and in fact we already used this structure in the previous chapter. Consider the following structure, where `E`

is an expression and `S1`

and `S2`

are statements (they may be compound statements like `{ SA; SB; SC; }`

)

A possible way to express this in ARM assembler could be the following

If there is no else part, we can replace `bXX else`

with `bXX end_of_if`

.

## Loops

This is another usual one in structured programming. While there are several types of loops, actually all reduce to the following structure.

Supposedly `S`

makes something so `E`

eventually becomes false and the loop is left. Otherwise we would stay in the loop forever (sometimes this is what you want but not in our examples). A way to implement these loops is as follows.

A common loop involves iterating from a single range of integers, like in

But this is nothing but

So we do not have to learn a new way to implement the loop itself.

## 1 + 2 + 3 + 4 + ... + 22

As a first example lets sum all the numbers from 1 to 22 (I'll tell you later why I chose 22). The result of the sum is `253`

(check it with a calculator). I know it makes little sense to compute something the result of which we know already, but this is just an example.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

/* -- loop01.s */
.text
.global main
main:
mov r1, #0 /* r1 ← 0 */
mov r2, #1 /* r2 ← 1 */
loop:
cmp r2, #22 /* compare r2 and 22 */
bgt end /* branch if r2 > 22 to end */
add r1, r1, r2 /* r1 ← r1 + r2 */
add r2, r2, #1 /* r2 ← r2 + 1 */
b loop
end:
mov r0, r1 /* r0 ← r1 */
bx lr

Here we are counting from 1 to 22. We will use the register `r2`

as the counter. As you can see in line 6 we initialize it to 1. The sum will be accumulated in the register `r1`

, at the end of the program we move the contents of `r1`

into `r0`

to return the result of the sum as the error code of the program (we could have used `r0`

in all the code and avoid this final `mov`

but I think it is clearer this way).

In line 8 we compare `r2`

(remember, the counter that will go from 1 to 22) to 22. This will update the `cpsr`

thus in line 9 we can check if the comparison was such that r2 was greater than 22. If this is the case, we end the loop by branching to `end`

. Otherwise we add the current value of `r2`

to the current value of `r1`

(remember, in `r1`

we accumulate the sum from 1 to 22).

Line 11 is an important one. We increase the value of `r2`

, because we are counting from 1 to 22 and we already added the current counter value in `r2`

to the result of the sum in `r1`

. Then at line 12 we branch back at the beginning of the loop. Note that if line 11 was not there we would hang as the comparison in line 8 would always be false and we would never leave the loop in line 9!

Well, now you could change the line 8 and try with let's say, #100. The result should be 5050.

What happened? Well, it happens that in Linux the error code of a program is a number from 0 to 255 (8 bits). If the result is 5050, only the lower 8 bits of the number are used. 5050 in binary is `1001110111010`

, its lower 8 bits are `10111010`

which is exactly 186. How can we check the computed `r1`

is 5050 before ending the program? Let's use GDB.

Let's tell gdb to stop at `0x000083ac`

, right before executing `mov r0, r1`

.

Great, this is what we expected but we could not see due to limits in the error code.

Maybe you have noticed that something odd happens with our labels being identified as functions. We will address this issue in a future chapter, this is mostly harmless though.

## 3n + 1

Let's make another example a bit more complicated. This is the famous *3n + 1* problem also known as the Collatz conjecture. Given a number `n`

we will divide it by 2 if it is even and multiply it by 3 and add one if it is odd.

Before continuing, our ARM processor is able to multiply two numbers but we should learn a new instruction `mul`

which would detour us a bit. Instead we will use the following identity `3 * n = 2*n + n`

. We do not really know how to multiply or divide by two yet, we will study this in a future chapter, so for now just assume it works as shown in the assembler below.

Collatz conjecture states that, for any number `n`

, repeatedly applying this procedure will eventually give us the number 1. Theoretically it could happen that this is not the case. So far, no such number has been found, but it has not been proved otherwise. If we want to repeatedly apply the previous procedure, our program is doing something like this.

If the Collatz conjecture were false, there would exist some `n`

for which the code above would hang, never reaching 1. But as I said, no such number has been found.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27

/* -- collatz.s */
.text
.global main
main:
mov r1, #123 /* r1 ← 123 */
mov r2, #0 /* r2 ← 0 */
loop:
cmp r1, #1 /* compare r1 and 1 */
beq end /* branch to end if r1 == 1 */
and r3, r1, #1 /* r3 ← r1 & 1 */
cmp r3, #0 /* compare r3 and 0 */
bne odd /* branch to odd if r3 != 0 */
even:
mov r1, r1, ASR #1 /* r1 ← (r1 >> 1) */
b end_loop
odd:
add r1, r1, r1, LSL #1 /* r1 ← r1 + (r1 << 1) */
add r1, r1, #1 /* r1 ← r1 + 1 */
end_loop:
add r2, r2, #1 /* r2 ← r2 + 1 */
b loop /* branch to loop */
end:
mov r0, r2
bx lr

In `r1`

we will keep the number `n`

. In this case we will use the number 123. 123 reaches 1 in 46 steps: [123, 370, 185, 556, 278, 139, 418, 209, 628, 314, 157, 472, 236, 118, 59, 178, 89, 268, 134, 67, 202, 101, 304, 152, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]. We will count the number of steps in register `r2`

. So we initialize `r1`

with 123 and `r2`

with 0 (no step has been performed yet).

At the beginning of the loop, in lines 8 and 9, we check if `r1`

is 1. So we compare it with 1 and if it is equal we leave the loop branching to `end`

.

Now we know that `r1`

is not 1, so we proceed to check if it is even or odd. To do this we use a new instruction `and`

which performs a *bitwise and operation*. An even number will have the least significant bit (LSB) to 0, while an odd number will have the LSB to 1. So a bitwise and using 1 will return 0 or 1 on even or odd numbers, respectively. In line 11 we keep the result of the bitwise and in `r3`

register and then, in line 12, we compare it against 0. If it is not zero then we branch to `odd`

, otherwise we continue on the `even`

case.

Now some magic happens in line 15. This is a combined operation that ARM allows us to do. This is a `mov`

but we do not move the value of `r1`

directly to `r1`

(which would be doing nothing) but first we do an *arithmetic shift right* (ASR) to the value of `r1`

(to the value, no the register itself). Then this shifted value is moved to the register `r1`

. An *arithmetic shift right* shifts all the bits of a register to the right: the rightmost bit is effectively discarded and the leftmost is set to the same value as the leftmost bit prior the shift. Shifting right one bit to a number is the same as dividing that number by 2. So this `mov r1, r1, ASR #1`

is actually doing `r1 ← r1 / 2`

.

Some similar magic happens for the even case in line 18. In this case we are doing an `add`

. The first and second operands must be registers (destination operand and the first source operand). The third is combined with a *logical shift left* (LSL). The value of the operand is shifted left 1 bit: the leftmost bit is discarded and the rightmost bit is set to 0. This is effectively multiplying the value by 2. So we are adding `r1`

(which keeps the value of `n`

) to `2*r1`

. This is `3*r1`

, so `3*n`

. We keep this value in `r1`

again. In line 19 we add 1 to that value, so `r1`

ends having the value `3*n+1`

that we wanted.

Do not worry very much now about these LSL and ASR. Just take them for granted now. In a future chapter we will see them in more detail.

Finally, at the end of the loop, in line 22 we update `r2`

(remember it keeps the counter of our steps) and then we branch back to the beginning of the loop. Before ending the program we move the counter to `r0`

so we return the number of steps we did to reach 1.

Great.

That's all for today.

## Postscript

Kevin Millikin rightly pointed (in a comment below) that usually a loop is not implemented in the way shown above. In fact Kevin says that a better way to do the loop of `loop01.s`

is as follows.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16

/* -- loop02.s */
.text
.global main
main:
mov r1, #0 /* r1 ← 0 */
mov r2, #1 /* r2 ← 1 */
b check_loop /* unconditionally jump at the end of the loop */
loop:
add r1, r1, r2 /* r1 ← r1 + r2 */
add r2, r2, #1 /* r2 ← r2 + 1 */
check_loop:
cmp r2, #22 /* compare r2 and 22 */
ble loop /* branch if r2 <= 22 to the beginning of the loop */
end:
mov r0, r1 /* r0 ← r1 */
bx lr

If you count the number of instruction in the two codes, there are 9 instructions in both. But if you look carefully in Kevin's proposal you will see that by unconditionally branching to the end of the loop, and reversing the condition check, we can skip one branch thus reducing the number of instructions of the loop itself from 5 to 4.

There is another advantage in this second version, though: there is only one branch in the loop itself as we resort to *implicit sequencing* to reach again the two instructions performing the check. For reasons beyond the scope of this post, the execution of a branch instruction may negatively affect the performance of our programs. Processors have mechanisms to mitigate the performance loss due to branches (and in fact the processor in the Raspberry Pi does have them). But avoiding a branch instruction entirely avoids the potential performance penalization of executing a branch instruction.

While we do not care very much now about the performance of our assembler. However, I thought it was worth developing a bit more Kevin's comment.