ARM assembler in Raspberry Pi – Chapter 6
Control structures
In the previous chapter we learnt about branch instructions. They are really powerful tools because they allow us to express control structures. Structured programming is an important milestone in software engineering (a basic one, but nonetheless an important one). So being able to map the usual structured programming constructs to assembler, in our processor, is a Good Thing™.
If, then, else
Well, this one is a basic one, and in fact we already used this structure in the previous chapter. Consider a structure of the form if (E) then S1 else S2, where E is an expression and S1 and S2 are statements (they may be compound statements like { SA; SB; SC; }).
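A possible way to express this in ARM assembler could be the following: first the code that evaluates E and updates the cpsr, then a conditional branch bXX else (where XX is the condition under which E is false), then the code for S1 followed by an unconditional b end_of_if, then the label else with the code for S2, and finally the label end_of_if.
If there is no else part, we can replace bXX else with bXX end_of_if.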
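The layout just described can be mimicked in C with goto, which may help to see where each branch goes. This is only a sketch: the function names and the choice of "pick the larger value" as E, S1 and S2 are assumptions made for illustration, and the labels are named else_part and end_of_if because else is a reserved word in C.

```c
/* Structured form: if (a > b) result = a; else result = b; */
int max_structured(int a, int b)
{
    int result;
    if (a > b)
        result = a;
    else
        result = b;
    return result;
}

/* Same logic laid out the way the assembler version would be. */
int max_branches(int a, int b)
{
    int result;
    if (!(a > b)) goto else_part; /* bXX else: taken when E is false */
    result = a;                   /* S1, the "then" part */
    goto end_of_if;               /* b end_of_if */
else_part:
    result = b;                   /* S2, the "else" part */
end_of_if:
    return result;
}
```

Both functions return the same value for any pair of arguments. Note that the conditional branch tests the negation of E, which is why a comparison such as "greater than" in the source usually becomes a "less than or equal" branch in the assembler.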
Loops
This is another usual one in structured programming. While there are several types of loops, actually all of them reduce to the structure while (E) S.
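Presumably S does something so that E eventually becomes false and the loop is left. Otherwise we would stay in the loop forever (sometimes this is what you want, but not in our examples). A way to implement these loops is to evaluate E at the top of the loop, updating the cpsr, leave the loop with a conditional branch if E is false, run the code for S, and then branch unconditionally back to the evaluation of E.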
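As a rough illustration of that loop layout, here is a C sketch that uses goto in place of the branches. The function names, the labels while_condition and end_of_loop, and the example itself (counting how many times a number can be halved before reaching zero) are hypothetical and chosen only to have a concrete E and S.

```c
/* Structured form: while (n > 0) { n = n / 2; steps = steps + 1; } */
int halvings_structured(int n)
{
    int steps = 0;
    while (n > 0) {
        n = n / 2;
        steps = steps + 1;
    }
    return steps;
}

/* Same loop laid out with explicit branches. */
int halvings_branches(int n)
{
    int steps = 0;
while_condition:
    if (!(n > 0)) goto end_of_loop; /* conditional branch: leave when E is false */
    n = n / 2;                      /* S */
    steps = steps + 1;
    goto while_condition;           /* unconditional branch back to the check */
end_of_loop:
    return steps;
}
```

Calling either function with 8 goes through 4, 2, 1 and 0, so it returns 4. If the body never changed n, the check would always succeed and the loop would never end.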
A common loop involves iterating over a range of integers, like in for (i = L; i < N; i += K) S. But this is nothing but i = L; while (i < N) { S; i += K; }. So we do not have to learn a new way to implement the loop itself.
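A small C sketch can make the equivalence concrete. The function names and the choice of summing the visited values as the loop body are assumptions for illustration only; lo, hi and step play the roles of the lower bound, the upper bound and the increment.

```c
/* Range loop written with for. */
int sum_for(int lo, int hi, int step)
{
    int sum = 0;
    for (int i = lo; i < hi; i += step)
        sum += i;
    return sum;
}

/* The same loop rewritten as initialization + while. */
int sum_while(int lo, int hi, int step)
{
    int sum = 0;
    int i = lo;         /* initialization moves before the loop */
    while (i < hi) {    /* the condition stays the same */
        sum += i;       /* original body */
        i += step;      /* the increment goes at the end of the body */
    }
    return sum;
}
```

Both functions visit exactly the same values of i, so they always return the same result (step is assumed to be positive, otherwise neither loop terminates when lo < hi).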
1 + 2 + 3 + 4 + ... + 22
As a first example let's sum all the numbers from 1 to 22 (I'll tell you later why I chose 22). The result of the sum is 253 (check it with a calculator). I know it makes little sense to compute something whose result we already know, but this is just an example.
/* -- loop01.s */
.text
.global main
main:
    mov r1, #0         /* r1 ← 0 */
    mov r2, #1         /* r2 ← 1 */
loop:
    cmp r2, #22        /* compare r2 and 22 */
    bgt end            /* branch if r2 > 22 to end */
    add r1, r1, r2     /* r1 ← r1 + r2 */
    add r2, r2, #1     /* r2 ← r2 + 1 */
    b loop
end:
    mov r0, r1         /* r0 ← r1 */
    bx lr
Here we are counting from 1 to 22. We will use the register r2 as the counter. As you can see in line 6 (mov r2, #1) we initialize it to 1. The sum will be accumulated in the register r1. At the end of the program we move the contents of r1 into r0 to return the result of the sum as the error code of the program (we could have used r0 in all the code and avoided this final mov, but I think it is clearer this way).
In line 8 we compare r2 (remember, the counter that will go from 1 to 22) to 22. This updates the cpsr, so in line 9 we can check whether the comparison found r2 to be greater than 22. If this is the case, we end the loop by branching to end. Otherwise we add the current value of r2 to the current value of r1 (remember, in r1 we accumulate the sum from 1 to 22).
Line 11 is an important one. We increase the value of r2, because we are counting from 1 to 22 and we have already added the current counter value in r2 to the result of the sum in r1. Then at line 12 we branch back to the beginning of the loop. Note that if line 11 were not there we would hang, as the comparison in line 8 would never find r2 greater than 22 and we would never leave the loop in line 9!
Well, now you could change line 8 and try with, let's say, #100. The result should be 5050, but the error code reported by the shell is 186.
What happened? Well, it happens that in Linux the error code of a program is a number from 0 to 255 (8 bits). If the result is 5050, only the lower 8 bits of the number are used. 5050 in binary is 1001110111010; its lower 8 bits are 10111010, which is exactly 186. How can we check that the computed r1 is 5050 before ending the program? Let's use GDB.
Let's tell gdb to stop at 0x000083ac, right before executing mov r0, r1.
Great, once stopped there r1 contains 5050, which is what we expected but could not see because of the limits of the error code.
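The truncation of the error code can also be reproduced away from the Raspberry Pi with a few lines of C. This is just a sketch of the arithmetic: the function names are made up for this example, and masking with 0xFF stands in for what the system does when it keeps only the lower 8 bits of the value returned by main.

```c
/* Sum of 1 + 2 + ... + last, computed with the same kind of loop as loop01.s. */
unsigned int sum_up_to(unsigned int last)
{
    unsigned int sum = 0;
    for (unsigned int i = 1; i <= last; i++)
        sum += i;
    return sum;
}

/* Keeps only the lower 8 bits of a value, like the error code seen by the shell. */
unsigned int low_8_bits(unsigned int value)
{
    return value & 0xFFu;
}
```

With 22 as the upper bound the sum is 253, which fits in 8 bits and is reported unchanged. With 100 the sum is 5050, and low_8_bits(5050) is 186, since 5050 = 19 * 256 + 186.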
Maybe you have noticed that something odd happens with our labels being identified as functions. We will address this issue in a future chapter; it is mostly harmless, though.
3n + 1
Let's make another example, a bit more complicated. This is the famous 3n + 1 problem, also known as the Collatz conjecture. Given a number n we will divide it by 2 if it is even, and multiply it by 3 and add one if it is odd.
Before continuing: our ARM processor is able to multiply two numbers, but for that we would have to learn a new instruction, mul, which would detour us a bit. Instead we will use the identity 3*n = 2*n + n. We do not really know how to multiply or divide by two yet; we will study this in a future chapter, so for now just assume it works as shown in the assembler below.
The Collatz conjecture states that, for any positive integer n, repeatedly applying this procedure will eventually give us the number 1. Theoretically it could happen that this is not the case. So far no such number has been found, but the conjecture has not been proved either. If we want to apply the previous procedure repeatedly, our program does something like this: while n is different from 1, replace n by n/2 if n is even, or by 3*n + 1 if it is odd.
If the Collatz conjecture were false, there would exist some n for which such a loop would hang, never reaching 1. But as I said, no such number has been found.
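The repeated procedure can be sketched in C before looking at the assembler. The name collatz_steps and the max_steps cap are assumptions added for this illustration; the cap is there only so that the sketch cannot hang if it is fed a value (such as 0) that never reaches 1.

```c
/* Returns how many steps n needs to reach 1, or -1 if it has not
   reached 1 after max_steps steps (hypothetical safety cap). */
long collatz_steps(unsigned long n, long max_steps)
{
    long steps = 0;
    while (n != 1) {
        if (steps >= max_steps)
            return -1;          /* give up instead of looping forever */
        if (n % 2 == 0)
            n = n / 2;          /* even: divide by 2 */
        else
            n = 3 * n + 1;      /* odd: multiply by 3 and add 1 */
        steps = steps + 1;
    }
    return steps;
}
```

For example, collatz_steps(6, 1000) walks through 3, 10, 5, 16, 8, 4, 2 and 1, so it returns 8. Very large starting values could overflow unsigned long in the 3 * n + 1 step; the sketch does not guard against that.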
/* -- collatz.s */
.text
.global main
main:
    mov r1, #123           /* r1 ← 123 */
    mov r2, #0             /* r2 ← 0 */
loop:
    cmp r1, #1             /* compare r1 and 1 */
    beq end                /* branch to end if r1 == 1 */

    and r3, r1, #1         /* r3 ← r1 & 1 */
    cmp r3, #0             /* compare r3 and 0 */
    bne odd                /* branch to odd if r3 != 0 */
even:
    mov r1, r1, ASR #1     /* r1 ← (r1 >> 1) */
    b end_loop
odd:
    add r1, r1, r1, LSL #1 /* r1 ← r1 + (r1 << 1) */
    add r1, r1, #1         /* r1 ← r1 + 1 */

end_loop:
    add r2, r2, #1         /* r2 ← r2 + 1 */
    b loop                 /* branch to loop */
end:
    mov r0, r2
    bx lr
In r1 we will keep the number n. In this case we will use the number 123. 123 reaches 1 in 46 steps: [123, 370, 185, 556, 278, 139, 418, 209, 628, 314, 157, 472, 236, 118, 59, 178, 89, 268, 134, 67, 202, 101, 304, 152, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]. We will count the number of steps in register r2. So we initialize r1 with 123 and r2 with 0 (no step has been performed yet).
At the beginning of the loop, with cmp r1, #1 and beq end, we check whether r1 is 1: we compare it with 1 and, if it is equal, we leave the loop by branching to end.
Now we know that r1 is not 1, so we proceed to check whether it is even or odd. To do this we use a new instruction, and, which performs a bitwise and operation. An even number has its least significant bit (LSB) set to 0, while an odd number has its LSB set to 1. So a bitwise and with 1 returns 0 for even numbers and 1 for odd numbers. With and r3, r1, #1 we keep the result of the bitwise and in the register r3 and then, with cmp r3, #0, we compare it against 0. If it is not zero we branch to odd, otherwise we continue with the even case.
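The parity test can be tried out in C, where the & operator performs the same bitwise and. The helper name is_odd is an assumption made for this sketch.

```c
/* Returns 1 if n is odd and 0 if it is even, by looking only at
   the least significant bit, as and r3, r1, #1 does. */
int is_odd(unsigned int n)
{
    unsigned int lsb = n & 1u;  /* keeps bit 0, clears every other bit */
    return lsb != 0;
}
```

For instance, 123 is 1111011 in binary, so its LSB is 1 and is_odd(123) is 1, while 370 is 101110010 and is_odd(370) is 0.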
Now some magic happens in the even case, in mov r1, r1, ASR #1. This is a combined operation that ARM allows us to do. It is a mov, but we do not move the value of r1 directly to r1 (which would do nothing); first we apply an arithmetic shift right (ASR) to the value of r1 (to the value, not to the register itself). Then this shifted value is moved to the register r1. An arithmetic shift right shifts all the bits of a value to the right: the rightmost bit is discarded and the leftmost bit is set to the same value the leftmost bit had prior to the shift. Shifting a number right by one bit is the same as dividing that number by 2. So this mov r1, r1, ASR #1 is actually doing r1 ← r1 / 2.
Some similar magic happens for the odd case, in add r1, r1, r1, LSL #1. In this case we are doing an add. The first and second operands (the destination operand and the first source operand) must be registers. The third one is combined with a logical shift left (LSL): the value of the operand is shifted left by 1 bit, so the leftmost bit is discarded and the rightmost bit is set to 0. This effectively multiplies the value by 2. So we are adding r1 (which keeps the value of n) to 2*r1. This is 3*r1, so 3*n. We keep this value in r1 again. With the following add r1, r1, #1 we add 1 to that value, so r1 ends up having the value 3*n+1 that we wanted.
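Both shifted operations have a direct counterpart in C, which makes it easy to check the arithmetic. The function names below are hypothetical. The sketch uses unsigned 32-bit values on purpose: right-shifting a negative signed value is implementation-defined in C, and for the non-negative numbers used in this chapter an arithmetic and a logical shift right give the same result.

```c
#include <stdint.h>

/* Counterpart of mov r1, r1, ASR #1 for non-negative values: n / 2. */
uint32_t half(uint32_t n)
{
    return n >> 1;
}

/* Counterpart of add r1, r1, r1, LSL #1 followed by add r1, r1, #1:
   n + 2*n + 1, that is 3*n + 1. */
uint32_t triple_plus_one(uint32_t n)
{
    uint32_t tripled = n + (n << 1);  /* n + 2*n */
    return tripled + 1u;
}
```

Starting from 123, triple_plus_one(123) gives 370 and half(370) gives 185, which are the first two steps of the sequence for 123. When the shifted value is odd the discarded bit is the remainder, so half(7) is 3.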
Do not worry very much now about these LSL and ASR. Just take them for granted now. In a future chapter we will see them in more detail.
Finally, at the end of the loop, with add r2, r2, #1 we update r2 (remember, it keeps the count of our steps) and then we branch back to the beginning of the loop. Before ending the program we move the counter to r0, so we return the number of steps we took to reach 1.
Great.
That's all for today.
Postscript
Kevin Millikin rightly pointed out (in a comment below) that usually a loop is not implemented in the way shown above. In fact Kevin says that a better way to do the loop of loop01.s is as follows.
/* -- loop02.s */
.text
.global main
main:
    mov r1, #0         /* r1 ← 0 */
    mov r2, #1         /* r2 ← 1 */
    b check_loop       /* unconditionally jump at the end of the loop */
loop:
    add r1, r1, r2     /* r1 ← r1 + r2 */
    add r2, r2, #1     /* r2 ← r2 + 1 */
check_loop:
    cmp r2, #22        /* compare r2 and 22 */
    ble loop           /* branch if r2 <= 22 to the beginning of the loop */
end:
    mov r0, r1         /* r0 ← r1 */
    bx lr
If you count the number of instructions in the two codes, there are 9 instructions in both. But if you look carefully at Kevin's proposal you will see that by unconditionally branching to the end of the loop, and reversing the condition check, we can skip one branch, thus reducing the number of instructions of the loop itself from 5 to 4.
There is another advantage in this second version, though: there is only one branch in the loop itself, as we rely on implicit sequencing to reach again the two instructions performing the check. For reasons beyond the scope of this post, executing a branch instruction may negatively affect the performance of our programs. Processors have mechanisms to mitigate the performance loss due to branches (and in fact the processor in the Raspberry Pi does have them). But avoiding a branch instruction altogether also avoids its potential performance penalty.
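To get a feeling for the difference, the two loop shapes can be modelled in C with goto while counting how many branch instructions would be executed. This is only a model under stated assumptions: the function names and the branches counter are invented for this sketch, each goto or conditional test stands for one branch instruction of the corresponding assembler, and the count says nothing about the actual cost of a branch on a given processor.

```c
/* Shape of loop01.s: the check is at the top of the loop. */
int sum_top_test(int last, int *branches)
{
    int r1 = 0, r2 = 1;
    *branches = 0;
loop:
    *branches += 1;              /* bgt end is executed, taken or not */
    if (r2 > last) goto end;
    r1 = r1 + r2;
    r2 = r2 + 1;
    *branches += 1;              /* b loop */
    goto loop;
end:
    return r1;
}

/* Shape of loop02.s: the check is at the bottom of the loop. */
int sum_bottom_test(int last, int *branches)
{
    int r1 = 0, r2 = 1;
    *branches = 1;               /* b check_loop, executed only once */
    goto check_loop;
loop:
    r1 = r1 + r2;
    r2 = r2 + 1;
check_loop:
    *branches += 1;              /* ble loop is executed, taken or not */
    if (r2 <= last) goto loop;
    return r1;
}
```

With last set to 22 both functions return 253, but the first one executes 45 branches (23 conditional ones plus 22 unconditional ones) while the second executes 24 (the initial jump plus 23 conditional ones).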
We do not care very much about the performance of our assembler for now, but I thought it was worth developing Kevin's comment a bit more.