Exploring AArch64 assembler – Chapter 9
In chapter 6 we saw conditional branches and we ended commenting that they can be used to implement higher control constructs. In this chapter we will see a few of them.
Ifstatement
It may be at this point rather obvious how to implement an ifstatement construct. Given a structure of the form
IF (E) THEN
S1
ELSE
S2
END IF
our code may look like
/* Evaluate condition E and update flags */
b else_label
S1
b end_if
else_label:
S2
end_if:
It can be done the other way around.
/* Evaluate condition E and update flags */
b then_label
S2
b end_if
then_label:
S1
end_if:
How to evaluate conditions?
Conditions in most programming languages are explicitly given a boolean type (e.g. bool
in C++, LOGICAL
in Fortran) or an integer type with a restricted set of values (e.g. 0 or 1 in C). Several programming languages implicitly define a conversion from regular integers to condition expressions. For instance in C, an expression e
of type integer or pointer when used in a condition is the same as evaluating the condition e != 0
(which will give a 0
or 1
value). C++ defines the same conversion but the resulting type is bool
so it can be true
or false
. Fortran does not have any of these conversions so the operand will always be required.
For expressions of the form e1 ⊕ e2
where ⊕ is a relational operator like <, >, ≤, ≥, etc. We can use the instruction cmp
followed by a b.cond
with the relevant condition.
Evaluation of conditional expressions may be impacted by the language specification. Some programming languages implement conditionally evaluated logical operators (also called shortcircuit evaluation). This makes these operators nonstrict and closer to ifstatements than the mathematical definition of the logical operators. Some other programming languages are strict in the definition of the logical operators, which means they evaluate all the operands and then apply the logical operation. C and C++ use shortcircuit evaluation which means an expression like e1 && e2
will only evaluate e2
if e1
is true. Similarly in an expression like e1  e2
, e2
is evaluated only if e1
is false (having to use “if” in the definition of these operators is what makes them “conditionally evaluated”). This means that in practice a C/C++ code like the following
if (e1 && e2)
S1;
else
S2
is implemented like
if (e1)
{
if (e2)
S1
else
S2
}
else
S2
of course doing exactly the above would be ridiculous, so in practice looks more like this (in C/C++ again, though not very nice).
if (e1)
{
if (e2)
S1
else
goto else_block:
}
else
{
else_block:
S2;
}
We can simplify this further to make it closer to assembly (I’m adding a few redundant C/C++ labels below for clarity)
if_start:
check_e1: if (e1) goto check_e2
goto else_block;
check_e2: if (e2) goto then_block;
goto else_block;
then_block:
S1;
goto if_end;
else_block:
S2;
if_end:
Do we always have to implement C/C++ logical operations like this? Not really, only if we’re unsure that unconditionally evaluating the second operand may be different than conditionally evaluating it. For instance a case like
int a;
...
if ((5 < a) && (a < 25))
{
}
it won't harm to evaluate both operands so we can avoid a few branches and actually evaluate it as if the code had been (note a single &
which means bitwise operation that always evaluates the two operands).
if ((5 < a) & (a < 25))
{
}
but in a case like
if (p != 0 && ((10 / p) > 5))
{
}
We don't really want to evaluate ((10 / p) > 5)
when p == 0
. The same strategy is used for 
in which sometimes we can evaluate it as 
.
The following table shows some examples. Take these as suggestions, cleverer sequences can be used in some circumstances.
Condition to evaluate (C/C++)  Instruction sequence 

e1 < e2 
Direct translation cmp e1, e2
b.lt true_case
If we want to evaluate cmp e1, e2
b.ge false_case
If we can reverse the operands cmp e2, e1
b.gt true_case
If we can reverse the operands cmp e2, e1
b.le false_case

e1 && e2 
Direct translation cmp e1, #0
b.ne check_e2
check_e2:
cmp e2, #0
b.ne false_case
true_case:
...
Without shortcut evaluation (i.e. tst e1, e2 // update flags with the
// result of bitwiseand(e1, e2)
b.eq false_case
true_case:
...
The 
e1  e2 
cmp e1, #0
b.eq check_e2
check_e2:
cmp e2, #0
b.eq false_case
true_case:
...
Without shortcut evaluation (i.e. orr x1, e1, e2 // x1 = bitwiseor(e1, e2)
cmp x1, #0
b.eq false_case:
true_case:
...

!e

For the less common case when cmp e1, #0
b.ne false_case
true_case:
...
But usually
In some other cases like !(e1 && e2) we can use De Morgan rules to evaluate it like 
Conditional assignment
A common idiom, in C/C++ is the following
if (a)
x = e;
In Fortran there is a shorthand syntax for this case
IF (A) X = E;
Or more generally, in C/C++
if (a)
x = e1;
else
x = e2;
Which in C/C++ can be written more compactly as
x = a ? e1 : e2;
In many circumstances e1
and e2
are such that it is correct to rewrite the general case as
x = e2;
if (a)
x = e1;
or the opposite form
x = e1;
if (!a)
x = e2;
If we can write the code in one of the two forms above, and given that this case is relatively common, AArch64 has specific support for it.
The instruction csel rdest, rsource1, rsource2, cond
sets the value of the destination register rdest
to rsource1
if the condition cond
is true, otherwise it is set to rsource2
. csel
stands for conditional selection/
For instance, a typical max
operation looks like this in C.
int m, a, b;
...
m = a > b ? a : b;
with csel we can implement the code above as
cmp x0, x1 // compare x0 and x1
csel x2, x0, x1, gt // x2 ← if x0 > x1
// then x0
// otherwise x1
which is handy because it avoids a branch.
Loop constructs
To implement a loop construct of the form
while (E)
S;
we can do
loop_check: if (!E) goto loop_end;
S;
goto loop_check;
loop_end:
Usually inverting the flow is a bit better
loop_start: goto loop_check;
loop_body:
S;
loop_check: if (E) goto loop_body;
This has the added benefit that if we remove the initial branch to loop_check
, this is the equivalent to the C dowhile
loop or the Pascal repeatuntil
.
While we can use regular branches and comparison instructions to implement loops, AArch64 provides a couple of specialized instructions for some common circumstances: cbnz
and cbz
. They conditionally branch to some label if a given register is nonzero or zero respectively. This is useful for loops like this.
i = ...;
while (i > 0)
{
S;
i;
}
because we can now implement them as
// Assume "i" is in x0
b loop_check
loop_body:
S
sub x0, x0, #1 // x0 ← x0  1
loop_check:
cbnz x0, loop_body // branch to loop_body if x0 != 0
Similarly for cbz
. That said, for this particular example it is possible to use subs instruction that acts like sub
but also updates the condition flags. So we can write
// Assume "i" is in x0
add x0, x0, #1 // We need to compensate the subs below
b loop_check
loop_body:
S
loop_check:
subs x0, x0, #1 // x0 ← x0  1 and update condition flags
b.ne loop_body // branch to loop_body if x0 != 0
So cbnz
and cbz
can be more useful when a register being zero is a condition and its computation cannot update the flags (or doing that would require adding an extra cmp
). We have not seen pointers yet, but a common scenario is traversing a linked list ended by a null pointer.
struct singly_linked_list {
struct singly_linked_list *next;
int some_data;
} *list;
...
it = list;
while (it != NULL)
{
it>some_data++;
it = it>next;
}
Because a null pointer is represented usually with all zeros after it = it>next
we can use cbnzr
without having to do cmp
+ b.ne
.
// Assume x0 is "list" and x1 is "it"
mov x1, x0
b loop_check:
loop_body:
ldr w2, [x1, #8] // w2 ← *(x1 + 8) [32bit load]
// this loads it>some_data into w2
add w2, w2, #1 // w2 ← w2 + 1
str w2, [x1, #8] // *(x1 + 8) ← w2 [32bit store]
// this updates it>some_data
ldr x1, [x1] // x1 ← *x1
// this loads it>next into x1
loop_check:
cbnz x1, loop_body // branch to loop_body if x1 != NULL
loop_end:
Fibonacci (using a loop)
In chapter 8 we made a function that recursively computed the Fibonacci numbers. Today we are going to see how to compute them using a loop. We will reuse the same main function and only change the fibonacci
function itself.
As a reminder of the fibonacci function is actually a sequence of numbers F_{i} defined by the following recurrence:
 F_{0} = 0
 F_{1} = 1
 F_{n} = F_{n1} + F_{n2}, where n > 1
To compute the current element we only need the previous two elements, we don't really need to strictly follow the mathematical definition doing a recursive function call, like we did in chapter 8. Results for fibonacci(0)
and fibonacci(1)
will be immediately solved, no need to loop. Any other fibonacci(n) can be computed by repeatedly computing n  1
previous Fibonacci numbers.
fibonacci:
// fibonacci(n) > result
// n is 32bit and will be passed in w0
// result is 64bit and will be returned in x0
mov w3, w0 // w3 ← w0
cmp w3, #1 // compare w3 and 1 and update the flags
b.le simple_case // jump to simple_case if w3 <= 1
sub w3, w3, #1 // w3 ← w3  1
mov x1, #0 // x1 ← 0
mov x2, #1 // x1 ← 1
b loop_check // branch to loop_check
loop_body:
add x0, x1, x2 // x0 ← x1 + x2
mov x1, x2 // x1 ← x2
mov x2, x0 // x2 ← x0
sub w3, w3, #1 // w3 ← w3  1
loop_check:
cbnz w3, loop_body // branch to loop_body if w3 != 0
b fibonacci_end // branch to fibonacci_end
simple_case:
sxtw x0, w0 // x0 ← ExtendSigned32To64(w0)
fibonacci_end:
ret
As in the recursive version of this function, cases 0 and 1 are handled specially. We first keep w0 into w3, line 65, and then we compare w3 with 1, line 66. If the flags are such that w3 ≤ 1
then we branch to the simple_case
which the only thing it does is extending w0
into x0
, line 82. If the fibonacci number to compute is not 0 or 1, then we have to compute the n  1
earlier ones. So we have to decrease w3
by 1, line 68. In register x1
we will have fibonacci(n2)
and in x2 we will have fibonacci(n1). The two initially known ones are 0 and 1 respectively (this is, for fibonacci(2)
), lines 6970.
Now everything is set for the loop, so we jump to the check, line 71. The loop check uses cbnz
with w3
, line 78. Once w3
becomes zero means that we do not have to compute any other fibonacci number, so cbnz will branch back to loop_body
only when w3
is not 0.
The loop body simply computes the current fibonacci by adding x1
and x2
(recall, x1 is fibonacci(n2)
and x2 is fibonacci(n1)
) and leaving the result in x0
, line 73. Now we need to prepare the next iteration of the loop: x2
(which is fibonacci(n1)
) will be fibonacci(n2)
in the next iteration, so we put it in x1
, line 74. Similarly, x0
(which is fibonacci(n)
) will be fibonacci(n1)
in the next iteration, so we put it x2
, line 75.
At this point one fibonacci number has been computed, so one less to compute. So we decrease w3
by one, line 76. Now the loop_check is executed which will iterate if needed. Once we leave the loop, we still need to skip the simple_case
, line 80.
To test this, just take the code in chapter 8 and replace the recursive function with the iterative function.
$ ./fibo Please type a number: 12↴ Fibonacci number 12 is 144 $ ./fibo Please type a number: 25↴ Fibonacci number 25 is 75025 $ ./fibo Please type a number: 40↴ Fibonacci number 40 is 102334155
Yay! 🙂
This is all for today.
Walkthrough flang – Part 5 A very simple memory pool in C++11
Greetings, Roger. How would you create a directory in arm assembly language using system calls? Can you post a quick example as to how it would work? Also, do you tutor privately? Like over Skype or hangout?
Hi Dan,
for ARM in the Raspberry Pi you may want to check http://thinkingeek.com/2014/05/24/armassemblerraspberrypichapter19/
For AArch64 I haven’t written yet a chapter about that.
Tutoring privately won’t be possible. Sorry.
Kind regards,
Roger
Hi Roger,
I think there is a small typo in the third code example:
/* Evaluate condition E and update flags */
b else_label
should read
/* Evaluate condition E and update flags */
b then_label
Hi Michael,
you’re right. I’ve fixed the post. Thanks for the heads up!
Kind regards,
Roger