Exploring AArch64 assembler – Chapter 9
In chapter 6 we saw conditional branches and we ended commenting that they can be used to implement higher control constructs. In this chapter we will see a few of them.
If-statement
It may be at this point rather obvious how to implement an if-statement construct. Given a structure of the form
our code may look like
It can be done the other way around.
How to evaluate conditions?
Conditions in most programming languages are explicitly given a boolean type (e.g. bool
in C++, LOGICAL
in Fortran) or an integer type with a restricted set of values (e.g. 0 or 1 in C). Several programming languages implicitly define a conversion from regular integers to condition expressions. For instance in C, an expression e
of type integer or pointer when used in a condition is the same as evaluating the condition e != 0
(which will give a 0
or 1
value). C++ defines the same conversion but the resulting type is bool
so it can be true
or false
. Fortran does not have any of these conversions so the operand will always be required.
For expressions of the form e1 ⊕ e2
where ⊕ is a relational operator like <, >, ≤, ≥, etc. We can use the instruction cmp
followed by a b.cond
with the relevant condition.
Evaluation of conditional expressions may be impacted by the language specification. Some programming languages implement conditionally evaluated logical operators (also called short-circuit evaluation). This makes these operators non-strict and closer to if-statements than the mathematical definition of the logical operators. Some other programming languages are strict in the definition of the logical operators, which means they evaluate all the operands and then apply the logical operation. C and C++ use short-circuit evaluation which means an expression like e1 && e2
will only evaluate e2
if e1
is true. Similarly in an expression like e1 || e2
, e2
is evaluated only if e1
is false (having to use "if" in the definition of these operators is what makes them "conditionally evaluated"). This means that in practice a C/C++ code like the following
is implemented like
of course doing exactly the above would be ridiculous, so in practice looks more like this (in C/C++ again, though not very nice).
We can simplify this further to make it closer to assembly (I'm adding a few redundant C/C++ labels below for clarity)
Do we always have to implement C/C++ logical operations like this? Not really, only if we're unsure that unconditionally evaluating the second operand may be different than conditionally evaluating it. For instance a case like
it won't harm to evaluate both operands so we can avoid a few branches and actually evaluate it as if the code had been (note a single &
which means bitwise operation that always evaluates the two operands).
but in a case like
We don't really want to evaluate ((10 / p) > 5)
when p == 0
. The same strategy is used for ||
in which sometimes we can evaluate it as |
.
The following table shows some examples. Take these as suggestions, more clever sequences can be used in some circumstances.
Condition to evaluate (C/C++) | Instruction sequence |
---|---|
e1 < e2 |
Direct translation If we want to evaluate If we can reverse the operands If we can reverse the operands |
e1 && e2 |
Direct translation Without short-cut evaluation (i.e. The |
e1 || e2 |
Without short-cut evaluation (i.e. |
!e
|
For the less common case when
But usually
In some other cases like !(e1 && e2) we can use De Morgan rules to evaluate it like |
Conditional assignment
A common idiom, in C/C++ is the following
In Fortran there is a shorthand syntax for this case
Or more generally, in C/C++
Which in C/C++ can be written more compactly as
In many circumstances e1
and e2
are such that it is correct to rewrite the general case as
or the opposite form
If we can write the code in one of the two forms above, and given that this case is relatively common, AArch64 has specific support for it.
The instruction csel rdest, rsource1, rsource2, cond
sets the value of the destination register rdest
to rsource1
if the condition cond
is true, otherwise it is set to rsource2
. csel
stands for conditional selection/
For instance, a typical max
operation looks like this in C.
with csel we can implement the code above as
which is handy because it avoids a branch.
Loop constructs
To implement a loop construct of the form
we can do
Usually inverting the flow is a bit better
This has the added benefit that if we remove the initial branch to loop_check
, this is the equivalent to the C do-while
loop or the Pascal repeat-until
.
While we can use regular branches and comparison instructions to implement loops, AArch64 provides a couple of specialized instructions for some common circumstances: cbnz
and cbz
. They conditionally branch to some label if a given register is non-zero or zero respectively. This is useful for loops like this.
because we can now implement them as
Similarly for cbz
. That said, for this particular example it is possible to use subs instruction that acts like sub
but also updates the condition flags. So we can write
So cbnz
and cbz
can be more useful when a register being zero is a condition and its computation cannot update the flags (or doing that would require adding an extra cmp
). We have not seen pointers yet, but a common scenario is traversing a linked list ended by a null pointer.
Because a null pointer is represented usually with all zeros after it = it->next
we can use cbnzr
without having to do cmp
+ b.ne
.
Fibonacci (using a loop)
In chapter 8 we made a function that recursively computed the Fibonacci numbers. Today we are going to see how to compute them using a loop. We will reuse the same main function and only change the fibonacci
function itself.
As a reminder of the fibonacci function is actually a sequence of numbers Fi defined by the following recurrence:
- F0 = 0
- F1 = 1
- Fn = Fn-1 + Fn-2, where n > 1
To compute the current element we only need the previous two elements, we don't really need to strictly follow the mathematical definition doing a recursive function call, like we did in chapter 8. Results for fibonacci(0)
and fibonacci(1)
will be immediately solved, no need to loop. Any other fibonacci(n) can be computed by repeatedly computing n - 1
previous Fibonacci numbers.
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
fibonacci:
// fibonacci(n) -> result
// n is 32-bit and will be passed in w0
// result is 64-bit and will be returned in x0
mov w3, w0 // w3 ← w0
cmp w3, #1 // compare w3 and 1 and update the flags
b.le simple_case // jump to simple_case if w3 <= 1
sub w3, w3, #1 // w3 ← w3 - 1
mov x1, #0 // x1 ← 0
mov x2, #1 // x1 ← 1
b loop_check // branch to loop_check
loop_body:
add x0, x1, x2 // x0 ← x1 + x2
mov x1, x2 // x1 ← x2
mov x2, x0 // x2 ← x0
sub w3, w3, #1 // w3 ← w3 - 1
loop_check:
cbnz w3, loop_body // branch to loop_body if w3 != 0
b fibonacci_end // branch to fibonacci_end
simple_case:
sxtw x0, w0 // x0 ← ExtendSigned32To64(w0)
fibonacci_end:
ret
As in the recursive version of this function, cases 0 and 1 are handled specially. We first keep w0 into w3, line 65, and then we compare w3 with 1, line 66. If the flags are such that w3 ≤ 1
then we branch to the simple_case
which the only thing it does is extending w0
into x0
, line 82. If the fibonacci number to compute is not 0 or 1, then we have to compute the n - 1
earlier ones. So we have to decrease w3
by 1, line 68. In register x1
we will have fibonacci(n-2)
and in x2 we will have fibonacci(n-1). The two initially known ones are 0 and 1 respectively (this is, for fibonacci(2)
), lines 69-70.
Now everything is set for the loop, so we jump to the check, line 71. The loop check uses cbnz
with w3
, line 78. Once w3
becomes zero means that we do not have to compute any other fibonacci number, so cbnz will branch back to loop_body
only when w3
is not 0.
The loop body simply computes the current fibonacci by adding x1
and x2
(recall, x1 is fibonacci(n-2)
and x2 is fibonacci(n-1)
) and leaving the result in x0
, line 73. Now we need to prepare the next iteration of the loop: x2
(which is fibonacci(n-1)
) will be fibonacci(n-2)
in the next iteration, so we put it in x1
, line 74. Similarly, x0
(which is fibonacci(n)
) will be fibonacci(n-1)
in the next iteration, so we put it x2
, line 75.
At this point one fibonacci number has been computed, so one less to compute. So we decrease w3
by one, line 76. Now the loop_check is executed which will iterate if needed. Once we leave the loop, we still need to skip the simple_case
, line 80.
To test this, just take the code in chapter 8 and replace the recursive function with the iterative function.
Yay! :)
This is all for today.