In the previous chapter we saw that the second operand of most arithmetic instructions can use a shift operator which allows us to shift and rotate bits. In this chapter we will continue learning the available indexing modes of ARM instructions. This time we will focus on load and store instructions.
Arrays and structures
So far we have been able to move 32 bits from memory to registers (load) and back to memory (store). But working on single items of 32 bits (usually called scalars) is a bit limiting. Soon we would find ourselves working on arrays and structures, even if we did not know.
An array is a sequence of items of the same kind in memory. Arrays are a foundational data structure in almost every low level language. Every array has a base address, usually denoted by the name of the array, and contains N items. Each of these items has associated a growing index, ranging from 0 to N-1 or 1 to N. Using the base address and the index we can access an item of the array. We mentioned in chapter 3 that memory could be viewed as an array of bytes. An array in memory is the same, but an item may take more than one single byte.
A structure (or record or tuple) is a sequence of items of possibly diferent kind. Each item of a structure is usually called a field. Fields do not have an associated index but an offset respect to the beginning of the structure. Structures are laid out in memory to ensure that the proper alignment is used in every field. The base address of a structure is the address of its first field. If the base address is aligned, the structure should be laid out in a way that all the field are properly aligned as well.
What do arrays and structure have to do with indexing modes of load and store? Well, these indexing modes are designed to make easier accessing arrays and structs.
Defining arrays and structs
To illustrate how to work with arrays and references we will use the following C declarations and implement them in assembler.
Let's first define in our assembler the array 'a'. It is just 100 integers. An integer in ARM is 32-bit wide so in our assembler code we have to make room for 400 bytes (4 * 100).
1 2 3 4 5 /* -- array01.s */ .data .balign 4 a: .skip 400
In line 5 we define the symbol
a and then we make room for 400 bytes. The directive .skip tells the assembler to advance a given number of bytes before emitting the next datum. Here we are skipping 400 bytes because our array of integers takes 400 bytes (4 bytes per each of the 100 integers). Declaring a structure is not much different.
b: .skip 8
Right now you should wonder why we skipped 8 bytes when the structure itself takes just 5 bytes. Well, it does need 5 bytes to store useful information. The first field
f0 is a
char takes 1 byte of storage. The next field
f1 is an integer. An integer takes 4 bytes and it must be aligned at 4 bytes as well, so we have to leave 3 unused bytes between the field
f0 and the field
f1. This unused storage put just to fulfill alignment is called padding. Padding should never be used by your program.
Naive approach without indexing modes
Ok, let's write some code to initialize every item of the array
a[i]. We will do something equivalent to the following C code.
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 .text .global main main: ldr r1, addr_of_a /* r1 ← &a */ mov r2, #0 /* r2 ← 0 */ loop: cmp r2, #100 /* Have we reached 100 yet? */ beq end /* If so, leave the loop, otherwise continue */ add r3, r1, r2, LSL #2 /* r3 ← r1 + (r2*4) */ str r2, [r3] /* *r3 ← r2 */ add r2, r2, #1 /* r2 ← r2 + 1 */ b loop /* Go to the beginning of the loop */ end: bx lr addr_of_a: .word a
Whew! We are using lots of things we have learnt from earlier chapters. In line 14 we load the base address of the array into
r1. The address of the array will not change so we load it once. In register
r2 we will keep the index that will range from 0 to 99. In line 17 we compare it to 100 to see if we have reached the end of the loop.
Line 19 is an important one. Here we compute the address of the item. We have in
r1 the base address and we know each item is 4 bytes wide. We know also that
r2 keeps the index of the loop which we will use to access the array element. Given an item with index
i its address must be
&a + 4*i, since there are 4 bytes between every element of this array. So
r3 has the address of the current element in this step of the loop. In line 20 we store
r2, this is
i, into the memory pointed by
i-th array item, this is
Then we proceed to increase
r2 and jump back for the next step of the loop.
As you can see, accessing an array involves calculating the address of the accessed item. Does the ARM instruction set provide a more compact way to do this? The answer is yes. In fact it provides several indexing modes.
In the previous chapter the concept indexing mode was a bit off because we were not indexing anything. Now it makes much more sense since we are indexing an array item. ARM provides nine of these indexing modes. I will distinguish two kinds of indexing modes: non updating and updating depending on whether they feature a side-effect that we will discuss later, when dealing with updating indexing modes.
Non updating indexing modes
It justs adds (or subtracts) the immediate value to form the address. This is very useful to array items the index of which is a constant in the code or fields of a structure, since their offset is always constant. In
Rsource1we put the base address and in
immediatethe offset we want in bytes. The immediate cannot be larger than 12 bits (0..4096). When the immediate is
#0it is like the usual we have been using
For example, we can set
ato 3 this way (we assume that r1 already contans the base address of a). Note that the offset is in bytes thus we need an offset of 12 (4 bytes * 3 items skipped).
This is like the previous one, but the added (or subtracted) offset is the value in a register. This is useful when the offset is too big for the immediate. Note that for the
+Rsource2case, the two registers can be swapped (as this would not affect the address computed).
Example. The same as above but using a register this time.
[Rsource1, +Rsource2, shift_operation #immediate]or
[Rsource1, -Rsource2, shift_operation #immediate]
This one is similar to the usual shift operation we can do with other instructions. A shift operation (remember:
ROR) is applied to
Rsource1is then added (or subtracted) to the result of the shift operation applied to
Rsource2. This is useful when we need to multiply the address by some fixed amount. When accessing the items of the integer array
awe had to multiply the result by 4 to get a meaningful address.
For this example, let's first recall how we computed above the address in the array of the item in position
add r3, r1, r2, LSL #2 /* r3 ← r1 + r2*4 */ str r2, [r3] /* *r3 ← r2 */
We can express this in a much more compact way (without the need of the register
Updating indexing modes
In these indexing modes the
Rsource1 register is updated with the address synthesized by the load or store instruction. You may be wondering why one would want to do this. A bit of detour first. Recheck the code of the array load. Why do we have to keep around the base address of the array if we are always effectively moving 4 bytes away from it? Would not it make much more sense to keep the address of the current entity? So instead of
19 20 add r3, r1, r2, LSL #2 /* r3 ← r1 + r2*4 */ str r2, [r3] /* *r3 ← r2 */
we might want to do something like
because there is no need to compute everytime from the beginning the address of the next item (as we are accessing them sequentially). Even if this looks slightly better, it still can be improved a bit more. What if our instruction were able to update
r1 for us? Something like this (obviously the exact syntax is not as shown)
Such indexing modes exist. There are two kinds of updating indexing modes depending on at which time
Rsource1 is updated. If
Rsource1 is updated after the load or store itself (meaning that the address to load or store is the initial
Rsource1 value) this is a post-indexing accessing mode. If
Rsource1 is updated before the actual load or store (meaning that the address to load or store is the final value of
Rsource1) this is a pre-indexing accessing mode. In all cases, at the end of the instruction
Rsource1 will have the value of the computation of the indexing mode. Now this sounds a bit convoluted, just look in the example above: we first load using
r1 and then we do
r1 ← r1 + 4. This is post-indexing: we first use the value of
r1 as the address where we store the value of
r1 is updated with
r1 + 4. Now consider another hypothetic syntax.
This is pre-indexing: we first compute
r1 + 4 and use it as the address where we store the value of
r2. At the end of the instruction
r1 has effectively been updated too, but the updated value has already been used as the address of the load or store.
The value of
Rsource1is used as the address for the load or store. Then
Rsource1is updated with the value of
immediateafter adding (or subtracting) it to
Rsource1. Using this indexing mode we can rewrite the loop of our first example as follows:
16 17 18 19 20 21 22
loop: cmp r2, #100 /* Have we reached 100 yet? */ beq end /* If so, leave the loop, otherwise continue */ str r2, [r1], #4 /* *r1 ← r2 then r1 ← r1 + 4 */ add r2, r2, #1 /* r2 ← r2 + 1 */ b loop /* Go to the beginning of the loop */ end:
Like the previous one but instead of an immediate, the value of
Rsource2is used. As usual this can be used as a workaround when the offset is too big for the immediate value.
[Rsource1], +Rsource2, shift_operation #immediateor
[Rsource1], -Rsource2, shift_operation #immediate
The value of
Rsource1is used as the address for the load or store. Then
Rsource2is applied a shift operation (
ROL). The resulting value of that shift is added (or subtracted) to
Rsource1is finally updated with this last value.
Pre-indexing modes may look a bit weird at first but they are useful when the computed address is going to be reused soon. Instead of recomputing it we can reuse the updated
! symbol in these indexing modes which distinguishes them from the non updating indexing modes.
It behaves like the similar non-updating indexing mode but
Rsource1gets updated with the computed address. Imagine we want to compute
a = a + a. We could do this (we assume that
r1already has the base address of the array).
Similar to the previous one but using a register
Rsource2instead of an immediate.
[Rsource1, +Rsource2, shift_operation #immediate]!or
[Rsource1, -Rsource2, shift_operation #immediate]!
Like to the non-indexing equivalent but Rsource1 will be updated with the address used for the load or store instruction.
Back to structures
All the examples in this chapter have used an array. Structures are a bit simpler: the offset to the fields is always constant: once we have the base address of the structure (the address of the first field) accessing a field is just an indexing mode with an offset (usually an immediate). Our current structure features, on purpose, a
char as its first field
f0. Currently we cannot work on scalars in memory of different size than 4 bytes. So we will postpone working on that first field for a future chapter.
For instance imagine we wanted to increment the field f1 like this.
r1 contains the base address of our structure, accessing the field
f1 is pretty easy now that we know all the available indexing modes.
1 2 3 ldr r2, [r1, #+4]! /* r1 ← r1 + 4 then r2 ← *r1 */ add r2, r2, #7 /* r2 ← r2 + 7 */ str r2, [r1] /* *r1 ← r2 */
Note that we use a pre-indexing mode to keep in
r1 the address of the field
f1. This way the second store does not need to compute that address again.
That's all for today.