ARM assembler in Raspberry Pi – Chapter 21
We already know that ARM is a 32-bit architecture: general purpose registers are 32 bits wide and addresses in memory are 32-bit numbers. The natural integer size of an architecture is usually called a word, and in ARM a word is, obviously, a 32-bit integer. Sometimes, though, we need to deal with subword data: integers smaller than 32 bits.
Subword data
In this chapter subword data will refer either to a byte or to a halfword. A byte is an 8-bit integer and a halfword is a 16-bit integer. Thus, a halfword occupies 2 bytes and a word 4 bytes.
To define storage for a byte in the data section we use the .byte directive. For a halfword the directive is .hword.
As usual, we align this data to 4 bytes. Later on we will see that the alignment restrictions for subword data are slightly more relaxed.
Load and store
Before we start operating on a subword integer we need to get it somewhere. If we are not going to load it from or store it to memory, we may simply keep it in a register. We may have to check that we do not overflow the range of the subword, but that's all.
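As a rough illustration, here is a C sketch (C rather than assembler, so that it is easy to check) of the range checks a program may need before treating a 32-bit register value as a subword. The helper names fits_u8, fits_u16, fits_s8 and fits_s16 are hypothetical; in assembler the same comparisons could be done with cmp and conditional branches.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers: does a 32-bit register value fit in a subword? */

/* Unsigned byte: 0 .. 255 */
bool fits_u8(uint32_t v)  { return v <= 0xFFu; }

/* Unsigned halfword: 0 .. 65535 */
bool fits_u16(uint32_t v) { return v <= 0xFFFFu; }

/* Signed byte (two's complement): -128 .. 127 */
bool fits_s8(int32_t v)   { return v >= -128 && v <= 127; }

/* Signed halfword (two's complement): -32768 .. 32767 */
bool fits_s16(int32_t v)  { return v >= -32768 && v <= 32767; }
```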
But if the data is in memory, then it is important to load it properly, since we do not want to read more data than actually needed. Recall that an address identifies a single byte of memory: it is not possible to address anything smaller than a byte. Depending on the width of the load/store, 1, 2 or 4 bytes starting at that address will be loaded or stored. A regular ldr loads a word, so we need some other instructions.
ARM provides the instructions ldrb and ldrh to load a byte and a halfword respectively. The destination is a 32-bit general purpose register, so these instructions must extend the value from 8 or 16 bits to 32 bits. Both ldrb and ldrh perform zero-extension, which means that all the extra bits that were not loaded are set to zero.
When loading a subword variable, note the different roles of the ldr and the subsequent ldrb/ldrh. The ldr instruction is needed to load the address of the variable into a register. Addresses in ARM are 32-bit integers, so a regular ldr must be used here. Then, once we have the address in the register, we use ldrb or ldrh to load the byte or the halfword. As stated above, the destination register is 32-bit, so the loaded integer is zero-extended. The following table shows what happens with zero-extension.
Instruction | Byte at addr | Byte at addr+1 | Loaded in register (32-bit)
---|---|---|---
ldrb | 11001101 | (not loaded) | 00000000 00000000 00000000 11001101
ldrh | 11001101 | 10100101 | 00000000 00000000 10100101 11001101
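To make the table above concrete, the following C sketch models what ldrb and ldrh leave in the destination register. It is only a model, not ARM code: mem is assumed to point at the byte stored at addr, and the function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical C model of ldrb: load 1 byte; bits 8..31 of the result are zero. */
uint32_t ldrb_model(const uint8_t *mem)
{
    return (uint32_t)mem[0];
}

/* Hypothetical C model of ldrh: load 2 bytes, the byte at addr being the
   least significant one (little endian); bits 16..31 of the result are zero. */
uint32_t ldrh_model(const uint8_t *mem)
{
    return (uint32_t)mem[0] | ((uint32_t)mem[1] << 8);
}
```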
ARM in the Raspberry Pi is a little endian architecture: the bytes of an integer are laid out in memory (from lower to higher addresses) starting from the least significant byte and ending with the most significant byte. Load and store instructions preserve this ordering. This fact is usually not important unless we view memory as a sequence of bytes. This is the reason why, in the table above, 11001101 always appears in the first column even though the number 42445 is 10100101 11001101 in binary.
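A minimal C sketch of this little-endian layout: the hypothetical helper layout_le16 writes a 16-bit value into two bytes exactly as they would appear in the Raspberry Pi memory. Because it works with shifts and masks on the value, the result does not depend on the byte order of the machine running the C code.

```c
#include <stdint.h>

/* Hypothetical helper: lay out a 16-bit value the way a little-endian
   machine stores it, least significant byte at the lower address. */
void layout_le16(uint16_t value, uint8_t out[2])
{
    out[0] = (uint8_t)(value & 0xFFu);  /* addr:   least significant byte */
    out[1] = (uint8_t)(value >> 8);     /* addr+1: most significant byte  */
}
```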
Ok, loading with ldrb and ldrh is fine as long as we only use natural numbers. Integers, however, include negative numbers, which are commonly represented using two's complement. If we zero-extend a negative number, the sign bit (the most significant bit of a two's complement number) will not be propagated and we will end up with an unrelated positive number. When loading two's complement subword integers we need to perform sign-extension, using the instructions ldrsb and ldrsh.
Note that sign-extension is the same as zero-extension when the sign bit is zero, as happens in the last two rows of the following table, which shows the effect of ldrsb and ldrsh.
Instruction | Byte at addr | Byte at addr+1 | Loaded in register (32-bit)
---|---|---|---
ldrsb | 11001101 | (not loaded) | 11111111 11111111 11111111 11001101
ldrsh | 11001101 | 10100101 | 11111111 11111111 10100101 11001101
ldrsb | 01001101 | (not loaded) | 00000000 00000000 00000000 01001101
ldrsh | 11001101 | 00100101 | 00000000 00000000 00100101 11001101
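The same kind of C model can describe ldrsb and ldrsh: after loading, the sign bit of the subword is copied into all the upper bits of the register. Again, this is an illustrative sketch with hypothetical names, not a description of how the processor implements it.

```c
#include <stdint.h>

/* Hypothetical C model of ldrsb: bit 7 of the byte is copied into bits 8..31. */
uint32_t ldrsb_model(const uint8_t *mem)
{
    uint32_t v = mem[0];
    if (v & 0x80u)          /* sign bit of the byte set? */
        v |= 0xFFFFFF00u;   /* propagate it into bits 8..31 */
    return v;
}

/* Hypothetical C model of ldrsh: little-endian halfword, bit 15 copied into bits 16..31. */
uint32_t ldrsh_model(const uint8_t *mem)
{
    uint32_t v = (uint32_t)mem[0] | ((uint32_t)mem[1] << 8);
    if (v & 0x8000u)        /* sign bit of the halfword set? */
        v |= 0xFFFF0000u;   /* propagate it into bits 16..31 */
    return v;
}
```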
It is very important not to mix up the two kinds of instructions when loading subword data. When loading natural numbers, ldrb and ldrh are the correct choice. If the number is an integer that could be negative, always use ldrsb and ldrsh. The following table summarizes what happens when you mix interpretations and load instructions.
Width | Bits | Value as binary (unsigned) | Value as two's complement
---|---|---|---
8-bit | 11001101 | 205 | -51
32-bit after ldrb | 00000000000000000000000011001101 | 205 | 205
32-bit after ldrsb | 11111111111111111111111111001101 | 4294967245 | -51
16-bit | 1010010111001101 | 42445 | -23091
32-bit after ldrh | 00000000000000001010010111001101 | 42445 | 42445
32-bit after ldrsh | 11111111111111111010010111001101 | 4294944205 | -23091
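The two interpretations in the table can be computed with a small C sketch. The hypothetical helpers below read a bit pattern of 8, 16 or 32 bits as an unsigned binary number and as a two's complement number; the latter is simply the unsigned value minus 2^width when the most significant bit is set. 64-bit types are used only so that every value of the table fits.

```c
#include <stdint.h>

/* Hypothetical helper: the low `width` bits of `bits` read as an unsigned number. */
uint64_t as_unsigned(uint32_t bits, int width)
{
    uint64_t mask = ((uint64_t)1 << width) - 1;  /* `width` low bits set */
    return (uint64_t)bits & mask;
}

/* Hypothetical helper: the low `width` bits of `bits` read as two's complement. */
int64_t as_twos_complement(uint32_t bits, int width)
{
    uint64_t u = as_unsigned(bits, width);
    uint64_t sign_bit = (uint64_t)1 << (width - 1);
    if (u & sign_bit)                              /* negative: u - 2^width */
        return (int64_t)u - ((int64_t)1 << width);
    return (int64_t)u;
}
```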
Store
While a load requires us to take care whether the loaded subword is a plain binary or a two's complement number, a store does not need any of these considerations. The reason is that the corresponding strb and strh instructions simply take the least significant 8 or 16 bits of the register and store them in memory.
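A C model of strb and strh (hypothetical names, not ARM code) shows why signedness does not matter for stores: storing the zero-extended 205 or the sign-extended -51 writes exactly the same byte, because only the low bits reach memory.

```c
#include <stdint.h>

/* Hypothetical C model of strb: only bits 0..7 of the register are stored. */
void strb_model(uint8_t *mem, uint32_t reg)
{
    mem[0] = (uint8_t)(reg & 0xFFu);
}

/* Hypothetical C model of strh: only bits 0..15 are stored, little endian. */
void strh_model(uint8_t *mem, uint32_t reg)
{
    mem[0] = (uint8_t)(reg & 0xFFu);         /* low byte at addr */
    mem[1] = (uint8_t)((reg >> 8) & 0xFFu);  /* next byte at addr+1 */
}
```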
Alignment restrictions
When loading or storing a 32-bit integer from or to memory, the address must be 4-byte aligned: the two least significant bits of the address must be 0. This restriction is relaxed if the memory operation (load or store) is a subword one. For halfwords the address must be 2-byte aligned, so only its least significant bit must be 0. For bytes, no restriction applies. This way we can reinterpret a word as two halfwords or four bytes, and a halfword as two bytes, if we want.
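These rules boil down to checking the low bits of the address. The following C sketch (is_aligned is a hypothetical helper) assumes the access size is 1, 2 or 4 bytes, a power of two, so an address is aligned when its low log2(size) bits are zero.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: is an access of `size` bytes (1, 2 or 4) at `addr`
   aligned? For a power-of-two size, size - 1 masks exactly the bits that
   must be zero: none for bytes, bit 0 for halfwords, bits 0-1 for words. */
bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0;
}
```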
Consider the following example, where we traverse a single word, reinterpreting it first as bytes, then as halfwords and finally as the word itself.
.data

.align 4
a_word: .word 0x11223344

.align 4
message_bytes:     .asciz "byte #%d is 0x%x\n"
message_halfwords: .asciz "halfword #%d is 0x%x\n"
message_words:     .asciz "word #%d is 0x%x\n"

.text

.globl main
main:
    push {r4, r5, r6, lr}       /* keep callee saved registers */

    ldr r4, addr_a_word         /* r4 ← &a_word */

    mov r5, #0                  /* r5 ← 0 */
    b check_loop_bytes          /* branch to check_loop_bytes */

loop_bytes:
    /* prepare call to printf */
    ldr r0, addr_message_bytes  /* r0 ← &message_bytes
                                   first parameter of printf */
    mov r1, r5                  /* r1 ← r5
                                   second parameter of printf */
    ldrb r2, [r4, r5]           /* r2 ← *{byte}(r4 + r5)
                                   third parameter of printf */
    bl printf                   /* call printf */
    add r5, r5, #1              /* r5 ← r5 + 1 */
check_loop_bytes:
    cmp r5, #4                  /* compute r5 - 4 and update cpsr */
    bne loop_bytes              /* if r5 != 4 branch to loop_bytes */

    mov r5, #0                  /* r5 ← 0 */
    b check_loop_halfwords      /* branch to check_loop_halfwords */

loop_halfwords:
    /* prepare call to printf */
    ldr r0, addr_message_halfwords
                                /* r0 ← &message_halfwords
                                   first parameter of printf */
    mov r1, r5                  /* r1 ← r5
                                   second parameter of printf */
    mov r6, r5, LSL #1          /* r6 ← r5 * 2 */
    ldrh r2, [r4, r6]           /* r2 ← *{half}(r4 + r6)
                                   this is r2 ← *{half}(r4 + r5 * 2)
                                   third parameter of printf */
    bl printf                   /* call printf */
    add r5, r5, #1              /* r5 ← r5 + 1 */
check_loop_halfwords:
    cmp r5, #2                  /* compute r5 - 2 and update cpsr */
    bne loop_halfwords          /* if r5 != 2 branch to loop_halfwords */

    /* prepare call to printf */
    ldr r0, addr_message_words  /* r0 ← &message_words
                                   first parameter of printf */
    mov r1, #0                  /* r1 ← 0
                                   second parameter of printf */
    ldr r2, [r4]                /* r2 ← *r4
                                   third parameter of printf */
    bl printf                   /* call printf */

    pop {r4, r5, r6, lr}        /* restore callee saved registers */
    mov r0, #0                  /* set error code */
    bx lr                       /* return to system */

addr_a_word:            .word a_word
addr_message_bytes:     .word message_bytes
addr_message_halfwords: .word message_halfwords
addr_message_words:     .word message_words
Our word is the number 0x11223344 (this is 287454020 in decimal). As usual, we load the address of the word into r4 with an ldr, and then we perform loads of different sizes. The first loop (loop_bytes and check_loop_bytes) loads each byte and prints it. Note that ldrb r2, [r4, r5] simply adds the index of the current byte (in r5) to the address of the word (in r4). We do not have to multiply r5 by anything, because consecutive bytes are one address apart. Halfwords, however, are two bytes apart, so their index must be multiplied by 2, and in ARM mode ldrh (like ldrsb and ldrsh), unlike ldr and ldrb, does not allow a shifted register offset of the form LSL #x. You can see how to dodge this restriction in the loop that prints halfwords (loop_halfwords and check_loop_halfwords): mov r6, r5, LSL #1 computes r6 ← r5*2, and then ldrh r2, [r4, r6] loads the halfword at address r4 + r5*2. Since the original word is 4-byte aligned, we can read its two halfwords because they are 2-byte aligned. It would be an error to attempt to load a halfword using the address of byte 1: only the halfwords starting at bytes 0 and 2 can be loaded as halfwords.
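This is the output of the program:
byte #0 is 0x44
byte #1 is 0x33
byte #2 is 0x22
byte #3 is 0x11
halfword #0 is 0x3344
halfword #1 is 0x1122
word #0 is 0x11223344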
As we stated above, ARM in the Raspberry Pi is a little endian architecture, so integers of more than one byte are laid out (from lower to higher addresses) starting from the least significant byte. This is why the first byte is 0x44 and not 0x11. Similarly for halfwords, the first halfword is 0x3344 instead of 0x1122.
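To double-check this reasoning, the following C sketch mirrors the program: a_word_bytes holds 0x11223344 as it is laid out in little-endian memory, byte_at uses the same base + index addressing as ldrb r2, [r4, r5], and halfword_at scales the index by 2 like the mov/ldrh pair. All names are hypothetical; this is a hand-written model of the program, not a translation of it.

```c
#include <stdint.h>
#include <stdio.h>

/* 0x11223344 as stored in little-endian memory, lowest address first */
static const uint8_t a_word_bytes[4] = {0x44, 0x33, 0x22, 0x11};

/* like ldrb r2, [r4, r5]: address = base + index */
uint32_t byte_at(const uint8_t *base, uint32_t index)
{
    return base[index];
}

/* like mov r6, r5, LSL #1 followed by ldrh r2, [r4, r6]: address = base + index*2 */
uint32_t halfword_at(const uint8_t *base, uint32_t index)
{
    const uint8_t *p = base + (index << 1);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

/* Print every byte, every halfword and the word, as the assembler loops do */
void print_reinterpretations(const uint8_t word_bytes[4])
{
    for (uint32_t i = 0; i < 4; i++)
        printf("byte #%u is 0x%x\n", (unsigned)i, (unsigned)byte_at(word_bytes, i));
    for (uint32_t i = 0; i < 2; i++)
        printf("halfword #%u is 0x%x\n", (unsigned)i, (unsigned)halfword_at(word_bytes, i));
    uint32_t w = halfword_at(word_bytes, 0) | (halfword_at(word_bytes, 1) << 16);
    printf("word #0 is 0x%x\n", (unsigned)w);
}
```

Calling print_reinterpretations(a_word_bytes) from a main function should print the same seven lines as the assembler program.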
That's all for today.