ARM assembler in Raspberry Pi

We already know that ARM is a 32-bit architecture: general purpose registers are 32 bits wide and addresses in memory are 32-bit numbers. The natural integer size of an architecture is usually called a word, and in ARM a word is a 32-bit integer. Sometimes, though, we need to deal with subword data: integers smaller than 32 bits.

Subword data

In this chapter subword data will refer either to a byte or to a halfword. A byte is an 8-bit integer and a halfword is a 16-bit integer. Thus, a halfword occupies 2 bytes and a word 4 bytes.

To define storage for a byte in the data section we have to use .byte. For a halfword the syntax is .hword.

.data

.align 4
one_byte: .byte 205
/* This number in binary is 11001101 */

.align 4
one_halfword: .hword 42445
/* This number in binary is 1010010111001101 */

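Note that, as usual, we are aligning data to at least 4 bytes (with the GNU assembler on ARM the operand of .align is a power of two, so .align 4 actually requests a 16-byte boundary, which is also a multiple of 4). Later on we will see that alignment restrictions are slightly more relaxed for subword data.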

Load and store

Before we start operating on a subword integer we need to get it somewhere. If we are not going to load it from or store it to memory, we may simply use a register. We may have to check that we do not overflow the range of the subword, but that's all.
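As a rough illustration of that range check, here is a small sketch in C (rather than assembler, so it can be tried on any machine). The function names are made up for this example; they simply test whether a 32-bit value still fits in a byte or a halfword.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: does a 32-bit register value fit in a subword? */

/* Natural (unsigned) numbers: a byte holds 0..255, a halfword 0..65535 */
bool fits_unsigned_byte(uint32_t value)     { return value <= 0xFFu; }
bool fits_unsigned_halfword(uint32_t value) { return value <= 0xFFFFu; }

/* Two's complement numbers: a byte holds -128..127,
   a halfword holds -32768..32767 */
bool fits_signed_byte(int32_t value) {
    return value >= -128 && value <= 127;
}
bool fits_signed_halfword(int32_t value) {
    return value >= -32768 && value <= 32767;
}
```

Note that the same number may fit under one interpretation and not under the other: 205 fits in an unsigned byte but not in a signed one.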

If the data is in memory, however, it is important to load it properly, since we do not want to read more data than actually needed. Recall that an address identifies a single byte of memory: it is not possible to address anything smaller than a byte. Depending on the width of the load or store, the instruction accesses 1, 2 or 4 bytes starting at that address. A regular ldr loads a word, so we need some other instruction.

ARM provides the instructions ldrb and ldrh to load a byte and a halfword respectively. The destination is a 32-bit general purpose register, so these instructions must extend the value from 8 or 16 bits to 32 bits. Both ldrb and ldrh perform zero-extension, which means that all the extra bits, those not loaded from memory, are set to zero.

.text

.globl main
main:
    push {r4, lr}

    ldr r0, addr_of_one_byte     /* r0 ← &one_byte */
    ldrb r0, [r0]                /* r0 ← *{byte}r0 */

    ldr r1, addr_of_one_halfword /* r1 ← &one_halfword */
    ldrh r1, [r1]                /* r1 ← *{half}r1 */

    pop {r4, lr}
    mov r0, #0
    bx lr

addr_of_one_byte: .word one_byte
addr_of_one_halfword: .word one_halfword

In the example above note the difference between the ldr and the subsequent ldrb/ldrh. The ldr instruction is needed to load an address into the register. Addresses in ARM are 32-bit integers so a regular ldr must be used here. Then, once we have the address in the register we use ldrb or ldrh to load the byte or the halfword. As stated above, the destination register is 32-bit so the loaded integer is zero-extended. The following table shows what happens with zero-extension.

Effect of subword loads with `ldrb` and `ldrh`.
Instruction	Byte at addr	Byte at addr+1	Loaded in register (32-bit)
`ldrb`	11001101	(not read)	00000000 00000000 00000000 11001101
`ldrh`	11001101	10100101	00000000 00000000 10100101 11001101
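The zero-extension shown in the table can be mimicked in C. This is only a sketch of what ends up in the register, not of how the processor does it; the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers: the 32-bit register contents that ldrb and ldrh
   would leave for a given byte or halfword read from memory. */

uint32_t zero_extend_byte(uint8_t bits) {
    return (uint32_t)bits;      /* bits 8..31 are set to zero */
}

uint32_t zero_extend_halfword(uint16_t bits) {
    return (uint32_t)bits;      /* bits 16..31 are set to zero */
}
```

With the values of our data section, zero_extend_byte(205) gives 0x000000CD and zero_extend_halfword(42445) gives 0x0000A5CD, matching the two rows above.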

ARM in the Raspberry Pi is a little-endian architecture: the bytes of a multi-byte integer are laid out in memory (from lower to higher addresses) starting with the least significant byte and ending with the most significant byte. Load and store instructions preserve this ordering. This fact is usually not important unless we view the memory as a sequence of bytes. This is the reason why, in the table above, 11001101 always appears in the first column even though the number 42445 is 10100101 11001101 in binary.

Loading with ldrb and ldrh is fine as long as we only use natural numbers. Integers also include negative numbers, which are commonly represented using two's complement. If we zero-extend a negative number, the sign bit (the most significant bit of a two's complement number) will not be propagated and we will end up with an unrelated positive number. When loading two's complement subword integers we need to perform sign-extension, using the instructions ldrsb and ldrsh.

    ldr r0, addr_of_one_byte     /* r0 ← &one_byte */
    ldrsb r0, [r0]               /* r0 ← *{signed byte}r0 */

    ldr r1, addr_of_one_halfword /* r1 ← &one_halfword */
    ldrsh r1, [r1]               /* r1 ← *{signed half}r1 */

Note that sign-extension is the same as zero-extension when the sign bit is zero, as happens in the last two rows of the following table, which shows the effect of ldrsb and ldrsh.

Effect of subword loads with `ldrsb` and `ldrsh`.
Instruction	Byte at addr	Byte at addr+1	Loaded in register (32-bit)
`ldrsb`	11001101	(not read)	11111111 11111111 11111111 11001101
`ldrsh`	11001101	10100101	11111111 11111111 10100101 11001101
`ldrsb`	01001101	(not read)	00000000 00000000 00000000 01001101
`ldrsh`	11001101	00100101	00000000 00000000 00100101 11001101
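Sign-extension can be sketched in C as follows. Again this only models the resulting register bits; the helper names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical helpers: the 32-bit register contents that ldrsb and ldrsh
   would leave. The sign bit of the subword is copied to all upper bits. */

uint32_t sign_extend_byte(uint8_t bits) {
    uint32_t reg = bits;        /* bits 8..31 start as zero */
    if (reg & 0x80u)            /* is the sign bit of the byte (bit 7) set? */
        reg |= 0xFFFFFF00u;     /* then fill bits 8..31 with ones */
    return reg;
}

uint32_t sign_extend_halfword(uint16_t bits) {
    uint32_t reg = bits;        /* bits 16..31 start as zero */
    if (reg & 0x8000u)          /* is the sign bit of the halfword (bit 15) set? */
        reg |= 0xFFFF0000u;     /* then fill bits 16..31 with ones */
    return reg;
}
```

For the byte 11001101 (0xCD) the result is 0xFFFFFFCD, while for 01001101 (0x4D) nothing changes because the sign bit is zero.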

It is very important not to mix both kinds of instructions when loading subword data. When loading natural numbers, ldrb and ldrh are the correct choice. If the number is an integer that could be negative, always use ldrsb and ldrsh. The following table summarizes what happens when you mix interpretations and the different load instructions.

Patterns of bits interpreted as (natural) binary or two's complement.
Width	Bits	Read as binary	Read as two's complement
8-bit	11001101	205	-51
32-bit after `ldrb`	00000000000000000000000011001101	205	205
32-bit after `ldrsb`	11111111111111111111111111001101	4294967245	-51
16-bit	1010010111001101	42445	-23091
32-bit after `ldrh`	00000000000000001010010111001101	42445	42445
32-bit after `ldrsh`	11111111111111111010010111001101	4294944205	-23091
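To check the two's complement column by hand, a small C sketch may help. The function as_twos_complement is a hypothetical helper that interprets a 32-bit pattern as a two's complement number; it returns a 64-bit integer only so that the arithmetic cannot overflow.

```c
#include <stdint.h>

/* Hypothetical helper: value of a 32-bit pattern read as two's complement.
   When bit 31 is set it weighs -2^31 instead of +2^31, which is the same
   as subtracting 2^32 from the plain binary value. */
int64_t as_twos_complement(uint32_t bits) {
    if (bits & 0x80000000u)
        return (int64_t)bits - 4294967296LL;
    return (int64_t)bits;
}
```

For instance, the pattern left by ldrsb, 0xFFFFFFCD, reads as -51, but the pattern left by ldrb, 0x000000CD, reads as 205: a byte that was meant to be -51 is no longer -51 after a zero-extending load.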

Store

While a load requires care about whether the loaded subword is a binary or a two's complement encoded number, a store does not require any such consideration. The reason is that the corresponding strb and strh instructions simply take the least significant 8 or 16 bits of the register and store them in memory.

    ldr r1, addr_of_one_byte     /* r1 ← &one_byte */
    ldrsb r0, [r1]               /* r0 ← *{signed byte}r1 */
    strb r0, [r1]                /* *{byte}r1 ← r0 */

    ldr r0, addr_of_one_halfword /* r0 ← &one_halfword */
    ldrsh r1, [r0]               /* r1 ← *{signed half}r0 */
    strh r1, [r0]                /* *{half}r0 ← r1 */
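The truncation performed by the stores can be modelled in C like this. It is a sketch with hypothetical function names, showing only which bits of the register reach memory.

```c
#include <stdint.h>

/* Hypothetical helpers: the value that strb and strh would write to memory
   for a given 32-bit register. The upper bits are simply discarded. */

uint8_t stored_byte(uint32_t reg) {
    return (uint8_t)(reg & 0xFFu);       /* keep bits 0..7 */
}

uint16_t stored_halfword(uint32_t reg) {
    return (uint16_t)(reg & 0xFFFFu);    /* keep bits 0..15 */
}
```

Both 0x000000CD (loaded with ldrb) and 0xFFFFFFCD (loaded with ldrsb) are stored as the byte 0xCD, which is why the store does not care about signedness.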

Alignment restrictions

When loading or storing a 32-bit integer from memory, the address must be 4-byte aligned: the two least significant bits of the address must be 0. This restriction is relaxed if the memory operation (load or store) is a subword one. For halfwords the address must be 2-byte aligned (its least significant bit must be 0). For bytes, no restriction applies. This way we can reinterpret words as halfwords or bytes, and halfwords as bytes, if we want.
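The alignment rule amounts to checking the low bits of the address. A minimal sketch in C, where is_aligned is a name invented for this example and size is assumed to be 1, 2 or 4:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is addr a multiple of size?
   size is assumed to be a power of two (1, 2 or 4 here), so the check
   reduces to testing that the low bits of the address are zero. */
bool is_aligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1u)) == 0u;
}
```

An address such as 0x1002 is fine for a halfword (is_aligned(0x1002, 2) holds) but not for a word, and any address passes the check for a byte.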

Consider the following example, where we traverse a single word reinterpreting its bytes and halfwords (and finally the word itself).

.data

.align 4
a_word: .word 0x11223344

.align 4
message_bytes : .asciz "byte #%d is 0x%x\n"
message_halfwords : .asciz "halfword #%d is 0x%x\n"
message_words : .asciz "word #%d is 0x%x\n"

.text

.globl main
main:
    push {r4, r5, r6, lr}  /* keep callee saved registers */

    ldr r4, addr_a_word    /* r4 ← &a_word */

    mov r5, #0             /* r5 ← 0 */
    b check_loop_bytes     /* branch to check_loop_bytes */

    loop_bytes:
        /* prepare call to printf */
        ldr r0, addr_message_bytes
                           /* r0 ← &message_bytes
                              first parameter of printf */
        mov r1, r5         /* r1 ← r5
                              second parameter of printf */
        ldrb r2, [r4, r5]  /* r2 ← *{byte}(r4 + r5)
                              third parameter of printf */
        bl printf          /* call printf */
        add r5, r5, #1     /* r5 ← r5 + 1 */
    check_loop_bytes:
        cmp r5, #4         /* compute r5 - 4 and update cpsr */
        bne loop_bytes     /* if r5 != 4 branch to loop_bytes */

    mov r5, #0             /* r5 ← 0 */
    b check_loop_halfwords /* branch to check_loop_halfwords */

    loop_halfwords:
        /* prepare call to printf */
        ldr r0, addr_message_halfwords
                           /* r0 ← &message_halfwords
                              first parameter of printf */
        mov r1, r5         /* r1 ← r5
                              second parameter of printf */
        mov r6, r5, LSL #1 /* r6 ← r5 * 2 */
        ldrh r2, [r4, r6]  /* r2 ← *{half}(r4 + r6)
                              this is r2 ← *{half}(r4 + r5 * 2)
                              third parameter of printf */
        bl printf          /* call printf */
        add r5, r5, #1     /* r5 ← r5 + 1 */
    check_loop_halfwords:
        cmp r5, #2         /* compute r5 - 2 and update cpsr */
        bne loop_halfwords /* if r5 != 2 branch to loop_halfwords */

    /* prepare call to printf */
    ldr r0, addr_message_words /* r0 ← &message_words
                                  first parameter of printf */
    mov r1, #0                 /* r1 ← 0
                                  second parameter of printf */
    ldr r2, [r4]               /* r2 ← *r4
                                  third parameter of printf */
    bl printf                  /* call printf */

    pop {r4, r5, r6, lr}   /* restore callee saved registers */
    mov r0, #0             /* set error code */
    bx lr                  /* return to system */

addr_a_word : .word a_word
addr_message_bytes : .word message_bytes
addr_message_halfwords : .word message_halfwords
addr_message_words : .word message_words

Our word is the number 11223344₁₆ (this is 287454020₁₀). We load the address of the word into r4 with a ldr as usual (line 17 of the listing) and then we perform loads of different sizes. The first loop, lines 19 to 35 (from the first mov r5, #0 to bne loop_bytes), loads each byte and prints it. Note that the ldrb in line 29 just adds the index of the current byte (in r5) to the address of the word (in r4): we do not have to multiply r5 by anything. The loop that prints halfwords, lines 37 to 55, does have to scale the index by 2. Unlike ldr and ldrb, the instruction ldrh (and likewise ldrsb and ldrsh) does not accept a shifted register offset of the form LSL #x, so we cannot write the multiplication inside the brackets. To dodge this restriction, line 47 computes r6 ← r5 * 2 with a mov, and then the ldrh in line 48 uses r4 + r6 as the address. Since the original word is 4-byte aligned, we can read its two halfwords because they are 2-byte aligned. It would be an error to attempt to load a halfword using the address of byte 1: only the halfwords starting at bytes 0 and 2 can be loaded as halfwords.

This is the output of the program:

$ ./reinterpret 
byte #0 is 0x44
byte #1 is 0x33
byte #2 is 0x22
byte #3 is 0x11
halfword #0 is 0x3344
halfword #1 is 0x1122
word #0 is 0x11223344

As we stated above, ARM in the Raspberry Pi is a little-endian architecture, so integers of more than one byte are laid out (from lower to higher addresses) starting from the least significant byte. This is why the first byte is 44₁₆ and not 11₁₆. Similarly for halfwords, the first halfword is 3344₁₆ instead of 1122₁₆.
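The same output can be predicted without touching memory at all. The following C sketch computes which byte or halfword a little-endian machine finds at each position of a word; the function names are hypothetical and the shifts stand in for the address arithmetic of the assembler program.

```c
#include <stdint.h>

/* Hypothetical helper: the byte found at (address of word + index) on a
   little-endian machine, for index 0..3. Index 0 is the least
   significant byte. */
uint8_t byte_of_word(uint32_t word, unsigned index) {
    return (uint8_t)(word >> (8u * index));
}

/* Hypothetical helper: the halfword found at (address of word + 2 * index)
   on a little-endian machine, for index 0..1. */
uint16_t halfword_of_word(uint32_t word, unsigned index) {
    return (uint16_t)(word >> (16u * index));
}
```

For 0x11223344 this gives bytes 0x44, 0x33, 0x22, 0x11 and halfwords 0x3344, 0x1122, in that order, which is what the program printed.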

That's all for today.