Think In Geek

In geek we trust

ARM assembler in Raspberry Pi – Chapter 21

We already know that ARM is a 32-bit architecture: general purpose registers are 32 bits wide and addresses in memory are 32-bit numbers. The natural integer size of an architecture is usually called a word, and in ARM a word is a 32-bit integer. Sometimes, though, we need to deal with subword data: integers smaller than 32 bits.

Subword data

In this chapter subword data will refer either to a byte or to a halfword. A byte is an 8-bit integer and a halfword is a 16-bit integer. Thus, a halfword occupies 2 bytes and a word 4 bytes.

To define storage for a byte in the data section we have to use `.byte`. For a halfword the syntax is `.hword`.

```
.align 4
one_byte: .byte 205
/* This number in binary is 11001101 */

.align 4
one_halfword: .hword 42445
/* This number in binary is 1010010111001101 */
```

Note that, as usual, we are aligning data to 4 bytes. Later on we will see that for subword data alignment restrictions are slightly more relaxed.

Before we start operating on a subword integer we need to get it somewhere. If we are not going to load/store it from/to memory, we may simply use a register. We may have to check that we do not overflow the range of the subword, but that’s all.

But if the data is in memory, it is important to load it properly, since we do not want to read more data than is actually needed. Recall that an address identifies a single byte of memory: it is not possible to address anything smaller than a byte. Depending on the width of the load/store, the instruction will load/store 1 byte, 2 bytes or 4 bytes starting at that address. A regular `ldr` loads a word, so we need some other instruction.

ARM provides the instructions `ldrb` and `ldrh` to load a byte and a halfword respectively. The destination is a 32-bit general purpose register, so these instructions must extend the value from 8 or 16 bits to 32 bits. Both `ldrb` and `ldrh` perform zero-extension, which means that all the extra bits, those not loaded from memory, are set to zero.

```
.text

.globl main
main:
    push {r4, lr}

    ldr r0, addr_of_one_byte     /* r0 ← &one_byte */
    ldrb r0, [r0]                /* r0 ← *{byte}r0 */

    ldr r1, addr_of_one_halfword /* r1 ← &one_halfword */
    ldrh r1, [r1]                /* r1 ← *{half}r1 */

    pop {r4, lr}
    mov r0, #0
    bx lr

addr_of_one_byte: .word one_byte
addr_of_one_halfword: .word one_halfword
```

In the example above note the difference between the `ldr` and the subsequent `ldrb`/`ldrh`. The `ldr` instruction is needed to load an address into the register. Addresses in ARM are 32-bit integers so a regular `ldr` must be used here. Then, once we have the address in the register we use `ldrb` or `ldrh` to load the byte or the halfword. As stated above, the destination register is 32-bit so the loaded integer is zero-extended. The following table shows what happens with zero-extension.

Effect of subword loads with `ldrb` and `ldrh`.

| Instruction | Content in memory (bytes) | Loaded in register (32-bit) |
|---|---|---|
| `ldrb` | 11001101 | 00000000 00000000 00000000 11001101 |
| `ldrh` | 11001101 10100101 | 00000000 00000000 10100101 11001101 |

ARM in the Raspberry Pi is a little endian architecture. This means that the bytes of an integer are laid out in memory (from lower to higher addresses) starting from the least significant byte and ending with the most significant byte. Load and store instructions preserve this ordering. This fact is usually not important unless we view the memory as a sequence of bytes. This is the reason why in the table above 11001101 always appears first in the memory column, even though the number 42445 is 10100101 11001101 in binary.
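To make zero-extension and the little endian layout concrete, here is a minimal C sketch that models what the two loads compute. It is not part of the original example: the function names `model_ldrb` and `model_ldrh` are hypothetical, and memory is represented as a plain array of bytes.

```c
#include <stdint.h>

/* Hypothetical model of ldrb: read one byte and zero-extend it to 32 bits. */
uint32_t model_ldrb(const uint8_t *addr)
{
    return (uint32_t)addr[0];
}

/* Hypothetical model of ldrh on a little endian machine: the byte at the
   lower address is the least significant one. Bits 16..31 stay at zero. */
uint32_t model_ldrh(const uint8_t *addr)
{
    return (uint32_t)addr[0] | ((uint32_t)addr[1] << 8);
}
```

With the two bytes 11001101 and 10100101 (0xCD and 0xA5) stored in that order, `model_ldrb` yields 205 and `model_ldrh` yields 42445, matching the table.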

Loading with `ldrb` and `ldrh` is fine as long as we only use natural numbers. Integers, though, include negative numbers, which are commonly represented using two’s complement. If we zero-extend a negative number, the sign bit (the most significant bit of a two’s complement number) will not be propagated and we will end up with an unrelated positive number. When loading two’s complement subword integers we need to perform sign-extension using the instructions `ldrsb` and `ldrsh`.

```
ldr r0, addr_of_one_byte     /* r0 ← &one_byte */
ldrsb r0, [r0]               /* r0 ← *{signed byte}r0 */

ldr r1, addr_of_one_halfword /* r1 ← &one_halfword */
ldrsh r1, [r1]               /* r1 ← *{signed half}r1 */
```

Note that sign-extension is the same as zero-extension when the sign bit is zero, as happens in the last two rows of the following table, which shows the effect of `ldrsb` and `ldrsh`.

Effect of subword loads with `ldrsb` and `ldrsh`.

| Instruction | Content in memory (bytes) | Loaded in register (32-bit) |
|---|---|---|
| `ldrsb` | 11001101 | 11111111 11111111 11111111 11001101 |
| `ldrsh` | 11001101 10100101 | 11111111 11111111 10100101 11001101 |
| `ldrsb` | 01001101 | 00000000 00000000 00000000 01001101 |
| `ldrsh` | 11001101 00100101 | 00000000 00000000 00100101 11001101 |
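Sign-extension simply copies the sign bit of the subword into all the upper bits of the register. The following C sketch models that rule on the already assembled subword value. The names `model_ldrsb` and `model_ldrsh` are hypothetical and only illustrate the bit manipulation; they are not how the processor implements the instructions.

```c
#include <stdint.h>

/* Hypothetical model of ldrsb: copy bit 7 of the byte into bits 8..31. */
uint32_t model_ldrsb(uint8_t byte)
{
    uint32_t value = byte;          /* zero-extended first */
    if (value & 0x80u)              /* sign bit of the byte is set */
        value |= 0xFFFFFF00u;
    return value;
}

/* Hypothetical model of ldrsh: copy bit 15 of the halfword into bits 16..31. */
uint32_t model_ldrsh(uint16_t half)
{
    uint32_t value = half;          /* zero-extended first */
    if (value & 0x8000u)            /* sign bit of the halfword is set */
        value |= 0xFFFF0000u;
    return value;
}
```

For instance, `model_ldrsb(0xCD)` gives 0xFFFFFFCD while `model_ldrsb(0x4D)` gives 0x0000004D, as in the first and third rows of the table.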

It is very important not to mix both kinds of instructions when loading subword data. When loading natural numbers, `ldrb` and `ldrh` are the correct choice. If the number is an integer that could be negative, always use `ldrsb` and `ldrsh`. The following table summarizes what happens when you mix interpretations and the different load instructions.

Patterns of bits interpreted as (natural) binary or two’s complement.

| Width | Bits | Interpreted as binary | Interpreted as two’s complement |
|---|---|---|---|
| 8-bit | 11001101 | 205 | -51 |
| 32-bit after `ldrb` | 00000000000000000000000011001101 | 205 | 205 |
| 32-bit after `ldrsb` | 11111111111111111111111111001101 | 4294967245 | -51 |
| 16-bit | 1010010111001101 | 42445 | -23091 |
| 32-bit after `ldrh` | 00000000000000001010010111001101 | 42445 | 42445 |
| 32-bit after `ldrsh` | 11111111111111111010010111001101 | 4294944205 | -23091 |
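The decimal values in the table can be checked with a short C sketch. The helper `as_twos_complement` is a hypothetical name: it reads a 32-bit pattern as a two’s complement number by subtracting 2^32 when the sign bit is set, which avoids relying on implementation-defined casts.

```c
#include <stdint.h>

/* Interpret a 32-bit register pattern as a two's complement number.
   A pattern with bit 31 set represents (pattern - 2^32). */
int64_t as_twos_complement(uint32_t reg)
{
    if (reg & 0x80000000u)
        return (int64_t)reg - 4294967296LL;   /* subtract 2^32 */
    return (int64_t)reg;
}
```

The pattern left by `ldrsb`, 0xFFFFFFCD, reads as 4294967245 when taken as a natural number and as -51 through `as_twos_complement`, while the pattern left by `ldrb`, 0x000000CD, reads as 205 either way.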

Store

While a load requires us to take care of whether the loaded subword is a binary or a two’s complement encoded number, a store does not require any such consideration. The reason is that the corresponding `strb` and `strh` instructions simply take the least significant 8 or 16 bits of the register and store them in memory.

```
ldr r1, addr_of_one_byte     /* r1 ← &one_byte */
ldrsb r0, [r1]               /* r0 ← *{signed byte}r1 */
strb r0, [r1]                /* *{byte}r1 ← r0 */

ldr r0, addr_of_one_halfword /* r0 ← &one_halfword */
ldrsh r1, [r0]               /* r1 ← *{signed half}r0 */
strh r1, [r0]                /* *{half}r0 ← r1 */
```
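As a rough illustration of why stores do not care about signedness, this C sketch models `strb` and `strh` as plain truncation. The names `model_strb` and `model_strh` are hypothetical, and memory is again an array of bytes in little endian order.

```c
#include <stdint.h>

/* Hypothetical model of strb: only bits 0..7 of the register reach memory. */
void model_strb(uint8_t *addr, uint32_t reg)
{
    addr[0] = (uint8_t)(reg & 0xFFu);
}

/* Hypothetical model of strh on a little endian machine: bits 0..15 are
   stored, least significant byte at the lower address. */
void model_strh(uint8_t *addr, uint32_t reg)
{
    addr[0] = (uint8_t)(reg & 0xFFu);
    addr[1] = (uint8_t)((reg >> 8) & 0xFFu);
}
```

Storing 0xFFFFFFCD (the result of `ldrsb`) or 0x000000CD (the result of `ldrb`) with `model_strb` leaves the same byte 0xCD in memory, because the upper 24 bits are discarded in both cases.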

Alignment restrictions

When loading or storing a 32-bit integer, the address must be 4-byte aligned: the two least significant bits of the address must be 0. This restriction is relaxed if the memory operation (load or store) is a subword one. For halfwords the address must be 2-byte aligned, so only its least significant bit must be 0. For bytes, no restriction applies. This way we can reinterpret words as halfwords or bytes, and halfwords as bytes, if we want.
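The alignment rule described above boils down to checking the low bits of the address. A minimal C sketch, assuming the access width is 1, 2 or 4 bytes (the function name `is_aligned` is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `address` is suitably aligned for an access of `width`
   bytes. `width` is assumed to be a power of two (1, 2 or 4), so
   width - 1 is a mask of the address bits that must be zero. */
bool is_aligned(uint32_t address, uint32_t width)
{
    return (address & (width - 1u)) == 0u;
}
```

For example, address 0x1002 passes the check for a halfword but not for a word, and any address passes the check for a byte.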

Consider the following example, where we traverse a single word reinterpreting its bytes and halfwords (and finally the word itself).

```
.data

.align 4
a_word: .word 0x11223344

.align 4
message_bytes : .asciz "byte #%d is 0x%x\n"
message_halfwords : .asciz "halfword #%d is 0x%x\n"
message_words : .asciz "word #%d is 0x%x\n"

.text

.globl main
main:
    push {r4, r5, r6, lr}  /* keep callee saved registers */

    ldr r4, addr_a_word    /* r4 ← &a_word */

    mov r5, #0             /* r5 ← 0 */
    b check_loop_bytes     /* branch to check_loop_bytes */

    loop_bytes:
        /* prepare call to printf */
        ldr r0, addr_message_bytes
                           /* r0 ← &message_bytes
                              first parameter of printf */
        mov r1, r5         /* r1 ← r5
                              second parameter of printf */
        ldrb r2, [r4, r5]  /* r2 ← *{byte}(r4 + r5)
                              third parameter of printf */
        bl printf          /* call printf */
        add r5, r5, #1     /* r5 ← r5 + 1 */
    check_loop_bytes:
        cmp r5, #4         /* compute r5 - 4 and update cpsr */
        bne loop_bytes     /* if r5 != 4 branch to loop_bytes */

    mov r5, #0             /* r5 ← 0 */
    b check_loop_halfwords /* branch to check_loop_halfwords */

    loop_halfwords:
        /* prepare call to printf */
        ldr r0, addr_message_halfwords
                           /* r0 ← &message_halfwords
                              first parameter of printf */
        mov r1, r5         /* r1 ← r5
                              second parameter of printf */
        mov r6, r5, LSL #1 /* r6 ← r5 * 2 */
        ldrh r2, [r4, r6]  /* r2 ← *{half}(r4 + r6)
                              this is r2 ← *{half}(r4 + r5 * 2)
                              third parameter of printf */
        bl printf          /* call printf */
        add r5, r5, #1     /* r5 ← r5 + 1 */
    check_loop_halfwords:
        cmp r5, #2         /* compute r5 - 2 and update cpsr */
        bne loop_halfwords /* if r5 != 2 branch to loop_halfwords */

    /* prepare call to printf */
    ldr r0, addr_message_words /* r0 ← &message_words
                                  first parameter of printf */
    mov r1, #0             /* r1 ← 0
                              second parameter of printf */
    ldr r2, [r4]           /* r2 ← *r4
                              third parameter of printf */
    bl printf              /* call printf */

    pop {r4, r5, r6, lr}   /* restore callee saved registers */
    mov r0, #0             /* set error code */
    bx lr                  /* return to system */

addr_a_word : .word a_word
addr_message_bytes : .word message_bytes
addr_message_halfwords : .word message_halfwords
addr_message_words : .word message_words
```

Our word is the number 0x11223344 (287454020 in decimal). We load the address of the word into `r4`, as usual with an `ldr`, and then we perform loads of different sizes. The first loop, from `loop_bytes` to the `bne loop_bytes` that closes it, loads each byte and prints it. Note that the `ldrb` just adds the index of the current byte (in `r5`) to the address of the word (in `r4`). We do not have to multiply `r5` by anything. The loop that prints halfwords, `loop_halfwords`, does need a scaled index, but `ldrh` (like `ldrsb` and `ldrsh`, and unlike `ldr` and `ldrb`) does not accept a register offset shifted with `LSL #x`. We dodge this restriction by first computing `r6 ← r5 * 2` with a `mov` and then using `r6` as the offset of the `ldrh`, so the address is `r4 + r5*2`. Since the original word is 4-byte aligned, we can read its two halfwords because they are 2-byte aligned. It would be an error to attempt to load a halfword using the address of byte 1: only the halfwords starting at bytes 0 and 2 can be loaded as halfwords.

This is the output of the program:

```
$ ./reinterpret
byte #0 is 0x44
byte #1 is 0x33
byte #2 is 0x22
byte #3 is 0x11
halfword #0 is 0x3344
halfword #1 is 0x1122
word #0 is 0x11223344
```

As we stated above, ARM in the Raspberry Pi is a little endian architecture, so integers of more than one byte are laid out (from lower to higher addresses) starting from the least significant byte. This is why the first byte is 0x44 and not 0x11. Similarly, the first halfword is 0x3344 instead of 0x1122.
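Another way to see this ordering is that, on a little endian machine, the subword at index i of a word corresponds to a right shift of the word's value. The following C sketch reproduces the values printed by the program using shifts only, so it does not depend on the endianness of the machine it runs on. The function names are hypothetical.

```c
#include <stdint.h>

/* Byte number `index` (0..3) of `word`, counting from the lowest address
   as a little endian machine lays it out. */
uint32_t byte_of_word(uint32_t word, unsigned index)
{
    return (word >> (8u * index)) & 0xFFu;
}

/* Halfword number `index` (0..1) of `word`, counting from the lowest
   address as a little endian machine lays it out. */
uint32_t halfword_of_word(uint32_t word, unsigned index)
{
    return (word >> (16u * index)) & 0xFFFFu;
}
```

For 0x11223344, `byte_of_word` returns 0x44, 0x33, 0x22 and 0x11 for indices 0 to 3, and `halfword_of_word` returns 0x3344 and 0x1122, in the same order as the output above.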

That’s all for today.

6 thoughts on “ARM assembler in Raspberry Pi – Chapter 21”

• Nilesh says:

Thanks a lot for fantastic tutorial. It helped me to learn ARM assembly quickly.
I want to know one more thing. Is it possible to learn the Thumb instruction set using such a simple tutorial? Can you give a simple example?

• rferrer says:

Hi Nilesh,

Although it is not the goal of this tutorial maybe I will devote some article to Thumb.

Kind regards,

• Yen says:

This is an awesome series of tutorials. Thank you so much for spending the time going through them, rferrer!

For anyone that wants extra information i highly recommend the free course texas university has put up online http://users.ece.utexas.edu/~valvano/Volume1/

• rferrer says:

Hi Yen, thanks for the online resource!

Kind regards,

• Cor Massar says:

Love the dedication you’ve shown. My RasPi will march with your instructions. All kidding aside, thanks for a very clear explanation of how assembler influences the ARM processor. Will be using it after New Year. Keep up the quality. Wish all readers fine Holidays!

• rferrer says:

Thanks Cor! Happy holidays.
