Think In Geek

In geek we trust

ARM assembler in Raspberry Pi – Chapter 7

The ARM architecture has long been targeted at embedded systems. Embedded systems usually end up in mass-manufactured products (dishwashers, mobile phones, TV sets, etc.). In this context margins are very tight, so a designer will always try to spare as many components as possible (a cent saved in hundreds of thousands or even millions of appliances may pay off). One relatively expensive component is memory, although memory gets cheaper every day. Anyway, in memory-constrained environments being able to save memory is good, and the ARM instruction set was designed with this goal in mind. It will take us several chapters to learn all of these techniques; today we will start with one feature usually named shifted operand.

Indexing modes

We have seen that, except for load (ldr), store (str) and branches (b and bXX), ARM instructions take as operands either registers or immediate values. We have also seen that the first operand is usually the destination register (str being a notable exception, as there it plays the role of source because the destination is now the memory). Instruction mov has one more operand, a register or an immediate value. Arithmetic instructions like add and and (and many others) have two more source operands: the first is always a register and the second can be a register or an immediate value.

These sets of allowed operands in instructions are collectively called indexing modes. Today this concept will look a bit off since we will not index anything. The name indexing makes sense in memory operands but ARM instructions, except load and store, do not have memory operands. This is the nomenclature you will find in ARM documentation so it seems sensible to use theirs.

We can summarize the syntax of most of the ARM instructions in the following pattern

instruction Rdest, Rsource1, source2

There are some exceptions, mainly move (mov), branches, loads and stores. In fact, move is not so different.

mov Rdest, source2

Both Rdest and Rsource1 must be registers. In the next section we will talk about source2.

We will discuss the indexing modes of load and store instructions in a future chapter. Branches, on the other hand, are surprisingly simple and their single operand is just a label of our program, so there is little to discuss on indexing modes for branches.

Shifted operand

What is this mysterious source2 in the instruction patterns above? If you recall the previous chapters, we have used registers or immediate values. So source2 is at least this: a register or an immediate value. Some examples follow, though we have already used them in previous chapters.

mov r0, #1
mov r1, r0
add r2, r1, r0
add r2, r3, #4

But source2 can be much more than just a simple register or an immediate. In fact, when it is a register we can combine it with a shift operation. We already saw one of these shift operations in chapter 6. Now it is time to unveil all of them.

  • LSL #n
    Logical Shift Left. Shifts bits n times left. The n leftmost bits are lost and the n rightmost are set to zero.

  • LSL Rsource3
    Like the previous one but instead of an immediate the lower byte of a register specifies the amount of shifting.

  • LSR #n
    Logical Shift Right. Shifts bits n times right. The n rightmost bits are lost and the n leftmost bits are set to zero.

  • LSR Rsource3
    Like the previous one but instead of an immediate the lower byte of a register specifies the amount of shifting.

  • ASR #n
    Arithmetic Shift Right. Like LSR but the leftmost bit before shifting is used instead of zero in the n leftmost ones.

  • ASR Rsource3
    Like the previous one but using the lower byte of a register instead of an immediate.

  • ROR #n
    Rotate Right. Like LSR but the n rightmost bits are not lost but pushed onto the n leftmost bits.

  • ROR Rsource3
    Like the previous one but using the lower byte of a register instead of an immediate.

In the listing above, n is an immediate from 1 to 31 (LSR and ASR also accept 32). These extra operations may be applied to the value in the second source register (to the value, not to the register itself), so we can perform some more operations in a single instruction. For instance, ARM does not have dedicated shift right or shift left instructions. You just use the mov instruction.

mov r1, r2, LSL #1
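The four shift operations listed above can be modelled on 32-bit values with a short Python sketch. This is only an illustration of the bit-level behaviour, not ARM code: the helper names lsl, lsr, asr and ror are hypothetical, and the sketch assumes a shift amount between 0 and 31.

```python
MASK = 0xFFFFFFFF  # an ARM register holds 32 bits


def lsl(value, n):
    """Logical shift left: the n leftmost bits are lost, zeros enter on the right."""
    return (value << n) & MASK


def lsr(value, n):
    """Logical shift right: the n rightmost bits are lost, zeros enter on the left."""
    return (value & MASK) >> n


def asr(value, n):
    """Arithmetic shift right: like lsr, but copies of the sign bit enter on the left."""
    value &= MASK
    shifted = value >> n
    if value & 0x80000000:
        # The original leftmost bit was 1: set the n leftmost bits to 1.
        shifted |= (MASK << (32 - n)) & MASK
    return shifted


def ror(value, n):
    """Rotate right: the n rightmost bits reappear as the n leftmost bits."""
    value &= MASK
    return ((value >> n) | (value << (32 - n))) & MASK


# Assumption for the demo: r2 holds 0x80000005 and we shift by 4.
r2 = 0x80000005
for name, op in (("LSL", lsl), ("LSR", lsr), ("ASR", asr), ("ROR", ror)):
    print(f"{name} #4 -> {op(r2, 4):#010x}")
```

With this input the four operations give four different results, which makes the difference between filling with zeros (LSR), filling with the sign bit (ASR) and recycling the lost bits (ROR) easy to see.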

You may be wondering why one would want to shift the value of a register left or right. If you recall chapter 6, we saw that shifting a value left (LSL) by one position gives the same result as multiplying it by 2. Conversely, shifting it right (ASR if we use two’s complement, LSR otherwise) is the same as dividing by 2 (for negative values ASR rounds towards minus infinity rather than towards zero). Since a shift of n is the same as doing n shifts of 1, shifts actually multiply or divide a value by 2^n.

mov r1, r2, LSL #1      /* r1 ← (r2*2) */
mov r1, r2, LSL #2      /* r1 ← (r2*4) */
mov r1, r3, ASR #3      /* r1 ← (r3/8) */
mov r3, #4
mov r1, r2, LSL r3      /* r1 ← (r2*16) */
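The relation between shifts and powers of two can be checked with a small Python model. This is a sketch under stated assumptions: registers are modelled as 32-bit values, and to_signed, lsl and asr are hypothetical helper names, not anything provided by ARM or the assembler.

```python
MASK = 0xFFFFFFFF  # 32-bit register


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def lsl(value, n):
    """Model of LSL #n on a 32-bit register."""
    return (value << n) & MASK


def asr(value, n):
    """Model of ASR #n; Python's >> on a negative int is already arithmetic."""
    return (to_signed(value) >> n) & MASK


print(lsl(5, 4))                       # 5 * 16
print(to_signed(asr(40, 3)))           # 40 / 8
print(to_signed(asr(-7 & MASK, 1)))    # -7 / 2, rounded towards minus infinity
print(lsl(0x40000000, 2))              # the product does not fit in 32 bits
```

The last two lines show the caveats: ASR of -7 by 1 yields -4 rather than -3, and a left shift only matches a multiplication while the result still fits in the register.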

We can combine it with add to get some useful cases.

add r1, r2, r2, LSL #1   /* r1 ← r2 + (r2*2) equivalent to r1 ← r2*3 */
add r1, r2, r2, LSL #2   /* r1 ← r2 + (r2*4) equivalent to r1 ← r2*5 */

You can do something similar with sub.

sub r1, r2, r2, LSL #3  /* r1 ← r2 - (r2*8) equivalent to r1 ← r2*(-7) */

ARM comes with a handy rsb (Reverse Subtract) instruction which computes Rdest ← source2 - Rsource1 (compare it to sub which computes Rdest ← Rsource1 - source2).

rsb r1, r2, r2, LSL #3      /* r1 ← (r2*8) - r2 equivalent to r1 ← r2*7 */
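To see the difference between sub and rsb with the same shifted operand, here is a minimal Python model. The function names mirror the mnemonics but are hypothetical helpers, and the initial value of r2 is an assumption chosen for the example.

```python
MASK = 0xFFFFFFFF  # 32-bit register


def to_signed(value):
    """Interpret a 32-bit pattern as a two's complement integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def lsl(value, n):
    """Model of the shifted operand `Rm, LSL #n`."""
    return (value << n) & MASK


def sub(rsource1, source2):
    """sub: Rdest <- Rsource1 - source2"""
    return (rsource1 - source2) & MASK


def rsb(rsource1, source2):
    """rsb: Rdest <- source2 - Rsource1"""
    return (source2 - rsource1) & MASK


r2 = 5  # assumed initial value
print(to_signed(sub(r2, lsl(r2, 3))))  # sub r1, r2, r2, LSL #3 gives r2*(-7)
print(to_signed(rsb(r2, lsl(r2, 3))))  # rsb r1, r2, r2, LSL #3 gives r2*7
```

Since only the second source operand can be shifted, rsb is what lets us subtract the unshifted value from the shifted one.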

Another example, a bit more contrived.

/* Complicated way to multiply the initial value of r1 by 42 = 7*3*2 */
rsb r1, r1, r1, LSL #3  /* r1 ← (r1*8) - r1 equivalent to r1 ← 7*r1 */
add r1, r1, r1, LSL #1  /* r1 ← r1 + (2*r1) equivalent to r1 ← 3*r1 */
add r1, r1, r1          /* r1 ← r1 + r1     equivalent to r1 ← 2*r1 */
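The three instructions above can be replayed step by step in Python to check that they really multiply by 42. This is just a sketch: times_42 and lsl are hypothetical names, and each step is masked to 32 bits to mimic a register.

```python
MASK = 0xFFFFFFFF  # 32-bit register


def lsl(value, n):
    """Model of the shifted operand `Rm, LSL #n`."""
    return (value << n) & MASK


def times_42(r1):
    """Replay the rsb/add/add sequence on a 32-bit value."""
    r1 = (lsl(r1, 3) - r1) & MASK  # rsb r1, r1, r1, LSL #3  -> 7*r1
    r1 = (r1 + lsl(r1, 1)) & MASK  # add r1, r1, r1, LSL #1  -> 3*(7*r1)
    r1 = (r1 + r1) & MASK          # add r1, r1, r1          -> 2*(21*r1)
    return r1


print(times_42(1))   # 42
print(times_42(10))  # 420
```

Because every step wraps around at 32 bits, the result matches a true multiplication by 42 modulo 2^32, which is also what a 32-bit multiply would give.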

You are probably wondering why we would want to use shifts to perform multiplications. Well, the generic multiplication instruction always works, but it is usually much harder for our ARM processor to compute, so it may take more time. There are times when there is no other option, but for many small constant values a single instruction may be more efficient.

Rotations are less useful than shifts in everyday use. They are usually used in cryptography, to reorder bits and “scramble” them. ARM does not provide a way to rotate left, but we can rotate left by n by rotating right by 32-n.

/* Assume r1 is 0x12345678 */
mov r1, r1, ROR #1   /* r1 ← r1 ror 1. This is r1 ← 0x091a2b3c */
mov r1, r1, ROR #31  /* r1 ← r1 ror 31. This is r1 ← 0x12345678 */
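The rotate-left trick can be checked with a small Python model of a 32-bit rotation. The names ror and rol are hypothetical helpers (ARM itself only offers ROR as a shifted operand), and the rotation amount is assumed to be between 0 and 31.

```python
MASK = 0xFFFFFFFF  # 32-bit register


def ror(value, n):
    """Rotate a 32-bit value right by n positions."""
    value &= MASK
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK


def rol(value, n):
    """Rotate left by n, expressed as a rotate right by 32-n."""
    return ror(value, (32 - n) % 32)


r1 = 0x12345678
r1 = ror(r1, 1)
print(hex(r1))   # 0x91a2b3c
r1 = ror(r1, 31)
print(hex(r1))   # 0x12345678

print(hex(rol(0x12345678, 4)))  # 0x23456781
```

Rotating right by 1 and then by 31 is a full rotation of 32 positions, so the original value comes back.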

That’s all for today.


6 thoughts on “ARM assembler in Raspberry Pi – Chapter 7”

  • Fernando says:

    Any insight of why is multiplication slower than addition?

    • rferrer says:

      The general multiplication algorithm is more expensive than addition because it involves more steps.

      When you add two numbers by hand, you start by vertically aligning their digits. Then you add each column vertically, starting from the right and moving to the left, taking care of the carry values you may need during the addition. Well, the circuitry of a processor does something similar. But a processor is able to advance the different carries found along the computation in a way that adding two numbers of N bits does not take a “k*N” amount of time but “k*log(N)”. Compare how you calculate a sum: when you add two numbers of 10 digits you need twice as much time as when you add two numbers of 5 digits. The processor would just need one extra step to add 10 digits compared to adding 5 digits. Cool, isn’t it? 🙂

      Now consider multiplication. When you multiply two numbers, you start as well by vertically aligning them. But now, you pick each digit of one of the rows (I use the lower one) and then you multiply it with every number of the other row (in my case it would be the upper one). This gives you an intermediate row. Then you repeat this process with the digit at the left of the previous one, giving you a new intermediate row that you put below the previous intermediate one but shifted left a position. Once you have done this with all the digits of the first row you use (the lower one in my case), then you add all the intermediate rows. As you can see, there are lots of steps involved here. When you multiply binary numbers a similar thing happens, though multiplying binary numbers is easier because it is just multiplying by 0 or 1, but you still have to add the intermediate rows. So the whole process is much more complicated. So the processor needs more time to calculate it.

      When multiplying a number by a power of 2 (2, 4, 8, 16, …), shifting left is all that is needed: multiplying by 2 is as simple as moving every bit one position to the left. All this is done by just routing the bits inside the processor: no complex computation is needed at all. So avoiding multiplications is usually a Good Thing™

      • Fernando says:

        After asking, I thought a bit about it, and came to the same conclusion you did (I know, I should have thought before asking).

        Of course, you added some extra details I find really interesting. I hope this is useful for more people as well.

  • blulin says:

    For instance, ARM does not have any shift right or left instruction. You just use the mov instruction.

    Looks like shift instructions are available.
    Reference : http://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/armv6.html

    • rferrer says:

      Hi blulin,

      no, lsl is not an ARMv6 instruction but a mnemonic implemented by the assembler. It looks like an instruction but it is just another instruction in disguise.

      You can see that by doing the following:

      1. create a test.s file that contains

        .text
        main:
          lsl r0, #3
          mov r0, r0, LSL #3

      2. assemble it with gcc -c test.s; this will create a file test.o
      3. disassemble it with objdump -d test.o, you will see the following two instructions

        00000000 <main>:
           0: e1a00180  lsl r0, r0, #3
           4: e1a00180  lsl r0, r0, #3

        note that they are the same instruction. objdump recognizes the pattern and shows the mnemonic instead.

      Kind regards,

      • blulin says:

        Thanks for the explanation. I was going through the armv6 architecture manual and wondering why it’s not listed there as an instruction. Now this explains why.

        And I almost forgot, thanks for writing up this great series. I am thoroughly enjoying it and have begun programming on my old Samsung Galaxy 3 armv6 android phone running debian gnu/linux 🙂
