In the previous installment we briefly discussed how to generate code that uses the vector feature of the Raspberry Pi 1 CPU.
Let’s start hacking LLVM.
One way to understand registers in LLVM is as a set of storage resources that we can group into register classes. Those register classes can then be used as register operands of instructions.
The register information for the
ARM backend (the one used for 32-bit Arm
CPUs, currently known as the AArch32 execution state of the Arm architecture)
is found in
This is a tablegen file. Tablegen is a domain-specific language to generate records called definitions. Each definition is an instance of a class, and classes define the attributes that a definition will contain. A tablegen file is then processed by one or more tablegen backends, most commonly to generate C++ code. This tablegen-generated C++ code is compiled along with the rest of the C++ code that makes up LLVM. This way it is relatively quick to update parts of the compiler without having to express them directly in C++.
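To make the idea of records and backends a bit more concrete, here is a toy model in Python. It is not how tablegen works internally: `RegisterRecord`, `make_single_precision_records` and `emit_cpp_enum` are hypothetical names invented for this sketch, and the numbering of the enumerators is arbitrary. It only shows the general shape: a class fixes the attributes, definitions are instances of it, and a "backend" walks the definitions to print C++.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisterRecord:
    """Toy stand-in for a definition: a named record with fixed attributes."""
    name: str      # record name, used as a C++ identifier
    encoding: int  # hardware encoding of the register
    asm_name: str  # name printed in assembly


def make_single_precision_records(count=4):
    """Create a few definitions, one per register (hypothetical helper)."""
    return [RegisterRecord(f"S{i}", i, f"s{i}") for i in range(count)]


def emit_cpp_enum(records, enum_name="ToyReg"):
    """Toy 'backend': turn the records into the text of a C++ enum."""
    lines = [f"enum {enum_name} {{"]
    lines += [f"  {r.name} = {i}," for i, r in enumerate(records)]
    lines.append("};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_cpp_enum(make_single_precision_records()))
```

Because the generated text is C++, the record names have to be valid C++ identifiers, which matters later when we pick names for our own registers.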
Registers in LLVM are specified using definitions of class
backends have to specialise this class, so the Arm backend uses a class called
ARMFReg for floating point registers
The single precision floating point registers (
s<n>) are defined like this.
The double precision registers (
d<n>) are defined as registers
that each contain two single precision registers. This is achieved by
first declaring what is called a subregister index.
Now the registers can be defined by telling LLVM that they have two subregister
indices and then linking each subregister index to the corresponding
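As a rough model of this containment, the Python sketch below lists which single precision registers overlap a given d<n>. It relies on the architectural fact that, for d0 to d15, d<n> overlaps s<2n> and s<2n+1>. The index names `sub_lo` and `sub_hi` are hypothetical placeholders for this sketch, not the names used in LLVM.

```python
def s_subregisters(n):
    """Return (index_name, bit_offset, register) for each half of d<n>.

    Only d0..d15 overlap single precision registers.
    The index names are hypothetical, chosen for this sketch.
    """
    if not 0 <= n <= 15:
        raise ValueError("only d0..d15 contain single precision registers")
    return [
        ("sub_lo", 0, f"s{2 * n}"),       # low 32 bits of d<n>
        ("sub_hi", 32, f"s{2 * n + 1}"),  # high 32 bits of d<n>
    ]


if __name__ == "__main__":
    for n in (0, 3, 15):
        print(f"d{n}:", s_subregisters(n))
```

Note that here both subregisters have a real bit offset inside the containing register (0 and 32), which is what makes them physical subregisters.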
OK, so we can use a similar strategy for our vector registers. Let’s first define a couple of new subregister indices. For now let’s focus on double precision.
The first argument to
SubRegIndex is the size, in bits, of the subregister. Because we are
defining vectors of double precision, this will be 64 bits. The second argument
represents the offset within the register. In contrast to
d<n> registers that
do include two consecutive registers, VFP vectors may include non-consecutive
registers due to the wraparound within a vector bank (recall
(d7, d4)). So we use
-1 to represent that this is not a physical subregister but a
Now we can use tablegen looping features to define the pairs of registers.
This is a bit difficult to read.
base represents the
d<n> that begins a
offset ranges over the positions of the elements
within each bank. As these two loops run, they generate definitions.
Because of the
defset directive enclosing everything, those definitions will
also be referenced in a list called
So first we compute
base + offset and we name this
Then we compute
mnext as the logically next register, making sure we wrap around within the bank
(we achieve this using
!and(..., 0x3) because we have to compute modulo 4).
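The arithmetic of the two loops is easier to check in isolation. The following Python sketch mirrors it under stated assumptions: `BANK_BASES` is a guess at which banks are iterated (it is a parameter, so adjust it to match the actual loops), and `double_pairs` and `pair_name` are names made up for this sketch. The wraparound itself is the `& 0x3` described above.

```python
BANK_BASES = (0, 4, 8, 12)  # assumption: first d<n> of each bank
BANK_SIZE = 4               # four double precision registers per bank


def double_pairs(bases=BANK_BASES):
    """Return (m, mnext) for every pair, wrapping around inside each bank."""
    pairs = []
    for base in bases:
        for offset in range(BANK_SIZE):
            m = base + offset
            # Masking with 0x3 is "modulo 4": offset 3 wraps back to 0.
            mnext = base + ((offset + 1) & 0x3)
            pairs.append((m, mnext))
    return pairs


def pair_name(m, mnext):
    """Build a definition name following the D<m>_D<mnext>x2 pattern."""
    return f"D{m}_D{mnext}x2"


if __name__ == "__main__":
    for m, mnext in double_pairs():
        print(pair_name(m, mnext))
```

With these bases the pair starting at d7 is (d7, d4) rather than (d7, d8), which is exactly the non-consecutive case that rules out a physical offset for the subregister index.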
Now that we have
mnext we can define the pair itself. The definition
will be named
D8_D9x2, …). This name is arbitrary but should be a valid C++
identifier because one of the tablegen backends will define enumerators for
In order to generate the register we use a specialised class called
VFPRegistersWithSubregs which is just a convenience for this task.
If you check above how we use this class, the first argument is the encoding
register. We will always use the first register of the group for the encoding
(however, you will see that eventually we won’t be using this). We are naming them
d<n>x2 in the assembly. We will not use those names and, in fact, we
should forbid them in the assembler that LLVM will generate for the ARM
backend, but for simplicity we will ignore this. Finally, see how we link
the current definition to each
Now we have the registers defined. Those are the resources, and instructions
can use them via register classes, which are the sets of registers usable
as operands of instructions. Due to the way we have designed the registers,
all of them will be usable in a register class for vectors of doubles. We can
simply use the list
DPRx2Regs that we built using
The second operand is the list of machine types that the registers of
this class can hold. In this case
v2f64 is equivalent to
<2 x double> in LLVM IR.
Machine types are a fixed set of types that backends can use (i.e. LLVM IR has
types that machine types do not represent) and they roughly correspond to the
types that CPUs physically support. The third operand is the alignment, in bits, used
when loading a register from memory or storing it to memory. Due to the way we are going
to load them, they can be aligned to 8 bytes (64 bits).
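To keep the units straight (the alignment is given in bits, while we usually think of addresses in bytes), here is a small Python model of the three pieces of information discussed above. `ToyRegisterClass` and its fields are hypothetical and only mimic the idea; this is not the LLVM data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRegisterClass:
    """Toy model of a register class (hypothetical, for illustration)."""
    members: tuple       # names of the registers in the class
    value_type: str      # machine type held by the registers
    element_bits: int    # size of one vector element
    num_elements: int    # elements per vector
    alignment_bits: int  # alignment for loads and stores, in bits

    @property
    def size_bytes(self):
        return self.element_bits * self.num_elements // 8

    @property
    def alignment_bytes(self):
        return self.alignment_bits // 8

    def is_suitably_aligned(self, address):
        """True if a register of this class may be loaded from address."""
        return address % self.alignment_bytes == 0


# A v2f64 value occupies 16 bytes but only needs 8-byte alignment here.
toy_dprx2 = ToyRegisterClass(("D8_D9x2",), "v2f64", 64, 2, 64)

if __name__ == "__main__":
    print(toy_dprx2.size_bytes, toy_dprx2.alignment_bytes)
    print(toy_dprx2.is_suitably_aligned(0x1008))
```

The point of the model is that the alignment (8 bytes) is smaller than the size of the whole vector (16 bytes).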
And that’s it. We can do the same for single precision. This time the subregister sizes
are 32 bits and each register will contain 4 subregisters. The type of the
registers will be
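For single precision the same wraparound idea applies, only with larger banks. The Python sketch below assumes the VFP layout of 8 single precision registers per bank (s0 to s7, s8 to s15, and so on), so the mask becomes 0x7. The bank bases are a parameter because which banks get iterated is an assumption of this sketch, and `single_groups` is a made-up name.

```python
S_BANK_BASES = (0, 8, 16, 24)  # assumption: first s<n> of each bank
S_BANK_SIZE = 8                # eight single precision registers per bank
GROUP_LENGTH = 4               # each vector register has 4 subregisters


def single_groups(bases=S_BANK_BASES):
    """Return 4-tuples of s<n> numbers, wrapping around inside each bank."""
    groups = []
    for base in bases:
        for offset in range(S_BANK_SIZE):
            # Masking with 0x7 is "modulo 8", the size of a bank.
            group = tuple(base + ((offset + k) & 0x7)
                          for k in range(GROUP_LENGTH))
            groups.append(group)
    return groups


if __name__ == "__main__":
    for group in single_groups(bases=(8,)):
        print(group)
```

For example, the group that starts at s14 is (s14, s15, s8, s9), the single precision analogue of (d7, d4).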
In the next chapter we will talk about the changes we have to make to be able
fpscr so we can change the
len field with confidence.