In the previous installment we briefly discussed how to generate code that uses the vector feature of the Raspberry Pi 1 CPU.

Let’s start hacking LLVM.

Registers

One way to understand registers in LLVM is as a set of storage resources that we can group into register classes. Those register classes can then be used as register operands of instructions.

The register information for the ARM backend (the one used for 32-bit Arm CPUs, currently known as the AArch32 execution state of the Arm architecture) is found in llvm/lib/Target/ARM/ARMRegisterInfo.td.

This is a tablegen file. Tablegen is a domain-specific language to generate records called definitions. Each definition is an instance of a class, and classes define the attributes that a definition will contain. A tablegen file is then processed by one or more backends, usually to generate C++ code. This tablegen-generated C++ code is compiled along with the rest of the C++ code that makes up LLVM. This way it is relatively quick to update parts of the compiler without having to express them directly in C++.
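To make the idea of records and backends more concrete, here is a toy model in Python. It is only a sketch of the concept, not how llvm-tblgen is implemented: the names Record, make_freg and emit_enum are hypothetical, and the enumerator values are made up for illustration.

```python
from dataclasses import dataclass


# Toy stand-in for a tablegen record: a named definition plus the
# attribute values it receives from its class. Hypothetical, not LLVM's API.
@dataclass
class Record:
    name: str
    fields: dict


def make_freg(index: int) -> Record:
    """Mimic a definition such as: def S1 : ARMFReg<1, "s1">;"""
    return Record(
        name=f"S{index}",
        fields={"HWEncoding": index, "AsmName": f"s{index}", "Namespace": "ARM"},
    )


def emit_enum(records: list) -> str:
    """Toy 'backend': turn the records into the text of a C++ enum."""
    lines = ["enum {"]
    for value, rec in enumerate(records, start=1):
        lines.append(f"  {rec.name} = {value},")
    lines.append("};")
    return "\n".join(lines)


records = [make_freg(i) for i in range(4)]
print(emit_enum(records))
```

The point is only the division of labour: the definitions are data, and a separate program walks that data to produce C++ text.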

Registers in LLVM are specified using definitions of class Register. Most backends have to specialise this class, so the Arm backend uses a class called ARMFReg for floating point registers:

class ARMFReg<bits<16> Enc, string n> : Register<n> {
  let HWEncoding = Enc;
  let Namespace = "ARM";
}

The single precision floating point registers (s<n>) are defined like this.

def S0  : ARMFReg< 0, "s0">;  def S1  : ARMFReg< 1, "s1">;
def S2  : ARMFReg< 2, "s2">;  def S3  : ARMFReg< 3, "s3">;
def S4  : ARMFReg< 4, "s4">;  def S5  : ARMFReg< 5, "s5">;
def S6  : ARMFReg< 6, "s6">;  def S7  : ARMFReg< 7, "s7">;
def S8  : ARMFReg< 8, "s8">;  def S9  : ARMFReg< 9, "s9">;
def S10 : ARMFReg<10, "s10">; def S11 : ARMFReg<11, "s11">;
def S12 : ARMFReg<12, "s12">; def S13 : ARMFReg<13, "s13">;
def S14 : ARMFReg<14, "s14">; def S15 : ARMFReg<15, "s15">;
def S16 : ARMFReg<16, "s16">; def S17 : ARMFReg<17, "s17">;
def S18 : ARMFReg<18, "s18">; def S19 : ARMFReg<19, "s19">;
def S20 : ARMFReg<20, "s20">; def S21 : ARMFReg<21, "s21">;
def S22 : ARMFReg<22, "s22">; def S23 : ARMFReg<23, "s23">;
def S24 : ARMFReg<24, "s24">; def S25 : ARMFReg<25, "s25">;
def S26 : ARMFReg<26, "s26">; def S27 : ARMFReg<27, "s27">;
def S28 : ARMFReg<28, "s28">; def S29 : ARMFReg<29, "s29">;
def S30 : ARMFReg<30, "s30">; def S31 : ARMFReg<31, "s31">;

The double precision registers (d<n>) are defined as registers that each contain two single precision registers. This is achieved by first declaring what is called a subregister index.

def ssub_0  : SubRegIndex<32>;
def ssub_1  : SubRegIndex<32, 32>;

Now the registers can be defined by telling LLVM that they have two subregister indices and then linking each subregister index to the corresponding s<n> and s<n+1> registers.

// Aliases of the F* registers used to hold 64-bit fp values (doubles)
let SubRegIndices = [ssub_0, ssub_1] in {
def D0  : ARMReg< 0,  "d0", [S0,   S1]>, DwarfRegNum<[256]>;
def D1  : ARMReg< 1,  "d1", [S2,   S3]>, DwarfRegNum<[257]>;
def D2  : ARMReg< 2,  "d2", [S4,   S5]>, DwarfRegNum<[258]>;
def D3  : ARMReg< 3,  "d3", [S6,   S7]>, DwarfRegNum<[259]>;
def D4  : ARMReg< 4,  "d4", [S8,   S9]>, DwarfRegNum<[260]>;
def D5  : ARMReg< 5,  "d5", [S10, S11]>, DwarfRegNum<[261]>;
def D6  : ARMReg< 6,  "d6", [S12, S13]>, DwarfRegNum<[262]>;
def D7  : ARMReg< 7,  "d7", [S14, S15]>, DwarfRegNum<[263]>;
def D8  : ARMReg< 8,  "d8", [S16, S17]>, DwarfRegNum<[264]>;
def D9  : ARMReg< 9,  "d9", [S18, S19]>, DwarfRegNum<[265]>;
def D10 : ARMReg<10, "d10", [S20, S21]>, DwarfRegNum<[266]>;
def D11 : ARMReg<11, "d11", [S22, S23]>, DwarfRegNum<[267]>;
def D12 : ARMReg<12, "d12", [S24, S25]>, DwarfRegNum<[268]>;
def D13 : ARMReg<13, "d13", [S26, S27]>, DwarfRegNum<[269]>;
def D14 : ARMReg<14, "d14", [S28, S29]>, DwarfRegNum<[270]>;
def D15 : ARMReg<15, "d15", [S30, S31]>, DwarfRegNum<[271]>;
}
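The listing above follows a simple rule: d<n> overlaps s<2n> and s<2n+1>. A minimal Python sketch of that rule, where the function name d_subregs is hypothetical and the range check simply reflects that the listing covers d0 to d15:

```python
def d_subregs(n: int) -> dict:
    """Return the s registers overlapped by d<n>, keyed by subregister index."""
    if not 0 <= n <= 15:
        raise ValueError("the listing only covers d0 to d15")
    # ssub_0 is the low 32 bits, ssub_1 the high 32 bits (offset 32).
    return {"ssub_0": f"s{2 * n}", "ssub_1": f"s{2 * n + 1}"}


for n in (0, 7, 15):
    print(f"d{n}", d_subregs(n))
```

Because the two halves are always adjacent, a single fixed offset per subregister index (0 and 32) is enough to describe every d<n> register.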

OK, so we can use a similar strategy for our vector registers. Let’s first define a couple of new subregister indices. For now let’s focus on double precision.

def dsub_len2_0: SubRegIndex<64, -1>;
def dsub_len2_1: SubRegIndex<64, -1>;

The first argument to SubRegIndex is the size, in bits, of the subregister. Because we are defining vectors of double precision, this is 64 bits. The second argument is the offset of the subregister within the register. In contrast to d<n> registers, which always contain two consecutive s<n> registers, VFP vectors may contain non-consecutive registers due to the wraparound within a vector bank (recall the pair (d7, d4)). So we specify -1 to indicate that this is not a physical subregister at a fixed offset but a logical one.

Now we can use tablegen looping features to define the pairs of registers.

// Double precision pairs
defset list<Register> DPRx2Regs = {
foreach base = [4, 8, 12] in {
    foreach offset = [0, 1, 2, 3] in {
        defvar m = !add(base, offset);
        defvar mnext = !add(base, !and(!add(offset, 1), 0x3));
        let SubRegIndices = [dsub_len2_0, dsub_len2_1] in {
            def "D" # m # "_D" # mnext # "x2" :
                VFPRegistersWithSubregs<
                    !cast<Register>("D" # m),
                    "d" # m # "x2",
                    [!cast<Register>("D" # m), !cast<Register>("D" # mnext)],
                    ["d" # m # "x2"]>;
        }
    }
}
}

This is a bit difficult to read. base is the d<n> that begins a vector bank: d4, d8 and d12. offset is the position within the bank of the first register of the pair, from 0 to 3, since each bank has four d<n> registers. Together, the two loops generate twelve definitions. Because of the defset directive enclosing everything, those definitions are also collected in a list called DPRx2Regs.

We first compute base + offset and name it m. Then we compute mnext as the logical next register, making sure we wrap around within the bank (we achieve this using !and(..., 0x3), which computes the value mod 4).

Now that we have m and mnext we can define the pair itself. The definition will be named D<m>_D<mnext>x2 (e.g. D4_D5x2, D5_D6x2, D6_D7x2, D7_D4x2, D8_D9x2, …). This name is arbitrary but should be a valid C++ identifier, because one of the tablegen backends will define enumerators for those registers.
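If the nested loops are hard to follow, the same arithmetic can be replayed in plain Python. This is only a sketch that mirrors the computation of m and mnext; the function name dpr_pairs is hypothetical.

```python
def dpr_pairs():
    """Mirror the two foreach loops: yield (m, mnext) for every pair."""
    for base in (4, 8, 12):        # first d register of each vector bank
        for offset in range(4):    # position of the first element in the bank
            m = base + offset
            # & 0x3 is the same as mod 4 here: wrap around within the bank
            mnext = base + ((offset + 1) & 0x3)
            yield m, mnext


names = [f"D{m}_D{mnext}x2" for m, mnext in dpr_pairs()]
print(names)
```

The first five names printed are D4_D5x2, D5_D6x2, D6_D7x2, D7_D4x2 and D8_D9x2, matching the examples in the text.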

In order to generate the register we use a specialised class called VFPRegistersWithSubregs, which is just a convenience for this task.

class VFPRegistersWithSubregs<Register EncReg, string n, list<Register> subregs,
                          list<string> alt = []>
      : RegisterWithSubRegs<n, subregs> {
  let HWEncoding = EncReg.HWEncoding;
  let AltNames = alt;
  let Namespace = "ARM";
}

If you check above how we use this class, the first argument is the encoding register. We always use the first register of the group for the encoding (however, you will see that eventually we won’t be using this). We name those registers d<m>x2 in the assembly. We will not use these names, and in fact we should forbid them in the assembler that LLVM will generate for the ARM backend, but for simplicity we will ignore this. Finally, note how we link the current definition to its two subregisters, d<m> and d<mnext>.

Now we have the registers defined. Those are the resources. Instructions use those resources via register classes, which are the sets of registers usable in instructions. Due to the way we have designed the registers, all of them will be usable in a register class for vectors of doubles. We can simply use the list DPRx2Regs that we built using defset above.

def DPRx2 : RegisterClass<"ARM", [v2f64], 64, (add DPRx2Regs)>;

The second operand is the list of machine types that a register of this class can hold. In this case v2f64 is equivalent to <2 x double> in LLVM IR. Machine types are a fixed set of types that backends can use (i.e. LLVM IR has types that machine types do not represent) and they roughly correspond to the types that CPUs physically support. The third operand is the alignment, in bits, used when loading or storing a register from memory. Due to the way we are going to load them, they can be aligned to 8 bytes (64 bits).

And that’s it. We can do the same for single precision. This time the subregister size is 32 bits and each register contains 4 subregisters. The type of the registers will be v4f32.

def ssub_len4_0: SubRegIndex<32, -1>;
def ssub_len4_1: SubRegIndex<32, -1>;
def ssub_len4_2: SubRegIndex<32, -1>;
def ssub_len4_3: SubRegIndex<32, -1>;

// Single precision quads
defset list<Register> SPRx4Regs = {
foreach base = [8, 16, 24] in {
    foreach offset = [0, 1, 2, 3, 4, 5, 6, 7] in {
        defvar m = !add(base, offset);
        defvar mnext1 = !add(base, !and(!add(offset, 1), 0x7));
        defvar mnext2 = !add(base, !and(!add(offset, 2), 0x7));
        defvar mnext3 = !add(base, !and(!add(offset, 3), 0x7));
        let SubRegIndices = [ssub_len4_0, ssub_len4_1, ssub_len4_2, ssub_len4_3]
        in {
            def "S" # m # "_S" # mnext1 # "_S" # mnext2 # "_S" # mnext3 # "x4" :
                VFPRegistersWithSubregs<
                    !cast<Register>("S" # m),
                    "s" # m # "x4",
                    [!cast<Register>("S" # m),
                     !cast<Register>("S" # mnext1),
                     !cast<Register>("S" # mnext2),
                     !cast<Register>("S" # mnext3)],
                    ["s" # m # "x4"]>;
        }
    }
}
}
def SPRx4 : RegisterClass<"ARM", [v4f32], 32, (add SPRx4Regs)>;
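Both the pairs and the quads follow the same pattern: pick a bank, pick a starting position, and take consecutive elements modulo the bank size. A Python sketch that generalises this, assuming the bank size is a power of two so that a bit mask can stand in for the modulo (the function name vector_groups and its parameters are hypothetical):

```python
def vector_groups(prefix, bases, bank_size, length):
    """Names of all `length`-element groups that wrap around within a bank.

    bank_size must be a power of two so that `& (bank_size - 1)` acts as modulo.
    """
    mask = bank_size - 1
    names = []
    for base in bases:
        for offset in range(bank_size):
            regs = [base + ((offset + k) & mask) for k in range(length)]
            names.append("_".join(f"{prefix}{r}" for r in regs) + f"x{length}")
    return names


quads = vector_groups("S", bases=(8, 16, 24), bank_size=8, length=4)
print(len(quads), quads[0], quads[7])
# prints: 24 S8_S9_S10_S11x4 S15_S8_S9_S10x4
```

Calling vector_groups("D", (4, 8, 12), 4, 2) gives the names of the double precision pairs, so the two tablegen loops differ only in their parameters.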

In the next chapter we will talk about the changes we have to make to be able to track fpscr, so we can change the len field with confidence.