Most compiler infrastructures that target register machines do it by using the concept of virtual registers. In their intermediate representations instructions use virtual registers to represent their operands.

Because hardware is finite, these virtual registers must be mapped to physical registers at some point. The compiler does this in a process called register allocation.

Being physical registers finite, it may happen that not all the virtual registers used by the program can be held in physical registers at the same time. When this happens, the compiler must emit spill code. Spill code stores a value in a memory (spill) and loads it later, often close to the point of use (reload).

The memory used for spill code is commonly the function stack. However nothing prevents us from using other kinds of “memories” as long as we can guarantee that nobody is going to use them. This is exactly the kind of experiment we will do today: we’re going to spill general-purpose registers into floating-point registers.

Some context first

Today’s experiment will be done using RISC-V and LLVM.

RISC-V

RISC-V is an open-source, RISC-style, ISA maintained by the RISC-V Foundation. One of its features is that it is very modular so the ISA has a number of standard extensions including those that provide floating point instructions and registers.

The base RISC-V ISA defines 32 integer registers. They are 32-bit in 32-bit versions of RISC-V and 64-bit in 64-bit versions of RISC-V. We will call them general-purpose (GPR) even if they can only operate integers or addresses. The F standard extension adds 32 floating point registers (FPR) of 32-bit. The D standard extension extends those registers to be 64-bit. This way, the F extension provides support for IEEE 754 Binary32 and the D extension provides support for IEEE 754 Binary64.

LLVM

LLVM is an umbrella project for compilers and other related tools hosted by the LLVM Foundation. LLVM has a backend for RISC-V that is still pretty hackable for experiments.

Related work

This is not a new idea, of course. The paper Exploiting idle register classes for fast spill destination explored it already. The reported results seem promising (ranging from 1.7% to 10%) but are ultimately predicated to being able to do moves between different register banks with reasonable latency. This is not always the case in all architectures. However that paper is from 2008 so some of the results may need to be reevaluated with current architectures.

Spill code in LLVM

This is not obvious, but there are at least two reasons why we may need to spill (i.e. store) and reload (i.e. load) the value of a register. The first one we already saw it: register allocation.

However, there is a second reason: callee-saved registers. Application Binary Interfaces specify how functions can use registers. One of the things they specify is whether the contents of a register is preserved across function calls or not. If the value held in a register is preserved across function calls then either who does the call (the caller) or who is called (the callee) are responsible for preserving it. Thus a register is either caller-save or callee-save, respectively.

A way to simplify a bit all this is to assume that if a register is not callee-save then it is likely (though not necessarily) to be caller-save, which means it is up to the caller to preserve the value of the register across a function call, in case the value needs to be preserved.

These two kind of spill code are emitted in two different moments in the compilation pipeline of LLVM. Register allocation is executed earlier. Later on, a process called Prologue / Epilogue Emitter is the responsible of emitting the spills for the callee-saved registers.

Current status

Consider the following C code. This is the accumulating part of a naive 64x64 integer matrix multiplication. Nothing special in it other than we request clang to unroll it 16 times. Unrolling is used here as an easy way to increase register pressure (the number of register needed at the same time) so we force the compiler to spill values. Even in an architecture whith a large number of registers like RISC-V, spilling (caused by register allocation) may be unavoidable.

example.c

1
2
3
4
5
6
7
8
9
enum { N = 64 };

void f(int (*a)[N], int (*b)[N], int (*c)[N]) {
  for (int i = 0; i < N; i++)
#pragma clang loop unroll_count(16)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        c[i][j] += a[i][k] * b[k][j];
}

I’m going to use riscv64-unknown-linux-gnu as a handy example here of a RISC-V 64-bit architecture that has both F and D extensions. We can generate the assembly output like this.

$ clang --target=riscv64-unknown-linux-gnu -S -O2 -o example.s example.c

To store a GPR to a memory location in RISC-V 64-bit we use the sd instruction. Stack locations in this function are easy to spot because they are addresses based on the sp (stack pointer) register.

If we examine the assembly listing for this function, we first see a bunch of stores to the stack. Those are the spills caused by the callee-saved registers.

example.s

8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
f:                                      # @f
        addi    sp, sp, -256    # Grow the stack
        sd      ra, 248(sp)
        sd      s0, 240(sp)
        sd      s1, 232(sp)
        sd      s2, 224(sp)
        sd      s3, 216(sp)
        sd      s4, 208(sp)
        sd      s5, 200(sp)
        sd      s6, 192(sp)
        sd      s7, 184(sp)
        sd      s8, 176(sp)
        sd      s9, 168(sp)
        sd      s10, 160(sp)
        sd      s11, 152(sp)
        sd      a2, 128(sp)
        sd      zero, 136(sp)

This is followed by a bunch of values that compute addresses for the b matrix that are also spilled. Note that the spill here is just the sd instruction. The addi was part of the original program.

example.s

24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
        sd      a2, 128(sp)
        sd      zero, 136(sp)
        addi    a2, a1, 4
        sd      a2, 112(sp)
        addi    a2, a1, 8
        sd      a2, 104(sp)
        addi    a2, a1, 12
        sd      a2, 96(sp)
        addi    a2, a1, 16
        sd      a2, 88(sp)
        addi    a2, a1, 20
        sd      a2, 80(sp)
        addi    a2, a1, 24
        sd      a2, 72(sp)
        addi    a2, a1, 28
        sd      a2, 64(sp)
        addi    a2, a1, 32
        sd      a2, 56(sp)
        addi    a2, a1, 36
        sd      a2, 48(sp)
        addi    a2, a1, 40
        sd      a2, 40(sp)
        addi    a2, a1, 44
        sd      a2, 32(sp)
        addi    a2, a1, 48
        sd      a2, 24(sp)
        addi    a2, a1, 52
        sd      a2, 16(sp)
        addi    a2, a1, 56
        sd      a2, 8(sp)
        sd      a1, 120(sp)
        addi    a1, a1, 60
        sd      a1, 0(sp)

To load a GPR from a memory location in RISC-V 64-bit we use the ld instruction.

When the outermost loop of the matrix multiplication starts, it reloads a bunch of things that we just spilled. If this seems pointless to you, note that this is a loop so it is going to be pointless only in the first iteration (i.e. these registers will be reused in the loop so we need to reload their value). The ld instruction is used to load a GPR from a memory location.

example.s

77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
        mv      s10, zero
        ld      a4, 0(sp)
        ld      s11, 8(sp)
        ld      ra, 16(sp)
        ld      s6, 24(sp)
        ld      s7, 32(sp)
        ld      s2, 40(sp)
        ld      t6, 48(sp)
        ld      t5, 56(sp)
        ld      t4, 64(sp)
        ld      t3, 72(sp)
        ld      t2, 80(sp)
        ld      t1, 88(sp)
        ld      t0, 96(sp)
        ld      a7, 104(sp)
        ld      a6, 112(sp)
        ld      s8, 120(sp)

If we go to the end of the function, we see the reloads of the callee-saved registers, right before returning.

example.s

467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
        ld      s11, 152(sp)
        ld      s10, 160(sp)
        ld      s9, 168(sp)
        ld      s8, 176(sp)
        ld      s7, 184(sp)
        ld      s6, 192(sp)
        ld      s5, 200(sp)
        ld      s4, 208(sp)
        ld      s3, 216(sp)
        ld      s2, 224(sp)
        ld      s1, 232(sp)
        ld      s0, 240(sp)
        ld      ra, 248(sp)
        addi    sp, sp, 256
        ret

What do we want to do?

The F and D extensions provide us with 32 floating point registers. However for leaf functions (functions that do not call other functions) like this one, which need to spill a bunch of GPRs, it could be beneficial to be able to spill those values onto those floating point registers, if possible.

Why are we restricting ourselves to leaf functions? We don't have, but in this post we're using a very naive approach that cannot support more advanced use cases. In a more advance model we would link a spill with all of its reloads. It may happen that the program never crosses a function call from the spill to any of its reloads. In this case we could allow functions having function calls. Also this advanced model might allow us to be more precise when gathering the available floating point registers (it might happen a FPR is used in the function but not in the part involving a spill and all its reloads).

Strategy

As I mentioned I’m going to use an extremely simplistic approach here. More complex approaches are possible but are going to require more infrastructure. We are not doing that today.

After “Prologue Epilogue Emitter” has spilled all the callee-saved registers it invokes a target-specific hook. We will run some extra code in that hook.

Check if this function is a leaf, otherwise bail out.
Check if we’re compiling with F (for RISC-V 32-bit) or D (for RISC-V 64-bit). If this is not the case, bail out.
Determine all the used registers by the function. From that information compute how many of FPRs are available. If none is available, bail out.
Now for each instruction that is a store/load to/from the stack, this is a spill/reload, find a FPR for it. If one is still available, map the frame index (I explain later what a frame index is) to the FPR, emit the proper move from/to GPR to/from FPR for the spill/reload and mark the frame index as dead. Otherwise just ignore this frame index. FPRs are assigned as we find spill/reloads instructions in the function.

Implementation

I will call this process “soften spills”: we are still morally spilling but because we are not hitting the memory system it could be a cheaper operation. I’m sure better names exist but I went with this one.

Entry point

The target specific that “Prologue/Epilogue Emitter” right after it has emitted the spills and reloads of callee-saved registers is called processFunctionBeforeFrameFinalized. For the RISC-V backend this function is in llvm/lib/Target/RISCV/RISCVFrameLowering.cpp. To keep things a bit tidy I added the call here but I implemented the algorithm in another file.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp

569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
void RISCVFrameLowering::processFunctionBeforeFrameFinalized(
    MachineFunction &MF, RegScavenger *RS) const {
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
  MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterClass *RC = &RISCV::GPRRegClass;
  // estimateStackSize has been observed to under-estimate the final stack
  // size, so give ourselves wiggle-room by checking for stack size
  // representable an 11-bit signed field rather than 12-bits.
  // FIXME: It may be possible to craft a function with a small stack that
  // still needs an emergency spill slot for branch relaxation. This case
  // would currently be missed.
  if (!isInt<11>(MFI.estimateStackSize(MF))) {
    int RegScavFI = MFI.CreateStackObject(
        RegInfo->getSpillSize(*RC), RegInfo->getSpillAlignment(*RC), false);
    RS->addScavengingFrameIndex(RegScavFI);
  }

  // If we want to soften spills, we do it now.
  RISCVSoftenXSpillsReload(&MF);
}

This is a convenient place to do the spill softening because here we will be able to see the spills and reloads emitted by the register allocator and also the spills and reloads emitted by “Prologue/Epilog Emitter”. Earlier than that we would be missing the latter kind of spills and reloads. And later than that, the stack layout of the function would have already been generated so while we could remove the spills, the storage for them would have already been accounted. In fact we will see later that changing the frame information here is kind of unexpected and we will have to amend a few bits.

Prolegomena

I implemented this in another file so we will need some boilerplate before we can continue.

Following is the main implementation file.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
//===----- RISCVSoftenSpills.cpp - Soften Spills using FPR registers ------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "RISCV.h"
#include "RISCVTargetMachine.h"
#include "llvm/Support/Debug.h"
#include "llvm/ADT/DenseMap.h"
using namespace llvm;

#define DEBUG_TYPE "riscv-soften-spills"

static cl::opt<bool> EnableSoftenSpills(
    "riscv-soften-spills",
    cl::desc("Enable softening spills using FPR registers when available"),
    cl::init(false), cl::Hidden);

bool llvm::RISCVSoftenXSpillsReload(MachineFunction *MF) {
  if (!EnableSoftenSpills)
    return false;

  // .. rest of the code here ...
}

I added a command line option, so we can manually enable the softening from clang using -mllvm -riscv-soften-spills. The macro DEBUG_TYPE is needed when we later on use LLVM_DEBUG, this is used to filter debug messages via -mllvm -debug-only=riscv-soften-spills.

This function will return true if it changed something, otherwise it will return false. We do not use this value but a caller might be interested to know if we actually changed something.

This function is defined in the llvm namespace for simplicity. So we need a declaration in that namespace first. We can add one in RISCV.h. Note: There are better ways to organise this code, this one is just simple and effective.

llvm/lib/Target/RISCV/RISCV.h

50
51
52
53
// FIXME - Move this to a better place.
bool RISCVSoftenXSpillsReload(MachineFunction *MF);

} // namespace llvm

Also we need to let know cmake about this new file.

llvm/lib/Target/RISCV/CMakeLists.txt

19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
add_llvm_target(RISCVCodeGen
  RISCVAsmPrinter.cpp
  RISCVCallLowering.cpp
  RISCVExpandPseudoInsts.cpp
  RISCVFrameLowering.cpp
  RISCVInstrInfo.cpp
  RISCVInstructionSelector.cpp
  RISCVISelDAGToDAG.cpp
  RISCVISelLowering.cpp
  RISCVLegalizerInfo.cpp
  RISCVMCInstLower.cpp
  RISCVMergeBaseOffset.cpp
  RISCVRegisterBankInfo.cpp
  RISCVRegisterInfo.cpp
  RISCVSubtarget.cpp
  RISCVTargetMachine.cpp
  RISCVTargetObjectFile.cpp
  RISCVTargetTransformInfo.cpp

  RISCVSoftenSpills.cpp
  )

Early bail-outs

Before we continue we need to gather some information. We are passing the MachineFunction which is the object that represents the whole function in the code generation phase of LLVM. We get two kinds of objects, those that are MachineSomething concern to the current function being compiled. Those that are TargetSomething are backend-specific information not necessarily function-specific (i.e. they might be shared between functions).

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

26
27
28
29
30
31
32
33
  // Gather some information that we will need.
  MachineFrameInfo *MFI = &MF->getFrameInfo();
  const MachineRegisterInfo *MRI = &MF->getRegInfo();

  const RISCVSubtarget *Subtarget = &MF->getSubtarget<RISCVSubtarget>();
  const RISCVInstrInfo *TII =
      static_cast<const RISCVInstrInfo *>(Subtarget->getInstrInfo());
  const TargetRegisterInfo *TRI = Subtarget->getRegisterInfo();

The MachineFrameInfo (MFI) object concerns about objects in the stack for the current function. LLVM uses a stack abstraction called the frame indexes. Each frame index is an integer for which we can associate information (like size and alignment). Those indexes are later on used to compute the size of the elements required.

The MachineRegisterInfo (MRI) deals about the specific register information used by the function. This is more useful when the MachineFunction was in SSA form before Register Allocation. At this point is useful to know what registers are callee-saved, something we will want to use later.

The Subtarget is compilation-specific information for the current function. For instance it allows us to know if we are compiling with support for the F and D RISC-V standard extensions or whether we are compiling for 64-bit.

The TargetInstructionInfo (TII) gives us access to the instructions of this target (RISC-V in our case) so we can create new instructions. We will need this when replacing the spills/reloads with moves.

Finally the TargetRegisterInfo (TRI) gives us access to the register of this target. We will need this to enumerate the FPR registers.

A first check we can do now is see if this function has calls. MFI knows that.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

35
36
37
38
39
  // If we have calls, for now do nothing.
  // There are still opportunities here if the pair spill/reload doesn't cross
  // function calls but they will require a more sophisticated model.
  if (MFI->hasCalls())
    return false;

Also if we are not compiling for F or D, we won’t be able to use the instruction we need, so check this as well.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

42
43
44
45
46
47
48
  // If we are RV64 but we don't have D, give up.
  if (Subtarget->is64Bit() && !Subtarget->hasStdExtD())
    return false;

  // If we are RV32 but we don't have F, give up.
  if (!Subtarget->hasStdExtF())
    return false;

Gather all the registers used

Because we use a very simplistic approach, we want to know, globally for the whole function, what FPRs are available. For that we will iterate all the instructions and use the class LiveRegUnits.

LLVM uses a relatively flexible concept of registers. They can be virtual or physical. If they are physical their storage may be shared with other registers.

For instance the RISC-V backend in LLVM models floating point registers of the F extension as subregisters of the floating point registers of the D extension. This model is sensible because this is what the spec says: a RISC-V system with the D extension represents a register of the F extension in the lowest 32-bits of the floating point register. In that sense a floating point register such as f3 is modelled in LLVM with two registers: f3_f and f3_d, for F and D extensions respectively. LLVM must know that changing f3_f will change f3_d. This what register units are for.

The class LiveRegUnits allows us to accumulate the register units used by instructions. So we iterate for each basic block and then for each instruction.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

51
52
53
54
55
56
57
58
59
60
61
  // Flow-insensitive analysis in which we identify FPR32/FPR64 that
  // are not used at all.
  // There are further opportunities as the static path for a spill and all its
  // reloads might have free FPR registers. However our model is very simple so
  // we can't represent these.
  LiveRegUnits LRU(*TRI);
  for (auto &MBB : *MF) {
    for (auto &MI : MBB) {
      LRU.accumulate(MI);
    }
  }

Callee-saved are of no interest

I we use callee-saved FPRs, then we will not have achieved anything. The reason is that if we modify a callee-saved register we need to preserve it. And if we need to preserve them we need to spill them. Clearly we need to make sure we don’t consider callee-saved registers.

MRI can tell us which are the callee-saved registers of the function. So we make a handy function for that.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

63
64
65
66
67
68
69
  const MCPhysReg *CalleeSavedRegs = MRI->getCalleeSavedRegs();
  auto IsCalleeSaved = [&](MCPhysReg Reg) {
    for (const MCPhysReg *R = CalleeSavedRegs; *R; R++)
      if (*R == Reg)
        return true;
    return false;
  };

Now we can filter all the FPRs (both for F, FPR32 and D, FPR64) registers. If they are available but they are not callee-saved, they are candidates for our spills. If no register is available, we just bail out.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

71
72
73
74
75
76
77
  BitVector RegsAvailableFPR(TRI->getNumRegs());
  const TargetRegisterClass &FPRRegClass =
      Subtarget->is64Bit() ? RISCV::FPR64RegClass : RISCV::FPR32RegClass;
  for (MCPhysReg PhysReg : FPRRegClass.getRegisters()) {
    if (LRU.available(PhysReg) && !IsCalleeSaved(PhysReg))
      RegsAvailableFPR.set(PhysReg);
  }

We use the bitvector RegsAvailableFPR to represent whether a FPR register is available or not. Registers are identified by numbers (up to a maximum of TRI->getNumRegs(), which depends on the target). If a register is available, its related position in the corresponding bitvector will be set.

Find the spills and reloads

A spill and a reload are no different to a store and a load. So we need a way to identify spills and reloads among other general load and store instructions. Luckily LLVM backends have to implement two functions that precisely answers this question. These two functions are in TII, who knows about the instructions of the target.

Let’s make first a convenient function.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
  // Helper used to identify spills and reloads.
  auto IsSpillReload = [&](MachineInstr &MI) {
    int FI = 0;
    bool Result = false;
    bool IsSpill = false;
    switch (MI.getOpcode()) {
    default:
      break;
    case RISCV::LD:
    case RISCV::LW:
      if (TII->isLoadFromStackSlot(MI, FI))
        Result = true;
      break;
    case RISCV::SD:
    case RISCV::SW:
      if (TII->isStoreToStackSlot(MI, FI)) {
        Result = true;
        IsSpill = true;
      }
      break;
    }
    return std::make_tuple(Result, FI, IsSpill);
  };

Given a MachineInstruction we check if this is one of the spills and reloads we care about. We only care about ld/sd (or lw/sw in 32-bit). There are other stores and loads that can write the stack (like flw/fsw) so we have to filter them before we query isLoadFromStackSlot and isStoreToStackSlot. These two functions also give us the frame index (FI) that these instructions are using.

Frame indexes to registers

Now we can start mapping frame indexes to registers. To do that we define a mapping from frame indexes (represented as unsigned) and registers.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

107
108
  using FrameIndexToFPRTy = DenseMap<unsigned, Register>;
  FrameIndexToFPRTy FrameIndexToFPR;

If a frame index cannot be mapped to a register (e.g. we ran out of them) then we will map it to the special RISCV::NoRegister value. Otherwise it will be mapped to one of the available registers computed above in RegsAvailableFPR.

But before we start mapping them, we need to take care of a detail: if a frame index is used in an instruction other than one of the instructions we care about, we should conservatively leave them alone. A way to achieve this is to map them first to RISCV::NoRegister. We can use the helper IsSpillReload we defined above to filter them.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
  // Check frame indexes in other instructions and assign them to NoRegister
  // to avoid replacing them.
  for (auto &MBB : *MF) {
    for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
         MBBI != E; MBBI++) {
      MachineInstr &MI = *MBBI;

      bool SpillOrReload;
      int FI;
      bool IsSpill;
      std::tie(SpillOrReload, FI, IsSpill) = IsSpillReload(MI);

      if (SpillOrReload)
        continue;

      for (MachineOperand &MO : MI.operands()) {
        if (MO.isFI())
          FrameIndexToFPR[MO.getIndex()] = RISCV::NoRegister;
      }
    }
  }

If the instruction is a spill or reload, we skip it. If it is not then we check if one of its operand is a frame index (MO.isFI()). If it is, we preemptively map that frame index (MO.getIndex()) to RISCV::NoRegister.

Now we can try to map the frame indexes into FPRs so we can move them to/from FPRs. First we filter all the instructions that are not spills or reloads of our interest.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
  // Now replace the spills and reloads.
  for (auto &MBB : *MF) {
    MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
    while (MBBI != E) {
      MachineBasicBlock::iterator NMBBI = std::next(MBBI);

      MachineInstr &MI = *MBBI;
      MBBI = NMBBI;

      bool SpillOrReload;
      int FI;
      bool IsSpill;
      std::tie(SpillOrReload, FI, IsSpill) = IsSpillReload(MI);

      if (!SpillOrReload)
        continue;

Note also the way we need to iterate through the instructions: we are going to replace instructions as we find them so this loop should be resilient to changing the list of instructions of the basic block.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
      LLVM_DEBUG(llvm::dbgs() << "Found ");
      LLVM_DEBUG(MI.print(llvm::dbgs()));

      FrameIndexToFPRTy::iterator ItR;
      bool FINotFound;
      std::tie(ItR, FINotFound) =
          FrameIndexToFPR.insert(std::make_pair(FI, RISCV::NoRegister));
      if (FINotFound) {
        // Try to find a suitable free FPR.
        LLVM_DEBUG(llvm::dbgs()
                   << "Trying to find a free FPR for index " << FI << "\n");
        Register R = RISCV::NoRegister;
        int Idx = RegsAvailableFPR.find_first();
        if (Idx > 0) {
            RegsAvailableFPR.reset(Idx);
            R = Idx;
        }
        ItR->second = R;
        if (R != RISCV::NoRegister) {
          MFI->RemoveStackObject(FI);
        }
      }

We found a spill or reload. We check first if its frame index had already been mapped. If it hadn’t been mapped (as stated by FINotFound) we try to find a register in the bitvectors we computed above. If we find one Idx > 0 then we mark it as used and we make the current frame index map to it (line 164). If we actually mapped this frame index, we need to make sure it is not emitted as a stack object, so we remove it from the MachineFrameInfo (line 168).

Now we can make the mapping effective.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

172
173
174
175
176
177
      Register R = ItR->second;
      if (R == RISCV::NoRegister) {
        LLVM_DEBUG(llvm::dbgs()
                   << "No register is available for index " << FI << "\n\n");
        continue;
      }

If it was not mapped to anything, just ignore this instruction. Otherwise we can do the map. Basically we need to create a fmv.d.x (or fmv.w.x) instruction for spills and fmv.x.d (or fmv.x.w) for reloads.

llvm/lib/Target/RISCV/RISCVSoftenSpills.cpp

182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
      // Ok so we found a suitable FPR, let's use that one.
      if (IsSpill) {
        unsigned Opcode =
            Subtarget->is64Bit() ? RISCV::FMV_D_X : RISCV::FMV_W_X;
        MachineInstr &NewMI =
            *BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(Opcode), R)
                 .addReg(MI.getOperand(0).getReg());

        LLVM_DEBUG(llvm::dbgs() << "Replacing with ");
        LLVM_DEBUG(NewMI.print(llvm::dbgs()));
        LLVM_DEBUG(llvm::dbgs() << "\n");
      } else {
        unsigned Opcode =
            Subtarget->is64Bit() ? RISCV::FMV_X_D : RISCV::FMV_X_W;
        MachineInstr &NewMI =
            *BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(Opcode),
                     MI.getOperand(0).getReg())
                 .addReg(R);

        LLVM_DEBUG(llvm::dbgs() << "Replacing with ");
        LLVM_DEBUG(NewMI.print(llvm::dbgs()));
        LLVM_DEBUG(llvm::dbgs() << "\n");
      }

      MI.eraseFromParent();
      Changed = true;
    }
  }

  return Changed;
}

In line 186 (or line 196 for reloads) we create a new instruction using the helper BuildMI. For spills (line 187) we write into the register R, that we computed above. What we write is the first operand of this store, which is exactly the register being stored into memory.

So we go from something like

sd x10, <frame-index.1>, 0   # here 0 is the offset of the memory operand
                             # it is 0 because it was unknown when this spill
                             # was created

into something like

fmv.d.x f4, x10

In this example <frame-index.1> would be associated to the register f4.

Show me the code!

Find the pass here.

Wrap-up

This has been a very long post, so in the next one we will see what are the results of this experiment, along with other issues I found in the way.