Forgotten memories (1)
Most compiler infrastructures that target register machines rely on the concept of virtual registers: in their intermediate representations, instructions use virtual registers to represent their operands.
Because hardware is finite, these virtual registers must be mapped to physical registers at some point. The compiler does this in a process called register allocation.
Since there are only so many physical registers, not all the virtual registers used by the program may fit in them at the same time. When this happens, the compiler must emit spill code. Spill code stores a value to memory (a spill) and loads it back later, often close to its point of use (a reload).
The memory used for spill code is commonly the function's stack. However, nothing prevents us from using other kinds of “memories”, as long as we can guarantee that nothing else is going to use them. This is exactly the experiment we will do today: we are going to spill general-purpose registers into floating-point registers.
Some context first
Today’s experiment will be done using RISC-V and LLVM.
RISC-V
RISC-V is an open-source, RISC-style ISA maintained by the RISC-V Foundation. It is very modular, so the ISA has a number of standard extensions, including those that provide floating-point instructions and registers.
The base RISC-V ISA defines 32 integer registers. They are 32-bit in the 32-bit versions of RISC-V and 64-bit in the 64-bit versions. We will call them general-purpose registers (GPRs) even though they can only operate on integers or addresses. The F standard extension adds 32 floating-point registers (FPRs) of 32 bits each, and the D standard extension widens those registers to 64 bits. This way, the F extension provides support for IEEE 754 binary32 and the D extension provides support for IEEE 754 binary64.
LLVM
LLVM is an umbrella project for compilers and other related tools hosted by the LLVM Foundation. LLVM has a backend for RISC-V that is still pretty hackable for experiments.
Related work
This is not a new idea, of course. The paper Exploiting idle register classes for fast spill destination already explored it. The reported results seem promising (ranging from 1.7% to 10%), but they are ultimately predicated on being able to move values between different register banks with reasonable latency, which is not the case in all architectures. However, that paper is from 2008, so some of its results may need to be re-evaluated on current architectures.
Spill code in LLVM
It is not obvious, but there are at least two reasons why we may need to spill (i.e. store) and reload (i.e. load) the value of a register. We have already seen the first one: register allocation.
However, there is a second reason: callee-saved registers. Application Binary Interfaces (ABIs) specify how functions can use registers. One of the things they specify is whether the contents of a register are preserved across function calls, and who is responsible for preserving them: either the function doing the call (the caller) or the function being called (the callee). Thus a register is either caller-saved or callee-saved, respectively.
To simplify things a bit, we can assume that if a register is not callee-saved then it is likely (though not necessarily) caller-saved, which means it is up to the caller to preserve the value of the register across a function call, in case that value is still needed afterwards.
These two kinds of spill code are emitted at two different points in LLVM's compilation pipeline. Register allocation runs first. Later on, a pass called Prologue/Epilogue Emitter (PrologEpilogInserter in the LLVM sources) is responsible for emitting the spills and reloads of the callee-saved registers.
Current status
Consider the following C code. This is the accumulating part of a naive 64x64 integer matrix multiplication. There is nothing special about it, other than that we request clang to unroll the j loop 16 times. Unrolling is used here as an easy way to increase register pressure (the number of registers needed at the same time), so that the compiler is forced to spill values. Even in an architecture with a large number of registers like RISC-V, spilling (caused by register allocation) may be unavoidable.
enum { N = 64 };
void f(int (*a)[N], int (*b)[N], int (*c)[N]) {
  for (int i = 0; i < N; i++)
#pragma clang loop unroll_count(16)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        c[i][j] += a[i][k] * b[k][j];
}
I’m going to use riscv64-unknown-linux-gnu as a handy example of a 64-bit RISC-V target that has both the F and D extensions. We can generate the assembly output by compiling for this target with clang and passing -S.
To store a GPR to a memory location in 64-bit RISC-V we use the sd instruction. Stack locations in this function are easy to spot because their addresses are based on the sp (stack pointer) register.
If we examine the assembly listing for this function, we first see a bunch of stores to the stack. Those are the spills caused by the callee-saved registers.
f: # @f
addi sp, sp, -256 # Grow the stack
sd ra, 248(sp)
sd s0, 240(sp)
sd s1, 232(sp)
sd s2, 224(sp)
sd s3, 216(sp)
sd s4, 208(sp)
sd s5, 200(sp)
sd s6, 192(sp)
sd s7, 184(sp)
sd s8, 176(sp)
sd s9, 168(sp)
sd s10, 160(sp)
sd s11, 152(sp)
sd a2, 128(sp)
sd zero, 136(sp)
This is followed by a number of values that compute addresses into the b matrix and that are also spilled. Note that the spill here is just the sd instruction; the addi was part of the original program.
sd a2, 128(sp)
sd zero, 136(sp)
addi a2, a1, 4
sd a2, 112(sp)
addi a2, a1, 8
sd a2, 104(sp)
addi a2, a1, 12
sd a2, 96(sp)
addi a2, a1, 16
sd a2, 88(sp)
addi a2, a1, 20
sd a2, 80(sp)
addi a2, a1, 24
sd a2, 72(sp)
addi a2, a1, 28
sd a2, 64(sp)
addi a2, a1, 32
sd a2, 56(sp)
addi a2, a1, 36
sd a2, 48(sp)
addi a2, a1, 40
sd a2, 40(sp)
addi a2, a1, 44
sd a2, 32(sp)
addi a2, a1, 48
sd a2, 24(sp)
addi a2, a1, 52
sd a2, 16(sp)
addi a2, a1, 56
sd a2, 8(sp)
sd a1, 120(sp)
addi a1, a1, 60
sd a1, 0(sp)
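As a sanity check on these listings, the 256-byte frame can be recomputed from the offsets shown: 13 callee-saved slots (ra and s0-s11, at 152 to 248) plus the 18 slots at 0 to 136 give 31 eight-byte slots, i.e. 248 bytes, which rounds up to 256 under the 16-byte stack alignment of the standard RISC-V calling convention. The C++ sketch below only does that arithmetic; the constants are read off the listings above and the names (alignTo, frameSize, and so on) are illustrative, not LLVM's.

```cpp
#include <cassert>
#include <cstdint>

// Rounds Size up to the next multiple of Align (Align must be a power of two).
constexpr std::uint64_t alignTo(std::uint64_t Size, std::uint64_t Align) {
  return (Size + Align - 1) & ~(Align - 1);
}

// Numbers read off the listings of f (assumption: every slot holds one
// 64-bit GPR, i.e. 8 bytes).
constexpr std::uint64_t SlotSize = 8;
constexpr std::uint64_t CalleeSavedSlots = 13; // ra, s0-s11 at 152..248(sp)
constexpr std::uint64_t OtherSlots = 18;       // 0..136(sp), step 8
constexpr std::uint64_t StackAlign = 16;       // RISC-V psABI stack alignment

// Frame size needed to hold NumSlots spill slots.
constexpr std::uint64_t frameSize(std::uint64_t NumSlots) {
  return alignTo(NumSlots * SlotSize, StackAlign);
}

// 31 slots * 8 bytes = 248 bytes, rounded up to 256: "addi sp, sp, -256".
static_assert(frameSize(CalleeSavedSlots + OtherSlots) == 256,
              "frame size should match the prologue");
```

Every slot that could live in an FPR instead removes 8 bytes from that total before rounding; for instance, 13 slots alone would need 112 bytes.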
To load a GPR from a memory location in 64-bit RISC-V we use the ld instruction.
When the outermost loop of the matrix multiplication starts, it reloads many of the values that we just spilled. If this seems pointless to you, note that this is a loop, so it is only pointless in the first iteration: these registers are reused inside the loop, so their values must be reloaded on every iteration.
mv s10, zero
ld a4, 0(sp)
ld s11, 8(sp)
ld ra, 16(sp)
ld s6, 24(sp)
ld s7, 32(sp)
ld s2, 40(sp)
ld t6, 48(sp)
ld t5, 56(sp)
ld t4, 64(sp)
ld t3, 72(sp)
ld t2, 80(sp)
ld t1, 88(sp)
ld t0, 96(sp)
ld a7, 104(sp)
ld a6, 112(sp)
ld s8, 120(sp)
If we go to the end of the function, we see the reloads of the callee-saved registers, right before returning.
ld s11, 152(sp)
ld s10, 160(sp)
ld s9, 168(sp)
ld s8, 176(sp)
ld s7, 184(sp)
ld s6, 192(sp)
ld s5, 200(sp)
ld s4, 208(sp)
ld s3, 216(sp)
ld s2, 224(sp)
ld s1, 232(sp)
ld s0, 240(sp)
ld ra, 248(sp)
addi sp, sp, 256
ret
What do we want to do?
The F and D extensions provide us with 32 floating-point registers. For leaf functions (functions that do not call other functions) like this one, which need to spill many GPRs, it could be beneficial to spill those values into floating-point registers instead, when possible.
Strategy
As I mentioned, I’m going to use an extremely simplistic approach here. More complex approaches are possible, but they would require more infrastructure. We are not doing that today.
After “Prologue/Epilogue Emitter” has spilled all the callee-saved registers, it invokes a target-specific hook. We will run some extra code in that hook.
- Check if this function is a leaf, otherwise bail out.
- Check if we’re compiling with F (for RISC-V 32-bit) or D (for RISC-V 64-bit). If this is not the case, bail out.
- Determine all the registers used by the function. From that information, compute which FPRs are available. If none is available, bail out.
- Now, for each instruction that is a store to or a load from the stack (i.e. a spill or a reload), find an FPR for it. If one is still available, map the frame index (I explain later what a frame index is) to that FPR, emit the proper move from the GPR to the FPR (for a spill) or from the FPR to the GPR (for a reload), and mark the frame index as dead. Otherwise just ignore this frame index. FPRs are assigned in the order in which we find spill and reload instructions in the function.
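To make the steps above concrete, here is a toy C++ model of the decision the pass makes. It is not LLVM code: Kind, Inst, Function and softenSpills are hypothetical names, instructions are reduced to "spill, reload or other" plus a frame index, and the list of free FPRs is assumed to have been computed already (in the real pass that is the job of LiveRegUnits and the callee-saved filter described below).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Toy model, not LLVM code: an instruction is a GPR spill, a GPR reload, or
// something else that may or may not mention a stack slot (frame index).
enum class Kind { Spill, Reload, Other };

struct Inst {
  Kind K;
  int FrameIndex; // -1 when the instruction does not use a stack slot
};

struct Function {
  bool IsLeaf;       // the function makes no calls
  bool HasFloatRegs; // F on RV32, D on RV64
  std::vector<Inst> Insts;
};

// Returns, for each frame index seen, the FPR chosen for it or std::nullopt
// when it must stay in memory. FreeFPRs holds unused, caller-saved FPRs in
// the order they should be handed out.
std::map<int, std::optional<unsigned>>
softenSpills(const Function &F, const std::vector<unsigned> &FreeFPRs) {
  std::map<int, std::optional<unsigned>> FIToFPR;
  if (!F.IsLeaf || !F.HasFloatRegs || FreeFPRs.empty())
    return FIToFPR; // Bail out: nothing is softened.

  // Frame indexes used by any other instruction are left in memory.
  for (const Inst &I : F.Insts)
    if (I.K == Kind::Other && I.FrameIndex >= 0)
      FIToFPR[I.FrameIndex] = std::nullopt;

  // First come, first served: the first spill or reload of a frame index
  // decides whether it gets an FPR; later ones reuse that decision.
  std::size_t Next = 0;
  for (const Inst &I : F.Insts) {
    if (I.K == Kind::Other)
      continue;
    assert(I.FrameIndex >= 0 && "spills and reloads always use a stack slot");
    auto [It, Inserted] = FIToFPR.try_emplace(I.FrameIndex, std::nullopt);
    if (Inserted && Next < FreeFPRs.size())
      It->second = FreeFPRs[Next++];
  }
  return FIToFPR;
}
```

With two free FPRs and the instruction sequence used in the test below, frame indexes 0 and 1 get an FPR, frame index 2 stays in memory because a non-spill instruction also uses it, and frame index 3 stays in memory because the FPRs ran out first.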
Implementation
I will call this process “soften spills”: we are still morally spilling but because we are not hitting the memory system it could be a cheaper operation. I’m sure better names exist but I went with this one.
Entry point
The target-specific hook that “Prologue/Epilogue Emitter” calls right after it has emitted the spills and reloads of callee-saved registers is processFunctionBeforeFrameFinalized. For the RISC-V backend this function is in llvm/lib/Target/RISCV/RISCVFrameLowering.cpp. To keep things a bit tidy, I added the call there but implemented the algorithm in another file.
void RISCVFrameLowering::processFunctionBeforeFrameFinalized(
    MachineFunction &MF, RegScavenger *RS) const {
  const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
  MachineFrameInfo &MFI = MF.getFrameInfo();
  const TargetRegisterClass *RC = &RISCV::GPRRegClass;
  // estimateStackSize has been observed to under-estimate the final stack
  // size, so give ourselves wiggle-room by checking for stack size
  // representable an 11-bit signed field rather than 12-bits.
  // FIXME: It may be possible to craft a function with a small stack that
  // still needs an emergency spill slot for branch relaxation. This case
  // would currently be missed.
  if (!isInt<11>(MFI.estimateStackSize(MF))) {
    int RegScavFI = MFI.CreateStackObject(
        RegInfo->getSpillSize(*RC), RegInfo->getSpillAlignment(*RC), false);
    RS->addScavengingFrameIndex(RegScavFI);
  }
  // If we want to soften spills, we do it now.
  RISCVSoftenXSpillsReload(&MF);
}
This is a convenient place to do the spill softening because here we can see both the spills and reloads emitted by the register allocator and those emitted by “Prologue/Epilogue Emitter”. Any earlier and we would miss the latter kind. Any later and the stack layout of the function would already have been computed, so while we could still remove the spills, the storage for them would already have been accounted for. In fact, we will see later that changing the frame information here is somewhat unexpected and we will have to amend a few bits.
Prolegomena
I implemented this in another file so we will need some boilerplate before we can continue.
Following is the main implementation file.
//===----- RISCVSoftenSpills.cpp - Soften Spills using FPR registers ------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#include "RISCV.h"
#include "RISCVTargetMachine.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/LiveRegUnits.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
using namespace llvm;
#define DEBUG_TYPE "riscv-soften-spills"
static cl::opt<bool> EnableSoftenSpills(
    "riscv-soften-spills",
    cl::desc("Enable softening spills using FPR registers when available"),
    cl::init(false), cl::Hidden);
bool llvm::RISCVSoftenXSpillsReload(MachineFunction *MF) {
  if (!EnableSoftenSpills)
    return false;
  // .. rest of the code here ...
}
I added a command-line option so we can manually enable the softening from clang using -mllvm -riscv-soften-spills. The macro DEBUG_TYPE is needed when we later use LLVM_DEBUG; it is used to filter debug messages via -mllvm -debug-only=riscv-soften-spills (this requires an LLVM build with assertions enabled).
This function returns true if it changed something, otherwise it returns false. We do not use this value, but a caller might be interested to know whether we actually changed something.
This function is defined in the llvm namespace for simplicity, so we need a declaration in that namespace first. We can add one in RISCV.h. Note: there are better ways to organise this code; this one is just simple and effective.
// FIXME - Move this to a better place.
bool RISCVSoftenXSpillsReload(MachineFunction *MF);
} // namespace llvm
We also need to tell CMake about this new file.
add_llvm_target(RISCVCodeGen
  RISCVAsmPrinter.cpp
  RISCVCallLowering.cpp
  RISCVExpandPseudoInsts.cpp
  RISCVFrameLowering.cpp
  RISCVInstrInfo.cpp
  RISCVInstructionSelector.cpp
  RISCVISelDAGToDAG.cpp
  RISCVISelLowering.cpp
  RISCVLegalizerInfo.cpp
  RISCVMCInstLower.cpp
  RISCVMergeBaseOffset.cpp
  RISCVRegisterBankInfo.cpp
  RISCVRegisterInfo.cpp
  RISCVSubtarget.cpp
  RISCVTargetMachine.cpp
  RISCVTargetObjectFile.cpp
  RISCVTargetTransformInfo.cpp
  RISCVSoftenSpills.cpp
  )
Early bail-outs
Before we continue we need to gather some information. We are passed the MachineFunction, which is the object that represents the whole function in the code generation phase of LLVM. From it we get two kinds of objects: those named MachineSomething concern the function currently being compiled, while those named TargetSomething hold backend-specific information that is not necessarily function-specific (i.e. it might be shared between functions).
  // Gather some information that we will need.
  MachineFrameInfo *MFI = &MF->getFrameInfo();
  const MachineRegisterInfo *MRI = &MF->getRegInfo();
  const RISCVSubtarget *Subtarget = &MF->getSubtarget<RISCVSubtarget>();
  const RISCVInstrInfo *TII =
      static_cast<const RISCVInstrInfo *>(Subtarget->getInstrInfo());
  const TargetRegisterInfo *TRI = Subtarget->getRegisterInfo();
The MachineFrameInfo (MFI) object is concerned with the objects in the stack of the current function. LLVM uses a stack abstraction called frame indexes. Each frame index is an integer with which we can associate information (like size and alignment). Those indexes are later used to compute the layout and size of the stack frame.
The MachineRegisterInfo (MRI) object deals with the register information specific to the function. This is more useful while the MachineFunction is still in SSA form, before register allocation. At this point it is useful for knowing which registers are callee-saved, something we will want to use later.
The Subtarget holds compilation-specific information for the current function. For instance, it allows us to know whether we are compiling with support for the F and D RISC-V standard extensions or whether we are compiling for 64-bit.
The TargetInstrInfo (TII) object gives us access to the instructions of this target (RISC-V in our case) so we can create new instructions. We will need this when replacing the spills and reloads with moves.
Finally, the TargetRegisterInfo (TRI) object gives us access to the registers of this target. We will need this to enumerate the FPRs.
A first check we can do now is to see whether this function has calls. MFI knows that.
  // If we have calls, for now do nothing.
  // There are still opportunities here if the pair spill/reload doesn't cross
  // function calls but they will require a more sophisticated model.
  if (MFI->hasCalls())
    return false;
Also, if we are not compiling for F or D, we won’t be able to use the instructions we need, so we check this as well.
  // If we are RV64 but we don't have D, give up.
  if (Subtarget->is64Bit() && !Subtarget->hasStdExtD())
    return false;
  // If we are RV32 but we don't have F, give up.
  if (!Subtarget->hasStdExtF())
    return false;
Gather all the registers used
Because we use a very simplistic approach, we want to know, globally for the whole function, which FPRs are available. For that we will iterate over all the instructions and use the class LiveRegUnits.
LLVM uses a relatively flexible concept of registers. They can be virtual or physical. If they are physical, their storage may be shared with other registers.
For instance, the RISC-V backend in LLVM models the floating-point registers of the F extension as subregisters of the floating-point registers of the D extension. This model is sensible because it is what the spec says: a RISC-V system with the D extension holds an F-extension register in the lowest 32 bits of the floating-point register. In that sense a floating-point register such as f3 is modelled in LLVM with two registers, f3_f and f3_d, for the F and D extensions respectively. LLVM must know that changing f3_f will change f3_d. This is what register units are for.
The class LiveRegUnits allows us to accumulate the register units used by instructions. So we iterate over each basic block and then over each instruction in it.
  // Flow-insensitive analysis in which we identify FPR32/FPR64 that
  // are not used at all.
  // There are further opportunities as the static path for a spill and all its
  // reloads might have free FPR registers. However our model is very simple so
  // we can't represent these.
  LiveRegUnits LRU(*TRI);
  for (auto &MBB : *MF) {
    for (auto &MI : MBB) {
      LRU.accumulate(MI);
    }
  }
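The register-unit idea can be illustrated with a tiny model in which each register name is backed by a set of units, and a register is only available if none of its units has been touched. RegUnitModel and the unit numbers are invented for illustration; only the names f3_f/f3_d follow the description above, and LiveRegUnits itself is more compact (it keeps a bitvector over the target's register units).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model of register units: every register is backed by one or more
// units, and two registers alias when they share a unit. The unit numbers
// are made up; only the names f3_f/f3_d follow the text.
class RegUnitModel {
public:
  void addRegister(const std::string &Reg, std::vector<unsigned> Units) {
    assert(!Units.empty() && "every register has at least one unit");
    UnitsOf[Reg] = std::move(Units);
  }

  // Records that an instruction touched Reg (the role of
  // LiveRegUnits::accumulate in the pass).
  void markUsed(const std::string &Reg) {
    for (unsigned U : UnitsOf.at(Reg))
      Used.insert(U);
  }

  // A register is available only if none of its units has been touched.
  bool available(const std::string &Reg) const {
    for (unsigned U : UnitsOf.at(Reg))
      if (Used.count(U) != 0)
        return false;
    return true;
  }

private:
  std::map<std::string, std::vector<unsigned>> UnitsOf;
  std::set<unsigned> Used;
};
```

Touching f3_f makes f3_d unavailable because both share a unit, while an unrelated register such as f4_d stays available.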
Callee-saved are of no interest
If we use callee-saved FPRs, we will not have achieved anything: if we modify a callee-saved register we need to preserve it, and to preserve it we need to spill it. Clearly we need to make sure we don’t consider callee-saved registers.
MRI can tell us which registers are callee-saved in this function, so we write a handy helper for that.
  const MCPhysReg *CalleeSavedRegs = MRI->getCalleeSavedRegs();
  auto IsCalleeSaved = [&](MCPhysReg Reg) {
    for (const MCPhysReg *R = CalleeSavedRegs; *R; R++)
      if (*R == Reg)
        return true;
    return false;
  };
Now we can filter the FPRs (FPR32 for F and FPR64 for D). If a register is available and not callee-saved, it is a candidate for our spills. If no register is available, we just bail out.
  BitVector RegsAvailableFPR(TRI->getNumRegs());
  const TargetRegisterClass &FPRRegClass =
      Subtarget->is64Bit() ? RISCV::FPR64RegClass : RISCV::FPR32RegClass;
  for (MCPhysReg PhysReg : FPRRegClass.getRegisters()) {
    if (LRU.available(PhysReg) && !IsCalleeSaved(PhysReg))
      RegsAvailableFPR.set(PhysReg);
  }
We use the bitvector RegsAvailableFPR to represent whether an FPR is available or not. Registers are identified by numbers (up to a maximum of TRI->getNumRegs(), which depends on the target). If a register is available, its corresponding bit in the bitvector is set.
Find the spills and reloads
A spill and a reload are no different from a store and a load, so we need a way to identify spills and reloads among other load and store instructions. Luckily, LLVM backends have to implement two functions that answer precisely this question: isLoadFromStackSlot and isStoreToStackSlot. These two functions are in TII, which knows about the instructions of the target.
Let’s first write a convenient helper.
  // Helper used to identify spills and reloads.
  auto IsSpillReload = [&](MachineInstr &MI) {
    int FI = 0;
    bool Result = false;
    bool IsSpill = false;
    switch (MI.getOpcode()) {
    default:
      break;
    case RISCV::LD:
    case RISCV::LW:
      if (TII->isLoadFromStackSlot(MI, FI))
        Result = true;
      break;
    case RISCV::SD:
    case RISCV::SW:
      if (TII->isStoreToStackSlot(MI, FI)) {
        Result = true;
        IsSpill = true;
      }
      break;
    }
    return std::make_tuple(Result, FI, IsSpill);
  };
Given a MachineInstr, we check whether it is one of the spills or reloads we care about. We only care about ld/sd (or lw/sw in 32-bit). There are other loads and stores that can access the stack (like flw/fsw), and isLoadFromStackSlot and isStoreToStackSlot recognise them too, so we have to filter them out before we query those two functions.
These two functions also give us the frame index (FI) that these instructions are using.
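A self-contained variant of IsSpillReload may help to see the filtering in isolation. The Opcode enum and ToyInst are hypothetical stand-ins for RISCV::LD and friends, with "is this a stack slot access" reduced to an optional frame index. The result is returned in a small named struct instead of a std::tuple, so call sites can read fields by name or unpack them with C++17 structured bindings.

```cpp
#include <cassert>
#include <optional>

// Hypothetical opcode subset; in LLVM these would be RISCV::LD, RISCV::SD, ...
enum class Opcode { LD, LW, SD, SW, FLD, FSD, ADDI };

struct ToyInst {
  Opcode Op;
  std::optional<int> StackSlot; // frame index if the address is a stack slot
};

struct SpillReloadInfo {
  bool IsSpillOrReload = false;
  int FrameIndex = 0;
  bool IsSpill = false;
};

// Same shape as IsSpillReload: only integer loads/stores are considered, and
// only when they access a stack slot. FLD/FSD also touch the stack but are
// deliberately ignored.
SpillReloadInfo classify(const ToyInst &MI) {
  SpillReloadInfo Info;
  switch (MI.Op) {
  case Opcode::LD:
  case Opcode::LW:
    if (MI.StackSlot) {
      Info.IsSpillOrReload = true;
      Info.FrameIndex = *MI.StackSlot;
    }
    break;
  case Opcode::SD:
  case Opcode::SW:
    if (MI.StackSlot) {
      Info.IsSpillOrReload = true;
      Info.FrameIndex = *MI.StackSlot;
      Info.IsSpill = true;
    }
    break;
  default:
    break;
  }
  return Info;
}
```

Either form works; the tuple plus std::tie used in the pass avoids declaring an extra type, while the struct makes the meaning of each field explicit.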
Frame indexes to registers
Now we can start mapping frame indexes to registers. To do that we define a mapping from frame indexes (represented as unsigned) to registers.
  using FrameIndexToFPRTy = DenseMap<unsigned, Register>;
  FrameIndexToFPRTy FrameIndexToFPR;
If a frame index cannot be mapped to a register (e.g. because we ran out of them), we map it to the special RISCV::NoRegister value. Otherwise it is mapped to one of the available registers computed above in RegsAvailableFPR.
But before we start mapping them, we need to take care of a detail: if a frame index is used in an instruction other than the ones we care about, we should conservatively leave it alone. One way to achieve this is to map it to RISCV::NoRegister up front. We can use the helper IsSpillReload we defined above to filter those instructions.
  // Check frame indexes in other instructions and assign them to NoRegister
  // to avoid replacing them.
  for (auto &MBB : *MF) {
    for (MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
         MBBI != E; MBBI++) {
      MachineInstr &MI = *MBBI;
      bool SpillOrReload;
      int FI;
      bool IsSpill;
      std::tie(SpillOrReload, FI, IsSpill) = IsSpillReload(MI);
      if (SpillOrReload)
        continue;
      for (MachineOperand &MO : MI.operands()) {
        if (MO.isFI())
          FrameIndexToFPR[MO.getIndex()] = RISCV::NoRegister;
      }
    }
  }
If the instruction is a spill or reload, we skip it. If it is not, we check whether any of its operands is a frame index (MO.isFI()). If it is, we preemptively map that frame index (MO.getIndex()) to RISCV::NoRegister.
Now we can try to map the frame indexes to FPRs so we can move values to and from FPRs. First we skip all the instructions that are not spills or reloads we are interested in.
  // Now replace the spills and reloads.
  for (auto &MBB : *MF) {
    MachineBasicBlock::iterator MBBI = MBB.begin(), E = MBB.end();
    while (MBBI != E) {
      MachineBasicBlock::iterator NMBBI = std::next(MBBI);
      MachineInstr &MI = *MBBI;
      MBBI = NMBBI;
      bool SpillOrReload;
      int FI;
      bool IsSpill;
      std::tie(SpillOrReload, FI, IsSpill) = IsSpillReload(MI);
      if (!SpillOrReload)
        continue;
Note also the way we need to iterate through the instructions: we are going to replace instructions as we find them so this loop should be resilient to changing the list of instructions of the basic block.
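The iteration pattern can be shown on a plain std::list, which, like the intrusive list holding a MachineBasicBlock's instructions, keeps other iterators valid when an element is inserted or erased. The sketch below (softenBlock is a hypothetical name, and instructions are just mnemonics) saves the next iterator first, inserts the replacement before the current instruction and then erases it.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Replaces every "sd" (spill) with "fmv.d.x" and every "ld" (reload) with
// "fmv.x.d" while walking the list, following the pattern used by the pass.
void softenBlock(std::list<std::string> &Block) {
  auto It = Block.begin();
  const auto End = Block.end(); // end() of a std::list stays valid throughout
  while (It != End) {
    auto Next = std::next(It); // saved before It can be invalidated
    if (*It == "sd" || *It == "ld") {
      Block.emplace(It, *It == "sd" ? "fmv.d.x" : "fmv.x.d"); // insert before
      Block.erase(It); // invalidates It only; Next is still valid
    }
    It = Next;
  }
}
```

Advancing It after the erase would be undefined behaviour, which is why both this sketch and the pass compute the next position before touching the current instruction.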
      LLVM_DEBUG(llvm::dbgs() << "Found ");
      LLVM_DEBUG(MI.print(llvm::dbgs()));
      FrameIndexToFPRTy::iterator ItR;
      bool FINotFound;
      std::tie(ItR, FINotFound) =
          FrameIndexToFPR.insert(std::make_pair(FI, RISCV::NoRegister));
      if (FINotFound) {
        // Try to find a suitable free FPR.
        LLVM_DEBUG(llvm::dbgs()
                   << "Trying to find a free FPR for index " << FI << "\n");
        Register R = RISCV::NoRegister;
        int Idx = RegsAvailableFPR.find_first();
        if (Idx > 0) {
          RegsAvailableFPR.reset(Idx);
          R = Idx;
        }
        ItR->second = R;
        if (R != RISCV::NoRegister) {
          MFI->RemoveStackObject(FI);
        }
      }
We found a spill or reload. First we check whether its frame index has already been mapped: insert only adds the entry if the key is not present yet, and tells us so through FINotFound. If it had not been mapped, we try to find a register in the bitvector we computed above. If we find one (Idx > 0; find_first returns -1 when no bit is set, and register number 0 is RISCV::NoRegister, so it is never a valid FPR), we mark it as used and make the current frame index map to it (ItR->second = R). If we actually mapped this frame index, we need to make sure it is not emitted as a stack object, so we remove it from the MachineFrameInfo (MFI->RemoveStackObject(FI)).
Now we can make the mapping effective.
Register R = ItR->second;
if (R == RISCV::NoRegister) {
LLVM_DEBUG(llvm::dbgs()
<< "No register is available for index " << FI << "\n\n");
continue;
}
If the frame index was not mapped to anything, we just ignore this instruction. Otherwise we can apply the mapping. Basically we need to create an fmv.d.x (or fmv.w.x) instruction for spills and an fmv.x.d (or fmv.x.w) for reloads.
      // Ok so we found a suitable FPR, let's use that one.
      if (IsSpill) {
        unsigned Opcode =
            Subtarget->is64Bit() ? RISCV::FMV_D_X : RISCV::FMV_W_X;
        MachineInstr &NewMI =
            *BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(Opcode), R)
                 .addReg(MI.getOperand(0).getReg());
        LLVM_DEBUG(llvm::dbgs() << "Replacing with ");
        LLVM_DEBUG(NewMI.print(llvm::dbgs()));
        LLVM_DEBUG(llvm::dbgs() << "\n");
      } else {
        unsigned Opcode =
            Subtarget->is64Bit() ? RISCV::FMV_X_D : RISCV::FMV_X_W;
        MachineInstr &NewMI =
            *BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(Opcode),
                     MI.getOperand(0).getReg())
                 .addReg(R);
        LLVM_DEBUG(llvm::dbgs() << "Replacing with ");
        LLVM_DEBUG(NewMI.print(llvm::dbgs()));
        LLVM_DEBUG(llvm::dbgs() << "\n");
      }
      MI.eraseFromParent();
      Changed = true;
    }
  }
  return Changed;
}
In both branches we create the new instruction with the helper BuildMI, which inserts it right before MI. For spills (FMV_D_X, or FMV_W_X on RV32) we write into the register R that we computed above. What we write is the first operand of the store, which is exactly the register being stored to memory. Reloads (FMV_X_D, or FMV_X_W on RV32) do the opposite: the first operand of the load, its destination register, is written from R. In both cases the original spill or reload is then erased.
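So a spill that stores a GPR into the stack slot <frame-index.1> with sd turns into an fmv.d.x that copies that same GPR into an FPR instead. In this example <frame-index.1> would be associated with the register f4, so f4 is the destination of the new fmv.d.x.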
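What makes this replacement correct is that fmv.d.x and fmv.x.d copy the 64-bit pattern between a GPR and an FPR unchanged: the RISC-V spec states that these moves do not modify the transferred bits, so even NaN payloads survive. The sketch below models those two moves on the host with std::memcpy between a std::uint64_t and a double (fmvDX and fmvXD are made-up names) and checks that integer and address-like patterns come back intact. It only copies bits and never does floating-point arithmetic on them. On RV32 with F the same reasoning applies to the 32-bit fmv.w.x and fmv.x.w.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this model needs a 64-bit double");

// Host-side model of fmv.d.x: move the bit pattern of a GPR into an FPR.
double fmvDX(std::uint64_t Gpr) {
  double Fpr;
  std::memcpy(&Fpr, &Gpr, sizeof Fpr);
  return Fpr;
}

// Host-side model of fmv.x.d: move the bit pattern of an FPR into a GPR.
std::uint64_t fmvXD(double Fpr) {
  std::uint64_t Gpr;
  std::memcpy(&Gpr, &Fpr, sizeof Gpr);
  return Gpr;
}
```

A spill followed by its reload is then fmvXD(fmvDX(x)), which must give back x for every value the register allocator might have spilled.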
Show me the code!
Find the pass here.
Wrap-up
This has been a very long post, so in the next one we will see the results of this experiment, along with other issues I found along the way.