Regalloc all instructions #2

Closed
wants to merge 3 commits
143 changes: 124 additions & 19 deletions llvm/lib/CodeGen/MLRegallocEvictAdvisor.cpp
@@ -16,7 +16,9 @@
#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MLModelRunner.h"
#include "llvm/Analysis/TensorSpec.h"
#if defined(LLVM_HAVE_TF_AOT_REGALLOCEVICTMODEL) || defined(LLVM_HAVE_TF_API)
#include "llvm/MC/MCInstrInfo.h"
#include "llvm/Target/TargetMachine.h"
#if defined(LLVM_HAVE_TF_AOT_REGALLOCEVICTMODEL) || defined(LLVM_HAVE_TF_API)
#include "llvm/Analysis/ModelUnderTrainingRunner.h"
#include "llvm/Analysis/NoInferenceModelRunner.h"
#endif
@@ -126,6 +128,22 @@ static const int64_t MaxInterferences = 32;
static const int64_t CandidateVirtRegPos = MaxInterferences;
static const int64_t NumberOfInterferences = CandidateVirtRegPos + 1;

// A trained model only understands the opcodes it was trained on; it will not
// understand instructions added later unless it is retrained on them. This is
// the current opcode count for x86 (the current architecture focus of the ML
// regalloc work). Any opcode at or above this cutoff is replaced with 0 so
// that the model keeps functioning when new opcodes appear. The value comes
// from MCInstrInfo::getNumOpcodes().
static const int OpcodeCountCutoff = 17716;

// The number of instructions that a candidate virtual register spans is
// variable, but libtensorflow only supports models with a fixed number of
// inputs, so we encode at most this many instructions (across all interfering
// live ranges), ignore the rest, and pad with zeroes when there are fewer.
static const int ModelMaxSupportedInstructionCount = 300;
static const std::vector<int64_t> InstructionsAndMappingShape{
NumberOfInterferences + 1, ModelMaxSupportedInstructionCount};
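
As a reading aid (not part of the patch), here is a minimal sketch of how a flattened buffer with this shape is addressed; the function and parameter names are made up, but the row-0-opcodes / one-mask-row-per-candidate layout and the OpcodeCountCutoff clamping match the extraction code further down:

// Illustrative only: row 0 of the flattened tensor holds (clamped) opcodes;
// row Pos + 1 holds the 0/1 occupancy mask for the candidate at column Pos.
static void sketchSetInstructionEntry(int64_t *Tensor, size_t InstrIdx,
                                      size_t Pos, int64_t Opcode) {
  Tensor[InstrIdx] = Opcode < OpcodeCountCutoff ? Opcode : 0;
  Tensor[(Pos + 1) * ModelMaxSupportedInstructionCount + InstrIdx] = 1;
}

Entries past the last encoded instruction are simply left at 0, matching the padding described above.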

// Most features are as described above, so we'll reuse this vector in defining
// them.
static const std::vector<int64_t> PerLiveRangeShape{1, NumberOfInterferences};
@@ -192,6 +210,8 @@ static const std::vector<int64_t> PerLiveRangeShape{1, NumberOfInterferences};
"largest stage of an interval in this LR") \
M(int64_t, min_stage, PerLiveRangeShape, \
"lowest stage of an interval in this LR") \
M(int64_t, instructions_and_mapping, InstructionsAndMappingShape, \

may be helpful to show in a comment how this is laid out. IIUC, each line corresponds to an instruction. The first column is the instruction opcode, the rest correspond to each candidate LR. We tick (set to 1) each [line][ LR_corresponding_column] where that LR spans over that instruction.
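
For concreteness (an editorial illustration, not from the original thread), a small made-up example of the layout the patch emits: row 0 holds the opcodes and row Pos + 1 holds the mask for the candidate at column Pos, i.e. the transpose of the per-instruction view described above. The mnemonics and spans are invented:

// columns = the first four encoded instructions; remaining columns stay 0
// row 0 (opcodes):      MOV32rr  ADD32rr  CMP32rr  JCC_1
// row 1 (candidate 0):     1        1        0        0
// row 2 (candidate 1):     0        1        1        1
// i.e. candidate 0's live range covers the first two instructions and
// candidate 1's covers the last three.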

2 questions:

  • (open question) would it be ML-preferable to place the opcodes instead of '1' tickmarks? i.e. repeat the opcode where it is used, and drop the leftmost column?
  • in the algo, it may be easier to not worry about overlaps. Instead, for each LR, set the opcode (idempotent if already set) and tick its position in the LR. (unless I'm missing smth)

Owner Author

  1. I'll talk with Yundi about that one. Thinking about it now, just passing the opcodes should be more efficient on the ML side because it avoids a matrix multiplication (at least I would think so).
  2. We'd need frequency data too, but that would be pretty easy to pass along to the model as well by structuring the extraction like that. It wouldn't necessarily make things much more difficult to implement on the ML side either, as it could still be implemented as a matrix multiply, and the position data that the current approach encodes isn't actually used. I'll let Yundi get back to me, but that is probably a better approach since the position data is currently irrelevant.


ack, not sure Yundi saw this, pasting it in slack

"instructions and binary map between instructions and live ranges") \
M(float, progress, {1}, "ratio of current queue size to initial size")

// The model learns to pick one of the mask == 1 interferences. This is the name
@@ -273,11 +293,12 @@ class MLEvictAdvisor : public RegAllocEvictionAdvisor {

/// Load the features of the given VirtReg (allocated or not) at column Pos,
/// but if that can't be evicted, return false instead.
bool
loadInterferenceFeatures(const LiveInterval &VirtReg, MCRegister PhysReg,
bool IsHint, const SmallVirtRegSet &FixedRegisters,
std::array<float, FeatureIDs::FeatureCount> &Largest,
size_t Pos) const;
bool loadInterferenceFeatures(
const LiveInterval &VirtReg, MCRegister PhysReg, bool IsHint,
const SmallVirtRegSet &FixedRegisters,
std::array<float, FeatureIDs::FeatureCount> &Largest, size_t Pos,
SmallVectorImpl<std::tuple<SlotIndex, SlotIndex, size_t>>
&StartEndSlotIndices) const;

private:
static float getInitialQueueSize(const MachineFunction &MF);
@@ -290,7 +311,13 @@ class MLEvictAdvisor : public RegAllocEvictionAdvisor {
void extractFeatures(const SmallVectorImpl<const LiveInterval *> &Intervals,
std::array<float, FeatureIDs::FeatureCount> &Largest,
size_t Pos, int64_t IsHint, int64_t LocalIntfsCount,
float NrUrgent) const;
float NrUrgent,
SmallVectorImpl<std::tuple<SlotIndex, SlotIndex, size_t>>
&StartEndSlotIndices) const;

void extractInstructionFeatures(
SmallVectorImpl<std::tuple<SlotIndex, SlotIndex, size_t>>
&StartEndSlotIndices) const;

// Point-in-time: we didn't learn this, so we always delegate to the default.
bool canEvictHintInterference(
@@ -531,7 +558,9 @@ int64_t MLEvictAdvisor::tryFindEvictionCandidatePosition(
bool MLEvictAdvisor::loadInterferenceFeatures(
const LiveInterval &VirtReg, MCRegister PhysReg, bool IsHint,
const SmallVirtRegSet &FixedRegisters, FeaturesListNormalizer &Largest,
size_t Pos) const {
size_t Pos,
SmallVectorImpl<std::tuple<SlotIndex, SlotIndex, size_t>>
&StartEndSlotIndices) const {
// It is only possible to evict virtual register interference.
if (Matrix->checkInterference(VirtReg, PhysReg) > LiveRegMatrix::IK_VirtReg) {
// leave unavailable
@@ -590,7 +619,7 @@ bool MLEvictAdvisor::loadInterferenceFeatures(
// OK, so if we made it this far, this LR is an eviction candidate, load its
// features.
extractFeatures(InterferingIntervals, Largest, Pos, IsHint, LocalIntfs,
NrUrgent);
NrUrgent, StartEndSlotIndices);
return true;
}

@@ -629,12 +658,13 @@ MCRegister MLEvictAdvisor::tryFindEvictionCandidate(
FeaturesListNormalizer Largest;
Largest.fill(0.0);

// Same overal idea as in the default eviction policy - we visit the values of
// AllocationOrder one at a time. If it's not legally available, we mask off
// the corresponding feature column (==do nothing because we already reset all
// the features to 0)
// Use Pos to capture the column we load features at - in AllocationOrder
// order.
// Same overall idea as in the default eviction policy - we visit the values
// of AllocationOrder one at a time. If it's not legally available, we mask
// off the corresponding feature column (==do nothing because we already reset
// all the features to 0) Use Pos to capture the column we load features at -
// in AllocationOrder order.
SmallVector<std::tuple<SlotIndex, SlotIndex, size_t>, NumberOfInterferences>
StartEndSlotIndices;
size_t Pos = 0;
for (auto I = Order.begin(), E = Order.getOrderLimitEnd(OrderLimit); I != E;
++I, ++Pos) {
@@ -645,7 +675,7 @@
continue;
}
if (loadInterferenceFeatures(VirtReg, PhysReg, I.isHint(), FixedRegisters,
Largest, Pos)) {
Largest, Pos, StartEndSlotIndices)) {
++Available;
Regs[Pos] = std::make_pair(PhysReg, true);
}
@@ -662,10 +692,11 @@ MCRegister MLEvictAdvisor::tryFindEvictionCandidate(
if (!MustFindEviction)
extractFeatures(SmallVector<const LiveInterval *, 1>(1, &VirtReg), Largest,
CandidateVirtRegPos, /*IsHint*/ 0, /*LocalIntfsCount*/ 0,
/*NrUrgent*/ 0.0);
/*NrUrgent*/ 0.0, StartEndSlotIndices);
assert(InitialQSize > 0.0 && "We couldn't have gotten here if we had "
"nothing to allocate initially.");
// Normalize the features.
extractInstructionFeatures(StartEndSlotIndices);
// Normalize the features.
for (auto &V : Largest)
V = V ? V : 1.0;
for (size_t FeatureIndex = 0; FeatureIndex < FeatureIDs::FeatureCount;
@@ -749,7 +780,9 @@ MLEvictAdvisor::getLIFeatureComponents(const LiveInterval &LI) const {
void MLEvictAdvisor::extractFeatures(
const SmallVectorImpl<const LiveInterval *> &Intervals,
std::array<float, FeatureIDs::FeatureCount> &Largest, size_t Pos,
int64_t IsHint, int64_t LocalIntfsCount, float NrUrgent) const {
int64_t IsHint, int64_t LocalIntfsCount, float NrUrgent,
SmallVectorImpl<std::tuple<SlotIndex, SlotIndex, size_t>>
&StartEndSlotIndices) const {
int64_t NrDefsAndUses = 0;
int64_t NrBrokenHints = 0;
double R = 0.0;
@@ -796,6 +829,11 @@ void MLEvictAdvisor::extractFeatures(

HintWeights += LIFC.HintWeights;
NrRematerializable += LIFC.IsRemat;

for (auto CurrentSegment : LI) {
StartEndSlotIndices.push_back(
std::make_tuple(CurrentSegment.start, CurrentSegment.end, Pos));

It may be more readable if the tuple were replaced by a struct with explicitly named fields. Then the StartEndSlotIndices could be modeled as a vector of const such values - because they only need to be emplaced, never mutated, iiuc.
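
A minimal sketch of the kind of struct being suggested (the name and fields below are illustrative, not part of the patch):

// Illustrative replacement for the (start, end, position) tuple.
struct LRStartEndInfo {
  SlotIndex Begin;
  SlotIndex End;
  size_t Pos; // column this candidate occupies in the feature tensors
};
// StartEndSlotIndices would then be a SmallVector of such values whose
// elements are emplaced once and never mutated afterwards.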

}
}
size_t Size = 0;
if (!Intervals.empty()) {
@@ -838,6 +876,73 @@ void MLEvictAdvisor::extractFeatures(
#undef SET
}

void MLEvictAdvisor::extractInstructionFeatures(
SmallVectorImpl<std::tuple<SlotIndex, SlotIndex, size_t>>
&StartEndSlotIndices) const {
std::sort(StartEndSlotIndices.begin(), StartEndSlotIndices.end(),
[](std::tuple<SlotIndex, SlotIndex, size_t> A,
std::tuple<SlotIndex, SlotIndex, size_t> B) {
return std::get<0>(A) < std::get<0>(B);
});
size_t InstructionCount = 0;
size_t CurrentSegment = 0;
// nothing to encode if no segments were collected
if (StartEndSlotIndices.empty())
  return;
SlotIndex CurrentIndex = std::get<0>(StartEndSlotIndices[0]);
while (true) {
while (CurrentIndex <= std::get<1>(StartEndSlotIndices[CurrentSegment]) &&
InstructionCount < ModelMaxSupportedInstructionCount) {
// set instruction
auto *CurrentMachineInstruction =
LIS->getInstructionFromIndex(CurrentIndex);
if (!CurrentMachineInstruction) {
CurrentIndex = CurrentIndex.getNextIndex();
continue;
}
auto CurrentOpcode = CurrentMachineInstruction->getOpcode();
Runner->getTensor<int64_t>(
FeatureIDs::instructions_and_mapping)[InstructionCount] =
CurrentOpcode < OpcodeCountCutoff ? CurrentOpcode : 0;
// set the mask bit for this instruction
// add 1 to the segment's position because the mask rows are offset by 1:
// the first row holds the instruction opcodes
auto CurrentSegmentPosition =
std::get<2>(StartEndSlotIndices[CurrentSegment]) + 1;
Runner->getTensor<int64_t>(FeatureIDs::instructions_and_mapping)
[CurrentSegmentPosition * ModelMaxSupportedInstructionCount +
InstructionCount] = 1;
// handle the overlapping LR case
size_t OverlapCheckCurrentSegment = CurrentSegment + 1;
while (OverlapCheckCurrentSegment < StartEndSlotIndices.size()) {
if (std::get<0>(StartEndSlotIndices[OverlapCheckCurrentSegment]) >
CurrentIndex) {
break;
}
auto OverlapCurrentSegmentPosition =
std::get<2>(StartEndSlotIndices[OverlapCheckCurrentSegment]) + 1;
Runner->getTensor<int64_t>(FeatureIDs::instructions_and_mapping)
[OverlapCurrentSegmentPosition * ModelMaxSupportedInstructionCount +
InstructionCount] = 1;
++OverlapCheckCurrentSegment;
}
++InstructionCount;
CurrentIndex = CurrentIndex.getNextIndex();
}
// if we've just finished processing the last segment, or we've hit the
// maximum number of instructions, break out of the loop.
if (CurrentSegment == StartEndSlotIndices.size() - 1 ||
InstructionCount >= ModelMaxSupportedInstructionCount) {
break;
}
// we just finished processing a segment; move on to the next one
if (std::get<0>(StartEndSlotIndices[CurrentSegment + 1]) >
std::get<1>(StartEndSlotIndices[CurrentSegment])) {
// the segments aren't overlapping, so skip ahead to the start of the next one
CurrentIndex = std::get<0>(StartEndSlotIndices[CurrentSegment + 1]);
}
// in either case, advance to the next segment exactly once
++CurrentSegment;
}
}
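
To make the control flow above easier to follow, here is a simplified, standalone model of the same mapping (an editorial sketch, not part of the patch) written against plain integers and a flat std::vector instead of SlotIndex and the model runner's tensor. Every name and the three-segment input are invented for illustration, and slots with no instruction are not modeled:

// Illustrative only: mimics extractInstructionFeatures with toy data.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Segment { int Begin; int End; size_t Pos; };

int main() {
  const int MaxInstr = 10; // stands in for ModelMaxSupportedInstructionCount
  const int NumRows = 4;   // one opcode row plus three candidate rows
  // Two overlapping segments and one disjoint segment, one per candidate.
  std::vector<Segment> Segs = {{0, 3, 0}, {2, 5, 1}, {7, 9, 2}};
  std::sort(Segs.begin(), Segs.end(),
            [](const Segment &A, const Segment &B) { return A.Begin < B.Begin; });
  std::vector<int> Tensor(NumRows * MaxInstr, 0);
  size_t Seg = 0;
  int Count = 0;
  int Idx = Segs[0].Begin;
  while (true) {
    while (Idx <= Segs[Seg].End && Count < MaxInstr) {
      Tensor[Count] = 100 + Idx; // pretend opcode of the instruction at Idx
      Tensor[(Segs[Seg].Pos + 1) * MaxInstr + Count] = 1;
      // also tick later segments that cover this index (the overlap case)
      for (size_t O = Seg + 1; O < Segs.size() && Segs[O].Begin <= Idx; ++O)
        Tensor[(Segs[O].Pos + 1) * MaxInstr + Count] = 1;
      ++Count;
      ++Idx;
    }
    if (Seg == Segs.size() - 1 || Count >= MaxInstr)
      break;
    if (Segs[Seg + 1].Begin > Segs[Seg].End)
      Idx = Segs[Seg + 1].Begin; // jump over the gap between segments
    ++Seg;
  }
  for (int R = 0; R < NumRows; ++R) {
    for (int C = 0; C < MaxInstr; ++C)
      std::printf("%4d", Tensor[R * MaxInstr + C]);
    std::printf("\n");
  }
  return 0;
}

Running this prints the opcode row packed left to right (the gap between the second and third segments is skipped entirely) and one 0/1 mask row per candidate, with a 1 exactly where that candidate's segments cover an encoded instruction and zero padding everywhere else.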

// Development mode-specific implementations
#ifdef LLVM_HAVE_TF_API
RegAllocEvictionAdvisorAnalysis *llvm::createDevelopmentModeAdvisor() {