diff --git a/doc/FASTALLOC.md b/doc/FASTALLOC.md new file mode 100644 index 00000000..21abd459 --- /dev/null +++ b/doc/FASTALLOC.md @@ -0,0 +1,321 @@ +# Fastalloc Design Overview + +Fastalloc is a register allocator made specifically for fast +compile times. It's based on the reverse linear scan register +allocation/SSRA algorithm. +This document describes the data structures used and the allocation steps. + +# Data Structures + +The main data structures that Fastalloc uses to track its state are +described below. + +## Current VReg Allocations (`vreg_allocs`) + +This is a vector that is used to hold the current allocation for every +VReg during execution. + +## VReg Spillslots (`vreg_spillslots`) + +Whenever a VReg needs a spillslot, a dedicated slot is allocated for it. +This vector is where all VReg's spillslots are stored. + +## Live VRegs (`live_vregs`) + +Live VReg information is kept in a `VRegSet`, a doubly linked list +based on a vector. This is used for quick insertion, removal, and +iteration. + +## Least Recently Used Caches (`lrus`) + +Every register class (int, float, and vector) has its own LRU and they +are stored together in an array: `lrus`. An LRU is represented similarly +to a `VRegSet`: it's a circular, doubly-linked list based on a vector. + +The last PReg in an LRU is the least-recently allocated PReg: + +most recently used PReg (head) -> 2nd MRU PReg -> ... -> LRU PReg + +## Current VReg In PReg Info (`vreg_in_preg`) + +During allocation, it's necessary to determine which VReg is in a PReg +to generate the right move(s) for eviction. +`vreg_in_preg` is a vector that stores this information. + +## Available PRegs For Use In Instruction (`available_pregs`) + +This is a 2-tuple of `PRegSet`s, a bitset of physical registers, one for +the instruction's early phase and one for the late phase. +They are used to determine which registers are available for use in the +early/late phases of an instruction. 
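As a rough illustration of the early/late pair described above, the sketch below models a register bitset as a `u64` mask and keeps one set per phase. The names `SimplePRegSet`, `AvailablePRegs`, `from_indices`, and `reset` are hypothetical stand-ins chosen for this example; they are not taken from the regalloc2 source, and the real `PRegSet` may be laid out differently.

```rust
/// Simplified stand-in for a physical-register bitset (hypothetical; the
/// real `PRegSet` in regalloc2 may be laid out differently).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SimplePRegSet(u64);

impl SimplePRegSet {
    pub fn empty() -> Self {
        SimplePRegSet(0)
    }

    /// Builds a set from a list of register indices (each must be < 64).
    pub fn from_indices(indices: &[u8]) -> Self {
        let mut set = Self::empty();
        for &i in indices {
            set.add(i);
        }
        set
    }

    pub fn add(&mut self, preg: u8) {
        debug_assert!(preg < 64);
        self.0 |= 1u64 << preg;
    }

    pub fn remove(&mut self, preg: u8) {
        debug_assert!(preg < 64);
        self.0 &= !(1u64 << preg);
    }

    pub fn contains(&self, preg: u8) -> bool {
        preg < 64 && (self.0 >> preg) & 1 == 1
    }

    /// Registers that are present in both sets.
    pub fn intersect(self, other: Self) -> Self {
        SimplePRegSet(self.0 & other.0)
    }
}

/// One set per instruction phase, mirroring the 2-tuple described above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AvailablePRegs {
    pub early: SimplePRegSet,
    pub late: SimplePRegSet,
}

impl AvailablePRegs {
    /// Both phases start out with every allocatable register available.
    pub fn reset(allocatable: SimplePRegSet) -> Self {
        AvailablePRegs { early: allocatable, late: allocatable }
    }
}
```

Removing a register from only one of the two sets leaves it usable in the other phase, which is what lets an early use and a late def share a register.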
+ +Prior to the beginning of any instruction's allocation, this set is reset +to include all allocatable physical registers, some of which may already +contain a VReg. + +## VReg Liverange Location Info (`vreg_to_live_inst_range`) + +This is a vector of 3-tuples containing the beginning and the end +of all VReg's liveranges, along with an allocation they are guaranteed +to be in throughout that liverange. +This is used to build the debug locations vector after allocation +is complete. + +# Allocation Process Breakdown + +Allocation proceeds in reverse: from the last block to the first block, +and in each block: from the last instruction to the first instruction. + +The allocation for each operand in an instruction can be viewed to happen +in four phases: selection, assignment, eviction, and edit insertion. + +## Allocation Phase: Selection + +In this phase, a PReg is selected from `available_pregs` for the +operand based on the operand constraints. Depending on the operand's +position the selected PReg is removed from either the early or late +phase or both, indicating that the PReg is no longer available for +allocation by other operands in that phase. + +## Allocation Phase: Assignment + +In this phase, the selected PReg is set as the allocation for +the operand in the final output. + +## Allocation Phase: Eviction + +In this phase, the previous VReg in the allocation assigned to +an operand is evicted, if any. + +During eviction, a dedicated spillslot is allocated for the evicted +VReg and an edit is inserted after the instruction to move from the +slot to the allocation it's expected to be in after the instruction. + +## Allocation Phase: Edit Insertion + +In this phase, edits are inserted to ensure that the dataflow from +before the instruction to the selected allocation to after +the instruction remain correct. + +# Invariants + +Some invariants that remain true throughout execution: + +1. 
During processing, the allocation of a VReg at any point in time +as indicated in `vreg_allocs` changes exactly two or three times. +Initially it is set to none. When it's allocated, it is +changed to that allocation. After this, it doesn't change unless +it's evicted or spilled across a block boundary; +if it is, then its current allocation will change to its dedicated +spillslot. After this, it doesn't change again until its definition +is reached and it's deallocated, at which point its `vreg_allocs` +entry is set to none. The only exception is block parameters that +are never used: these are never allocated. + +2. A virtual register that outlives the block it was defined in will +be in its dedicated spillslot by the end of the block. + +3. At the end of a block, before edits are inserted to move values +from branch arguments to block parameter spillslots, all branch +arguments will be in their dedicated spillslots. + +4. At the beginning of a block, all block parameters and livein +virtual registers will be in their dedicated spillslots. + +# Instruction Allocation + +To allocate a single instruction, the first step is to reset the +`available_pregs` sets to all allocatable PRegs. + +Next, the selection phase is carried out for all operands with +fixed register constraints: the registers they are constrained to use are +marked as unavailable in the `available_pregs` set, depending on the +phase that they are valid in. If the operand is an early use or late +def operand, then the register will be marked as unavailable in the +early set or late set, respectively. Otherwise, the PReg is marked +as unavailable in both the early and late sets, because a PReg +assigned to an early def or late use operand cannot be reused by another +operand in the same instruction. + +After selection for fixed register operands, the eviction phase is +carried out for fixed register operands. 
Any VReg in their selected +registers, indicated by `vreg_in_preg`, is evicted: a dedicated +spillslot is allocated for the VReg (if it doesn't have one already), +an edit is inserted to move from the slot to the PReg, which is where +the VReg is expected to be after the instruction, and its current +allocation in `vreg_allocs` is set to the spillslot. + +Next, all clobbers are removed from the early and late `available_pregs` +sets to avoid allocating a clobber to a def. + +Next, the selection, assignment, eviction, and edit insertion phases are +carried out for all def operands. When each def operand's allocation is +complete, the def operand is immediately freed, marking the end of the +VReg's liverange. It is removed from the `live_vregs` set, its allocation +in `vreg_allocs` is set to none, and if it was in a PReg, that PReg's +entry in `vreg_in_preg` is set to none. The selection and eviction phases +are omitted if the operand has a fixed constraint, as those phases have +already been carried out. + +Next, the selection, assignment, and eviction phases are carried out for all +use operands. As with def operands, the selection and eviction phases are +omitted if the operand has a fixed constraint, as those phases have already +been carried out. + +Then the edit insertion phase is carried out for all use operands. + +Lastly, if the instruction being processed is a branch instruction, the +parallel move resolver is used to insert edits before the instruction +to move from the branch argument spillslots to the block parameter +spillslots. + +## Operand Allocation + +During the allocation of an operand, a check is first made to +see if the VReg's current allocation as indicated in +`vreg_allocs` is within the operand constraints. + +If it is, the assignment phase is carried out, setting the final +allocation output's entry for that operand to the allocation. 
+The selection phase is carried out, marking the PReg +(if the allocation is a PReg) as unavailable in the respective +early/late sets. The state of the LRUs is also updated to reflect +the new most recently used PReg. +No eviction needs to be done since the VReg is already in the +allocation and no edit insertion needs to be done either. + +On the other hand, if the VReg's current allocation is not within +constraints, the selection and eviction phases are carried out for +non-fixed operands. First, a set of PRegs that can be drawn from is +created from `available_pregs`. For early uses and late defs, +this draw-from set is the early set or late set respectively. +For late uses and early defs, the draw-from set is an intersection +of the available early and late sets (because a PReg used for a late +use can't be reassigned to another operand in the early phase; +likewise, a PReg used for an early def can't be reassigned to another +operand in the late phase). +The LRU for the VReg's regclass is then traversed from the end to find +the least-recently used PReg in the draw-from set. Once a PReg is found, +it is marked as the most recently used in the LRU, unavailable in the +`available_pregs` sets, and whatever VReg was in it before is evicted. + +The assignment phase is carried out next: the final allocation for the +operand is set to the selected register. + +If the newly allocated operand has not been allocated before, that is, +this is the first use/def of the VReg encountered, the VReg is +inserted into `live_vregs` and marked as the value in the allocated +PReg in `vreg_in_preg`. + +Otherwise, if the VReg has been allocated before, then an edit will need +to be inserted to ensure that the dataflow remains correct. +The edit insertion phase is now carried out if the operand is a def +operand: an edit is inserted after the instruction to move from the +new allocation to the allocation it's expected to be in after the +instruction. 
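The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the allocator's actual code: the `Lru` type, its `poke` and `select` methods, the `Kind` and `Pos` enums, and the `draw_from` helper are all hypothetical names, and register sets are modelled as plain `u64` bitmasks.

```rust
/// Vector-backed circular doubly-linked list of physical register indices.
/// Hypothetical sketch; field and method names are illustrative only.
pub struct Lru {
    /// `prev[p]` and `next[p]` link register `p` to its neighbours.
    prev: Vec<usize>,
    next: Vec<usize>,
    /// Most recently used register; `prev[head]` is the least recently used.
    head: usize,
}

impl Lru {
    /// Creates an LRU over registers `0..n`, with `0` as MRU and `n - 1` as LRU.
    pub fn new(n: usize) -> Self {
        assert!(n > 0);
        let prev = (0..n).map(|p| (p + n - 1) % n).collect();
        let next = (0..n).map(|p| (p + 1) % n).collect();
        Lru { prev, next, head: 0 }
    }

    /// Moves `preg` to the head, marking it as most recently used.
    pub fn poke(&mut self, preg: usize) {
        if preg == self.head {
            return;
        }
        // Unlink `preg` from its current position.
        let (p, n) = (self.prev[preg], self.next[preg]);
        self.next[p] = n;
        self.prev[n] = p;
        // Relink it between the current tail and the old head.
        let tail = self.prev[self.head];
        self.next[tail] = preg;
        self.prev[preg] = tail;
        self.next[preg] = self.head;
        self.prev[self.head] = preg;
        self.head = preg;
    }

    /// Walks from the tail (LRU end) towards the head and returns the first
    /// register whose bit is set in `draw_from`, marking it as MRU.
    pub fn select(&mut self, draw_from: u64) -> Option<usize> {
        let tail = self.prev[self.head];
        let mut cur = tail;
        loop {
            if cur < 64 && (draw_from >> cur) & 1 == 1 {
                self.poke(cur);
                return Some(cur);
            }
            cur = self.prev[cur];
            if cur == tail {
                return None; // Wrapped around without finding a candidate.
            }
        }
    }
}

#[derive(Clone, Copy)]
pub enum Kind { Use, Def }

#[derive(Clone, Copy)]
pub enum Pos { Early, Late }

/// Picks the set of registers an operand may draw from.
pub fn draw_from(kind: Kind, pos: Pos, early: u64, late: u64) -> u64 {
    match (kind, pos) {
        (Kind::Use, Pos::Early) => early,
        (Kind::Def, Pos::Late) => late,
        // Late uses and early defs occupy the register in both phases.
        _ => early & late,
    }
}
```

The sketch stops at selection: a full implementation would also clear the chosen register's bit in the relevant `available_pregs` sets and evict whatever VReg `vreg_in_preg` records for it.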
+ +The edit insertion phase for use operands is done after all operands +have been processed. Edits are inserted to move from the current +allocations in `vreg_allocs` to the final allocated position before +the instruction. This is to account for the possibility of multiple +uses of the same operand in the instruction. + +## Reuse Operands + +Reuse def operands are handled by creating a new operand identical to the +reuse def, except that its constraints are the constraints of the +reused input and allocating that in its place. + +Reused inputs are handled by creating a new operand with a fixed register +constraint to use whatever register was assigned to the reuse def. + +Because of the way reuse operands and reused inputs are handled, when +selecting a register for an early use operand with a fixed constraint, +the PReg is also marked as unavailable in the `available_pregs` late +set if the operand is a reused input. And when selecting a register +for reuse def operands, the selected register is marked as unavailable +in the `available_pregs` early set. + +## VReg Spillslots + +Whenever a VReg needs a spillslot, a suitable one is allocated and +marked as the VReg's dedicated spillslot in `vreg_spillslots`. +If a VReg never needs a spillslot, none is allocated for it. +To ensure that a VReg will always be in its spillslot when expected, +during the processing of a def operand, before it's deallocated, +an edit is inserted to move from its current allocation as indicated +in `vreg_allocs` to its dedicated spillslot, if one is present in +`vreg_spillslots`. + +## Branch Instructions + +As an invariant, all branch arguments will be in their dedicated +spillslots at the end of the block before edits are inserted to +move from those spillslots to the block parameter spillslots +of the successor blocks. 
+ +If a branch argument is already in an allocation that isn't +its spillslot (this could happen if the branch argument is used +as an operand in the same instruction, because all normal +instruction processing is completed before branch-specific +processing), then an edit is inserted +to move from the spillslot to that allocation and its current +allocation in `vreg_allocs` is set to the spillslot. + +Only after these edits have been inserted is the parallel move +resolver used to generate and insert edits to move from +those spillslots to the spillslots of the block parameters. + +# Across Blocks + +When a block completes processing, some VRegs will still be live. +These VRegs are either block parameters or livein VRegs. +As an invariant, prior to the first instruction in a block, all +block parameters and livein VRegs will be in their dedicated spillslots. + +To maintain this invariant, after a block completes processing, edits +are inserted at the beginning of the block to move from the block +parameter and livein spillslots to the allocation they are expected +to be in from the first instruction. +All block parameters are freed, just like defs, and liveins' current +allocations in `vreg_allocs` are set to their spillslots. + +# Edits Order + +`regalloc2`'s outward interface guarantees that edits are in +sorted order. Since allocation proceeds in reverse, all edits +are also added in reverse. After all blocks have completed +processing, the edits are simply reversed to put them in the +correct order. + +One reason the allocation order proceeds the way it +does is this edit-order constraint. All edits that +occur after the instruction must be inserted before all edits +that occur before the instruction. + +# Debug Info + +After all blocks have completed processing, the debug locations +vector is built. +The information it's built from is assembled from liverange info +that is tracked throughout the allocation. 
+Whenever a VReg is allocated for the first time, its liverange end +is saved in the VReg's slot in the `vreg_to_live_inst_range` +vector. Whenever a VReg's definition is encountered, its liverange +beginning is saved, too. The allocation it will be in +throughout that range is saved alongside. + +To determine the allocation the VReg will be in throughout the +liverange, the first invariant is used: after a VReg +is first allocated, its current allocation in `vreg_allocs` doesn't +change unless it's evicted or spilled across block boundaries. +Using this info, if by the time the def of a VReg is allocated, +that VReg has no dedicated spillslot, +that implies that the VReg was never evicted or spilled, so whatever +value its `vreg_allocs` entry says is the location it will be in +throughout its liverange. Otherwise, if it has a spillslot +allocated to it, that implies that the VReg was either evicted +at some point or it was a livein of a predecessor or a block parameter. +Either way, since all spillslots are dedicated to their respective VRegs, +it is safe to record the spillslot as the allocation for the +`vreg_to_live_inst_range` info. diff --git a/doc/GENERAL.md b/doc/GENERAL.md new file mode 100644 index 00000000..f5d70ff2 --- /dev/null +++ b/doc/GENERAL.md @@ -0,0 +1,212 @@ +# regalloc2 Design Overview + +This document describes the basic architecture of the regalloc2 +register allocator. It describes the externally-visible interface: +input CFG, instructions, operands, with their invariants; meaning of +various parts of the output. +`ION.md` and `FASTALLOC.md` describe the specifics of the main Ion +allocator and the fast allocator, respectively. + +# API, Input IR and Invariants + +The toplevel API to regalloc2 consists of a single entry point `run()` +that takes a register environment, which specifies all physical +registers, and the input program. 
The function returns either an error +or an `Output` struct that provides allocations for each operand and a +vector of additional instructions (moves, loads, stores) to insert. + +## Register Environment + +The allocator takes a `MachineEnv` which specifies, for each of the +two register classes `Int` and `Float`, a vector of `PReg`s by index. A +`PReg` is nothing more than the class and index within the class; the +allocator does not need to know anything more. + +The `MachineEnv` provides a vector of preferred and non-preferred +physical registers per class. Any register not in either vector will +not be allocated. Usually, registers that do not need to be saved in +the prologue if used (i.e., caller-save registers) are given in the +"preferred" vector. The environment also provides exactly one scratch +register per class. This register must not be in the preferred or +non-preferred vectors, and is used whenever a set of moves that need +to occur logically in parallel have a cycle (for a simple example, +consider a swap `r0, r1 := r1, r0`). + +With some more work, we could potentially remove the need for the +scratch register by requiring support for an additional edit type from +the client ("swap"), but we have not pursued this. + +## CFG and Instructions + +The allocator operates on an input program that is in a standard CFG +representation: the function body is a sequence of basic blocks, and +each block has a sequence of instructions and zero or more +successors. The allocator also requires the client to provide +predecessors for each block, and these must be consistent with the +successors. + +Instructions are opaque to the allocator except for a few important +bits: (1) `is_ret` (is a return instruction); (2) `is_branch` (is a +branch instruction); and (3) a vector of Operands, covered below. +Every block must end in a return or branch. + +Both instructions and blocks are named by indices in contiguous index +spaces. 
A block's instructions must be a contiguous range of +instruction indices, and block i's first instruction must come +immediately after block i-1's last instruction. + +The CFG must have *no critical edges*. A critical edge is an edge from +block A to block B such that A has more than one successor *and* B has +more than one predecessor. For this definition, the entry block has an +implicit predecessor, and any block that ends in a return has an +implicit successor. + +Note that there are *no* requirements related to the ordering of +blocks, and there is no requirement that the control flow be +reducible. Some *heuristics* used by the allocator will perform better +if the code is reducible and ordered in reverse postorder (RPO), +however: in particular, (1) this interacts better with the +contiguous-range-of-instruction-indices live range representation that +we use, and (2) the "approximate loop depth" metric will actually be +exact if both these conditions are met. + +## Operands and VRegs + +Every instruction operates on values by way of `Operand`s. An operand +consists of the following fields: + +- VReg, or virtual register. *Every* operand mentions a virtual + register, even if it is constrained to a single physical register in + practice. This is because we track liveranges uniformly by vreg. + +- Policy, or "constraint". Every reference to a vreg can apply some + constraint to the vreg at that point in the program. Valid policies are: + + - Any location; + - Any register of the vreg's class; + - Any stack slot; + - A particular fixed physical register; or + - For a def (output), a *reuse* of an input register. + +- The "kind" of reference to this vreg: Def, Use, Mod. A def + (definition) writes to the vreg, and disregards any possible earlier + value. A mod (modify) reads the current value then writes a new + one. A use simply reads the vreg's value. + +- The position: before or after the instruction. 
+ - Note that to have a def (output) register available in a way that + does not conflict with inputs, the def should be placed at the + "before" position. Similarly, to have a use (input) register + available in a way that does not conflict with outputs, the use + should be placed at the "after" position. + +VRegs, or virtual registers, are specified by an index and a register +class (Float or Int). The classes are not given separately; they are +encoded on every mention of the vreg. (In a sense, the class is an +extra index bit, or part of the register name.) The input function +trait does require the client to provide the exact vreg count, +however. + +Implementation note: both vregs and operands are bit-packed into +u32s. This is essential for memory-efficiency. As a result of the +operand bit-packing in particular (including the policy constraints!), +the allocator supports up to 2^21 (2M) vregs per function, and 2^6 +(64) physical registers per class. Later we will also see a limit of +2^20 (1M) instructions per function. These limits are considered +sufficient for the anticipated use-cases (e.g., compiling Wasm, which +also has function-size implementation limits); for larger functions, +it is likely better to use a simpler register allocator in any case. + +## Reuses and Two-Address ISAs + +Some instruction sets primarily have instructions that name only two +registers for a binary operator, rather than three: both registers are +inputs, and the result is placed in one of the registers, clobbering +its original value. The most well-known modern example is x86. It is +thus imperative that we support this pattern well in the register +allocator. + +This instruction-set design is somewhat at odds with an SSA +representation, where a value cannot be redefined. 
+ +Thus, the allocator supports a useful fiction of sorts: the +instruction can be described as if it has three register mentions -- +two inputs and a separate output -- and neither input will be +clobbered. The output, however, is special: its register-placement +policy is "reuse input i" (where i == 0 or 1). The allocator +guarantees that the register assignment for that input and the output +will be the same, so the instruction can use that register as its +"modifies" operand. If the input is needed again later, the allocator +will take care of the necessary copying. + +We will see below how the allocator makes this work by doing some +preprocessing so that the core allocation algorithms do not need to +worry about this constraint. + +## SSA + +regalloc2 takes an SSA IR as input, where the usual definitions apply: +every vreg is defined exactly once, and every vreg use is dominated by +its one def. (Using blockparams means that we do not need additional +conditions for phi-nodes.) + +## Block Parameters + +Every block can have *block parameters*, and a branch to a block with +block parameters must provide values for those parameters via +operands. When a branch has more than one successor, it provides +separate operands for each possible successor. These block parameters +are equivalent to phi-nodes; we chose this representation because they +are in many ways a more consistent representation of SSA. + +To see why we believe block parameters are a slightly nicer design +choice than use of phi nodes, consider: phis are special +pseudoinstructions that must come first in a block, are all defined in +parallel, and whose uses occur on the edge of a particular +predecessor. All of these facts complicate any analysis that scans +instructions and reasons about uses and defs. It is much closer to the +truth to actually put those uses *in* the predecessor, on the branch, +and put all the defs at the top of the block as a separate kind of +def. 
The tradeoff is that a vreg's def now has two possibilities -- +ordinary instruction def or blockparam def -- but this is fairly +reasonable to handle. + +## Output + +The allocator produces two main data structures as output: an array of +`Allocation`s and a sequence of edits. Some other miscellaneous data is also +provided. + +### Allocations + +The allocator provides an array of `Allocation` values, one per +`Operand`. Each `Allocation` has a kind and an index. The kind may +indicate that this is a physical register or a stack slot, and the +index gives the respective register or slot. All allocations will +conform to the constraints given, and will faithfully preserve the +dataflow of the input program. + +### Inserted Moves + +In order to implement the necessary movement of data between +allocations, the allocator needs to insert moves at various program +points. + +The vector of inserted moves contains tuples that name a program point +and an "edit". The edit is either a move, from one `Allocation` to +another, or else a kind of metadata used by the checker to know which +VReg is live in a given allocation at any particular time. The latter +sort of edit can be ignored by a backend that is just interested in +generating machine code. + +Note that the allocator will never generate a move from one stackslot +directly to another, by design. Instead, if it needs to do so, it will +make use of the scratch register. (Sometimes such a move occurs when +the scratch register is already holding a value, e.g. to resolve a +cycle of moves; in this case, it will allocate another spillslot and +spill the original scratch value around the move.) + +Thus, the single "edit" type can become either a register-to-register +move, a load from a stackslot into a register, or a store from a +register into a stackslot. 
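To make the three cases concrete, here is a minimal sketch of how a backend might lower a move edit into a machine-level operation. The types `Loc`, `Move`, and `MachineOp` and the function `lower_move` are hypothetical simplifications invented for this example; the real `Allocation` and `Edit` types in regalloc2 carry more information and have a different shape.

```rust
/// Simplified stand-in for an allocation: a register or a stack slot.
/// (Hypothetical; not the regalloc2 `Allocation` type.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Loc {
    Reg(u8),
    Stack(u32),
}

/// Simplified stand-in for a move edit.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Move {
    pub from: Loc,
    pub to: Loc,
}

/// The machine-level operations a backend would emit for a move.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MachineOp {
    RegMove { dst: u8, src: u8 },
    Load { dst: u8, slot: u32 },
    Store { slot: u32, src: u8 },
}

/// Lowers a move edit by inspecting the kinds of its source and destination.
pub fn lower_move(m: Move) -> Result<MachineOp, &'static str> {
    match (m.from, m.to) {
        (Loc::Reg(src), Loc::Reg(dst)) => Ok(MachineOp::RegMove { dst, src }),
        (Loc::Stack(slot), Loc::Reg(dst)) => Ok(MachineOp::Load { dst, slot }),
        (Loc::Reg(src), Loc::Stack(slot)) => Ok(MachineOp::Store { slot, src }),
        // The text above says the allocator never emits this case, so the
        // sketch treats it as an error instead of inventing a lowering.
        (Loc::Stack(_), Loc::Stack(_)) => Err("unexpected stack-to-stack move"),
    }
}
```

Returning an error for the stack-to-stack case is a design choice of this sketch; a backend that trusts the allocator's guarantee could equally treat it as unreachable.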
+ diff --git a/doc/DESIGN.md b/doc/ION.md similarity index 85% rename from doc/DESIGN.md rename to doc/ION.md index 4172a063..aea2be23 100644 --- a/doc/DESIGN.md +++ b/doc/ION.md @@ -1,217 +1,12 @@ -# regalloc2 Design Overview +# Ion Design Overview -This document describes the basic architecture of the regalloc2 -register allocator. It describes the externally-visible interface -(input CFG, instructions, operands, with their invariants; meaning of -various parts of the output); core data structures; and the allocation +This document describes the basic architecture of the Ion +register allocator. It describes the core data structures; and the allocation pipeline, or series of algorithms that compute an allocation. It ends with a description of future work and expectations, as well as an appendix that notes design influences and similarities to the IonMonkey backtracking allocator. -# API, Input IR and Invariants - -The toplevel API to regalloc2 consists of a single entry point `run()` -that takes a register environment, which specifies all physical -registers, and the input program. The function returns either an error -or an `Output` struct that provides allocations for each operand and a -vector of additional instructions (moves, loads, stores) to insert. - -## Register Environment - -The allocator takes a `MachineEnv` which specifies, for each of the -two register classes `Int` and `Float`, a vector of `PReg`s by index. A -`PReg` is nothing more than the class and index within the class; the -allocator does not need to know anything more. - -The `MachineEnv` provides a vector of preferred and non-preferred -physical registers per class. Any register not in either vector will -not be allocated. Usually, registers that do not need to be saved in -the prologue if used (i.e., caller-save registers) are given in the -"preferred" vector. The environment also provides exactly one scratch -register per class. 
This register must not be in the preferred or -non-preferred vectors, and is used whenever a set of moves that need -to occur logically in parallel have a cycle (for a simple example, -consider a swap `r0, r1 := r1, r0`). - -With some more work, we could potentially remove the need for the -scratch register by requiring support for an additional edit type from -the client ("swap"), but we have not pursued this. - -## CFG and Instructions - -The allocator operates on an input program that is in a standard CFG -representation: the function body is a sequence of basic blocks, and -each block has a sequence of instructions and zero or more -successors. The allocator also requires the client to provide -predecessors for each block, and these must be consistent with the -successors. - -Instructions are opaque to the allocator except for a few important -bits: (1) `is_ret` (is a return instruction); (2) `is_branch` (is a -branch instruction); and (3) a vector of Operands, covered below. -Every block must end in a return or branch. - -Both instructions and blocks are named by indices in contiguous index -spaces. A block's instructions must be a contiguous range of -instruction indices, and block i's first instruction must come -immediately after block i-1's last instruction. - -The CFG must have *no critical edges*. A critical edge is an edge from -block A to block B such that A has more than one successor *and* B has -more than one predecessor. For this definition, the entry block has an -implicit predecessor, and any block that ends in a return has an -implicit successor. - -Note that there are *no* requirements related to the ordering of -blocks, and there is no requirement that the control flow be -reducible. 
Some *heuristics* used by the allocator will perform better -if the code is reducible and ordered in reverse postorder (RPO), -however: in particular, (1) this interacts better with the -contiguous-range-of-instruction-indices live range representation that -we use, and (2) the "approximate loop depth" metric will actually be -exact if both these conditions are met. - -## Operands and VRegs - -Every instruction operates on values by way of `Operand`s. An operand -consists of the following fields: - -- VReg, or virtual register. *Every* operand mentions a virtual - register, even if it is constrained to a single physical register in - practice. This is because we track liveranges uniformly by vreg. - -- Policy, or "constraint". Every reference to a vreg can apply some - constraint to the vreg at that point in the program. Valid policies are: - - - Any location; - - Any register of the vreg's class; - - Any stack slot; - - A particular fixed physical register; or - - For a def (output), a *reuse* of an input register. - -- The "kind" of reference to this vreg: Def, Use, Mod. A def - (definition) writes to the vreg, and disregards any possible earlier - value. A mod (modify) reads the current value then writes a new - one. A use simply reads the vreg's value. - -- The position: before or after the instruction. - - Note that to have a def (output) register available in a way that - does not conflict with inputs, the def should be placed at the - "before" position. Similarly, to have a use (input) register - available in a way that does not conflict with outputs, the use - should be placed at the "after" position. - -VRegs, or virtual registers, are specified by an index and a register -class (Float or Int). The classes are not given separately; they are -encoded on every mention of the vreg. (In a sense, the class is an -extra index bit, or part of the register name.) The input function -trait does require the client to provide the exact vreg count, -however. 
- -Implementation note: both vregs and operands are bit-packed into -u32s. This is essential for memory-efficiency. As a result of the -operand bit-packing in particular (including the policy constraints!), -the allocator supports up to 2^21 (2M) vregs per function, and 2^6 -(64) physical registers per class. Later we will also see a limit of -2^20 (1M) instructions per function. These limits are considered -sufficient for the anticipated use-cases (e.g., compiling Wasm, which -also has function-size implementation limits); for larger functions, -it is likely better to use a simpler register allocator in any case. - -## Reuses and Two-Address ISAs - -Some instruction sets primarily have instructions that name only two -registers for a binary operator, rather than three: both registers are -inputs, and the result is placed in one of the registers, clobbering -its original value. The most well-known modern example is x86. It is -thus imperative that we support this pattern well in the register -allocator. - -This instruction-set design is somewhat at odds with an SSA -representation, where a value cannot be redefined. - -Thus, the allocator supports a useful fiction of sorts: the -instruction can be described as if it has three register mentions -- -two inputs and a separate output -- and neither input will be -clobbered. The output, however, is special: its register-placement -policy is "reuse input i" (where i == 0 or 1). The allocator -guarantees that the register assignment for that input and the output -will be the same, so the instruction can use that register as its -"modifies" operand. If the input is needed again later, the allocator -will take care of the necessary copying. - -We will see below how the allocator makes this work by doing some -preprocessing so that the core allocation algorithms do not need to -worry about this constraint. 
- -## SSA - -regalloc2 takes an SSA IR as input, where the usual definitions apply: -every vreg is defined exactly once, and every vreg use is dominated by -its one def. (Using blockparams means that we do not need additional -conditions for phi-nodes.) - -## Block Parameters - -Every block can have *block parameters*, and a branch to a block with -block parameters must provide values for those parameters via -operands. When a branch has more than one successor, it provides -separate operands for each possible successor. These block parameters -are equivalent to phi-nodes; we chose this representation because they -are in many ways a more consistent representation of SSA. - -To see why we believe block parameters are a slightly nicer design -choice than use of phi nodes, consider: phis are special -pseudoinstructions that must come first in a block, are all defined in -parallel, and whose uses occur on the edge of a particular -predecessor. All of these facts complicate any analysis that scans -instructions and reasons about uses and defs. It is much closer to the -truth to actually put those uses *in* the predecessor, on the branch, -and put all the defs at the top of the block as a separate kind of -def. The tradeoff is that a vreg's def now has two possibilities -- -ordinary instruction def or blockparam def -- but this is fairly -reasonable to handle. - -## Output - -The allocator produces two main data structures as output: an array of -`Allocation`s and a sequence of edits. Some other miscellaneous data is also -provided. - -### Allocations - -The allocator provides an array of `Allocation` values, one per -`Operand`. Each `Allocation` has a kind and an index. The kind may -indicate that this is a physical register or a stack slot, and the -index gives the respective register or slot. All allocations will -conform to the constraints given, and will faithfully preserve the -dataflow of the input program. 
- -### Inserted Moves - -In order to implement the necessary movement of data between -allocations, the allocator needs to insert moves at various program -points. - -The vector of inserted moves contains tuples that name a program point -and an "edit". The edit is either a move, from one `Allocation` to -another, or else a kind of metadata used by the checker to know which -VReg is live in a given allocation at any particular time. The latter -sort of edit can be ignored by a backend that is just interested in -generating machine code. - -Note that the allocator will never generate a move from one stackslot -directly to another, by design. Instead, if it needs to do so, it will -make use of the scratch register. (Sometimes such a move occurs when -the scratch register is already holding a value, e.g. to resolve a -cycle of moves; in this case, it will allocate another spillslot and -spill the original scratch value around the move.) - -Thus, the single "edit" type can become either a register-to-register -move, a load from a stackslot into a register, or a store from a -register into a stackslot. - # Data Structures We now review the data structures that regalloc2 uses to track its