Introduce new GC root placement pass
Design notes are in the devdocs. Algorithmic documentation in code comments.
Keno committed May 15, 2017
1 parent 1129de3 commit 9f1cd63
Showing 14 changed files with 2,563 additions and 291 deletions.
156 changes: 156 additions & 0 deletions doc/src/devdocs/llvm.md
@@ -75,3 +75,159 @@ study it and the pass of interest in isolation.
4. Strip the debug metadata and fix up the TBAA metadata by hand.

The last step is labor intensive. Suggestions on a better way would be appreciated.

## GC root placement

GC root placement is done by an LLVM pass late in the pass pipeline. Doing GC root
placement this late enables LLVM to make more aggressive optimizations around
code that requires GC roots, as well as allowing us to reduce the number of
required GC roots and GC root store operations (since LLVM doesn't understand
our GC, it wouldn't otherwise know what it is and is not allowed to do with
values stored to the GC frame, so it'll conservatively do very little). As an
example, consider an error path
```
if some_condition()
#= Use some variables maybe =#
error("An error occurred")
end
```
During constant folding, LLVM may discover that the condition is always false,
and can remove the basic block. However, if GC root lowering is done early,
the GC root slots used in the deleted block, as well as any values kept alive
in those slots only because they were used in the error path, would be kept
alive by LLVM. By doing GC root lowering late, we give LLVM the license to do
any of its usual optimizations (constant folding, dead code elimination, etc.),
without having to worry (too much) about which values may or may not be gc
tracked.

However, in order to be able to do late GC root placement, we need to be able to
identify a) which pointers are gc tracked and b) all uses of such pointers. The
goal of the GC placement pass is thus simple:

Minimize the number of needed gc roots/stores to them subject to the constraint
that at every safepoint, any live gc-tracked pointer (i.e. one for which there is
a path after this point that contains a use of this pointer) is in some GC slot.
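
To make the liveness constraint concrete, here is a minimal sketch in LLVM IR
(the helpers `@use` and `@may_gc` are hypothetical): `%x` is used after the
safepoint, so it must be in a GC slot across the call to `@may_gc`; `%y`'s last
use comes before the safepoint, so no root is needed for it.
```
%jl_value_t = type opaque
declare void @use(%jl_value_t addrspace(10)*)
declare void @may_gc()   ; stands in for any call that may reach a safepoint

define void @sketch(%jl_value_t addrspace(10)* %x, %jl_value_t addrspace(10)* %y) {
  call void @use(%jl_value_t addrspace(10)* %y)  ; last use of %y, before the safepoint
  call void @may_gc()                            ; safepoint: %x is still live here
  call void @use(%jl_value_t addrspace(10)* %x)  ; use of %x after the safepoint
  ret void
}
```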

### Representation

The primary difficulty is thus choosing an IR representation that allows us to
identify gc-tracked pointers and their uses, even after the program has been
run through the optimizer. Our design makes use of three LLVM features to achieve
this:
- Custom address spaces
- Operand Bundles
- Non-integral pointers

Custom address spaces allow us to tag every pointer with an integer that needs
to be preserved through optimizations. The compiler may not insert casts between
address spaces that did not exist in the original program and it must never
change the address space of a pointer on a load/store/etc operation. This allows
us to annotate which pointers are gc-tracked in an optimizer-resistant way. Note
that metadata would not be able to achieve the same purpose. Metadata is supposed
to always be discardable without altering the semantics of the program. However,
failing to identify a gc-tracked pointer alters the resulting program behavior
dramatically - it'll probably crash or return wrong results. We currently use
three different address spaces (their numbers are defined in `src/codegen_shared.cpp`):

- GC Tracked Pointers (10): These are pointers to boxed values that may be put
into a GC frame. It is loosely equivalent to a `jl_value_t*` pointer on the C
side. N.B. It is illegal to ever have a pointer in this address space that may
not be stored to a GC slot.
- Derived Pointers (11): These are pointers that are derived from some GC
tracked pointer. Uses of these pointers generate uses of the original pointer.
However, they need not themselves be known to the GC. The GC root placement
pass MUST always find the GC tracked pointer from which this pointer is
derived and use that as the pointer to root.
- Callee Rooted Pointers (12): This is a utility address space to express the
notion of a callee rooted value. All values of this address space MUST be
storable to a GC root (though it is possible to relax this condition in the
future), but unlike the other pointers need not be rooted if passed to a
call (they do still need to be rooted if they are live across another safepoint
between the definition and the call).
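
As an illustrative sketch (the function and its arguments are hypothetical),
values in these three address spaces might appear in IR like this:
```
%jl_value_t = type opaque

; A tracked box (10), a pointer derived from some box (11), and a callee
; rooted value (12) that the caller did not have to root just for this call.
define i64 @sketch(%jl_value_t addrspace(10)* %box,
                   i64 addrspace(11)* %field,
                   %jl_value_t addrspace(12)* %callee_rooted) {
  %v = load i64, i64 addrspace(11)* %field
  ret i64 %v
}
```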

### Invariants

The GC root placement pass makes use of several invariants, which need
to be observed by the frontend and are preserved by the optimizer.

First, only the following address space casts are allowed:
- 0->{10,11,12}: It is allowable to decay an untracked pointer to any of the
others. However, do note that the optimizer has broad license to not root
such a value. It is never safe to have a value in addressspace 0 in any part
of the program if it is (or is derived from) a value that requires a GC root.
- 10->11: This is the standard decay route for interior values. The placement
pass will look for these to identify the base pointer for any use.
- 10->12: Address space 12 serves merely as a hint that a GC root is not
required. However, do note that the 11->12 decay is prohibited, since
pointers should generally be storable to a GC slot, even in this address space.
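
A compact sketch of the allowed decays (the value names are illustrative; the
reverse casts and the 11->12 cast are not permitted):
```
%jl_value_t = type opaque

define void @decays(%jl_value_t* %untracked, %jl_value_t addrspace(10)* %box) {
  ; 0 -> 10: promote an untracked pointer; the optimizer will not root it
  %as10 = addrspacecast %jl_value_t* %untracked to %jl_value_t addrspace(10)*
  ; 10 -> 11: the standard decay for taking interior pointers
  %as11 = addrspacecast %jl_value_t addrspace(10)* %box to %jl_value_t addrspace(11)*
  ; 10 -> 12: hint that this value need not be rooted for a call it is passed to
  %as12 = addrspacecast %jl_value_t addrspace(10)* %box to %jl_value_t addrspace(12)*
  ret void
}
```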

Now let us consider what constitutes a use:
- Loads whose loaded value is in one of the address spaces
- Stores of a value in one of the address spaces to a location
- Calls for which a value in one of the address spaces is an operand
- Calls using the jlcall ABI, for which the argument array contains a value in one of the address spaces
- Return instructions.

We explicitly allow load/stores and simple calls in address space 10/11. Elements of jlcall
argument arrays must always be in address space 10 (it is required by the ABI that
they are valid `jl_value_t*` pointers). The same is true for return instructions
(though note that struct return arguments are allowed to have any of the address
spaces). The only allowable use of an address space 11 pointer is to pass it to
a call (which must have an appropriately typed operand).
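
For illustration, a minimal sketch containing one of each use kind (the helper
`@take` and the slot argument are assumptions):
```
%jl_value_t = type opaque
declare void @take(%jl_value_t addrspace(10)*)

define %jl_value_t addrspace(10)* @uses(%jl_value_t addrspace(10)** %slot,
                                        %jl_value_t addrspace(10)* %v) {
  %loaded = load %jl_value_t addrspace(10)*, %jl_value_t addrspace(10)** %slot ; loaded value is tracked
  store %jl_value_t addrspace(10)* %v, %jl_value_t addrspace(10)** %slot       ; store of a tracked value
  call void @take(%jl_value_t addrspace(10)* %v)                               ; tracked call operand
  ret %jl_value_t addrspace(10)* %loaded                                       ; return of a tracked value
}
```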

Further, we disallow getelementptr in addrspace(10). This is because unless
the operation is a noop, the resulting pointer will not be validly storable
to a GC slot and may thus not be in this address space. If such a pointer
is required, it should be decayed to addrspace(11) first.
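
For example, a sketch of taking an interior pointer (the element type and
offset are illustrative): the getelementptr happens in addrspace(11) after the
decay, never directly in addrspace(10).
```
%jl_value_t = type opaque

define i8 addrspace(11)* @interior(%jl_value_t addrspace(10)* %box, i64 %offset) {
  %derived = addrspacecast %jl_value_t addrspace(10)* %box to %jl_value_t addrspace(11)*
  %bytes   = bitcast %jl_value_t addrspace(11)* %derived to i8 addrspace(11)*
  %field   = getelementptr i8, i8 addrspace(11)* %bytes, i64 %offset
  ret i8 addrspace(11)* %field
}
```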

Lastly, we disallow inttoptr/ptrtoint instructions in these address spaces.
Having these instructions would mean that some i64 values are really gc tracked.
This is problematic, because it breaks the stated requirement that we're able
to identify gc-relevant pointers. This invariant is enforced using the LLVM
"non-integral pointers" feature, which is new in LLVM 5.0. It prohibits the
optimizer from making optimizations that would introduce these operations. Note
we can still insert static constants at JIT time by using inttoptr in address
space 0 and then decaying to the appropriate address space afterwards.
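
For example, a sketch of materializing a known object address at JIT time (the
numeric address is a placeholder): the `inttoptr` stays in address space 0, and
only the subsequent cast enters the tracked address space.
```
%jl_value_t = type opaque

define %jl_value_t addrspace(10)* @constant_box() {
  %raw = inttoptr i64 305419896 to %jl_value_t*           ; placeholder object address
  %box = addrspacecast %jl_value_t* %raw to %jl_value_t addrspace(10)*
  ret %jl_value_t addrspace(10)* %box
}
```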

### Supporting ccall
One important aspect missing from the discussion so far is the handling of
`ccall`. `ccall` has the peculiar feature that the location and scope of a use
do not coincide. As an example consider:
```
A = randn(1024)
ccall(:foo, Void, (Ptr{Float64},), A)
```
In lowering, the compiler will insert a conversion from the array to the
pointer which drops the reference to the array value. However, we of course
need to make sure that the array does stay alive while we're doing the ccall.
To understand how this is done, first recall the lowering of the above code:
```
return $(Expr(:foreigncall, :(:foo), Void, svec(Ptr{Float64}), :($(Expr(:foreigncall, :(:jl_array_ptr), Ptr{Float64}, svec(Any), :(A), 0))), :(A)))
```
The last `:(A)` is an extra argument list inserted during lowering that informs
the code generator which Julia-level values need to be kept alive for the
duration of this ccall. We then take this information and represent it in an
"operand bundle" at the IR level. An operand bundle is essentially a fake use
that is attached to the call site. At the IR level, this looks like so:
```
call void inttoptr (i64 ... to void (double*)*)(double* %5) [ "jl_roots"(%jl_value_t addrspace(10)* %A) ]
```
The GC root placement pass will treat the jl_roots operand bundle as if it were
a regular operand. However, as a final step, after the gc roots are inserted,
it will drop the operand bundle to avoid confusing codegen.
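
For illustration, once any required roots have been inserted, the same call is
left without the bundle (a sketch, with the function pointer elided as above):
```
call void inttoptr (i64 ... to void (double*)*)(double* %5)
```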

### Supporting pointer_from_objref
`pointer_from_objref` is special because it requires the user to take explicit
control of GC rooting. By our above invariants, this function is illegal,
because it performs an address space cast from 10 to 0. However, it can be useful
in certain situations, so we provide a special intrinsic:
```
declare %jl_value_t* @julia.pointer_from_objref(%jl_value_t addrspace(10)*)
```
which is lowered to the corresponding address space cast after gc root lowering.
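
As a sketch (the value `%obj` is illustrative), a call to the intrinsic and the
plain address space cast it becomes once GC root lowering has run:
```
%p = call %jl_value_t* @julia.pointer_from_objref(%jl_value_t addrspace(10)* %obj)
; ... which is later lowered to:
%p = addrspacecast %jl_value_t addrspace(10)* %obj to %jl_value_t*
```
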
Do note however that by using this intrinsic, the caller assumes all responsibility
for making sure that the value in question is rooted. Further, this intrinsic is
not considered a use, so the GC root placement pass will not provide a GC root
for the value passed to it. As a result, the external rooting must be arranged while the
value is still tracked by the system. I.e. it is not valid to attempt to use the
result of this operation to establish a global root - the optimizer may have
already dropped the value.
2 changes: 1 addition & 1 deletion src/Makefile
@@ -53,7 +53,7 @@ endif
LLVMLINK :=

ifeq ($(JULIACODEGEN),LLVM)
SRCS += codegen jitlayers disasm debuginfo llvm-simdloop llvm-ptls llvm-gcroot cgmemmgr
SRCS += codegen jitlayers disasm debuginfo llvm-simdloop llvm-ptls llvm-gcroot llvm-late-gc-lowering llvm-lower-handlers llvm-gc-invariant-verifier cgmemmgr
FLAGS += -I$(shell $(LLVM_CONFIG_HOST) --includedir)
LLVM_LIBS := all
ifeq ($(USE_POLLY),1)
55 changes: 43 additions & 12 deletions src/ccall.cpp
@@ -459,7 +459,9 @@ static Value *llvm_type_rewrite(
// sizes.
Value *from;
Value *to;
#if JL_LLVM_VERSION >= 30600
#if JL_LLVM_VERSION >= 40000
const DataLayout &DL = jl_data_layout;
#elif JL_LLVM_VERSION >= 30600
const DataLayout &DL = jl_ExecutionEngine->getDataLayout();
#else
const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();
@@ -485,8 +487,8 @@ static Value *runtime_apply_type(jl_value_t *ty, jl_unionall_t *unionall, jl_cod
args[0] = literal_pointer_val(ty);
args[1] = literal_pointer_val((jl_value_t*)ctx->linfo->def->sig);
args[2] = builder.CreateInBoundsGEP(
LLVM37_param(T_pjlvalue)
emit_bitcast(ctx->spvals_ptr, T_ppjlvalue),
LLVM37_param(T_prjlvalue)
emit_bitcast(ctx->spvals_ptr, T_pprjlvalue),
ConstantInt::get(T_size, sizeof(jl_svec_t) / sizeof(jl_value_t*)));
return builder.CreateCall(prepare_call(jlapplytype_func), makeArrayRef(args));
}
@@ -639,7 +641,7 @@ static Value *julia_to_native(Type *to, bool toboxed, jl_value_t *jlto, jl_union
// We're passing Any
if (toboxed) {
assert(!byRef); // don't expect any ABI to pass pointers by pointer
return boxed(jvinfo, ctx);
return maybe_decay_untracked(boxed(jvinfo, ctx));
}
assert(jl_is_datatype(jlto) && julia_struct_has_layout((jl_datatype_t*)jlto, jlto_env));

@@ -1208,7 +1210,9 @@ static jl_cgval_t mark_or_box_ccall_result(Value *result, bool isboxed, jl_value
Value *runtime_dt = runtime_apply_type(rt, unionall, ctx);
// TODO: is this leaf check actually necessary, or is it structurally guaranteed?
emit_leafcheck(runtime_dt, "ccall: return type must be a leaf DataType", ctx);
#if JL_LLVM_VERSION >= 30600
#if JL_LLVM_VERSION >= 40000
const DataLayout &DL = jl_data_layout;
#elif JL_LLVM_VERSION >= 30600
const DataLayout &DL = jl_ExecutionEngine->getDataLayout();
#else
const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();
@@ -1306,7 +1310,7 @@ std::string generate_func_sig()
#else
paramattrs.push_back(AttributeSet::get(jl_LLVMContext, 1, retattrs));
#endif
fargt_sig.push_back(PointerType::get(lrt, 0));
fargt_sig.push_back(PointerType::get(lrt, AddressSpace::Derived));
sret = 1;
prt = lrt;
}
@@ -1349,6 +1353,8 @@ std::string generate_func_sig()
}

t = julia_struct_to_llvm(tti, unionall_env, &isboxed);
if (isboxed)
t = T_prjlvalue;
if (t == NULL || t == T_void) {
std::stringstream msg;
msg << "ccall: the type of argument ";
@@ -1369,7 +1375,7 @@ std::string generate_func_sig()
pat = t;
}
else if (byRef) {
pat = PointerType::get(t, 0);
pat = PointerType::get(t, AddressSpace::Derived);
}
else {
pat = abi->preferred_llvm_type((jl_datatype_t*)tti, false);
@@ -1457,6 +1463,8 @@ static const std::string verify_ccall_sig(size_t nargs, jl_value_t *&rt, jl_valu
lrt = julia_struct_to_llvm(rt, unionall_env, &retboxed);
if (lrt == NULL)
return "ccall: return type doesn't correspond to a C type";
else if (retboxed)
lrt = T_prjlvalue;

// is return type fully statically known?
if (unionall_env == NULL) {
@@ -1644,8 +1652,16 @@ static jl_cgval_t emit_ccall(jl_value_t **args, size_t nargs, jl_codectx_t *ctx)
ary = emit_unbox(largty, emit_expr(argi, ctx), tti);
}
JL_GC_POP();
return mark_or_box_ccall_result(emit_bitcast(ary, lrt),
retboxed, rt, unionall, static_rt, ctx);
if (lrt != T_prjlvalue) {
return mark_or_box_ccall_result(
emit_bitcast(emit_pointer_from_objref(
emit_bitcast(ary, T_prjlvalue)), lrt),
retboxed, rt, unionall, static_rt, ctx);
} else {
return mark_or_box_ccall_result(maybe_decay_untracked(
emit_bitcast(ary, lrt)),
retboxed, rt, unionall, static_rt, ctx);
}
}
else if (is_libjulia_func(jl_cpu_pause)) {
// Keep in sync with the julia_threads.h version
@@ -1961,6 +1977,7 @@ jl_cgval_t function_sig_t::emit_a_ccall(
ai + 1, ctx, &needStackRestore);
bool issigned = jl_signed_type && jl_subtype(jargty, (jl_value_t*)jl_signed_type);
if (byRef) {
v = decay_derived(v);
// julia_to_native should already have done the alloca and store
assert(v->getType() == pargty);
}
@@ -1976,6 +1993,13 @@ jl_cgval_t function_sig_t::emit_a_ccall(
}
v = julia_to_address(largty, jargty_in_env, unionall_env, arg,
ai + 1, ctx, &needStackRestore);
if (isa<UndefValue>(v)) {
JL_GC_POP();
return jl_cgval_t();
}
// A bit of a hack, but we're trying to get rid of this feature
// anyway.
v = emit_bitcast(emit_pointer_from_objref(v), pargty);
assert((!toboxed && !byRef) || isa<UndefValue>(v));
}

@@ -2003,7 +2027,7 @@ jl_cgval_t function_sig_t::emit_a_ccall(
literal_pointer_val((jl_value_t*)rt));
sretboxed = true;
}
argvals[0] = emit_bitcast(result, fargt_sig.at(0));
argvals[0] = emit_bitcast(decay_derived(result), fargt_sig.at(0));
}

Instruction *stacksave = NULL;
@@ -2091,9 +2115,11 @@ jl_cgval_t function_sig_t::emit_a_ccall(
// Mark GC use before **and** after the ccall to make sure the arguments
// are alive during the ccall even if the function called is `noreturn`.
mark_gc_uses(gc_uses);
OperandBundleDef OpBundle("jl_roots", gc_uses);
// the actual call
Value *ret = builder.CreateCall(prepare_call(llvmf),
ArrayRef<Value*>(&argvals[0], nargs + sret));
ArrayRef<Value*>(&argvals[0], nargs + sret),
ArrayRef<OperandBundleDef>(&OpBundle, gc_uses.empty() ? 0 : 1));
((CallInst*)ret)->setAttributes(attributes);

if (cc != CallingConv::C)
@@ -2135,6 +2161,9 @@ jl_cgval_t function_sig_t::emit_a_ccall(
}
else {
Type *jlrt = julia_type_to_llvm(rt, &jlretboxed); // compute the real "julian" return type and compute whether it is boxed
if (jlretboxed) {
jlrt = T_prjlvalue;
}
if (type_is_ghost(jlrt)) {
return ghostValue(rt);
}
@@ -2150,7 +2179,9 @@ jl_cgval_t function_sig_t::emit_a_ccall(
Value *strct = emit_allocobj(ctx, rtsz, runtime_bt);
int boxalign = jl_gc_alignment(rtsz);
#ifndef JL_NDEBUG
#if JL_LLVM_VERSION >= 30600
#if JL_LLVM_VERSION >= 40000
const DataLayout &DL = jl_data_layout;
#elif JL_LLVM_VERSION >= 30600
const DataLayout &DL = jl_ExecutionEngine->getDataLayout();
#else
const DataLayout &DL = *jl_ExecutionEngine->getDataLayout();