Implement support for union types.
Closes dotnet#275.

Types with explicit layout may have fields that overlap one another (aka union types). Before this change we'd fail to compile any method that referenced one of these union types. We now model them as best we can using LLVM types.

LLVM doesn't provide a strong way to describe unions. Instead we generally provide a byte array that covers the extent of the union as a representative placeholder. That means some fields have no exact LLVM type counterpart. Downstream consumers cope with this as follows: for any field within the extent of a set of overlapping fields, we omit that field's handle from the `FieldIndexMap`, so that when a ldfld or similar looks up which LLVM type index to use, it comes up empty-handed. We already had a fall-back path that simply uses the EE-provided field offset when this happens. That path now kicks in for accesses to overlapping fields, and subsequent pointer casts then fix up the types properly.
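The lookup-with-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the reader's actual code: `FieldHandle`, `FieldAccess`, and `resolveFieldAccess` are hypothetical names, and a plain `std::unordered_map` stands in for the real `FieldIndexMap`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for CORINFO_FIELD_HANDLE.
using FieldHandle = int;

// Describes how a field access should be addressed.
struct FieldAccess {
  bool UseStructIndex; // true: Value is an index into the LLVM struct type
  uint32_t Value;      // struct index, or a raw byte offset when false
};

// Fields that belong to an overlap set are deliberately absent from the
// map, so a failed lookup means "address by the EE-provided byte offset
// and let a later pointer cast fix up the type".
FieldAccess resolveFieldAccess(
    const std::unordered_map<FieldHandle, uint32_t> &FieldIndexMap,
    FieldHandle Field, uint32_t EEFieldOffset) {
  auto Entry = FieldIndexMap.find(Field);
  if (Entry != FieldIndexMap.end())
    return {true, Entry->second};
  return {false, EEFieldOffset};
}
```

In this sketch a field with a map entry is addressed by struct index, while an overlapping field (no entry) falls back to its byte offset.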

However, things are not quite that simple. We also want the LLVM type to fully describe the location of any GC references within the type, and it is valid for GC references to appear in one of these overlap sets. GC references must fully overlap one another and can't be safely overlapped with non-GC data (see ECMA-335, II.10.7). This means the extents of the GC references in a union partition the union into ranges of either non-GC data or GC references. We report the former using the byte arrays, and the latter as object references. Thus for instance a type `C` like
```
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct A
{
    [FieldOffset(0)] public string Name;
    [FieldOffset(8)] public int Size;
}

[StructLayout(LayoutKind.Explicit)]
public struct C
{
    [FieldOffset(0)] public A X;
    [FieldOffset(0)] public string Name;
    [FieldOffset(8)] public int Size;
}
```
would on x64 be described as
```
type <{ %System.Object addrspace(1)*, [8 x i8] }>
```
This special handling for GC references is strictly only necessary for value types (since GC reference locations for value types on the stack must be reported at GC safe points) but for uniformity we do it for all types. We also continue to double-check value type GC locations against the GC pointer info provided by the EE.
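The partitioning of an overlap set into GC-reference and non-GC ranges can be sketched without LLVM as below. This is an illustrative sketch under stated assumptions, not the code from this commit: `LeafField`, `Segment`, and `partitionOverlap` are hypothetical names, fields are assumed to be already flattened to primitives, and sizes are passed in directly rather than obtained from a `DataLayout`. Note that the sketch moves its cursor to the end of each reference (`GcOffset + PointerSize`), so any non-GC preamble is accounted for as well.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplified record for one flattened (primitive) field.
struct LeafField {
  uint32_t Offset; // byte offset within the enclosing type
  uint32_t Size;   // size in bytes
  bool IsGcRef;    // true if the field is a GC reference
};

// One piece of the representative layout: either a GC reference or a
// run of opaque bytes (modelled in LLVM as [Size x i8]).
struct Segment {
  bool IsGcRef;
  uint32_t Size;
};

std::vector<Segment> partitionOverlap(std::vector<LeafField> Fields,
                                      uint32_t PointerSize) {
  assert(!Fields.empty() && "expected at least one field");
  std::sort(Fields.begin(), Fields.end(),
            [](const LeafField &A, const LeafField &B) {
              return A.Offset < B.Offset;
            });

  // Find the end of the overlap region and the unique GC reference offsets.
  const uint32_t First = Fields.front().Offset;
  uint32_t End = First;
  std::vector<uint32_t> GcOffsets;
  for (const LeafField &F : Fields) {
    End = std::max(End, F.Offset + F.Size);
    if (F.IsGcRef && (GcOffsets.empty() || GcOffsets.back() != F.Offset))
      GcOffsets.push_back(F.Offset);
  }

  // Emit byte runs between references, and one segment per reference.
  std::vector<Segment> Result;
  uint32_t Cursor = First;
  for (uint32_t GcOffset : GcOffsets) {
    if (GcOffset > Cursor)
      Result.push_back({false, GcOffset - Cursor});
    Result.push_back({true, PointerSize});
    Cursor = GcOffset + PointerSize;
  }

  // Cover any trailing non-GC bytes.
  if (Cursor < End)
    Result.push_back({false, End - Cursor});
  return Result;
}
```

For the type `C` above, assuming `A` is padded out to 16 bytes on x64 (as the `[8 x i8]` in the description implies), feeding in the flattened fields of `X`, `Name`, and `Size` with a pointer size of 8 yields one GC reference segment followed by one 8-byte non-GC segment.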
AndyAyersMS committed May 7, 2015
1 parent b46451c commit 8e5b4e3
Showing 2 changed files with 188 additions and 32 deletions.
31 changes: 31 additions & 0 deletions include/Reader/readerir.h
@@ -1322,6 +1322,11 @@ class GenIR : public ReaderBase {
/// \returns LLVM type that models the built-in string type.
llvm::Type *getBuiltInStringType();

/// Get the LLVM type for the built-in object type.
///
/// \returns LLVM type that models the built-in object type.
llvm::Type *getBuiltInObjectType();

/// Get the LLVM type for an array of references.
///
/// Used when we know that some type must be an array but our local
@@ -1346,6 +1351,32 @@ class GenIR : public ReaderBase {
CorInfoType ElementCorType,
CORINFO_CLASS_HANDLE ElementClassHandle);

/// Add fields of a type to the field vector, expanding structures
/// (recursively) to the types they contain.
///
/// \param Fields vector of offset, type info for overlapping fields.
/// \param Offset offset of the new type to add.
/// \param Ty the new type to add.
void
addFieldsRecursively(std::vector<std::pair<uint32_t, llvm::Type *>> &Fields,
uint32_t Offset, llvm::Type *Ty);

/// Given a set of overlapping primitive typed fields, determine the set of
/// representative fields to use to describe these in an LLVM type and add
/// them to the field collection for that type. Ensure that any GC
/// references are properly described. Non-GC fields will be represented by
/// suitably sized byte arrays.
///
/// \param OverlapFields [in, out] On input, vector of offset, type info for
/// overlapping fields. Empty on exit.
/// \param Fields [in, out] On input, vector of field types found so
/// far for the ultimate type being
/// constructed. On exit, extended with
/// representative fields for the overlap set.
void createOverlapFields(
std::vector<std::pair<uint32_t, llvm::Type *>> &OverlapFields,
std::vector<llvm::Type *> &Fields);

private:
LLILCJitContext *JitContext;
ABIInfo *TheABIInfo;
189 changes: 157 additions & 32 deletions lib/Reader/readerir.cpp
@@ -1418,16 +1418,75 @@ Type *GenIR::getClassType(CORINFO_CLASS_HANDLE ClassHandle, bool IsRefClass,
// default comparator here.
std::sort(DerivedFields.begin(), DerivedFields.end());

// If we find overlapping fields, we'll stash them here so we can look
// at them collectively.
std::vector<std::pair<uint32_t, Type *>> OverlappingFields;

// Now walk the fields in increasing offset order, adding
// them and padding to the struct as we go.
for (const auto &FieldPair : DerivedFields) {
const uint32_t FieldOffset = FieldPair.first;
CORINFO_FIELD_HANDLE FieldHandle = FieldPair.second;

// Bail out for now if we see a union type.
// Prepare to add this field to the collection.
//
// If this field is a ref class reference, we don't need the full
// details on the referred-to class, and asking for the details here
// causes trouble with certain recursive type graphs, for example:
//
// class A { B b; }
// class B : extends A { int c; }
//
// We need to know the size of A before we can finish B. So we can't
// ask for B's details while filling out A.
CORINFO_CLASS_HANDLE FieldClassHandle;
CorInfoType CorInfoType = getFieldType(FieldHandle, &FieldClassHandle);
bool GetFieldDetails = (CorInfoType != CORINFO_TYPE_CLASS);
Type *FieldTy = getType(CorInfoType, FieldClassHandle, GetFieldDetails);
// If we don't get the details now, make sure to ask
// for them later.
if (!GetFieldDetails) {
DeferredDetailClasses.push_back(FieldClassHandle);
}

// If we see an overlapping field, we need to handle it specially.
if (FieldOffset < ByteOffset) {
ASSERT(IsUnion);
throw NotYetImplementedException("union types");
assert(IsUnion && "unexpected overlapping fields");
// TypedByref and String get special treatment which we will skip
// in processing overlaps.
assert(!IsTypedByref && "No overlap expected in this type");
assert(!IsString && "No overlap expected in this type");
if (OverlappingFields.empty()) {
// The previously processed field is also part of the overlap
// set. Back it out of the main field collection and add it to the
// overlap collection instead.
Type *PreviousFieldTy = Fields.back();
Fields.pop_back();
uint32_t PreviousSize =
DataLayout->getTypeSizeInBits(PreviousFieldTy) / 8;
uint32_t PreviousOffset = ByteOffset - PreviousSize;
addFieldsRecursively(OverlappingFields, PreviousOffset,
PreviousFieldTy);
}

// Add the current field to the overlap set.
uint32_t FieldSize = DataLayout->getTypeSizeInBits(FieldTy) / 8;
OverlappingFields.push_back(std::make_pair(FieldOffset, FieldTy));
addFieldsRecursively(OverlappingFields, FieldOffset, FieldTy);

// Determine new extent of the overlap region.
ByteOffset = std::max(ByteOffset, FieldOffset + FieldSize);

// Defer further processing until we find the end of the overlap
// region.
continue;
}

// This new field begins after any existing field. If we have an overlap
// set in the works, we need to process it now.
if (!OverlappingFields.empty()) {
createOverlapFields(OverlappingFields, Fields);
assert(OverlappingFields.empty());
}

// Account for padding by injecting a field.
@@ -1438,6 +1497,9 @@ Type *GenIR::getClassType(CORINFO_CLASS_HANDLE ClassHandle, bool IsRefClass,
ByteOffset += DataLayout->getTypeSizeInBits(PadTy) / 8;
}

// We should be at the field offset now.
ASSERT(FieldOffset == ByteOffset);

// Validate or record this field's index in the map.
auto FieldMapEntry = FieldIndexMap->find(FieldHandle);

@@ -1450,29 +1512,6 @@ Type *GenIR::getClassType(CORINFO_CLASS_HANDLE ClassHandle, bool IsRefClass,
(*FieldIndexMap)[FieldHandle] = Fields.size();
}

// Add this field to the collection.
//
// If this field is a ref class reference, we don't need the full
// details on the referred-to class, and asking for the details here
// causes trouble with certain recursive type graphs, for example:
//
// class A { B b; }
// class B : extends A { int c; }
//
// We need to know the size of A before we can finish B. So we can't
// ask for B's details while filling out A.

ASSERT(FieldOffset == ByteOffset);
CORINFO_CLASS_HANDLE FieldClassHandle;
CorInfoType CorInfoType = getFieldType(FieldHandle, &FieldClassHandle);
bool GetFieldDetails = (CorInfoType != CORINFO_TYPE_CLASS);
Type *FieldTy = getType(CorInfoType, FieldClassHandle, GetFieldDetails);
// If we don't get the details now, make sure to ask
// for them later.
if (!GetFieldDetails) {
DeferredDetailClasses.push_back(FieldClassHandle);
}

// The first field of a typed byref is really GC (interior)
// pointer. It's described in metadata as a pointer-sized integer.
// Tweak it back...
@@ -1494,8 +1533,11 @@ Type *GenIR::getClassType(CORINFO_CLASS_HANDLE ClassHandle, bool IsRefClass,
ByteOffset += DataLayout->getTypeSizeInBits(FieldTy) / 8;
}

// We should have detected unions up above and bailed.
ASSERT(!IsUnion);
// If we have one final overlap set in the works, we need to process it now.
if (!OverlappingFields.empty()) {
createOverlapFields(OverlappingFields, Fields);
assert(OverlappingFields.empty());
}

// If this is a value class, account for any additional end
// padding that the runtime sees fit to add.
@@ -1614,11 +1656,10 @@ Type *GenIR::getClassType(CORINFO_CLASS_HANDLE ClassHandle, bool IsRefClass,
}
}

const bool IsGCPointer = FieldTy->isPointerTy() &&
isManagedPointerType(cast<PointerType>(FieldTy));

// LLVM's type and the runtime must agree here.
ASSERT(ExpectGCPointer == IsGCPointer);
const bool IsGCPointer = isManagedPointerType(FieldTy);
assert((ExpectGCPointer == IsGCPointer) &&
"llvm type incorrectly describes location of gc references");
}
}

@@ -1634,6 +1675,84 @@ Type *GenIR::getClassType(CORINFO_CLASS_HANDLE ClassHandle, bool IsRefClass,
return ResultTy;
}

void GenIR::addFieldsRecursively(
std::vector<std::pair<uint32_t, llvm::Type *>> &Fields, uint32_t Offset,
llvm::Type *Ty) {
StructType *StructTy = dyn_cast<StructType>(Ty);
if (StructTy != nullptr) {
const DataLayout *DataLayout = JitContext->EE->getDataLayout();
for (Type *SubTy : StructTy->subtypes()) {
addFieldsRecursively(Fields, Offset, SubTy);
Offset += DataLayout->getTypeSizeInBits(SubTy) / 8;
}
} else {
Fields.push_back(std::make_pair(Offset, Ty));
}
}

void GenIR::createOverlapFields(
std::vector<std::pair<uint32_t, llvm::Type *>> &OverlapFields,
std::vector<llvm::Type *> &Fields) {

// Prepare to create and measure types.
LLVMContext &LLVMContext = *JitContext->LLVMContext;
const DataLayout *DataLayout = JitContext->EE->getDataLayout();

// Order the OverlapFields by offset.
std::sort(OverlapFields.begin(), OverlapFields.end());

// Walk the fields, accumulating the unique starting offsets of the gc
// references in increasing offset order.
std::vector<uint32_t> GcOffsets;
uint32_t OverlapEndOffset = 0;
for (const auto &OverlapField : OverlapFields) {
uint32_t Offset = OverlapField.first;
uint32_t Size = DataLayout->getTypeSizeInBits(OverlapField.second) / 8;
OverlapEndOffset = std::max(OverlapEndOffset, Offset + Size);
if (isManagedPointerType(OverlapField.second)) {
assert(((Offset % getPointerByteSize()) == 0) &&
"expect aligned gc pointers");
if (GcOffsets.empty()) {
GcOffsets.push_back(Offset);
} else {
uint32_t LastOffset = GcOffsets.back();
assert((Offset >= LastOffset) && "expect offsets to be sorted");
if (Offset > LastOffset) {
GcOffsets.push_back(Offset);
}
}
}
}

// Walk the GC reference offsets, creating representative fields.
uint32_t FirstOffset = OverlapFields.begin()->first;
uint32_t CurrentOffset = FirstOffset;
for (const auto &GcOffset : GcOffsets) {
assert((GcOffset >= CurrentOffset) && "expect offsets to be sorted");
uint32_t NonGcPreambleSize = GcOffset - CurrentOffset;
if (NonGcPreambleSize > 0) {
Type *NonGcTy =
ArrayType::get(Type::getInt8Ty(LLVMContext), NonGcPreambleSize);
Fields.push_back(NonGcTy);
}
Fields.push_back(getBuiltInObjectType());
CurrentOffset += getPointerByteSize();
}

// Create a trailing non-gc field if needed.
uint32_t OverlapSize = OverlapEndOffset - FirstOffset;
uint32_t CurrentOverlapSize = CurrentOffset - FirstOffset;
assert((CurrentOverlapSize <= OverlapSize) && "overlap size overflow");
if (CurrentOverlapSize < OverlapSize) {
uint32_t RemainingSize = OverlapSize - CurrentOverlapSize;
Type *NonGcTy = ArrayType::get(Type::getInt8Ty(LLVMContext), RemainingSize);
Fields.push_back(NonGcTy);
}

// Clear out the overlap fields as promised.
OverlapFields.clear();
}

Type *GenIR::getBoxedType(CORINFO_CLASS_HANDLE Class) {
assert(JitContext->JitInfo->isValueClass(Class));

@@ -2093,6 +2212,12 @@ void GenIR::createArrayOfReferenceType() {
ArrayOfReferenceType = getManagedPointerType(StructTy);
}

Type *GenIR::getBuiltInObjectType() {
CORINFO_CLASS_HANDLE ObjectClassHandle =
getBuiltinClass(CorInfoClassId::CLASSID_SYSTEM_OBJECT);
return getType(CORINFO_TYPE_CLASS, ObjectClassHandle);
}

Type *GenIR::getBuiltInStringType() {
CORINFO_CLASS_HANDLE StringClassHandle =
getBuiltinClass(CorInfoClassId::CLASSID_STRING);