-
Notifications
You must be signed in to change notification settings - Fork 186
Existential Structs
Update: this has been implemented and merged as of https://github.com/checkedc/checkedc-llvm-project/issues/679
Acknowledgements: Michael Hicks suggested the initial design of existential structs, on which this proposal is based.
This design document describes existential types in CheckedC (specifically, existentially-quantified structs). The goal of existential structs is to make certain idioms (e.g. callbacks) that currently require unsafe void *
casts in C, safe in CheckedC.
See https://github.com/checkedc/checkedc/blob/master/tests/exist_types_runtest/callbacks.c
We start by showing an example of how existential structs can be used to keep implementation details hidden from the user of an ADT. Specifically, we implement a set of ints where the representation type remains abstract.
Creating an existential struct is a three-step process.
- Define a corresponding generic struct.
struct IntSet _For_any(T) {
T *rep;
void (*add)(T *rep, int x);
void (*rem)(T *rep, int x);
int (*find)(T *rep, int x);
}
The definition above says that an IntSet
is anything with an underlying representation type T
(about which we know nothing), together with a field rep
of type T *
and operations to manipulate the representation.
- Define an instance of the generic type. The created instance is referred to as the witness for the existential. Notice that so far we have only used generic structs: the existential struct is introduced in step 3 below.
// A set of ints represented as a linked list.
struct List {
int head;
struct List *tail;
};
int find(struct List **elems, int x) {
struct List *curr = *elems;
while (curr) {
if (curr->head == x) return 1;
curr = curr->tail;
}
return 0;
}
void add(struct List **elems, int x) {
if (find(elems, x)) return;
struct List *node = malloc(sizeof(struct List));
node->head = x;
node->tail = *elems;
*elems = node;
}
struct IntSet<struct List *> listSet;
listSet.rep = 0;
listSet.find = find;
listSet.add = add;
listSet.rem = rem;
- Invoke a pack builtin function that hides the type of the witness, giving the now-packed witness an existential type.
_Exists(T, struct IntSet<T>) set = _Pack(listSet, _Exists(T, struct IntSet<T>), struct List *);
The invocation pack(listSet, _Exists(T, struct IntSet<T>), struct List *)
of pack
can be read as
"pack listSet
as an _Exists(T, struct IntSet<T>)
by checking that listSet
has type struct IntSet<T>
, where T
is replaced by struct List *
".
In general, to typecheck a pack operation pack(expr, exist_type, subst_type)
we:
- Compute the type
ExprType
ofexpr
. - Verify that
exist_type
is an existential type_Exists(T, InnerType)
. - Check that
ExprType
is equal toInnerType [subst_type/T]
, that is, the type of theexpr
should match the inner type, modulo the substitution "subst_type
replaces all bound instances ofT
inInnerType
".
For now, the call to pack will require re-stating the type of the existential, and being explicit about the type
being substituted during the pack
operation. If needed, we will implement some form of type inference to
reduce the typing burden on users.
This doesn't work in the current implementation because of a couple of bugs. See https://github.com/checkedc/checkedc-llvm-project/issues/657 for status
Once we have a value of type _Exists(T, IntSet<T>)
, how do we use it?
First, note that we cannot access any of set
's fields directly:
struct List **elems = set.rep; // error: 'elems' is an existential struct, so you need to unpack it before accessing its fields
set.add(set.rep, 42); // error: 'elems' is an existential struct, so you need to unpack it before accessing its fields
We can still copy set
, pass it to functions, etc. (we just can't look "inside" of it):
_Exists<struct IntSet<_> > set2 = set; // allowed
void setOp(_Exists(T, struct IntSet<T>) setArg) {}
setOp(set); // allowed
To access the set's fields, we first need to unpack it.
_Unpack (T) struct IntSet<T> intSet = set;
T rep2; // error: `T` is an incomplete type
T *rep = intSet.rep;
(*rep)->head; // error: `T` has no field `head` (even though the underlying type `struct List *` does)
intSet.add(intSet.rep, 42);
int found = intSet.find(intSet.rep, 42); // found == 1
intSet.rem(intSet.rep, 42);
int found2 = intSet.find(intSet.rep, 42); // found2 == 0
In the snippet above, T
is a user-written type variable whose scope starts after the unpack
and extends until the end of the current scope:
{
_Unpack (T) struct IntSet<T> set1 = set;
T *rep = intSet.rep; // allowed since T is in scope
}
T *rep; // error: T is not in scope
Multiple unpacks
produce incompatible types:
unpack (T) struct IntSet<T> intSet = set;
unpack (U) struct IntSet<U> intSet2 = set;
intSet.add(intSet.rep, 42); // allowed: types are consistent
intSet.add(intSet2.rep, 42); // error: expected argument of type `T *` but got `U *`
Closures can be represented via a mixture of existential and generic structs.
struct Fn _ForAny(Env, From, To) {
Env *env;
To*(*eval)(Env *env, From *from)
};
struct ComposeEnv _For_any(Env1, Env2, A, B, C) {
struct Fn<Env1, A, B> fn1;
struct Fn<Env2, B, C> fn2;
};
_For_any(Env1, Env2, A, B, C) C* composeEval(struct ComposeEnv<Env1, Env2, A, B, C> *env, A *a) {
struct Fn<Env, A, B> fn1 = env->fn1;
struct Fn<Env, B, C> fn2 = env->fn2;
B *b = fn1->eval(fn1->env, a);
C *c = fn2->eval(fn2->env, b);
return c;
}
_For_any(A, B, C) _Exists<struct Fn<_, A, C> > compose(_Exists<struct Fn<_, A, B> > f1, _Exists<struct Fn<_, B, C> > f2) {
_Unpack (E1) struct Fn<E1, A, B> unpackF1 = f1;
_Unpack (E2) struct Fn<E2, B, C> unpackF2 = f2;
struct ComposeEnv<E1, E2, A, B, C> composeEnv;
composeEnv.fn1 = unpackF1;
composeEnv.fn2 = unpackF2.
struct Fn<struct ComposeEnv<E1, E2, A, B, C>, A, C> composeFn;
composeFn.env = composeEnv;
composeFn.eval = composeEval<E1, E2, A, B, C>; // TODO: this step currently fails with the error below (need to fix)
// Assertion failed: getOperand(0)->getType() == cast<PointerType>(getOperand(1)->getType())->getElementType() && "Ptr
// must be a pointer to Val type!", file C:\cygwin64\home\t-abniet\src\llvm\lib\IR\Instructions.cpp, line 1210
_Exists(T, struct Fn<T, A, C>) res = _Pack(composeFn, _Exists(T, struct Fn<T, A, C>), struct ComposeEnv<E1, E2, A, B, C>);
return res;
}
pack
and unpack
are quite different.
-
pack
is a builtin function, and doesn't introduce any new bindings.
_Exists(T, struct IntSet<T>) set2 = pack(foo, _Exists(T, struct IntSet<T>), struct Foo); // the type of `set2` is `_Exists(T, struct IntSet<T>)`
We could also just call pack
without assigning its result: e.g. when returning from a function.
_Exists(T, struct IntSet<T>) getSet() {
...
return _Pack(listSet, _Exists(T, struct IntSet<T>), struct Foo);
}
-
unpack
generates two bindings: one for the fresh type variable, and one for the unpacked struct. unpack is a new type of declarator, not a function.
_Unpack (T) struct IntSet<T>) intSet = set; // the type of `intSet` is `IntSet<T>`
T *rep = intSet.rep; // T is available until the end of the current scope
We have discussed syntax for supporting existentials with multiple arguments.
These will not be included in the feature prototype, but we do think they'll be useful because
there are likely to be structs
with multiple fields that need to be made private.
_Exists(T, U, V, struct Foo<T, U, V>) foo;
_Pack(fooImpl, Exists(T, U, V, struct Foo<T, U, V>), (int, char, int)); // fooImpl should have type 'struct FOo<int, char, int>'
_Unpack (T, U, V) struct Foo<T, U, V> fooAbs = foo; // foo should have type _Exists(O, P, Q, struct Foo<O, P, Q>)
Checked C Wiki