Skip to content

Existential Structs

David Tarditi edited this page Jul 8, 2023 · 6 revisions

Update: this has been implemented and merged as of https://github.com/checkedc/checkedc-llvm-project/issues/679

Acknowledgements: Michael Hicks suggested the initial design of existential structs, on which this proposal is based.


This design document describes existential types in CheckedC (specifically, existentially-quantified structs). The goal of existential structs is to make certain idioms (e.g. callbacks) that currently require unsafe void * casts in C, safe in CheckedC.

Example 1: Callbacks

See https://github.com/checkedc/checkedc/blob/master/tests/exist_types_runtest/callbacks.c

Example 2: Abstract Sets

We start by showing an example of how existential structs can be used to keep implementation details hidden from the user of an ADT. Specifically, we implement a set of ints where the representation type remains abstract.

Creating Existentials (_Pack)

Creating an existential struct is a three-step process.

  1. Define a corresponding generic struct.
struct IntSet _For_any(T) {
  T *rep;
  void (*add)(T *rep, int x);
  void (*rem)(T *rep, int x);
  int (*find)(T *rep, int x);
}

The definition above says that an IntSet is anything with an underlying representation type T (about which we know nothing), together with a field rep of type T * and operations to manipulate the representation.

  1. Define an instance of the generic type. The created instance is referred to as the witness for the existential. Notice that so far we have only used generic structs: the existential struct is introduced in step 3 below.
// A set of ints represented as a linked list.
struct List {
  int head;
  struct List *tail;
};

int find(struct List **elems, int x) {
  struct List *curr = *elems;
  while (curr) {
    if (curr->head == x) return 1;
    curr = curr->tail;
  }
  return 0;
}

void add(struct List **elems, int x) {
  if (find(elems, x)) return;
  struct List *node = malloc(sizeof(struct List));
  node->head = x;
  node->tail = *elems;
  *elems = node;
}

struct IntSet<struct List *> listSet;
listSet.rep = 0;
listSet.find = find;
listSet.add = add;
listSet.rem = rem;
  1. Invoke a pack builtin function that hides the type of the witness, giving the now-packed witness an existential type.
_Exists(T, struct IntSet<T>) set = _Pack(listSet, _Exists(T, struct IntSet<T>), struct List *);

The invocation pack(listSet, _Exists(T, struct IntSet<T>), struct List *) of pack can be read as "pack listSet as an _Exists(T, struct IntSet<T>) by checking that listSet has type struct IntSet<T>, where T is replaced by struct List *".

In general, to typecheck a pack operation pack(expr, exist_type, subst_type) we:

  1. Compute the type ExprType of expr.
  2. Verify that exist_type is an existential type _Exists(T, InnerType).
  3. Check that ExprType is equal to InnerType [subst_type/T], that is, the type of the expr should match the inner type, modulo the substitution "subst_type replaces all bound instances of T in InnerType".

For now, the call to pack will require re-stating the type of the existential, and being explicit about the type being substituted during the pack operation. If needed, we will implement some form of type inference to reduce the typing burden on users.

Using Existentials (_Unpack)

This doesn't work in the current implementation because of a couple of bugs. See https://github.com/checkedc/checkedc-llvm-project/issues/657 for status Once we have a value of type _Exists(T, IntSet<T>), how do we use it?

First, note that we cannot access any of set's fields directly:

struct List **elems = set.rep; // error: 'elems' is an existential struct, so you need to unpack it before accessing its fields
set.add(set.rep, 42); // error: 'elems' is an existential struct, so you need to unpack it before accessing its fields

We can still copy set, pass it to functions, etc. (we just can't look "inside" of it):

_Exists<struct IntSet<_> > set2 = set; // allowed
void setOp(_Exists(T, struct IntSet<T>) setArg) {}
setOp(set); // allowed

To access the set's fields, we first need to unpack it.

_Unpack (T) struct IntSet<T> intSet = set;
T rep2; // error: `T` is an incomplete type
T *rep = intSet.rep;
(*rep)->head; // error: `T` has no field `head` (even though the underlying type `struct List *` does)
intSet.add(intSet.rep, 42);
int found = intSet.find(intSet.rep, 42); // found == 1
intSet.rem(intSet.rep, 42); 
int found2 = intSet.find(intSet.rep, 42); // found2 == 0

In the snippet above, T is a user-written type variable whose scope starts after the unpack and extends until the end of the current scope:

{
  _Unpack (T) struct IntSet<T> set1 = set;
  T *rep = intSet.rep; // allowed since T is in scope
}
T *rep; // error: T is not in scope 

Multiple unpacks produce incompatible types:

unpack (T) struct IntSet<T> intSet = set;
unpack (U) struct IntSet<U> intSet2 = set;
intSet.add(intSet.rep, 42); // allowed: types are consistent
intSet.add(intSet2.rep, 42); // error: expected argument of type `T *` but got `U *`

Example 3: Closures

Closures can be represented via a mixture of existential and generic structs.

struct Fn _ForAny(Env, From, To) {
  Env *env;
  To*(*eval)(Env *env, From *from)
};

struct ComposeEnv _For_any(Env1, Env2, A, B, C) {
  struct Fn<Env1, A, B> fn1;
  struct Fn<Env2, B, C> fn2;
};

_For_any(Env1, Env2, A, B, C) C* composeEval(struct ComposeEnv<Env1, Env2, A, B, C> *env, A *a) {
  struct Fn<Env, A, B> fn1 = env->fn1;
  struct Fn<Env, B, C> fn2 = env->fn2;
  B *b = fn1->eval(fn1->env, a);
  C *c = fn2->eval(fn2->env, b);
  return c;
}

_For_any(A, B, C) _Exists<struct Fn<_, A, C> > compose(_Exists<struct Fn<_, A, B> > f1, _Exists<struct Fn<_, B, C> > f2) {
  _Unpack (E1) struct Fn<E1, A, B> unpackF1 = f1;
  _Unpack (E2) struct Fn<E2, B, C> unpackF2 = f2;

  struct ComposeEnv<E1, E2, A, B, C> composeEnv;
  composeEnv.fn1 = unpackF1;
  composeEnv.fn2 = unpackF2.

  struct Fn<struct ComposeEnv<E1, E2, A, B, C>, A, C> composeFn;
  composeFn.env = composeEnv;
  composeFn.eval = composeEval<E1, E2, A, B, C>; // TODO: this step currently fails with the error below (need to fix)
  // Assertion failed: getOperand(0)->getType() == cast<PointerType>(getOperand(1)->getType())->getElementType() && "Ptr
  // must be a pointer to Val type!", file C:\cygwin64\home\t-abniet\src\llvm\lib\IR\Instructions.cpp, line 1210

  _Exists(T, struct Fn<T, A, C>) res = _Pack(composeFn, _Exists(T, struct Fn<T, A, C>), struct ComposeEnv<E1, E2, A, B, C>);
  return res;
}

pack vs unpack

pack and unpack are quite different.

  • pack is a builtin function, and doesn't introduce any new bindings.
_Exists(T, struct IntSet<T>) set2 = pack(foo, _Exists(T, struct IntSet<T>), struct Foo); // the type of `set2` is `_Exists(T, struct IntSet<T>)`

We could also just call pack without assigning its result: e.g. when returning from a function.

_Exists(T, struct IntSet<T>) getSet() {
  ...
  return _Pack(listSet, _Exists(T, struct IntSet<T>), struct Foo);
}
  • unpack generates two bindings: one for the fresh type variable, and one for the unpacked struct. unpack is a new type of declarator, not a function.
_Unpack (T) struct IntSet<T>) intSet = set; // the type of `intSet` is `IntSet<T>`
T *rep = intSet.rep; // T is available until the end of the current scope

Possible Extension: Multi-Argument Existentials

We have discussed syntax for supporting existentials with multiple arguments. These will not be included in the feature prototype, but we do think they'll be useful because there are likely to be structs with multiple fields that need to be made private.

_Exists(T, U, V, struct Foo<T, U, V>) foo;

_Pack(fooImpl, Exists(T, U, V, struct Foo<T, U, V>), (int, char, int)); // fooImpl should have type 'struct FOo<int, char, int>'

_Unpack (T, U, V) struct Foo<T, U, V> fooAbs = foo; // foo should have type _Exists(O, P, Q, struct Foo<O, P, Q>)