Constructing an object of a derived type #741

Closed
josh11b opened this issue Aug 12, 2021 · 14 comments
Labels
leads question A question for the leads team

Comments

@josh11b
Contributor

josh11b commented Aug 12, 2021

Defining how construction works is the main (semantic) blocker to defining inheritance in Carbon that I'm aware of. We'd like to extend the "returned var" approach introduced in #257 to construction of derived types. I've previously circulated this Google doc with options. Note that this document predates our current approach of using struct literals rather than tuple literals in construction; I've tried to update the content when copying it into this issue.

Option 1: no base values, construct derived values directly

The simplest option is to construct values of type Derived directly, specifying values for all fields whether they are defined in Base or Derived. Link to option 1

fn BaseMembers(p: Int) -> {.x: Int, .y: String} {
  return {.x = ..., .y = ...};
}

// You wouldn't have this function if `Base` was abstract (equivalent to
// having pure virtual methods).
fn BaseConstructor(p: Int) -> Base {
  return BaseMembers(p);
  // struct converted to Base via return
}

fn DerivedConstructor(p: Int, q: Int) -> Derived {
  // Or use some sort of struct concatenation operator
  return {BaseMembers(p)..., .z = q};
  // struct converted to Derived via return
} 

Advantages:

  • Very simple
  • Never construct an intermediate value of type Base that may be abstract, eliminating concerns that pure virtual methods would be called.
  • vptr set only once
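
For comparison, here is a minimal C++ sketch of the behavior the last two advantages are contrasted with. It is C++ rather than Carbon, and the names `ObserveConstruction` and `log_entries` are made up for illustration. In C++ the vptr is set once per class in the hierarchy, so a virtual call made while the `Base` constructor runs dispatches to `Base`'s override:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Global log, used only to keep the sketch short.
std::vector<std::string> log_entries;

struct Base {
  // Virtual call during Base construction: dispatches to Base::Name.
  Base() { log_entries.push_back("in Base ctor: " + Name()); }
  virtual ~Base() = default;
  virtual std::string Name() const { return "Base"; }
};

struct Derived : Base {
  // By now the object's dynamic type is Derived.
  Derived() { log_entries.push_back("in Derived ctor: " + Name()); }
  std::string Name() const override { return "Derived"; }
};

// Constructs a Derived and returns what each constructor observed.
std::vector<std::string> ObserveConstruction() {
  log_entries.clear();
  Derived d;
  assert(log_entries.size() == 2);  // one entry per constructor
  return log_entries;
}
```

If `Name` were pure virtual in `Base`, the call in the `Base` constructor would be undefined behavior in C++, which is the kind of concern option 1 avoids by never producing an intermediate `Base` value.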

Disadvantages:

  • No way to make members of Base private, including their names.
  • Pretty different from C++'s model; not clear how interop would work.
  • To encapsulate the common logic for initializing fields in Base, can end up with two Base constructor functions.
  • Concerns around accidentally producing intermediate values with the wrong layout, creating unnecessary work

Option 2: construct derived values from base values

The compiler-supplied constructor of type Derived takes a value of type Base as its first argument, which is used to initialize all fields defined in Base. The remaining arguments are for fields defined only in Derived.
Link to option 2

Option 2a: unsound option

Link to option 2a

fn BaseConstructor(p: Int) -> Base {
  return {.x = ..., .y = ...};
  // struct converted to Base via return
}

fn DerivedConstructor(p: Int, q: Int) -> Derived {
  return {.super = BaseConstructor(p), .z = q};
  // struct converted to Derived via return
}

Option 2b: Type enforcement option

Link to option 2b

// Notice different return type
fn BaseNVConstructor(p: Int) -> Base.NoVirt {
  return {.x = ..., .y = ...};
  // struct converted to Base.NoVirt via return
}

fn DerivedConstructor(p: Int, q: Int) -> Derived {
  return {.super = BaseNVConstructor(p), .z = q};
  // struct converted to Derived via return
}

There were a few sub-variations proposed. I think the most promising is described at the end of the section, in the paragraph starting with "A second, simpler approach would be to include the space for a virtual table pointer in the Base.NoVirt type, so the size and offset of Base.NoVirt equals Base."
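
As a rough C++ analogy for the type-enforcement idea in option 2b (not the layout-sharing variation quoted above), the intermediate value can be given a separate type that has no virtual functions at all. `BaseNoVirt` below is a hypothetical stand-in for `Base.NoVirt`, the `Get` method and the field values are invented for the sketch, and real Carbon semantics may differ:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for Base.NoVirt: Base's fields, no virtual functions.
struct BaseNoVirt {
  int x;
  std::string y;
};

class Base {
 public:
  explicit Base(BaseNoVirt fields) : fields_(std::move(fields)) {}
  virtual ~Base() = default;
  virtual int Get() const { return fields_.x; }

 protected:
  BaseNoVirt fields_;
};

class Derived final : public Base {
 public:
  // The first argument initializes every field defined in Base.
  Derived(BaseNoVirt base_fields, int z)
      : Base(std::move(base_fields)), z_(z) {}
  int Get() const override { return fields_.x + z_; }

 private:
  int z_;
};

// The type system rules out virtual dispatch on the intermediate value.
static_assert(!std::is_polymorphic<BaseNoVirt>::value,
              "BaseNoVirt must not have virtual functions");

// Returns the non-virtual type, mirroring BaseNVConstructor in the thread.
BaseNoVirt BaseNVConstructor(int p) { return {p, "base"}; }

Derived DerivedConstructor(int p, int q) {
  return Derived(BaseNVConstructor(p), q);
}
```

The point of the sketch is only that `BaseNVConstructor` cannot hand out something on which a virtual method could be called before `Derived` is complete.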

Option 3: use interfaces and mixins instead of abstract base classes

Link to option 3
This is the most radical option, probably outside what we could comfortably adopt.

@josh11b
Contributor Author

josh11b commented Aug 12, 2021

In open discussion, we are leaning towards option 2b, and are looking for a better name than Base.NoVirt.

@josh11b
Contributor Author

josh11b commented Aug 13, 2021

Expanding on option 2b: We are considering "base Base" in place of Base.NoVirt. Example code:

base class MyBaseType {
  fn Create(...) -> base MyBaseType;
}

class MyFinalType extends MyBaseType {
  fn Create(...) -> MyFinalType;
}

MyBaseType.Create is allowed to create instances of base MyBaseType even if MyBaseType is abstract. If MyBaseType is not abstract (no pure-virtual functions), then base MyBaseType may be converted to MyBaseType, so users might write:

var b: MyBaseType = MyBaseType.Create(...);

Methods defined on MyBaseType could not be called on a base MyBaseType value. Non-virtual methods could opt into marking themselves as "safe to call during construction" by taking the me parameter with type base MyBaseType (or base Self) instead of Self. Such methods would only be able to call methods on base MyBaseType, statically proving that they don't transitively call any virtual methods.
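
One way to picture "safe to call during construction" methods is the following C++ sketch, in which such methods live on the partial type and so cannot see any virtual function. `PartialMyBaseType`, its `width`/`height` fields, `Area`, and `Cost` are all hypothetical names chosen for illustration; the thread does not specify any of them:

```cpp
#include <cassert>

// Hypothetical stand-in for `base MyBaseType`: fields plus non-virtual helpers.
struct PartialMyBaseType {
  int width;
  int height;
  // Only sees the partial type, so it cannot transitively reach a virtual call.
  int Area() const { return width * height; }
};

// The complete type adds the virtual interface on top of the partial one.
class MyBaseType : public PartialMyBaseType {
 public:
  // Models the conversion from `base MyBaseType` to MyBaseType.
  explicit MyBaseType(PartialMyBaseType p) : PartialMyBaseType(p) {}
  virtual ~MyBaseType() = default;
  // Virtual methods exist only on the complete type.
  virtual int Cost() const { return Area() * 2; }
};

// Plays the role of MyBaseType.Create returning the partial type.
PartialMyBaseType CreatePartial(int w, int h) {
  assert(w >= 0 && h >= 0);  // precondition assumed for this sketch only
  return PartialMyBaseType{w, h};
}
```

Here `Area` is callable on both the partial and the complete value, while `Cost` is callable only after conversion to `MyBaseType`.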

@josh11b
Contributor Author

josh11b commented Aug 24, 2021

We decided base wasn't the best choice of keyword for describing this facet type, for a few reasons:

  • It was confusing with other uses of the word "base".
  • It seemed like a strange word to apply to a type that was the final version of a base type.
  • It read poorly in examples.

We have not picked a replacement yet, though. Other spellings that have been brought up: ctor, under_construction, partial, construction, impl, novirt, exact, construct, constructor, bare.
Some discussion on this has happened in Discord and in open discussion, but so far no conclusion has been reached.

@chandlerc
Contributor

Just so folks can see what option 2b looks like in the direction the syntax is converging:

// Notice different return type
fn BaseNVConstructor(p: Int) -> partial Base {
  return {.x = ..., .y = ...};
  // struct converted to `partial Base` via return
}

fn DerivedConstructor(p: Int, q: Int) -> Derived {
  return {.base = BaseNVConstructor(p), .z = q};
  // struct converted to Derived via return
}

And then we have the list from @josh11b's comment for keywords that could be used instead of partial here.

@chandlerc
Contributor

chandlerc commented Aug 26, 2021

Other keyword ideas from a discussion I had with @KateGregory:

  • init
  • fields_only
  • subobject
  • ctor_type
  • construction_type

Ideas that seem at least plausible enough to the two of us to poll and get feedback on:

  • init
  • ctor
  • novirt
  • fields_only
  • subobject
  • partial
  • construction
  • under_construction
  • exact
  • ctor_type
  • construction_type

I'm going to work on conjuring a poll of some kind here....

@chandlerc
Contributor

chandlerc commented Aug 26, 2021

My memory at least of the concerns around some of the options:

  • impl
    • Seemed too overloaded at this point, especially with the use of impl for virtual functions
  • exact, bare
    • The rationale for why this keyword could work here was hard to really internalize. It seemed significantly harder to internalize than alternatives like partial.
  • construct, constructor
    • construction seemed similar in length and clearer.

@zygoloid
Contributor

Do we expect to also use this type variant as the type of the base class during destruction? I think that would make sense, especially given the limits it applies to virtual function calls and expectations about the type's partially-constructed nature. If so, I think that we should discard any names based on initialization or construction.
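
For reference, C++ applies the same restriction during destruction as during construction, which is the symmetry this comment appeals to. A minimal C++ sketch (illustrative names, not Carbon):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Global log, used only to keep the sketch short.
std::vector<std::string> destruction_log;

struct Base {
  // Virtual call during ~Base: the Derived part is already gone,
  // so this dispatches to Base::Name.
  virtual ~Base() { destruction_log.push_back("in ~Base: " + Name()); }
  virtual std::string Name() const { return "Base"; }
};

struct Derived : Base {
  ~Derived() override { destruction_log.push_back("in ~Derived: " + Name()); }
  std::string Name() const override { return "Derived"; }
};

// Destroys a Derived and returns what each destructor observed.
std::vector<std::string> ObserveDestruction() {
  destruction_log.clear();
  { Derived d; }  // d is destroyed at the end of this scope
  assert(destruction_log.size() == 2);  // one entry per destructor
  return destruction_log;
}
```

While `~Base` runs, the object behaves as a `Base` only, much like the partially-constructed state the new type variant is meant to describe.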

@josh11b
Contributor Author

josh11b commented Aug 27, 2021

Here is a variation on 2b for your consideration, which I will call "2b sugar", which starts with "2b" and adds:

There is an alternate introducer constructor (or ctor or factory or whatever) that you may use in place of fn:

base class Abstract {
  constructor Create() -> Self { return {}; }
  abstract fn F[me: Self]();
}

base class Extensible extends Abstract {
  constructor Create() -> Self {
    return {.base = Abstract.Create() };
  }
  impl fn F[me: Self]() {}
}

class Final extends Extensible {
  constructor Create() -> Self {
    return {.base = Extensible.Create() };
  }
}

The return type in a constructor declaration must include Self in some way, and must not use the name of the type directly.

The constructor declaration is rewritten into one or two class function declarations:

  • If the class is non-abstract, meaning it may be instantiated, there will be a class function with the declared name ("Create") returning Self as declared in the signature.
  • If the class is a base, meaning it is not final and may be extended, there will be a class function with the declared name ("Create") taking an additional parameter prepended to the parameter list. This parameter will have type Carbon.BaseConstructor, a zero-sized type with value Carbon.base_constructor, and is just used to distinguish this version from the first for overload resolution. Instances of the keyword Self in the return type and body of the function will be replaced with the AsBaseSubobject(Self) facet of Self.

In extensible classes where both overloads of the name will be generated, the first overload will be defined to return the result of calling the second (prepending Carbon.base_constructor to the argument list) and converting the result to the new return type, with Self instead of AsBaseSubobject(Self). In the body of the function, calls to base or Self constructors are rewritten to call the second version by prepending Carbon.base_constructor to the argument list, if the second version exists.

If the constructor is declared public (the default), the first version (if present) will be public and the second version (if present) will be protected. Otherwise both versions will have whatever access the constructor is declared with.

In this way, users only need to identify which functions are constructors, and don't need to concern themselves with where they should use an alternate type, nor be tempted to declare both public and protected versions of the constructor to make a nicer API for the class.

We could additionally provide a way to declare constructors that are not class members.

@josh11b
Contributor Author

josh11b commented Aug 28, 2021

So the above example would be rewritten to:

base class Abstract {
  protected fn Create(_: Carbon.BaseConstructor)
      -> AsBaseSubobject(Self) { return {}; }
  abstract fn F[me: Self]();
}

base class Extensible extends Abstract {
  protected fn Create(_: Carbon.BaseConstructor)
      -> AsBaseSubobject(Self) {
    return {.base = Abstract.Create(Carbon.base_constructor) };
  }
  fn Create() -> Self {
    return Extensible.Create(Carbon.base_constructor) as Self;
  }
  impl fn F[me: Self]() {}
}

class Final extends Extensible {
  fn Create() -> Self {
    return {.base = Extensible.Create(Carbon.base_constructor) };
  }
}
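
The overload-selection part of this rewrite resembles tag dispatch in C++. The sketch below is an analogy under stated assumptions: `carbon_sketch::BaseConstructor` and `carbon_sketch::base_constructor` stand in for `Carbon.BaseConstructor` and `Carbon.base_constructor`, `ExtensiblePartial` stands in for `AsBaseSubobject(Extensible)`, the value `42` is arbitrary, and inheritance and the protected access level are omitted to keep it short:

```cpp
#include <cassert>
#include <type_traits>

namespace carbon_sketch {
// Hypothetical tag type; it carries no data and only selects an overload.
struct BaseConstructor {};
constexpr BaseConstructor base_constructor{};
}  // namespace carbon_sketch

static_assert(std::is_empty<carbon_sketch::BaseConstructor>::value,
              "the tag type has no data members");

// Hypothetical stand-in for AsBaseSubobject(Extensible).
struct ExtensiblePartial {
  int value;
};

class Extensible {
 public:
  // "Second version": chosen only when the tag argument is passed.
  static ExtensiblePartial Create(carbon_sketch::BaseConstructor) {
    return {42};
  }
  // "First version": calls the second and converts the result.
  static Extensible Create() {
    return Extensible(Create(carbon_sketch::base_constructor));
  }
  int value() const { return fields_.value; }

 private:
  explicit Extensible(ExtensiblePartial f) : fields_(f) {}
  ExtensiblePartial fields_;
};

class Final {
 public:
  // A final class only needs the complete-object version.
  static Final Create() {
    return Final(Extensible::Create(carbon_sketch::base_constructor));
  }
  int value() const { return base_.value; }

 private:
  explicit Final(ExtensiblePartial b) : base_(b) {}
  ExtensiblePartial base_;
};

// The two overloads differ in return type, as in the rewritten Carbon code.
static_assert(std::is_same<decltype(Extensible::Create()), Extensible>::value,
              "untagged overload returns the complete type");
static_assert(
    std::is_same<decltype(Extensible::Create(carbon_sketch::base_constructor)),
                 ExtensiblePartial>::value,
    "tagged overload returns the partial type");
```

The tag argument makes the choice between the two versions visible at each call site, which is what the proposed sugar would otherwise insert automatically.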

@zygoloid
Contributor

I think there's an interesting tradeoff here. The injected parameter and rewriting of calls to add a corresponding argument is a little more magical than I'd prefer, and I'm concerned that it'll be surprising in some cases:

base class X {
  constructor Make() -> Self;
  virtual fn Get[me: Self]() -> Int;
}
class Y extends X {
  var base_value: Int;
  impl fn Get[me: Self]() -> Int;
  constructor Make() -> Self {
    var x: auto = X.Make();
    return {.base = X.Make(), .base_value = x.Get()};
  }
}

(where the idea is to create an instance of X in order to call and cache its Get() value, which will then presumably be used when computing Y's Get() value). Here I think we'd transform both calls to X.Make into calling the base constructor version and then reject the call to x.Get().

But on the other hand if we don't have a mechanism like this then we end up requiring extra boilerplate to get the benefit of the base constructor / complete object constructor split for extensible classes.

@josh11b
Contributor Author

josh11b commented Aug 28, 2021

Would you prefer marking the calls to other constructors that should be rewritten?

base class Abstract {
  factory Create() -> Self { return {}; }
  abstract fn F[me: Self]();
}

base class Extensible extends Abstract {
  factory Create() -> Self {
    return {.base = construct Abstract.Create() };
  }
  impl fn F[me: Self]() {}
}

class Final extends Extensible {
  factory Create() -> Self {
    return {.base = construct Extensible.Create() };
  }
}

These marks seem like something the compiler or tooling could insert for you wherever they are needed.

@josh11b
Contributor Author

josh11b commented Aug 31, 2021

It sounds like we are converging on option 2b without sugar. We definitely need an alternate type for constructing abstract base classes, and while we will support it for extensible classes (that is, base classes that are not abstract), we will leave it up to the user whether they use that or the regular Self type.

@chandlerc
Contributor

I'd also suggest that the keyword should be partial as that has IMO been the most effective one when used in discussions.

@chandlerc
Contributor

Chatted with @KateGregory and I think we're all happy with this outcome. ship it!

@jonmeow jonmeow added the leads question A question for the leads team label Aug 10, 2022