Extensible classes with or without vtables #652

Closed
josh11b opened this issue Jul 14, 2021 · 6 comments
Labels: leads question (A question for the leads team)

Comments

@josh11b
Contributor

josh11b commented Jul 14, 2021

Note: answering this question is not urgent, but I wanted to capture the options while they were fresh in our minds.

The use case in question is that we have a type Derived extending Base, where Base does not have a virtual destructor. We would like to statically verify that the user's deallocation code is safe (with the possibility of an unsafe opt-out). We have a few approaches:

Restricted extension

We could say that the destructor defined in Base must be a legal destructor for Derived. Since we have use cases where Derived adds fields to Base (for example, Base is a linked-list node holding just the pointers, as used for sentinels, and Derived adds data members), deletes through a non-final type without a virtual destructor would have to be unsized. No fields added in Derived would be allowed to have non-trivial destructors, and Derived would not be allowed to define its own destructor. This follows the "you can't override what wasn't declared virtual" rule.

Only deallocate final types

The rule would be that you would not be allowed to allocate or deallocate non-final types with non-virtual destructors. Users would define another type (call it Concrete) derived from Base that would be declared final and actually instantiated, used for locals and containers, etc. Base would only be used for pointers to values that could be either Concrete or Derived, and deleting a Base* would be forbidden. In general this would be safe, except when users perform a downcast from Base*, which is generally already understood to be an unsafe operation.

My expectation is that there would be some convention for naming the final class as Foo and the non-final one as FooBase. This makes variables and containers use the shorter name, and we keep the longer name for pointers where we are being explicit that it would be pointing to another type.

This option is tempting since:

  • it is simple and introduces relatively little mechanism for what is arguably an advanced and rare case
  • it also solves the assignment operator problem, since you would only define assignment on the final types
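A minimal C++17 sketch of this option, for illustration only. C++ has no rule against deleting through a non-final base, so the sketch substitutes the common idiom of a protected, non-virtual destructor, which makes `delete` through a `WidgetBase*` (and locals of type `WidgetBase`) fail to compile. All names (`WidgetBase`, `Widget`, `LabeledWidget`, `SumIds`) are hypothetical and follow the FooBase/Foo naming convention described above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Extensible base, only ever used through pointers. Its destructor is
// non-virtual and protected, so `delete` through a WidgetBase* (and locals
// of type WidgetBase) fail to compile.
class WidgetBase {
 public:
  explicit WidgetBase(int id) : id_(id) {}
  int id() const { return id_; }

 protected:
  // Copying and assignment are only public on the final types, which also
  // rules out slicing assignments through a WidgetBase&.
  WidgetBase(const WidgetBase&) = default;
  WidgetBase& operator=(const WidgetBase&) = default;
  ~WidgetBase() = default;

 private:
  int id_;
};

// The final type gets the short name; locals, containers, and owning
// pointers use it.
class Widget final : public WidgetBase {
 public:
  using WidgetBase::WidgetBase;
};

// A sibling final type whose extra field has a non-trivial destructor,
// which deleting through a WidgetBase* would skip.
class LabeledWidget final : public WidgetBase {
 public:
  LabeledWidget(int id, std::string label)
      : WidgetBase(id), label_(std::move(label)) {}
  const std::string& label() const { return label_; }

 private:
  std::string label_;
};

// Non-owning code takes WidgetBase* because it may see either final type.
int SumIds(const std::vector<const WidgetBase*>& widgets) {
  int total = 0;
  for (const WidgetBase* w : widgets) {
    assert(w != nullptr);
    total += w->id();
  }
  return total;
}
```

Because copy assignment is public only on the final types, `Widget c(5); c = *a;` compiles while assigning through a `WidgetBase&` does not, which is the assignment benefit listed above.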

Automatic second type for pointers

The rule would be:

  • any non-final Base would automatically have a type member named OrChild (or some similar name)
  • it is illegal to cast from Derived* to Base*, only to Base.OrChild*
  • you would not be able to allocate or delete a Base.OrChild object

In non-pointer contexts, you would use Base and it would do what is expected in context. Unlike the previous option, you would be able to define Derived as extending Base even though you could instantiate values of type Base.

This option is not as clean as the previous case, since Base would be treated as final in some contexts but not others. This is evident in a less clear story for handling assignment. However, it would save the boilerplate of manually defining an extra type, which might be important if this is a common use case.

Automatic final second type

The rule would be:

  • any non-final Base that could be instantiated would automatically have a type member named Final (or some similar name)
  • Base.Final would extend Base without changing anything except making it final
  • only Base.Final not Base would be allowed to be allocated or deleted
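A C++17 approximation of this option, assuming a class template can stand in for the `Base.Final` type member: `Final<Base>` extends `Base`, inherits its constructors, and only adds `final`. `Final`, `AllocateOwned`, and `Counter` are hypothetical names, and C++ itself still accepts a plain `new Counter`; only the helper enforces the rule.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical analogue of Base.Final: it extends Base without changing
// anything except making the result final.
template <typename Base>
class Final final : public Base {
 public:
  using Base::Base;  // reuse Base's constructors unchanged
};

// Hypothetical allocation entry point that only accepts final types, so the
// extensible Base itself is never heap-allocated or deleted this way.
template <typename T, typename... Args>
std::unique_ptr<T> AllocateOwned(Args&&... args) {
  static_assert(std::is_final_v<T>,
                "allocate Final<Base> rather than the extensible Base");
  return std::make_unique<T>(std::forward<Args>(args)...);
}

// An extensible class with a non-virtual destructor.
class Counter {
 public:
  explicit Counter(int start) : count_(start) {}
  void Increment() { ++count_; }
  int count() const { return count_; }

 private:
  int count_;
};
```

Usage would look like `auto owned = AllocateOwned<Final<Counter>>(3);`, while locals of type `Counter` are unaffected. The spelling `Final<Counter>` also shows the verbosity concern with this option.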

The downside here is that you would frequently be using the name Base.Final and it seems like it might be verbose and surprising. The good news is that I could imagine implementing the assignment operator only on Base.Final using an external interface implementation.

Two kinds of pointers

The sad part of the previous three options is that we would have two separate types that are only really different when using pointers. The concern is that they would cause a proliferation of types, leading to additional monomorphization with generics, etc. We could instead just have two kinds of pointer types: "pointer to exactly Base" and "pointer to Base or derived". You would not be able to deallocate through the latter type.
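A C++17 sketch of the two pointer kinds, for illustration only. `ExactPtr<T>` ("pointer to exactly T") owns its object and deletes it, which is safe even without a virtual destructor because the static type is the dynamic type; `PolyPtr<T>` ("pointer to T or derived") can be upcast freely but has no way to deallocate. Both wrapper names, `Make`, `Borrow`, and the `Node`/`TaggedNode` example types are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical "pointer to T or anything derived from T". It is non-owning
// and offers no way to deallocate.
template <typename T>
class PolyPtr {
 public:
  explicit PolyPtr(T* p) : p_(p) {}
  // Implicit upcast: PolyPtr<Derived> converts to PolyPtr<Base>.
  template <typename U,
            typename = std::enable_if_t<std::is_base_of_v<T, U>>>
  PolyPtr(PolyPtr<U> other) : p_(other.get()) {}
  T* get() const { return p_; }
  T* operator->() const { return p_; }

 private:
  T* p_;
};

// Hypothetical "pointer to exactly T". It owns the object and may delete
// it, because the static type is known to be the dynamic type. There is
// deliberately no conversion from ExactPtr<Derived> to ExactPtr<Base>.
template <typename T>
class ExactPtr {
 public:
  template <typename... Args>
  static ExactPtr Make(Args&&... args) {
    return ExactPtr(new T(std::forward<Args>(args)...));
  }
  ExactPtr(ExactPtr&& other) noexcept
      : p_(std::exchange(other.p_, nullptr)) {}
  ~ExactPtr() { delete p_; }  // OK: *p_ is exactly a T
  T* operator->() const { return p_; }
  // Lend the object out as "T or derived"; PolyPtr can then upcast further.
  PolyPtr<T> Borrow() const {
    assert(p_ != nullptr);
    return PolyPtr<T>(p_);
  }

 private:
  explicit ExactPtr(T* p) : p_(p) {}
  T* p_;
};

// Example types: Node has no virtual destructor.
struct Node { int value = 0; };
struct TaggedNode : Node { int tag = 0; };
```

Here `ExactPtr` conflates "pointing to the exact type" with "owning the value", and every such wrapper is one more type that generic code has to handle.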

Advantage:

  • Perhaps we could conflate "pointer that is allowed to deallocate because it is pointing to the exact type" with "pointer that is allowed to deallocate because it has ownership of the value"

Disadvantage:

  • Proliferation of pointer types, causing trouble for generics, etc.
@josh11b
Contributor Author

josh11b commented Jul 14, 2021

Going through the process of writing this up has moved me firmly into the "only deallocate final types" camp. I like the simplicity of the rule, how easy it is to understand why the rule makes the code safe, how you can bypass it when needed while clearly marking that what you are doing is unsafe, and how well it handles the assignment issue at the same time. I feel like the last two solutions are too much mechanism for an advanced use case that we don't really have a reason to encourage more strongly. If it did seem burdensome, I'd consider the "automatic final second type" option as something easy to add later, but I don't like the long names that result.

I think the main reason to support another solution would be if that solution also addressed the assignment operator problem for types with virtual destructors. For example, maybe all non-final types that can be instantiated will have a Final type member that assignment operators can be defined on?

@zygoloid
Contributor

I think I like the model that you can only allocate and deallocate types that are either final or have virtual destructors (or some similar mechanism that hooks into deallocation and that we trust to do the right thing). I think this should only extend to heap allocation and deallocation, though, and not to construction and destruction in general. For example, for a case like:

base class X {};
fn F() { var x: X; }

... I think it would be surprising if we rejected this code, yet accepted it once a virtual destructor was added to X. But I think that behavior is acceptable for:

fn G() { delete new X; }

(or however we spell heap allocation and deallocation).
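This model can be approximated in C++17 with compile-time checks on hypothetical `CheckedNew`/`CheckedDelete` helpers that stand in for heap allocation and deallocation (plain `new` and `delete` are not restricted in C++). The predicate `kHeapSafe` and the types `XFinal` and `Y` are likewise illustrative assumptions. Only the heap helpers check the type; constructing and destroying a local `X` is untouched.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical predicate for the rule: the heap is only used for types that
// are final (so the static type is the dynamic type) or that have a virtual
// destructor (so delete dispatches to the right destructor).
template <typename T>
inline constexpr bool kHeapSafe =
    std::is_final_v<T> || std::has_virtual_destructor_v<T>;

template <typename T, typename... Args>
T* CheckedNew(Args&&... args) {
  static_assert(kHeapSafe<T>,
                "allocate a final type or one with a virtual destructor");
  return new T(std::forward<Args>(args)...);
}

template <typename T>
void CheckedDelete(T* p) {
  static_assert(kHeapSafe<T>,
                "delete a final type or one with a virtual destructor");
  delete p;
}

class X {};                        // extensible, no virtual destructor
class XFinal final : public X {};  // final, so heap use is allowed
class Y {                          // extensible, but with a virtual destructor
 public:
  virtual ~Y() = default;
};

// Analogue of `fn F() { var x: X; }`: local construction and destruction
// are unaffected by the rule.
void F() {
  X x;
  (void)x;
}

// Analogue of `fn G() { delete new X; }`. Writing
// CheckedDelete(CheckedNew<X>()) would fail the static_assert, so this
// version allocates the final type instead.
void G() { CheckedDelete(CheckedNew<XFinal>()); }
```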

@josh11b
Contributor Author

josh11b commented Aug 30, 2021

I've written a doc exploring options for the more general problem of what to do with extensible classes, whether they have virtual tables or not.

@josh11b
Contributor Author

josh11b commented Aug 31, 2021

The open discussion seems to have been converging on following C++ for the most part:

  • We allow extensible types that are base classes that may be instantiated, with or without virtual methods.
  • A base class type generally means "or derived", but we allow users to use the type in other contexts where it only makes sense as that exact type.
  • The main difference from C++ is we will forbid deleting a pointer to an extensible type unless you use a separate unsafe_delete operator.
  • We do otherwise allow extensible types to be destroyed.
  • The compiler won't enforce rules to prevent slicing assignments, but we leave the door open for tooling to do static analysis to detect slicing bugs.
  • Carbon's sizeof operator will give the size of the common prefix of types derived from that type. For extensible types, this will match the size of values of that exact type.
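A minimal C++17 sketch of these rules, under two assumptions: `Delete` and `UnsafeDelete` are hypothetical stand-ins for ordinary deletion and the proposed `unsafe_delete` operator, and "extensible" is approximated as "not declared `final`". Slicing copies and local values of extensible types still compile, as the summary describes.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical checked deletion: only non-extensible (final) types qualify.
template <typename T>
void Delete(T* p) {
  static_assert(std::is_final_v<T>,
                "deleting an extensible type requires UnsafeDelete");
  delete p;
}

// Hypothetical explicit opt-out, standing in for `unsafe_delete`. The
// caller vouches that *p is exactly a T (or that T's destructor is virtual).
template <typename T>
void UnsafeDelete(T* p) {
  delete p;
}

struct Base {  // extensible: may be instantiated and derived from
  int a = 1;
};
struct Leaf final : Base {
  int b = 2;
};

// Slicing still compiles, as in C++; catching it is left to tooling.
int CopyAsBase(const Leaf& leaf) {
  Base copy = leaf;  // copies only the Base part of leaf
  return copy.a;
}
```

In this example `sizeof(Base)` is the size of a `Base` value and is no larger than `sizeof(Leaf)`, since a `Leaf` holds a `Base` subobject plus one more field.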

For now, we are going to avoid having a shadow type system to track which values are exactly the declared type, rather than some derived type.

@josh11b changed the title from "Safe inheritance with non-virtual destructors and assignment" to "Extensible classes with or without vtables" on Aug 31, 2021
@chandlerc
Contributor

It seems we have consensus as summarized here: #652 (comment)

Everyone seems to largely agree that having only abstract and final classes provides a cleaner and more error-resistant model. However, the ergonomics are significantly regressed with that model, especially compared to C++. It doesn't seem feasible to ask people to pay the overhead of that model: the direct cost is high, and the errors it prevents aren't severe or pervasive enough to warrant that cost.

The other alternatives seemed complex or inventive, or had unpleasant implications. None was really compelling compared to matching what C++ does.

So the result was to stay very close to C++, and to take a somewhat tactical approach to avoiding the pitfalls that do arise here, such as restricting the deletion of an object through a pointer.

If more of the rationale is needed, feel free to poke folks to get them to add it, but let's consider this resolved.

@chandlerc
Contributor

(To be clear, I believe the rationale I suggest here matches what is in #777 already. Happy to adjust / clarify if anyone spots an important difference here.)

@jonmeow added the "leads question" label (A question for the leads team) on Aug 10, 2022

4 participants