Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update standard library to a better "interface" pattern so that we have a null hypothesis against the interface/oop proposal #1829

Closed
andrewrk opened this issue Dec 13, 2018 · 18 comments
Labels
standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@andrewrk
Copy link
Member

Related:

Probably lots more. But the big parent issue is #130. It's one of the biggest topics in the community and we really don't know the direction Zig will go (it's a difficult design decision).

However, one thing we can do to get closer to knowing the answer, is to push the null hypothesis to the best thing it can be. That is, using status quo Zig and accepted proposals, how would we solve the problems that #130 would solve, without any other modifications to the language?

As @Hejsil points out on that issue, the status quo solution to "interfaces" is hostile to optimization. He created a demo of a different pattern here: https://github.com/Hejsil/zig-interface

@tgschultz has done similar prototypes (see for example #130 (comment)) I've also seen him post a bunch of related gists on IRC but I don't have any links.

So it looks like we have a clear way to update the standard library to use this pattern which is at least better on the optimizer. It also fixes a footgun and seems at least as readable, if not more, than status quo. So let's do it in the standard library. Then we can close this issue, and go back to all those other issues, armed with an improved null hypothesis.

@andrewrk andrewrk added the standard library This issue involves writing Zig code for the standard library. label Dec 13, 2018
@andrewrk andrewrk added this to the 0.5.0 milestone Dec 13, 2018
@ghost ghost mentioned this issue Dec 14, 2018
@ghost
Copy link

ghost commented Dec 14, 2018

Nice:

  • no need for fieldParentPtr (well, traded this for some *c_void evilry)
  • create trait binding from concrete implementation, instead of vice versa. The implementation can actually be free of such glue
  • don't need to use a pointer to pass the trait objects around (since they are just a box of two pointers). I dunno, it just looks nice

@Hejsil
Copy link
Contributor

Hejsil commented Dec 14, 2018

@dbandstra *c_void pointers are harmless as long as you don't @ptrCast them (which is just as unsafe as on any other pointer, and this unsafeness is documented). The idea with my zig-interface is that we have a function called vtable.populate which will populate a VTable (any struct with only fn ptr fields) with its implementation. This function will always be safe, as it uses @typeInfo to validate that the VTable is populated with functions of the correct signature.

So unless the users explicitly shoots thenself with unsafe features, this pattern is safe, unlike interface fields, where a simple copy will break everything.

@Rocknest
Copy link
Contributor

Why *c_void and not *@OpaqueType()?

@Hejsil
Copy link
Contributor

Hejsil commented Dec 15, 2018

@Rocknest No reason really. It was easier to just use c_void because it is there. Using @OpaqueType() is probably more proper though :)

@tgschultz
Copy link
Contributor

tgschultz commented Dec 19, 2018

I've been playing with this and I think I've come up with a pattern that should accomplish our goals with a minimum of API changes from the user perspective. It's a slight modification on @Hejsil's example which allows us to keep 'convenience' functions as part of the interface instead of separating them out. I've got an example in this gist. When I get to the Stream interface I'll have to see what can be done about the ErrorSet (#764) mess there.

@Hejsil
Copy link
Contributor

Hejsil commented Dec 19, 2018

I personally have come to prefer the free function approach more than member functions. What you end up doing, is creating more helper functions yourself and those cannot be member functions unless you do more wrapping. This Interface struct is kinda just a hack around the fact that Zig doesn't have Unified Call Syntax.

Let structs implement a minimal composable interface, and have free function that implements helpers around this interface.

@tgschultz
Copy link
Contributor

tgschultz commented Dec 19, 2018

I don't necessarily disagree. However doing it this way incurs a minimal API change from the user perspective, and while API changes aren't really a big deal pre-1.0.0, I personally think that one should be its own issue and PR since it isn't necessary to accomplish the goals of this one.

Doing it this way also allows for a relatively straight-forward trait implementation re #1669, if we decide to go that route, for example by comparing the parts of the type name or checking for the existence and value of a pub const.

@Hejsil
Copy link
Contributor

Hejsil commented Dec 19, 2018

@tgschultz Alright, reasonable 👍

@kristate
Copy link
Contributor

I am more for @Hejsil's approach at this point. I think it is cleaner and easier to document.

@tgschultz
Copy link
Contributor

tgschultz commented Jan 18, 2019

After working through #1848, here's what I'm thinking:

If we don't add anything to the compiler for interfaces, then I agree that @Hejsil's suggestion to separate the convenience functions from the interface struct will make things generally cleaner as there will be no more passing around wrapped and unwrapped versions of the implementation.

If we do add something to the compiler, then I think the most helpful things to have would be to have vtable checking built in, and automatic interface abstraction. I imagine a new interface container type:

const InStream = interface {
    read: fn(var,[]u8) anyerror!usize,

    pub fn readByte(self: InStream) !u8 {
        ///
    }
};

pub foo(stream: InStream) !void {
    const x = try stream.readByte();
    ///
};

try foo(my_stream_impl);

var target_stream: InStream = undefined;
target_stream = my_stream_impl;

The interface parameter would like var whenever the parameter can be comptime evaluated, just providing the convenience functions around the implementation, the function pointer fields are only used for verification. If it can't be comptime evaluated, then it would degrade to an automatically constructed abstract type which is a fat pointer of *Implementation and const *vtable. When used in a field or variable definition, it would always be abstract.

@tgschultz
Copy link
Contributor

tgschultz commented May 9, 2019

I'm giving this another go with what I learned from the previous effort. This iteration of the pattern can be seen in interface.zig, which contains some helper types and functions.

Improvements of this over the previous iteration:

  • implementation struct pointers can be 0-sized (c_allocator, NullOutStream).
  • interfaces are only one struct instead of two.
  • no ugly hacks should be necessary to work with current async.

Changes:

  • A global vtable is eschewed in favor of storing function pointers directly. This is mostly so that no ugly hacks are required to work with async.

Regressions:

  • Using an interface over a comptime-known implementation no longer supports the self.method() syntax and instead requires Interface.method(self), but abstracted/runtime interfaces still support self.method() syntax.

Notes (both iterations):

  • Interfaces with limited error sets (Allocator) will prefer to use the abstracted/runtime version because it can be stored without a generic struct and testing on godbolt indicates it optimizes well anyway.
  • Interfaces using highly variable error sets (*Stream) will prefer the comtime-known version since it preserves error sets without the need to pass them around manually.
  • Interfaces now contain only pointers and are therefore completely safe to copy.
  • Interface function implementations can no longer be arbitrarily named.

@tgschultz
Copy link
Contributor

So this second crack at this isn't going so well. The first issue is that async<> has proven more stubborn than anticipated, but a more important problem is that type erasure doesn't seem to work at comptime, the compiler is too aware of the actual types. I probably would have ran into this in the first iteration as well except that I never touched std.rand, which is used as part of the build process.

@Rocknest
Copy link
Contributor

Rocknest commented May 11, 2019

I had an idea recently that @reify (#383) or something similar could be used to make this usecase simple and enjoyable.

// imaginary code
const Allocator = InterfaceOf(struct {
    alloc: fn (usize) *c_void,
    free: fn (*c_void) void,
});

test "" {
    const impl = SomeAllocator { .a = b };
    const interface = Allocator.init(&iml);
    var ptr = interface.alloc(5);
    interface.free(ptr);
}

Function InterfaceOf would create a type similar to this one:

{
  @"$vtable": *struct {...},
  @"$impl": *OpaqueType(),
  pub fn init(impl: var) Allocator {
    // assert impl == Pointer
    return Allocator {
      .@"$vtable" = comptimeGenAllocatorVtable(@typeOf(var).Pointer.Child),
      .@"$impl" = impl,
    };
  }
  pub fn alloc(self: Allocator, arg0: usize) *c_void {
    return self.@"$vtable".alloc(arg0);
  }
  pub fn free(self: Allocator, arg0: *c_void) void {
    self.@"$vtable".free(arg0);
  }
}

But im not sure if comptime code generation would ever be allowed in zig.

@tgschultz
Copy link
Contributor

tgschultz commented May 21, 2019

My third attempt at this went pretty well. See #2533 for a summary.

My opinion from working with this pattern is that it probably shouldn't be merged, despite the benefits to optimization, because of the added complexity. A better solution will almost certainly involve new language features, and a simpler solution could involve:

  • standardize ErrorSets across interface implementations
  • teach the compiler how to optimize through @fieldParentPtr
  • implement un-copyable structs/fields

Update: I was struck with the idea to submit a version of this that follows the simpler conception above. Instead of using @fieldParentPtr, which is known to optimize poorly, it uses an Any pointer and the fromAny function. The interface is therefore safe to copy , and all implementations follow the same function signature including the error set, which would now be standardized. This means streams become much simpler at the cost of error set information. I can make an argument for why dynamic error set shouldn't be in streams, if necessary, but lets see how it turns out.

addendum: I'm kinda tired of rewriting the same changes over and over, so I might not get to this for a while, but if I do it'll be basically the barest possible change from status-quo that gets us the above.

@tgschultz
Copy link
Contributor

tgschultz commented May 25, 2019

The simpler version I discussed is up now, #2553 . The changes to the pattern over status-quo are that interfaces consist of a pointer to an implementer as well as a pointer to each implementation function. Consequently, instead of the implementer carrying around an interface struct and everything needing a pointer to it, they instead have an interface() method (inStream(), outStream(), allocator(), etc) that is used to retrieve the interface struct, which is then free to be copied. A minor concession was made for async<> in the form of Allocator taking *const Self instead of just Self.

Rather than use a single ?*@OpaqueType for all implementation pointers, each interface has its own. Implementers use Interface.ifaceCast(self) to assign to the impl field of the interface struct, and interface.implCast(Self) to retrieve a properly typed pointer to themselves from the interface struct (roughly analogous to @fieldParentPtr's usage in status-quo).

I'm still not entirely satisfied with it, as it doesn't solve the dynamic ErrorSet problem, but at least it doesn't make things even more complicated than they already are.

It occurred to me during this iteration that we only really care about maintaining ErrorSet information if we intend to switch on an error union response from an interface (or chain of interfaces as in Streams) where the implementer(s) are (all) comptime-known. Consequently, I think I will submit one more variation, which is a modification of #2553 where Streams (the only interface with dynamic ErrorSets currently) are generalized and only return anyerror, with some alternative method being employed to construct and cast to the proper ErrorSet in those few situations where it is desired.

@andrewrk: After that, there will be 3 open PRs for consideration. If any of them are considered enough of an improvement over the status-quo to merge, then I'll rebase it after the big posix layer change is merged.

@tgschultz
Copy link
Contributor

This variation is identical to the previous one, but I've made the Stream interface ErrorSets non-dynamic. From what I can tell, basically nothing is lost by doing this, some errors just become more standardized versions of themselves: error.OutOfMemory,error.OutOfSpace => error.NoSpaceLeft, etc. Pretty much everything is much cleaner as only like 3 places even took advantage of the dynamic ErrorSets anyway. However I don't think that this is Andrew's vision for how ErrorSets and Streams interact.

From the work on these two I believe I can improve #2533 so it at least works at comptime again and is a bit simpler. However, I will only commit time to that if the consensus is that we want to go in that direction, as it is it should serve well enough to demonstrate the pattern.

Anyways, that it, I'm out. I'm leaving these three PRs (#2533, #2553, #2582) for perusal and consideration. If we decide to move towards one of them let me know and I'll rebase it.

@fengb
Copy link
Contributor

fengb commented Jun 12, 2019

I'm a fan of #2553 and #2582 because it's simpler mental model. With the existing @fieldParentPtr pattern, we have to ensure the correct pointer is referenced everywhere, and every accidental struct copy can break without compiler help.

With an embedded opaque interface, we can copy the struct around without problems. This also allows us to return interfaces directly instead of needing to embed them into a wrapper struct.

Edit: I just realized that the interface has a pointer to the implementation and they can have different lifecycles, e.g. the implementation is on the stack but it can return back the interface. Is this a real concern?

@ghost ghost mentioned this issue Jul 9, 2019
@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Aug 30, 2019
@andrewrk andrewrk modified the milestones: 0.6.0, 0.7.0 Jan 3, 2020
@andrewrk
Copy link
Member Author

#130 has been closed. I don't think there is anything to do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

6 participants