-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce @OpaqueHandle() builtin #9859
Comments
The status quo recommendation is to use |
My example was misleading. This is not about wrapping primitive types, but encapsulating any type (aka it is not about creating your own integers,doubles with better type safety). I have changed the sample to make it clear. I don't think you can do that with |
@lucabol It would help, if you give a (realistic) complete example before and after the change and motivate it with stuff like safety drawbacks of the API on concrete usage. Personally I am also missing how this cant be solved or should not be solved with comptime-generating the interface as described in the talk about static and dynamic dispatch as to justify the added language complexity. |
Just a thought: seeing as it is already possible and common-place to use Not completely against this, but also leaning towards agreement with the sentiment thus far. More real-world use case examples would be nice to see. |
If you can achieve what I describe below with In any case, here is the category of bugs we are trying to prevent, using our own // THE PROBLEM WE ARE TRYING TO SOLVE (i.e.lack of encapsulation)
test "Inexperienced new programmer code" {
const a = std.testing.allocator;
var l = std.ArrayList(u8).init(a);
defer l.deinit();
// Somewhere else, in a function far far away ...
// Oh, I see, I need to set the capacity before appending
l.capacity = 2;
l.appendAssumeCapacity(1);
l.appendAssumeCapacity(3);
// SIGFAULT
} We are trying to make it impossible for the inexperienced programmer on a large project to access things that are not meant for him. Aka, return some kind of opaque pointer: test "If ArrayList were implemented to return an opaque pointer, that category of bugs can't happen" {
const a = std.testing.allocator;
var l = try MyList(u8).init(a);
defer l.deinit(a);
// I cannot access the capacity field because opaque type cannot have fields
// l.capacity = 2;
try l.append(1);
try l.append(3);
} Today to achieve that, you have to write this (unless there is an // CURRENT STATE TO BE ABLE TO Implement MyList
fn Hidden(comptime T: type) type {
return struct { list: std.ArrayList(T), capacity: usize };
}
pub fn MyList(comptime T: type) type {
return opaque {
const MyPointPtr = *align(@alignOf(Hidden(T))) @This();
pub fn init(allocator: *std.mem.Allocator) !MyPointPtr {
var s = try allocator.create(Hidden(T));
s.list = std.ArrayList(T).init(allocator);
return @ptrCast(MyPointPtr, s);
}
pub fn deinit(self: MyPointPtr, allocator: *std.mem.Allocator) void {
var s = @ptrCast(*Hidden(T), self);
s.list.deinit();
allocator.destroy(s);
}
pub fn append(self: MyPointPtr, i: T) !void {
var s = @ptrCast(*Hidden(T), self);
return s.list.append(i);
}
};
} With this proposal the code looks much simpler and less error-prone as roughly: // PROPOSED STATE TO IMPLEMENT MYLIST
fn State(comptime T: type) type {
return struct { list: std.ArrayList(T), capacity: usize };
}
pub fn MyListOpaque(comptime T: type) type {
return opaque {
const OpaqueList = @OpaqueHandle(State(T));
pub fn init(allocator: *std.mem.Allocator) OpaqueList {
var s = try allocator.create(State(T));
s.list = std.ArrayList(T).init(allocator);
return OpaqueList;
}
pub fn deinit(self: OpaqueHanlde, allocator: *std.mem.Allocator) void {
var s = @fromOpaqueHandle(self);
s.list.deinit();
allocator.destroy(s);
}
pub fn append(self: OpaqueHanlde, i: T) !void {
var s = @fromOpaqueHandle(self);
return s.list.append(i);
}
};
} |
I can't personally say I'm very convinced. What about your demonstration is less bug-prone than your example of status quo? Given an API like so, the user still has the ability to misuse the API by, e.g., copying the handle, expanding it, and then expecting the original handle to have overgone the same changes (expecting it to act like a reference). Or using the 'deinit' method twice (double freeing), or assigning to the handle using the 'init' method multiple times before freeing (leaking memory). At the end of the day, one still has to understand, to a minimum degree, how to use the ArrayList(T) interface to make their program run. And that goes for any other interface. Zig is a language that gives the programmer control, and expects them to understand the tools it gives them; language-level encapsulation of fields has been discussed a few times, and the status quo has been to avoid it. For reference: #2059 and #2974. Given all that, I'd say this is a weak feature proposal at best; potentially regressive at worst. |
I was afraid we would devolve in a philosophical discussion about the value of encapsulation for a system-level language. As I see it, my opinion (and yours) on such a matter is not important. We can talk about this all day and there is no way to prove either of us right or wrong. The reality of the situation is that it is a very common pattern in C. It is recommended in most books. It is a CERT and MISRA explicit recommendation and it is used in many (most?) large-scale C applications (i.e., sqllite). This doesn't make it 'right' in any sense, but it does make it important. So our opinion doesn't matter. What matters is to do what is best for the Zig language. As Zig positions itself very explicitly as the C successor, it seems logical that such a pattern should be easily representable so as to facilitate the migration of C codebases and convince the (large?) subset of C programmers which deems this pattern important to use Zig. |
Alright, then omitting the second half of my comment, I'll ask again: what about this feature makes the pattern less bug-prone in zig? As stated, there are still numerous ways in which to misuse the API as is (as stated in my previous comment), which are far more common than setting the capacity to a higher value than allocated. |
This pattern is identical in C as it is in Zig. It has the same trade-offs. Apologies for my directness, but I am afraid you are just re-stating your opinion. You think the potential bugs you mention are "far more common than (generalizing from the example) having access to the private state of a component". In general, most large-codebase C programmers, book authors, MISRA and CERT authors disagree. They do recommend this. If I had to venture an opinion on why that is (and it is just my opinion, not a fact), I would say that it is because the errors you describe are 'generic' errors. You are misusing the language. Accessing the private state of a component is a 'contextual' error, as sometimes it is safe to do so. Sometimes accessing the fields of a struct is the right way to use a component. Sometimes it isn't. You have to go and look for it, which some might not do. What do we gain from this proposal, compared to the status quo? We gain a relatively clear way to represent a very common C idiom without resorting to mental or syntactic gymnastics, or possibly dangerous align pointer casting. Is this the best way to represent this idiom in Zig? I am not sure. Perhaps allowing fields on opaque pointers is better. Perhaps visibility annotations on fields is better. I see this as a 'minimal' change proposal (not rocket science) that gives us a decent (perhaps sub-optimal) syntax for a very common pattern. BTW: It is not even my proposal, so I have no deep emotional attachment to it. I just care about solving the problem. |
I believe we are at impasse then, so I will cede. And no worries, directness is the most efficient mode of communication. But one note:
There's no dangerous pointer casting required to use |
Thanks for the excellent discussion. I would expect no less than vigorous pushback on language additions. Otherwise, you end up with C++/Scala/.... Each new proposal starts with -100 to pay for the added complexity. For example, I would not have proposed this if it were a new keyword. As a builtin, and given that BTW: I thought of another place I saw this idiom in C, the handles-as-pointers pattern in game programming. Hopefully, there is enough in this thread to suggest that the idiom is widespread enough to be worth enabling. |
Coming back to this today, I think I've realized that there is more value to this than I was seeing, but I think what put me off was perhaps your example, as it contains a good few errors and inconsistencies (you return 'OpaqueList' in the init function instead of the casted memory for one example), which I think put me on a different mental page than you. That said, I'd like to prod a bit more: it's also not clear to me in the latest example how the returned opaque struct and " |
Sorry, it was late in the night and no compiler to validate :-) I got confused by the In any case, the intention is to enable a programming model like the below. I will post a full example once I get a bit more time, but your 'normal struct with opaque handle field' design seems right to me. Unless one goes heavy-handed allowing test "the goal is to hide the capacity field (aka impl details) on ArrayList from programmers" {
const a = std.testing.allocator;
var l = try MyList(u8).init(a);
defer l.deinit(a);
// I cannot access the capacity field here, but just access the made public functions
// l.capacity = 2;
try l.append(1);
try l.append(3);
} |
Ok, I figured out why my previous code worked. Perhaps it is a known 'feature' in zig. I don't think I did it on purpose (or did I?). Let's call it the 'opaque over struct' pattern. Once you see it, it kind of makes sense. I guess it is an alternative implementation to the 'normal struct with opaque handle field' way you suggested. It is more obscure for the implementer, but it has the advantage of not exposing an opaque pointer to the user of the API. const PrivateState = struct { i: u8 };
pub const PublicComponent = opaque {
pub fn init(allocator: *std.mem.Allocator) !*PublicComponent {
var state = try allocator.create(PrivateState);
state.i = 5;
return @ptrCast(*PublicComponent, state);
}
pub fn getPrivateState(self: *PublicComponent) u8 {
var state = @ptrCast(*PrivateState, self);
return state.i;
}
pub fn deinit(self: *PublicComponent, allocator: *std.mem.Allocator) void {
var state = @ptrCast(*PrivateState, self);
allocator.destroy(state);
}
};
test "Weird opaque over struct pattern" {
const a = std.testing.allocator;
var aComponent = try PublicComponent.init(a);
defer aComponent.deinit(a);
var i = aComponent.getPrivateState();
//var j = aComponent.i; // you can't do this. It is opaque.
try std.testing.expect(i == 5);
} |
Yes, I've used this pattern a few times in personal offline projects, although I haven't seen its use anywhere in the standard library, and unsure how popular its usage is. Was this the kind of functionality you were targeting? |
Yes. To be more precise, my overarching goal is to simplify the Zig representation of the 'opaque handle' C idiom at the start of the thread, as it is so widespread. I think that's a good goal. I am now not sure this proposal is the right solution. The pattern above is one way to do it. Your suggestion of a struct with an opaque field is another (btw: can that be done without allocating the state?). In a system language like Zig there are always ways to represent higher-level concepts by clever pointer manipulation and casting. So everything can always be done in some way. The question is how painful it is. |
This pattern of hiding the fields of a struct behind an opaque pointer as used in C is inefficient in many cases as it requires the struct to be heap allocated. The Zig pattern described here #9859 (comment) is the most direct translation of the C pattern and has the same disadvantage. If I understand correctly, the proposed |
In short, technically yes, but only if you give up absolute control of the memory, or give up flexibility. One alternative is to instead store the private state as an array of memory in the struct, which of course still leaves the possibility that the user may invalidate the memory anyway, but that's the deal. It could looks something like: const std = @import("std");
const testing = std.testing;
const Private = struct { i: u32, j: u32 };
pub const Public = struct {
impl: [@sizeOf(Private)]u8 = undefined,
pub fn init(i: u32, j: u32) Public {
var result = Public {};
result.impl = @bitCast([@sizeOf(Private)]u8, Private { .i = i, .j = j });
return result;
}
pub fn getI(self: Public) u32 {
return self.getPrivate().i;
}
pub fn getJ(self: Public) u32 {
return self.getPrivate().j;
}
fn getPrivate(self: Public) Private {
return @bitCast(Private, self.impl);
}
};
test {
var public = Public.init(3, 2);
try testing.expectEqual(public.getI(), 3);
try testing.expectEqual(public.getJ(), 2);
} This is what I originally thought you were proposing to replace (as described by @ifreund) by essentially making the type itself the "bag of bytes". |
@ifreund, @InKryption : this is how it would look. I think it also covers the case where Private needs to be allocated or it is a simple type (i.e., u32). const std = @import("std");
const testing = std.testing;
const Private = struct { i: u32, j: u32 };
pub const Public = struct {
impl: @OpaqueHandle(Private),
pub fn init(i: u32, j: u32) Public {
var result = Public {};
result.impl = @toOpaqueHandle(Private, Private { .i = i, .j = j });
return result;
}
pub fn getI(self: Public) u32 {
return self.getPrivate().i;
}
pub fn getJ(self: Public) u32 {
return self.getPrivate().j;
}
fn getPrivate(self: Public) Private {
return @fromOpaqueHandle(Private, self.impl);
}
};
test {
var public = Public.init(3, 2);
try testing.expectEqual(public.getI(), 3);
try testing.expectEqual(public.getJ(), 2);
} |
Right; so then I'm guessing |
Are you asking if the simpler |
No, it's just that in your example, there's no demonstration of how the 'allocated state' version would work, Specifically, I'm suggesting that either one or the other of these two should work for this proposal: fn getPrivatePtr(self: *Public) *Private {
return @fromOpaqueHandle(Private, &self.impl);
}
fn getPrivatePtr(self: *Public) *Private {
return &@fromOpaqueHandle(Private, self.impl);
} Essentially, how should the value semantics work? Edit: I said allocated state, but I actually meant just modifying the memory in-place. Otherwise without some mutable pointer semantics, you'd have to re-assign state each time you wanted to modify it. |
What about fn getPrivatePtr(self: *Public) *Private {
return @fromOpaqueHandle(Private, self.impl);
} |
Alright. Then the last thing I'd like to address before we try to do any summary of our points is |
Absolutely. |
This is a fairly long thread, so I apologize in advance if I missed something, but isn't this proposal essentially a form of round-about and coarse-grained field access control? If Zig had something like const Private = struct { i: u32, j: u32 };
pub const Public = struct {
impl: @OpaqueHandle(Private),
pub fn init(i: u32, j: u32) Public {
var result = Public {};
result.impl = @toOpaqueHandle(Private, Private { .i = i, .j = j });
return result;
}
pub fn getI(self: Public) u32 {
return self.getPrivate().i;
}
pub fn getJ(self: Public) u32 {
return self.getPrivate().j;
}
fn getPrivate(self: Public) Private {
return @fromOpaqueHandle(Private, self.impl);
}
}; ...would then simply become: const Public = struct {
private i: u32,
private j: u32,
pub fn init(i: u32, j: u32) Public {
return .{ .i = i, .j = j };
}
pub fn getI(self: Public) u32 {
return self.i;
}
pub fn getJ(self: Public) u32 {
return self.j;
}
}; |
@zzyxyzz I'd have to say you are correct; this was part of my original criticism, but we weren't speaking in similar terms, so I guess I got so caught up in trying to develop the details, that what you've just said didn't quite occur to me. |
I think I said it at the start of the thread: having private field would solve this (or having field on This proposal is way less intrusive than private fields, but gives you many of the same benefits. That is what is interesting about it. Edit: Another way to think about it is that in Zig you use
This is the same situation as Having On the other hand some form of private state is very desirable for case 2. This is where this proposal comes in. One could think of introducing the concept of |
Sorry, but I'll have to echo @zzyxyzz 's sentiment. And after having gone down through this discussion, to have it pointed out that this proposal would not solve what you originally intended it to, and now alternatively solving a use-case that, as you've well said, has been overall rejected, I think this is too fuzzy to really put into zig. Maybe there is a discussion to be had about private fields, or struct with private fields; but in that case, I would recommend you open an issue with razor-sharp focus specifically on that, what it would solve, the benefits gained from such, etc - if you really believe it is worth the discussion. Because this thread has been too all over the place. |
Maybe it needs to be discussed one more time then. Language Complexity is such a sleazy argument. How does introducing three builtins for a convoluted way of doing something so common and straight-forward make the language "simpler" and "smaller"? I think the use cases you present are legitimate, which is rather unsurprising, since information hiding is such a common-place technique in almost every modern language. My own opinion is that removing field access control was a bad idea, but also that it may not be too late to put it back. However, if Zig community/BDFL decides to make a stand on the status quo, then the use-cases presented here are (presumably) to be considered either unimportant or an anti-pattern, and the language should not be polluted with inferior solutions. Edit: @InKryption, sorry didn't see your comment :) |
As I said in my other comment, I believe the refusal of private fields stems from viewing struct as a smallish bucket of bits, while in Zig they also serve as But I agree my proposal wasn't precise enough. Actually @InKryption designed it for me in the end :-) As such, I am happy to close it, even as I feel the scenarios are very valid. |
Again, I know this isn't your fault, but that's such a weak argument. It doesn't matter how many bits the bucket has. If Hmm. Maybe I should actually write up a proposal 😄... |
@zzyxyzz : ArrayList definitely has something to hide. It is a module in the Parnas sense. Haskell, F#, and other languages have a module concept, C and Zig don't and use Perhaps This proposal, apart from the lack of precision, attempted to strike a middle ground enabling information hiding at a larger granularity level than the field, and at a lower language weight, hoping for the BDFL not to strike it down. At least, that was the intention. |
To me the functionality of
|
@rohlem An interesting idea, but even if implemented, I don't know if something like |
@InKryption Like |
@rohlem If you're referring to the And when it comes down to it, I also don't see where such functionality would become useful, unless you maybe argue that it would benefit very complex, generic functions, that in particular make use of this "opaque layout mimicry". |
I agree that @andrewrk's response in #9909 was pretty comprehensive It also reminded me of the "We're all consenting adults." standard in, e.g. Python, and I personally agree with him that eager, granular encapsulation can be an anti-pattern, often leading to limited extensibility and worse performance. I do think there's one open question that could use some more input, though:
I agree with @ifreund above that reference semantics can have serious performance implications for usages of Either
[Bikeshedding] If we go with (1), I think it's worth considering a qualifier syntax, |
Purpose of @opaque is information hiding, which implies hiding the value types with sizes. If you are not convinced, check the meaning of opaque. So intentionally there are no value semantics, because there are no values. Its basically just a pointer with a name to anything (that you intentionally dont want to expose with size+alignment, cause you dont want your code to access it). |
I think I should have defined my terms more clearly: An One way to implement this encapsulation is as in C: Make sure that the underlying layout is literally unavailable to the compilation unit, and only allow reference to opaque types. However, it's also possible to use opaque types as values in client code. In this case, the memory layout is available to the compilation "unit", as needed to pass value types across the ABI, but client code is not allowed to refer directly to structural details, preventing API breakage if these change. If you look at many of the examples above, you'll see that they correspond to this expanded sense of allowing Why hide structural information from client code?The guarantee that a library writer gets from (2) can actually have important performance implications for stable, public APIs when desired optimizations require changing core API structures, such as in the famous case of Python's C API |
Advantage: Drawback: Summary:
If you build files together with |
I recently thought of an interesting syntactic idea for this - it might be silly, but I figured I'd put it here anyway. Currently, we have Here's my idea. Extend the |
@mlugg (remashed from reddit discussion) What I am missing is which optimisation potential (along LTO) should be given: |
@mlugg, |
@zzyxyzz, in general (for e.g. structs) I totally agree with that conclusion, however DOD has some good examples of cases where hiding the operations of the underlying representation is desirable. Storing plain untyped indices, as you currently often do in Zig DOD, is type-unsafe: you can easily get them mixed up, and even do nonsensical things like add them. This can be improved by wrapping in a type with methods to do the reasonable access patterns (i.e. index the relevant thing), so then it becomes obvious when you do something wrong (you end up unwrapping the value!). Opaque types would be a very clean approach to this - right now you'd have to use an I agree that "data hiding for the sake of data hiding" is a bad idea, and goes against Zig's ideals. However, I think there are cases where it is useful to actually prevent bugs (particularly when wrapping primitive types), and in these cases it's justified. It's also worth noting that |
Perhaps in the general case, but in the specific case of typed indices in DOD setups, we already have non-exhaustive enums, which already accomplish this use case more than well enough. There's even an instance of this in the compiler iirc, so the argument for |
@mlugg, |
I separated this from here and expanded its description with a more robust rationale and current state.
Expanded rationale
This proposal allows representing a typical C idiom as exposed in several books (i.e., here) and libraries.
Also, it is a CERT C recomendation and a MISRA C recomendation to use in safe systems.
One can prefer more 'open' APIs or not, but the fact remains that this is a widely used and recommended pattern in the C (and C++) community. It should be easily expressible in Zig.
Current state
Assuming I got this right, the current state to implement something like this is unpleasing, and the presence of
@ptrCast
makes one think that it is dangerous (i.e., it is likely not going to make the safe system guys happy).Use case and proposal from the original issue
I'm not familiar with the gl api, but I assume that
program
andshader
are effectively opaque types. Despite being integers, it would not make sense to do any arithmetic on them, right? They're more like fd's in posix.Perhaps we can scope this down to enable specifying the in-memory representation of an otherwise opaque type. There are two features that we want at the same time:
The recommended way to do this is to make a type with
@OpaqueType()
, and then use single-item pointers to the type as the handle.But this mandates that the in-memory representation of the handle is a pointer, which is equivalent to a
usize
. This is not always appropriate. Sometimes the handle type must bec_int
instead, such as with posix fd's, andc_int
andusize
often have different size. You have to use the correct handle type, so a pointer to an opaque type is not appropriate with these handle types.Proposal
A new builtin
@OpaqueHandle(comptime T: type) type
.assert(H != T);
- You get a different type than you passed in.assert(G != H);
- Similar to@OpaqueType()
, each time you call it, you get a different type.assert(@sizeOf(H) == @sizeOf(T) and @alignOf(H) == @alignOf(T));
- Same in-memory representation.H
is guaranteed to behave identically toT
in theextern
calling convention. This includes when it is part of a larger type, such as a field in an extern struct.h = t; t = h; h = g; // all errors
- The handle types don't implicitly cast to or from any other type.if (h != h2) { h = h2; }
- Handles can be copied and equality-compared.h + 1, h + h2, h < h2 // all errors
- WhetherT
supported arithmetic or not, the handle types do not support any kind of arithmetic.t = @bitcast(T, h);
- If you really need to get at the underlying representation, I think@bitcast()
should be the way to do that. Or maybe we should add special builtins for this, idk.This is an exciting idea. I think this fits nicely into the Zig philosophy of beating C at its own game - Zig is preferable to C even when interfacing with C libraries. If you translate your GL and Posix apis into Zig extern function declarations with opaque handle types, then interfacing with the api gets cleaner, clearer, less error prone, etc.
Originally posted by @thejoshwolfe in #1595 (comment)
FYI Andrew commented on his preference to have special builtins instead of reusing @bitcast. I think they were @toOpaqueHandle(x) and @fromOpaqueHandle(x).
The text was updated successfully, but these errors were encountered: