Proposal to improve the ergonomics and precision of type inference in generic functions #9260
This is a nice proposal, and indeed, having to systematically use There wouldn't be a clear difference between So, I'm wondering if, instead of introducing a new keyword, we couldn't have |
That's actually how I intended it. With this change |
There would be a difference. Since |
Oh it's definitely possible. All you need is a way to reverse the operation from a custom type to its input parameters. The compiler could maintain this mapping internally. Note this would only work for "custom types" though. This wouldn't work with non-custom types and would probably be mostly useless for them anyway: fn foo(x: IgnoreTGiveMeU32(infer T)) void { }
fn IgnoreTGiveMeU32(comptime T: type) type { return u32; } It's impossible to do this in the general case, unless all your functions are bijective mappings. Whether it's desirable is another question which I would have to ponder. Some things to think about. |
This doesn't work for most actual cases though, because things like std.ArrayList are just thin wrappers that return another constructor's result (std.ArrayList(T) is std.ArrayListAligned(T, null)), so the type named in the signature is not the one the compiler actually sees. |
Yes, the general case (meaning all functions) is undecidable, but it's very much possible for bijective functions. In the case of a bijective type constructor, the compiler can build an inverse table from each instantiation it evaluates.
So when the compiler tries to pattern match, it consults that table. This only breaks down when we have non-bijective functions. For example: fn IgnoreTGiveMeU32(comptime T: type) type { return u32; } In this case our inverse table looks like this:
So now when we try to pattern match, the table entry is ambiguous: u32 could have come from any input. The concept of bijective functions is well understood, and that is the exact criterion that would determine whether or not a function could support infer T.
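To make the collision concrete, a minimal sketch (ordinary Zig, runnable today):
const std = @import("std");
fn IgnoreTGiveMeU32(comptime T: type) type {
    _ = T;
    return u32; // every input maps to the same output type
}
comptime {
    // Two different inputs produce the identical result type, so a
    // reverse mapping from result type back to T cannot be unambiguous.
    std.debug.assert(IgnoreTGiveMeU32(u8) == IgnoreTGiveMeU32(i64));
}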
|
@marler8997 Non-bijective functions would be a problem, of course. Especially in libraries, where unambiguous inference needs to be ensured for all possible child types, and not only instances currently used in the project. There are also other failure modes that should not occur in good code, but are still possible in principle, like So, if we had an annotation for well-behaved type constructors ( |
@zzyxyzz I don't think all of this is worth the benefit in the end, especially as pub fn BitBuffer(comptime size: usize) type {
return [(size + 7)/8]u8;
} where |
@MasterQ32 As it stands, the most ambitious variant of this proposal only allows pattern matching against built-in types, such as pointers, optionals and slices, which should not pose any problems (see the sketch below). BTW,
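A minimal sketch of matching only built-in type shapes (assuming the proposed infer syntax; T is bound from the slice's element type):
// only the built-in slice shape is matched; no user type function
// ever needs to be inverted
fn first(xs: []const infer T) ?T {
    return if (xs.len > 0) xs[0] else null;
}
|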
@MasterQ32 yes, your example could be made bijective by restricting its inputs: pub fn BitBufferBijective(comptime size: usize) type {
std.debug.assert(size % 8 == 0); // Blocks inputs in order to create a 1 to 1 mapping
return BitBuffer(size);
} This example doesn't really apply to this case: pub fn Foo(comptime T: type) type {
// black box that is not bijective; this Foo is a stand-in for any type function
}
pub fn bar(x: Foo(infer T)) void {...}
// compile error: cannot use 'infer T' with 'Foo' because it is not bijective; the input types x and y both map to z
pub fn FooBijective(comptime T: type) type {
std.debug.assert( limitInputsToRemoveConflicts(T) );
return Foo(T);
}
pub fn bar(x: FooBijective(infer T)) void {...}
// now it will work; T will become whatever type was passed to the caller's Foo(T) instance
I thought a lot about this stuff years ago when I was thinking about how D implemented pattern matching and what the challenges and potential solutions were with it. I determined it is feasible and actually not too difficult to implement, and I'd be happy to help come up with a design if we want to go this route. And to reiterate, I'm not advocating that we implement it, just sharing exactly what a solution would look like so we can make informed decisions. |
On a comment on #5893, I suggested an alternative. If we add support for default parameters, and make default parameters go first, we can do something like this: const std = @import("std");
fn square(comptime T: type = any, n: T) T {
return n * n;
}
pub fn main() void {
    const x: u32 = 5;
    std.debug.print(
        "{} = {}\n",
        .{ square(u32, x), square(x) },
    );
} |
The reverse lookup table doesn't work, because the outer function may never have been called. fn foo(x: std.ArrayList(infer T)) void { }
test {
const T = std.ArrayListAligned(u32, null);
var v: T = undefined;
// assert(T == std.ArrayList(u32))
foo(v);
}
|
@SpexGuy Shoot, you're right. I can't think of a way to get around this without having a way to generally reverse code evaluation. |
There are some ways. E.g. the compiler could automatically expand trivial type constructors like std.ArrayList into their definitions before matching. |
There is a simpler and more powerful way that does not depend on memoization or other type constructor shenanigans. Given a parameter declaration such as haystack: ArrayList(infer T), the compiler can accept the argument as anytype and recover T from the concrete argument type. For example, the function fn find(haystack: ArrayList(infer T), needle: T) usize could desugar to: fn find(haystack: anytype, needle: anytype) usize {
const T = @typeInfo(@TypeOf(haystack.items)).Pointer.child;
return findImpl(T, haystack, needle);
}
fn findImpl(comptime T: type, haystack: ArrayList(T), needle: T) usize {...} This method also allows peer type resolution when the same label is inferred from several parameters. For example, the declaration fn max(a: infer T, b: infer T) T could desugar to: fn max(a: anytype, b: anytype) anytype {
const T = @TypeOf(a, b);
return maxImpl(T, a, b);
}
fn maxImpl(comptime T: type, a: T, b: T) T {...} Furthermore, this method allows using anonymous initializers for generic parameters: fn Vec3(comptime T: type) type {
return struct { x: T, y: T, z: T };
}
comptime {
const a: f32 = 2;
_ = length(.{ .x = a, .y = 3, .z = 6 });
}
fn length(v: Vec3(infer T)) T {
// T == f32
// @TypeOf(v) == Vec3(f32)
return @sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
} |
@zigazeljko, I couldn't quite parse what you are suggesting. Would the following rephrasing correctly describe your idea? All type constructors must eventually produce a concrete type, often a struct of some kind. E.g., ArrayList(i32) ultimately expands to something like: struct {
items: []i32,
capacity: usize,
allocator: *Allocator,
// 300 lines of methods and other decls omitted
If we perform the expansion with infer T in place of the concrete type, we get: struct {
items: []infer T,
capacity: usize,
allocator: *Allocator,
// ...
} Matching the two structs against each other, we can now correctly infer that T == i32. If that is the idea, then the problem is that not all type constructors do simple template expansion. E.g. consider the following type: fn StackArray(comptime T: type) type {
const maxSize = 1024;
const size = @sizeOf(T);
if (size > maxSize) @compileError("Too big!");
return struct {
items: [maxSize/size]T,
};
}
But please correct me if I misunderstood something. |
@windows-h8r fn square|comptime T: type = any|(n: T) T {
return n * n;
}
//...
square|i32|(5); // explicit
square(@as(i32,5)); // inferred In any case, it seems to me that our proposals solve different problems, despite some overlap. Yours allows the same generic function to be used with an explicit type parameter or with an inferred one, whichever is more convenient. This sort of reminds me of #7835. A key part of my proposal, however, is the ability to talk about the types of generic functions, which seems more difficult with what you are suggesting. What would the type of |
@zzyxyzz
That's a good question. I guess we could make the parameter names part of the function type, like fn (comptime T: type, n: T) T. This also simplifies the grammar, and complements #1717, since banning this... const x = fn (type, anytype) anytype; // error: missing parameter names ...also bans this... const x = fn (type, anytype) anytype { // error: missing parameter names
// ....
}; ...where a valid function would be... const square = fn (comptime T: type = any, n: T) T {
return n * n;
};
|
The problem with using the |
@zigazeljko that's the problem right there. How do you take any type constructor function and find its "equation"? This is equivalent to "reversing a function", which is very difficult as far as I can tell, and also "undecidable" in many cases (when functions are not bijective). |
That would certainly solve the type representation problem. We could say that the type of a function is simply the header, as written, with equality defined up to parameter names, so that fn (x: infer T, y: T) bool and fn (a: infer U, b: U) bool would be the same type. |
@zzyxyzz Yes, you got the basic idea, but there is one important detail. In your second example, the compiler would do best-effort expansion (i.e. expand what it can, and mark other things as unknown): struct {
items: [unknown]infer T,
} This expansion is enough to get a type candidate for T.
@marler8997 Okay, "equation" may not be the correct term here, but my approach is still doable, as outlined below.
This approach fails only if:
In particular, if the resulting type is a struct, you can make this approach always work by including a zero-sized member like the following:
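A hypothetical sketch of such a marker (the field name is illustrative, not an established convention):
fn StackArray(comptime T: type) type {
    return struct {
        items: [1024 / @sizeOf(T)]T,
        // zero-sized comptime field recording the constructor input,
        // so T can always be recovered from the resulting type
        comptime __infer_T: type = T,
    };
}
|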
So you're essentially suggesting to solve the inference problem the hard way, using a specialized backtracking engine: given a concrete argument type and a declaration like x: Ctor(infer T), expand Ctor with T treated as unknown as far as possible, then match the result against the argument type to determine T. Since most type constructors don't do anything fancy and usually confine themselves to filling in templates and doing straightforward error checking, this procedure should be acceptably fast in practice. On the other hand, it will fail if the resulting type is constructed opaquely. I concede that this approach would probably be more effective in practice than memoization. But given the complexity of the required inference engine, and the fact that this still represents a best-effort solution, I still think that the cost-benefit ratio is unfavorable. |
Taking a type constructor and evaluating it with a generic unknown type would be extremely limiting. Any time the type is used in a branch such as an if or a switch, evaluation gets stuck on the unknown. |
More use-cases: Most of the functions in std.mem and std.math could drop their explicit type parameter: mem.copy(i32, dest, source); // old
mem.copy(dest, source); // new
math.shl(i64, x, 5);
math.shl(x, 5); I think this has been discussed before, but is probably held back by the usual problems of duck-typing. With this proposal, the change could be made without making the implementations more complex or losing precision in type signatures and error messages: pub fn copy(comptime T: type, dest: []T, source: []const T) void; // old
pub fn copy(dest: []infer T, source: []const T) void; // new |
I thought I should also share my other idea to implement this for completeness. It requires work from the developer but it can support all possible cases. The idea is to allow developers to provide their own custom type constructor inverse functions. Consider a mechanism that associates a type with a user-defined inverse function. We can imagine a builtin that does this such as: // Takes a type T that resulted from a type constructor and it's inverse, namely, a function that
// will derive the type constructor inputs from the resulting type.
//
// When the compiler sees that a type constructor returns an inversible type, it will associate
// the inverse function with that type constructor which would allow that type constructor
// to be used with `infer T`.
@Inversible(comptime T: type, comptime inverse: fn(comptime T: type, comptime param: comptime_int) type) For example, imagine we start with a simple type constructor like this: fn Pair(comptime A: type, comptime B: type) type {
return struct { a: A, b: B };
}; Now let's redefine it to make it inversible: fn Pair(comptime A: type, comptime B: type) type {
return @Inversible(struct { a: A, b: B }, pair_inverse);
};
fn pair_inverse(comptime T: type, comptime param: comptime_int) type {
return switch (param) {
0 => @typeInfo(T).Struct.fields[0].field_type,
1 => @typeInfo(T).Struct.fields[1].field_type,
else => unreachable,
};
} So if you had a function like this: fn foo(p: Pair(infer A, infer B)) void {
// ...
} If the function was called with an instance of inverse(Pair(u8, i32), 0) -> u8
inverse(Pair(u8, i32), 1) -> i32 The compiler would also verify internally that the inverse function is correct by taking the result of the inverse function and re-evaluating the type constructor to ensure it evaluates to the same type that was given. This would mean that any error in the inverse function that would have an effect on the program would automatically become a compile error. This also prevents the inverse function from working with different types. For example, say we had passed a different type: const T = struct { x: i32, y: i64, z: u8};
foo(T { .x = 0, .y = 0, .z = 0});
// inverse(T, 0) -> i32
// inverse(T, 1) -> i64 The inverse function would appear to work in this case, but once it evaluated This solution is a tradeoff between having the ability to support |
@mlugg, |
Consider this: fn foo(_: infer T, _: std.ArrayList(T)) void {}
const a: fn (infer T, []const T) void = foo;
const b: fn (infer T, infer T) void = a;
const c: fn (infer T, ?T) void = b;
const d: fn (infer T, infer U) void = c; None of those types can be checked to be valid until an instantiation, so what should happen when Additionally, look at this case: fn foo(_: infer T, _: std.ArrayList(T)) void {}
const a: fn (infer T, std.ArrayList(u32)) void = foo;
test {
a(@as(u32, 5), undefined);
The instantiation created by this call passes the type check, despite the type annotation being in general incredibly wrong. In my opinion, having these precise function types which are not well-enforced may be more confusing than a well-documented version of the status quo (with the exception that |
my understanding of this proposal is that
yes they would all be checked because of the way in that checking process the user would find that and |
Running it yields an even faster error: const std = @import("std");
fn foo(x: anytype, y: std.ArrayList(@TypeOf(x))) void {
_ = y;
}
const a: fn (x: anytype, y: []const @TypeOf(x)) void = foo;
const b: fn (x: anytype, y: anytype) void = a;
const c: fn (x: anytype, y: ?@TypeOf(x)) void = b;
const d: fn (x: anytype, y: anytype) void = c;
test {
_ = d;
}
|
I was writing the comment under the assumption that the entire proposal was implemented, so backreferences in function types would be valid, as would multiple infer bindings. The point of my first example is that every single one of those assignments, including the initial assignment to a, cannot actually be checked until an instantiation. |
@mlugg, |
I've been sitting on this idea for a while, and since this appears to have been somewhat revived, I'd like to add my take on what this could instead look like. const std = @import("std");
fn ConstSliceParam(comptime T: type) type {
const Elem = std.meta.Elem(T);
return []const Elem;
}
fn toSlice(slice: ConstSliceParam) @TypeOf(slice) {
return slice;
}
comptime {
@compileLog(@TypeOf(toSlice("foo"))); // `[]const u8`
@compileLog(@TypeOf(toSlice(&[_]u32{ 1, 2, 3 }))); // `[]const u32`
@compileLog(@TypeOf(toSlice(@as(*[0]i16, &.{})))); // `[]const i16`
}
fn TypeArrayParam(comptime T: type) type {
const len = switch (@typeInfo(T)) {
.Struct => |structure| structure.fields.len,
.Array => |array| array.len,
else => @compileError("Expected array of types, got " ++ @typeName(T)),
};
return [len]type;
}
fn Tuple(comptime types: TypeArrayParam) type {
// basically the same code as in `std.meta.Tuple`, except
// without a need for the `CreateUniqueTuple` trick.
}
fn ArrayListPtrParam(comptime T: type) type {
std.debug.assert(@typeInfo(T).Pointer.size == .One);
const List = std.meta.Child(T);
return std.ArrayList(std.meta.Child(List.Slice));
}
fn pop3(list: ArrayListPtrParam) [3]std.meta.Child(@TypeOf(list).Slice) {
return .{
list.pop(),
list.pop(),
list.pop(),
};
This by itself doesn't resolve the original issue, being that the types of variables still have to be referred to as @TypeOf(x) and the like; but combined with a capture syntax, it could: fn ArrayListPtrParam(comptime T: type) type {
std.debug.assert(@typeInfo(T).Pointer.size == .One);
const List = std.meta.Child(T);
return std.ArrayList(std.meta.Child(List.Slice));
}
fn pop3(list: ArrayListPtrParam |List|) [3]std.meta.Child(List.Slice) {
return .{
list.pop(),
list.pop(),
list.pop(),
};
} And thus, unless I'm mistaken, this would remove the issue of making the type system "smart" enough to match arbitrary type constructor applications. As a plus, since the parameter type functions are just functions, more extensive type checking could also be moved into them, and out of function bodies. |
Yeah, I'm not saying there is a great one - in fact, I'm almost certain there's not. But from my perspective, I think I actually prefer the status quo behaviour, because although not overly helpful, it's explicitly so. Lazy checking at instantiation sites, aside from having potentially unbounded performance impacts which IMO is already pretty bad, can easily allow incorrect types to slip through, so lulls you into a false sense of security where the types make it look like your code is being correctly checked, but in reality it's not. I think I'd prefer a system which doesn't check anything, but is clear about that, than a system where you have to look into function internals to determine to what extent something is actually being checked. |
@mlugg fn eql(x: anytype, y: @TypeOf(x)) bool {
return x == y;
}
fn nonsense() bool {
const foo: fn(anytype, []const u8) bool = eql;
const x: i32 = 5;
const y: i32 = 6;
return foo(x, y);
} Here, the generic parameters are not checked at all, even at instantiation time, even though syntactically this looks like a proper type signature. This is why I still think that this proposal will improve things 90% of the time, and is unlikely to make things actively worse in the remaining 10%. |
@zzyxyzz Ah, so that I actually do think should be changed from status quo - my bad for not properly recalling the current situation. I think it makes sense (and is simple implementation-wise) to check signatures up to "is this argument generic" - so that would fail, because the second arg of eql is generic while the annotation's is concrete. (For the record, I still strongly support the rest of this proposal - |
Ok, that makes more sense to me now. So your counter-proposal would be to keep the |
@InKryption
This I still don't get. Could you elaborate? |
@zzyxyzz That is all a much smarter way of saying what I meant, yes. As for
All I mean is that this avoids the issues mentioned recently in this thread concerning the complicated semantics of the generalized compiler-implemented specialization logic, since it's left to userland to define it for concrete use cases. |
I think the path to getting this potentially accepted is by reducing scope and making follow-up proposals. Getting only the syntax change in for |
Speaking of scope, was it mentioned in this thread what the scope of the bindings introduced by infer would be? fn foo(x: infer T, y: T) T {
...
// can use T here, or must I say @TypeOf(x)?
...
} |
Yes, you would be able to use T in the body. |
Instead of introducing a new keyword, can the parser gods bless us with "comptime T" instead of "infer T", or is this ambiguous? fn find(needle: infer T, haystack: ArrayList(T)) usize {...}
// vs
fn find(needle: comptime T, haystack: ArrayList(T)) usize {...} |
@Pyrolistical I feel like that would be overloading the meaning of the comptime keyword. Everywhere else |
How ugly would this be? 😄 fn find(needle: anytype |T|, haystack: ArrayList(T)) usize {...} edit: Found this comment which seems to have had a similar idea. |
Imho something like this would also be cool: fn isGenerator(T: type) bool { return @hasDecl(T, "gen"); }
fn generate(generator: conform isGenerator) usize {...} |
To expand on this and the previous comment, maybe the captured type can be used within a comptime expression which must evaluate without a compile error. I think this is better than a boolean condition, because it plays well with @compileError messages: fn Loggable(comptime T: type) void {
// perhaps with a helper function for conciseness;
std.meta.expectField(T, .logFn, std.fmt.comptimePrint("{s} missing .logFn: LogFn!", .{@typeName(T)}));
// ... more constraints
}
fn log(value: anytype |T| Loggable(T)) !void {
// type capture has asserted this field exists in a way visible from the function signature
try value.logFn(...);
} Maybe Loggable could even take extra arguments: // same but specify the field which holds our function, for ex.
fn log(comptime log_field: @TypeOf(.Tag), value: |T| Loggable(T, log_field)) !void {...} Because it's an expression, you could uhh fn foo(value: |T| {
SomeCondition(T);
if (T == i32) return;
switch (T) {
u8, u16, i8, i16 => {},
else => @compileError("maybe too cursed..?"),
}
} void {} |
I actually was thinking something more like const Foo = struct {
logger: anytype |T| isLogger(T),
} Currently you have to do something like: fn GetFoo(comptime T: type) type {
isLogger(T);
return struct { logger: T, };
} and the worst part is, like function coloring, this problem is contagious: const Bar = struct {
foo: Foo,
baz: Baz,
}; now you'll need a GetBar as well. |
@ItsMeSamey I don't think so. Thinking about how that could actually be implemented, I can't think of many things less Zig-like. |
Problem
Generic functions, as currently supported in Zig, have the form illustrated below.
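A representative sketch, using the eql idiom that recurs throughout this thread:
// status quo: the second parameter's type refers back to the first
fn eql(x: anytype, y: @TypeOf(x)) bool {
    return x == y;
}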
This works well enough in general, but also has some problems and limitations:
1. Aesthetics
Besides the controversial keyword (#5893), the @TypeOf pattern is a bit of a hack, even though everybody is used to it by now. It somewhat obscures the fact that eql takes two arguments of the same type. It also leads to unnecessary verbosity in some cases.
2. Type signatures
The type of a simple function like same(x: i32, y: i32) bool {...} is fn(i32, i32) bool. For the generic function eql, the intended type is something like fn<T>(T,T) bool. But the actual type inferred by the compiler seems to be fn(anytype,anytype) anytype. In my limited testing, I found that the compiler essentially considers all generic and parametric functions to have the same type; only the number of arguments is checked. E.g., assignments between mismatched generic signatures compile; it seems that the type check automatically simplifies to
const eq: any_generic_function = eql;
, which then succeeds. Maybe this is a limitation of the compiler that can be fixed independently, but then only in part. For example, the fact that both arguments of eql should have the same type is simply not expressible in the current system, since parameter names are not part of the type: fn(anytype, @TypeOf(?)) bool.
This state of affairs reduces the precision of type checks, and also discards valuable information that could be used by documentation generators, metaprogrammed interfaces and error messages.
3. Order dependence
Another minor nit is that generic functions constrain the order of function parameters in some cases. This, for example, is not possible:
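A sketch of the problem, using the find example discussed later in this thread (assuming an element type that supports ==):
const std = @import("std");
// not possible today: the type of haystack refers to needle,
// which is declared after it
// fn find(haystack: std.ArrayList(@TypeOf(needle)), needle: anytype) usize { ... }
// the working order must put needle first:
fn find(needle: anytype, haystack: std.ArrayList(@TypeOf(needle))) usize {
    var i: usize = 0;
    while (i < haystack.items.len) : (i += 1) {
        if (haystack.items[i] == needle) return i;
    }
    return haystack.items.len;
}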
The arguments need to be the other way round, although logically they can be in any order. This is an arbitrary, albeit minor, limitation.
Proposed solution
Change the fn(x: anytype) syntax to fn(x: infer T), which infers the type of x and binds it to T (see the example below). This makes the intent much clearer, IMO. More importantly, the naming of inferred types allows us to talk about the types of generic functions with much greater precision:
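For example (a sketch, assuming infer may also appear in function type expressions, as discussed later in this thread):
// proposed syntax: T is inferred from x and reused for y
fn eql(x: infer T, y: T) bool {
    return x == y;
}
// the type can now be written down precisely, including the
// constraint that both parameters share one type:
const eq: fn (x: infer T, y: T) bool = eql;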
This forms the bare bones of the proposal, but there are additional possibilities that could be easily layered on top of it:
Peer type resolution
Another straightforward possibility is to allow peer type resolution, as suggested by @mlugg in this comment. Peer type resolution is enabled by using an infer with the same label more than once:
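A sketch of such a declaration (assuming the peer-resolution semantics described here):
// both parameters bind the same label, so T resolves to the
// peer type of the two arguments
fn max(x: infer T, y: infer T) T {
    return if (x > y) x else y;
}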
This allows us to call max(@as(u8, 5), @as(u16, 3)) and get a u16 as a result. If the function were instead declared as fn max(x: infer T, y: T) T { ... }, then a call with different argument types would be an error.
Nearly the same thing could be achieved by declaring fn max(x: infer _, y: infer _) @TypeOf(x, y) { ... }, but this is less nice syntactically. It may also lead to over-instantiation in some cases: max(u8, u16), max(u16, u8) and max(u16, u16) would produce separate instances. This can be avoided if the peer type resolution happens at the argument level and is not deferred to the return type calculation.
Similar proposals
This proposal is an improved (I hope) version of something suggested by @YohananDiamond here: #5893 (comment). Towards the end of the comment, the following is proposed:
In the same comment it is pointed out that this makes it somewhat unclear that T is being declared here. The present proposal addresses this shortcoming and generally makes it clear what exactly is inferred and where. It also leads to a more natural representation of the type of a generic function, but is otherwise only a minor reshuffling of keywords.
Edit: There's also the proposal #6997, which provides a partial solution to the type signature imprecision problem.
Summary
Advantages over status quo:
- Clearer, more readable generic signatures (no @TypeOf workaround)
- Precise, checkable types for generic functions
- No arbitrary constraints on parameter order
Optionally also:
- Peer type resolution for repeated infer labels
- Less need for std.meta in simple cases
It should also be mentioned that inference remains simple and fully deterministic and does not open the door to template metaprogramming or some such.