
Proposal to improve the ergonomics and precision of type inference in generic functions #9260

Open
ghost opened this issue Jun 28, 2021 · 70 comments
Labels: proposal (This issue suggests modifications. If it also has the "accepted" label then it is planned.)

@ghost

ghost commented Jun 28, 2021

Problem

Generic functions, as currently supported in Zig, have the form:

fn eql(x: anytype, y: @TypeOf(x)) bool {
    return x == y;
}
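For reference, a call site supplies only the first argument freely; the second coerces to its type. A minimal sketch in current Zig (untested here):

```zig
const std = @import("std");

fn eql(x: anytype, y: @TypeOf(x)) bool {
    return x == y;
}

test "second argument coerces to the first's type" {
    // The literal 1 (comptime_int) coerces to u32, the inferred @TypeOf(x).
    try std.testing.expect(eql(@as(u32, 1), 1));
}
```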

This works well enough in general, but also has some problems and limitations:

1. Aesthetics

Besides the controversial keyword (#5893), the @TypeOf pattern is a bit of a hack, even though everybody is used to it by now. It somewhat obscures the fact that eql takes two arguments of the same type. It also leads to unnecessary verbosity in some cases:

fn foo(x: anytype, y: @TypeOf(x)) Container(@TypeOf(x)) {
    const T = @TypeOf(x);
    ...
}

2. Type signatures

The type of a simple function like same(x: i32, y: i32) bool {...} is fn(i32, i32) bool. For the generic function eql, the intended type is something like fn<T>(T,T) bool. But the actual type inferred by the compiler seems to be fn(anytype,anytype) anytype. In my limited testing, I found that the compiler essentially considers all generic and parametric functions to have the same type. Only the number of arguments is checked. E.g., the following code compiles:

const eq: fn(anytype, f32) i32 = eql; // should have failed
assert(eq(1, 1));

It seems that the type check automatically simplifies to const eq: any_generic_function = eql;, which then succeeds. Maybe this is a limitation of the compiler that can be fixed independently, but then only in part. For example, the fact that both arguments of eql should have the same type is simply not expressible in the current system, since parameter names are not part of the type: fn(anytype, @TypeOf(?)) bool.

This state of affairs reduces the precision of type checks, and also discards valuable information that could be used by documentation generators, metaprogrammed interfaces, and error messages.

3. Order dependence

Another minor nit is that generic functions constrain the order of function parameters in some cases. This, for example, is not possible:

fn find(haystack: ArrayList(@TypeOf(needle)), needle: anytype) usize {...} // error

The arguments need to be the other way round, although logically they can be in any order. This is an arbitrary, albeit minor, limitation.

Proposed solution

Change the fn(x: anytype) syntax to fn(x: infer T), which infers the type of x and binds it to T. Example:

// before:
fn eql(x: anytype, y: @TypeOf(x)) bool {...}
// after:
fn eql(x: infer T, y: T) bool {...}

// before:
fn find(needle: anytype, haystack: ArrayList(@TypeOf(needle))) usize {...}
// after:
fn find(needle: infer T, haystack: ArrayList(T)) usize {...}

This makes the intent much clearer, IMO. More importantly, the naming of inferred types allows us to talk about the types of generic functions with much greater precision:

// before:
eql: fn(anytype, anytype) anytype
// after:
eql: fn(infer T, T) bool

// before:
find: fn(anytype, anytype) anytype
// after:
find: fn(infer T, ArrayList(T)) usize

This forms the bare bones of the proposal, but there are additional possibilities that could be easily layered on top of it:

// Make type name optional:
fn wrapper(x: infer _) void { other_generic_function(x, 0); }

// Unrestrict order of inferred parameters:
fn find(haystack: ArrayList(T), needle: infer T) usize {...}
// Note: T can still only be inferred once, and only at a particular location, but it
// can be referred to before that. The procedure would be to resolve all infers first,
// and then use them in a second pass to finalize the types of the remaining arguments.

// Structural pattern matching on simple types:
fn foo(bar: ?*infer T, baz: []const T) bool {...}

// However, note that structural inference across custom types is not possible
// and will probably never be possible, because type constructors in Zig can be
// arbitrarily complex. 
fn convert(x: ArrayList(infer T)) HashSet(T) {...} // No can do. Use this instead:
fn convert(comptime T: type, x: ArrayList(T)) HashSet(T) {...}

Peer type resolution

Another straightforward possibility is to allow peer type resolution, as suggested by @mlugg in this comment. Peer type resolution is enabled by using an infer with the same label more than once:

fn max(x: infer T, y: infer T) T {
    return if (x > y) x else y;
}

This allows us to call max(@as(u8, 5), @as(u16, 3)) and get a u16 as a result. If the function were instead declared as fn max(x: infer T, y: T) T { ... }, then a call with different argument types would be an error.

Nearly the same thing could be achieved by declaring fn max(x: infer _, y: infer _) @TypeOf(x, y) { ... }, but this is less nice syntactically. It may also lead to over-instantiation in some cases: max(u8, u16), max(u16, u8) and max(u16, u16) would produce separate instances. This can be avoided if the peer type resolution happens at the argument level and is not deferred to the return type calculation.
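For comparison, the closest status-quo spelling of a peer-resolving max uses multi-argument @TypeOf in the return type. A sketch in valid current Zig (untested here):

```zig
const std = @import("std");

// Today's equivalent of the proposed `fn max(x: infer T, y: infer T) T`:
// @TypeOf with several arguments performs peer type resolution.
fn max(x: anytype, y: anytype) @TypeOf(x, y) {
    return if (x > y) x else y;
}

test "peer resolution widens u8 and u16 to u16" {
    const r = max(@as(u8, 5), @as(u16, 3));
    try std.testing.expect(@TypeOf(r) == u16);
    try std.testing.expect(r == 5);
}
```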

Similar proposals

This proposal is an improved (I hope) version of something suggested by @YohananDiamond here: #5893 (comment). Towards the end of the comment, the following is proposed:

fn max(generic a: T, b: T) T {
    return if (a > b) a else b;
}

In the same comment it is pointed out that this makes it somewhat unclear that T is being declared here. The present proposal addresses this shortcoming and makes it clear exactly what is inferred and where. It also leads to a more natural representation of the type of a generic function, but is otherwise only a minor reshuffling of keywords.

Edit: There's also the proposal #6997, which provides a partial solution to the type signature imprecision problem.

Summary

Advantages over status quo:

  • Preserve more type information for docgen, metaprogramming and error messages
  • Reduce visual noise and unintuitive keyword naming
  • Express intent more clearly

Optionally also:

  • Remove constraints on order of arguments
  • Allow structural pattern matching without std.meta in simple cases

It should also be mentioned that inference remains simple and fully deterministic, and does not open the door to template metaprogramming or the like.

@jedisct1
Contributor

This is a nice proposal, and indeed, having to systematically use @TypeOf() for this use case is a little bit cumbersome.

There wouldn't be a clear difference between anytype and infer, though. The only difference is that with infer, the type has a name, while with anytype, it doesn't.

So, I'm wondering if, instead of introducing a new keyword, we couldn't have anytype (perhaps renamed to infer if this improves clarity) require a name, which could be _ to discard it.

@ghost
Author

ghost commented Jun 28, 2021

There wouldn't be a clear difference between anytype and infer, though. The only difference is that with infer, the type has a name, while with anytype, it doesn't.

So, I'm wondering if, instead of introducing a new keyword, we couldn't have anytype (perhaps renamed to infer if this improves clarity) require a name, which could be _ to discard it.

That's actually how I intended it. With this change anytype would no longer be allowed in function declarations. In principle, it's not even necessary to rename the keyword, as you say. The proposal could be changed to fn(x: anytype T, ...), but I thought if we are already changing syntax, then we can also change the keyword to something more appropriate.

@zigazeljko
Contributor

There wouldn't be a clear difference between anytype and infer, though. The only difference is that with infer, the type has a name, while with anytype, it doesn't.

There would be a difference. Since infer could be used inside a more complex type, it would allow pattern matching and creating type constraints.

@marler8997
Contributor

// However, note that structural inference across custom types is not possible
// and will probably never be possible, because type constructors in Zig can be
// arbitrarily complex and impure.

Oh it's definitely possible. All you need is a way to reverse the operation from a custom type to its input parameters. The compiler could maintain this mapping internally. Note this would only work for "custom types" though. This wouldn't work with non-custom types and would probably be mostly useless for them anyway:

fn foo(x: IgnoreTGiveMeU32(infer T)) void { }

fn IgnoreTGiveMeU32(comptime T: type) type { return u32; }

It's impossible to do this in the general case, unless all your functions are bijective mappings. Whether it's desirable is another question which I would have to ponder on. Some things to think about.

@SpexGuy
Contributor

SpexGuy commented Jun 29, 2021

This doesn't work for most actual cases though, because things like std.ArrayList are just wrappers for more complex types like std.ArrayListAligned. Determining whether a function like std.ArrayList could return a given type for some input is undecidable in the general case.
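Concretely, the forwarding in question is just this (paraphrased from the standard library source; details may differ between Zig versions):

```zig
// std.ArrayList only forwards to the "real" type constructor, so any
// ArrayList(T) value's actual type is ArrayListAligned(T, null).
pub fn ArrayList(comptime T: type) type {
    return ArrayListAligned(T, null);
}
```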

@marler8997
Contributor

marler8997 commented Jun 29, 2021

Yes, the general case (meaning all functions) is undecidable, but it's very much possible for bijective functions. In the case of std.ArrayList, it's a very simple bijective function, so this "actual case" is quite simple if you use the table I mentioned. Let's say we have an ArrayList(u32). In this case, the compiler's "inverse table" would look like this:

ArrayList
    (u32) -> ArrayListAligned(u32, null)
    (Foo) -> ArrayListAligned(Foo, null)

So when the compiler tries to pattern match ArrayList(infer T), and receives an instance of ArrayListAligned(u32, null) (aka ArrayList(u32)), all it needs to do is look in the ArrayList table for an output type that matches ArrayListAligned(u32, null), and assign the infer T to the corresponding input parameter u32.

This only breaks down when we have non-bijective functions. For example:

fn IgnoreTGiveMeU32(comptime T: type) type { return u32; }

In this case our inverse table looks like this:

IgnoreTGiveMeU32
    (u32) -> u32
    (Foo) -> u32

So now when we try to pattern match IgnoreTGiveMeU32(infer T) with an instance of u32, we will find multiple entries that match the output type u32, which means we cannot determine the input parameter and thus the problem becomes "undecidable" as you say.

The concept of bijective functions is well-understood, and that is the exact criteria that would determine whether or not a function could support infer T.

P.S. I think that determining whether a function is actually bijective is very difficult. Determining whether the subset of inputs from all instantiations is bijective is very easy. Relying on these imperfect tables to determine whether a function is bijective would cause cascading errors when a conflict is instantiated. I'm not sure of an easy way to solve this, so the compiler would only know this is violated when a conflict happens to be instantiated.

@ghost
Author

ghost commented Jun 29, 2021

@marler8997
You're making a good point about the reverse lookup table. To ask the question "given a particular instance of ArrayList, what type T was used to construct it?", we need to have already constructed said instance, meaning that the mapping T -> ArrayList(T) must already be in the cache, and performing the reverse operation is possible. So for custom types that play nice, inference is a lot easier than I thought.

Non-bijective functions would be a problem, of course. Especially in libraries, where unambiguous inference needs to be ensured for all possible child types, and not only instances currently used in the project. There are also other failure modes that should not occur in good code, but are still possible in principle, like ArrayList(T) doing something funny on April 1.

So, if we had an annotation for well-behaved type constructors (callconv(.InvertibleTypeConstructor)?), then structural pattern matching on custom types would become feasible, and maybe not even difficult. But without it, trying to infer things would be a rather adventurous undertaking.

@ikskuh
Contributor

ikskuh commented Jun 29, 2021

@zzyxyzz i don't think all of this is worth the benefit in the end, especially as ArrayList(T) aliases with ArrayListAligned(T, null) already and a lot of comptime functions just configure other types. Or you have functions like this:

pub fn BitBuffer(comptime size: usize) type {
    return [(size + 7)/8]u8;
}

where BitBuffer(1) and BitBuffer(8) are the same type

@ghost
Author

ghost commented Jun 29, 2021

@MasterQ32
Just to be clear, pattern matching on custom types is not part of the proposal. I think it's an intriguing possibility to keep in mind, especially now that a workable implementation idea has been suggested. But whether it can really be made to work and at what cost, and whether we want it in the first place, has to be decided separately.

As it stands, the most ambitious variant of this proposal only allows pattern matching against built-in types, such as pointers, optionals and slices, which should not pose any problems.

BTW, BitBuffer is a nice example of a non-invertible type constructor, thanks for that. I also forgot about type aliases. They would further compound the difficulties with general type inference.

@marler8997
Contributor

marler8997 commented Jun 29, 2021

@MasterQ32 yes your example BitBuffer is a non-bijective function so it's not reversible. We could create a bijective function that wraps it like this:

pub fn BitBufferBijective(comptime size: usize) type {
    std.debug.assert(size % 8 == 0); // Blocks inputs in order to create a 1 to 1 mapping
    return BitBuffer(size);
}

This example doesn't really apply to this infer T proposal because BitBuffer takes a usize instead of a type, so there's no type to infer, but the same concept applies to functions that do accept types as input parameters. This mechanism of defining wrapper functions to make type functions bijective could be used to extract the input types from any type function.

pub fn Foo(comptime T: type) type {
    // black box that is not bijective; this Foo is a stand-in for any type function.
}

pub fn bar(x: Foo(infer T)) void {...}
// compile error: cannot use 'infer T' with 'Foo' because it is not bijective; the input types x and y both map to z

pub fn FooBijective(comptime T: type) type {
    std.debug.assert( limitInputsToRemoveConflicts(T) );
    return Foo(T);
}

pub fn bar(x: FooBijective(infer T)) void {...}
// now it will work, T will become whatever type was passed to the caller's Foo(T) instance

I thought a lot about this stuff years ago when I was thinking about how D implemented pattern matching and what the challenges and potential solutions were with it. I determined it is feasible and actually not too difficult to implement, and I'd be happy to help come up with a design if we want to go this route. And to reiterate, I'm not advocating that we implement it, just sharing exactly what a solution would look like so we can make informed decisions.

@windows-h8r

In a comment on #5893, I suggested an alternative. If we add support for default parameters, and make default parameters go first, we can do something like this:

const std = @import("std");

fn square(comptime T: type = any, n: T) T {
    return n * n;
}

pub fn main() void {
    const x: u32 = 5;
    std.debug.print(
        "{} = {}\n",
        .{ square(u32, x), square(x) },
    );
}

@SpexGuy
Contributor

SpexGuy commented Jun 29, 2021

The reverse lookup table doesn't work, because the outer function may never have been called.

fn foo(x: std.ArrayList(infer T)) void { }
test {
  const T = std.ArrayListAligned(u32, null);
  var v: T = undefined;
  // assert(T == std.ArrayList(u32))
  foo(v);
}

std.ArrayList has never run in the above example, but the type being passed to foo is indeed a valid possible return value for it.

@marler8997
Contributor

@SpexGuy Shoot you're right. I can't think of a way to get around this without having a way to generally reverse code evaluation.

@ghost
Author

ghost commented Jun 29, 2021

Shoot you're right. I can't think of a way to get around this without having a way to generally reverse code evaluation.

There are some ways. E.g. the compiler could automatically expand trivial type constructors like ArrayList, or we could make the system operate in What You Write Is What You Get fashion and simply refuse to match ArrayList(T) against ArrayListAligned(T, null). The non-injectivity issue is also fairly rare and can be caught automatically in most cases (you have to check, recursively, that none of the return paths discards the input type). So I still think general inference could be made to work in almost all non-pathological cases. But if someone were to argue that this creates complexity way beyond what is acceptable in Zig, I wouldn't disagree 😄.

@zigazeljko
Contributor

There is a simpler and more powerful way that does not depend on memoization or other type constructor shenanigans.

Given a parameter declaration such as haystack: ArrayList(infer T), the compiler expands the type constructor into the actual type, in this case haystack: struct { items: []infer T, capacity: usize, allocator: *Allocator }. Treating that as an equation, we get const T = @typeInfo(@TypeOf(haystack.items)).Pointer.child;.

For example, the function fn find(haystack: ArrayList(infer T), needle: T) usize {...} would behave like this:

fn find(haystack: anytype, needle: anytype) usize {
    const T = @typeInfo(@TypeOf(haystack.items)).Pointer.child;
    return findImpl(T, haystack, needle);
}

fn findImpl(comptime T: type, haystack: ArrayList(T), needle: T) usize {...}

This method also allows infer T to be used more than once per function declaration.

For example, the declaration fn max(a: infer T, b: infer T) T would mean this:

fn max(a: anytype, b: anytype) anytype {
    const T = @TypeOf(a, b);
    return maxImpl(T, a, b);
}

fn maxImpl(comptime T: type, a: T, b: T) T {...}

Furthermore, this method allows using anonymous initializers for generic parameters:

fn Vec3(comptime T: type) type {
    return struct { x: T, y: T, z: T };
}

comptime {
    const a: f32 = 2;
    _ = length(.{ .x = a, .y = 3, .z = 6 });
}

fn length(v: Vec3(infer T)) T {
    // T == f32
    // @TypeOf(v) == Vec3(f32)
    return @sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

@ghost
Author

ghost commented Jun 29, 2021

@zigazeljko, I couldn't quite parse what you are suggesting. Would the following rephrasing correctly describe your idea?

All type constructors must eventually produce a concrete type, often a struct of some kind. E.g., ArrayList(i32) expands into

struct {
    items: []i32,
    capacity: usize,
    allocator: *Allocator,
    // 300 lines of methods and other decls omitted
}

If we perform the expansion with infer T as a placeholder type, we would get

struct {
    items: []infer T,
    capacity: usize,
    allocator: *Allocator,
    // ...
}

Matching the two structs against each other, we can now correctly infer that T == i32.

If that is the idea, then the problem is that not all type constructors do simple template expansion. E.g. consider the following type:

fn StackArray(comptime T: type) type {
    const maxSize = 1024;
    const size = @sizeOf(T);
    if (size > maxSize) @compileError("Too big!");
    return struct {
        items: [maxSize/size]T,
    };
}

StackArray(i32) will expand into struct { items: [256]i32 }, but StackArray(infer T) cannot expand past the point of @sizeOf(infer T), since the size is unknown at that time. Thus, StackArray(i32) cannot be matched against StackArray(infer T) by direct comparison, although memoization would work in this case.

But please correct me if I misunderstood something.

@ghost
Author

ghost commented Jun 29, 2021

@windows-h8r
I must say I'm not too thrilled with the idea of optional parameters in the leading position. If we go in that direction, it may be cleaner to introduce a formal split between comptime and runtime parameters and put them in separate lists:

fn square|comptime T: type = any|(n: T) T {
    return n * n;
}
//...
square|i32|(5); // explicit
square(@as(i32,5)); // inferred

In any case, it seems to me that our proposals solve different problems, despite some overlap. Yours allows the same generic function to be used with an explicit type parameter or with an inferred one, whichever is more convenient. This sort of reminds me of #7835. A key part of my proposal, however, is the ability to talk about the types of generic functions, which seems more difficult with what you are suggesting. What would the type of fn square(comptime T: type = any, n: T) T {...} be?

@windows-h8r

@zzyxyzz

What would the type of fn square(comptime T: type = any, n: T) T {...} be?

That's a good question. I guess we could make the parameter names part of the function type, like

fn (comptime T: type, n: T) T

This also simplifies the grammar, and complements #1717, since banning this...

const x = fn (type, anytype) anytype; // error: missing parameter names

...also bans this...

const x = fn (type, anytype) anytype { // error: missing parameter names
    // ....
};

...where a valid function would be...

const square = fn (comptime T: type = any, n: T) T {
    return n * n;
};

@windows-h8r

windows-h8r commented Jun 29, 2021

fn square|comptime T: type = any|(n: T) T {
  return n * n;
}
//...
square|i32|(5); // explicit
square(@as(i32,5)); // inferred

The problem with using the |...| syntax is that it would be ambiguous. For example, how would a|b|(c) be parsed?

@marler8997
Contributor

marler8997 commented Jun 29, 2021

Treating that as an equation, we get const T = @typeInfo(@TypeOf(haystack.items)).Pointer.child

@zigazeljko that's the problem right there. How do you take an arbitrary type constructor function and find its "equation"? This is equivalent to "reversing a function", which is very difficult as far as I can tell, and also "undecidable" in many cases (when functions are not bijective).

@ghost
Author

ghost commented Jun 30, 2021

I guess we could make the parameter names part of the function type

That would certainly solve the type representation problem. We could say that the type of a function is simply the header, as written, with equality defined up to parameter names, so that fn(foo: i32) void == fn(bar: i32) void. The main thing I don't like about this is that most functions aren't generic, but would also need to carry around (unnecessarily) their parameter names in their type.

@zigazeljko
Contributor

@zzyxyzz Yes, you got the basic idea, but there is one important detail. In your second example, the compiler would do best-effort expansion (i.e. expand what it can, and mark other things as unknown), which would produce this:

struct {
    items: [unknown]infer T,
}

This expansion is enough to get a type candidate for T, in this case @typeInfo(@TypeOf(x.items)).Array.child, which is then plugged back into StackArray(T) to do the actual type checking and coercion.

@marler8997 Okay, "equation" may not be the correct term here, but my approach is still doable as outlined below.

  1. Given a type constructor, the compiler does a best-effort expansion. This results in a type template containing infer placeholders and unknown parts.
  2. For each infer T, the compiler generates a typeInfo-style path that extracts T. This is always possible, since we are dealing with one infer at a time.
  3. The compiler merges all of the extracted types from step 2 using peer type resolution. This produces a type candidate for T.

This approach fails only if:

  • the resulting type does not contain infer.
  • the control flow diverges and the resulting types have no common parts containing infer.

In particular, if the resulting type is a struct, you can make this approach always work by including a zero-sized member like _phantom: [0]T (the same trick as PhantomData<T> in Rust).
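The phantom-member trick could be sketched like this (hypothetical example; the names are made up):

```zig
// Even if T never appears in a "real" field, a zero-sized [0]T member
// keeps T recoverable from the expanded struct layout, giving `infer T`
// something to match against.
fn Handle(comptime T: type) type {
    return struct {
        id: u32,
        _phantom: [0]T = undefined,
    };
}
```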

@ghost
Author

ghost commented Jun 30, 2021

So you're essentially suggesting to solve the inference problem the hard way, using a specialized backtracking engine: Given a concrete type C that should match MyType(infer T), you walk the entire branch space of MyType, while culling any return path that provably does not result in a type, does not use the input type, or is structurally incompatible with the target type C. When a compatible T is found, MyType(T) is re-computed. If the result is identical to C, we're done, otherwise we backtrack and continue the search. If the search terminates without producing viable candidates, an error is reported.

Since most type constructors don't do anything fancy and usually confine themselves to filling in templates and doing straightforward error checking, this procedure should be acceptably fast in practice. On the other hand, it will fail if the resulting type is constructed with @Type or is selected by enumeration. In addition, excessive branching in the constructor will easily exhaust the branch quota.

I concede that this approach would probably be more effective in practice than memoization. But given the complexity of the required inference engine, and the fact that this still represents a best-effort solution, I still think that the cost-benefit ratio is unfavorable.

@marler8997
Contributor

Taking a type constructor and evaluating it with a generic unknown type would be extremely limiting. Any time the type is used in a branch such as an if or switch, the compiler would have to punt and turn that expression into an "unknown" type as you said. And any subsequent code affected by that type would also have to be "unknown". It's possible this limitation is good enough and we could make some of our types conform to it for the sake of supporting infer T on them.

@Vexu Vexu added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 6, 2021
@Vexu Vexu added this to the 0.9.0 milestone Aug 6, 2021
@ghost
Author

ghost commented Oct 8, 2021

More use-cases:

Most of the functions in std.math and std.mem could drop their explicit type parameters:

mem.copy(i32, dest, source); // old
mem.copy(dest, source); // new

math.shl(i64, x, 5);
math.shl(x, 5);

I think this has been discussed before, but is probably held back by the usual problems of duck-typing. With this proposal, the change could be made without making the implementations more complex or losing precision in type signatures and error messages:

pub fn copy(comptime T: type, dest: []T, source: []const T) void; // old
pub fn copy(dest: []infer T, source: []const T) void; // new
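The same rewrite would apply to math.shl (old signature paraphrased from the standard library; the new line uses the proposed syntax):

```zig
pub fn shl(comptime T: type, a: T, shift_amt: anytype) T; // old
pub fn shl(a: infer T, shift_amt: anytype) T;             // new
```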

@marler8997
Contributor

I thought I should also share my other idea to implement this for completeness. It requires work from the developer but it can support all possible cases. The idea is to allow developers to provide their own custom type constructor inverse functions.

Consider a mechanism that associates a type with a user-defined inverse function. We can imagine a builtin that does this such as:

// Takes a type T that resulted from a type constructor and its inverse, namely, a function that
// will derive the type constructor inputs from the resulting type.
//
// When a compiler sees that a type constructor returns an inversible type, it will associate
// the inverse function with that type constructor which would allow that type constructor
// to be used with `infer T`.
@Inversible(comptime T: type, comptime inverse: fn(comptime T: type, comptime param: comptime_int) type)

For example, imagine we start with a simple type constructor like this:

fn Pair(comptime A: type, comptime B: type) type {
    return struct { a: A, b: B };
}

Now let's redefine it to make it inversible:

fn Pair(comptime A: type, comptime B: type) type {
    return @Inversible(struct { a: A, b: B }, pair_inverse);
}

fn pair_inverse(comptime T: type, comptime param: comptime_int) type {
    return switch (param) {
        0 => @typeInfo(T).Struct.fields[0].field_type,
        1 => @typeInfo(T).Struct.fields[1].field_type,
        else => unreachable,
    };
}

So if you had a function like this:

fn foo(p: Pair(infer A, infer B)) void {
    // ...
}

If the function was called with an instance of Pair(u8, i32), then it would evaluate the inverse function twice and get:

inverse(Pair(u8, i32), 0) -> u8
inverse(Pair(u8, i32), 1) -> i32

The compiler would also verify internally that the inverse function is correct by taking the result of the inverse function and re-evaluating the type constructor to ensure it evaluates to the same type that was given. This would mean that any error in the inverse function that has an effect on the program would automatically become a compile error. This also prevents the inverse function from working with different types. For example, say we had passed a different type:

const T = struct { x: i32, y: i64, z: u8};
foo(T { .x = 0, .y = 0, .z = 0});

// inverse(T, 0) -> i32
// inverse(T, 1) -> i64

The inverse function would appear to work in this case, but once it evaluated Pair(i32, i64) it would see that the given type T is not equivalent to a Pair. Also note that this mechanism would still work for functions that only exist to set default parameters, like ArrayList. If foo accepted ArrayListAligned(infer T, infer align), then passing an ArrayList would still work as expected. Note however that foo would not be able to define a parameter with ArrayList(infer T) unless ArrayList itself also provided its own inverse function.

This solution is a tradeoff: it supports infer T in all possible cases, at the cost of extra code to help the compiler for each type constructor. If we decided to support infer, then I'd like to understand what the compiler could do without a feature like this, which I believe is a matter of understanding how easy it is to reverse-evaluate Zig code. Reversing code isn't something I've heard much about, but I can easily come up with impossible cases and cases that seem difficult. For the cases that are possible, I believe it would be more difficult than implementing a normal evaluation engine, but maybe it's not as hard as it seems? I bet this is an area that's been explored, so if anyone knows, let me know.

@ghost
Author

ghost commented Apr 11, 2023

@mlugg,
Thanks for pointing this out. Though I don't think it changes anything in the grand scheme of things. Generic function types would simply have to be checked lazily. This is not ideal, but par for the course with Zig's generics. It is also still an improvement over the status quo, where this type check would be simply ignored.

@mlugg
Member

mlugg commented Apr 11, 2023

@mlugg,
Thanks for pointing this out. Though I don't think it changes anything in the grand scheme of things. Generic function types would simply have to be checked lazily.

Consider this:

fn foo(_: infer T, _: std.ArrayList(T)) void {}
const a: fn (infer T, []const T) void = foo;
const b: fn (infer T, infer T) void = a;
const c: fn (infer T, ?T) void = b;
const d: fn (infer T, infer U) void = c;

None of those types can be checked to be valid until an instantiation, so what should happen when d is instantiated and called? Should all of those old signatures get checked, or only d's? Checking them may be a lot of work in pathological cases, but not checking them easily allows invalid assignments to slip through.

Additionally, look at this case:

fn foo(_: infer T, _: std.ArrayList(T)) void {}
const a: fn (infer T, std.ArrayList(u32)) void = foo;
test {
    a(@as(u32, 5), undefined);
}

The instantiation created by this call passes the type check, despite the type annotation being in general incredibly wrong. In my opinion, having these precise function types which are not well-enforced may be more confusing than a well-documented version of the status quo (with the exception that anytype is a terrible name in that context, #14187)

@nektro
Copy link
Contributor

nektro commented Apr 11, 2023

my understanding of this proposal is that t: infer T is intended to have exactly the same mechanics as the current anytype with the T variable initialized to @TypeOf(t).
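Under that reading, the two spellings below would be equivalent alternatives for the same function (the infer form is the proposed syntax and does not compile today; they are shown side by side for comparison):

```zig
// Proposed syntax: T is bound to the inferred type of x,
// and can then be referenced by later parameters.
fn eqlProposed(x: infer T, y: T) bool {
    return x == y;
}

// Status-quo equivalent spelling.
fn eqlStatusQuo(x: anytype, y: @TypeOf(x)) bool {
    return x == y;
}
```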

so what should happen when d is instantiated and called? Should all of those old signatures get checked, or only d's?

yes they would all be checked because of the way const d = c; etc are assigned.

in that checking process the user would find that const b: fn (infer T, infer T) void = a; would be a compile error both because infer T was used twice and because it violates the signature of a

and a violates the signature of foo

@nektro
Copy link
Contributor

nektro commented Apr 11, 2023

running it yields an even faster error

const std = @import("std");

fn foo(x: anytype, y: std.ArrayList(@TypeOf(x))) void {
    _ = y;
}
const a: fn (x: anytype, y: []const @TypeOf(x)) void = foo;
const b: fn (x: anytype, y: anytype) void = a;
const c: fn (x: anytype, y: ?@TypeOf(x)) void = b;
const d: fn (x: anytype, y: anytype) void = c;

test {
    _ = d;
}
test.zig:6:45: error: use of undeclared identifier 'x'
const a: fn (x: anytype, y: []const @TypeOf(x)) void = foo;
                                            ^
test.zig:8:38: error: use of undeclared identifier 'x'
const c: fn (x: anytype, y: ?@TypeOf(x)) void = b;
                                     ^

@mlugg
Copy link
Member

mlugg commented Apr 11, 2023

@nektro

I was writing the comment under the assumption that the entire proposal was implemented, so backreferences in function types would be valid and multiple infer T would work using PTR.

The point of my first example is that every single one of those assignments, including the initial assignment to a, is invalid, but Sema can only determine that for specific instantiations. If, say, all but the last instantiation were valid, that means every instantiation of a generic function has to do an unbounded amount of checking of "previous" signatures, even though this ultimately doesn't even check the type properly as in my second example.

@ghost
Copy link
Author

ghost commented Apr 11, 2023

@mlugg,
But what would be the alternative? The way Zig's generics work, the best we can ever do is check concrete instantiations at comptime. It certainly sucks that generic functions cannot be type checked in advance, but that's how it is. As far as I can tell, such type checks can either be done for concrete instantiations or not at all, and I would much prefer the former, all other things being equal.

@InKryption
Copy link
Contributor

InKryption commented Apr 11, 2023

I've been sitting on this idea for a while, and since this appears to have been somewhat revived, I'd like to add my take on what this could instead look like.
I'd propose we allow not only types, but also functions of type fn (type) type where the type of a parameter is expected, wherein the input type is inferred from the call-site (same as with anytype), and the returned type is what the concrete type of the parameter will end up being. A few examples of what this could look like:

const std = @import("std");

fn ConstSliceParam(comptime T: type) type {
    const Elem = std.meta.Elem(T);
    return []const Elem;
}
fn toSlice(slice: ConstSliceParam) @TypeOf(slice) {
    return slice;
}
comptime {
    @compileLog(@TypeOf(toSlice("foo"))); // `[]const u8`
    @compileLog(@TypeOf(toSlice(&[_]u32{ 1, 2, 3 }))); // `[]const u32`
    @compileLog(@TypeOf(toSlice(@as(*[0]i16, &.{})))); // `[]const i16`
}

fn TypeArrayParam(comptime T: type) type {
    const len = switch (@typeInfo(T)) {
        .Struct => |structure| structure.fields.len,
        .Array => |array| array.len,
        else => @compileError("Expected array of types, got " ++ @typeName(T)),
    };
    return [len]type;
}
fn Tuple(comptime types: TypeArrayParam) type {
    // basically the same code as in `std.meta.Tuple`, except
    // without a need for the `CreateUniqueTuple` trick.
}

fn ArrayListPtrParam(comptime T: type) type {
    std.debug.assert(@typeInfo(T).Pointer.size == .One);
    const List = std.meta.Child(T);
    return std.ArrayList(std.meta.Child(List.Slice));
}
fn pop3(list: ArrayListPtrParam) [3]std.meta.Child(@TypeOf(list).Slice) {
    return .{
        list.pop(),
        list.pop(),
        list.pop(),
    };
}

This by itself doesn't resolve the original issue, being that the types of variables still have to be referred to as @TypeOf(variable) - but this could be resolved by changing the syntax from x: ParamFunc to x: ParamFunc |T| (or whatever else) in order to get the returned type of the function. So the last example could look like:

fn ArrayListPtrParam(comptime T: type) type {
    std.debug.assert(@typeInfo(T).Pointer.size == .One);
    const List = std.meta.Child(T);
    return std.ArrayList(std.meta.Child(List.Slice));
}
fn pop3(list: ArrayListPtrParam |List|) [3]std.meta.Child(List.Slice) {
    return .{
        list.pop(),
        list.pop(),
        list.pop(),
    };
}

And thus, unless I'm mistaken, this would remove the issue of making the type system "smart" enough to match F(X) to G(X), by placing the onus on the programmer to implement said logic.

As a plus, since the parameter type functions are just functions, more extensive type checking could also be moved into them, and out of function bodies.

@mlugg
Copy link
Member

mlugg commented Apr 11, 2023

@mlugg,
But what would be the alternative?

Yeah, I'm not saying there is a great one - in fact, I'm almost certain there's not. But from my perspective, I think I actually prefer the status quo behaviour, because although not overly helpful, it's explicitly so. Lazy checking at instantiation sites, aside from having potentially unbounded performance impacts (which IMO is already pretty bad), can easily allow incorrect types to slip through, lulling you into a false sense of security: the types make it look like your code is being correctly checked, but in reality it's not. I think I'd prefer a system which doesn't check anything, but is clear about that, over a system where you have to look into function internals to determine to what extent something is actually being checked.

@ghost
Copy link
Author

ghost commented Apr 11, 2023

@mlugg
This "illusion of type checking" is already present in the status quo, though, and was in fact one of the motivators for this proposal. For example, the following code will still compile without error:

fn eql(x: anytype, y: @TypeOf(x)) bool {
    return x == y;
}

fn nonsense() bool {
    const foo: fn(anytype, []const u8) bool = eql;
    const x: i32 = 5;
    const y: i32 = 6;
    return foo(x, y);
}

Here, the generic parameters are not checked at all, even at instantiation time, even though syntactically this looks like a proper type signature. This is why I still think that this proposal will improve things 90% of the time, and is unlikely to make things actively worse in the remaining 10%.

@mlugg
Copy link
Member

mlugg commented Apr 11, 2023

@zzyxyzz

Ah, so that I actually do think should be changed from status quo - my bad for not properly recalling the current situation. I think it makes sense (and is simple implementation-wise) to check signatures up to "is this argument generic" - so that would fail, because the second arg of eql is generic so it should have the type fn(anytype, anytype) bool. (anytype is a misleading name here, it should be generic or something, #14187). I do think checking should be done on generic function types, but that we should perhaps only check to the extent we can do eagerly.

(For the record, I still strongly support the rest of this proposal - infer is a much nicer syntax in general.)

@ghost
Copy link
Author

ghost commented Apr 11, 2023

Ok, that makes more sense to me now. So your counter-proposal would be to keep the infer syntax, but perform partial type erasure in generic function types, and only match concrete arguments (?). The type of eql would no longer be fn(infer T, T) bool, but fn(infer, infer) bool or something similar. That sounds sensible in general, but I'm still not convinced it is actually better. On the one hand, there is less risk of wrong expectations about type checking. On the other, the type checks themselves become less fine-grained and some expressiveness is lost (in APIs for example).
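Spelled out, the partially erased scheme might behave like this (the assignments and their outcomes are hypothetical semantics under this counter-proposal, not current compiler behaviour):

```zig
fn eql(x: anytype, y: @TypeOf(x)) bool {
    return x == y;
}

// Hypothetical checking under partial erasure, where generic
// parameters are matched only as "generic" and concrete parts
// are checked eagerly:
//
// const a: fn (anytype, anytype) bool = eql; // ok: both params generic
// const b: fn (anytype, f32) bool = eql;     // error: second param of eql
//                                            //   is generic, not f32
// const c: fn (anytype, anytype) i32 = eql;  // error: return type mismatch
```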

@ghost
Copy link
Author

ghost commented Apr 11, 2023

@InKryption
What I gather from your proposal is that a type converter function F: fn (type) type can be specified as the "type" of a generic parameter of some function f. When a concrete x: T is passed in, it is implicitly coerced to F(T) before being actually passed to an instance of f specialized on F(T). Is that correct?

And thus, unless I'm mistaken, this would remove the issue of making the type system "smart" enough to match F(X) to G(X), by placing the onus on the programmer to implement said logic.

This I still don't get. Could you elaborate?

@InKryption
Copy link
Contributor

@zzyxyzz That is all a much smarter way of saying what I meant, yes.

As for

And thus, unless I'm mistaken, this would remove the issue of making the type system "smart" enough to match F(X) to G(X), by placing the onus on the programmer to implement said logic.

All I mean is that this avoids the issues mentioned recently in this thread concerning the complicated semantics of the generalized compiler-implemented specialization logic, since it's left to userland to define it for concrete use cases.

@nektro
Copy link
Contributor

nektro commented Apr 11, 2023

I think the path to getting this potentially accepted is by reducing scope and making follow up proposals. getting only the syntax change in for anytype -> infer T would greatly increase the ergonomics of comptime generic code and allow #5893 to be superseded

@ekipan
Copy link

ekipan commented Apr 11, 2023

Speaking of scope, was it mentioned in this thread what was the scope of bindings introduced by infer?

fn foo(x: infer T, y: T) T {
  ...
  // can use T here, or must I say @TypeOf(x)?
  ...
}

@nektro
Copy link
Contributor

nektro commented Apr 12, 2023

you would be able to use T in the body yes

@Pyrolistical
Copy link
Contributor

Instead of introducing a new keyword, can the parser gods bless us with "comptime T" instead of "infer T", or is this ambiguous?

fn find(needle: infer T, haystack: ArrayList(T)) usize {...}

//  vs

fn find(needle: comptime T, haystack: ArrayList(T)) usize {...}

@rohlem
Copy link
Contributor

rohlem commented Apr 18, 2024

@Pyrolistical I feel like that would be overloading the meaning of the comptime keyword for no gain in readability.

Everywhere else, comptime as a unary operator forces the argument expression to be comptime-evaluated. For uniformity it should do the same thing here, except that name: type type annotations already have to be comptime-evaluated from a language perspective, which is why the syntax disallows explicitly placing the operator there. I don't see an intuitive parallel from that primary usage to the semantics proposed here.
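For reference, this is the existing unary-operator usage being referred to; a minimal sketch:

```zig
fn fib(n: u64) u64 {
    if (n < 2) return n;
    return fib(n - 1) + fib(n - 2);
}

// `comptime` as a unary operator: forces the call to be
// evaluated at compile time, so `x` is a compile-time constant.
const x = comptime fib(10);
```

The proposed `needle: comptime T` would instead bind a fresh identifier T, which is a quite different operation from forcing evaluation of an expression.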

@ghost ghost closed this as not planned Won't fix, can't repro, duplicate, stale Aug 25, 2024
@mlugg mlugg reopened this Aug 25, 2024
@jayrod246
Copy link
Contributor

jayrod246 commented Aug 26, 2024

How ugly would this be? 😄

fn find(needle: anytype |T|, haystack: ArrayList(T)) usize {...}

edit: Found this comment which seems to have had a similar idea.

@ItsMeSamey
Copy link

Imho something like this would also be cool too

fn isGenerator(T: type) bool { return @hasDecl(T, "gen"); }
fn generate(generator: conform isGenerator) usize {...}

@paperdev-code
Copy link

paperdev-code commented Oct 17, 2024

@ItsMeSamey

Imho something like this would also be cool too

fn isGenerator(T: type) bool { return @hasDecl(T, "gen"); }
fn generate(generator: conform isGenerator) usize {...}

To expand on this and the previous comment, maybe the captured type can be used within a comptime expression which must evaluate without a compile error. I think this is better than a boolean condition, because it plays well with @compileError(...)

fn Loggable(comptime T: type) void {
  // perhaps with a helper function for conciseness;
  std.meta.expectField(T, .logFn, std.fmt.comptimePrint("{s} missing .logFn: LogFn!", .{@typeName(T)}));
  // ... more constraints
}

fn log(value: anytype |T| Loggable(T)) !void {
  // type capture has asserted this field exists in a way visible from the function signature
  try value.logFn(...);
}

Maybe |Type| should imply anytype as syntactic sugar;

// same but specify the field which holds our function for ex.
fn log(comptime log_field: @TypeOf(.Tag), value: |T| Loggable(T, log_field)) !void {...}

Because it's an expression, you could uhh

fn foo(value: |T| {
  SomeCondition(T);
  if (T == i32) return;
  switch (T) {
    u8, u16, i8, i16 => {},
    else => @compileError("maybe too cursed..?"),
  }
}) void {}

@ItsMeSamey
Copy link

ItsMeSamey commented Oct 24, 2024

I actually was thinking something more like

const Foo = struct {
  logger: anytype |T| isLogger(T),
};

currently you have to do something like

fn GetFoo(comptime T: type) type {
  isLogger(T);
  return struct { logger: T, };
}

and the worst part is, like function coloring, this problem is contagious.
Like if you had

const Bar = struct {
 foo: Foo,
 baz: Baz,
};

now you'll need a GetBar too.
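Spelling out the contagion under the status quo: because Foo had to become the type function GetFoo, Bar must become a type function too, just to thread T through. A sketch, reusing the hypothetical GetFoo and Baz names from the surrounding comments:

```zig
fn GetBar(comptime T: type) type {
    return struct {
        // The generic field forces Bar itself to be generic.
        foo: GetFoo(T),
        baz: Baz,
    };
}
```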

@leecannon
Copy link
Contributor

@ItsMeSamey I don't think anytype fields are on the table.

Thinking about how that could actually be implemented, I can't think of many things less zig-like.
