Proposal: Use separate operators for function calls and calls to function pointers #6966

Closed
SpexGuy opened this issue Nov 3, 2020 · 19 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@SpexGuy
Contributor

SpexGuy commented Nov 3, 2020

Calls to function pointers have dramatically different characteristics than calls to comptime-known functions. To name a few:

  • stack size analysis cannot be done through a function pointer
  • calls through function pointers inhibit the optimizer
  • calls through function pointers inhibit the branch predictor
  • one function pointer may refer to different functions at different times
  • function pointers don't play well with async

In #1717, it is planned to make an explicit difference in the type system between function labels (comptime-only types which are always statically known) and function pointers (which may be runtime known). Because of the above differences, I propose that we also use separate operators to call functions and function pointers. Specifically, use () for functions and .() for pointers.

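```zig
// Proposed syntax from this issue (builds on #1717); `.()` is not valid Zig.
const func = fn() void {};
var ptr = &func;
var opt_ptr: ?*fn() void = ptr;

const T = struct {
    field: *fn() void,
    pub const member = fn(self: *T) void {};
};

test "examples" {
    func();               // comptime-known function: plain ()
    ptr.();               // function pointer: .()
    opt_ptr.?.();         // unwrap the optional, then call through the pointer
    if (opt_ptr) |f| f.();

    var t: T = undefined;
    T.member(&t);         // explicit namespace call; `self` is a *T
    t.member();           // method-call sugar passes &t implicitly
    t.field.();           // the field holds a pointer, so .()
}
```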

This also helps to disambiguate calls to member functions in types from calls to function pointers in fields.
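For comparison, a minimal status-quo sketch, assuming Zig 0.11 or later (where a function pointer type is written `*const fn (...)`): the same `()` syntax covers both cases described above, reading a pointer out of a field and looking up a function in the type's namespace. The names `T`, `field`, `member` and `double` are illustrative only.

```zig
const std = @import("std");

// Assumes Zig 0.11+ syntax; all names here are illustrative.
fn double(x: i32) i32 {
    return x * 2;
}

const T = struct {
    // A field holding a runtime function pointer.
    field: *const fn (i32) i32,

    // A declaration in T's namespace; `t.member(x)` passes `t` as `self`.
    pub fn member(self: T, x: i32) i32 {
        return self.field(x) + 1;
    }
};

test "same () syntax, two different lookups" {
    const t = T{ .field = &double };
    // Reads the field, then calls through the pointer: double(5) == 10.
    try std.testing.expectEqual(@as(i32, 10), t.field(5));
    // Finds `member` in T's namespace and passes `t` implicitly: 10 + 1 == 11.
    try std.testing.expectEqual(@as(i32, 11), t.member(5));
    // The explicit spelling of the same method call.
    try std.testing.expectEqual(@as(i32, 11), T.member(t, 5));
}
```

Nothing at the call site distinguishes `t.field(5)` from `t.member(5)`; only the declaration of `T` does.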

@SpexGuy added the proposal label Nov 3, 2020
@andrewrk added this to the 0.8.0 milestone Nov 3, 2020
@andrewrk added the accepted label Nov 3, 2020
@ghost

ghost commented Nov 3, 2020

What would the status quo be if #1717 is implemented? Was it planned that funcptr() would support one level of magic indirection*, like in C? What about funcptr.*() - it sounds like that would just work automatically (like you can do (*funcptr)() if you want to in C). You might still have two ways of doing the same thing if funcptr.() and funcptr.*() both work.

I support the idea of requiring explicit indirection, but it would create some dissonance with . operator in Zig, which does the magic indirection (we don't have ->). Actually, it means Zig is going to invert C in this regard: C has magic indirection for function pointer calls but not struct field access, Zig will have magic indirection for struct field access but not function pointer calls.

*not a technical term
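A minimal sketch of the two implicit indirections in status-quo Zig, assuming Zig 0.11 or later; `Point` and `sum` are hypothetical names. Field access through a pointer and calls through a function pointer both work without an explicit `.*`, which is the symmetry that requiring `.()` for pointer calls would break.

```zig
const std = @import("std");

// Assumes Zig 0.11+ syntax; `Point` and `sum` are illustrative.
const Point = struct { x: i32, y: i32 };

fn sum(p: Point) i32 {
    return p.x + p.y;
}

test "implicit indirection in status-quo Zig" {
    var pt = Point{ .x = 1, .y = 2 };
    pt.y = 4;
    const pp: *Point = &pt;
    // Field access through a pointer: `pp.x` means `pp.*.x`, no `->` needed.
    try std.testing.expectEqual(pp.*.x, pp.x);

    const fp: *const fn (Point) i32 = &sum;
    // Call through a function pointer: `fp(pt)` needs no explicit dereference.
    try std.testing.expectEqual(@as(i32, 5), fp(pt));
}
```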

@ghost

ghost commented Nov 3, 2020

What exactly is the benefit of this? The main difference between function pointers and labelled functions is non-functional (pardon the pun), i.e., it is in how much optimization and static analysis can be performed at compile time, not in how the function behaves at runtime. I'm not sure that visually signalling this difference is important enough to require special syntax.

Also, if we are going to do this, a simple dereference (foo.*()), as suggested by @dbandstra, would seem more consistent.

@SpexGuy
Contributor Author

SpexGuy commented Nov 4, 2020

funcptr.* doesn't work, because the type of that expression is a function label, which is a comptime-only type. So this expression must be evaluated at compile time, which doesn't work for a runtime pointer.
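A sketch of the type distinction behind this argument, assuming Zig 0.11 or later, where a function body type such as `fn (i32) i32` is comptime-only and only `*const fn (i32) i32` can hold a function chosen at runtime. The helpers `inc`, `dec` and `pick` are hypothetical.

```zig
const std = @import("std");

// Assumes Zig 0.11+ syntax; `inc`, `dec` and `pick` are illustrative.
fn inc(x: i32) i32 {
    return x + 1;
}

fn dec(x: i32) i32 {
    return x - 1;
}

// A function chosen at runtime is returned as a pointer. Declaring the
// return type as `fn (i32) i32` instead would force `pick` to be
// evaluated at compile time, because that type is comptime-only.
fn pick(up: bool) *const fn (i32) i32 {
    return if (up) &inc else &dec;
}

test "function body types are comptime-only; pointers are runtime values" {
    // The type of a function name is a function body type, not a pointer.
    comptime std.debug.assert(@TypeOf(inc) == fn (i32) i32);
    const f = pick(true);
    try std.testing.expectEqual(@as(i32, 6), f(5));
}
```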

@ghost

ghost commented Nov 4, 2020

> funcptr.* doesn't work, because the type of that expression is a function label, which is a comptime-only type. So this expression must be evaluated at compile time, which doesn't work for a runtime pointer.

Under the hood, a function label is nothing but a named, comptime-known function pointer. So saying that the type of a dereferenced function pointer is a label doesn't sound right to me.

However, on second thought, dereferencing a function pointer is more akin to inlining a function than to accessing a callable object, which would indeed make the funcptr.* syntax less suitable.

@Rocknest
Contributor

Rocknest commented Nov 4, 2020

Are you sure that function pointers are so different and so bad that they deserve special treatment? It's almost identical to fields-via-pointer.

> stack size analysis cannot be done through a function pointer

I think it's planned for function pointers to have stack size metadata.

> calls through function pointers inhibit the branch predictor

Fun fact: short-circuit operators sometimes inhibit the branch predictor.

> function pointers don't play well with async

Calls to async function pointers are done via builtin, right?

@rohlem
Contributor

rohlem commented Nov 4, 2020

I don't see a way this explicitness would benefit me personally.

> Calls to function pointers have dramatically different characteristics than calls to comptime-known functions.

I think general-enough code benefits from the ability to uniformly call a function (comptime-known or not). Having to differentiate them means an increase in complexity. You would need to update every call site if you choose to switch from one to the other - unless you provide a comptime-known wrapper function... which is just an extra step to get back to status-quo.
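A minimal sketch of the generic-code case, assuming Zig 0.11 or later: an `anytype` parameter accepts either a comptime-known function or a function pointer, and the body calls both with `f(x)`. If calls through pointers required a separate operator, a body like this would need to know which kind it received. `applyTwice` and `inc` are hypothetical names.

```zig
const std = @import("std");

// Assumes Zig 0.11+ syntax; `inc` and `applyTwice` are illustrative.
fn inc(x: i32) i32 {
    return x + 1;
}

// `f` may be a function or a function pointer; the body calls it the same way.
fn applyTwice(f: anytype, x: i32) i32 {
    return f(f(x));
}

test "generic code calls functions and function pointers uniformly" {
    const ptr: *const fn (i32) i32 = &inc;
    try std.testing.expectEqual(@as(i32, 3), applyTwice(inc, 1));
    try std.testing.expectEqual(@as(i32, 3), applyTwice(ptr, 1));
}
```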

@ghost

ghost commented Nov 4, 2020

> This also helps to disambiguate calls to member functions in types from calls to function pointers in fields.

That's a good point, I overlooked that initially. Although the same might also be achieved without distinct call syntax:

```zig
t.member();
(t.field)();
```

In all other cases, the different call notations seem to be purely ornamental, in the sense that they don't convey any information to the compiler that it does not have already. Writing f.() instead of f() would simply be a required acknowledgement from the programmer that they are using a dynamic function pointer and not a static label.

Are there any other cases apart from the member function vs pointer-in-field situation, where the difference between the two call syntaxes would be semantic?

@ghost

ghost commented Nov 4, 2020

I support a different syntax for function pointer calls, however my concern is that we need a way of communicating stack growth, as stated in the recursion issue (too lazy to find a link rn). Seems to me that we would benefit from making this mandatory, but I don't really see how we would do that without resorting to builtins, which would be cumbersome. However, maybe the relative nicheness and inherent danger of this operation would warrant that?

@andrewrk removed the accepted label Nov 5, 2020
@SpexGuy
Contributor Author

SpexGuy commented Nov 7, 2020

> myStruct.myFunctionPtr.();, why . ?

One major reason to do this is that it disambiguates between "read a member of the struct and invoke that pointer" and "look in the namespace of the struct's type for a function with this name, and pass the struct as an implicit first parameter".

> "ducktyping"

I take your point here, that it's extra syntax that may be unnecessary. @dbandstra's argument is more convincing: "it would create some dissonance with . operator in Zig, which does the magic indirection (we don't have ->)." Please note that the "accepted" label has been removed and this proposal is being reconsidered.

But I feel like I need to point out that ducktyping usually refers to the same syntax working on objects of multiple types. Not accepting this proposal would be duck typing: allowing () on both functions and function pointers.

> As language grows, more people start to depend on it

Zig is openly not done yet. Once it hits 1.0, backwards compatibility will be considered if any further changes need to be made. Until then it is strongly advised not to depend on language stability at all. Please don't use the current version of Zig in production, or in anything that isn't a hobby project.

@ghost

ghost commented Nov 8, 2020

> One major reason to do this is that it disambiguates between "read a member of the struct and invoke that pointer" and "look in the namespace of the struct's type for a function with this name, and pass the struct as an implicit first parameter".

This ambiguity won't be eliminated like this -- there's still ambiguity between member access, and referencing a function and not calling it (for instance @call(.{.modifier = .never_inline}, object.member, .{})), and eliminating that would require more twists of syntax than would be practical. In light of that, I don't think the ambiguity is a big enough deal to worry about -- global variables require one more memory access than locals, yet we're perfectly comfortable putting them on equal syntactic footing.
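A sketch of the "referencing a function without calling it" case, assuming Zig 0.11 or later, where `@call` takes the modifier directly (`.never_inline`) rather than the options struct used in the comment above. Here the method is referenced through its type (`T.member`); `T` and `member` are hypothetical names.

```zig
const std = @import("std");

// Assumes Zig 0.11+ syntax, where @call takes the modifier directly.
const T = struct {
    value: i32,

    pub fn member(self: T) i32 {
        return self.value * 10;
    }
};

test "referencing a method without calling it" {
    const t = T{ .value = 4 };
    // `T.member` names the function itself; nothing is called on this line.
    const f = T.member;
    try std.testing.expectEqual(@as(i32, 40), f(t));
    // The function value can also be handed to @call with a modifier.
    try std.testing.expectEqual(@as(i32, 40), @call(.never_inline, T.member, .{t}));
}
```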

@0syf3r

0syf3r commented Nov 10, 2020

Why should the programmer have to worry about how the language works internally? Unless there are hidden footguns, I'm in favor of keeping the status quo since this proposal just creates more work for the programmer with no apparent benefits. Remember: we should optimize the language for the programmer, not the compiler.

@ikskuh
Contributor

ikskuh commented Nov 10, 2020

> Why should the programmer have to worry about how the language works internally? Unless there are hidden footguns, I'm in favor of keeping the status quo since this proposal just creates more work for the programmer with no apparent benefits. Remember: we should optimize the language for the programmer, not the compiler.

The benefit for the programmer is knowledge. It's a huge difference if I call a function (which is always the same) or I call a function pointer (which may change between invocations).

It's not at all a problem for the compiler to differentiate between function pointer and function invocation, but it makes a difference for the programmer who reviews the code and sees it for the first time.

Communicate intent precisely.
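A small sketch of the "may change between invocations" point, assuming Zig 0.11 or later: with a mutable function pointer, the call expression `op(3, 4)` looks identical before and after reassignment but reaches a different function. `add`, `mul` and `op` are hypothetical names.

```zig
const std = @import("std");

// Assumes Zig 0.11+ syntax; `add`, `mul` and `op` are illustrative.
fn add(a: i32, b: i32) i32 {
    return a + b;
}

fn mul(a: i32, b: i32) i32 {
    return a * b;
}

test "the same call expression can reach different functions" {
    // `op` is a mutable function pointer: the call site `op(3, 4)` does
    // not reveal which function runs.
    var op: *const fn (i32, i32) i32 = &add;
    try std.testing.expectEqual(@as(i32, 7), op(3, 4));
    op = &mul;
    try std.testing.expectEqual(@as(i32, 12), op(3, 4));
}
```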

@ghost

ghost commented Nov 10, 2020

I think it shouldn't be this easy -- function pointers are not as "nice" as functions, their invocations should reflect that. Adding a different syntax does nothing if that syntax leaves just as much to the imagination.

I don't think there should be any special syntax for function pointer invocation at all -- I think there should be a builtin, @ptrCall(stack_growth, func, .{args...}), and that builtin should be the only way to invoke function pointers. (It does not take a modifier, as none of the @call modifiers make sense for function pointers.)
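The proposed builtin does not exist, but its call-site shape can be approximated with an ordinary function. In this hypothetical sketch (assuming Zig 0.11 or later), `ptrCall` accepts a stack-growth bound and discards it, so nothing is checked or enforced; it only shows what call sites might look like. `ptrCall` and `add` are hypothetical names.

```zig
const std = @import("std");

// Hypothetical stand-in for the proposed @ptrCall builtin; assumes Zig 0.11+.
// `stack_growth` is accepted and discarded: this sketch checks nothing.
fn ptrCall(comptime stack_growth: usize, f: *const fn (i32, i32) i32, args: anytype) i32 {
    _ = stack_growth;
    return @call(.auto, f, args);
}

fn add(a: i32, b: i32) i32 {
    return a + b;
}

test "call-site shape of the hypothetical ptrCall" {
    const f: *const fn (i32, i32) i32 = &add;
    try std.testing.expectEqual(@as(i32, 5), ptrCall(256, f, .{ 2, 3 }));
}
```

A real builtin would presumably be generic over the pointer type and use the bound for stack analysis; this sketch fixes the signature to keep it short.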

@ghost

ghost commented Nov 10, 2020

> The benefit for the programmer is knowledge. It's a huge difference if i call a function (which is always the same) or i call a function pointer (which may change between invocations).

Doesn't this apply to all mutable variables? I still fail to see what makes functions in particular so different. We don't require @getCurrentValue(x) to access non-constants, even though they too may have changed since the last access... and could even lead to runtime errors if our mental model of what they hold is incorrect.

> However, maybe the relative nicheness and inherent danger of this operation would warrant that?

If you are doing something really, ahem, interesting, like hot-plugging new functionality into a global jump table while the interpreter is still running... then it's your own business to know about the dangers of function pointers. But under normal circumstances, I can't actually think of any substantial footguns involving FPs that could arise purely from inattention and could therefore be remedied by a more eye-catching syntax. I'd be interested in seeing some examples.

@ityonemo
Sponsor Contributor

ityonemo commented Nov 10, 2020

neutral on the choice, but wanted to provide some language design context from other communities:

There are other languages that do "the equivalent of this", in particular, elixir and ruby: https://hashrocket.com/blog/posts/elixir-functions-ruby-lambdas. In those languages, the choice is more syntactical (both languages support calling a function without parentheses, so the syntax is strictly necessary).

It does occasionally cause issues in forums when folks coming from python are not sure why their lambdas aren't working. I imagine something similar could happen for people coming from C, but this can easily be fixed with a helpful compiler error message that reminds people how function pointers are called in zig. This will be really easy since in zig there is no variable shadowing (IIRC) and so identifier resolution should be dead easy.

@kyle-github

Other than the CPU's branch predictor getting trashed by this (at least potentially), are there any impacts to how a compiler would treat a function called through a mutable pointer? I think most optimizations like leaf functions still apply.

Most compilers and CPUs do pretty well with function pointers (thanks to decades of C++). Obviously if you have a function pointer that you change all the time, then your branch predictor is going to give up and you'll take a large performance hit. But if you use this as something like vtable entries (relatively static), then the predictor should do fairly well and then your cost will be low. Perhaps even as low as a direct branch in terms of cycles.

I think given how . works, it makes more sense to allow functions and pointers to functions to be treated the same when calling the function. It is at least more consistent and since there isn't much of a performance hit in the 99% case...
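A minimal sketch of the vtable pattern mentioned above, assuming Zig 0.11 or later; `VTable`, `Animal` and the speak functions are hypothetical. Each implementation owns one table that is never reassigned, so a given object's indirect call always reaches the same function.

```zig
const std = @import("std");

// Hypothetical vtable-style interface; assumes Zig 0.11+ syntax.
const VTable = struct {
    speak: *const fn () []const u8,
};

const Animal = struct {
    vtable: *const VTable,

    pub fn speak(self: Animal) []const u8 {
        // Indirect call: the target depends on which table `self` carries,
        // but for a given object it never changes.
        return self.vtable.speak();
    }
};

fn dogSpeak() []const u8 {
    return "woof";
}

fn catSpeak() []const u8 {
    return "meow";
}

// One table per implementation, defined once and never modified.
const dog_vtable = VTable{ .speak = &dogSpeak };
const cat_vtable = VTable{ .speak = &catSpeak };

test "vtable dispatch" {
    const animals = [_]Animal{
        .{ .vtable = &dog_vtable },
        .{ .vtable = &cat_vtable },
    };
    try std.testing.expectEqualStrings("woof", animals[0].speak());
    try std.testing.expectEqualStrings("meow", animals[1].speak());
}
```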

@ghost

ghost commented Nov 14, 2020

Given that we know statically whether a given pointer is to a function, I think we are actually perfectly safe with no new syntax -- func.*(args). Yes, we are technically dereferencing to a function label, but no actual loading need be done, since we know statically that this is the case -- and this is really no different from referencing a function label in a normal function call, func(args), as long as we only call it and don't do anything else.

@kyle-github

I think I have lost the plot a bit here. The original idea from @SpexGuy was to solve the following:

> Calls to function pointers have dramatically different characteristics than calls to comptime-known functions. To name a few:
>
>   1. stack size analysis cannot be done through a function pointer
>   2. calls through function pointers inhibit the optimizer
>   3. calls through function pointers inhibit the branch predictor
>   4. one function pointer may refer to different functions at different times
>   5. function pointers don't play well with async

(I numbered them instead of leaving the original bullet points so that I can reference them directly.)

Let's take these one at a time.

  1. Stack size analysis. I would say that this issue remains a problem simply because, at the very least, it is hard for the compiler to figure this out and possibly impossible in some cases. E.g. when a dynamic library is loaded at run time and a function pointer is used to access a function within that dynamic library, how would you know what stack size is needed?

That all said, how does a different syntax help here? All this does is make it a little easier for the programmer to know that the compiler is not going to be able to determine the stack size. There is @EleanorNB's proposal about a much more explicit call syntax. That might work if you can get that information about a function in a dynamically loaded DLL at runtime.

Of all the different bikeshedding here, @EleanorNB's is the only way I can see to provide that stack information, and it would need to be either a fixed large number or a runtime-known number. You could flip this around and annotate the function pointer with this information and any function that was assigned to the function pointer would need to match that annotation.

  2. Inhibiting the optimizer. I am not sure that this is actually true in most common cases. Mostly functions are optimized within the function itself. Only LTO really does much more than that. Coming back to the worst-case scenario of loading a function at runtime, then LTO is not applicable anyway.

IMHO, this is a weak reason. LTO seems like the only optimization pass that will not work as well and that is going to be true in the runtime load case no matter what.

  3. Inhibit the branch predictor. This was true up until several years ago. At this point, as long as you do not change the function that the pointer points to very often, the predictor will do pretty well (they will even handle a small number of possible indirect functions fairly well). C++ has driven this and the statistics I see show that this is working well (note that recent information leak attacks use the timing behavior of the branch predictor in some circumstances to get information). I do not think that this is a significant problem. If you are writing this kind of code, you are doing it explicitly, so you know what is going on.

  4. Pointers can reference different functions at different times. The question here is whether knowing that this is happening is useful. I am not sure why this is separate from the other issues. Is there a performance implication here that is not covered elsewhere? Zig does not tell you that something different is happening when you access a global, a thread-local, or a struct field directly vs. through a pointer.

  5. Interaction with async. I am not familiar enough with the async implementation to comment too much here, but if @SpexGuy says it does not play well with async, I'll believe it! But what is the programmer supposed to do about it?

I think that issues 1 and 5 identify the most urgent issues with function pointers. However, I am not sure how a different calling syntax helps with those issues.

Of all the proposals, I think only @EleanorNB's really would provide the information needed to solve these cases, but it seems like a lot of ceremony for the very common case of using function pointers in a v-table. In that case you have a very limited number of possibilities. You can flip this around and add annotations to the pointer declaration instead of the call site.

At least to me, and perhaps I am missing something important, it all boils down to this:

  • There are cases, e.g. async and calculating stack size, where functions accessed through pointers are going to cause the compiler problems because it does not have enough information.
  • There may be some performance implications of calling functions through pointers.
  • On the other hand, Zig already hides performance and implementation details about whether a field is accessed via a pointer or not and whether a variable is a global or thread-local or not.
  • Most of the options here do not actually give the compiler sufficient information to solve the stack size/async problem.
  • Loading functions at runtime from a dynamic DLL is going to thwart LTO and require either runtime discovery of stack size requirements or some sort of maximum size annotation. This is a common case.

Is the need to know whether a function is accessed through a pointer sufficient to make this a place where Zig is more explicit than ordinary fields and variables? Doing so makes the language more explicit but also makes it more complex.

@SpexGuy
Contributor Author

SpexGuy commented Nov 23, 2021

I agree, this proposed feature does not solve the problems that it set out to. Additionally, as mentioned, having a distinction between x() and x.*() is inconsistent with the language decision to make a.*.b implicit.

@SpexGuy closed this as completed Nov 23, 2021

9 participants