Language inconsistency: aliasing externs #21027

Open
mlugg opened this issue Aug 11, 2024 · 7 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

@mlugg
Member

mlugg commented Aug 11, 2024

Consider this code:

const foo = struct {
    extern fn foo() void;
}.foo;

const bar = struct {
    extern var bar: u32;
}.bar;

What happens when you reference these symbols? It turns out the second one (with the extern var) emits a compile error, but the first does not.

Implementation-wise, this is because comptime pointer loading permits loading .extern_func values, but not general .variable values (after #20964, it permits loading .@"extern" values only if they have a function type). We can't trivially allow it for all extern values, because if a value is inspected later, .@"extern" is not an expected tag (whereas for functions this is considered a "valid" tag).

This seems pretty clearly inconsistent, so should probably be changed one way or the other -- either both are allowed, or both are disallowed. If they're disallowed, you can instead take a pointer to the whole expression. For functions the usage is the same; for other externs you'll need to use .* at times.

Personally, I would propose that both of these forms are disallowed. This is essentially just for simplicity reasons -- the thing we're trying to do here doesn't actually make sense (you can't inspect the value of an extern at comptime!), so it's not worth bending the pointer access rules to make it work. It's the same reason you can't write var x = 123; const y = x; at container scope and have y act as an alias for x.
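
The container-scope analogy can be checked directly. This is a minimal sketch, assuming a hypothetical global `x` with an explicit `u32` type (a bare `var x = 123;` is rejected earlier for an unrelated reason, since a `comptime_int` variable must be const or comptime). The rejected line is left commented out:

```zig
const std = @import("std");

var x: u32 = 123;

// Rejected: a container-level const needs a comptime-known initializer,
// and loading from a runtime `var` is not comptime-known.
// const y = x;

// Accepted: the address of a global is comptime-known, so alias via a pointer.
const y = &x;

test "alias a global var through a pointer" {
    y.* += 1;
    try std.testing.expectEqual(@as(u32, 124), x);
}
```

Running this with `zig test` exercises the pointer alias. Uncommenting `const y = x;` (and removing the pointer version) should produce a compile error instead.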

If we go that route, some parts of std will need changing. For instance, std.c aliases externs quite extensively:

zig/lib/std/c.zig

Lines 9676 to 9685 in 531cd17

const private = struct {
    extern "c" fn close(fd: fd_t) c_int;
    extern "c" fn clock_getres(clk_id: clockid_t, tp: *timespec) c_int;
    extern "c" fn clock_gettime(clk_id: clockid_t, tp: *timespec) c_int;
    extern "c" fn copy_file_range(fd_in: fd_t, off_in: ?*i64, fd_out: fd_t, off_out: ?*i64, len: usize, flags: c_uint) isize;
    extern "c" fn flock(fd: fd_t, operation: c_int) c_int;
    extern "c" fn fork() c_int;
    extern "c" fn fstat(fd: fd_t, buf: *Stat) c_int;
    extern "c" fn fstatat(dirfd: fd_t, path: [*:0]const u8, buf: *Stat, flag: u32) c_int;
    extern "c" fn getdirentries(fd: fd_t, buf_ptr: [*]u8, nbytes: usize, basep: *i64) isize;

zig/lib/std/c.zig

Lines 8791 to 8794 in 531cd17

pub const close = switch (native_os) {
    .macos, .ios, .tvos, .watchos, .visionos => darwin.@"close$NOCANCEL",
    else => private.close,
};

That second snippet would need to change private.close to &private.close, and likewise for other private.xyz.
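
A reduced sketch of what that change could look like. It is not the actual std.c source: `fd` is typed as plain `c_int` instead of `fd_t`, the Darwin branch is dropped, and it assumes a POSIX target with libc linked (for example `zig test sketch.zig -lc`, where the file name is only illustrative). Taking the address is already accepted today, and call sites stay the same because a function pointer can be called directly.

```zig
const std = @import("std");

const private = struct {
    // Assumption: plain c_int stands in for std.c's fd_t.
    extern "c" fn close(fd: c_int) c_int;
};

// Alias the extern through its address rather than its value.
pub const close = &private.close;

test "call through the pointer alias" {
    // Closing an invalid descriptor fails with -1 on POSIX systems.
    try std.testing.expectEqual(@as(c_int, -1), close(-1));
}
```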

@mlugg mlugg added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Aug 11, 2024
@mlugg mlugg added this to the 0.16.0 milestone Aug 11, 2024
@mlugg
Member Author

mlugg commented Aug 11, 2024

Naturally, after writing up this proposal, I immediately realised why this works like it does today. If I write foo() at runtime, that is a 3-phase process:

  • Get the pointer value &foo
  • Dereference that pointer
  • Call the result
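
Those three steps can be spelled out by hand for an ordinary function. This is a user-level model only, with a hypothetical `answer` function standing in for `foo`; it is not a claim about how the compiler literally lowers a call. Everything here is comptime-known, which is why the middle step is allowed:

```zig
const std = @import("std");

fn answer() u32 {
    return 42;
}

test "a call written as three explicit steps" {
    const ptr = &answer; // 1. get the pointer: *const fn () u32
    const body = ptr.*; // 2. dereference it: fn () u32, a comptime-only type
    const result = body(); // 3. call the result

    try std.testing.expectEqual(@as(u32, 42), result);
    try std.testing.expectEqual(answer(), result);
}
```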

If we treat extern fns like other externs, then the second step... well, it can't be done at comptime because that's the rule we're introducing, but it can't be done at runtime because the pointee is comptime-only (we can't load a function value at runtime). In other words, foo() fails because foo can't stand on its own as a runtime expression.

I still think this design space is worth discussing, but status quo makes a lot more sense to me after realising this.

@rohlem
Contributor

rohlem commented Aug 11, 2024

> If I write foo() at runtime, that is a 3-phase process:
>
>   • Get the pointer value &foo
>   • Dereference that pointer
>   • Call the result
>
> [...] it can't be done at runtime because the pointee is comptime-only (we can't load a function value at runtime).

I couldn't find the original proposal/discussion about the notion of "function values", maybe they've always been in the compiler implementation.
(EDIT: I have now found #8383, which may have influenced the self-hosted compiler and therefore status quo, but was mostly rejected.)

IIUC, function values are a way for the compiler to distinguish scenarios where it is responsible for creating the function, so it has additional information/control over it (maybe more transformation opportunities like inlining, etc., maybe also generic instantiation uses them), vs a function pointer, which may be external or even runtime-known (which would be minimal information and control).
From my understanding, at runtime calling a function pointer is a CPU-native concept, whereas "function values" aren't really.
(Well, you can sometimes read a function's assembly as data, but that's not at all what we're doing here.)
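
The distinction is visible in user code. A small sketch with hypothetical helpers `add1`, `add2` and `pick`: a function body must be comptime-known, while a function pointer may be selected at runtime and is still callable.

```zig
const std = @import("std");

fn add1(x: u32) u32 {
    return x + 1;
}

fn add2(x: u32) u32 {
    return x + 2;
}

// Returns a function pointer selected by a runtime-known flag.
fn pick(use_first: bool) *const fn (u32) u32 {
    return if (use_first) &add1 else &add2;
}

test "call a runtime-known function pointer" {
    var use_first = true;
    _ = &use_first; // keep the flag runtime-known
    const p = pick(use_first);
    try std.testing.expectEqual(@as(u32, 11), p(10));
}
```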

Therefore the biggest discrepancy seems to be this second step of dereferencing - it is introduced by the language, and doesn't actually (need to) exist.
In that light it shouldn't matter that we can't do it at runtime nor at comptime - because it's not necessary for it to happen at all.
Instead, calling a function pointer could be an operation directly understood by the compiler, without implying a dereference.

We could keep flagging whether it's possible to "dereference a function pointer", whatever that implies semantically, and keep doing it when we know we're pointing to a Zig-generated function.
But maybe for external functions / when it's impossible, we can provide an alternative that skips this step.

@InKryption
Contributor

Isn't the extern part of this actually irrelevant? We can't do this with normal vars either:
[screenshot not preserved]

@mlugg
Member Author

mlugg commented Aug 11, 2024

> In that light it shouldn't matter that we can't do it at runtime nor at comptime - because it's not necessary for it to happen at all.
> Instead, calling a function pointer could be an operation directly understood by the compiler, without implying a dereference.

Implementation question: how do we lower function calls, then? Specifically, given expr(), how do we lower expr to ZIR? If we lower it as an rvalue, then there will be a pointer load instruction. If we lower it as an lvalue, then there are all kinds of nasty consequences, such as all functions called at comptime being lowered into the binary, and the compiler not noticing certain unmutated vars.

Since realising why this behavior exists, I've kind of come around to status quo. I'm leaving this proposal open in case anyone comes up with something better, but my current thoughts are that the example I give in the issue is just a weird but necessary consequence of the sanest possible design.

> Isn't the extern part of this actually irrelevant? We can't do this with normal vars either:

The inconsistency is that you can do it with extern fn but not extern var; and the extern is the important thing there, because extern means that the value can never be truly "comptime-known". If you want a more clear-cut example, extern const has the same rule as extern var.

@rohlem
Contributor

rohlem commented Aug 11, 2024

> Specifically, given expr(), how do we lower expr to ZIR? If we lower it as an rvalue, then there will be a pointer load instruction.

I don't see why assigning a pointer should need a pointer load instruction: const ptr = expr;
Similarly, I don't see why calling a function pointer should need one: expr();
EDIT: Now I think you mean a load instruction on a pointer pointer? So *const *const fn (...) -> *const fn (...)?
If that's what you mean, then imo we should forward the lazy (reference to the link-time) value. The compiler already must have some concept of extern fn identity, so that should be used for the function pointer value. But that's just what first comes to my mind.

I've not worked on the compiler. I'm sure your point is accurate to the rules currently implemented.
But when discussing changes in behavior, I don't think the behavior of status-quo should limit a proposed solution. What we don't like we should change once we get a better idea.

@mlugg
Member Author

mlugg commented Aug 11, 2024

I don't understand what you're saying. The expression foo, given the declaration extern var foo: u32, is not a pointer; it's a value of type u32. Writing foo means "take a pointer to foo, and dereference it". That dereference might happen at comptime (for e.g. a global const), or it might happen at runtime (for e.g. a global var). We're in the latter case for an extern. The expression &foo is a pointer; if you had to call extern functions as (&foo)(), then everything would be fine. But that's ridiculous to require.
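
The value/pointer distinction can be observed with `@TypeOf`. This sketch uses an ordinary global `counter` and function `noop` as stand-ins (both hypothetical, and deliberately not extern so that the file needs no external symbols):

```zig
const std = @import("std");

var counter: u32 = 0;

fn noop() void {}

test "a name denotes the value, & gives the pointer" {
    try std.testing.expect(@TypeOf(counter) == u32);
    try std.testing.expect(@TypeOf(&counter) == *u32);
    try std.testing.expect(@TypeOf(noop) == fn () void);
    try std.testing.expect(@TypeOf(&noop) == *const fn () void);

    // Calling through the explicit pointer is accepted, just noisy.
    (&noop)();
}
```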

@rohlem
Contributor

rohlem commented Aug 11, 2024

> The expression &foo is a pointer; if you had to call extern functions as (&foo)(), then everything would be fine. But that's ridiculous to require.

I still don't know what the concept of a "function value" is actually useful for.
Could we make the names of functions refer to the function's pointer instead?
Then users write foo(), the & of status-quo is implied, and everything is fine.


3 participants