Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: do not perform RLS on explicitly typed struct and array inits #17194

Closed
mlugg opened this issue Sep 18, 2023 · 0 comments
Closed

Proposal: do not perform RLS on explicitly typed struct and array inits #17194

mlugg opened this issue Sep 18, 2023 · 0 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@mlugg
Copy link
Member

mlugg commented Sep 18, 2023

This is a proposal to simplify the language implementation by limiting the scope of RLS in relation to struct and array initialization expressions.

Background

Result Location Semantics guarantees copy elision for struct and array initialization expressions. This is a useful feature for optimizations and for pinned types (in particular, async frames). In general, the idea is that when you write my_var = .{ .x = 123, .y = 456 }, this is "desugared" to my_var.x = 123; my_var.y = 456;, so no temporary is ever constructed. This feature is on the whole relatively simple in terms of implementation: however, it is complicated significantly by explicit types.

Consider the array initialization expression var foo: A = B{ 123, 456 };. Without RLS, this is simple: we initialize a value of type B, then attempt to coerce it to an A, emitting an appropriate compile error if the coercion is invalid. However, under RLS, this becomes quite non-trivial. One's first instinct might be to treat this as equivalent to foo[0] = 123; foo[1] = 456; (plus the necessary validation that A has length 2) - however, this would be incorrect, because now we never validate that the type B is compatible with this initialization (it's lost entirely!). Okay, so, let's try another approach: we'll first check that A and B are the same type. That doesn't work either - what if the line were var foo: ?T = T{ 123, 456 }? ?T and T are distinct types, but coercion means this initialization is allowed. Maybe we could strip all error unions and optionals from the first type? Well, this is almost correct, but still fails, because tuples are coercible to arrays (and to equivalent tuples):

const T = struct { u32, u32 };
var foo: [2]u32 = T{ 1, 2 }; // this is valid

This is also complicated by comptime fields in T. As it turns out, the "correct" way to implement this is, I think, as follows:

  • Check that B has the expected element count
  • For every element in the initialization, first coerce the element to the field type in B, then store to the field pointer of foo (implicitly coercing to the field type in A)
  • Identify any comptime fields in B which were not specified, and also explicitly store those values to the corresponding fields in A

That's quite a lot of work, and a correct lowering tends to result in quite bulky ZIR.

So far I've been discussing array initializers, but the same problem exists for struct initializers, because they can be used to initialize tuples (MyTuple{ .@"0" = 123, .@"1" = 456 }). All this is to say that when we have both a result pointer and an explicit type, applying RLS while maintaining correct validation and compile errors becomes quite difficult, and requires a fair bit of ZIR and analysis logic.

All of this is complicated even further by inferred allocations. When storing an explicitly typed struct to an inferred allocation, we use the coerce_result_ptr instruction to coerce the pointer to a specific type - a resolve_inferred_alloc instruction is later used to instruct Sema to resolve the allocation's type through RLS and rewrite emitted AIR instructions as required. Unfortunately, RLS means that this could turn a comptime tuple field into a runtime one, which we would have to retroactively emit AIR stores for. This can cause miscompilations today:

pub fn main() void {
    const A = struct { comptime u32 = 1, u32 };
    const B = struct { u32, u32 };
    var t = true;
    const val = if (t) A{ 1, 2 } else B{ 3, 4 };
    @import("std").debug.print("{}\n", .{val});
    // prints '{ 2, 65536 }' - garbage!
}

In short, var x: A = .{ 123, 456 }; is currently much easier for us to correctly lower than var x = A{ 123, 456 }; is.

Proposal

Only apply RLS for untyped initializations (.{ ... }). Initializations with explicit types (T{ ... }) construct a temporary, then store it to the result location.

Clearly, the primary concern with this proposal is that it eliminates RLS in cases where users might want it. However, I do not think this practically comes up in well-written Zig code. Explicitly typed initializers are fairly rare to see - in my experience, by far the most common use for them is in anytype function parameters (particularly for the "extra data" DOD pattern), where RLS is not relevant (as there is no result pointer).

In general, it seems to be a preferred style to write const x: T = .{ ... } rather than const x = T{ ... } - in fact, there is an open proposal to eliminate the latter syntax entirely, in favour of type coercion (@as when needed). If that were accepted, we would effectively get this proposal anyway, since @as does not forward result pointers. I propose that even if that proposal is rejected, it makes sense to add this restriction, as anonymous initializations should remain preferred wherever valid (they're nicer to read and easier on the compiler implementation in several ways).

To back up my claims about how frequently typed initializations eligible for RLS are used, I searched through the compiler codebase for strings of [a-zA-Z]\{, which are mostly array initializers. Currently, it turns out there are actually a lot of them. However, I believe that almost all of these are leftover from the stage1 days, where type inference did not work as effectively, so you needed to specify types quite often. In my opinion, almost all code making use of these initializers would be improved without them. Here are examples of the most common usages:

Module.zig

// today:
pub fn typedValue(decl: Decl) error{AnalysisFail}!TypedValue {
    if (!decl.has_tv) return error.AnalysisFail;
    return TypedValue{ .ty = decl.ty, .val = decl.val };
}
// improved:
pub fn typedValue(decl: Decl) error{AnalysisFail}!TypedValue {
    if (!decl.has_tv) return error.AnalysisFail;
    return .{ .ty = decl.ty, .val = decl.val };
}

// today:
var verify = Liveness.Verify{
    .gpa = gpa,
    .air = air,
    .liveness = liveness,
    .intern_pool = ip,
};
// improved:
var verify: Liveness.Verify = .{
    .gpa = gpa,
    .air = air,
    .liveness = liveness,
    .intern_pool = ip,
};

The first of these is extremely common, and I know for a fact that stage1 (at least for most of its existence) could not handle this case, which explains why it's so common. The second is a fairly trivial change, but I think the second is generally agreed to be a bit nicer: in the same way that const x: u32 = @intCast(y) is nicer to read than the old const x = @intCast(u32, y), the readability of this type of expression is improved by the consistency of explicitly specifying the variable type.

I can't find many examples which don't fit into one of these two patterns, so I'm fairly confident in saying that well-written modern Zig would quite rarely use explicitly-typed initializations, and would virtually never use them where RLS is applicable.

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 18, 2023
@andrewrk andrewrk added this to the 0.12.0 milestone Sep 18, 2023
@andrewrk andrewrk added the accepted This proposal is planned. label Sep 18, 2023
mlugg added a commit to mlugg/zig that referenced this issue Sep 19, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 19, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 23, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 23, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: ziglang#16512
Resolves: ziglang#17194
TUSF pushed a commit to TUSF/zig that referenced this issue May 9, 2024
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: ziglang#16512
Resolves: ziglang#17194
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

2 participants