-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: do not perform RLS on explicitly typed struct and array inits #17194
Labels
Milestone
Comments
mlugg
added a commit
to mlugg/zig
that referenced
this issue
Sep 19, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Resolves: ziglang#16512 Resolves: ziglang#17194
mlugg
added a commit
to mlugg/zig
that referenced
this issue
Sep 19, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Resolves: ziglang#16512 Resolves: ziglang#17194
mlugg
added a commit
to mlugg/zig
that referenced
this issue
Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Resolves: ziglang#16512 Resolves: ziglang#17194
mlugg
added a commit
to mlugg/zig
that referenced
this issue
Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Resolves: ziglang#16512 Resolves: ziglang#17194
mlugg
added a commit
to mlugg/zig
that referenced
this issue
Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Resolves: ziglang#16512 Resolves: ziglang#17194
mlugg
added a commit
to mlugg/zig
that referenced
this issue
Sep 23, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we almost break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Lastly, it's worth noting that this commit introduces a slightly strange source of generic poison types: in the expression `@as(*anyopaque, &x)`, the sub-expression `x` has a generic poison result type, despite no generic code being involved. This turns out to be a logical choice, because we don't know the result type for `x`, and the generic poison type represents precisely this case, providing the semantics we need. Resolves: ziglang#16512 Resolves: ziglang#17194
mlugg
added a commit
to mlugg/zig
that referenced
this issue
Sep 23, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we almost break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Lastly, it's worth noting that this commit introduces a slightly strange source of generic poison types: in the expression `@as(*anyopaque, &x)`, the sub-expression `x` has a generic poison result type, despite no generic code being involved. This turns out to be a logical choice, because we don't know the result type for `x`, and the generic poison type represents precisely this case, providing the semantics we need. Resolves: ziglang#16512 Resolves: ziglang#17194
TUSF
pushed a commit
to TUSF/zig
that referenced
this issue
May 9, 2024
This commit introduces the new `ref_coerced_ty` result type into AstGen. This represents a expression which we want to treat as an lvalue, and the pointer will be coerced to a given type. This change gives known result types to many expressions, in particular struct and array initializations. This allows certain casts to work which previously required explicitly specifying types via `@as`. It also eliminates our dependence on anonymous struct types for expressions of the form `&.{ ... }` - this paves the way for ziglang#16865, and also results in less Sema magic happening for such initializations, also leading to potentially better runtime code. As part of these changes, this commit also implements ziglang#17194 by disallowing RLS on explicitly-typed struct and array initializations. Apologies for linking these changes - it seemed rather pointless to try and separate them, since they both make big changes to struct and array initializations in AstGen. The rationale for this change can be found in the proposal - in essence, performing RLS whilst maintaining the semantics of the intermediary type is a very difficult problem to solve. This allowed the problematic `coerce_result_ptr` ZIR instruction to be completely eliminated, which in turn also simplified the logic for inferred allocations in Sema - thanks to this, we almost break even on line count! In doing this, the ZIR instructions surrounding these initializations have been restructured - some have been added and removed, and others renamed for clarity (and their semantics changed slightly). In order to optimize ZIR tag count, the `struct_init_anon_ref` and `array_init_anon_ref` instructions have been removed in favour of using `ref` on a standard anonymous value initialization, since these instructions are now virtually never used. Lastly, it's worth noting that this commit introduces a slightly strange source of generic poison types: in the expression `@as(*anyopaque, &x)`, the sub-expression `x` has a generic poison result type, despite no generic code being involved. This turns out to be a logical choice, because we don't know the result type for `x`, and the generic poison type represents precisely this case, providing the semantics we need. Resolves: ziglang#16512 Resolves: ziglang#17194
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This is a proposal to simplify the language implementation by limiting the scope of RLS in relation to struct and array initialization expressions.
Background
Result Location Semantics guarantees copy elision for struct and array initialization expressions. This is a useful feature for optimizations and for pinned types (in particular, async frames). In general, the idea is that when you write
my_var = .{ .x = 123, .y = 456 }
, this is "desugared" tomy_var.x = 123; my_var.y = 456;
, so no temporary is ever constructed. This feature is on the whole relatively simple in terms of implementation: however, it is complicated significantly by explicit types.Consider the array initialization expression
var foo: A = B{ 123, 456 };
. Without RLS, this is simple: we initialize a value of typeB
, then attempt to coerce it to anA
, emitting an appropriate compile error if the coercion is invalid. However, under RLS, this becomes quite non-trivial. One's first instinct might be to treat this as equivalent tofoo[0] = 123; foo[1] = 456;
(plus the necessary validation thatA
has length 2) - however, this would be incorrect, because now we never validate that the typeB
is compatible with this initialization (it's lost entirely!). Okay, so, let's try another approach: we'll first check thatA
andB
are the same type. That doesn't work either - what if the line werevar foo: ?T = T{ 123, 456 }
??T
andT
are distinct types, but coercion means this initialization is allowed. Maybe we could strip all error unions and optionals from the first type? Well, this is almost correct, but still fails, because tuples are coercible to arrays (and to equivalent tuples):This is also complicated by
comptime
fields inT
. As it turns out, the "correct" way to implement this is, I think, as follows:B
has the expected element countB
, then store to the field pointer offoo
(implicitly coercing to the field type inA
)comptime
fields inB
which were not specified, and also explicitly store those values to the corresponding fields inA
That's quite a lot of work, and a correct lowering tends to result in quite bulky ZIR.
So far I've been discussing array initializers, but the same problem exists for struct initializers, because they can be used to initialize tuples (
MyTuple{ .@"0" = 123, .@"1" = 456 }
). All this is to say that when we have both a result pointer and an explicit type, applying RLS while maintaining correct validation and compile errors becomes quite difficult, and requires a fair bit of ZIR and analysis logic.All of this is complicated even further by inferred allocations. When storing an explicitly typed struct to an inferred allocation, we use the
coerce_result_ptr
instruction to coerce the pointer to a specific type - aresolve_inferred_alloc
instruction is later used to instruct Sema to resolve the allocation's type through RLS and rewrite emitted AIR instructions as required. Unfortunately, RLS means that this could turn acomptime
tuple field into a runtime one, which we would have to retroactively emit AIR stores for. This can cause miscompilations today:In short,
var x: A = .{ 123, 456 };
is currently much easier for us to correctly lower thanvar x = A{ 123, 456 };
is.Proposal
Only apply RLS for untyped initializations (
.{ ... }
). Initializations with explicit types (T{ ... }
) construct a temporary, then store it to the result location.Clearly, the primary concern with this proposal is that it eliminates RLS in cases where users might want it. However, I do not think this practically comes up in well-written Zig code. Explicitly typed initializers are fairly rare to see - in my experience, by far the most common use for them is in
anytype
function parameters (particularly for the "extra data" DOD pattern), where RLS is not relevant (as there is no result pointer).In general, it seems to be a preferred style to write
const x: T = .{ ... }
rather thanconst x = T{ ... }
- in fact, there is an open proposal to eliminate the latter syntax entirely, in favour of type coercion (@as
when needed). If that were accepted, we would effectively get this proposal anyway, since@as
does not forward result pointers. I propose that even if that proposal is rejected, it makes sense to add this restriction, as anonymous initializations should remain preferred wherever valid (they're nicer to read and easier on the compiler implementation in several ways).To back up my claims about how frequently typed initializations eligible for RLS are used, I searched through the compiler codebase for strings of
[a-zA-Z]\{
, which are mostly array initializers. Currently, it turns out there are actually a lot of them. However, I believe that almost all of these are leftover from the stage1 days, where type inference did not work as effectively, so you needed to specify types quite often. In my opinion, almost all code making use of these initializers would be improved without them. Here are examples of the most common usages:Module.zig
The first of these is extremely common, and I know for a fact that stage1 (at least for most of its existence) could not handle this case, which explains why it's so common. The second is a fairly trivial change, but I think the second is generally agreed to be a bit nicer: in the same way that
const x: u32 = @intCast(y)
is nicer to read than the oldconst x = @intCast(u32, y)
, the readability of this type of expression is improved by the consistency of explicitly specifying the variable type.I can't find many examples which don't fit into one of these two patterns, so I'm fairly confident in saying that well-written modern Zig would quite rarely use explicitly-typed initializations, and would virtually never use them where RLS is applicable.
The text was updated successfully, but these errors were encountered: