Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand RLS for ref result type #16512

Closed
mlugg opened this issue Jul 23, 2023 · 1 comment
Closed

Expand RLS for ref result type #16512

mlugg opened this issue Jul 23, 2023 · 1 comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@mlugg
Copy link
Member

mlugg commented Jul 23, 2023

Motivation

Currently, using & uses the ref result location. This causes type information to be lost, which, as of #16163 landing, has meaningful effects on the language. First, consider this code snippet:

const S = struct { x: u32 };
const s: S = .{ .x = @intCast(val) };

This already works fine. The result type of S is forwarded to the struct init expression, which forwards field types onto each field init expression. Now, consider this (very similar) line:

const s: *const S = &.{ .x = @intCast(val) };

This currently fails to compile. The use of & means the struct init expression uses a ref result type, so has no type information to forward onto the field init expression. This means you either need to write &S{ ... }, or use @as to specify the field type.

Copy elision is also important here. Consider the following:

const T = struct { inner: struct { x: u32 } };
const t: *const T = &.{ .inner = .{ .x = 123 } };

Here, since there is not a known result type for the init expression, a copy of the inner struct cannot be avoided. The inner initialization is evaluated, then the result copied into the outer anon struct, which is then coerced to the correct type (potentially causing yet another copy).

Proposal

I'm going to explain this in terms of implementation details, because I think it's relatively easy to understand and is important to consider.

Introduce new result locations for typed refs. More specifically, expand AstGen.ResultInfo.Loc to add the following locations:

/// A pointer to the expression must be generated rather than the expression value itself.
/// The pointer child type (that is, the type of the expression) must be the provided type.
ref_ty: Zir.Inst.Ref,

When &a has a result location of .{ .ty = T }, a has a result location of .{ .ref_ty = S } where S is found through a new ptr_child ZIR instruction. This is made a little more complicated by slices, because if T is a []Foo, we cannot make S a concrete type, as it may be any [n]Foo. This can be solved by introducing an internal "unknown length array" type which ptr_child can return for slices. Care should be taken to avoid this fake type from leaking into other parts of the compiler.

Error Messages

Aside from helping with cast builtins and copy elision as already discussed, this proposal has an additional benefit of improving error messages. RLS can improve errors because field inits become individual pointer writes, so type errors are caught at the point of a field initialization rather than at the point of type coercion. #16508 details an error message which would be improved by this change.

Bonus Proposal: Removing Anonymous Struct Types

EDIT: moved to #16865

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Jul 24, 2023
@andrewrk andrewrk added this to the 0.12.0 milestone Jul 24, 2023
@mlugg
Copy link
Member Author

mlugg commented Aug 5, 2023

I realised I overcomplicated the proposed implementation of this. Just have ref_ty contain the pointer type, and do the following:

  • for array inits, use elem_type_index or similar and do a normal typed array init from there
  • for struct inits, get the pointer child type (erroring on slices) and do a normal typed struct init
  • for rvalue, do nothing and use as_node after taking a reference like today
  • (any other constructs should be similarly simple)

@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Aug 17, 2023
@andrewrk andrewrk added the accepted This proposal is planned. label Aug 17, 2023
@andrewrk andrewrk modified the milestones: 0.12.0, 0.13.0 Aug 17, 2023
mlugg added a commit to mlugg/zig that referenced this issue Sep 19, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 19, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 20, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we break even on line
count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 23, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: ziglang#16512
Resolves: ziglang#17194
mlugg added a commit to mlugg/zig that referenced this issue Sep 23, 2023
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: ziglang#16512
Resolves: ziglang#17194
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Sep 25, 2023
TUSF pushed a commit to TUSF/zig that referenced this issue May 9, 2024
This commit introduces the new `ref_coerced_ty` result type into AstGen.
This represents a expression which we want to treat as an lvalue, and
the pointer will be coerced to a given type.

This change gives known result types to many expressions, in particular
struct and array initializations. This allows certain casts to work
which previously required explicitly specifying types via `@as`. It also
eliminates our dependence on anonymous struct types for expressions of
the form `&.{ ... }` - this paves the way for ziglang#16865, and also results
in less Sema magic happening for such initializations, also leading to
potentially better runtime code.

As part of these changes, this commit also implements ziglang#17194 by
disallowing RLS on explicitly-typed struct and array initializations.
Apologies for linking these changes - it seemed rather pointless to try
and separate them, since they both make big changes to struct and array
initializations in AstGen. The rationale for this change can be found in
the proposal - in essence, performing RLS whilst maintaining the
semantics of the intermediary type is a very difficult problem to solve.

This allowed the problematic `coerce_result_ptr` ZIR instruction to be
completely eliminated, which in turn also simplified the logic for
inferred allocations in Sema - thanks to this, we almost break even on
line count!

In doing this, the ZIR instructions surrounding these initializations
have been restructured - some have been added and removed, and others
renamed for clarity (and their semantics changed slightly). In order to
optimize ZIR tag count, the `struct_init_anon_ref` and
`array_init_anon_ref` instructions have been removed in favour of using
`ref` on a standard anonymous value initialization, since these
instructions are now virtually never used.

Lastly, it's worth noting that this commit introduces a slightly strange
source of generic poison types: in the expression `@as(*anyopaque, &x)`,
the sub-expression `x` has a generic poison result type, despite no
generic code being involved. This turns out to be a logical choice,
because we don't know the result type for `x`, and the generic poison
type represents precisely this case, providing the semantics we need.

Resolves: ziglang#16512
Resolves: ziglang#17194
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

2 participants