Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Make field access logically safe and in line with Zen #21544

Closed
HeadHuntR404 opened this issue Sep 29, 2024 · 4 comments
Closed

Proposal: Make field access logically safe and in line with Zen #21544

HeadHuntR404 opened this issue Sep 29, 2024 · 4 comments

Comments

@HeadHuntR404
Copy link

HeadHuntR404 commented Sep 29, 2024

Bakground

Field access (write) is logically unsafe and goes against Zen.
It is inconsistent with how other unsafe operations are handled (null & error).
Consider a struct whose fields are logically dependent on each other.

const Example = struct{
    a: usize,
    // b depends on a
    b: usize,
    // c depends on b
    c: usize,
};

There is no way for the user of said struct to tell that the following field access of a is logically unsafe and causes a bug.

    var example: Example // init
    example.a = 5;

A silly, but concrete example would be to override the len property of std.ArrayList.

    var array_list: std.ArrayList(usize) // init
    array_list.items.len = 5;

The bug would be free to "travel" and the actual failure point could be far away from the unsafe field access.
For instance the following loop works perfectly fine provided that he overridden .len is within .capacity.

    for (array_list.items) |item| {
        // . . .
    }

Field access is also inconsistent with how Zig handles other unsafe accesses such as with Optionals.
"Nullable types" are marked and there are mechanisms in place to aid in safe usage.

    const safeAccess: ?usize = null;
    try stdout.print("{}\n", .{safeAccess.?});

I would argue that the need for safe field access is the same as that for safe nullable access.
Checking if a field is safe to write to and checking whether a value is set before using it can be argued to be the same kind of issue. Not having it reintroduces the same issues that have historically plagued null, but for field access (write).
The goal of the proposal is to provide a idiomatically way to express field intent that can catch accidental writes / misuse.

Proposed solution

Updated proposal
Add a special property for fields marked with _ allowing reads to be done without the prefix.
Introduce a common Zig convention using the _ prefix on fields to mark that the field has dependents and is not safe to arbitrarily change. This allows clear intent to be expressed with minimal clutter and accidental writes will be caught by the compiler.

const Example = struct {
    //regular field
    a: usize,
    // internal field
    _b: usize,
};

var example: Example = .{.a = 0, ._b = 0}; // compiles fine
var example2: Example = .{.a = 0, .b = 0}; // compile error - writing to internal field without _ prefix

example.a = 42; // compiles fine
example._b = 404; // compiles fine
example.b = 9001; // // compile error - writing to internal field without _ prefix

var read_b: usize = example.b // compiles fine
var read_b2: usize = example._b // compiles fine

The std.ArrayList example, assuming len and items are declared in std using _ prefix.

var array_list: std.ArrayList(usize) // init
var length: usize = array_list.items.len // compiles fine
var length2: usize = array_list._items._len // compiles fine
array_list._items._len = 5; // compiles fine
array_list.items.len = 5; // compile error - writing to marked field without _ prefix

Zen

The proposed change would help communicate intent more precisely resulting in more readable code.
Furthermore it will move a class of bugs to compile time errors resulting in less error prone code.
Lastly one does not need to know / remember a struct in its entirety to use and access it safely.

The following Zen points would be improved with the proposed change:

  • Communicate intent precisely.
  • Favor reading code over writing code.
  • Runtime crashes are better than bugs.
  • Compile errors are better than runtime crashes.
  • Reduce the amount one must remember.

Old proposal - kept as context for comments

Introduce an optional field keyword or symbol for marking a field as internal.
This marking provides no guarantees and the fields visibility and write ability remains unchanged.
What changes are compiler checks requiring that direct access is done trough an extra symbol similar to .? for nullable types.
Example using # as the new symbol.

const Example = struct {
    //regular field
    a: usize,
    // internal field
    #b: usize,
};

var example: Example = .{.a = 0, .#b = 0}; // compiles fine
var example2: Example = .{.a = 0, .b = 0}; // compile error - accessing an internal field directly

example.a = 42; // compiles fine
example.#b = 404; // compiles fine
example.b = 9001; // compile error - accessing an internal field directly
@rohlem
Copy link
Contributor

rohlem commented Sep 29, 2024

This seems quite closely related to #9909, which was rejected.

In the example for std.ArrayList, you pose that array_list.items.len = 5; is unsafe.
However, that is a (language-provided) field of the slice member of the array list.
I see these options for directly addressing this line:

  1. Make ([]T).len an internal property.
    Maybe it could even be argued that both len and ptr depend on each other, so both should be internal?
    So then only fields which can be set completely independently of all others should remain "regular" fields?
  2. Make ArrayList.items an internal field.
    Currently this property is part of the public interface, so without change that would be strange.
    Though rereading your proposal, it's not clear to me: Would reads be allowed without the # symbol?
  3. Somehow mark (ArrayList).items.len (and probably (ArrayList).items.ptr) as internal without marking (ArrayList).items as internal.
    I don't see relevant syntax specified in the proposal.
    I guess one option would be to replace the field's type to be a userland slice replacement type - and then probably provide the slice via a method .items()?

@AndrewKraevskii
Copy link
Contributor

Example using # as the symbol for internal fields is already possible. Just add it to variable name.

const Example = struct {
    // regular field
    a: usize,
    // internal field
    _b: usize,
};

var example: Example = .{.a = 0, ._b = 0}; // compiles fine
var example2: Example = .{.a = 0, .b = 0}; // compile error - no such field in Example

example.a = 42; // compiles fine
example._b = 404; // compiles fine
example.b = 9001; // compile error - accessing non existant field directly

@HeadHuntR404
Copy link
Author

HeadHuntR404 commented Sep 29, 2024

Those are some excellent comments.

While this proposal tackles some of the same issues as #9909, it is fundamentally different as it doesn't tries to hide or prevent access to memory that you as the caller own.

My main issues with prefixing "internal fields" with _ today, is that it is an arbitrary convention not set by the language or the community as a whole. Same can be said about other mitigation strategies such as relying on comment, naming ( internal_field_a: usize,) and coding styles forbidding field writes (get/set methods).
It would not help when working with other peoples codes.

Arbitrary convention and coding styles that mitigate the issue, has historically been present as a way to mitigate null access.
It did not stop null from becoming problematic for the programming community.
As such I find it unlikely that it would prevent field access (write) from becoming problematic.
Checking if a variable is set is also a trivial operation compared to checking if a field is logically safe to write to in Zig.
As such I would prefer an idiomatically way to express this intent in Zig.

Considering your feedback I think a better solution would be to make the prefix _ have a special property on fields.
Fields prefixed with _ can be read from normally, but writes requires the prefix.
The reasoning being that if writes are safe, the corresponding reads should be too.

const Example = struct {
    // regular field
    a: usize,
    // internal field
    _b: usize,
};
var example: Example = .{.a = 0, ._b = 0}; // compiles fine
var example2: Example = .{.a = 0, .b = 0}; // compile error - writing to internal field without _ prefix

example.a = 42; // compiles fine
example._b = 404; // compiles fine
example.b = 9001; // // compile error - writing to internal field without _ prefix

var read_b: usize = example.b // compiles fine
var read_b2: usize = example._b // compiles fine

The std.ArrayList example, assuming len and items are declared in std using _ prefix.

var array_list: std.ArrayList(usize) // init
var length: usize = array_list.items.len // compiles fine
var length2: usize = array_list._items._len // compiles fine
array_list._items._len = 5; // compiles fine
array_list.items.len = 5; // compile error - writing to internal field without _ prefix

I will update my proposal to reflect these changes

@andrewrk
Copy link
Member

Please do not file a proposal to change the language

@andrewrk andrewrk closed this as not planned Won't fix, can't repro, duplicate, stale Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants