Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace more datatype() instances with @dataclass #8423

Merged
merged 11 commits into from
Oct 10, 2019

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Oct 8, 2019

Builds off of #8362, this time tackling trickier instances.

This introduces a new idiom of using @dataclass(unsafe_hash=True). Certain custom __new__ implementations could not be implemented using either the normal dataclass auto-generated __init__() nor through __post_init__(). When frozen=True, it is impossible to implement your own __init__() because you cannot set attribute values. So, we cannot use frozen=True but still must have a hash to work with V2. unsafe_hash=True does this for us. We use @frozen_after_init from #8431 to ensure that we don't accidentally mutate the object after.

@cosmicexplorer
Copy link
Contributor

So, we cannot use frozen=True but still must have a hash to work with V2. unsafe_hash=True does this for us; however, we must be sure to not mutate the value after creation.

Could we perhaps create a wrapper (a mixin might work?) which overrides setattr for these cases to ensure the value isn't mutated? The hash of these objects are used directly as engine nodes and it feels like a good idea to enforce what we can.

@Eric-Arellano
Copy link
Contributor Author

Could we perhaps create a wrapper (a mixin might work?) which overrides setattr for these cases to ensure the value isn't mutated? The hash of these objects are used directly as engine nodes and it feels like a good idea to enforce what we can.

I tried to create the wrapper by defining at the end of the custom __init__()’s this line: self.__setattr__ = raise_immutable_error, where raise_immutable_error is a function that raises an exception. But this line doesn’t actually do anything.

The issue is that we need the object to be mutable during init, just not after. If we set the normal def __setattr__() like you set any typical dunder method, then we can't have the __init__() that we need.

Any ideas for how to dynamically make instance variables be immutable?

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Oct 9, 2019

Could we have a flag that gets initialized in the init method? Am I making any incorrect assumptions here?

@dataclass(unsafe_hash=True)
class X:
  x: Any = None

  def __init__(self, ...):
    self.x = ...

  def __setattr__(self, key, val):
    if getattr(self, 'x', None) is not None:
      raise Exception('frozen!!!')
    else:
      super().setattr(key, val)

Copy link
Contributor

@pierrechevalier83 pierrechevalier83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice change! The code is easier to read after it. See a couple of comments.

src/python/pants/engine/isolated_process.py Show resolved Hide resolved
@@ -530,7 +531,9 @@ def hydrate_sources(
"""
# TODO(#5864): merge the target's selection of --glob-expansion-failure (which doesn't exist yet)
# with the global default!
path_globs = sources_field.path_globs.copy(glob_match_error_behavior=glob_match_error_behavior)
path_globs = dataclasses.replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for me, can you explain the changes to this file and how they relate to the wider PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! So we need dataclasses to be ~immutable to work with V2. When you have an immutable object / record, the way you update it is by returning a new version of it with the modified fields. dataclasses.replace() will copy over every single attribute, minus whatever you change via kwargs.

We originally achieved this via our custom .copy() method through the DatatypeMixin. Now, we use the std lib's version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 😄

Eric-Arellano added a commit that referenced this pull request Oct 9, 2019
… with V2 (#8431)

### Problem
In #8423 (comment), we realized the need to be able to not always use `frozen=True` when using dataclasses for the V2 engine. Specifically, when we need a custom `__init__()`, we cannot have `frozen=True` because `__init__()` must be able to mutate the object's attributes.

However, we still want to keep the benefits of `frozen=True` as much as possible. After we create the object, we do not want to allow any additional modifications. This is important because the V2 engine uses the `__hash__()` of an object to determine its node ID - if the object mutates after creation, the ID will become invalid.

### Solution
As suggested in #8423 (comment), introduce a new decorator `@frozen_after_init` that only allows setting attributes in the `__init__()` and then marks the object as frozen after.

This is done by setting a flag in the `__init__()` to mark that the object should there-on-out be frozen. Then, we modify `__setattr__()` to check for that flag to prohibit any mutations other than those in the `__init__()`.

### Result
`@dataclass(unsafe_hash=True)` can now be marked with `@frozen_after_init` so that we can have the custom `__init__()`s but not lose the benefits of `frozen=True`.

This also works on any arbitrary class.

We do not yet enforce that dataclasses _must_ use this attribute (or `frozen=True`), which is left for #8181.
Copy link
Contributor

@cosmicexplorer cosmicexplorer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for devising the @frozen_after_init decorator! These changes look fantastic!!!

I honestly wouldn't be uncomfortable if we continued to use this decorator forever.

@Eric-Arellano Eric-Arellano merged commit cfbfa92 into pantsbuild:master Oct 10, 2019
@Eric-Arellano Eric-Arellano deleted the fewer-datatypes branch October 10, 2019 02:05
Eric-Arellano added a commit that referenced this pull request Oct 12, 2019
Builds off of #8423.

For `native.py`, we must use `typing.NamedTuple` because those call sites can only use basic Python primitives. This is the same as a normal `namedtuple`, only that it uses a class-based definition with types built-in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants