Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge v2 engine @rule coroutines with python 3 asyncio #7077

Closed
cosmicexplorer opened this issue Jan 13, 2019 · 0 comments · Fixed by #8639
Closed

merge v2 engine @rule coroutines with python 3 asyncio #7077

cosmicexplorer opened this issue Jan 13, 2019 · 0 comments · Fixed by #8639

Comments

@cosmicexplorer
Copy link
Contributor

Python 3 introduces asyncio, an interface to exactly the kind of concurrent processing we currently perform with the v2 engine. As of #5580, we can express v2 @rules as coroutines with yield, which we've later extended to additionally validate that @rule coroutine syntax conforms to the engine's (pretty generic) async model in #7019.

asyncio covers the implementation of some async primitives itself as well, but as we've developed the native engine as a standalone Rust binary, we would almost definitely prefer to use the engine and the typed, dependency-injected v2 rule syntax (in addition to changes in the engine's Rust code) to cover how pants async code is executed (because the engine makes this incredibly ergonomic).

Despite that, taking the time to figure out how to merge our implementation with the standard (in python 3) syntax for defining async code sounds like a fantastic step for both making the engine more generic (potentially supporting more languages than python with ease, although it's pretty good at that already), as well as making our python rules more generic (and potentially allowing other execution backends -- bazel's starlark is being experimented with in #6998).

cosmicexplorer added a commit that referenced this issue Nov 20, 2019
### Problem

Fixes #7077.

See #8635 (comment). As of #8330, we annotate all `@rule`s with a single return type, even for rules which call `yield Get(...)` within the method body. MyPy doesn't like this, and requests that rules be given a `Generator` or `Iterable` return type. This blocks type-checking (even partially) for many targets which define rules.

### Solution

- Expand `_RuleVisitor` in `rules.py` to extract `Get` calls from `async def` rules.
- Add an `__await__()` method to `Get` and create `MultiGet` to allow awaiting on multiple `Get`s in parallel (named to match the corresponding rust type).
  - Lists cannot be `await`ed in the way that it is possible to `yield [Get(...) ...]` -- the expression must be *awaitable* -- see https://www.python.org/dev/peps/pep-0492/#await-expression. In this case, `MultiGet` wrapping an iterable of `Get`s and exposing an `__await__()` method is a complete replacement for the previous `yield [Get(...) ...]` syntax.
- Edit `native.py` to allow for `@rule` coroutine methods to `return` at the end instead of `yield`.
- Convert the `@rule`s in `build_files.py` into `async` methods, and type-check the `:build_files` target!

### Result

`@rule`s can now be defined with `async`/`await` syntax:
```python
@rule
async def hydrate_struct(address_mapper: AddressMapper, address: Address) -> HydratedStruct:
  # ...
  address_family: AddressFamily = await Get(AddressFamily, Dir(address.spec_path))
  # ...
  hydrated_inline_dependencies: Tuple[HydratedStruct, ...] = await MultiGet(Get(HydratedStruct, Address, a)
                                                for a in inline_dependencies)
  # ...
  return HydratedStruct(consume_dependencies(struct, args={"address": address}))
```

As a result, plugins and backends that define `@rule`s can now be checked with MyPy!

#### Alternative Solution: Returning `Generator[Any, Any, T]`

In [the first few commits of this PR](https://github.com/pantsbuild/pants/pull/8639/files/43f73f1ac2d1e86301936a895ef08ffe8787d0f7..f7e8534c72965c5e6daa143ccefcdc7428192291), MyPy type annotations were added to `build_files.py` without adding any support for `async` methods. This worked by first adding support for `return` at the end of an `@rule` body in `native.py`, as in this PR, and annotating `@rule`s with the return type `-> Generator[Any, Any, T]`, `T` being the rule's actual return type.

This worked, but required a lot of extra effort to extract the return type `T` from the `Generator[Any, Any, T]` annotation, so this was discarded because `async def` rules require no additional work to extract the output type. For the record, this would have looked like:
```python
@rule
def hydrate_struct(address_mapper: AddressMapper, address: Address) -> Generator[Any, Any, HydratedStruct]:
  # ...
  address_family: AddressFamily
  address_family = yield Get(AddressFamily, Dir(address.spec_path))
  # ...
  hydrated_inline_dependencies: Tuple[HydratedStruct, ...]
  hydrated_inline_dependencies = yield [Get(HydratedStruct, Address, a)
                                                for a in inline_dependencies]
  # ...
  return HydratedStruct(consume_dependencies(struct, args={"address": address}))
```

Note that `x: X = yield Get(X, Y(...))` is not a valid python expression -- this is another benefit of the `async`/`await` approach.

#### Follow-Up Extension: Type-Checked `await Get[X](...)`

Another alternative extending the `await Get()` syntax was proposed in order to automatically type-check the result of the `await` call. This would have looked like:

```python
@rule
async def hydrate_struct(address_mapper: AddressMapper, address: Address) -> HydratedStruct:
  # ...
  address_family = await Get[AddressFamily](Dir(address.spec_path))
  # ...
  hydrated_inline_dependencies = await MultiGet(Get[HydratedStruct](Address, a)
                                                for a in inline_dependencies)
  # ...
  return HydratedStruct(consume_dependencies(struct, args={"address": address}))
```

**Note that the `await Get[X](...)` calls are type-checked to return the type `X`!** This means that mypy can check that later uses of the `address_family` and `hydrated_inline_dependencies` objects above are correct, which it couldn't do without a separate redundant `address_family: AddressFamily` annotation before.

The syntax extension for `Get[X](...)` was reverted in e92fecb in order to reduce complexity of the initial implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant