Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@rules as coroutines #5580

Merged
merged 7 commits into from
Apr 6, 2018
Merged

Conversation

stuhood
Copy link
Member

@stuhood stuhood commented Mar 9, 2018

Problem

The "complex" Selector types used to select @rule inputs for subjects other than the current subject of an @rule have proven to be entirely too complicated. Additionally, their lack of flexibility meant that we needed to introduce many intermediate types to provide links between @rules in cases where imperative logic was needed for filtering or conditionals.

The only remaining motivating factor for these complex selectors was that they allow for closed-world compilation of the @rule graph, which avoids the need for error checking in rule bodies and provides for easier composition of sets of @rules.

Solution

After experimenting with using coroutines (ie: functions containing yield statements that receive a result on their left hand side), it became clear both that:

  1. Coroutines were universally easier to understand than the Selector DSL
  2. The dependency requests made by coroutines could still easily be statically checked via a small amount of runtime introspection.

And so, this change removes all usage of SelectProjection and most usage of SelectDependencies in favor of @rule coroutines. It does not yet remove the implementation of those Selectors (although doing so will shed a massive amount of complexity from the codebase).

Result

@rules are still statically checked, but are now much easier to write. Performance is unaffected.

Select(Address)])
def resolve_unhydrated_struct(address_mapper, address_family, address):
@rule(UnhydratedStruct, [Select(AddressMapper), Select(Address)])
def resolve_unhydrated_struct(address_mapper, address):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just looking at this exact rule today to try to understand how to use SelectProjection and this change is RIDICULOUSLY cool imo

"""Given an Address and its AddressFamily, resolve an UnhydratedStruct.

Recursively collects any embedded addressables within the Struct, but will not walk into a
dependencies field, since those are requested explicitly by tasks using SelectDependencies.
"""

address_family = yield Get(AddressFamily, Dir(address.spec_path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no Dir -> AddressFamily path exists in the rule graph, does it error at this line? Would the previous use of SelectProjection have errored out earlier, during the rule graph construction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, exactly. And that's the primary downside of this change as it stands: we would lose the validation that all rules are satisfiable.

Have been thinking more about this though, and I think that with some very, very basic parsing (essentially, "capture the RHS of yield statements"), we could get back the static checking... which might be justified.

Copy link
Member Author

@stuhood stuhood Mar 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and also, as it stands the exception would be created in the rust code rather than literally on this line. It looks like it is easy to send an exception to the generator though with generator.throw().

EDIT: Hmm, but actually: the downside of this would be that it would introduce the possibility of people accidentally/intentionally implementing error checking. And frankly, I'd rather get the static checks fixed and discourage all error handling in @rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "capture the RHS of yield statements" is a highly reasonable goal, but I don't know if it needs to be done by parsing -- that being said I remember python's ast module is supposed to be not awful? I can't see what the exact code would look like right now, but since we know we can get the desired info we want before invoking the python rule definition (with the previous example using SelectProjection), I think there's hopefully a middle ground where we can specify rule graphs without too much ceremony (probably exactly the same in spirit as parsing for yield statements). I would love to parse some python for yield statements if it gets us there, but I think there might be a neater way -- will think on it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now resolved: all Get statements in the body of an @rule are statically checked at RuleGraph construction time.

@stuhood
Copy link
Member Author

stuhood commented Mar 10, 2018

Parsing yield statements from @rules appears to be straightforward: have added one more commit on top that demonstrates it in a single file.

So I think that with one relatively minor restriction (which technically already existed) we can preserve the static checks; namely: the request/Get/yield would need to include both the product and subject type explicitly. An example syntax might be:

# verbose form: Get(product_type, subject_type, subject)
classpath = yield Get(Classpath, ScalaSources, sources)
# potential shorthand: Get(product_type, subject_type(subject))
classpath = yield Get(Classpath, ScalaSources(snapshot))

EDIT: And after thinking about it for a few minutes more: we'd probably want to parse the calls to the Get/"whatever" constructor rather than the yield statements themselves... that would more naturally allow for (for example) building up a mutable list of requests and then yielding them. And what we need are the arguments to the constructor anyway.
EDIT2: Did that. Looks good.

@stuhood stuhood force-pushed the stuhood/enginerator branch 2 times, most recently from ec40733 to 051c0e2 Compare March 10, 2018 08:58
@illicitonion
Copy link
Contributor

I like it, much cleaner than SelectProjection imo :)

@stuhood stuhood force-pushed the stuhood/enginerator branch 2 times, most recently from 3c86fa8 to 73f73b5 Compare March 28, 2018 06:32
@stuhood
Copy link
Member Author

stuhood commented Mar 28, 2018

I've implemented the static checks for Get calls, so this is almost ready to graduate from experiment to recommended API.

The next order of business: bikeshedding. What should Get's final name be?

@illicitonion
Copy link
Contributor

Very cool :) Though I'm not sure I entirely follow the fine details of everything going on.

I'm happy to keep the name Get - we can always change it in the future if we need to (I'm assuming this will be an "unstable" API for a bit :))

My next request would be a few Python unit tests which should this actually in use

@@ -356,7 +372,7 @@ impl Select {
product: self.product().clone(),
variants: self.variants.clone(),
task: task,
entry: entry.clone(),
entry: Arc::new(entry.clone()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems a little strange for entry to be both Arc'd and clone'd here. My instinct (which may be wrong) is that either it's a cheap to clone type, and we shouldn't be Arcing it, or it's an expensive to clone type, and we should always be Arcing it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I have a separate review that attempts to Arc up the rule-graph, since it is full of tiny Vecs. On the other hand, it does not appear to impact performance in a significant way... so perhaps this would not either. It's very surprising to me that all of the allocation doesn't have a noticeable impact.

@stuhood stuhood mentioned this pull request Mar 29, 2018
@stuhood stuhood force-pushed the stuhood/enginerator branch 2 times, most recently from c4b40a3 to b51dae5 Compare March 29, 2018 20:36
@stuhood stuhood requested a review from benjyw March 29, 2018 21:44
@stuhood stuhood force-pushed the stuhood/enginerator branch 3 times, most recently from 9e22ec5 to 2b70546 Compare March 31, 2018 05:22
@stuhood stuhood changed the title [experiment] @rule generators @rules as coroutines Mar 31, 2018
@stuhood
Copy link
Member Author

stuhood commented Mar 31, 2018

I haven't written any new unit tests, but I updated the documentation in the topmost commit and added validation to the Get constructor, so I believe that this is now reviewable.

@stuhood
Copy link
Member Author

stuhood commented Apr 1, 2018

@jsirois
Copy link
Contributor

jsirois commented Apr 2, 2018

I like the direction, but I'm not in love with the new syntax / concept / mixed-point dependency injection. Perhaps best to explain with a summary example. From the current change we now have:

class Get(datatype('Get', ['product_type', 'subject'])):
  pass

@rule(UnhydratedStruct, [Select(AddressMapper), Select(Address)])
def resolve_unhydrated_struct(address_mapper, address):
  address_family = yield Get(AddressFamily, Dir(address.spec_path)
  ...

So we have dependency injection through function arguments and, now, co-routine results.

This could alternatively be modeled with just dependency injection through function arguments if we allow for both terminal values and rules (functions) to be injected:

class Rule(datatype('Rule', ['product_type', 'subject_type'])):
  def __call__(subject):
    pass  # IE: Get
  
  def map(subjects):
    pass  # IE: GetMulti

@rule(UnhydratedStruct, [Select(AddressMapper), Select(Address), Rule(AddressFamily, Dir)])
def resolve_unhydrated_struct(address_mapper, address, address_family_rule):
  address_family = address_family_rule(address.spec_path)
  ...

IOW, I'm a Rule author who knows how to operate on certain selected values but I need to enlist the help of other rules imperatively. The proposed syntax change says this more directly I think and allows eliminating AST gymnastics.

Taken further (but importantly ignoring variant selects), this just means @rule becomes nothing but type annotations for a python function; ie: things eventuially could just boil down to:

@rule(UnhydratedStruct, [AddressMapper, Address, Rule(AddressFamily, Dir)])
def resolve_unhydrated_struct(address_mapper, address, address_family_rule):

@stuhood
Copy link
Member Author

stuhood commented Apr 2, 2018

Thanks @jsirois , that's definitely an interesting suggestion!

One downside I see is that it removes the trampolining that happens when we yield as a coroutine. @rules actually synchronously calling one another would put us back in a python threadstack ~all of the time. Whether this results in a net advantage in terms of time spent outside of the GIL is an open question, because if python functions were to synchronously call into other rules, I believe that the GIL would be released for the length of the call.

Taken further (but importantly ignoring variant selects), this just means @rule becomes nothing but type annotations for a python function; ie: things eventuially could just boil down to:

I believe that it's possible that the @rule declaration can actually be simplified with either of the proposals, but I haven't looked into how variants will affect it either. I'd need to look at more callsites, but it's possible that SelectVariant would become a flavor of Get, which would mean that @rule arguments could be simplified similarly.

EDIT: Although, of course, they would still be coroutines, and thus harder to invoke.

@stuhood
Copy link
Member Author

stuhood commented Apr 2, 2018

Another advantage that @illicitonion might point out is that avoiding coroutines would mean that we could more easily use Skylark at some point (which doesn't support them, afaik).

EDIT: Then again, skylark probably doesn't support decorators either.

@jsirois
Copy link
Contributor

jsirois commented Apr 2, 2018

I may be missing something here, but couldn't the implementation of the call into the engine be masked, deferring the optimizations for behind-the-scenes work?:

>>> def coroutine(value):
...   result = yield rust(value)
...   yield result
... 
>>> def rust(value):
...   return value * 2
... 
>>> def using_coroutine(value):
...   c = coroutine(value)
...   for r in c:
...     if r:
...       return r
... 
>>> using_coroutine(3)
6

Where using_coroutine is what is passed into the example above as the Rule implementation stub.

@stuhood
Copy link
Member Author

stuhood commented Apr 2, 2018

I may be missing something here, but couldn't the implementation of the call into the engine be masked, deferring the optimizations for behind-the-scenes work?:

result = yield rust(value) is still a "python calls rust" situation... so the coroutine would not actually be doing anything useful with the yield. In this PR the @rule yields to the rust code which sends back the value. Again, really not certain that it buys us less time under the GIL. But it's recursion safe at least?

Copy link
Contributor

@jsirois jsirois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished some discussion about coroutines vs rule stubs injected as values like all other values in the function arguments in slack #engine. I'd still prefer the latter on behalf of the average rule writer, but my instincts may be wrong and I don't strongly object.

@stuhood
Copy link
Member Author

stuhood commented Apr 3, 2018

After dancing around the realization a bit, it occurred to us this morning that if we have synchronous calls back and forth between python and rust, we can't have async io. While fully switching to synchronous io via rayon would be a possibility, it would be a pretty massive refactor... and there is a lot of good stuff coming down the pipe for async in rust.

@stuhood
Copy link
Member Author

stuhood commented Apr 5, 2018

The top commit now adds a test utility function that makes it easier to test @rules that make Get requests.

Copy link
Contributor

@baroquebobcat baroquebobcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction this is going.

The yield syntax in rules is a lot more understandable than the projecting selects.

I'd like to think a bit more about how we can validate that rules construct the Gets in expected ways going forward, but I don't think that blocks getting this in.

I do have a few minor requests about the doc updates, but I don't think they're blocking. I think doing a couple copy edit passes on the docs might be worthwhile as a separate task.

@rule(StringType, [Select(IntType)])
def int_to_str(an_int):
return str(an_int)
The first argument to the `@rule` decorator is the Product (ie, return) type for the @rule. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: second @rule needs back ticks

it will next look for any other registered Rules that can compute an IntType Product for the
the subject is already of `type(subject) == IntType`, then the @rule will be satisfiable without
any other depenencies. On the other hand, if the type _doesn't_ match, the engine doesn't give up:
it will next look for any other registered @rules that can compute an IntType Product for the
Subject (and so on, recursively.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a couple more @rule references here without backticks. Also s/depenencies/dependencies/

@rule(FormattedInt, [Select(IntType)])
def int_to_str(an_int):
return FormattedInt('{}'.format(an_int))
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd make sense to add a paragraph here explaining the FormattedInt, and maybe providing an example of what a subject would be that would cause it to be evaluated. I think discussing subjects here would help the next section do its job of explaining how to change subjects.

In cases where this is necessary, `@rule`s may be written as coroutines (ie, using the python
`yield` statement) that yield "`Get` requests" that request products for other subjects. Just like
`@rule` parameter Selectors, `Get` requests instatiated in the body of an `@rule` are statically
checked to be satisfiable in the set of installed `@rules`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/instatiated/instantiated/

pass
This @rule declares that: "for any Subject for which we can compute `Files`, we can also compute
`ConcattedFiles`". Each yielded `Get` request results in FileContent for a different File Subject
from the Files list.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it could use a little more explanation in it. Maybe something like

This `@rule` declares that: "for any Subject for which we can compute `Files`, we can also compute
`ConcattedFiles`". In order to do that, it needs the `FileContent` for each of the files in the selected 
`Files`. It gets those contents by yielding `Get` requests that ask for `FileContent`s for with each 
file as a Subject. Once those have been collected, it concatenates the contents and yields the result.

if isinstance(node, ast.FunctionDef) and node.name == func.__name__:
rule_visitor = _RuleVisitor()
rule_visitor.visit(node)
gets.extend(Get(resolve_type(p), resolve_type(s)) for p, s in rule_visitor.gets)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty slick. After we nail down the API further, or as part of doing so, I'd like to see a bunch of tests that cover different ways that Gets are defined and the error cases defined here.

return (product_type.id, subject_type.id)
else:
raise Exception('Invalid {}; expected either two or three args, but '
'got: ({})'.format(Get.__name__, render_args()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps these should raise ValueError?

@stuhood stuhood merged commit 99c2a50 into pantsbuild:master Apr 6, 2018
@stuhood stuhood deleted the stuhood/enginerator branch April 6, 2018 02:01
@stuhood stuhood mentioned this pull request Apr 6, 2018
stuhood pushed a commit that referenced this pull request Apr 6, 2018
`SelectProjection` is entirely superseded by the new `@rule` coroutine syntax from #5580.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants