Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use the asttokens 3rdparty lib to make @rule parsing errors very smooth #7023

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Jan 4, 2019

Problem

As @stuhood and @baroquebobcat pointed out in #7019, it seems a little silly to only be able to provide the location of the start of the offending @rule for yield statements which would lead to ambiguity (see #7019), instead of the actual yield statement to blame. In that PR, we punted and used ast.dump(), but in the meantime I was able to find a 3rdparty library (asttokens) which performs the appropriate bookkeeping to get the correct source text as well as line and column offsets of individual nodes within an ast parse tree.

If you are interested in why this isn't trivial, I recommend trying an ast.dump(node, include_attributes=True) and trying to hook up the numbers yourself (and if it is obvious, let me know, but I don't think this particular 3rdparty lib is too difficult of a dependency to bring in).

Solution

  • Add asttokens 3rdparty requirement.
  • Use asttokens.ASTTokens and asttokens.LineNumbers to get the text and position of the offending yield statement in the original file.
  • Re-add indentation that we had to remove to parse the @rule body in the first place so that our error message is exactly what the source text is.

Result

An invalid rule like:

@rule(A, [])
def f():
  yield Get(SomeProduct, SomeInput, some_input)
  yield A()

Will now raise an error like:

In function f: yield in @rule without assignment must come at the end of a series of statements.

A yield in an @rule without an assignment is equivalent to a return, and we
currently require that no statements follow such a yield at the same level of nesting.
Use `_ = yield Get(...)` if you wish to yield control to the engine and discard the result.

The invalid statement was:
test_rules.py:45:2
  yield Get(SomeProduct, SomeInput, some_input)

The rule defined by function `f` begins at:
test_rules.py:43:0
@rule(A, [])
def f():

I can now jump to a parse error (an invalid yield statement, as above) and hit the return key in emacs's compilation-mode to jump again to the precise source text which caused the error. This makes the prospect of adding more ast magic in the future (or not, and continuing to use what was added in #7019) in a highly structured and specific way much more plausible and understandable to users not incredibly well-versed in the semantics of the v2 engine.

Copy link
Contributor

@baroquebobcat baroquebobcat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me! Could you add an example of what the error message looks like to the summary / commit message?

@cosmicexplorer
Copy link
Contributor Author

cosmicexplorer commented Jan 4, 2019 via email

@cosmicexplorer
Copy link
Contributor Author

Added an example of the output, and added the column to the location of the @rule definition in the error message since I didn't realize we had that information.

@baroquebobcat
Copy link
Contributor

I think would be better if the top of the error message directly said what the issue was, with the explanation second. Maybe something like Exception message: In function f: Yield in @rule without assignment not at end of a series of statements.

@cosmicexplorer
Copy link
Contributor Author

@baroquebobcat I thought that was a great idea and I implemented that in b565291, with some small changes -- one, to remove the negation, and two to add the pronoun "the", and three to decapitalize "yield". I generally find messaging without negatives to be clearer personally, I thought it sounded better with "the", and I thought it made sense to use the lowercase since we specifically refer to the keyword "yield". I recognize the insertion of "the" is a tad inconsistent since we still have "yield in @rule" at the start, but I thought adding "an" there would make the message too busy. Since there's no rush to merge this and it describes a somewhat-complex error condition I would like to know if you have further input -- feel free to use the suggested edit feature, or not, as necessary.

Copy link
Contributor

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great :) Thanks!

Yield(value=Call(func=Name(id='A', ctx=Load()), args=[], keywords=[]{}))
""".format('' if PY3 else ', starargs=None, kwargs=None'))
with self.assertRaisesRegexp(_RuleVisitor.YieldVisitError, expected_rx_str):
with self.assertRaisesRegexp(_RuleVisitor.YieldVisitError, re.escape('yield A()')):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this test case because I'm actually ok with the code:

@rule(A, [])
def f():
  yield A()
  return

how would you feel about making this not error? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be exceedingly easy! I think @stuhood didn't have an issue with making this not error, but just didn't want to require the final return (and I was being lazy in the last PR because I wanted to get this one in, and forgot to add this capability back). Will do.

Copy link
Member

@stuhood stuhood Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illicitonion : I'm not a fan of (optional) explicit return, and definitely not a fan of required explicit return.

Given the restriction that is applied here (that the yield is always a trailing statement in a block), I don't think accidental early return is likely. And there is nothing illegal about a coroutine that is "partially consumed". The semantics of "if I yield a Get, I should expect to resume, otherwise, I can expect to exit" doesn't take much time to pick up... and the check that Danny has added helps to ensure that you don't get those swapped.

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Jan 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I also agree with the above, with the viewpoint that since we are not writing normal python code, I actually don't want it to look like a normal python coroutine, and I feel ok about that because of the checking here.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: the python is 100% valid and normal.

If I write a coroutine that generates an infinite stream of fibonacci numbers, and then only consume part of it, that's no less a valid coroutine or coroutine consumer than something that consumes a coroutine until it is given the result value it wants.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was that -- the yield A() part is what returns control to the engine, not the return below it, so allowing a return below it that isn't actually involved in any control flow feels very confusing to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh -- maybe that's what you meant. My python experience is showing....

Copy link
Contributor Author

@cosmicexplorer cosmicexplorer Jan 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@illicitonion I'm interpreting this conversation as "not allowing yield then return" for now, but it is extremely easy to add that capability afterwards (it's always easier to restrict then relax than the other way around, or something), as demonstrated by the previous iteration of the previous PR #7019 in which that was enforced (before it was changed back).

tests/python/pants_test/engine/test_rules.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants