Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

various f-string fixes #52

Conversation

isidentical
Copy link
Collaborator

@isidentical isidentical commented Mar 25, 2023

Please review individual commits. Each commit tries to reduce the number of test failures, currently:

  • 307f7d2 66 => 52
  • d9b2c26 52 => 44
  • e1ead33 44 => 42
  • 5d1e626 42 => 39 (1 case is now allowed, FYI: f"{3!s }" where the conversion character has an insignificant whitespace afterwards, see the commit description for the reasoning but I can revert it too and try to add an explicit error in the constructor).
  • c80b292 39 => 31
  • 1f47ba8 31 => 30

@isidentical isidentical force-pushed the last-fstring-status branch from d9b2c26 to e1ead33 Compare March 25, 2023 01:32
@isidentical
Copy link
Collaborator Author

Out for the day, we have ~30 remaining test failures which I was able to at least group and I think none of them too hard to solve:

  • test_conversions => f'{3!ª} is interpreted as f'{3!a} which I am not sure is a desired behavior or not (although a funny edge case since ª is normalized as a if it were a regular identifier, in any way we should be able to get the unnormalized version quickly).

  • test_conversions (x5) => f"{a!3}" or f"{a!!}" is not handled correctly (we can only reject invalid NAMEs, not other type of tokens). If there were a way to use && forced with a NAME (so we'd force the next token after ! to be NAME), I think it would solve it for us but it doesn't seem possible now. Perhaps with a new invalid rule might come handy.

  • test_missing_expression => we need to reject empty expressions inside format specifiers f'{10:{ }}'

  • test_parens_in_expressions => f'{\n}' now raises "f-string: empty expression not allowed" instead of "unterminated string literal"

  • test_compile_time_concat, test_mismatched_braces (x2), test_syntax_error_for_starred_expressions => s/expecting/expected/, s/cannot/can't/

  • Rest: the newer errors make sense (e.g. f'{=' would now raise there is an expression required before = instead of the old way of saying an } is required).

@isidentical isidentical marked this pull request as ready for review March 25, 2023 02:45
@isidentical isidentical requested a review from pablogsal as a code owner March 25, 2023 02:45
@isidentical
Copy link
Collaborator Author

cc: @lysnikolaou @pablogsal @mgmacias95

@@ -1495,7 +1495,8 @@ expr_ty _PyPegen_formatted_value(Parser *p, expr_ty expression, Token *debug, ex
if (PyUnicode_GET_LENGTH(conversion->v.Name.id) > 1 ||
!(first == 's' || first == 'r' || first == 'a')) {
RAISE_SYNTAX_ERROR_KNOWN_LOCATION(conversion,
"f-string: invalid conversion character: expected 's', 'r', or 'a'");
"f-string: invalid conversion character %R: expected 's', 'r', or 'a'",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are removing f-string: from the other tests should we remove it from here as well?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's also the question, should we maybe find a way to keep f-string in all the error messages that come from inside an f-string?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning was, errors that can happen in a normal code should not have f-string: in front of them (e.g. 1.X() vs f"{1.X()}") but errors that are specific to f-strings (like this one or the error we give when saying unmatched brace even though there was a previous brace f"{)}" vs "{)}") should include f-string: prefix to denote that this rule only applies within the f-string context. But am open to other ideas as well (fully removing or automatically adding f-string: at the top level error delegating part).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer to remove all the f-string: based on two things:

  • We don't mention other constructs elsewhere (as in "error in import statement") or something like that.
  • The parser doesn't really know that some syntax error is really inside an f-string. Sure, it can know what is happening inside one but it doesn't need to know how deep into it is.
  • I don't really think it helps a lot the user once is clear where the error is when is printed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's go with no f-string: then. While I agree with @isidentical's view and it makes sense, I think in this case consistency is a bit more important, so let's remove them from everywhere if you're all alright with it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's make a quick poll. React to this comment with 👍 for removing the prefix and 👎 to keep it only on f-string related errors.

Copy link

@sunmy2019 sunmy2019 Mar 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support removing all f-string: prefix, but insert f-string somewhere in the context when it's ambiguous.


Now everything inside the { } is just as normal as everything out of the f-string.

I think the f-string exists previously is because the former parser will send the f"{123+}" into another parser and output

    (123+)
         ^
SyntaxError: f-string: invalid syntax

With the new parser, the user can now locate the error easily without telling f-string, thus we can remove all prefix.


But for other errors, such as missing the conversion letter,

SyntaxError: f-string: invalid conversion character: expected 's', 'r', or 'a'

Removing all the f-string suffix would cause some ambiguity. You cannot search for the correct answer without telling Google it's the f-string.

https://www.google.com/search?q=python+conversion+characters

I think it would be better if we tell the user

SyntaxError: invalid f-string conversion character: expected 's', 'r', or 'a'

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a very good point.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, add some context saying it's a f-string in these cases can be very helpful. I also agree on adding the context in the error message and not as a f-string: prefix.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is what we are going to do: let's move this forward and let's make another PR to "normalize" the error messages.

@isidentical
Copy link
Collaborator Author

Uh, seems like we are having a weird problem with the parser on the latest revision of fstring-grammar-rebased-after-sprint branch.

test test_fstring crashed -- Traceback (most recent call last):
  File "/home/cicero/projects/cpython/Lib/test/libregrtest/runtest.py", line 360, in _runtest_inner
    refleak = _runtest_inner2(ns, test_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicero/projects/cpython/Lib/test/libregrtest/runtest.py", line 300, in _runtest_inner2
    the_module = importlib.import_module(abstest)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicero/projects/cpython/Lib/importlib/__init__.py", line 124, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1334, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1307, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1278, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 841, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1016, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1154, in get_code
  File "<frozen importlib._bootstrap_external>", line 1084, in source_to_code
  File "<frozen importlib._bootstrap>", line 400, in _call_with_frames_removed
ValueError: AST node column range (-1, 1) for line range (1313, 1314) is not valid
import ast
import _tokenize

src = """f'''{
3
}'''"""

for token in _tokenize.TokenizerIter(src):
    print(token)

ast.parse(src)

It might be related to extra peeks since if I add a bunch of characters after the first brace f"""{3333[...]}""" the error disappears (so it might be because tok_backup doesn't function very well?). Still investigating but any help is welcome.

@isidentical
Copy link
Collaborator Author

OK, I think I have a fix at #56.

@pablogsal
Copy link
Owner

OK, I think I have a fix at #56.

Landed the fix, good catch!

isidentical and others added 8 commits March 26, 2023 22:28
If a syntax error is not specific to the situation of an f-string
(like empty expression etc.) then it probably doesn't worth the
extra complexity for us to introduce codepaths in tokenizer in
order to add an "f-string" prefix in front of the errors.
Note: we can safely disregard whitespace after the conversion
character since it doesn't relate to neither the conversion
opener (`!`) nor the conversion character itself, it is part
of the conversion group (which I think is OK to keep as is).
Co-authored-by: Marta Gómez Macías <mgmacias@google.com>
@isidentical isidentical force-pushed the last-fstring-status branch from c5e83df to 83b2adf Compare March 26, 2023 19:31
@isidentical isidentical requested review from pablogsal, lysnikolaou and mgmacias95 and removed request for lysnikolaou March 26, 2023 19:34
if (tok->tok_mode_stack_index > 0 && opening == '{') {
assert(current_tok->bracket_stack >= 0);
int previous_bracket = current_tok->bracket_stack - 1;
if (previous_bracket == current_tok->bracket_mark[current_tok->bracket_mark_index]) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe we could make a macro to simplify current_tok->bracket_mark[current_tok->bracket_mark_index].

@pablogsal
Copy link
Owner

OK, I think I have a fix at #56.

This has been landed

@pablogsal pablogsal merged commit a0edfb1 into pablogsal:fstring-grammar-rebased-after-sprint Mar 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants