various f-string fixes #52

isidentical · 2023-03-25T00:51:21Z

Please review individual commits. Each commit tries to reduce the number of test failures, currently:

307f7d2 66 => 52
d9b2c26 52 => 44
e1ead33 44 => 42
5d1e626 42 => 39 (1 case is now allowed, FYI: f"{3!s }" where the conversion character has an insignificant whitespace afterwards, see the commit description for the reasoning but I can revert it too and try to add an explicit error in the constructor).
c80b292 39 => 31
1f47ba8 31 => 30

isidentical · 2023-03-25T02:44:57Z

Out for the day, we have ~30 remaining test failures which I was able to at least group and I think none of them too hard to solve:

test_conversions => f'{3!ª} is interpreted as f'{3!a} which I am not sure is a desired behavior or not (although a funny edge case since ª is normalized as a if it were a regular identifier, in any way we should be able to get the unnormalized version quickly).
test_conversions (x5) => f"{a!3}" or f"{a!!}" is not handled correctly (we can only reject invalid NAMEs, not other type of tokens). If there were a way to use && forced with a NAME (so we'd force the next token after ! to be NAME), I think it would solve it for us but it doesn't seem possible now. Perhaps with a new invalid rule might come handy.
test_missing_expression => we need to reject empty expressions inside format specifiers f'{10:{ }}'
test_parens_in_expressions => f'{\n}' now raises "f-string: empty expression not allowed" instead of "unterminated string literal"
test_compile_time_concat, test_mismatched_braces (x2), test_syntax_error_for_starred_expressions => s/expecting/expected/, s/cannot/can't/
Rest: the newer errors make sense (e.g. f'{=' would now raise there is an expression required before = instead of the old way of saying an } is required).

isidentical · 2023-03-25T20:40:42Z

cc: @lysnikolaou @pablogsal @mgmacias95

Lib/test/test_fstring.py

pablogsal · 2023-03-26T16:08:45Z

Parser/action_helpers.c

@@ -1495,7 +1495,8 @@ expr_ty _PyPegen_formatted_value(Parser *p, expr_ty expression, Token *debug, ex
        if (PyUnicode_GET_LENGTH(conversion->v.Name.id) > 1 ||
            !(first == 's' || first == 'r' || first == 'a')) {
            RAISE_SYNTAX_ERROR_KNOWN_LOCATION(conversion,
-                                              "f-string: invalid conversion character: expected 's', 'r', or 'a'");
+                                              "f-string: invalid conversion character %R: expected 's', 'r', or 'a'",


As we are removing f-string: from the other tests should we remove it from here as well?

I guess there's also the question, should we maybe find a way to keep f-string in all the error messages that come from inside an f-string?

My reasoning was, errors that can happen in a normal code should not have f-string: in front of them (e.g. 1.X() vs f"{1.X()}") but errors that are specific to f-strings (like this one or the error we give when saying unmatched brace even though there was a previous brace f"{)}" vs "{)}") should include f-string: prefix to denote that this rule only applies within the f-string context. But am open to other ideas as well (fully removing or automatically adding f-string: at the top level error delegating part).

I think I prefer to remove all the f-string: based on two things:

We don't mention other constructs elsewhere (as in "error in import statement") or something like that.

The parser doesn't really know that some syntax error is really inside an f-string. Sure, it can know what is happening inside one but it doesn't need to know how deep into it is.

I don't really think it helps a lot the user once is clear where the error is when is printed.

Ok, let's go with no f-string: then. While I agree with @isidentical's view and it makes sense, I think in this case consistency is a bit more important, so let's remove them from everywhere if you're all alright with it.

Ok, let's make a quick poll. React to this comment with 👍 for removing the prefix and 👎 to keep it only on f-string related errors.

I support removing all f-string: prefix, but insert f-string somewhere in the context when it's ambiguous.

Now everything inside the { } is just as normal as everything out of the f-string.

I think the f-string exists previously is because the former parser will send the f"{123+}" into another parser and output

(123+) ^ SyntaxError: f-string: invalid syntax

With the new parser, the user can now locate the error easily without telling f-string, thus we can remove all prefix.

But for other errors, such as missing the conversion letter,

SyntaxError: f-string: invalid conversion character: expected 's', 'r', or 'a'

Removing all the f-string suffix would cause some ambiguity. You cannot search for the correct answer without telling Google it's the f-string.

https://www.google.com/search?q=python+conversion+characters

I think it would be better if we tell the user

SyntaxError: invalid f-string conversion character: expected 's', 'r', or 'a'

This is actually a very good point.

agreed, add some context saying it's a f-string in these cases can be very helpful. I also agree on adding the context in the error message and not as a f-string: prefix.

Ok, this is what we are going to do: let's move this forward and let's make another PR to "normalize" the error messages.

Parser/tokenizer.c

isidentical · 2023-03-26T17:35:20Z

Uh, seems like we are having a weird problem with the parser on the latest revision of fstring-grammar-rebased-after-sprint branch.

test test_fstring crashed -- Traceback (most recent call last):
  File "/home/cicero/projects/cpython/Lib/test/libregrtest/runtest.py", line 360, in _runtest_inner
    refleak = _runtest_inner2(ns, test_name)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicero/projects/cpython/Lib/test/libregrtest/runtest.py", line 300, in _runtest_inner2
    the_module = importlib.import_module(abstest)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cicero/projects/cpython/Lib/importlib/__init__.py", line 124, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1334, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1307, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1278, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 841, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1016, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1154, in get_code
  File "<frozen importlib._bootstrap_external>", line 1084, in source_to_code
  File "<frozen importlib._bootstrap>", line 400, in _call_with_frames_removed
ValueError: AST node column range (-1, 1) for line range (1313, 1314) is not valid

import ast
import _tokenize

src = """f'''{
3
}'''"""

for token in _tokenize.TokenizerIter(src):
    print(token)

ast.parse(src)

It might be related to extra peeks since if I add a bunch of characters after the first brace f"""{3333[...]}""" the error disappears (so it might be because tok_backup doesn't function very well?). Still investigating but any help is welcome.

isidentical · 2023-03-26T17:47:06Z

OK, I think I have a fix at #56.

pablogsal · 2023-03-26T18:51:45Z

OK, I think I have a fix at #56.

Landed the fix, good catch!

If a syntax error is not specific to the situation of an f-string (like empty expression etc.) then it probably doesn't worth the extra complexity for us to introduce codepaths in tokenizer in order to add an "f-string" prefix in front of the errors.

Note: we can safely disregard whitespace after the conversion character since it doesn't relate to neither the conversion opener (`!`) nor the conversion character itself, it is part of the conversion group (which I think is OK to keep as is).

Co-authored-by: Marta Gómez Macías <mgmacias@google.com>

pablogsal · 2023-03-27T11:10:37Z

Parser/tokenizer.c

+            if (tok->tok_mode_stack_index > 0 && opening == '{') {
+                assert(current_tok->bracket_stack >= 0);
+                int previous_bracket = current_tok->bracket_stack - 1;
+                if (previous_bracket == current_tok->bracket_mark[current_tok->bracket_mark_index]) {


Nit: maybe we could make a macro to simplify current_tok->bracket_mark[current_tok->bracket_mark_index].

pablogsal · 2023-03-27T11:11:54Z

OK, I think I have a fix at #56.

This has been landed

isidentical force-pushed the last-fstring-status branch from d9b2c26 to e1ead33 Compare March 25, 2023 01:32

isidentical marked this pull request as ready for review March 25, 2023 02:45

isidentical requested a review from pablogsal as a code owner March 25, 2023 02:45

isidentical mentioned this pull request Mar 25, 2023

PEP 701 – Syntactic formalization of f-strings python/cpython#102856

Closed

4 tasks

mgmacias95 reviewed Mar 26, 2023

View reviewed changes

Lib/test/test_fstring.py Outdated Show resolved Hide resolved

pablogsal reviewed Mar 26, 2023

View reviewed changes

Parser/tokenizer.c Show resolved Hide resolved

isidentical and others added 8 commits March 26, 2023 22:28

Show the problematic conversion character in error

1d93766

Pass over the expression-opening braces on mismatches

9bead54

Top-level '}' should be rejected directly

66d2b76

Change test to catch DepreationWarnings for invalid escape seq

b9c519b

Update Lib/test/test_fstring.py

0623fc3

Co-authored-by: Marta Gómez Macías <mgmacias@google.com>

Add an assertion about bracket_stack >= 0

83b2adf

isidentical force-pushed the last-fstring-status branch from c5e83df to 83b2adf Compare March 26, 2023 19:31

isidentical requested review from pablogsal, lysnikolaou and mgmacias95 and removed request for lysnikolaou March 26, 2023 19:34

Solve evil C standard stuff

00f8bda

pablogsal reviewed Mar 27, 2023

View reviewed changes

pablogsal merged commit a0edfb1 into pablogsal:fstring-grammar-rebased-after-sprint Mar 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

various f-string fixes #52

various f-string fixes #52

isidentical commented Mar 25, 2023 •

edited

Loading

isidentical commented Mar 25, 2023

isidentical commented Mar 25, 2023

pablogsal Mar 26, 2023

lysnikolaou Mar 26, 2023

isidentical Mar 26, 2023

pablogsal Mar 27, 2023

lysnikolaou Mar 27, 2023

pablogsal Mar 27, 2023

sunmy2019 Mar 27, 2023 •

edited

Loading

pablogsal Mar 27, 2023

mgmacias95 Mar 27, 2023

pablogsal Mar 28, 2023

isidentical commented Mar 26, 2023

isidentical commented Mar 26, 2023

pablogsal commented Mar 26, 2023

pablogsal Mar 27, 2023

pablogsal commented Mar 27, 2023

various f-string fixes #52

various f-string fixes #52

Conversation

isidentical commented Mar 25, 2023 • edited Loading

isidentical commented Mar 25, 2023

isidentical commented Mar 25, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sunmy2019 Mar 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

isidentical commented Mar 26, 2023

isidentical commented Mar 26, 2023

pablogsal commented Mar 26, 2023

Choose a reason for hiding this comment

pablogsal commented Mar 27, 2023

isidentical commented Mar 25, 2023 •

edited

Loading

sunmy2019 Mar 27, 2023 •

edited

Loading