Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logical_line for call with f-string can result in invalid ast under Python 3.12 #1948

Closed
stephenfin opened this issue Jul 31, 2024 · 6 comments · Fixed by #1949
Closed

logical_line for call with f-string can result in invalid ast under Python 3.12 #1948

stephenfin opened this issue Jul 31, 2024 · 6 comments · Fixed by #1949
Milestone

Comments

@stephenfin
Copy link
Contributor

how did you install flake8?

$ pip install flake8

unmodified output of flake8 --bug-report

{}

describe the problem

what I expected to happen

Some of the plugins in hacking use a combo of ast.parse and logical_line to generate an AST for an individual line that is then fed into an ast.NodeVisitor subclass. These work just fine on Python 3.11 and earlier, but I've noticed they fail under specific circumstances on Python 3.12. I've included a minimal reproducer below. There's a chance I am "holding it wrong" but I'd like to confirm this first.

sample code

test.py

import unittest


class TestFoo(unittest.TestCase):

    def test_foo(self):
        response_key = 'foo'
        response_val = 'bar'
        body_output = ''
        self.assertEqual(
            f'{{"{response_key}": "{response_val}"}}', body_output
        )

checks.py:

class NoneArgChecker(ast.NodeVisitor):
    def __init__(self, func_name, num_args=2):
        self.func_name = func_name
        self.num_args = num_args
        self.none_found = False

    def visit_Call(self, node):
        # snipped
        ...


def foo(logical_line, noqa):
    if noqa:
        return

    for func_name in (
        'assertEqual', 'assertIs', 'assertNotEqual', 'assertIsNot'
    ):
        try:
            start = logical_line.index('.%s(' % func_name) + 1
        except ValueError:
            continue
        checker = NoneArgChecker(func_name)
        # print(logical_line)
        checker.visit(ast.parse(logical_line))
        continue
        if checker.none_found:
            yield start, "H203: Use assertIs(Not)None to check for None"

tox.ini:

[flake8:local-plugins]
extension =
    X123 = checks:foo
paths = .

commands ran

$ flake8 test.py
wow.py:1:23: E999 SyntaxError: invalid syntax. Perhaps you forgot a comma?

If I comment out the print statement, I see different results for Python 3.12 compared Pythons 3.10 and 3.11. Under 3.12:

self.assertEqual(f'x{x{response_key}xxxx{response_val}xx}', body_output)

Under 3.10 and 3.11:

self.assertEqual(f'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx', body_output)

additional information

The E999 error is associated with line 1 of the file, despite the error actually coming from a plugin. This led me on a wild goose chase as I tried to figure out why flake8 thought my file was invalid Python but Python itself did not. I suspect this might also be user error and I should be handling the potential exception from ast.parse in my plugin but again, I just wanted to confirm that this was expected practice.

@stephenfin stephenfin changed the title logical_line with f-string can result in invalid ast logical_line for call with f-string can result in invalid ast under Python 3.12 Jul 31, 2024
@stephenfin
Copy link
Contributor Author

stephenfin commented Jul 31, 2024

I think this may be a bug introduced in 43266a2. Simplifying the above reproducer:

test.py

key = 'foo'
val = 'bar'
print(f'{{"{key}": "{val}"}}'

checks.py

def foo(logical_line):
    print(logical_line)

If we run this we get:

flake8 test.py
key = 'xxx'
val = 'xxx'
print(f'x{x{key}xxxx{val}xx}')

I think we want to get:

flake8 test.py
key = 'xxx'
val = 'xxx'
print(f'xxxxxxxxxxxxxxxxxxxx')

Right?

openstack-mirroring pushed a commit to openstack/hacking that referenced this issue Jul 31, 2024
We parse 'logical_line's in a couple of extensions. There is currently a
potential bug in flake8 [1] that means these lines are not valid Python.
While we wait on a fix, simply skip these lines.

[1] PyCQA/flake8#1948

Change-Id: Ia0f2d729ee48f85afaa58ddb6e983e12d4f298a2
Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
openstack-mirroring pushed a commit to openstack/openstack that referenced this issue Jul 31, 2024
* Update hacking from branch 'master'
  to 634cb78484a9c58f067c2df8cba7f49458c82043
  - Ignore SyntaxError exceptions
    
    We parse 'logical_line's in a couple of extensions. There is currently a
    potential bug in flake8 [1] that means these lines are not valid Python.
    While we wait on a fix, simply skip these lines.
    
    [1] PyCQA/flake8#1948
    
    Change-Id: Ia0f2d729ee48f85afaa58ddb6e983e12d4f298a2
    Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
@asottile
Copy link
Member

nope! the method is meant to redact string contents and not code

@stephenfin
Copy link
Contributor Author

Then we should have:

print(f'xxx{key}xxxx{val}xxx')

?

@asottile
Copy link
Member

ah yes I see the quirk. ugh. the FSTRING_MIDDLE token is misleading for curly braces -- should be an easy fix (pycodestyle will likely need the same fix)

want to try a patch?

stephenfin added a commit to stephenfin/flake8 that referenced this issue Aug 1, 2024
To use a curly brace in an f-string, you must escape it. For example:

  >>> k = 1
  >>> f'{{{k}'
  '{1'

Saving this as a script and running the 'tokenize' module higlights
something odd around our counting:

  ❯ python -m tokenize wow.py
  0,0-0,0:            ENCODING       'utf-8'
  1,0-1,1:            NAME           'k'
  1,2-1,3:            OP             '='
  1,4-1,5:            NUMBER         '1'
  1,5-1,6:            NEWLINE        '\n'
  2,0-2,2:            FSTRING_START  "f'"
  2,2-2,3:            FSTRING_MIDDLE '{'     # <-- here...
  2,4-2,5:            OP             '{'     # <-- and here
  2,5-2,6:            NAME           'k'
  2,6-2,7:            OP             '}'
  2,7-2,8:            FSTRING_END    "'"
  2,8-2,9:            NEWLINE        '\n'
  3,0-3,0:            ENDMARKER      ''

The FSTRING_MIDDLE character we have is the escaped/post-parse single
curly brace rather than the raw double curly brace, however, while our
end index of this token accounts for the parsed form, the start index of
the next token does not (put another way, it jumps from 3 -> 4). This
triggers some existing, unrelated code that we need to bypass. Do just
that.

Signed-off-by: Stephen Finucane <stephen@that.guru>
stephenfin added a commit to stephenfin/flake8 that referenced this issue Aug 1, 2024
To use a curly brace in an f-string, you must escape it. For example:

  >>> k = 1
  >>> f'{{{k}'
  '{1'

Saving this as a script and running the 'tokenize' module higlights
something odd around our counting:

  ❯ python -m tokenize wow.py
  0,0-0,0:            ENCODING       'utf-8'
  1,0-1,1:            NAME           'k'
  1,2-1,3:            OP             '='
  1,4-1,5:            NUMBER         '1'
  1,5-1,6:            NEWLINE        '\n'
  2,0-2,2:            FSTRING_START  "f'"
  2,2-2,3:            FSTRING_MIDDLE '{'     # <-- here...
  2,4-2,5:            OP             '{'     # <-- and here
  2,5-2,6:            NAME           'k'
  2,6-2,7:            OP             '}'
  2,7-2,8:            FSTRING_END    "'"
  2,8-2,9:            NEWLINE        '\n'
  3,0-3,0:            ENDMARKER      ''

The FSTRING_MIDDLE character we have is the escaped/post-parse single
curly brace rather than the raw double curly brace, however, while our
end index of this token accounts for the parsed form, the start index of
the next token does not (put another way, it jumps from 3 -> 4). This
triggers some existing, unrelated code that we need to bypass. Do just
that.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Closes: PyCQA#1948
stephenfin added a commit to stephenfin/flake8 that referenced this issue Aug 1, 2024
To use a curly brace in an f-string, you must escape it. For example:

  >>> k = 1
  >>> f'{{{k}'
  '{1'

Saving this as a script and running the 'tokenize' module higlights
something odd around our counting:

  ❯ python -m tokenize wow.py
  0,0-0,0:            ENCODING       'utf-8'
  1,0-1,1:            NAME           'k'
  1,2-1,3:            OP             '='
  1,4-1,5:            NUMBER         '1'
  1,5-1,6:            NEWLINE        '\n'
  2,0-2,2:            FSTRING_START  "f'"
  2,2-2,3:            FSTRING_MIDDLE '{'     # <-- here...
  2,4-2,5:            OP             '{'     # <-- and here
  2,5-2,6:            NAME           'k'
  2,6-2,7:            OP             '}'
  2,7-2,8:            FSTRING_END    "'"
  2,8-2,9:            NEWLINE        '\n'
  3,0-3,0:            ENDMARKER      ''

The FSTRING_MIDDLE character we have is the escaped/post-parse single
curly brace rather than the raw double curly brace, however, while our
end index of this token accounts for the parsed form, the start index of
the next token does not (put another way, it jumps from 3 -> 4). This
triggers some existing, unrelated code that we need to bypass. Do just
that.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Closes: PyCQA#1948
@stephenfin
Copy link
Contributor Author

want to try a patch?

Done. Apologies in advance for the fuzzy wording: I have no idea what the below code is intended to account for:

if previous_row:
(start_row, start_column) = start
if previous_row != start_row:
row_index = previous_row - 1
column_index = previous_column - 1
previous_text = self.lines[row_index][column_index]
if previous_text == "," or (
previous_text not in "{[(" and text not in "}])"
):
text = f" {text}"
elif previous_column != start_column:
text = line[previous_column:start_column] + text

(and that's after git blameing my way back as far as 23c9091...)

stephenfin added a commit to stephenfin/flake8 that referenced this issue Aug 1, 2024
To use a curly brace in an f-string, you must escape it. For example:

  >>> k = 1
  >>> f'{{{k}'
  '{1'

Saving this as a script and running the 'tokenize' module highlights
something odd around the counting of tokens:

  ❯ python -m tokenize wow.py
  0,0-0,0:            ENCODING       'utf-8'
  1,0-1,1:            NAME           'k'
  1,2-1,3:            OP             '='
  1,4-1,5:            NUMBER         '1'
  1,5-1,6:            NEWLINE        '\n'
  2,0-2,2:            FSTRING_START  "f'"
  2,2-2,3:            FSTRING_MIDDLE '{'     # <-- here...
  2,4-2,5:            OP             '{'     # <-- and here
  2,5-2,6:            NAME           'k'
  2,6-2,7:            OP             '}'
  2,7-2,8:            FSTRING_END    "'"
  2,8-2,9:            NEWLINE        '\n'
  3,0-3,0:            ENDMARKER      ''

The FSTRING_MIDDLE character we have is the escaped/post-parse single
curly brace rather than the raw double curly brace, however, while our
end index of this token accounts for the parsed form, the start index of
the next token does not (put another way, it jumps from 3 -> 4). This
triggers some existing, unrelated code that we need to bypass. Do just
that.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Closes: PyCQA#1948
@sigmavirus24
Copy link
Member

want to try a patch?

Done. Apologies in advance for the fuzzy wording: I have no idea what the below code is intended to account for:

if previous_row:
(start_row, start_column) = start
if previous_row != start_row:
row_index = previous_row - 1
column_index = previous_column - 1
previous_text = self.lines[row_index][column_index]
if previous_text == "," or (
previous_text not in "{[(" and text not in "}])"
):
text = f" {text}"
elif previous_column != start_column:
text = line[previous_column:start_column] + text

(and that's after git blameing my way back as far as 23c9091...)

Not looking at where you blamed back to but a bunch of this was implemented to keep compat with pycodestyle. If you want reasoning it's likely in that project

stephenfin added a commit to stephenfin/flake8 that referenced this issue Aug 2, 2024
To use a curly brace in an f-string, you must escape it. For example:

  >>> k = 1
  >>> f'{{{k}'
  '{1'

Saving this as a script and running the 'tokenize' module highlights
something odd around the counting of tokens:

  ❯ python -m tokenize wow.py
  0,0-0,0:            ENCODING       'utf-8'
  1,0-1,1:            NAME           'k'
  1,2-1,3:            OP             '='
  1,4-1,5:            NUMBER         '1'
  1,5-1,6:            NEWLINE        '\n'
  2,0-2,2:            FSTRING_START  "f'"
  2,2-2,3:            FSTRING_MIDDLE '{'     # <-- here...
  2,4-2,5:            OP             '{'     # <-- and here
  2,5-2,6:            NAME           'k'
  2,6-2,7:            OP             '}'
  2,7-2,8:            FSTRING_END    "'"
  2,8-2,9:            NEWLINE        '\n'
  3,0-3,0:            ENDMARKER      ''

The FSTRING_MIDDLE character we have is the escaped/post-parse single
curly brace rather than the raw double curly brace, however, while our
end index of this token accounts for the parsed form, the start index of
the next token does not (put another way, it jumps from 3 -> 4). This
triggers some existing, unrelated code that we need to bypass. Do just
that.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Closes: PyCQA#1948
asottile pushed a commit to stephenfin/flake8 that referenced this issue Aug 4, 2024
To use a curly brace in an f-string, you must escape it. For example:

  >>> k = 1
  >>> f'{{{k}'
  '{1'

Saving this as a script and running the 'tokenize' module highlights
something odd around the counting of tokens:

  ❯ python -m tokenize wow.py
  0,0-0,0:            ENCODING       'utf-8'
  1,0-1,1:            NAME           'k'
  1,2-1,3:            OP             '='
  1,4-1,5:            NUMBER         '1'
  1,5-1,6:            NEWLINE        '\n'
  2,0-2,2:            FSTRING_START  "f'"
  2,2-2,3:            FSTRING_MIDDLE '{'     # <-- here...
  2,4-2,5:            OP             '{'     # <-- and here
  2,5-2,6:            NAME           'k'
  2,6-2,7:            OP             '}'
  2,7-2,8:            FSTRING_END    "'"
  2,8-2,9:            NEWLINE        '\n'
  3,0-3,0:            ENDMARKER      ''

The FSTRING_MIDDLE character we have is the escaped/post-parse single
curly brace rather than the raw double curly brace, however, while our
end index of this token accounts for the parsed form, the start index of
the next token does not (put another way, it jumps from 3 -> 4). This
triggers some existing, unrelated code that we need to bypass. Do just
that.

Signed-off-by: Stephen Finucane <stephen@that.guru>
Closes: PyCQA#1948
@asottile asottile added this to the 7.1.1 milestone Aug 4, 2024
stephenfin added a commit to stephenfin/keystoneauth that referenced this issue Sep 10, 2024
We disable the E203 (whitespace before ':') and E501 (line too long)
linter rules since these conflict with ruff-format. We also rework a
statement in 'keystoneauth1/tests/unit/test_session.py' since it's
triggering a bug in flake8 [1] that is currently (at time of authoring)
unresolved.

[1] PyCQA/flake8#1948

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Change-Id: Ief5c1c57d1e72db9fc881063d4c7e1030e76da43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants