Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #4761: Emit used-before-assignment where single assignment only made in except blocks #5402

Merged
merged 16 commits into from
Dec 11, 2021

Conversation

jacobtylerwalls
Copy link
Member

@jacobtylerwalls jacobtylerwalls commented Nov 26, 2021

Type of Changes

Type
βœ“ πŸ› Bug fix

Description

Before
If an assignment only took place in an except block, later use of the name outside the except did not raise used-before-assignment.

Now
As long as only one assignment for the name is in the except block, the message is raised.

Closes #4761

Relates to #5384 (we may want to cherry-pick the first commit and rewrite slightly)

@jacobtylerwalls jacobtylerwalls changed the title Fix #4761: Emit used-before-assignment where single assignment only made in except blocks Fix #4761: Emit used-before-assignment where single assignment only made in except blocks Nov 26, 2021
@coveralls
Copy link

coveralls commented Nov 26, 2021

Pull Request Test Coverage Report for Build 1547071143

  • 15 of 15 (100.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.005%) to 93.701%

Totals Coverage Status
Change from base Build 1541896281: 0.005%
Covered Lines: 14177
Relevant Lines: 15130

πŸ’› - Coveralls

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, rest looks good.

However, I think this should be blocked by #5241. We really need that merged first before changing stuff in visit_name. Otherwise we'll need to keep updating the refactor and we might miss newly added lines.

@Pierre-Sassoulas
Copy link
Member

Hmm yes let me review #5241, I think I messed up with not doing it for 2.12 because this will pretty much change the whole implementation for this once we merge.

@Pierre-Sassoulas
Copy link
Member

@jacobtylerwalls as someone who used the previous code, your input would be valuable for #5241. If you have the time to review it too I'd appreciate that.

@jacobtylerwalls
Copy link
Member Author

Small comment, rest looks good.

No problem. I'm having trouble finding the comment, though. Can you post it?

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in https://github.com/PyCQA/pylint/pull/5384/files#r757821102 let's do this one before #5384.

Now that the refactor is finally done we can move ahead with this one πŸ˜„

@@ -1221,7 +1251,14 @@ def visit_name(self, node: nodes.Name) -> None:
"undefined-variable", node=node, args=node.name
)

current_consumer.mark_as_consumed(node.name, found_nodes)
# Some nodes were filtered out because they execute only on success
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't particularly like how this works, as any new contributor needs to read about this to understand this. It is not as self-explanatory as to_consume and consumed. However, I don't know of a better way to do this. Perhaps we could improve on the name? I'm not sure.. I'll think about this some more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about consumed_conditionally?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either works for me. @Pierre-Sassoulas do you have an opinion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'maybe_consumed' ? I don't have en opinion either :)

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ‘

I think these are my last comments. I'll do a final review after your revision.

I wonder if any other contributors can come up with a more self-explanatory name for consumed_uncertain..

@Pierre-Sassoulas Pierre-Sassoulas added Bug πŸͺ² False Negative πŸ¦‹ No message is emitted but something is wrong with the code Control flow Requires control flow understanding labels Dec 3, 2021
@Pierre-Sassoulas Pierre-Sassoulas added this to the 2.13.0 milestone Dec 3, 2021
jacobtylerwalls and others added 2 commits December 3, 2021 09:50
Co-authored-by: DaniΓ«l van Noord <13665637+DanielNoord@users.noreply.github.com>
@Pierre-Sassoulas
Copy link
Member

This need to be reviewed and merged before #5384

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jacobtylerwalls

Just one final comment! The docstrings and comments really do look good!

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jacobtylerwalls !

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice MR, well documented, well tested, and it fix the try/except logic to something most user would expect imo.

@cdce8p
Copy link
Member

cdce8p commented Dec 11, 2021

Came across a regression case during testing -> #5500

@jacobtylerwalls jacobtylerwalls deleted the variable-in-except branch December 11, 2021 13:40
@martimlobao
Copy link

martimlobao commented Jan 14, 2022

@jacobtylerwalls @DanielNoord Not sure if I should open a ticket, but was testing the latest version of main and came across a new false positive which I've tracked down to this PR. Here's a MWE:

for _ in range(3):
    try:
        1 / 0
    except ZeroDivisionError as e:
        error = e  # keep error for final raise
    else:
        break
    print("retrying...")
else:
    raise error

This function works fine, but pylint now complains about a false positive Using variable 'error' before assignment (used-before-assignment) in the last line.

@jacobtylerwalls
Copy link
Member Author

Hi @martimlobao, thanks for reporting and for providing a concrete example. I've constructed a couple more examples that I think we'd have to reason about to decide on a way forward.

1.

for _ in range(3):
    try:
        [][0]
    except ZeroDivisionError as e:
        error = e  # keep error for final raise
    except:
        pass
    else:
        break
    print("retrying...")
else:
    raise error

When executed, gives:

Traceback (most recent call last):
  File "a.py", line 12, in <module>
    raise error
NameError: name 'error' is not defined

2.

for i in range(3):
    if 'some runtime condition':
        continue
    try:
        1 / 0
    except ZeroDivisionError as e:
        error = e  # keep error for final raise
    else:
        break
    print("retrying...")
else:
    raise error

When executed, gives:

  File "b.py", line 12, in <module>
    raise error
NameError: name 'error' is not defined

Of course, 1 and 2 are very similar: all that's required is some other means of finishing the inner loop than hitting the except handler you contemplated. These are runtime control flow things that I don't think Pylint is in a position to speculate about. I think it's a virtue that as things stand, you either have to define error before your loop and then check is not None before raising it, or just disable the check if you feel very confident about the design of the code. That's a judgment call: verbosity vs. how likely it is someone someone later turns the code into more like 1 or 2 and causes problems.

How does that strike you?

@martimlobao
Copy link

Of course, 1 and 2 are very similar: all that's required is some other means of finishing the inner loop than hitting the except handler you contemplated. These are runtime control flow things that I don't think Pylint is in a position to speculate about. I think it's a virtue that as things stand, you either have to define error before your loop and then check is not None before raising it, or just disable the check if you feel very confident about the design of the code. That's a judgment call: verbosity vs. how likely it is someone someone later turns the code into more like 1 or 2 and causes problems.

How does that strike you?

Thanks for the detailed response @jacobtylerwalls! The issue you're trying to fix here isn't an easy one to solve, even conceptually, so I can definitely appreciate the difficulty in balancing false positives and false negatives.

I will say though that from a user's perspective, a false positive error is probably more annoying than a false negative, especially in this case where Pylint classifies this as an actual error and not merely a convention or recommendation. Making a change because of an actual bug is fine, but making it to appease a linter that thinks there's a bug that doesn't exist may feel slightly frustrating.

Your examples illustrate well how hard it is to consider every different edge case. I'm not sure how feasible this is as a solution, but would it be possible to check every "exit branch" for a potentially undefined variable? If at least one exit path does not define the variable, then Pylint should raise the error since there's the possibility for a valid used-before-assignment error.

Let me try to illustrate what I mean with a few examples:

def undefined_variable():
    """used-before-assignment (E0601)"""
    try:
        pass  # < exit branch where some_message is *not* defined
    except ValueError:
        some_message = 'some message'  # < exit branch where some_message is defined

    if not some_message:
        return 1

    return some_message
for _ in range(3):
    try:
        do_something()  # < exit branch where error is *not* defined, should flag
    except ZeroDivisionError as e:
        error = e  # < exit branch where error is defined
    print("retrying...")
else:
    raise error
for _ in range(3):
    try:
        do_something()  # not an exit branch because of `else` below (will only exit here with an exception)
    except ZeroDivisionError as e:
        error = e  # < exit branch where error is defined
    else:
        break  # < exit condition where error is *not* defined, but will skip `raise error` below
    print("retrying...")
else:
    raise error
for _ in range(3):
    try:
        do_something()  # not an exit branch because of `else` below (will only exit here with an exception)
    except ZeroDivisionError as e:
        error = e  # < exit branch where error is defined
    except:
        pass  # < exit branch where error is *not* defined, should flag
    else:
        break  # < exit condition where error is *not* defined, but will skip `raise error` below
    print("retrying...")
else:
    raise error
for i in range(3):
    if 'some runtime condition':
        continue  # < exit branch where error is *not* defined, should flag
    try:  # < other exit branch, needs to check below if error is defined
        1 / 0
    except ZeroDivisionError as e:
        error = e  # keep error for final raise
    else:
        break
    print("retrying...")
else:
    raise error

I have to admit I have no understanding of how feasible it would be to implement this, but I believe this approach should be able to distinguish between (all? most?) scenarios where an unassigned variable exists and those where it does not.

@jacobtylerwalls
Copy link
Member Author

jacobtylerwalls commented Jan 14, 2022

Thanks for the well-constructed examples. I'm sure in a moment we'll move this discussion to a new issue.

Of your last examples, this is the one false positive:

for _ in range(3):
    try:
        do_something()  # not an exit branch because of `else` below (will only exit here with an exception)
    except ZeroDivisionError as e:
        error = e  # < exit branch where error is defined
    else:
        break  # < exit condition where error is *not* defined, but will skip `raise error` below
    print("retrying...")
else:
    raise error

But I can easily turn it into a true positive with:

for _ in range(3):
    try:
        do_something()  # not an exit branch because of `else` below (will only exit here with an exception)
    except ZeroDivisionError as e:
        error = e  # < exit branch where error is defined
    else:
        if 'runtime condition':
            continue  # < exit condition where error is *not* defined, should flag
        break  # < exit condition where error is *not* defined, but will skip `raise error` below
    print("retrying...")
else:
    raise error

So maybe this is the constraint to check for:

  • if the use of error being tested is guarded by else in for/else (i.e. the last line of these examples)
  • if the definition of error in an except block is a descendant of the same for
  • if that inner else breaks
  • if there is no continue anywhere at all underneath the for loop and before the break (other than inside another for loop, but including nested conditions and try/excepts)

@martimlobao, could you open an issue to track this? I think this specific scenario might be feasible to implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug πŸͺ² Control flow Requires control flow understanding False Negative πŸ¦‹ No message is emitted but something is wrong with the code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

False negative used-before-assignment when variable is only defined in except clause
6 participants