Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RET504 false-positive when variable built up with assignments and returned #2950

Closed
gdub opened this issue Feb 16, 2023 · 23 comments · Fixed by #3393
Closed

RET504 false-positive when variable built up with assignments and returned #2950

gdub opened this issue Feb 16, 2023 · 23 comments · Fixed by #3393
Labels
bug Something isn't working

Comments

@gdub
Copy link

gdub commented Feb 16, 2023

Example:

def get_options(option1, option2):
    options = []
    if option1:
        options = options + ["option1"]
    if option2:
        options = options + ["option2"]
    return options

Output:

$ ruff check --isolated --select RET ret_test.py 
ret_test2.py:16:12: RET504 Unnecessary variable assignment before `return` statement
Found 1 error.

In this case, there is no unnecessary variable assignment that could be removed and cannot do an earlier return as need to get through all the conditionals.

That said, the error does not trigger if one were to instead use an in-place operator such as +=:

def get_options(option1, option2):
    options = []
    if option1:
        options += ["option1"]
    if option2:
        options += ["option2"]
    return options

Version:

$ ruff --version
ruff 0.0.247
@charliermarsh charliermarsh added the bug Something isn't working label Feb 16, 2023
@charliermarsh
Copy link
Member

Possibly not respecting overrides, does look like a bug.

@henryiii
Copy link
Contributor

Just noticed this too!

def f(y):
    x = 1
    if y:
        x = 2

    return x
$ ruff tmp.py --isolated --select RET
tmp.py:6:12: RET504 Unnecessary variable assignment before `return` statement
Found 1 error.

It seems the RET checks don't handle branches very well (the "else after" ones also are broken when you have a chain).

@charliermarsh
Copy link
Member

It may wants you to do this, even though it's awkward:

def f(y):
    if y:
        return 2

    return 1

the "else after" ones also are broken when you have a chain

Is this on main? I removed enforcement for elif chains in the most recent release.

@charliermarsh
Copy link
Member

(I find these rules really tedious lol. Maybe they should just be removed.)

@henryiii
Copy link
Contributor

They will come back again when adding pylint rules. :) I like these in pylint a lot, which I is why I'm happy to see them showing up in a faster pre-commit friendly form, but the pylint version is solid and doesn't trigger falsely, while RET seems to not properly handle branching (I expect it's a problem with the flake8 plugin).

It may wants you to do this, even though it's awkward

It's not practical in the actual code that caused me to look into this. The fix should be to just only enforce x = value; return value and not if the assignment is in an if statement (with statement is 90% of the time fine, just not if statements)

@henryiii
Copy link
Contributor

I haven't rerun with the most recent release yet for the RET 507 and similar rules, quite possibly fixed!

@charliermarsh
Copy link
Member

Ah yeah, I mostly meant that I have issues with the implementation. It's a direct port of the flake8 plugin, and it just has a bunch of limitations (in addition to being hard to reason about).

@henryiii
Copy link
Contributor

henryiii commented Feb 16, 2023

Ahh, I see what you mean.

def f(y):
    x = 1
    if y:
        return 2

    return x

Would pass. I could live with that in some/most cases.

@charliermarsh
Copy link
Member

I think it will still complain about x=1 :(

@henryiii
Copy link
Contributor

In a realistic example (where other things are happening to the x before the final if), it seems pretty happy. I didn't end up with any false positives that I couldn't handle knowing that it was fine just changing the assignment in the if to a return. There were several when converting Plumbum and scikit-hep/particle@10caa90 was the one that prompted me to search for this issue.

@martinlehoux
Copy link
Contributor

martinlehoux commented Mar 4, 2023

Just to add one more (almost) real life example: I have this pattern that seems reasonable in Django to build a queryset based on different inputs

def get_queryset(option_1, option_2):
    queryset: Any = None

    queryset = queryset.filter(a=1)

    if option_1:
        queryset = queryset.annotate(b=Value(2))

    if option_2:
        queryset = queryset.filter(c=3)

    return queryset

main.py:15:12: RET504 Unnecessary variable assignment before return statement

@henryiii
Copy link
Contributor

henryiii commented Mar 4, 2023

This would be how you’d make it happy:

def get_queryset(option_1, option_2):
    queryset: Any = None

    queryset = queryset.filter(a=1)

    if option_1:
        queryset = queryset.filter(b=2)

    if option_2:
        return queryset.filter(c=3)

    return queryset

Not quite as symmetric, but not really “worse” from a readers standpoint either. I’d probably disable the check if this was a lot of very symmetric conditions (like you have described), though in practice most of them are not that symmetric and are fine written with the extra return. Overriding a variable is a bit ugly, so avoiding an unnecessary one isn’t that bad.

@gdub
Copy link
Author

gdub commented Mar 6, 2023

Personally, I dislike the resulting work-around because:

  • The lack of symmetry feels unnatural.
  • If you need to add an additional condition block, you would have churn turning the return statement back to an assignment.
  • It does not save any lines, as is the case for simple assignment then return.

One idea would be to only flag the single-assignment and return scenarios, e.g.:

def get_queryset():
    queryset = Model.filter(a=1)
    return queryset

...and not flag scenarios where there are multiple assignments to the same variable before returning, e.g.:

def get_queryset(option_1, option_2):
    queryset = Model.filter(a=1)
    if option_1:
        queryset = queryset.filter(b=2)
    if option_2:
        queryset = queryset.filter(c=3)
    return queryset

Though, I guess that would also mean that something like this would pass not get flagged:

def get_queryset():
    queryset = Model.filter(a=1)
    queryset = queryset.filter(c=3)
    return queryset

Which raises the question of what is it that this rule is really meant to be targeting? Cases where would save a line of code due to an unnecessary assignment? Should check be simplified to only look for the simple assign/return case that happens at the same nesting level, and ignore if there is more complex branching/looping involved?

@Czaki
Copy link
Contributor

Czaki commented Mar 6, 2023

Though, I guess that would also mean that something like this would pass not get flagged:

If the rule should only be ignored if the assignment is under if clause.

@bellini666
Copy link

I agree with @gdub . In cases similar to the one mentioned by him, I usually do # noqa: RET504, but it would be very good if that kind of usage got exempted from the rule.

Don't know if it is simple for ruff to detect it though =P

@charliermarsh
Copy link
Member

The multiple assignments heuristic seems like it could work... pretty well? In my personal opinion I actually prefer this to inlining return queryset.filter(c=3):

def get_queryset():
    queryset = Model.filter(a=1)
    queryset = queryset.filter(c=3)
    return queryset

It also has the nice property that if you add another operation to the fluent chain, the diff much clearer. But, it's all a bit subjective of course.

I think the intent of the rule is to capture "trivial" errors like:

x = 1
return x

Or:

x = 1
print("Got 1")
return x

Part of me is like... is this even a rule worth having? I wish I had data on how many people are using it and how often it's # noqa'd or ignored, etc.

@NeilGirdhar
Copy link

NeilGirdhar commented Mar 6, 2023

I wish I had data on how many people are using it and how often it's # noqa'd or ignored, etc.

Can you run Ruff on a corpus of projects and gather rule hit counts and a random sampling of examples? MyPy does something like that with its MyPy Primer, which they use to evaluate pull requests. It might be nice to see how various changes make rules more or less sensitive, and examples of lines that would be newly flagged, and also examples of lines would no longer be flagged.

@charliermarsh
Copy link
Member

I'd like to do something like this: index projects using Ruff, and get some data on their configurations (and on how often they noqa various rules).

@gdub
Copy link
Author

gdub commented Mar 6, 2023

Interesting looking through some results from:
https://github.com/search?q=noqa%3A+RET504&type=code
https://github.com/search?q=RET504&type=Code

Shows up in some ignores with comments, e.g.:

# RET504: Unnecessary variable before return (ugly)
    # suppress what appears to be false-positives with these checks
    'RET504',
    'RET505',
    'RET506',
    'RET507',
    'RET508',

See many examples with similar chaining-style assignment with conditionals that we've touched on already.

Here's an interesting case where variable is assigned to, referenced, and then returned:
https://github.com/KyleKing/pytest_cache_assert/blob/47d0b538528a9c9cdbc182d2b276462a2aa0fd4d/tests/test_flask.py#L20

def _create_app() -> Flask:
    app = Flask(__name__)

    @app.route('/hello')
    def hello() -> str:
        """Hello endpoint."""
        return 'Hello, World!'

    return app  # noqa: RET504

Seems like something that shouldn't be flagged.

Perhaps instead of an assignment count, heuristic could be reference count. In other words, excluding return statement, flag if the only reference was assignment.

@charliermarsh
Copy link
Member

This PR productionizes that suggestion (avoid RET504 for variables with multiple assignments): #3393.

Feedback welcome, I won't be merging tonight.

@charliermarsh
Copy link
Member

(Fixed the decorator thing in #3395, separate bug!)

@charliermarsh
Copy link
Member

Ok, I looked through the GitHub Code Search, and I do see a lot of examples of this kind of "fluent interface"-like assignment style, e.g.:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
command = self.validate_command(command)
command = self.do_magic(command=command)
return command  # noqa: RET504

This strikes me as very reasonable code, so I'm gonna go ahead and merge that PR.

@henryiii
Copy link
Contributor

henryiii commented Mar 9, 2023

Personally, like moving the return up one, reducing the number of variable overwrites there (and that sort of code isn't very good for static typing unless every command returns the same thing!)

I don't see any problem with:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
command = self.validate_command(command)
return self.do_magic(command=command)

In fact, there's only one overwrite now, so I'd actually do:

command = self.extract_command(text)
self.validate_prefix(command=command)
await self.validate_mention(bot=bot, command=command)
validated_command = self.validate_command(command)
return self.do_magic(command=validated_command)

IMO, this now reads much better. The point of style checks is to force better style, not to try not to bother anyone or conform to what everyone is doing if it's wrong. I think the biggest problem with this check was that it wasn't clear how to "fix" it from the message (and also some false positives, like the function false positive above). And it's totally fine to disable a style check you don't agree with! Whenever I talk about style checking, I emphasize this (especially WRT PyLint).

Also, one of the points of this sort of check is to help two people writing the same logic get the same code. Returning without assigning is perfectly valid, so it's reasonable to force this rather than force an assignment always before return.

For example, I think this would have read much better without an extra assignment: https://github.com/scikit-build/scikit-build-core/pull/197/files#diff-5a9598893dbb4007601522cfb26b27c92d790c18bf35d7bfe86205ae1955fa0bR47-R48

I'm not sure why Ruff didn't find that, as I'd guess #3393 wasn't in the version of Ruff that ran on that, but I wish it did.

charliermarsh added a commit that referenced this issue Jun 10, 2023
## Summary

The `RET504` rule, which looks for unnecessary assignments before return
statements, is a frequent source of issues (#4173, #4236, #4242, #1606,
#2950). Over time, we've tried to refine the logic to handle more cases.
For example, we now avoid analyzing any functions that contain any
function calls or attribute assignments, since those operations can
contain side effects (and so we mark them as a "read" on all variables
in the function -- we could do a better job with code graph analysis to
handle this limitation, but that'd be a more involved change.) We also
avoid flagging any variables that are the target of multiple
assignments. Ultimately, though, I'm not happy with the implementation
-- we just can't do sufficiently reliable analysis of arbitrary code
flow given the limited logic herein, and the existing logic is very hard
to reason about and maintain.

This PR refocuses the rule to only catch cases of the form:

```py
def f():
    x = 1
    return x
```

That is, we now only flag returns that are immediately preceded by an
assignment to the returned variable. While this is more limiting, in
some ways, it lets us flag more cases vis-a-vis the previous
implementation, since we no longer "fully eject" when functions contain
function calls and other effect-ful operations.

Closes #4173.

Closes #4236.

Closes #4242.
charliermarsh added a commit that referenced this issue Jun 10, 2023
## Summary

This PR extends the new `RET504` implementation to handle cases like:

```py
def foo():
    with open("foo.txt", "r") as f:
        x = f.read()
    return x
```

This was originally suggested in
#2950 (comment).
konstin pushed a commit that referenced this issue Jun 13, 2023
## Summary

The `RET504` rule, which looks for unnecessary assignments before return
statements, is a frequent source of issues (#4173, #4236, #4242, #1606,
#2950). Over time, we've tried to refine the logic to handle more cases.
For example, we now avoid analyzing any functions that contain any
function calls or attribute assignments, since those operations can
contain side effects (and so we mark them as a "read" on all variables
in the function -- we could do a better job with code graph analysis to
handle this limitation, but that'd be a more involved change.) We also
avoid flagging any variables that are the target of multiple
assignments. Ultimately, though, I'm not happy with the implementation
-- we just can't do sufficiently reliable analysis of arbitrary code
flow given the limited logic herein, and the existing logic is very hard
to reason about and maintain.

This PR refocuses the rule to only catch cases of the form:

```py
def f():
    x = 1
    return x
```

That is, we now only flag returns that are immediately preceded by an
assignment to the returned variable. While this is more limiting, in
some ways, it lets us flag more cases vis-a-vis the previous
implementation, since we no longer "fully eject" when functions contain
function calls and other effect-ful operations.

Closes #4173.

Closes #4236.

Closes #4242.
konstin pushed a commit that referenced this issue Jun 13, 2023
## Summary

This PR extends the new `RET504` implementation to handle cases like:

```py
def foo():
    with open("foo.txt", "r") as f:
        x = f.read()
    return x
```

This was originally suggested in
#2950 (comment).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants