Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] function parameter types #14802

Merged
merged 5 commits into from
Dec 6, 2024
Merged

[red-knot] function parameter types #14802

merged 5 commits into from
Dec 6, 2024

Conversation

carljm
Copy link
Contributor

@carljm carljm commented Dec 6, 2024

Summary

Inferred and declared types for function parameters, in the function body scope.

Fixes #13693.

Test Plan

Added mdtests.

Co-authored By: Alex Waygood AlexWaygood@Gmail.com

@carljm carljm added the red-knot Multi-file analysis & type inference label Dec 6, 2024
Copy link
Contributor

github-actions bot commented Dec 6, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

def f(a, b: int, c=1, d: int = 2, /, e=3, f: Literal[4] = 4, *args: object, g=5, h: Literal[6] = 6, **kwargs: str) -> bytes:
reveal_type(a) # revealed: Unknown
reveal_type(b) # revealed: int
reveal_type(c) # revealed: Unknown | Literal[1]
Copy link
Member

@MichaReiser MichaReiser Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR and I know you're aware of it.... I find it very confusing as a user that this infers to Literal[1] | Unknown, and I'd have a hard time explaining it to any colleague of why and even what it means.

Copy link
Contributor

@sharkdp sharkdp Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not the answer you are looking for, but let me try to explain the current/intended behavior, to see if my own understanding is correct.

We want to follow a philosophy where a perfectly valid un-annotated program should ideally not lead us to issue any diagnostics.

Consider this perfectly reasonable and working (untyped) program:

def f(x, a=0):
    return 2 * x * x + a


f(2.0, 1.2)

pyright issues a diagnostic here (Argument of type "float" cannot be assigned to parameter "a" of type "int") because it infers a (too) narrow parameter type of int.

We infer Unknown | Literal[0] instead, which does not raise any diagnostic. We also wouldn't raise a diagnostic if you were to call f(2.0, "foo"), but you can always achieve this by annotating your function parameter appropriately (e.g. as float). You can opt into more safety.

On the other hand of the spectrum, consider this non-crashing — but obviously error-prone — program:

def f(name, age=None):
    # lots of code
    print(f"Next year, {name} will turn {age+1}")

f("Max", 20)

mypy does not raise any diagnostic here, because it infers a (too) wide parameter type of Any for age.

We infer Unknown | NoneType instead. The meaning of this type is: an unknown set of Python objects, but at least as large as the set represented by NoneType (i.e. {None}). This allows us to issue a Operator + not supported diagnostic (not yet but eventually; only works for operators like in so far). In other words: if we see age=None, we say: we don't know what kind of objects can be passed in for the age parameter, but you are clearly showing us that None is one possibility, so we better make sure that all operations on age are compatible with NoneType.

Copy link
Contributor Author

@carljm carljm Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't say it better than @sharkdp did, in terms of explaining what this type means and why we need it.

I understand the real issue here is that users who haven't gotten such a good explanation will likely be confused by these types, because other type checkers don't use them much. I'm not yet sure how to reconcile this. It may be that we can find a more intuitive display notation, or it may be that this is just something we have to address via documentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I just see myself struggling with it and am worried that people will accept that Red Knot shows these weird inferred types and maybe even understand why but without fully understanding what it means

Comment on lines 1050 to 1052
let opt_default_ty = default
.as_ref()
.map(|default| definition_expression_ty(self.db, definition, default));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an understand question. Why are we calling definition_expression_ty over infer_standalone_expression?

Do we need to merge the diagnostics from the definition_expression_ty ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we calling definition_expression_ty over infer_standalone_expression?

Because parameter default values are not marked as standalone expressions in the semantic index builder. We minimize the expressions that are marked as standalone expressions, to reduce ingredient counts. In this case we don't need them to be standalone expressions, we can use definition_expression_ty to find the right scope, infer that scope, and pull the expression type out.

Do we need to merge the diagnostics from the definition_expression_ty ?

No, in fact we shouldn't. Function parameter definitions are a little odd, in that the definition is part of the function body scope (it defines a symbol in the function body), but the expressions whose type we need here (the annotation and the default) are not part of the function body scope, they are part of the outer scope (or possibly two different outer scopes: the defaults are always in the scope the function is defined in, the annotations might be in the intervening type-parameters scope, for a generic function.) Part of the functionality of definition_expression_ty is that it can handle inferring expressions that are in some other scope. But we shouldn't try to merge diagnostics that belong to a different scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized here that definition_expression_ty is kind of over-powered for this use case, since in these cases we know for sure that the expression is not in the same scope as the definition. So I've instead introduced an expression_ty that just looks up the right scope for the expression, infers that scope, and pulls out the expression type.

Copy link
Member

@dhruvmanila dhruvmanila Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, is there a use-case for having expression_ty as a standalone function instead of adding it as a method on TypeInference? For the latter, we could add an assert to make sure that the expression is not from the same scope as the one that's currently being inferred which is available via the self.scope method:

fn scope(&self) -> ScopeId<'db> {
self.types.scope
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good idea, I'll put up a PR for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@MichaReiser
Copy link
Member

MichaReiser commented Dec 6, 2024

@AlexWaygood are you nitpicking your own code? 😆 (just kidding, these are great improvements and not nits!)

@carljm
Copy link
Contributor Author

carljm commented Dec 6, 2024

these are great improvements and not nits!

I think it's fair to say that adding backticks to a TODO comment is indeed a nit 😆 If that's not a nit, I don't know what would be. But I don't mind some nits :)

carljm and others added 5 commits December 6, 2024 10:23
Co-authored By: Alex Waygood <AlexWaygood@Gmail.com>
Co-authored-by: Micha Reiser <micha@reiser.io>
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@carljm
Copy link
Contributor Author

carljm commented Dec 6, 2024

Going to go ahead and merge this rather than wait until Monday. I did make some significant changes in response to review; happy to push a follow-up PR if there are comments on the new changes.

@carljm carljm merged commit 3017b3b into main Dec 6, 2024
21 checks passed
@carljm carljm deleted the cjm/funcparams branch December 6, 2024 20:55
carljm added a commit that referenced this pull request Dec 9, 2024
…e_expression_ty (#14879)

## Summary

Per suggestion in
#14802 (comment)

This is a bit less error-prone and allows us to handle both expressions
in the current scope or a different scope. Also, there's currently no
need for this method outside of `TypeInferenceBuilder`, so no reason to
expose it in `types.rs`.

## Test Plan

Pure refactor, no functional change; existing tests pass.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[red-knot] infer types from parameter annotations
5 participants