Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Track root cause of why a type is inferred as Unknown #12986

Closed
wants to merge 3 commits into from

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented Aug 19, 2024

Summary

This PR adds detailed information to our type inference so we can track exactly why a symbol has been inferred as Unknown. This allows us to restore the unresolved-import red-knot lint to the level of accuracy it had prior to a9847af, because we can now distinguish between Unknown types that were caused by unresolved imports and other kinds of Unknown types. The approach is similar to the one mypy uses.

Test Plan

There are two ways in which this PR is tested:

  1. The assertion in the benchmark is changed: six spurious "Unresolved import" diagnostics go away, meaning the number of diagnostics drops from 34 to 28. The only remaining "unresolved import" diagnostic is emitted on this line, which is because we don't understand * imports yet, and Iterable is defined in the typeshed stub for collections.abc here.
  2. A test is unskipped in red_knot_workspace/src/lint.rs asserting that a project structure (where the foo module does not exist) like this only triggers an "unresolved import" diagnostic in a.py, and not in b.py:
    • /src/a.py: import foo as foo
    • /src/b.py: from a import foo

@AlexWaygood AlexWaygood added the red-knot Multi-file analysis & type inference label Aug 19, 2024
}
}
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I derived Ord and PartialOrd here because the Type enum derives them, but I'm not sure it makes sense to do so. I don't really understand why the Type enum derives them in the first place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, me neither. I would remove Ord from Type

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the PartialOrd and Ord implementations in ff6b148

Copy link
Contributor

github-actions bot commented Aug 19, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this new type work when unifying or intersecting types? Do we keep all different variants?

@AlexWaygood
Copy link
Member Author

How does this new type work when unifying or intersecting types? Do we keep all different variants?

Good question. Currently yes, you could end up with a union such as int | Unknown(UnknownTypeKind::UnresolvedImport) | Unknown(UnknownTypeKind::TypeError).

I suppose we should flatten them into a single Unknown member. But then the question becomes: what kind of Unknown should it be? Maybe UnknownTypeKind::SecondOrder, since it results from the union of two Unknown elements?

@MichaReiser
Copy link
Member

I would love to wait with this PR to get @carljm opinion. I feel uncertain about the type's definition because there are no "obvious" definitions for unification and intersection (unless we make it a bit set, at least for unification).

It's further unclear if we still need to distinguish between the two types, now that the check for unresolved imports moves into the type checker.

I had a quick look at pyright and from what i understand is that it uses Unknown for unresolved imports but it only has a single Unknown type.

https://github.com/microsoft/pyright/blob/52a47010b9db2ad6801b4b35d987cb9f4c923c18/packages/pyright-internal/src/analyzer/typeEvaluator.ts#L18629-L18676

@AlexWaygood
Copy link
Member Author

I would love to wait with this PR to get @carljm opinion. I feel uncertain about the type's definition because there are no "obvious" definitions for unification and intersection

Agreed, I'm definitely not going to merge until we've heard Carl's thoughts here! There's no rush on this.

@AlexWaygood
Copy link
Member Author

I feel uncertain about the type's definition because there are no "obvious" definitions for unification and intersection (unless we make it a bit set, at least for unification).

FWIW, I think simplifying unions is inevitably going to become more complex than what we currently have, whether we go with this PR or not, because of the fact that we will want to simplify Literal[True] | Literal[False] to Instance(builtins.bool), and similarly for other sealed types such as enums.

It's further unclear if we still need to distinguish between the two types, now that the check for unresolved imports moves into the type checker.

Hmm, yeah, not sure. I'll try to pull out some of the improvements here into a separate PR that doesn't introduce the new UnknownTypeKind enum, so that we can evaluate in isolation exactly whether it improves anything for us right now.

Comment on lines -459 to -461
#[ignore = "\
A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
despite the symbol existing in the symbol table for `a.py`"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following ecd9e6a, the key benefit of this PR is now that this test can be unskipped. This improvement is also reflected in the fact that six spurious "unresolved import" diagnostics go away in the benchmarks.

Copy link

codspeed-hq bot commented Aug 21, 2024

CodSpeed Performance Report

Merging #12986 will degrade performances by 5.41%

Comparing alex/unknown-kind (698de48) with main (ecd9e6a)

Summary

❌ 1 regressions
✅ 31 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main alex/unknown-kind Change
linter/default-rules[numpy/globals.py] 170.5 µs 180.3 µs -5.41%

@carljm
Copy link
Contributor

carljm commented Aug 22, 2024

I'm open to the idea that we may need to do this eventually, but there are complexity and efficiency disadvantages to smuggling additional context through the type system this way, so I'd rather not do it until it's really clear exactly why we need it. And I don't think that's currently clear, given that the issue with the unresolved-imports check in main branch that this PR fixes is also easily fixable with a much smaller change: #13007

I don't think we should ever need this for the sake of deciding whether or not to emit a diagnostic. Unknown may result from a type error, but I don't think its existence should ever create a type error. The fact that main branch currently treats an import of any Unknown typed value as an error is IMO not correct, and is fixed by the change linked above.

I would be interested in knowing more about precisely how mypy uses its extra information about the origin of Any.

@AlexWaygood
Copy link
Member Author

I don't think we should ever need this for the sake of deciding whether or not to emit a diagnostic. Unknown may result from a type error, but I don't think its existence should ever create a type error. The fact that main branch currently treats an import of any Unknown typed value as an error is IMO not correct, and is fixed by the change linked above.

Thanks, I think this is correct; treating Unknown as a type error goes against the spirit of gradual typing. We need to catch the error before we transform the Unbound into Unknown. My mistake.

@AlexWaygood
Copy link
Member Author

I would be interested in knowing more about precisely how mypy uses its extra information about the origin of Any.

Much of it seems to be used for precise diagnostics when strict mode is enabled. Mypy --strict enables several checks that prevent implicit Any types from accidentally percolating through your code. So mypy wants to be able to distinguish between Anys that result from missing generic arguments, unannotated variables, and objects explicitly annotated as Any (for example).

E.g. here the type of Any is checked by the has_any_from_unimported_type() function in order to implement mypy's optional error code [no-any-unimported].

@AlexWaygood
Copy link
Member Author

I still think there's a good chance that we'll need something like this eventually, but I agree that with the fixes in #13055, there are few immediate benefits to this right now. Thanks both!

@AlexWaygood AlexWaygood deleted the alex/unknown-kind branch August 22, 2024 13:07
@carljm
Copy link
Contributor

carljm commented Aug 22, 2024

Much of it seems to be used for precise diagnostics when strict mode is enabled.

Yeah, that makes sense. I agree, we'll probably need something like this for the same reason, when we want to have an Any-forbidding strict mode and emit useful diagnostics for it.

carljm pushed a commit that referenced this pull request Sep 30, 2024
This variant shows inference that is not yet implemented..

## Summary

PR #13500 reopened the idea of adding a new type variant to keep track
of not-implemented features in Red Knot.

It was based off of #12986 with a more generic approach of keeping track
of different kind of unknowns. Discussion in #13500 agreed that keeping
track of different `Unknown` is complicated for now, and this feature is
better achieved through a new variant of `Type`.

### Requirements

Requirements for this implementation can be summed up with some extracts
of comment from @carljm on the previous PR

> So at the moment we are leaning towards simplifying this PR to just
use a new top-level variant, which behaves like Any and Unknown but
represents inference that is not yet implemented in red-knot.

> I think the general rule should be that Todo should propagate only
when the presence of the input Todo caused the output to be unknown.
>
> To take a specific example, the inferred result of addition must be
Unknown if either operand is Unknown. That is, Unknown + X will always
be Unknown regardless of what X is. (Same for X + Unknown.) In this
case, I believe that Unknown + Todo (or Todo + Unknown) should result in
Unknown, not result in Todo. If we fix the upstream source of the Todo,
the result would still be Unknown, so it's not useful to propagate the
Todo in this case: it wrongly suggests that the output is unknown
because of a todo item.

## Test Plan

This PR does not introduce new tests, but it did required to edit some
tests with the display of `[Type::Todo]` (currently `@Todo`), which
suggests that those test are placeholders requirements for features we
don't support yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants