Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/unknown kinds #13500

Closed
wants to merge 8 commits into from
Closed

Feat/unknown kinds #13500

wants to merge 8 commits into from

Conversation

Slyces
Copy link
Contributor

@Slyces Slyces commented Sep 24, 2024

Summary

Previous Work

With #12986, @AlexWaygood tried to introduce a lower level of granularity into the Unknown type, similarly to what mypy does for any.

The main benefits (he targeted) were related to unresolved-import diagnostics.

Current Approach

I rebased Alex's PR and continued his work following some comments on a previous PR.

I would like to take his work as a basis to achieve a slightly different objective. This PR does not target enabling any specific user-facing feature. It's attempting to find a way for us to record more accurate information when we have it (e.g. detail the current Unknown into multiple kinds) without breaking other functionalities.

As a benefit, I hope this will allow us to

  • Debug more comfortably complicated cases with all the information needed
    • Even when this additional information is not required to make the program work
  • Gradually maintaining and introducing this additional information is easier early in the project than later
    • If some of this ends up being valuable for user-facing features
  • Distinguish between expected behaviour & not implemented branches (RedKnotLimitation)
  • Pave the way for the same pattern applied to Any (if mypy needs it, maybe we will too)

This PR is a draft and doesn't have all the answers, but I'll try to address the limiting factors that led Alex to close his previous work

Previous Objections

Better alternative for unresolved-imports

One of the critical objections was that the previous PR's main goal could be achieved with an easier solution. See this comment.

I think this wouldn't apply here as I have a different core objective with this PR.

Behaviour for union

For unions, multiple UnknownKindType are aggregated together through the method

impl UnknownTypeKind {
    /// Method to aggregate multiple `UnknownTypeKind`s when they interact in any context
    pub(crate) fn union(self, other: Self) -> Self;
}

The specific of which implementation makes the most sense depends on how we use unions. There are many options. The most trivial would be to only return one kind (e.g. the first found). The most complicated would be to record a bitflag of all kinds encountered.

Behaviour for intersections

For intersections, I tried out an implementation with a special kind, AllKinds, which is equal to all other kinds.

There can only be one kind of Unknown in a positive or negative intersection, making all unknown kinds valid (or invalid) for an intersection that contains any other unknown kind.

Another idea I had would be to make all UnknownKind indistinguishable by Eq & Hash so only code using match would make difference between the UnknownTypeKind, if they need to do so. I didn't implement this in the draft because I'm not sure it's a good idea.

Test Plan

  • The core element is that all existing behaviour should not be impacted by this specialization of Unknown
  • Added some tests for the behaviour of Union and Intersection that requires additional code

P.S: this is a draft, any feedback is welcome. I can try any implementation you'd like to see for the different problems stated above. If this PR's cost/value isn't worth it, no problem with closing!

@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Sep 24, 2024
Copy link
Contributor

github-actions bot commented Sep 24, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@MichaReiser
Copy link
Member

MichaReiser commented Sep 24, 2024

I've very limited understanding of Python's type inference, so take this with a grain of salt. I'm also not entirely convinced if we actually want this. I think I would like to see specific use cases where it demonstrates that the information is indeed useful (and doesn't get lost in translation).

I think I would prefer a bit flag over an enum because it seems more obvious to me how the information propagates through different types of inference operations. IMO, the bit flag should be additive. If we have any type inference where an input type is Unknown and we return Unknown because of that, than the result should carry on the flags of the input Unknown.

x = [1, 2, 3];
y = x[0]

Let's pretend that Red Knot doesn't support inferring the list type, but it does support the subscript.

x's type infers as Unknown(NotImplemented) and I would expect y to also have the type Unknown(NotImplemented) or Unknown(NotImplemented | TypeError) because it is derived from an Unknown(NotImplemented) value. Carrying along the reasons would for example allow us to flag all Unknown(NotImplemented) types in tests because that's probably unintentional. However, that wouldn't be possible if we erase the information to Unknown(SecondOrder)

@Slyces
Copy link
Contributor Author

Slyces commented Sep 24, 2024

This is achievable through ensuring that whenever we'd encounter a new occurence of Unknown (here, subscript on a type that does not support it) we use the kind1.union(kind2) method.

That method is responsible for aggregating multiple unknown causes together. It's critical that we use this method in the right places.

In the current draft, I am simplifying the implementation of this union method (to return self). As you stated, it should eventually hold the full information of which kinds interacted together. I think the way I pictured it would be to have a specific enum kind that would hold the different bitflags anytime multiple unknown interact. You'd never build this kind yourself, only through the interaction of one or more primary kinds.

@AlexWaygood AlexWaygood requested a review from carljm September 25, 2024 03:31
Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

I have mixed feelings about it. I see the potential value for debugging, but I'm not sure how well this will play out in practice. If you see "red-knot-limitation" you still have to figure out where that came from. But I guess it is useful to at least know you're not dealing with intended behavior.

I think I would slightly favor adding this. I agree with Micha that it should be a bitflag.

The main alternative I would want to consider is that we instead just add a new Type variant, Type::Todo, which behaves like Unknown. I think this could give us similar debugging benefits with less runtime cost, although still some cost in verbosity of Type matches. There are also hypothetical future uses (for better diagnostics) that this wouldn't help with, but I'm hesitant to add complexity and runtime cost for hypothetical features we haven't designed or committed to yet.

One thing that might help clarify would be some specific examples of where mypy makes use of its Any kinds, so we can really decide whether that's something we will definitely want to emulate.

Comment on lines +917 to +919

/// The symbol was unannotated and the true type can't be reasonably inferred
UnannotatedSymbol,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this case exists, and I think we should limit the cases to those we actually use, not add more speculatively.

Comment on lines +924 to +927
/// A "second-order" Unknown type, that results from an interaction with
/// a "first-order" Unknown type. For example, if the type of `x` is `Unknown`,
/// the type of `x + 1` will also be `Unknown`
SecondOrder,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not convinced we need this kind; I think I'd rather propagate the original unknown kind in the x + 1 example.

Comment on lines +929 to +931
/// Special kind for intersections that is equal to all other kinds. This allows to consider a
/// single `Type::Unknown(_)` in intersections. Should not be used in other contexts.
AllKinds,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not convinced we should have this; I think I'd prefer to just not simplify differing unknowns in intersections. I don't expect this case to arise often, because most intersections will come from narrowing constraints, and in most cases we'll just not return a narrowing constraint at all rather than return one with an unknown type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, does that mean that having potentially different unknown kinds inside an intersection is not a concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think that's a problem.


impl UnknownTypeKind {
/// Method to aggregate multiple `UnknownTypeKind`s when they interact in any context
pub(crate) fn union(self, _other: Self) -> Self {
Copy link
Contributor

@carljm carljm Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the kind of an unknown should just always be a bitflag, and then the behavior of union is clear. (It doesn't make sense to me to have an enum with a special variant which is a bitflag of other variants; that seems like complexity that doesn't buy anything useful.)

Comment on lines +67 to +68
// REVIEWERS: should we display 'RedKnotLimitation' differently?
// => intent: any test including 'red-knot-limitation' should be updated one day
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so -- if we don't do this, the debugging value is limited.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlexWaygood and I looked more carefully at the mypy use cases, and we are not convinced that we will ever want this data for any purpose other than as a debugging aid while red-knot has a lot of "not implemented" inference. We aren't 100% convinced that we won't eventually want it, either, but it remains unclear enough that it still doesn't feel like it makes sense to add it speculatively. So at the moment we are leaning towards simplifying this PR to just use a new top-level NotImplemented variant, which behaves like Any and Unknown but represents inference that is not yet implemented in red-knot. @MichaReiser, does this sound reasonable to you?

Mypy uses its Any provenance info to be able to alert you when an implicit Any creeps into your code from some particular source (e.g. a failed import, an unannotated function, an unparameterized generic). What's unclear to me about the utility here is that these are all things that can be errored directly where they happen, if you want to be strict about them; erroring directly where they happen doesn't require tracking the provenance of Any.

I guess the use case is maybe that the implicit Any happens in third-party code that you aren't checking or maintaining, but you still don't want the Any to sneak into your code? But in this case I'm not clear why it would matter whether the Any comes from an unresolved import, vs a type error, vs an explicit Any -- if you aren't maintaining or checking the code where it came from, one Any is just like another, why does it matter specifically how it originated?

@hauntsaninja, you recently provided some great feedback from a mypy perspective on some other red-knot issues -- any thoughts here?

pub enum UnknownTypeKind {
/// Temporary variant that indicates that we *should*
/// be able to infer a type here in due course, but currently can't
RedKnotLimitation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I like the name NotImplemented here

Comment on lines +929 to +931
/// Special kind for intersections that is equal to all other kinds. This allows to consider a
/// single `Type::Unknown(_)` in intersections. Should not be used in other contexts.
AllKinds,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think that's a problem.

@AlexWaygood
Copy link
Member

So at the moment we are leaning towards simplifying this PR to just use a new top-level NotImplemented variant

I agree with nearly everything @carljm said here, with one small exception: tne issue with calling the variant Type::NotImplemented is that there is actually a NotImplemented type in Python, and this variant would not represent that type. Type::ToDo might be a better name here.

@MichaReiser
Copy link
Member

So at the moment we are leaning towards simplifying this PR to just use a new top-level NotImplemented variant, which behaves like Any and Unknown but represents inference that is not yet implemented in red-knot. @MichaReiser, does this sound reasonable to you?

Overall yes, but it depends on how this Todo variant propagates when using unification/intersection, or matching where the left side is a Todo. I only consider it useful when any Todo "poisons" all results in the sense that Unkown and Todo results in Todo (something and a todo is a todo).

@carljm
Copy link
Contributor

carljm commented Sep 26, 2024

it depends on how this Todo variant propagates when using unification/intersection, or matching where the left side is a Todo. I only consider it useful when any Todo "poisons" all results in the sense that Unkown and Todo results in Todo (something and a todo is a todo).

I agree that we should consider this carefully to maximize the debugging value. I think that following this proposed rule might cause Todo to be over-propagated, reducing its debugging value.

I think the general rule should be that Todo should propagate only when the presence of the input Todo caused the output to be unknown.

To take a specific example, the inferred result of addition must be Unknown if either operand is Unknown. That is, Unknown + X will always be Unknown regardless of what X is. (Same for X + Unknown.) In this case, I believe that Unknown + Todo (or Todo + Unknown) should result in Unknown, not result in Todo. If we fix the upstream source of the Todo, the result would still be Unknown, so it's not useful to propagate the Todo in this case: it wrongly suggests that the output is unknown because of a todo item.

@MichaReiser
Copy link
Member

Makes sense. Summarising this in a comment on the todo variant would be useful

@Slyces
Copy link
Contributor Author

Slyces commented Sep 26, 2024

Hey guys, thank you so much for spending so much time thinking about the problem.

The direction is clear, I'll edit this PR when I can to implement what's been decided so far, and I'll request for another review then.

Thank you again 😁

@hauntsaninja
Copy link
Contributor

hauntsaninja commented Sep 26, 2024

I think it comes up if an Any is explicitly annotated or comes about from a lack of annotation (assuming an absence of inference). It also matters what configuration options people use in practice. For various often historical reasons, --ignore-missing-imports is extremely commonly used with mypy, so it can often be useful to have additional diagnostics. People hate when they're surprised by an Any they didn't expect. If you plan on having configuration that lets you have really tight control over Any's, you may also find this useful (e.g. type coverage reports or some of the really strict options). If there's design risk at this point, I think it's fine to defer. While it might be a little cumbersome, it's probably not a thing that's too hard to add in later

@Slyces
Copy link
Contributor Author

Slyces commented Sep 28, 2024

Hey guys, closing this in favour of a new PR to allow for discussions focused on the implementation

@Slyces Slyces closed this Sep 28, 2024
@Slyces Slyces deleted the feat/unknown-kinds branch September 28, 2024 20:18
carljm pushed a commit that referenced this pull request Sep 30, 2024
This variant shows inference that is not yet implemented..

## Summary

PR #13500 reopened the idea of adding a new type variant to keep track
of not-implemented features in Red Knot.

It was based off of #12986 with a more generic approach of keeping track
of different kind of unknowns. Discussion in #13500 agreed that keeping
track of different `Unknown` is complicated for now, and this feature is
better achieved through a new variant of `Type`.

### Requirements

Requirements for this implementation can be summed up with some extracts
of comment from @carljm on the previous PR

> So at the moment we are leaning towards simplifying this PR to just
use a new top-level variant, which behaves like Any and Unknown but
represents inference that is not yet implemented in red-knot.

> I think the general rule should be that Todo should propagate only
when the presence of the input Todo caused the output to be unknown.
>
> To take a specific example, the inferred result of addition must be
Unknown if either operand is Unknown. That is, Unknown + X will always
be Unknown regardless of what X is. (Same for X + Unknown.) In this
case, I believe that Unknown + Todo (or Todo + Unknown) should result in
Unknown, not result in Todo. If we fix the upstream source of the Todo,
the result would still be Unknown, so it's not useful to propagate the
Todo in this case: it wrongly suggests that the output is unknown
because of a todo item.

## Test Plan

This PR does not introduce new tests, but it did required to edit some
tests with the display of `[Type::Todo]` (currently `@Todo`), which
suggests that those test are placeholders requirements for features we
don't support yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants