Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type guard doesn't intersect types like instance check #1351

Closed
NeilGirdhar opened this issue Feb 18, 2023 · 19 comments
Closed

Type guard doesn't intersect types like instance check #1351

NeilGirdhar opened this issue Feb 18, 2023 · 19 comments
Labels
topic: other Other topics not covered

Comments

@NeilGirdhar
Copy link

NeilGirdhar commented Feb 18, 2023

The type guard doesn't seem to work like an instance check:

from typing import TypeGuard, Any
class X: pass
class Y: pass
x: X = X()
assert isinstance(x, Y)
def f(z: Any) -> TypeGuard[Y]:
    return isinstance(z, Y)
reveal_type(x)  # Revealed type is "a.<subclass of "X" and "Y">", which is great!
x2: X
assert f(x2)
reveal_type(x2)  # Revealed type is "a.Y", which is unfortunate.

This is causing problems with the definition of dataclasses.is_dataclass python/typeshed#9723.

The above code was tested with MyPy and Pyright.

@JelleZijlstra
Copy link
Member

This isn't a bug, it's precisely how TypeGuard is specified (https://peps.python.org/pep-0647/#typeguard-type: "the first positional argument to the type guard function should be assumed by a static type checker to take on the type specified in the TypeGuard return type"). A separate mechanism will need a new PEP.

@NeilGirdhar
Copy link
Author

Thanks for the clear explanation. Do you think it's a good idea for me to propose this in typing-sig?

@JelleZijlstra
Copy link
Member

Should probably read https://mail.python.org/archives/list/typing-sig@python.org/thread/JTO3WKRKQFW5YFP3ZSMNTRIEA5NPJEM6/ first. I think Eric has a well-developed proposal but hasn't pushed it forward because of insufficient interest.

@NeilGirdhar
Copy link
Author

NeilGirdhar commented Feb 18, 2023

Okay, I've read that thread. It's not something I'm used to thinking about, so it takes me a bit longer. The proposal in my issue is for the positive case to do narrowing. It seems like Erik's proposals are about what to do in the negative case.
(@msfterictraut what do you think?)

Also, regarding MyPy specifically, would this be easy for you to do since you're calling some kind of narrowing function for isinstance, but replacing the type for type guards? Ideally, it would be as easy as narrowing for type guards.

@erictraut
Copy link
Collaborator

@NeilGirdhar, it's not clear to me what you're proposing, so I can't offer any thoughts. Can you clarify what you're proposing?

Also as Jelle noted, TypeGuard is working as designed in your code above. There has been some discussion of adding an alternate form (StrictTypeGuard) that has different semantics and behaviors from the existing TypeGuard. This new form would be less flexible than TypeGuard but would allow for narrowing in the negative case and union narrowing in general. Even with StrictTypeGuard, you won't get the same behaviors as built-in type guards like isinstance. Those built-in type guards contain lots of custom logic based on detailed knowledge of the calls or operators used in the conditional expression. For example, the isinstance type guard can produce an intersection type as shown in your code sample above. A type checker can't make the same assumptions about a user-defined type guard.

If you want the semantics of an isinstance type guard, then you should use isinstance directly rather than attempting to wrap it in a user-defined type guard function.

@NeilGirdhar
Copy link
Author

NeilGirdhar commented Feb 18, 2023

Thanks Erik for your detailed reply!

Can you clarify what you're proposing?

I would like asserting on the type guard to narrow the argument type to the intersection of its original type X and the type guard type Y: X & Y. MyPy seems to do this for isinstance, but not for type guards.

This would fix the problem whereby is_dataclass(X) narrows to type[DataclassInstance] and loses X. I would like it to narrow to X & type[DataclassInstance].

For example, the isinstance type guard can produce an intersection type as shown in your code sample above. A type checker can't make the same assumptions about a user-defined type guard.

Why not?

If you want the semantics of an isinstance type guard, then you should use isinstance directly rather than attempting to wrap it in a user-defined type guard function.

How do we do that with is_dataclass?

There has been some discussion of adding an alternate form (StrictTypeGuard) that has different semantics and behaviors from the existing TypeGuard. This new form would be less flexible than TypeGuard but would allow for narrowing in the negative case and union narrowing in general.

I read your strict type guard idea. I'm not sure if it's related. I'm not interested in the negative case. What's 'union narrowing"?

@erictraut
Copy link
Collaborator

erictraut commented Feb 18, 2023

I read your strict type guard idea. I'm not sure if it's related.

StrictTypeGuard is another form of user-defined type guard with slightly different semantics. It will not help with the issue you're posing here, so you can ignore it.

Why not?

Because isinstance has very specific semantics and behaviors. A static type checker cannot assume that a user-defined type guard function has those same behaviors. I suppose it would be theoretically possible to define a special IsinstanceTypeGuard that replicates the isinstance type guard semantics exactly, but I don't see why that would be useful when we already have isinstance. If you want the isinstance type guard semantics, just use isinstance in your conditional statement.

How do we do that with is_dataclass?

I don't think I can answer that question because I don't know enough about is_dataclass or what you're trying to do. Is is_dataclass implemented internally with a simple isinstance check? If so, why was a new API created rather than telling developers to simply use isinstance if they want to check for a dataclass? I'm guessing that its behaviors are somewhat different (perhaps more complex) than an isinstance check.

@NeilGirdhar
Copy link
Author

NeilGirdhar commented Feb 18, 2023

I don't think I can answer that question because I don't know enough about is_dataclass or what you're trying to do. Is is_dataclass implemented internally with a simple isinstance check?

Sorry, I should have linked it: is_dataclass is a standard library function that checks if an object is a dataclass instance, or if a type is a dataclass.

That's why we can't simply use isinstance. I guess we could put is_dataclass into an ABC. __subclasshook__, and then use an ordinary isinstance on that to recover the desired behaviour (I tested that here).

However, it's not clear to me why that's necessary. When a type checker sees assert f(x2) (from the code in the issue), why can't it narrow x2's type the same way it did when it sees assert isinstance(x2, Y)?

Is is_dataclass implemented internally with a simple isinstance check?

It's a library function. It think it checks for various attributes. It can't check isinstance since dataclasses have no base class. (A design error IMO, but too late to complain.)

@erictraut
Copy link
Collaborator

When a type checker sees assert f(x2) (from the code in the issue), why can't it narrow x2's type the same way...

That's not how TypeGuard works. I won't explain in detail why, but if you look at some of the examples in PEP 647, you'll quickly see why it would be completely inappropriate for TypeGuard to imply "intersection".

I'll also note that there is currently no notion of an intersection in the Python type system. If and when such a concept is added, you could use a TypeGuard that explicitly uses an intersection. That said, adding intersection support to the Python type system is not straightforward. The discussions I've seen so far seem to be leaning toward very constrained uses, and it's unlikely that these would be sufficient for this use case.

@NeilGirdhar
Copy link
Author

NeilGirdhar commented Feb 19, 2023

That's not how TypeGuard works. I won't explain in detail why, but if you look at some of the examples in PEP 647, you'll quickly see why it would be completely inappropriate for TypeGuard to imply "intersection".

Sorry, but I don't see it. Consider the is_set_of function. Suppose you have x: set[X], and then you assert on is_set_of(x, Y). Currently, that makes x: set[Y]. Your argument is that it would be inappropriate to infer x: set[X] & set[Y]? But this seems to contradict PEP 647, which says "Type checkers should assume that type narrowing should be applied to the expression that is passed as the first positional argument to a user-defined type guard." And set[Y] is not narrower than set[X] whereas the intersection is narrower.

Also, you could wrap up any single-class type guard f in an ABC.__subclasshook__, and then switch from asserting on the type guard to asserting it's an instance of your ABC. Then, you get the intersection behavior (in Mypy and Pyright at least). So, I don't see why it's "completely inappropriate".

I'll also note that there is currently no notion of an intersection in the Python type system.

Yes, I understand that. However, both Pyright and Mypy seem to have a rudimentary intersection that they seem to use with instance assertions.

@erictraut
Copy link
Collaborator

If you read PEP 647, you will not see the word "intersection" anywhere within the spec. When I wrote the PEP, it didn't even occur to me to include intersections for several reasons. First, there is no such concept in the Python type system today. Second, intersections don't make sense for many of the use cases I envisioned for TypeGuard. Third, the PEP was loosely based on the user-defined type guard facility in TypeScript, and it doesn't use intersections (even though TypeScript does have such a notion in its type system).

I'm not sure if you're arguing that a reading of PEP 647 implies intersections or that you wish it had. In either case, I think the point is moot because it doesn't. The spec is final, and mypy, pyright and other type checkers implement TypeGuard as it is spec'ed.

You're welcome to propose a new variant of TypeGuard that does use intersections, but that would need to be done as a new PEP, and it would need to go through the normal discussion and review process. I'll also note that any proposal that depends on intersections would necessarily need to define and spec the concept of intersection types in the Python type system — or wait for another PEP to introduce such a concept.

To make my point about intersections being inappropriate for TypeGuard, let's take a look at the first example in PEP 647:

def is_str_list(val: List[object]) -> TypeGuard[List[str]]: ...

You're arguing that the narrowed type should be List[object] & List[str] (i.e. an intersection between the input type and the type guard type) rather than List[str]. I'm not sure what List[object] & List[str] even means. Presumably it would mean that the resulting type is compatible with both List[object] and List[str] and can be used anywhere either of these types can be used. It would also imply that you can append None to it, since None is an object.

The issubclass logic in pyright and mypy make use of an "anonymous subclass" that derives from two base classes. I guess you could call this a rudimentary intersection, but it's not really the same as an intersection, and there are limits to this approach. For example, you cannot express set[X] & set[Y] using this technique because it's not possible to subclass from the same generic class multiple times, and even if you could, only one of the subclass' methods would be visible due to MRO rules.

@kmillikin
Copy link

kmillikin commented Feb 20, 2023

The PEP is not as clear as it could be. The phrase "takes on the type" is used informally and never defined in the PEP. It might mean that it has exactly the type, or it might mean that it has that type in addition to its other types (like isinstance). But I agree that if you step back from the PEP you'll realize that it unfortunately can't be an intersection.

I say unfortunate, because we can't "combine" user defined typeguards like this:

if is_serializable(x):
  if is_dog(x):
    # We've forgotten it's serializable.
  elif is_cat(x):
    # Forgotten serializable.

The simple technical reason it can't be an intersection is because the PEP wants to allow type guards that don't make sense statically, like the motivating is_str_list example.

A simpler example:

def is_int(x: None) -> TypeGuard[int]: ...

Here the intersection None & int is empty so we've got to forget it's None in order to treat it like an int.

About list[object] & list[str], that's an empty intersection. It's a paradoxical list. You can only get strings out of it but you can put anything in. That's why list is invariant. A list[object] is never a list[str] and you can't even check at runtime.

For @NeilGirdhar, you could try to sidestep user-defined type guards. Instead of user-defined type guards, you could try user-defined downcast functions. These return their argument unchanged or None and should work without any new type system features.

You can cast a type to a consistent subtype ("strict narrowing") by returning the argument unchanged like:

def as_str_list(val: list[Any]) -> list[str] | None:
  return val if all(isinstance(x, str) for x in val) else None

And you will get narrowing for free by using it like:

ss = as_str_list(xs)
if ss:
  # ss is list[str]

You can also write the unsafe version by using typing.cast:

def unsafe_as_str_list(val: list[object]) -> list[str] | None:
  if all(isinstance(x, str) for x in val):
    return cast(list[str], val)

Which has the feature that the library writer has to acknowledge that the cast doesn't make sense statically. With type guards, you can accidentally write something that you think is safe but it's not, like the example from the PEP.

You still don't get intersections like you do with is_instance, so you have to remember which name to use. You have to write:

x: X
d = as_dataclass(x)
if d:
  # Use d for the dataclass and x when you need type X.

[Edit: fixed type annotation.]

@kmillikin
Copy link

kmillikin commented Feb 20, 2023

The issubclass logic in pyright and mypy make use of an "anonymous subclass" that derives from two base classes. I guess you could call this a rudimentary intersection, but it's not really the same as an intersection, and there are limits to this approach.

That's interesting. I wondered how you did it. Here you have to be careful not to depend on the order of the bases though. Pyright accepts this:

class A:
  def f(self, x: int) -> int: return x

class B:
  def f(self, x: int, y: int) -> int: return y

def test(x: B) -> None:
  if isinstance(x, A):
    reveal_type(x)
    x.f(0, 1)

and reveals Type of "x" is "<subclass of B and A>". Swap the order of the types:

def test(x: A) -> None:
  if isinstance(x, B):
    reveal_type(x)
    x.f(0, 1)

and reveals Type of "x" is "<subclass of A and B>" with an error Expected 1 positional argument (reportGeneralTypeIssues).

These cases should be symmetrical.

Mypy is doing something more subtle. It accepts both without errors. But it doesn't reveal any type, which does indicate that it knows that code is unreachable --- there can be no subclasses of both A and B. [Edit: I mean statically. The runtime will allow it as long as the MRO is consistent.] It does statically reject any attempt to define such a class, with the bases in either order, because of the incompatible override.

It appears more like it's treating this as something symmetrical like Union[subclass of A and B, subclass of B and A].

@kmillikin
Copy link

You're arguing that the narrowed type should be List[object] & List[str] (i.e. an intersection between the input type and the type guard type) rather than List[str].

Just to clarify this: if the argument to is_str_list had static type T then the narrowed type "should be" T & list[str]. The parameter type of the type guard isn't relevant. That is: if we knew before the call that it was statically a T, then we still know that after the call and also that it's a list[str].

list isn't the best example here because it's invariant.

@NeilGirdhar
Copy link
Author

NeilGirdhar commented Feb 21, 2023

@erictraut Thanks for your detailed reply. I've learned a lot from your github comments over the years and I always appreciate them.

To make my point about intersections being inappropriate for TypeGuard, let's take a look at the first example in PEP 647:

def is_str_list(val: List[object]) -> TypeGuard[List[str]]: ...

You're arguing that the narrowed type should be List[object] & List[str] (i.e. an intersection between the input type and the type guard type) rather than List[str].

PEP 647 says "Type checkers should assume that type narrowing should be applied to the expression that is passed as the first positional argument to a user-defined type guard." The way I read it was that it is saying that it doesn't matter what the parameter annotation (list[object]) is. What matters is the type of "the expression that is passed in", i.e., the argument type (T). I realize that that may not have been what you intended, but it seems to be a reasonable reading of what's written? This fits with what @kmillikin explains here.

I'm not sure what List[object] & List[str] even means. Presumably it would mean that the resulting type is compatible with both List[object] and List[str] and can be used anywhere either of these types can be used.

Exactly 😄

If we go to the top of this issue. It was submitted to get is_dataclass working in an elegant way. Imagine you have something like:

x: T
assert is_dataclass(x)

Ideally, this would result in x having type T & DataclassInstance. Under the current system, either x has type T or x has type DataclassInstance. Information is necessarily lost. I realize that intersection doesn't exist yet. In the case that a type guard checks a basic type like DataclassInstance, then type checkers like MyPy could apply the logic they use for instance checks and return '"anonymous subclass" that derives from two base classes'.

However, that seems like too much work for a partial solution. In an ideal world, I believe they would return the intersection of the argument type and the check type. What do you think?

Here the intersection None & int is empty so we've got to forget it's None in order to treat it like an int.

If you need type forgetting behaviour, can you not cast to Any?

Instead of user-defined type guards, you could try user-defined downcast functions…

This was extremely clever! Thanks for teaching me something new.

@NeilGirdhar
Copy link
Author

NeilGirdhar commented Sep 19, 2023

The new PEP 724 proposes:

  • narrowing types when a user-defined type guard function returns False, and
  • applying additional (more precise) type narrowing under certain circumstances when the type guard function returns True.

(The second part is this issue.)

@KotlinIsland
Copy link

Basedmypy has typeguards that intersect, read more here:
https://kotlinisland.github.io/basedmypy/based_features.html#reinvented-type-guards

@erictraut
Copy link
Collaborator

I think this can be closed now that TypeIs is part of the typing spec.

@bluenote10
Copy link

For future reference, adding a link to the section on TypeIs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic: other Other topics not covered
Projects
None yet
Development

No branches or pull requests

7 participants