Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Descriptor changing its class return type when object sets something to it #2698

Closed
bellini666 opened this issue Dec 12, 2021 · 13 comments
Closed
Labels
as designed Not a bug, working as intended

Comments

@bellini666
Copy link

Considering this self-contained example:

from typing import TypeVar, overload

_S = TypeVar("_S", bound="Descriptor")
_M = TypeVar("_M", bound="Model")


class Descriptor:
    @overload
    def __get__(self: _S, obj: None, objtype: type[_M]) -> _S:
        ...

    @overload
    def __get__(self, obj: _M, objtype: type[_M]) -> str:
        ...

    def __set__(self, obj, value: str):
        ...


class Model:
    foo = Descriptor()
    foo2 = Descriptor()

    def some_method_that_sets_foo2(self):
        self.foo2 = "abc"


reveal_type(Model.foo)
reveal_type(Model.foo2)

Running pyright gives the following

> pyright foobar.py 
No configuration file found.
No pyproject.toml file found.
stubPath /tmp/typings is not a valid directory.
Assuming Python platform Linux
Searching for source files
Found 1 source file
/tmp/foobar.py
  /tmp/foobar.py:28:13 - info: Type of "Model.foo" is "Descriptor"
  /tmp/foobar.py:29:13 - info: Type of "Model.foo2" is "Descriptor | str"
1 error, 0 warnings, 4 infos 
Completed in 0.558sec

The fact that there exists a method that sets foo2 makes the typing for the descriptor being retrieved from the class to consider that now he can return that too, which is wrong since I'm setting it through object and not class. Also, setting it by class (i.e. Model.foo = <something else> would change it to something else and not call __set__).

@erictraut
Copy link
Collaborator

The problem here is that you haven't declared the type of attribute foo2. Its type is therefore inferred based on all assignments to it. If you declare its type, this code will work as you expect.

class Model:
    foo2: Descriptor = Descriptor()
    ...

I'll think more about whether the inference logic could handle this better. One challenge is that execution order matters here, and in general a static type checker can't predict execution order. In your particular case, it would be able to assume a partial execution order because foo2 = Descriptor() is executed within the class scope and self.foo2 = "abc" is executed within a method scope. But consider the following:

class Model:
    def method1(self):
        self.foo = Descriptor()

    def method2(self):
        self.foo = "hi"


m1 = Model()
m1.method1()
m1.method2()
print(m1.foo)

m2 = Model()
m2.method2()
m2.method1()
print(m2.foo)

In this case, the value of foo differs depending on the order of execution.

Another challenge is in cases where a descriptor is conditionally assigned in a class.

class Model:
    if condition:
        foo2 = Descriptor()
    else:
        foo2 = "abc"

    def some_method_that_sets_foo2(self):
        self.foo2 = "abc"

Perhaps the right answer is to say that you must declare the type of the attribute in this case. That eliminates these ambiguities and makes the intent clear.

@bellini666
Copy link
Author

Hey @erictraut ,

In the first example I don't understand how it is similar to the issue. I mean, if you are setting self.foo = <something> you are setting it at object level, which means that you will set obj.foo to be an object of Descriptor (since it did not exist at class level). Now, if you have defined if like this:

class Model:
    foo = Descriptor()

    def method1(self):
        self.foo = Descriptor()

Then you would be actually calling Descriptor.__set__, and passing a new instance of Descriptor to its value and not actually changing the object's foo to be Descriptor (unless the implementation there would accept that... Also, ifthat example was built on top of my original example, this code above ^ should have produce an error since Descriptor.__set__ says the value should be str and not a Descriptor object). If Descriptor.__set__ had a assert isinstance(value, str) it would even raise an AssertionError when calling method1.

In the second example I can see how it is ambiguous, but I also don't understand how that is related to the issue.

And just to mention, that is not too different from using a property: If foo2 was a property, by setting it to something else using model_object.foo2 = 'something' would still not change Model.foo2 (the class access) to be property | str

Summarizing, if the class defines an attribute as a descriptor, it totally changes what self.<attribute> does for get/set, just like a @property would do. So, getting/setting to that attribute at object level should follow the descriptor's __get__/__set__ rules, which also includes the fact that getting them can be done from both the object and the class and can result in different types for both (i.e. in my example, Model.foo returns the descriptor instance itself and Model().foo returns str). And setting the value at object will trigger __set__ and not actually set the value at the object's __dict__ always.

@erictraut
Copy link
Collaborator

erictraut commented Dec 12, 2021

Python is pretty loose when it comes to differentiating between class and instance variables. A class variable can be accessed in the same manner as an instance variable through a self reference, and you can overwrite the class-level attribute with an instance-level attribute of the same name, after which any access of the attribute will return the instance attribute rather than the class attribute. This makes life difficult for type checkers. Mypy makes very little effort to even track the difference between class and instance variables. Pyright does a bit more in this respect, but it cannot know when a self reference is going to refer to the class or instance variable because dynamic execution order can't generally be determined statically.

And just to mention, that is not too different from using a property

There's a critical difference between your first example and a property. A @property with an associated def statement is a type declaration. In your sample above, you haven't declared the type of foo2, so pyright is falling back on type inference to determine its type. If you declare the type of foo2, there is no problem here.

@erictraut
Copy link
Collaborator

Thinking about this more, I think this is a case where you will need to provide a type declaration rather than relying on type inference. The inference rules are already very complex and expensive. I don't think it makes sense to add more logic to handle this special case and all of the edge cases I mentioned above.

@erictraut erictraut added the as designed Not a bug, working as intended label Dec 12, 2021
@bellini666
Copy link
Author

Hey @erictraut ,

I understand what you mean regarding class and instance variables, due to the fact that obj.attr is basically doing obj.__dict__["attr"], falling back to type(obj).__dict__["attr"] (considering its __mro__), and the challenges that it might bring to static type checkers.

But that doesn't apply to descriptors. Descriptors work in a way that, even if obj.__dict__["attr"] exists, if a descriptor is defined at type(obj).__dict__["attr"]), its __get__ method will be used instead of retrieving the attr from the obj. This is described in the descriptor protocol and works like that since python 2.2.

For example:

class Descriptor:
    def __init__(self):
        self._values = {}

    def __get__(self, obj, objtype):
        if obj is None:
            return self
        return "descriptor get: " + self._values.get(obj, "<not set>")

    def __set__(self, obj, value):
        self._values[obj] = value


class Foo:
    desc = Descriptor()


f = Foo()
print(f.desc)
print(Foo.desc)

f.desc = "abc"
print(f.desc)
print(Foo.desc)

f.__dict__["desc"] = "def"
print(f.desc)
print(Foo.desc)

Executing that I get:

descriptor get: <not set>
<__main__.Descriptor object at 0x7f5d0e067910>
descriptor get: abc
<__main__.Descriptor object at 0x7f5d0e067910>
descriptor get: abc
<__main__.Descriptor object at 0x7f5d0e067910>

When I set obj.desc = "something" I don't set it to the object, since desc is a descriptor its __set__ will get called with the value that I passed. So, when I ask for obj.desc again I don't receive "abc" but instead I receive what the __get__ returns. Even when I set it directly to obj.__dict__ it still returns what the descriptor returns.

There's a critical difference between your first example and a property. A @Property with an associated def statement is a type declaration. In your sample above, you haven't declared the type of foo2, so pyright is falling back on type inference to determine its type. If you declare the type of foo2, there is no problem here.

I don't understand the internal type checking implicances here but, even though I didn't declare the type, since the assignment is a descriptor it will not change unless set directly at the class. The comparison to property was that the kind of descriptor that I defined in the example is the exactly the same as a property. Not to mention that a property is actually a descriptor internally, which make their use cases basically the same.

Thinking about this more, I think this is a case where you will need to provide a type declaration rather than relying on type inference. The inference rules are already very complex and expensive. I don't think it makes sense to add more logic to handle this special case and all of the edge cases I mentioned above.

I would like to kindly ask you to reconsider this. Descriptors are pretty core python stuff (e.g. without them we could not have @property), that exists since python 2.2, and they are the base of a lot of existing python libs out there: E.g. In Django all fields in a model are descriptors. Having to type everything would mean that instead of doing char_field = models.CharField() one would have to do char_field: models.CharField = models.CharField(), for every single django field defined. (note that django is an example, there are a lot of other similar libs out there)

Also, you mentioned mypy, but it supports the descriptor protocol: python/mypy#2266

IMO, there are basically 2 issues here:

  1. The main issue described here that when setting an instance attribute to something it made the class report that attribute as possibly being that as well. Even not considering everything that we discussed here, never doing obj.foo = "something" would change type(obj).foo. I could understand that type(obj).foo = "something" could change obj.foo if it didn't define its own, but not the other way around.

  2. The support for the descriptor protocol, as I described above, considering that get/set to a descriptor attribute (at least for a data descriptor) is always going to trigger its __get__/__set__ methods and there's no override through the object.

@erictraut
Copy link
Collaborator

Pyright already has support for descriptors. If you add a type annotation for foo2 in your example, it will work as you are hoping it will. Did you try that?

class Model:
    foo2: Descriptor = Descriptor()

@bellini666
Copy link
Author

@erictraut it does solve some problems but not all. Let me give you an example based on the original code:

from typing import TypeVar, overload

_S = TypeVar("_S", bound="Descriptor")
_M = TypeVar("_M", bound="Model")


class Descriptor:
    @overload
    def __get__(self: _S, obj: None, objtype: type[_M]) -> _S:
        ...

    @overload
    def __get__(self, obj: _M, objtype: type[_M]) -> str:
        ...

    def __set__(self, obj, value: str):
        ...


class Model:
    foo = Descriptor()
    foo2: Descriptor = Descriptor()

    def some_method_that_sets_foo2(self):
        reveal_type(self.foo)
        reveal_type(type(self).foo)
        self.foo = "abc"
        reveal_type(self.foo)
        reveal_type(type(self).foo)

        reveal_type(self.foo2)
        reveal_type(type(self).foo2)
        self.foo2 = "abc"
        reveal_type(self.foo2)
        reveal_type(type(self).foo2)

The output of this is:

No configuration file found.
No pyproject.toml file found.
stubPath /tmp/typings is not a valid directory.
Assuming Python platform Linux
Searching for source files
Found 1 source file
/tmp/desc.py
  /tmp/desc.py:25:21 - info: Type of "self.foo" is "str"
  /tmp/desc.py:26:21 - info: Type of "type(self).foo" is "Descriptor | str"
  /tmp/desc.py:28:21 - info: Type of "self.foo" is "Literal['abc']"
  /tmp/desc.py:29:21 - info: Type of "type(self).foo" is "Descriptor | str"
  /tmp/desc.py:31:21 - info: Type of "self.foo2" is "str"
  /tmp/desc.py:32:21 - info: Type of "type(self).foo2" is "Descriptor"
  /tmp/desc.py:34:21 - info: Type of "self.foo2" is "Literal['abc']"
  /tmp/desc.py:35:21 - info: Type of "type(self).foo2" is "Descriptor"
1 error, 0 warnings, 8 infos 
Completed in 0.571sec

I can see that, as you said, type(self).foo2 continues to be for sure a Descriptor even after doing self.foo2 = "abc". But my point still stands that even without typing, self.foo = "abc" would never make type(self).foo return Literal["abc"] in any situation as an assignment to an object doesn't change the __dict__ of its class.

Having said that, there's an error in that even for foo2 that is now typed, regarding the descriptor protocol. After the assignment it says that self.foo2 is now Literal["abc"] which is wrong. The descriptor returns str, but it doesn't have to be the exact same thing that was set to it. If the implementation was like this example self.foo2 would now return "descriptor get: abc" instead of Literal["abc"].

@erictraut
Copy link
Collaborator

The reveal_type call reveals the type of the expression at that point in the code flow. It doesn't reveal the declared type of the expression. Pyright performs type narrowing on assignment, which can further narrow the type. For example:

def func1(val: int):
    x: int = val
    reveal_type(x) # int
    x = 3
    reveal_type(x) # Literal[3]

or

class A:
    a: int

def func2(val: A):
    reveal_type(val.a) # int
    val.a = 3
    reveal_type(val.a) # Literal[3]

The same thing is happening in your example above.

So, this is all behaving as expected.

I recently (yesterday) added logic to avoid assignment-based type narrowing for asymmetric descriptors (where the getter return type doesn't match the setter type). This will be in the next release. See this issue for more details. But this doesn't apply to your descriptor because it's not asymmetric.

@bellini666
Copy link
Author

bellini666 commented Dec 13, 2021

Hey @erictraut ,

But this doesn't apply to your descriptor because it's not asymmetric.

I understand why the type narrowing on assignment makes sense from the point of a variable, but I think it is too much to consider the same for a descriptor as it is only considering the ones that acts like a "property".

I mean, even though the types match doesn't mean that what I set is what I get. Let me demonstrate a real working example:

from typing import Any, TypeVar, overload

_S = TypeVar("_S", bound="PowerOfTwo")


class PowerOfTwo:
    def __init__(self):
        self._values = {}

    @overload
    def __get__(self: _S, obj: None, objtype: Any) -> _S:
        ...

    @overload
    def __get__(self, obj: Any, objtype: Any) -> int:
        ...

    def __get__(self, obj, objtype):
        if obj is None:
            return self
        return self._values.get(obj, 0) ** 2

    def __set__(self, obj, value: int):
        self._values[obj] = value


class Foo:
    var: PowerOfTwo = PowerOfTwo()


f = Foo()
reveal_type(f.var)  # Type of "f.var" is "int"
f.var = 4
reveal_type(f.var)  # Type of "f.var" is "Literal[4]"
print(f.var)  # prints 16

This is basically what I'm saying ^. Setting anything to a descriptor doesn't imply that it will return that, even if it is symetric. In that example, it says that f.var is Literal[4], but that narrowing is totally wrong.

edit: changed the output of the code as comments in the example itself for clarity

@erictraut
Copy link
Collaborator

Yes, this is a case where practicality trumps type correctness. There has been significant discussion of this already. See this issue for a discussion and resolution. We're unlikely to revisit this.

@bellini666
Copy link
Author

@erictraut I don't see how that discussion in the issue you pointed is "resolved" like that, for me it seems like an open discussion still.

Also, a comment from gvanrossum just after your summary points to the exact same issue I'm pointing here, that narrowing the type of descriptors is wrong. His example is asymmetric, mine just happens to be symmetric, but the idea is the same.

this is a case where practicality trumps type correctness

I can see why and respect that for symmetrical @property, even though I still don't fully agree with that, but descriptors are used for much more than fancy "getters"/"setters" to be put on the same basket like that.

And speaking of the original issue itself, you still didn't give me any reply on it yet. Taking a non-descriptor as an example:

class Foo:
    a = 1

f = Foo()
reveal_type(f.a)  # Type of "f.a" is "int"
reveal_type(Foo.a)  # Type of "Foo.a" is "int"

f.a = "abc"
reveal_type(f.a)  # Type of "f.a" is "Literal['abc']"
reveal_type(Foo.a)  # Type of "Foo.a" is "int"

This is what I was expecting ^. Even though I changed f.a to a string, Foo.a is still an int because an assignment to the object doesn't change the class attribute. If you look at my original example you will see that the class attribute is narrowing together with the assignment to the instance, which is wrong.

You said that it was because the type was being inferred, but this example is too being inferred but isn't suffering from the same issue (or else the last reveal_type(Foo.a) would say Type of "Foo.a" is "int" | "str")

@erictraut
Copy link
Collaborator

erictraut commented Dec 14, 2021

We've meandered all over the place in this issue. Let me try to summarize.

  1. In your initial example, you have a symbol that has no type declaration that is used as both a class variable and an instance variable. Because there is no type declaration, pyright falls back on its inference logic which uses the union of assigned types. You would prefer that the inference logic special case the situation where a descriptor is assigned in the class body. I've concluded that this is a case where an explicit type declaration is required. You've verified that the presence of a type annotation addresses the problem.

  2. Prior to Pyright 1.1.196, assignment-based type narrowing is always applied for descriptor assignments. Based on a long discussion with gvanrossum and others, we've concluded that assignment-based type narrowing should not be applied for descriptors or properties when they are asymmetric. You would prefer that assignment-based type narrowing is never applied for descriptors. I understand your viewpoint, but we're unlikely to do this. It will result in too many false positive errors. The decision to suppress assignment-based type narrowing for asymmetric descriptors already makes me nervous, but it's a reasonable compromise.

I'm going to consider this issue closed at this point. If you see other issues, please feel open another bug report or feature request.

@krokosik
Copy link

So, to summarize, descriptors, an essential and widely used part of the Python language, are not supported by Pyright and never will be? That is a bit disappointing, I understand it's a lot of work, but maybe it should at least be on your roadmap?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
as designed Not a bug, working as intended
Projects
None yet
Development

No branches or pull requests

3 participants