Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typing support for capsules #109562

Closed
pitrou opened this issue Sep 19, 2023 · 16 comments
Closed

Typing support for capsules #109562

pitrou opened this issue Sep 19, 2023 · 16 comments
Labels
topic-typing type-feature A feature request or enhancement

Comments

@pitrou
Copy link
Member

pitrou commented Sep 19, 2023

Feature or enhancement

Proposal:

While not common, capsule objects can appear in APIs. For example the Python DLPack protocol specifies a __dlpack__ method returning a capsule object.

It would therefore be nice to add typing support for capsules. Ideally it should be parameterable by the capsule name, because it often serves as a discriminator for the actual C type under the capsule.

from typing import Capsule, Protocol

class DLPackExportable(Protocol):
    def __dlpack__(self) -> Capsule["dltensor"]:

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

@pitrou
Copy link
Member Author

pitrou commented Sep 19, 2023

Another example where capsules may be part of a protocol: apache/arrow#35531

@pitrou
Copy link
Member Author

pitrou commented Sep 19, 2023

@JelleZijlstra
Copy link
Member

Right now I don't think there's any way to refer to the type of capsules from Python. A good first step would be to add types.CapsuleType, which would automatically make it possible to refer to this type in type annotations.

You also ask for a syntax like Capsule["dltensor"] to refer to specific capsules. That makes sense, because usually you want one specific capsule, not any capsule. However, the syntax you propose would need special-casing in type checkers, and I feel capsules aren't common enough to warrant such a special case. An alternative could be to use NewType: you would write DLTensorCapsule = NewType("DLTensorCapsule", types.CapsuleType) and use that type in return and argument annotations.

@TeamSpen210
Copy link

I'm not sure there's actually much benefit to exposing types.CapsuleType really. Capsules have absolutely no Python-accessible API, aside from their repr(). So they're effectively opaque objects, a NewType(..., object) would be just as good. The only benefit would be using isinstance(), but that seems more like checking an implementation detail. It seems quite reasonable that some module could swap to their own type at any point, and not affect the Python code using it.

@pitrou
Copy link
Member Author

pitrou commented Sep 20, 2023

I'm quite new to typing, but the context is the definition of a protocol, not a single library, so a common way to spell out the type would be welcome.

Contrast:

>>> a = typing.NewType("DLTensorCapsule", object)
>>> b = typing.NewType("DLTensorCapsule", object)
>>> a == b
False
>>> c = typing.List[int]
>>> d = typing.List[int]
>>> c == d
True

As for the fact that capsules don't have an accessible Python API, well, it's something that might be changed if desired.

@serhiy-storchaka
Copy link
Member

You can define DLTensorCapsule = NewType("DLTensorCapsule", object) in your library. Since capsule objects have no useful methods or attributes, it works the same.

@pitrou
Copy link
Member Author

pitrou commented Sep 20, 2023

@serhiy-storchaka , please, read the discussion above instead of reiterating previous comments.

@pitrou
Copy link
Member Author

pitrou commented Sep 20, 2023

So, I submitted #109599 and #109600 for types.CapsuleType (as a middle-ground solution).

There, I get some pushback because it's supposedly not useful (??).

But on this issue it's also explained that adding a parametric typing.Capsule would be too much special-casing.

So what would be the solution to the problem described in the issue description above?

The solution cannot be to define our own type in a third-party library, since there is no third-party library to speak of. It's a protocol specification that is independently implemented by different libraries.

@TeamSpen210
Copy link

Why not just use -> object:? From a Python perspective it's an entirely opaque object, the name of the method should provide enough uniqueness. Even if Python code had types.CapsuleType to do isinstance() checks, that's not really useful since it can't check that the contained pointer is correct.

@AlexWaygood
Copy link
Member

AlexWaygood commented Sep 20, 2023

If there's a need for a centralised type somewhere that can be shared between third-party libraries, I think I'd be okay with adding CapsuleType = NewType("CapsuleType", object) to the typing module. (It certainly wouldn't add much of a maintenence burden for CPython, and wouldn't require any special-casing from type checkers.) From what I gather, it sounds like it would be just as descriptive about the attributes and methods of the type as exposing the actual type in the types module would be — and it would avoid having to make the types module depend on _socket, which feels a little off to me.

I still don't feel 100% clear about the use case, though, so I'd want to see good documentation about how the feature would be used.

@pitrou
Copy link
Member Author

pitrou commented Sep 20, 2023

Well, for example, any library implementing the DLPack protocol (such as Numpy) could annotate its Array type like this:

from typing import Capsule, Protocol

class DLPackExportable(Protocol):
    def __dlpack__(self) -> Capsule["dltensor"]:
        pass

class MyArray(object):
    def __dlpack__(self) -> Capsule["dltensor"]:
        ...

    @classmethod
    def from_dlpack(self, obj: DLPackExportable) -> MyArray:
        ...

(here I'm reusing the originally suggested notation, but you get the idea)

I'll note the typing docs now promote proper typing annotations for objects supporting the buffer protocol. While it's true that the buffer protocol also provides a Python API through the memoryview object, its main purpose is still to convey the C struct Py_buffer, just like the DLPack dltensor capsule's main purpose is to convey the C struct DLManagedTensor.

@pitrou
Copy link
Member Author

pitrou commented Sep 20, 2023

@mattip @tqchen @rgommers Your opinions on this would be welcome. Would it be helpful? Which typing granularity is desirable?

@JelleZijlstra
Copy link
Member

The solution cannot be to define our own type in a third-party library, since there is no third-party library to speak of. It's a protocol specification that is independently implemented by different libraries.

I missed this part when I wrote my first reply; that does make the NewType solution harder. However, your suggestion also seems to rely on the string "dltensor" being well-known and used by all libraries.

For your use case, wouldn't putting def __dlpack__(self) -> object: or def __dlpack__(self) -> CapsuleType: be good enough in practice? Sure, it won't catch objects that have a __dlpack__ method that returns some other object, but such objects are unlikely to be common.

@pitrou
Copy link
Member Author

pitrou commented Sep 20, 2023

I missed this part when I wrote my first reply; that does make the NewType solution harder. However, your suggestion also seems to rely on the string "dltensor" being well-known and used by all libraries.

Since the pointer is type-erased in the capsule object (a void* member is stored),
the capsule name is used for type discrimination and enforced by APIs such as PyCapsule_IsValid and PyCapsule_GetPointer.

In the DLPack case, the capsule name is part of the Python spec, so it is well-known by definition:

"""The producer must set the PyCapsule name to "dltensor" so that it can be inspected by name [...]"""

Another in-progress spec, this time to convey the Arrow C Data Interface structs in Python, proposes a similar mechanism:
https://github.com/apache/arrow/blob/4d13e8864e775cd750a4227b62c6bf53a9d60a1d/docs/source/format/CDataInterface/PyCapsuleInterface.rst#pycapsule-standard

(cc @wjones127 :-)

For your use case, wouldn't putting def __dlpack__(self) -> object: or def __dlpack__(self) -> CapsuleType: be good enough in practice? Sure, it won't catch objects that have a __dlpack__ method that returns some other object, but such objects are unlikely to be common.

I suppose that would work. CapsuleType is IMHO better as it can also help type-check the __dlpack__ implementation, assuming it's written in Python (which it could for wrapper types).

@JelleZijlstra
Copy link
Member

We added CapsuleType (#109600); can we close this?

@pitrou
Copy link
Member Author

pitrou commented Sep 26, 2024

Ah, sorry, I had forgotten to close the issue.

@pitrou pitrou closed this as completed Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-typing type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants