Protocol for array objects #229
In general this seems like a perfect case for the use of a Protocol, so +1! Based on my experience with the typing of numpy I do have a couple of comments:
Currently there exists the type-check-only
While
As a reminder: for
On the subject of shapes I'd like to point everyone's attention to PEP 646: Variadic Generics. Namely, it would allow us to define a single shape type that takes a variable number of parameters (i.e. axis lengths), just like tuple can be parameterized today.
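As a rough illustration of the PEP 646 point above, here is a minimal sketch (my own, assuming Python 3.11+ or typing_extensions; ShapedArray, Height, and Width are hypothetical names, not part of the spec):

# Minimal PEP 646 sketch: the shape becomes a variadic type parameter, so one
# generic class covers arrays of any rank, just like tuple[...] does.
from typing import Generic, NewType, TypeVarTuple, Unpack

Shape = TypeVarTuple("Shape")

class ShapedArray(Generic[Unpack[Shape]]):
    @property
    def shape(self) -> "tuple[Unpack[Shape]]": ...

Height = NewType("Height", int)
Width = NewType("Width", int)

x: ShapedArray[Height, Width]       # rank-2 array
y: ShapedArray[Height, Width, int]  # rank-3 array, same generic class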
Awesome, I didn't know about or find that. This unblocks us, and we could add the protocol to the specification. Current iteration:
import enum
from typing import (
    TYPE_CHECKING,
    Any,
    Optional,
    Protocol,
    Tuple,
    TypeVar,
    Union,
    runtime_checkable,
)

if TYPE_CHECKING:
    from builtins import ellipsis

A = TypeVar("A")


@runtime_checkable
class ArrayProtocol(Protocol[A]):
    @property
    def dtype(self) -> Any:
        ...

    @property
    def device(self) -> Any:
        ...

    @property
    def ndim(self) -> int:
        ...

    @property
    def shape(self) -> Tuple[int, ...]:
        ...

    @property
    def size(self) -> int:
        ...

    @property
    def T(self) -> A:
        ...

    def __abs__(self) -> A:
        ...

    def __add__(self, other: Union[int, float, A], /) -> A:
        ...

    def __and__(self, other: Union[bool, int, A], /) -> A:
        ...

    def __array_namespace__(self, /, *, api_version: Optional[str] = None) -> Any:
        ...

    def __bool__(self) -> bool:
        ...

    def __dlpack__(self, /, *, stream: Optional[Union[int, Any]] = None) -> Any:
        ...

    def __dlpack_device__(self) -> Tuple[enum.IntEnum, int]:
        ...

    # This overrides the input type, since object.__eq__ handles any input
    # This overrides the return type, since object.__eq__ returns a bool
    def __eq__(  # type: ignore[override]
        self,
        other: Union[bool, int, float, A],
        /,
    ) -> A:  # type: ignore[override]
        ...

    def __float__(self) -> float:
        ...

    def __floordiv__(self, other: Union[int, float, A], /) -> A:
        ...

    def __ge__(self, other: Union[int, float, A], /) -> A:
        ...

    def __getitem__(
        self,
        key: Union[
            int,
            slice,
            "ellipsis",
            Tuple[Union[int, slice, "ellipsis"], ...],
            A,
        ],
        /,
    ) -> A:
        ...

    def __gt__(self, other: Union[int, float, A], /) -> A:
        ...

    def __int__(self) -> int:
        ...

    def __invert__(self) -> A:
        ...

    def __le__(self, other: Union[int, float, A], /) -> A:
        ...

    def __len__(self) -> int:
        ...

    def __lshift__(self, other: Union[int, A], /) -> A:
        ...

    def __lt__(self, other: Union[int, float, A], /) -> A:
        ...

    def __matmul__(self, other: A) -> A:
        ...

    def __mod__(self, other: Union[int, float, A], /) -> A:
        ...

    def __mul__(self, other: Union[int, float, A], /) -> A:
        ...

    # This overrides the input type, since object.__ne__ handles any input
    # This overrides the return type, since object.__ne__ returns a bool
    def __ne__(  # type: ignore[override]
        self, other: Union[bool, int, float, A], /
    ) -> A:  # type: ignore[override]
        ...

    def __neg__(self) -> A:
        ...

    def __or__(self, other: Union[bool, int, A], /) -> A:
        ...

    def __pos__(self) -> A:
        ...

    def __pow__(self, other: Union[int, float, A], /) -> A:
        ...

    def __rshift__(self, other: Union[int, A], /) -> A:
        ...

    def __setitem__(
        self,
        key: Union[
            int,
            slice,
            "ellipsis",
            Tuple[Union[int, slice, "ellipsis"], ...],
            A,
        ],
        value: Union[bool, int, float, A],
        /,
    ) -> None:
        ...

    def __sub__(self, other: Union[int, float, A], /) -> A:
        ...

    def __truediv__(self, other: Union[int, float, A], /) -> A:
        ...

    def __xor__(self, other: Union[bool, int, A], /) -> A:
        ...
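To make the intent concrete, a small usage sketch (my own illustration, not from the thread): downstream code can annotate against the protocol without depending on any particular array library, because conformance is purely structural.

# Hypothetical downstream function: accepts anything that structurally
# satisfies ArrayProtocol (no inheritance or runtime dependency required).
def total_elements(x: ArrayProtocol) -> int:
    return x.size

# Because the protocol is @runtime_checkable, isinstance() checks also work,
# but they only verify that the attributes/methods exist, not their signatures.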
This is only true if

class B:
    def __add__(self, other):
        return "B"


class C(B):
    def __radd__(self, other):
        return self.__add__(other)


class A:
    def __add__(self, other):
        if isinstance(other, B):
            return NotImplemented
        return "A"


a = A()
b = B()
c = C()
print(a + c)
print(a + b)
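For reference (this follows from Python's standard operator dispatch, not from the comment itself): a + c prints "B", because A.__add__ returns NotImplemented and Python then falls back to C.__radd__, while a + b raises a TypeError, because B defines no __radd__ to fall back to.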
Fair enough, I implicitly assumed that both classes define the reflected dunders:

class A:
    def __add__(self, other: A) -> A: ...
    def __radd__(self, other: A) -> A: ...


class B:
    def __add__(self, other: A | B) -> B: ...
    def __radd__(self, other: A | B) -> B: ...


a: A
b: B

reveal_type(a + a)  # Revealed type is '__main__.A'
reveal_type(a + b)  # Revealed type is '__main__.B'
reveal_type(b + a)  # Revealed type is '__main__.B'
reveal_type(b + b)  # Revealed type is '__main__.B'
The NotImplemented question came up in the NumPy PR as well (numpy/numpy#18585 (comment)). My opinion is that the spec should not disallow returning NotImplemented, but also not require it. Libraries like NumPy have their own dispatching mechanism that isn't based on NotImplemented. I also think it should be out of scope for the spec to define how that dispatching works. These are just my opinions though, so maybe others will disagree.
@pmeier can you summarize the issue if every array library vendors this protocol, instead of putting it in a common package and inheriting from it? That still has a limitation, right?
"not allow" --> "allow" ?
+1 to wait for typeshed here (or go help them along if it starts taking too long).
Is there a problem with using
Yes, that is what I meant.
Not sure what you mean here. If this is about my earlier concern (that I only voiced offline), it seems to work out fine with the vendored protocol, which by definition cannot be related to an actual array implementation. If you have a look at the playground repo, and specifically this file,
Yes, see data-apis/array-api-tests#15 (comment). But this is not solved by variadic generics either, IIUC.
I've tested the protocol against
Right, to clarify: the reason for excluding subscription was for the sake of creating a compact example.
Assuming we won't run into any bugs with type checkers, the sole "issue" here would be code duplication.
Good. I'm not bothered by code duplication; I'm pretty sure that all array libraries will prefer that over adding a runtime dependency. However, we still need a way for downstream consumers to access this protocol - downstream packages may be less happy to vendor, and end users even less so. That could be solved by:
We will anyway need a canonical version somewhere to ensure vendored copies can be updated and remain in sync. So it may be preferable to go with (1).
We had a discussion about this today. A standalone package may indeed be nice, and then we also have a place where other utilities or tests can be added (this has come up several times before, and we always tried to avoid a package because that's work to maintain - but here it seems like we can't avoid that anyway). In terms of vendoring, it may be possible to avoid that by (in an array library) doing:

try:
    from array_pkgname import ArrayProtocol
except ImportError:
    # this is an optional dependency; static type checking against the Protocol
    # won't work for downstream users if the package is not installed
    class ArrayProtocol:
        pass


class ndarray(ArrayProtocol):
    # array object in this library (could also be named Array, Tensor, etc.)
    ...

A number of libraries don't support static typing at all yet (TensorFlow, CuPy), while some others (NumPy, PyTorch, MXNet) do. Then the only libraries that may want to vendor the protocol are the ones that want to use it in their own test suite.
Related discussion (somehow totally missed this interesting issue agh) |
I've tried to tackle static typing and got a vendorable protocol that can be checked statically as well as at runtime, for all but one case, which I'm going to detail below.
To test everything yourself you can use this playground repo.
Current blocker
It is currently impossible to use Ellipsis in type annotations, since its alias (the literal ...) has a different meaning there. Thus, it is currently impossible to correctly annotate the __getitem__ and __setitem__ methods. There is a fix for this in python/cpython#22336, but it will only be shipped with Python 3.10. If we leave it out of the annotation, accessing the array with something like Array()[..., 0] will be flagged by mypy, although it should be supported according to the specification.
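For context, a minimal sketch of the TYPE_CHECKING workaround used in the protocol above (my own illustration; check_key is a hypothetical function):

from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # typeshed exposes the type of the `...` singleton as builtins.ellipsis;
    # it does not exist at runtime, hence the guard and the string annotation.
    from builtins import ellipsis

def check_key(key: Union[int, slice, "ellipsis"]) -> None:
    # Writing the literal ... directly in the annotation is rejected, since it
    # has a different meaning there (e.g. Callable[..., int]).
    ...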
Suggested improvements

While working on the protocol I found a few issues that could be addressed:
Array.dtype, Array.device, Array.__array_namespace__(), and Array.__dlpack__() should return custom objects, but it is not specified what these objects look like. In the current state of the protocol I've typed them as Any, but the specification should be more precise.

Array.shape should return Tuple[int, ...], but "explicitly check for Python scalars in constant tests" (array-api-tests#15 (comment)) implies that custom objects might also be possible. Maybe we can use Sequence[int]?

The type annotation of the stream parameter of Array.__dlpack__() reads Optional[Union[int, Any]], which is equivalent to plain Any; the latter is more concise.

The binary dunder methods take specific input types for the other parameter. For example, __add__ takes Union[int, float, Array]. IMO they should take Any and return NotImplemented in case they cannot work with the type (see the sketch after this list). This makes it harder for static type checkers to catch bugs, because statically something like Array() + None would be allowed, but it gives the other object a chance to work with the Array object by implementing the reflected dunder (here __radd__). If both objects do not know how to deal with the addition, Python will automatically raise a TypeError.

Since the object class already defines __eq__ and __ne__ methods, annotating them according to the proposed scheme above required me to put # type: ignore[override] directives in the protocol for the input types.
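A minimal sketch of the NotImplemented-based scheme described in the binary-dunder point above (the Array class and its body are placeholders of my own, not spec wording):

from typing import Any

class Array:
    # Placeholder array class used only to illustrate the proposed scheme.
    def __add__(self, other: Any) -> Any:
        if not isinstance(other, (int, float, Array)):
            # Returning NotImplemented lets Python try other.__radd__(self);
            # if that also fails, Python raises TypeError automatically.
            return NotImplemented
        return Array()  # stand-in for the actual elementwise addition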