WIP: type annotations #2877

crusaderky · 2019-04-08T00:55:31Z

shoyer

Thanks for diving in here!

I haven't had the chance to go through all of these, but can you verify that mypy xarray passes with these changes? There's one pre-existing failure currently (which will be fixed by #2878), but if you're editing type annotations it's good to ensure that they pass mypy's checks.

The main thing to note that most xarray APIs are actually OK (at least in theory) with arbitrary hashable values in names/dimensions, not only strings. This is probably poorly tested at present, but type annotations should help catch this sort of issue! The main exception would be serialization (e.g., to netCDF), which definitely requires string names.

shoyer · 2019-04-08T06:03:33Z

xarray/core/common.py

+        pass
+
+    @overload  # noqa:F811
+    def get_axis_num(self, dim: Iterable[str]) -> Tuple[int]:


Dimensions are actually allowed to be non-strings, so it might be better to keep this overload more generic, given that that reflects how the code actually works:

Suggested change

def get_axis_num(self, dim: Iterable[str]) -> Tuple[int]:

def get_axis_num(self, dim: Iterable[Hashable]) -> Tuple[int, ...]:

(Also Tuple[int] should be Tuple[int, ...])

If they were intended to be allowed to be non-strings, sure they aren't tested for it!

>>> xarray.DataArray([1,2], dims=[123]) TypeError: dimension 123 is not a string

Also I've never seen it mentioned in the documentation...

fixed Tuple[int, ...]

Hmm. It seems that you can only construct these sorts of names/dimensions with the Dataset API:

In [8]: xarray.Dataset({123: ((456,), [1, 2, 3])}) Out[8]: <xarray.Dataset> Dimensions: (456: 3) Dimensions without coordinates: 456 Data variables: 123 (456) int64 1 2 3

shoyer · 2019-04-08T06:04:18Z

xarray/core/common.py

-    def get_axis_num(self, dim):
+    @overload
+    def get_axis_num(self, dim: str) -> int:
+        pass


I think it's more idiomatic to use ... rather than pass when writing overloads?

shoyer · 2019-04-08T06:05:25Z

xarray/core/common.py

@@ -133,15 +146,15 @@ def get_axis_num(self, dim):
        else:
            return tuple(self._get_axis_num(d) for d in dim)

-    def _get_axis_num(self, dim):
+    def _get_axis_num(self, dim: str) -> int:


Suggested change

def _get_axis_num(self, dim: str) -> int:

def _get_axis_num(self, dim: Hashable) -> int:

shoyer · 2019-04-08T06:06:02Z

xarray/core/common.py

        try:
            return self.dims.index(dim)
        except ValueError:
            raise ValueError("%r not found in array dimensions %r" %
                             (dim, self.dims))

    @property
-    def sizes(self):
+    def sizes(self) -> Frozen[str, int]:


I don't think Frozen is a valid type annotation. Maybe better to stick with Mapping[str, int]?

it is valid as I changed the Frozen class to be a subclass of typing.Mapping

changed to Mapping[str, int] for the sake of user friendliness, as Frozen isn't really part of the public API

shoyer · 2019-04-08T06:09:04Z

xarray/core/common.py

@@ -214,36 +227,38 @@ def _ipython_key_completions_(self):
        return list(set(item_lists))


-def get_squeeze_dims(xarray_obj, dim, axis=None):
+def get_squeeze_dims(xarray_obj, dim: Union[str, Tuple[str, ...], None] = None,


Probably should be something more relaxed, perhaps dim: Union[Hashable, Iterable[Hashable]]

trouble with Iterable is that it doesn't necessarily allow you to iterate upon it twice (e.g. a file handler). I've tweaked the code to allow for it.

shoyer · 2019-04-08T06:10:16Z

xarray/core/common.py

@@ -271,7 +286,7 @@ def squeeze(self, dim=None, drop=False, axis=None):
        dims = get_squeeze_dims(self, dim, axis)
        return self.isel(drop=drop, **{d: 0 for d in dims})

-    def get_index(self, key):
+    def get_index(self, key: str) -> pd.Index:


key can actually be any hashable item here.

crusaderky · 2019-04-08T08:42:48Z

@shoyer how do you feel about raising the minimum Python version to 3.5.2? There's quite a bit of important stuff in it, some of which undocumented. Namely @overload doesn't work with 3.5.0 (although I could conditionally replace it with a no-op thorough a backport module).

crusaderky · 2019-04-08T08:45:42Z

From the 3.5.2 release notes:

A new version of typing.py provides several new classes and features: @overload outside stubs, Reversible, DefaultDict, Text, ContextManager, Type[], NewType(), TYPE_CHECKING, and numerous bug fixes (note that some of the new features are not yet implemented in mypy or other static analyzers). Also classes for PEP 492 (Awaitable, AsyncIterable, AsyncIterator) have been added (in fact they made it into 3.5.1 but were never mentioned).

crusaderky · 2019-04-08T09:16:51Z

@shoyer I am using mypy. But unlike flake8 I don't think I'll be able to integrate it in CI.
This error doesn't make any sense to me:

xarray/core/common.py:124: error: Overloaded function signatures 1 and 2 overlap with incompatible return types

These I think are unfixable in Python 3.5, as I can't declare these under the class definition:

xarray/core/utils.py:445: error: "NdimSizeLenMixin" has no attribute "shape"
xarray/core/utils.py:450: error: "NdimSizeLenMixin" has no attribute "shape"
xarray/core/utils.py:454: error: "NdimSizeLenMixin" has no attribute "shape"
xarray/core/utils.py:468: error: "NDArrayMixin" has no attribute "array"
xarray/core/utils.py:472: error: "NDArrayMixin" has no attribute "array"
xarray/core/utils.py:478: error: "NDArrayMixin" has no attribute "array"
xarray/core/common.py:151: error: "AbstractArray" has no attribute "dims"
xarray/core/common.py:154: error: "AbstractArray" has no attribute "dims"
xarray/core/common.py:166: error: "AbstractArray" has no attribute "dims"
xarray/core/common.py:166: error: "AbstractArray" has no attribute "shape"
xarray/core/common.py:896: error: Cannot determine type of '_file_obj'
xarray/core/common.py:897: error: Cannot determine type of '_file_obj'

(additionally, typing doesn't offer anything for "file-like" objects)

crusaderky · 2019-04-08T09:29:29Z

Mmh it looks like if I run mypy in Python 3.7 I get a much terser output. Still some horrible broken things to care about, like the fact that in 3.5 typing.Mapping did not inherit from collections.abc.Mapping, but they can be worked around somehow.

crusaderky · 2019-04-08T10:01:26Z

@shoyer I have a problem.
In Python 3.5.0-3.5.2, to create a custom typed collection you have to write

class Frozen(collections.abc.Mapping, typing.Mapping[K, V]):

Whereas in Python >= 3.5.3, you must just write

class Frozen(typing.Mapping[K, V]):

Things won't work if you use the < 3.5.3 syntax.

As an unrelated note, I'm observing that if I set python=3.5.3 in ci/requirements-py35.yml, a ton of unit tests fall apart. The test suite ci/requirements-py35-min.yml ensures compatibility with Python 3.5.0... but only as long as you don't use any of the non-mandatory dependencies.
Can we 1) increase the minimum requirements to 3.5.4 and 2) set python=3.5.4 in ci/requirements-py35.yml?
Is there a roadmap for dropping 3.5 entirely? (conda-forge already did...)

crusaderky · 2019-04-08T10:51:35Z

Opened https://bugs.python.org/issue36555

See: https://mypy.readthedocs.io/en/latest/more_types.html#type-checking-the-variants

max-sixty · 2019-04-08T15:26:42Z

FYI we added some code here to deal with python 3.5.0-3.5.2: #2831

shoyer · 2019-04-10T00:46:45Z

If it's not too much trouble, I'd like to keep the xarray (with type checking) working on Python 3.5.0 for now. The type checks themselves only need to work on Python 3.7.

This should be relatively straightforward if you put type annotations in strings as needed and guard typing imports in if TYPE_CHECKING blocks.

crusaderky · 2019-04-10T15:29:04Z

@shoyer I changed dimensions from str to Hashable as requested. In my opinion however it's not great, because

it's currently thoroughly broken due to lack of unit tests
it weakens type checking a lot, since Tuple is a subclass of Hashable however the vast majority of xarray functions deal with tuples with a specific code branch

Please review again - in theory, common.py and util.py are now 100% done. They pass flake8 and mypy (on python 3.7). If you like them please go on and merge into Master; since the work that needs to be done is going to touch literally everything I think it's best to break it down into more manageable chunks.

shoyer · 2019-04-10T15:56:24Z

xarray/core/common.py

+    def coarsen(self, dim: Optional[Mapping[Hashable, int]] = None,
+                boundary: str = 'exact',
+                side: Union[str, Mapping[Hashable, str]] = 'left',
+                coord_func: str = 'mean',


It looks like coord_func is more flexible than this. It looks like this should probably: Union[str, Mapping[Any, str]].

didn't you say that all dims should be Hashable?

I think Mapping[Any, str] and Mapping[Hashable, str] are equivalent -- every mapping key must be hashable.

No, there's absolutely nothing in Mapping saying that the keys must be hashable.

This is perfectly legit, and pretty useful too:

class AnyKeyDict(MutableMapping[K, V]): def __init__(self, *args: Tuple[K, V], **kwargs: V):: self.mapping = {} for k, v in args: self[k] = v for k, v in kwargs.items(): self[k] = v def __setitem__(self, k: K, v: V) -> None: try: self.mapping[k] = v except TypeError: self.mapping[pickle.dumps(k)] = v ...

This can be useful in a few borderline scenarios; for example when de-serialising msgpack data from a non-Python producer, as in msgpack map keys can be anything. Also I'd love to see something like this:

class DataArray: def __setitem__(self, k, v): if isinstance(k, Mapping): view = self.sel(**k) else: view = self.isel(**k) view[:] = v

OK, good point.

It turns out we actually do already support this for DataArray :)

>>> data = xarray.DataArray([0, 0, 0], dims='x') >>> data[{'x': 1}] = 1 >>> data <xarray.DataArray (x: 3)> array([0, 1, 0]) Dimensions without coordinates: x

shoyer · 2019-04-10T16:35:35Z

xarray/core/utils.py

+def ordered_dict_intersection(first_dict: Mapping[K, V],
+                              second_dict: Mapping[K, V],
+                              compat: Callable[[V, V], bool] = equivalent
+                              ) -> MutableMapping[K, V]:


This should probably be 'OrderedDict[K, V]'

That's painful to achieve, because typing.OrderedDict was added in python 3.7.2, and I'm scared of hacking about what is such a fundamental class for xarray.
Also, are there any methods that OrderedDict offers that MutableMapping doesn't?

If you put it in quotes, it's only checked when running mypy. That seems to work without any hacking, e.g., see

xarray/xarray/core/merge.py

Line 28 in 742ed39

) -> 'OrderedDict[Any, int]':

that would mean that anybody with less than Python 3.7.2 will get errors when running mypy...

Personally, I'm OK with that. But it may be worth clarifying some cohesive policy.

shoyer · 2019-04-10T16:36:22Z

xarray/core/utils.py

-    def __init__(self, values=None):
-        self._ordered_dict = OrderedDict()
+    def __init__(self, values: Optional[AbstractSet[T]] = None) -> None:
+        self._ordered_dict = OrderedDict()  # type: MutableMapping[T, None]


You can use type: OrderedDict[T, None]

same as above

shoyer · 2019-04-10T16:38:14Z

xarray/core/utils.py

        if values is not None:
-            self |= values
+            self.__ior__(values)


Does mypy not recognize |= as valid? That's usually preferred since that's the operator __ior__ corresponds to.

If not, that's OK but please leave a comment.

Neither mypy nor PyCharm are happy with it.
It has to do with the signature of __ior__:

def __ior__(self: MutableSet[T1], other: AbstractSet[T2]) -> MutableSet[Union[T1, T2]]: ...

When you do self |= other, both PyCharm and mypy understand that you're changing the type of self in place, in other word they treat it to be the same as self = self | other.
Calling __ior__ directly bypasses that.
This seems to be like a limitation of both tools like in our specific cases we explicitly specified that other is an AbstractSet[T1]. I'll revert to |= and just disable the type checking.

OK, I'm happy with a comment, too. It's also not a big deal to disable type checking on particular lines when we're sure the overall method/function signature is right.

I'm not entirely sure, but It might help to annotate self here

shoyer · 2019-04-10T16:38:21Z

xarray/core/utils.py

-    def update(self, values):
-        self |= values
+    def update(self, values: AbstractSet[T]) -> None:
+        self.__ior__(values)


same as above

shoyer · 2019-04-10T16:39:30Z

xarray/core/utils.py

+class ReprObject:
+    """Object that prints as the given value, for use with sentinel values.
+    """
+    def __init__(self, value: str) -> None:


note: I think mypy is OK with skipping -> None for __init__ methods.

I was going with PEP 484:

(Note that the return type of init ought to be annotated with -> None. The reason for this is subtle. If init assumed a return annotation of -> None, would that mean that an argument-less, un-annotated init method should still be type-checked? Rather than leaving this ambiguous or introducing an exception to the exception, we simply say that init ought to have a return annotation; the default behavior is thus the same as for other methods.)

but I've now read python/mypy#3358, which likely means a revision of the PEP in the opposite sense is incoming. I'll remove the -> None from __init__.

shoyer · 2019-04-10T16:55:07Z

@shoyer I changed dimensions from str to Hashable as requested. In my opinion however it's not great, because

it's currently thoroughly broken due to lack of unit tests

it weakens type checking a lot, since Tuple is a subclass of Hashable however the vast majority of xarray functions deal with tuples with a specific code branch

I agree it's not great. But some of our APIs are going to be hard to type check regardless, e.g., in cases where we have different behavior for strings vs sequences of strings.

crusaderky · 2019-04-10T17:11:27Z

@shoyer implemented all change requests except OrderedDict

shoyer · 2019-04-10T18:41:55Z

thanks @crusaderky !

* master: (29 commits) Handle the character array dim name (pydata#2896) Partial fix for pydata#2841 to improve formatting. (pydata#2906) docs: Move quick overview one level up (pydata#2890) Manually specify chunks in open_zarr (pydata#2530) Minor improvement of docstring for Dataset (pydata#2904) Fix minor typos in docstrings (pydata#2903) Added docs example for `xarray.Dataset.get()` (pydata#2894) Bugfix for docs build instructions (pydata#2897) Return correct count for scalar datetime64 arrays (pydata#2892) Indexing with an empty array (pydata#2883) BUG: Fix pydata#2864 by adding the missing vrt parameters (pydata#2865) Reduce length of cftime resample tests (pydata#2879) WIP: type annotations (pydata#2877) decreased pytest verbosity (pydata#2881) Fix mypy typing error in cftime_offsets.py (pydata#2878) update links to https (pydata#2872) revert to 0.12.2 dev 0.12.1 release Various fixes for explicit Dataset.indexes (pydata#2858) Fix minor typo in docstring (pydata#2860) ...

Guido Imperiale added 2 commits April 8, 2019 01:54

typing for core.common and core.utils

b9b0543

Merge remote-tracking branch 'upstream/master' into type_annotations

a646404

shoyer reviewed Apr 8, 2019

View reviewed changes

Cleanup of core.common core.utils

ae14ff5

Remove ambiguous overload

d640c8b

See: https://mypy.readthedocs.io/en/latest/more_types.html#type-checking-the-variants

Guido Imperiale added 6 commits April 10, 2019 11:34

dims are generic Hashable; support for Python 3.5.0~3.5.2

fd31571

fixes

1028c71

fixes

bdbb4fe

More polish for common and utils

b1a0e9e

More polish for common and utils

266d8f4

More polish for common and utils

c9d0c1a

shoyer reviewed Apr 10, 2019

View reviewed changes

code review

81701ab

shoyer merged commit 3fb22cb into pydata:master Apr 10, 2019

crusaderky mentioned this pull request Apr 10, 2019

xarray.concat docstring confuses PyCharm type detection #2869

Closed

crusaderky deleted the type_annotations branch April 24, 2019 14:54

	def get_axis_num(self, dim: Iterable[str]) -> Tuple[int]:
	def get_axis_num(self, dim: Iterable[Hashable]) -> Tuple[int, ...]:

	def _get_axis_num(self, dim: str) -> int:
	def _get_axis_num(self, dim: Hashable) -> int:

WIP: type annotations #2877

WIP: type annotations #2877

Conversation

crusaderky commented Apr 8, 2019

shoyer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

crusaderky commented Apr 8, 2019

crusaderky commented Apr 8, 2019

crusaderky commented Apr 8, 2019

crusaderky commented Apr 8, 2019

crusaderky commented Apr 8, 2019

crusaderky commented Apr 8, 2019

max-sixty commented Apr 8, 2019

shoyer commented Apr 10, 2019

crusaderky commented Apr 10, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shoyer commented Apr 10, 2019

crusaderky commented Apr 10, 2019

shoyer commented Apr 10, 2019