Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: type annotations #2877

Merged
merged 11 commits into from
Apr 10, 2019
Merged

WIP: type annotations #2877

merged 11 commits into from
Apr 10, 2019

Conversation

crusaderky
Copy link
Contributor

Fixes #2869

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for diving in here!

I haven't had the chance to go through all of these, but can you verify that mypy xarray passes with these changes? There's one pre-existing failure currently (which will be fixed by #2878), but if you're editing type annotations it's good to ensure that they pass mypy's checks.

The main thing to note that most xarray APIs are actually OK (at least in theory) with arbitrary hashable values in names/dimensions, not only strings. This is probably poorly tested at present, but type annotations should help catch this sort of issue! The main exception would be serialization (e.g., to netCDF), which definitely requires string names.

pass

@overload # noqa:F811
def get_axis_num(self, dim: Iterable[str]) -> Tuple[int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dimensions are actually allowed to be non-strings, so it might be better to keep this overload more generic, given that that reflects how the code actually works:

Suggested change
def get_axis_num(self, dim: Iterable[str]) -> Tuple[int]:
def get_axis_num(self, dim: Iterable[Hashable]) -> Tuple[int, ...]:

(Also Tuple[int] should be Tuple[int, ...])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they were intended to be allowed to be non-strings, sure they aren't tested for it!

>>> xarray.DataArray([1,2], dims=[123])
TypeError: dimension 123 is not a string

Also I've never seen it mentioned in the documentation...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed Tuple[int, ...]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. It seems that you can only construct these sorts of names/dimensions with the Dataset API:

In [8]: xarray.Dataset({123: ((456,), [1, 2, 3])})
Out[8]:
<xarray.Dataset>
Dimensions:  (456: 3)
Dimensions without coordinates: 456
Data variables:
    123      (456) int64 1 2 3

def get_axis_num(self, dim):
@overload
def get_axis_num(self, dim: str) -> int:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more idiomatic to use ... rather than pass when writing overloads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -133,15 +146,15 @@ def get_axis_num(self, dim):
else:
return tuple(self._get_axis_num(d) for d in dim)

def _get_axis_num(self, dim):
def _get_axis_num(self, dim: str) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _get_axis_num(self, dim: str) -> int:
def _get_axis_num(self, dim: Hashable) -> int:

try:
return self.dims.index(dim)
except ValueError:
raise ValueError("%r not found in array dimensions %r" %
(dim, self.dims))

@property
def sizes(self):
def sizes(self) -> Frozen[str, int]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think Frozen is a valid type annotation. Maybe better to stick with Mapping[str, int]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is valid as I changed the Frozen class to be a subclass of typing.Mapping

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to Mapping[str, int] for the sake of user friendliness, as Frozen isn't really part of the public API

@@ -214,36 +227,38 @@ def _ipython_key_completions_(self):
return list(set(item_lists))


def get_squeeze_dims(xarray_obj, dim, axis=None):
def get_squeeze_dims(xarray_obj, dim: Union[str, Tuple[str, ...], None] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be something more relaxed, perhaps dim: Union[Hashable, Iterable[Hashable]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trouble with Iterable is that it doesn't necessarily allow you to iterate upon it twice (e.g. a file handler). I've tweaked the code to allow for it.

@@ -271,7 +286,7 @@ def squeeze(self, dim=None, drop=False, axis=None):
dims = get_squeeze_dims(self, dim, axis)
return self.isel(drop=drop, **{d: 0 for d in dims})

def get_index(self, key):
def get_index(self, key: str) -> pd.Index:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

key can actually be any hashable item here.

@crusaderky
Copy link
Contributor Author

@shoyer how do you feel about raising the minimum Python version to 3.5.2? There's quite a bit of important stuff in it, some of which undocumented. Namely @overload doesn't work with 3.5.0 (although I could conditionally replace it with a no-op thorough a backport module).

@crusaderky
Copy link
Contributor Author

From the 3.5.2 release notes:

A new version of typing.py provides several new classes and features: @overload outside stubs, Reversible, DefaultDict, Text, ContextManager, Type[], NewType(), TYPE_CHECKING, and numerous bug fixes (note that some of the new features are not yet implemented in mypy or other static analyzers). Also classes for PEP 492 (Awaitable, AsyncIterable, AsyncIterator) have been added (in fact they made it into 3.5.1 but were never mentioned).

@crusaderky
Copy link
Contributor Author

@shoyer I am using mypy. But unlike flake8 I don't think I'll be able to integrate it in CI.
This error doesn't make any sense to me:

xarray/core/common.py:124: error: Overloaded function signatures 1 and 2 overlap with incompatible return types

These I think are unfixable in Python 3.5, as I can't declare these under the class definition:

xarray/core/utils.py:445: error: "NdimSizeLenMixin" has no attribute "shape"
xarray/core/utils.py:450: error: "NdimSizeLenMixin" has no attribute "shape"
xarray/core/utils.py:454: error: "NdimSizeLenMixin" has no attribute "shape"
xarray/core/utils.py:468: error: "NDArrayMixin" has no attribute "array"
xarray/core/utils.py:472: error: "NDArrayMixin" has no attribute "array"
xarray/core/utils.py:478: error: "NDArrayMixin" has no attribute "array"
xarray/core/common.py:151: error: "AbstractArray" has no attribute "dims"
xarray/core/common.py:154: error: "AbstractArray" has no attribute "dims"
xarray/core/common.py:166: error: "AbstractArray" has no attribute "dims"
xarray/core/common.py:166: error: "AbstractArray" has no attribute "shape"
xarray/core/common.py:896: error: Cannot determine type of '_file_obj'
xarray/core/common.py:897: error: Cannot determine type of '_file_obj'

(additionally, typing doesn't offer anything for "file-like" objects)

@crusaderky
Copy link
Contributor Author

Mmh it looks like if I run mypy in Python 3.7 I get a much terser output. Still some horrible broken things to care about, like the fact that in 3.5 typing.Mapping did not inherit from collections.abc.Mapping, but they can be worked around somehow.

@crusaderky
Copy link
Contributor Author

@shoyer I have a problem.
In Python 3.5.0-3.5.2, to create a custom typed collection you have to write

class Frozen(collections.abc.Mapping, typing.Mapping[K, V]):

Whereas in Python >= 3.5.3, you must just write

class Frozen(typing.Mapping[K, V]):

Things won't work if you use the < 3.5.3 syntax.

As an unrelated note, I'm observing that if I set python=3.5.3 in ci/requirements-py35.yml, a ton of unit tests fall apart. The test suite ci/requirements-py35-min.yml ensures compatibility with Python 3.5.0... but only as long as you don't use any of the non-mandatory dependencies.
Can we 1) increase the minimum requirements to 3.5.4 and 2) set python=3.5.4 in ci/requirements-py35.yml?
Is there a roadmap for dropping 3.5 entirely? (conda-forge already did...)

@crusaderky
Copy link
Contributor Author

Opened https://bugs.python.org/issue36555

@max-sixty
Copy link
Collaborator

FYI we added some code here to deal with python 3.5.0-3.5.2: #2831

@shoyer
Copy link
Member

shoyer commented Apr 10, 2019

If it's not too much trouble, I'd like to keep the xarray (with type checking) working on Python 3.5.0 for now. The type checks themselves only need to work on Python 3.7.

This should be relatively straightforward if you put type annotations in strings as needed and guard typing imports in if TYPE_CHECKING blocks.

@crusaderky
Copy link
Contributor Author

@shoyer I changed dimensions from str to Hashable as requested. In my opinion however it's not great, because

  1. it's currently thoroughly broken due to lack of unit tests
  2. it weakens type checking a lot, since Tuple is a subclass of Hashable however the vast majority of xarray functions deal with tuples with a specific code branch

Please review again - in theory, common.py and util.py are now 100% done. They pass flake8 and mypy (on python 3.7). If you like them please go on and merge into Master; since the work that needs to be done is going to touch literally everything I think it's best to break it down into more manageable chunks.

def coarsen(self, dim: Optional[Mapping[Hashable, int]] = None,
boundary: str = 'exact',
side: Union[str, Mapping[Hashable, str]] = 'left',
coord_func: str = 'mean',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like coord_func is more flexible than this. It looks like this should probably: Union[str, Mapping[Any, str]].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't you say that all dims should be Hashable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Mapping[Any, str] and Mapping[Hashable, str] are equivalent -- every mapping key must be hashable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there's absolutely nothing in Mapping saying that the keys must be hashable.

This is perfectly legit, and pretty useful too:

class AnyKeyDict(MutableMapping[K, V]):
    def __init__(self, *args: Tuple[K, V], **kwargs: V)::
       self.mapping = {}
       for k, v in args:
           self[k] = v
       for k, v in kwargs.items():
           self[k] = v

    def __setitem__(self, k: K, v: V) -> None:
        try:
            self.mapping[k] = v
        except TypeError:
            self.mapping[pickle.dumps(k)] = v

    ...

This can be useful in a few borderline scenarios; for example when de-serialising msgpack data from a non-Python producer, as in msgpack map keys can be anything. Also I'd love to see something like this:

class DataArray:
    def __setitem__(self, k, v):
        if isinstance(k, Mapping):
            view = self.sel(**k)
        else:
            view = self.isel(**k)
        view[:] = v

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, good point.

It turns out we actually do already support this for DataArray :)

>>> data = xarray.DataArray([0, 0, 0], dims='x')
>>> data[{'x': 1}] = 1
>>> data
<xarray.DataArray (x: 3)>
array([0, 1, 0])
Dimensions without coordinates: x

def ordered_dict_intersection(first_dict: Mapping[K, V],
second_dict: Mapping[K, V],
compat: Callable[[V, V], bool] = equivalent
) -> MutableMapping[K, V]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be 'OrderedDict[K, V]'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's painful to achieve, because typing.OrderedDict was added in python 3.7.2, and I'm scared of hacking about what is such a fundamental class for xarray.
Also, are there any methods that OrderedDict offers that MutableMapping doesn't?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you put it in quotes, it's only checked when running mypy. That seems to work without any hacking, e.g., see

) -> 'OrderedDict[Any, int]':

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would mean that anybody with less than Python 3.7.2 will get errors when running mypy...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm OK with that. But it may be worth clarifying some cohesive policy.

def __init__(self, values=None):
self._ordered_dict = OrderedDict()
def __init__(self, values: Optional[AbstractSet[T]] = None) -> None:
self._ordered_dict = OrderedDict() # type: MutableMapping[T, None]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use type: OrderedDict[T, None]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

if values is not None:
self |= values
self.__ior__(values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does mypy not recognize |= as valid? That's usually preferred since that's the operator __ior__ corresponds to.

If not, that's OK but please leave a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither mypy nor PyCharm are happy with it.
It has to do with the signature of __ior__:

def __ior__(self: MutableSet[T1], other: AbstractSet[T2]) -> MutableSet[Union[T1, T2]]:
    ...

When you do self |= other, both PyCharm and mypy understand that you're changing the type of self in place, in other word they treat it to be the same as self = self | other.
Calling __ior__ directly bypasses that.
This seems to be like a limitation of both tools like in our specific cases we explicitly specified that other is an AbstractSet[T1]. I'll revert to |= and just disable the type checking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm happy with a comment, too. It's also not a big deal to disable type checking on particular lines when we're sure the overall method/function signature is right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure, but It might help to annotate self here

def update(self, values):
self |= values
def update(self, values: AbstractSet[T]) -> None:
self.__ior__(values)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

class ReprObject:
"""Object that prints as the given value, for use with sentinel values.
"""
def __init__(self, value: str) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I think mypy is OK with skipping -> None for __init__ methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going with PEP 484:

(Note that the return type of init ought to be annotated with -> None. The reason for this is subtle. If init assumed a return annotation of -> None, would that mean that an argument-less, un-annotated init method should still be type-checked? Rather than leaving this ambiguous or introducing an exception to the exception, we simply say that init ought to have a return annotation; the default behavior is thus the same as for other methods.)

but I've now read python/mypy#3358, which likely means a revision of the PEP in the opposite sense is incoming. I'll remove the -> None from __init__.

@shoyer
Copy link
Member

shoyer commented Apr 10, 2019

@shoyer I changed dimensions from str to Hashable as requested. In my opinion however it's not great, because

  1. it's currently thoroughly broken due to lack of unit tests
  2. it weakens type checking a lot, since Tuple is a subclass of Hashable however the vast majority of xarray functions deal with tuples with a specific code branch

I agree it's not great. But some of our APIs are going to be hard to type check regardless, e.g., in cases where we have different behavior for strings vs sequences of strings.

@crusaderky
Copy link
Contributor Author

@shoyer implemented all change requests except OrderedDict

@shoyer shoyer merged commit 3fb22cb into pydata:master Apr 10, 2019
@shoyer
Copy link
Member

shoyer commented Apr 10, 2019

thanks @crusaderky !

dcherian added a commit to yohai/xarray that referenced this pull request Apr 19, 2019
* master: (29 commits)
  Handle the character array dim name  (pydata#2896)
  Partial fix for pydata#2841 to improve formatting. (pydata#2906)
  docs: Move quick overview one level up (pydata#2890)
  Manually specify chunks in open_zarr (pydata#2530)
  Minor improvement of docstring for Dataset (pydata#2904)
  Fix minor typos in docstrings (pydata#2903)
  Added docs example for `xarray.Dataset.get()` (pydata#2894)
  Bugfix for docs build instructions (pydata#2897)
  Return correct count for scalar datetime64 arrays (pydata#2892)
  Indexing with an empty array (pydata#2883)
  BUG: Fix pydata#2864 by adding the missing vrt parameters (pydata#2865)
  Reduce length of cftime resample tests (pydata#2879)
  WIP: type annotations (pydata#2877)
  decreased pytest verbosity (pydata#2881)
  Fix mypy typing error in cftime_offsets.py (pydata#2878)
  update links to https (pydata#2872)
  revert to 0.12.2 dev
  0.12.1 release
  Various fixes for explicit Dataset.indexes (pydata#2858)
  Fix minor typo in docstring (pydata#2860)
  ...
@crusaderky crusaderky deleted the type_annotations branch April 24, 2019 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants