-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: type annotations #2877
WIP: type annotations #2877
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for diving in here!
I haven't had the chance to go through all of these, but can you verify that mypy xarray
passes with these changes? There's one pre-existing failure currently (which will be fixed by #2878), but if you're editing type annotations it's good to ensure that they pass mypy's checks.
The main thing to note that most xarray APIs are actually OK (at least in theory) with arbitrary hashable values in names/dimensions, not only strings. This is probably poorly tested at present, but type annotations should help catch this sort of issue! The main exception would be serialization (e.g., to netCDF), which definitely requires string names.
xarray/core/common.py
Outdated
pass | ||
|
||
@overload # noqa:F811 | ||
def get_axis_num(self, dim: Iterable[str]) -> Tuple[int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dimensions are actually allowed to be non-strings, so it might be better to keep this overload more generic, given that that reflects how the code actually works:
def get_axis_num(self, dim: Iterable[str]) -> Tuple[int]: | |
def get_axis_num(self, dim: Iterable[Hashable]) -> Tuple[int, ...]: |
(Also Tuple[int]
should be Tuple[int, ...]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they were intended to be allowed to be non-strings, sure they aren't tested for it!
>>> xarray.DataArray([1,2], dims=[123])
TypeError: dimension 123 is not a string
Also I've never seen it mentioned in the documentation...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed Tuple[int, ...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It seems that you can only construct these sorts of names/dimensions with the Dataset API:
In [8]: xarray.Dataset({123: ((456,), [1, 2, 3])})
Out[8]:
<xarray.Dataset>
Dimensions: (456: 3)
Dimensions without coordinates: 456
Data variables:
123 (456) int64 1 2 3
xarray/core/common.py
Outdated
def get_axis_num(self, dim): | ||
@overload | ||
def get_axis_num(self, dim: str) -> int: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more idiomatic to use ...
rather than pass
when writing overloads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
xarray/core/common.py
Outdated
@@ -133,15 +146,15 @@ def get_axis_num(self, dim): | |||
else: | |||
return tuple(self._get_axis_num(d) for d in dim) | |||
|
|||
def _get_axis_num(self, dim): | |||
def _get_axis_num(self, dim: str) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _get_axis_num(self, dim: str) -> int: | |
def _get_axis_num(self, dim: Hashable) -> int: |
xarray/core/common.py
Outdated
try: | ||
return self.dims.index(dim) | ||
except ValueError: | ||
raise ValueError("%r not found in array dimensions %r" % | ||
(dim, self.dims)) | ||
|
||
@property | ||
def sizes(self): | ||
def sizes(self) -> Frozen[str, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think Frozen
is a valid type annotation. Maybe better to stick with Mapping[str, int]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is valid as I changed the Frozen class to be a subclass of typing.Mapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed to Mapping[str, int]
for the sake of user friendliness, as Frozen isn't really part of the public API
xarray/core/common.py
Outdated
@@ -214,36 +227,38 @@ def _ipython_key_completions_(self): | |||
return list(set(item_lists)) | |||
|
|||
|
|||
def get_squeeze_dims(xarray_obj, dim, axis=None): | |||
def get_squeeze_dims(xarray_obj, dim: Union[str, Tuple[str, ...], None] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should be something more relaxed, perhaps dim: Union[Hashable, Iterable[Hashable]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trouble with Iterable is that it doesn't necessarily allow you to iterate upon it twice (e.g. a file handler). I've tweaked the code to allow for it.
xarray/core/common.py
Outdated
@@ -271,7 +286,7 @@ def squeeze(self, dim=None, drop=False, axis=None): | |||
dims = get_squeeze_dims(self, dim, axis) | |||
return self.isel(drop=drop, **{d: 0 for d in dims}) | |||
|
|||
def get_index(self, key): | |||
def get_index(self, key: str) -> pd.Index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key
can actually be any hashable item here.
@shoyer how do you feel about raising the minimum Python version to 3.5.2? There's quite a bit of important stuff in it, some of which undocumented. Namely |
From the 3.5.2 release notes:
|
@shoyer I am using mypy. But unlike flake8 I don't think I'll be able to integrate it in CI.
These I think are unfixable in Python 3.5, as I can't declare these under the class definition:
(additionally, typing doesn't offer anything for "file-like" objects) |
Mmh it looks like if I run mypy in Python 3.7 I get a much terser output. Still some horrible broken things to care about, like the fact that in 3.5 |
@shoyer I have a problem.
Whereas in Python >= 3.5.3, you must just write
Things won't work if you use the < 3.5.3 syntax. As an unrelated note, I'm observing that if I set python=3.5.3 in |
FYI we added some code here to deal with python 3.5.0-3.5.2: #2831 |
If it's not too much trouble, I'd like to keep the xarray (with type checking) working on Python 3.5.0 for now. The type checks themselves only need to work on Python 3.7. This should be relatively straightforward if you put type annotations in strings as needed and guard typing imports in |
@shoyer I changed dimensions from str to Hashable as requested. In my opinion however it's not great, because
Please review again - in theory, common.py and util.py are now 100% done. They pass flake8 and mypy (on python 3.7). If you like them please go on and merge into Master; since the work that needs to be done is going to touch literally everything I think it's best to break it down into more manageable chunks. |
def coarsen(self, dim: Optional[Mapping[Hashable, int]] = None, | ||
boundary: str = 'exact', | ||
side: Union[str, Mapping[Hashable, str]] = 'left', | ||
coord_func: str = 'mean', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like coord_func
is more flexible than this. It looks like this should probably: Union[str, Mapping[Any, str]]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
didn't you say that all dims should be Hashable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Mapping[Any, str]
and Mapping[Hashable, str]
are equivalent -- every mapping key must be hashable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there's absolutely nothing in Mapping saying that the keys must be hashable.
This is perfectly legit, and pretty useful too:
class AnyKeyDict(MutableMapping[K, V]):
def __init__(self, *args: Tuple[K, V], **kwargs: V)::
self.mapping = {}
for k, v in args:
self[k] = v
for k, v in kwargs.items():
self[k] = v
def __setitem__(self, k: K, v: V) -> None:
try:
self.mapping[k] = v
except TypeError:
self.mapping[pickle.dumps(k)] = v
...
This can be useful in a few borderline scenarios; for example when de-serialising msgpack data from a non-Python producer, as in msgpack map keys can be anything. Also I'd love to see something like this:
class DataArray:
def __setitem__(self, k, v):
if isinstance(k, Mapping):
view = self.sel(**k)
else:
view = self.isel(**k)
view[:] = v
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, good point.
It turns out we actually do already support this for DataArray
:)
>>> data = xarray.DataArray([0, 0, 0], dims='x')
>>> data[{'x': 1}] = 1
>>> data
<xarray.DataArray (x: 3)>
array([0, 1, 0])
Dimensions without coordinates: x
def ordered_dict_intersection(first_dict: Mapping[K, V], | ||
second_dict: Mapping[K, V], | ||
compat: Callable[[V, V], bool] = equivalent | ||
) -> MutableMapping[K, V]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be 'OrderedDict[K, V]'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's painful to achieve, because typing.OrderedDict was added in python 3.7.2, and I'm scared of hacking about what is such a fundamental class for xarray.
Also, are there any methods that OrderedDict offers that MutableMapping doesn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you put it in quotes, it's only checked when running mypy. That seems to work without any hacking, e.g., see
Line 28 in 742ed39
) -> 'OrderedDict[Any, int]': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would mean that anybody with less than Python 3.7.2 will get errors when running mypy...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I'm OK with that. But it may be worth clarifying some cohesive policy.
def __init__(self, values=None): | ||
self._ordered_dict = OrderedDict() | ||
def __init__(self, values: Optional[AbstractSet[T]] = None) -> None: | ||
self._ordered_dict = OrderedDict() # type: MutableMapping[T, None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use type: OrderedDict[T, None]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
xarray/core/utils.py
Outdated
if values is not None: | ||
self |= values | ||
self.__ior__(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does mypy not recognize |=
as valid? That's usually preferred since that's the operator __ior__
corresponds to.
If not, that's OK but please leave a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither mypy nor PyCharm are happy with it.
It has to do with the signature of __ior__
:
def __ior__(self: MutableSet[T1], other: AbstractSet[T2]) -> MutableSet[Union[T1, T2]]:
...
When you do self |= other
, both PyCharm and mypy understand that you're changing the type of self in place, in other word they treat it to be the same as self = self | other
.
Calling __ior__
directly bypasses that.
This seems to be like a limitation of both tools like in our specific cases we explicitly specified that other is an AbstractSet[T1]
. I'll revert to |= and just disable the type checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'm happy with a comment, too. It's also not a big deal to disable type checking on particular lines when we're sure the overall method/function signature is right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure, but It might help to annotate self
here
xarray/core/utils.py
Outdated
def update(self, values): | ||
self |= values | ||
def update(self, values: AbstractSet[T]) -> None: | ||
self.__ior__(values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
xarray/core/utils.py
Outdated
class ReprObject: | ||
"""Object that prints as the given value, for use with sentinel values. | ||
""" | ||
def __init__(self, value: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I think mypy is OK with skipping -> None
for __init__
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going with PEP 484:
(Note that the return type of init ought to be annotated with -> None. The reason for this is subtle. If init assumed a return annotation of -> None, would that mean that an argument-less, un-annotated init method should still be type-checked? Rather than leaving this ambiguous or introducing an exception to the exception, we simply say that init ought to have a return annotation; the default behavior is thus the same as for other methods.)
but I've now read python/mypy#3358, which likely means a revision of the PEP in the opposite sense is incoming. I'll remove the -> None
from __init__
.
I agree it's not great. But some of our APIs are going to be hard to type check regardless, e.g., in cases where we have different behavior for strings vs sequences of strings. |
@shoyer implemented all change requests except OrderedDict |
thanks @crusaderky ! |
* master: (29 commits) Handle the character array dim name (pydata#2896) Partial fix for pydata#2841 to improve formatting. (pydata#2906) docs: Move quick overview one level up (pydata#2890) Manually specify chunks in open_zarr (pydata#2530) Minor improvement of docstring for Dataset (pydata#2904) Fix minor typos in docstrings (pydata#2903) Added docs example for `xarray.Dataset.get()` (pydata#2894) Bugfix for docs build instructions (pydata#2897) Return correct count for scalar datetime64 arrays (pydata#2892) Indexing with an empty array (pydata#2883) BUG: Fix pydata#2864 by adding the missing vrt parameters (pydata#2865) Reduce length of cftime resample tests (pydata#2879) WIP: type annotations (pydata#2877) decreased pytest verbosity (pydata#2881) Fix mypy typing error in cftime_offsets.py (pydata#2878) update links to https (pydata#2872) revert to 0.12.2 dev 0.12.1 release Various fixes for explicit Dataset.indexes (pydata#2858) Fix minor typo in docstring (pydata#2860) ...
Fixes #2869