Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use strict type hinting for namedarray #8241

Merged
merged 114 commits into from
Oct 3, 2023

Conversation

Illviljan
Copy link
Contributor

@Illviljan Illviljan commented Sep 27, 2023

Towards the strict goal in #8239.

pyproject.toml Outdated Show resolved Hide resolved
@Illviljan Illviljan changed the title Disallow untyped defs in namedarray Use strict type hinting in namedarray Sep 27, 2023
max-sixty added a commit to max-sixty/xarray that referenced this pull request Sep 27, 2023
In reviewing pydata#8241, I realize that we actually want `check-untyped-defs`, which is a bit less strict, but lets us add some more modules on. I did have to add a couple of ignores, think it's a reasonable tradeoff to add big modules like `computation` on.

Errors with this enabled are actual type errors, not just `mypy` pedanticness, so would be good to get as much as possible into this list...
@Illviljan Illviljan changed the title Use strict type hinting in namedarray Use strict type hinting for namedarray Sep 27, 2023
@Illviljan
Copy link
Contributor Author

Current errors:

Found 36 errors in 3 files (checked 147 source files)

xarray/namedarray/utils.py: note: In function "to_0d_object_array":
xarray/namedarray/utils.py:64: error: Missing type parameters for generic type "ndarray"  [type-arg]
xarray/namedarray/core.py: note: In function "as_compatible_data":
xarray/namedarray/core.py:49: error: Call to untyped function "getmaskarray" in typed context  [no-untyped-call]
xarray/namedarray/core.py: note: In member "__init__" of class "NamedArray":
xarray/namedarray/core.py:83: error: Missing type parameters for generic type "dict"  [type-arg]
xarray/namedarray/core.py:110: error: Missing type parameters for generic type "dict"  [type-arg]
xarray/namedarray/core.py: note: In member "dtype" of class "NamedArray":
xarray/namedarray/core.py:143: error: Missing type parameters for generic type "dtype"  [type-arg]
xarray/namedarray/core.py:152: error: Returning Any from function declared to return "dtype[Any]"  [no-any-return]
xarray/namedarray/core.py: note: In member "shape" of class "NamedArray":
xarray/namedarray/core.py:170: error: Returning Any from function declared to return "tuple[int, ...]"  [no-any-return]
xarray/namedarray/core.py: note: In member "nbytes" of class "NamedArray":
xarray/namedarray/core.py:181: error: Returning Any from function declared to return "int"  [no-any-return]
xarray/namedarray/core.py: note: In function "attrs":
xarray/namedarray/core.py:211: error: Missing type parameters for generic type "Mapping"  [type-arg]
xarray/namedarray/core.py: note: In function "data":
xarray/namedarray/core.py:222: error: Function is missing a return type annotation  [no-untyped-def]
xarray/namedarray/core.py: note: In member "data" of class "NamedArray":
xarray/namedarray/core.py:222: error: Function is missing a return type annotation  [no-untyped-def]
xarray/namedarray/core.py: note: In member "__dask_tokenize__" of class "NamedArray":
xarray/namedarray/core.py:259: error: Function is missing a return type annotation  [no-untyped-def]
xarray/namedarray/core.py: note: In member "__dask_graph__" of class "NamedArray":
xarray/namedarray/core.py:266: error: Function is missing a return type annotation  [no-untyped-def]
xarray/namedarray/core.py: note: In member "__dask_keys__" of class "NamedArray":
xarray/namedarray/core.py:269: error: Function is missing a return type annotation  [no-untyped-def]
xarray/namedarray/core.py: note: In member "__dask_layers__" of class "NamedArray":
xarray/namedarray/core.py:272: error: Function is missing a return type annotation  [no-untyped-def]
xarray/namedarray/core.py: note: In member "__dask_optimize__" of class "NamedArray":
xarray/namedarray/core.py:276: error: Missing type parameters for generic type "typing.Callable"  [type-arg]
xarray/namedarray/core.py:277: error: Returning Any from function declared to return "Callable[..., Any]"  [no-any-return]
xarray/namedarray/core.py: note: In member "__dask_scheduler__" of class "NamedArray":
xarray/namedarray/core.py:280: error: Missing type parameters for generic type "typing.Callable"  [type-arg]
xarray/namedarray/core.py:281: error: Returning Any from function declared to return "Callable[..., Any]"  [no-any-return]
xarray/namedarray/core.py: note: In member "__dask_postcompute__" of class "NamedArray":
xarray/namedarray/core.py:285: error: Missing type parameters for generic type "typing.Callable"  [type-arg]
xarray/namedarray/core.py: note: In member "__dask_postpersist__" of class "NamedArray":
xarray/namedarray/core.py:291: error: Missing type parameters for generic type "typing.Callable"  [type-arg]
xarray/namedarray/core.py: note: In member "_dask_finalize" of class "NamedArray":
xarray/namedarray/core.py:295: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
xarray/namedarray/core.py: note: In member "_replace" of class "NamedArray":
xarray/namedarray/core.py:341: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
xarray/namedarray/core.py: note: In member "_as_sparse" of class "NamedArray":
xarray/namedarray/core.py:415: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
xarray/namedarray/core.py:427: error: Call to untyped function "maybe_promote" in typed context  [no-untyped-call]
xarray/tests/test_namedarray.py: note: In function "random_inputs":
xarray/tests/test_namedarray.py:10: error: Missing type parameters for generic type "ndarray"  [type-arg]
xarray/tests/test_namedarray.py: note: In function "test_as_compatible_data_with_masked_array":
xarray/tests/test_namedarray.py:31: error: Call to untyped function "array" in typed context  [no-untyped-call]
xarray/tests/test_namedarray.py: note: In function "test_as_compatible_data_with_explicitly_indexed":
xarray/tests/test_namedarray.py:42: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
xarray/tests/test_namedarray.py:45: error: Function is missing a type annotation  [no-untyped-def]
xarray/tests/test_namedarray.py:51: error: Call to untyped function "CustomArray" in typed context  [no-untyped-call]
xarray/tests/test_namedarray.py:55: error: Call to untyped function "CustomArrayIndexable" in typed context  [no-untyped-call]
xarray/tests/test_namedarray.py: note: In function "test_data":
xarray/tests/test_namedarray.py:82: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
xarray/tests/test_namedarray.py: note: In function "test_0d_string":
xarray/tests/test_namedarray.py:97: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
xarray/tests/test_namedarray.py: note: In function "test_0d_timedelta":
xarray/tests/test_namedarray.py:141: error: Missing type parameters for generic type "dtype"  [type-arg]
xarray/tests/test_namedarray.py:141: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]
xarray/tests/test_namedarray.py: note: In function "test_dims_setter":
xarray/tests/test_namedarray.py:157: error: Function is missing a type annotation for one or more arguments  [no-untyped-def]

@Illviljan Illviljan marked this pull request as ready for review October 2, 2023 20:37
Copy link
Contributor Author

@Illviljan Illviljan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mypy and (most of) CI is passing now! I'm stopping here.

Comment on lines +480 to +488
fill_value: np.typing.ArrayLike | Default = _default,
) -> Self:
"""
use sparse-array as backend.
"""
import sparse

# TODO: what to do if dask-backended?
if fill_value is dtypes.NA:
if fill_value is _default:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype.NA was only used as a default value. So why not use the _default we already have?

Comment on lines +76 to +80
class CustomArray(CustomArrayBase):
def __array__(self) -> np.ndarray[Any, np.dtype[np.generic]]:
return np.array(self.array)

class CustomArrayIndexable(CustomArrayBase, xr.core.indexing.ExplicitlyIndexed):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExplicitlyIndexed has a non-standard __array__ method. This made inheriting from it difficult.

Comment on lines +124 to +126
def is_chunked_duck_array(
x: T_DuckArray,
) -> TypeGuard[_ChunkedArray[np.dtype[np.generic]]]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using _ChunkedArray here, though I think it should be the TypeVar T_ChunkedArray.

@Illviljan
Copy link
Contributor Author

@headtr1ck:

If the duck array type differs from Self the generic part of the return type should change as well.

Probably need a second TypeVar for this.

I can see you're point, and I gave it a try in the last commits but I couldn't figure it out.

Since I'm running out of steam with this PR, I think I'll merge this as is. Figuring out the exact grammar on T_DuckArray can be done in a future PR.

@Illviljan Illviljan merged commit d5f1785 into pydata:main Oct 3, 2023
24 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Oct 14, 2023
* upstream/main: (46 commits)
  xfail flaky test (pydata#8299)
  Most of mypy 1.6.0 passing (pydata#8296)
  Rename `reset_encoding` to `drop_encoding` (pydata#8287)
  Enable `.rolling_exp` to work on dask arrays (pydata#8284)
  Fix `GroupBy` import (pydata#8286)
  Ask bug reporters to confirm they're using a recent version of xarray (pydata#8283)
  Add pyright type checker (pydata#8279)
  Improved typing of align & broadcast (pydata#8234)
  Update ci-additional.yaml (pydata#8280)
  Fix time encoding regression (pydata#8272)
  Allow a function in `.sortby` method (pydata#8273)
  make more args kw only (except 'dim') (pydata#6403)
  Use duck array ops in more places (pydata#8267)
  Don't raise rename warning if it is a no operation (pydata#8266)
  Mandate kwargs on `to_zarr` (pydata#8257)
  copy  the `dtypes` module to the `namedarray` package. (pydata#8250)
  Add xarray-regrid to ecosystem.rst (pydata#8270)
  Use strict type hinting for namedarray (pydata#8241)
  Update type annotation for center argument of dataaray_plot methods (pydata#8261)
  [pre-commit.ci] pre-commit autoupdate (pydata#8262)
  ...
Comment on lines +98 to 103
def is_dask_collection(x: object) -> TypeGuard[DaskCollection]:
if module_available("dask"):
from dask.base import is_dask_collection
from dask.typing import DaskCollection

return is_dask_collection(x)
return isinstance(x, DaskCollection)
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a related note, i encountered a strange issue while trying to consolidate the two implementations of is_dask_collection in the utils.py and pycompat.py files:

def is_dask_collection(x: object) -> TypeGuard[DaskCollection]:
if module_available("dask"):
from dask.typing import DaskCollection
return isinstance(x, DaskCollection)
return False

and

def is_dask_collection(x):
if module_available("dask"):
from dask.base import is_dask_collection
return is_dask_collection(x)
return False

when using pint arrays, i observed different behaviors. Here is an example:

In [22]: from dask.typing import DaskCollection

In [23]: from dask.base import is_dask_collection

In [24]: import xarray as xr, pint_xarray

In [25]: ds =  xr.Dataset({"a": ("x", [0, 1, 2]), "b": ("y", [-3, 5, 1], {"units": "m"})}).pint.quantify(a="s")

In [27]: ds.b
Out[27]: 
<xarray.DataArray 'b' (y: 3)>
<Quantity([-3  5  1], 'meter')>
Dimensions without coordinates: y
In [34]: isinstance(ds.b.data, DaskCollection)
Out[34]: True

In [35]: is_dask_collection(ds.b.data)
Out[35]: False

@Illviljan, i'm curious to hear your thoughts on how to address this divergence in implementations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(x, DaskCollection)

This doesn't seem right. It just asserts that dask protocols are present on x.

This is always true for xarray objects and pint objects. But these things are technically dask collections if and only if they wrap an underlying dask array. is_dask_collection recognizes this (https://docs.dask.org/en/latest/_modules/dask/base.html#is_dask_collection)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dcherian, thank you for chiming in and the pointer to the doc page. i'm going to revert to using is_dask_collection() function

Copy link
Contributor Author

@Illviljan Illviljan Feb 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function just checked for .__dask_graph__ in october 2023. Using isinstance(x, DaskCollection) correctly narrows down the typing and made sense back then.

Compare the old version: dask/dask#10676

Dask's definition of a DaskCollection requires:

  • __dask_graph__
  • __dask_keys__
  • __dask_postcompute__
  • __dask_postpersist__
  • __dask_tokenize__
  • compute
  • persist
  • visualize

which is_dask_collection ignores.
This is also strange and another reason why I preferred the more modern DaskCollection-protocol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants