Expose "Coordinates" as part of Xarray's public API #7368

Merged
merged 76 commits into from
Jul 21, 2023
Changes from 70 commits
Commits
41f4fd8
add indexes argument to Dataset.__init__
benbovy Oct 25, 2022
4baa8af
make indexes arg public for DataArray.__init__
benbovy Oct 25, 2022
dbc058a
Indexes constructor updates
benbovy Oct 26, 2022
16a9983
use the generic Mapping[Any, Index] for indexes
benbovy Oct 26, 2022
3c076d5
add wrap_pandas_multiindex function
benbovy Oct 26, 2022
70e7a5d
do not create default indexes when not desired
benbovy Oct 26, 2022
00e1766
fix Dataset dimensions
benbovy Oct 26, 2022
3bf92cd
copy the coordinate variables of passed indexes
benbovy Oct 26, 2022
c9b6363
DataArray: check dimensions/shape of index coords
benbovy Oct 26, 2022
82dc5cc
docstrings tweaks
benbovy Oct 27, 2022
a58c9d0
more Indexes safety
benbovy Oct 27, 2022
9beeea7
ensure input indexes are Xarray indexes
benbovy Oct 27, 2022
c6e94b4
add .assign_indexes() method
benbovy Oct 27, 2022
ddd505e
Merge branch 'main' into indexes-arg-constructors
benbovy Dec 8, 2022
f97adb5
add `IndexedCoordinates` subclass
benbovy Dec 8, 2022
45709ef
rollback/update Dataset and DataArray constructors
benbovy Dec 8, 2022
4c559f1
update docstrings
benbovy Dec 8, 2022
1192948
fix Dataset creation internal error
benbovy Dec 8, 2022
a877a74
add IndexedCoordinates.merge_coords
benbovy Dec 9, 2022
9d6d2ae
drop IndexedCoordinates and reuse Coordinates
benbovy Dec 12, 2022
3ee26ef
update api docs
benbovy Dec 12, 2022
dd02eca
make Coordinates init args optional
benbovy Dec 12, 2022
0ee8f95
docstrings updates
benbovy Dec 12, 2022
fc6c948
convert to base variable when no index is given
benbovy Dec 12, 2022
0572b96
raise when an index is given with no variable
benbovy Dec 12, 2022
6f5114b
skip create default indexes...
benbovy Dec 12, 2022
e27830a
invariant checks: maybe skip IndexVariable checks
benbovy Dec 12, 2022
1649fb8
add Coordinates tests
benbovy Dec 12, 2022
298fccd
more Coordinates tests
benbovy Dec 12, 2022
e8c627c
add Dataset constructor tests with Coordinates
benbovy Dec 12, 2022
be86f87
fix mypy
benbovy Dec 12, 2022
75e2523
assign_coords: do not create default indexes...
benbovy Dec 12, 2022
82f0fb2
support alignment of Coordinates
benbovy Dec 12, 2022
883e67c
clean-up
benbovy Dec 12, 2022
28e9861
fix failing test (dataarray coords not extracted)
benbovy Dec 12, 2022
9a209a3
fix tests: prevent index conflicts
benbovy Dec 12, 2022
4f337e3
add Coordinates.equals and Coordinates.identical
benbovy Dec 13, 2022
43ddcf6
more tests, docstrings, docs
benbovy Dec 13, 2022
2437456
fix assert_* (Coordinates subclasses)
benbovy Dec 13, 2022
e60570f
review copy
benbovy Dec 13, 2022
d01cf01
another few tests
benbovy Dec 13, 2022
9fc49ff
fix mypy
benbovy Dec 13, 2022
7873c77
update what's new
benbovy Dec 13, 2022
e7998d1
Merge branch 'main' into indexes-arg-constructors-2
benbovy Dec 13, 2022
f7ec33e
do not copy indexes
benbovy Dec 13, 2022
b1a9688
add Coordinates fastpath constructor
benbovy Dec 14, 2022
38fdf1e
fix sphinx directive
benbovy Dec 14, 2022
d9e9e34
re-add coord indexes in merge (dataset constructor)
benbovy Dec 14, 2022
3999eff
create coords with default idx: try a cleaner impl
benbovy Dec 14, 2022
d5d8233
some useful comments for later
benbovy Dec 14, 2022
d2fcaa3
xr.merge: add support for Coordinates objects
benbovy Dec 14, 2022
193dad3
allow skip align for object(s) in merge_core
benbovy Dec 15, 2022
84c77a4
fix mypy
benbovy Dec 15, 2022
5e82d61
what's new tweaks
benbovy Dec 15, 2022
c6409fd
align Coordinates callbacks: don't reindex data vars
benbovy Dec 15, 2022
39294fc
fix Coordinates._overwrite_indexes callback
benbovy Dec 15, 2022
3fc1e8c
Merge branch 'main' into indexes-arg-constructors-2
benbovy Jan 13, 2023
8c65f85
remove merge_coords
benbovy Jan 13, 2023
cf6fcbb
futurewarning: pass multi-index via data vars
benbovy Jan 13, 2023
6a6444f
review comments
benbovy Jan 13, 2023
50cf057
Merge branch 'main' into indexes-arg-constructors-2
benbovy Jan 13, 2023
f5d1fe1
Merge branch 'main' into pr/7368
Illviljan Jul 14, 2023
1759ac9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 14, 2023
48f6950
Fix circulat imports
Illviljan Jul 14, 2023
a789f6b
Merge branch 'indexes-arg-constructors-2' of https://github.com/benbo…
Illviljan Jul 14, 2023
fa384f7
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 14, 2023
7628cb2
typing: add Alignable protocol class
benbovy Jul 17, 2023
c8821f9
try fixing mypy error (Self redefinition)
benbovy Jul 17, 2023
c71aadb
remove Coordinate alias of Variable
benbovy Jul 17, 2023
139b13a
fix groupby test
benbovy Jul 17, 2023
7ed6279
doc: remove merge_coords in api reference
benbovy Jul 18, 2023
3d94357
doc: improve docstrings and glossary
benbovy Jul 18, 2023
4a6e915
use Self type annotation in Coordinate class
benbovy Jul 18, 2023
31f66b4
better comment
benbovy Jul 18, 2023
4cb70d0
fix Self undefined error with python < 3.11
benbovy Jul 18, 2023
4ef5f17
Merge branch 'main' into indexes-arg-constructors-2
dcherian Jul 20, 2023
51 changes: 41 additions & 10 deletions doc/api-hidden.rst
@@ -9,17 +9,42 @@
.. autosummary::
:toctree: generated/

+ Coordinates.from_pandas_multiindex
+ Coordinates.get
+ Coordinates.items
+ Coordinates.keys
+ Coordinates.values
+ Coordinates.dims
+ Coordinates.dtypes
+ Coordinates.variables
+ Coordinates.xindexes
+ Coordinates.indexes
+ Coordinates.to_dataset
+ Coordinates.to_index
+ Coordinates.update
+ Coordinates.merge
+ Coordinates.merge_coords
+ Coordinates.copy
+ Coordinates.equals
+ Coordinates.identical
+
core.coordinates.DatasetCoordinates.get
core.coordinates.DatasetCoordinates.items
core.coordinates.DatasetCoordinates.keys
- core.coordinates.DatasetCoordinates.merge
- core.coordinates.DatasetCoordinates.to_dataset
- core.coordinates.DatasetCoordinates.to_index
- core.coordinates.DatasetCoordinates.update
core.coordinates.DatasetCoordinates.values
core.coordinates.DatasetCoordinates.dims
- core.coordinates.DatasetCoordinates.indexes
+ core.coordinates.DatasetCoordinates.dtypes
core.coordinates.DatasetCoordinates.variables
+ core.coordinates.DatasetCoordinates.xindexes
+ core.coordinates.DatasetCoordinates.indexes
+ core.coordinates.DatasetCoordinates.to_dataset
+ core.coordinates.DatasetCoordinates.to_index
+ core.coordinates.DatasetCoordinates.update
+ core.coordinates.DatasetCoordinates.merge
+ core.coordinates.DatasetCoordinates.merge_coords
+ core.coordinates.DataArrayCoordinates.copy
+ core.coordinates.DatasetCoordinates.equals
+ core.coordinates.DatasetCoordinates.identical

core.rolling.DatasetCoarsen.boundary
core.rolling.DatasetCoarsen.coord_func
@@ -47,14 +72,20 @@
core.coordinates.DataArrayCoordinates.get
core.coordinates.DataArrayCoordinates.items
core.coordinates.DataArrayCoordinates.keys
- core.coordinates.DataArrayCoordinates.merge
- core.coordinates.DataArrayCoordinates.to_dataset
- core.coordinates.DataArrayCoordinates.to_index
- core.coordinates.DataArrayCoordinates.update
core.coordinates.DataArrayCoordinates.values
core.coordinates.DataArrayCoordinates.dims
- core.coordinates.DataArrayCoordinates.indexes
+ core.coordinates.DataArrayCoordinates.dtypes
core.coordinates.DataArrayCoordinates.variables
+ core.coordinates.DataArrayCoordinates.xindexes
+ core.coordinates.DataArrayCoordinates.indexes
+ core.coordinates.DataArrayCoordinates.to_dataset
+ core.coordinates.DataArrayCoordinates.to_index
+ core.coordinates.DataArrayCoordinates.update
+ core.coordinates.DataArrayCoordinates.merge
+ core.coordinates.DataArrayCoordinates.merge_coords
+ core.coordinates.DataArrayCoordinates.copy
+ core.coordinates.DataArrayCoordinates.equals
+ core.coordinates.DataArrayCoordinates.identical

core.rolling.DataArrayCoarsen.boundary
core.rolling.DataArrayCoarsen.coord_func
1 change: 1 addition & 0 deletions doc/api.rst
@@ -1085,6 +1085,7 @@ Advanced API
.. autosummary::
:toctree: generated/

+ Coordinates
Dataset.variables
DataArray.variable
Variable
14 changes: 14 additions & 0 deletions doc/whats-new.rst
@@ -22,6 +22,20 @@ v2023.07.1 (unreleased)
New Features
~~~~~~~~~~~~

+ - :py:class:`Coordinates` can now be constructed independently of any Dataset or
+ DataArray (it is also returned by the :py:attr:`Dataset.coords` and
+ :py:attr:`DataArray.coords` properties). ``Coordinates`` objects are useful for
+ passing both coordinate variables and indexes to new Dataset / DataArray objects,
+ e.g., via their constructor or via :py:meth:`Dataset.assign_coords`. We may also
+ wrap coordinate variables in a ``Coordinates`` object in order to skip
+ the automatic creation of (pandas) indexes for dimension coordinates.
+ The :py:class:`Coordinates.from_pandas_multiindex` constructor may be used to
+ create coordinates directly from a :py:class:`pandas.MultiIndex` object (it is
+ preferred over passing it directly as coordinate data, which may be deprecated soon).
+ Like Dataset and DataArray objects, ``Coordinates`` objects may now be used in
+ :py:func:`align` and :py:func:`merge`.
+ (:issue:`6392`, :pull:`7368`).
+ By `Benoît Bovy <https://github.com/benbovy>`_.
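
Reader's note (not part of the diff): the what's-new entry above can be illustrated with a minimal sketch. It assumes an xarray release that includes this PR and that pandas is installed; the variable names, the level names `one` and `two`, and the sample values are made up for illustration.

```python
import pandas as pd
import xarray as xr

# Wrap a dimension coordinate in Coordinates with an explicit, empty
# `indexes` mapping: no default pandas index is created for "x".
no_index = xr.Coordinates({"x": [1, 2, 3]}, indexes={})
ds_no_index = xr.Dataset(coords=no_index)

# Build coordinates from a pandas MultiIndex: one coordinate for the
# dimension "x" plus one coordinate per level ("one" and "two").
midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("one", "two"))
midx_coords = xr.Coordinates.from_pandas_multiindex(midx, "x")
ds_midx = xr.Dataset(coords=midx_coords)

print(ds_no_index.xindexes)  # expected to hold no entry for "x"
print(ds_midx.coords)
```

Passing the `Coordinates` object through the `coords` argument is what carries the index choice (or its absence) into the new Dataset.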

Breaking changes
~~~~~~~~~~~~~~~~
4 changes: 3 additions & 1 deletion xarray/__init__.py
@@ -26,6 +26,7 @@
where,
)
from xarray.core.concat import concat
+ from xarray.core.coordinates import Coordinates
from xarray.core.dataarray import DataArray
from xarray.core.dataset import Dataset
from xarray.core.extensions import (
@@ -35,7 +36,7 @@
from xarray.core.merge import Context, MergeError, merge
from xarray.core.options import get_options, set_options
from xarray.core.parallel import map_blocks
- from xarray.core.variable import Coordinate, IndexVariable, Variable, as_variable
+ from xarray.core.variable import IndexVariable, Variable, as_variable
from xarray.util.print_versions import show_versions

try:
@@ -98,6 +99,7 @@
"CFTimeIndex",
"Context",
"Coordinate",
+ "Coordinates",
"DataArray",
"Dataset",
"IndexVariable",
51 changes: 24 additions & 27 deletions xarray/core/alignment.py
@@ -5,13 +5,12 @@
from collections import defaultdict
from collections.abc import Hashable, Iterable, Mapping
from contextlib import suppress
- from typing import TYPE_CHECKING, Any, Callable, Generic, TypeVar, cast
+ from typing import TYPE_CHECKING, Any, Callable, Generic, cast

import numpy as np
import pandas as pd

from xarray.core import dtypes
- from xarray.core.common import DataWithCoords
from xarray.core.indexes import (
Index,
Indexes,
@@ -20,15 +19,14 @@
indexes_all_equal,
safe_cast_to_index,
)
+ from xarray.core.types import T_Alignable
from xarray.core.utils import is_dict_like, is_full_slice
from xarray.core.variable import Variable, as_compatible_data, calculate_dimensions

if TYPE_CHECKING:
from xarray.core.dataarray import DataArray
from xarray.core.dataset import Dataset
- from xarray.core.types import JoinOptions, T_DataArray, T_Dataset, T_DataWithCoords
-
- DataAlignable = TypeVar("DataAlignable", bound=DataWithCoords)
+ from xarray.core.types import JoinOptions, T_DataArray, T_Dataset


def reindex_variables(
@@ -92,7 +90,7 @@ def reindex_variables(
NormalizedIndexVars = dict[MatchingIndexKey, dict[Hashable, Variable]]


- class Aligner(Generic[DataAlignable]):
+ class Aligner(Generic[T_Alignable]):
"""Implements all the complex logic for the re-indexing and alignment of Xarray
objects.

@@ -105,8 +103,8 @@ class Aligner(Generic[DataAlignable]):

"""

- objects: tuple[DataAlignable, ...]
- results: tuple[DataAlignable, ...]
+ objects: tuple[T_Alignable, ...]
+ results: tuple[T_Alignable, ...]
objects_matching_indexes: tuple[dict[MatchingIndexKey, Index], ...]
join: str
exclude_dims: frozenset[Hashable]
@@ -127,7 +125,7 @@ class Aligner(Generic[DataAlignable]):

def __init__(
self,
- objects: Iterable[DataAlignable],
+ objects: Iterable[T_Alignable],
join: str = "inner",
indexes: Mapping[Any, Any] | None = None,
exclude_dims: Iterable = frozenset(),
@@ -510,7 +508,7 @@ def _get_dim_pos_indexers(

def _get_indexes_and_vars(
self,
- obj: DataAlignable,
+ obj: T_Alignable,
matching_indexes: dict[MatchingIndexKey, Index],
) -> tuple[dict[Hashable, Index], dict[Hashable, Variable]]:
new_indexes = {}
@@ -533,13 +531,13 @@ def _get_indexes_and_vars(

def _reindex_one(
self,
- obj: DataAlignable,
+ obj: T_Alignable,
matching_indexes: dict[MatchingIndexKey, Index],
- ) -> DataAlignable:
+ ) -> T_Alignable:
new_indexes, new_variables = self._get_indexes_and_vars(obj, matching_indexes)
dim_pos_indexers = self._get_dim_pos_indexers(matching_indexes)

- new_obj = obj._reindex_callback(
+ return obj._reindex_callback(
self,
dim_pos_indexers,
new_variables,
@@ -548,8 +546,6 @@ def _reindex_one(
self.exclude_dims,
self.exclude_vars,
)
- new_obj.encoding = obj.encoding
- return new_obj

def reindex_all(self) -> None:
self.results = tuple(
@@ -581,13 +577,13 @@ def align(self) -> None:


def align(
- *objects: DataAlignable,
+ *objects: T_Alignable,
join: JoinOptions = "inner",
copy: bool = True,
indexes=None,
exclude=frozenset(),
fill_value=dtypes.NA,
- ) -> tuple[DataAlignable, ...]:
+ ) -> tuple[T_Alignable, ...]:
"""
Given any number of Dataset and/or DataArray objects, returns new
objects with aligned indexes and dimension sizes.
@@ -801,14 +797,15 @@ def deep_align(

This function is not public API.
"""
+ from xarray.core.coordinates import Coordinates
from xarray.core.dataarray import DataArray
from xarray.core.dataset import Dataset

if indexes is None:
indexes = {}

def is_alignable(obj):
- return isinstance(obj, (DataArray, Dataset))
+ return isinstance(obj, (Coordinates, DataArray, Dataset))

positions = []
keys = []
@@ -866,15 +863,15 @@ def is_alignable(obj):


def reindex(
- obj: DataAlignable,
+ obj: T_Alignable,
indexers: Mapping[Any, Any],
method: str | None = None,
tolerance: int | float | Iterable[int | float] | None = None,
copy: bool = True,
fill_value: Any = dtypes.NA,
sparse: bool = False,
exclude_vars: Iterable[Hashable] = frozenset(),
- ) -> DataAlignable:
+ ) -> T_Alignable:
"""Re-index either a Dataset or a DataArray.

Not public API.
@@ -905,13 +902,13 @@ def reindex(


def reindex_like(
- obj: DataAlignable,
+ obj: T_Alignable,
other: Dataset | DataArray,
method: str | None = None,
tolerance: int | float | Iterable[int | float] | None = None,
copy: bool = True,
fill_value: Any = dtypes.NA,
- ) -> DataAlignable:
+ ) -> T_Alignable:
"""Re-index either a Dataset or a DataArray like another Dataset/DataArray.

Not public API.
@@ -953,8 +950,8 @@ def _get_broadcast_dims_map_common_coords(args, exclude):


def _broadcast_helper(
- arg: T_DataWithCoords, exclude, dims_map, common_coords
- ) -> T_DataWithCoords:
+ arg: T_Alignable, exclude, dims_map, common_coords
+ ) -> T_Alignable:
from xarray.core.dataarray import DataArray
from xarray.core.dataset import Dataset

@@ -984,16 +981,16 @@ def _broadcast_dataset(ds: T_Dataset) -> T_Dataset:

# remove casts once https://github.com/python/mypy/issues/12800 is resolved
if isinstance(arg, DataArray):
- return cast("T_DataWithCoords", _broadcast_array(arg))
+ return cast(T_Alignable, _broadcast_array(arg))
elif isinstance(arg, Dataset):
- return cast("T_DataWithCoords", _broadcast_dataset(arg))
+ return cast(T_Alignable, _broadcast_dataset(arg))
else:
raise ValueError("all input must be Dataset or DataArray objects")


# TODO: this typing is too restrictive since it cannot deal with mixed
# DataArray and Dataset types...? Is this a problem?
- def broadcast(*args: T_DataWithCoords, exclude=None) -> tuple[T_DataWithCoords, ...]:
+ def broadcast(*args: T_Alignable, exclude=None) -> tuple[T_Alignable, ...]:
"""Explicitly broadcast any number of DataArray or Dataset objects against
one another.

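
Reader's note (not part of the diff): with `Coordinates` accepted by `is_alignable` and the aligner typed on `T_Alignable`, standalone `Coordinates` objects should be usable with the public `xr.align`. A minimal sketch, assuming an xarray release that includes this PR; the names `left` and `right` and the sample values are illustrative only.

```python
import xarray as xr

# Two standalone Coordinates objects whose default "x" indexes overlap
# only on the labels 1 and 2.
left = xr.Coordinates({"x": [0, 1, 2]})
right = xr.Coordinates({"x": [1, 2, 3]})

# An inner join keeps only the labels present in both indexes.
left_aligned, right_aligned = xr.align(left, right, join="inner")

print(left_aligned["x"].values)
print(right_aligned["x"].values)
```

Only indexes and coordinate variables are involved here, since a `Coordinates` object carries no data variables to reindex.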
12 changes: 10 additions & 2 deletions xarray/core/common.py
@@ -14,7 +14,6 @@
from xarray.core.indexing import BasicIndexer, ExplicitlyIndexed
from xarray.core.options import OPTIONS, _get_keep_attrs
from xarray.core.parallelcompat import get_chunked_array_type, guess_chunkmanager
- from xarray.core.pdcompat import _convert_base_to_offset
from xarray.core.pycompat import is_chunked_array
from xarray.core.utils import (
Frozen,
@@ -609,9 +608,17 @@ def assign_coords(
Dataset.swap_dims
Dataset.set_coords
"""
+ from xarray.core.coordinates import Coordinates
+
coords_combined = either_dict_or_kwargs(coords, coords_kwargs, "assign_coords")
data = self.copy(deep=False)
- results: dict[Hashable, Any] = self._calc_assign_results(coords_combined)
+
+ results: Coordinates | dict[Hashable, Any]
+ if isinstance(coords, Coordinates):
+ results = coords
+ else:
+ results = self._calc_assign_results(coords_combined)
+
data.coords.update(results)
return data

@@ -952,6 +959,7 @@ def _resample(

from xarray.core.dataarray import DataArray
from xarray.core.groupby import ResolvedTimeResampleGrouper, TimeResampleGrouper
+ from xarray.core.pdcompat import _convert_base_to_offset
from xarray.core.resample import RESAMPLE_DIM

if keep_attrs is not None:
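
Reader's note (not part of the diff): the `assign_coords` hunk above passes a `Coordinates` argument straight to `data.coords.update`, bypassing `_calc_assign_results`. A minimal sketch of the user-visible effect, assuming an xarray release that includes this PR; the dataset contents and variable names are made up.

```python
import xarray as xr

ds = xr.Dataset({"a": ("y", [10, 20, 30])})

# Coordinates built with an empty `indexes` mapping carry no index for "y".
coords = xr.Coordinates({"y": [1, 2, 3]}, indexes={})
assigned = ds.assign_coords(coords)

# For comparison, the keyword form creates a default pandas index.
default = ds.assign_coords(y=[1, 2, 3])

print("y" in assigned.xindexes)
print("y" in default.xindexes)
```

The comparison with the keyword form is only there to show the contrast: both results have a `y` coordinate, but only the keyword form is expected to build a default index for it.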