Improved default behavior when concatenating DataArrays #2777

Closed · wants to merge 3 commits
7 changes: 7 additions & 0 deletions doc/whats-new.rst
@@ -40,6 +40,9 @@ Enhancements
   `Spencer Clark <https://github.com/spencerkclark>`_.
 - Add ``data=False`` option to ``to_dict()`` methods. (:issue:`2656`)
   By `Ryan Abernathey <https://github.com/rabernat>`_
+- Use new dimension name and unique array names to create a new coordinate
+  when concatenating arrays, if no coordinates are given.
+  (:issue:`2775`). By `Zac Hatfield-Dodds <https://github.com/Zac-HD>`_.
 - :py:meth:`~xarray.DataArray.coarsen` and
   :py:meth:`~xarray.Dataset.coarsen` are newly added.
   See :ref:`comput.coarsen` for details.
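For illustration, a minimal sketch of the new default described in the entry above, assuming the behavior added in this pull request (the array names and values below are made up):

import numpy as np
import xarray as xr

foo = xr.DataArray(np.zeros((2, 2)), dims=['x', 'y'], name='foo')
bar = xr.DataArray(np.ones((2, 2)), dims=['x', 'y'], name='bar')

# Concatenating along a brand-new dimension with no explicit coordinate:
# the unique array names become the labels of the 'w' coordinate.
combined = xr.concat([foo, bar], dim='w')
print(list(combined.coords['w'].values))  # ['foo', 'bar']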
@@ -82,6 +85,10 @@ Bug fixes

 - Silenced warnings that appear when using pandas 0.24.
   By `Stephan Hoyer <https://github.com/shoyer>`_
+- Concatenating a sequence of :py:class:`~xarray.DataArray` with varying names
+  sets the name of the output array to ``None``, instead of the name of the
+  first input array.
+  (:issue:`2775`). By `Zac Hatfield-Dodds <https://github.com/Zac-HD>`_.
 - Interpolating via resample now internally specifies ``bounds_error=False``
   as an argument to ``scipy.interpolate.interp1d``, allowing for interpolation
   from higher frequencies to lower frequencies. Datapoints outside the bounds
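The bug-fix entry for :issue:`2775` above can be sketched the same way; again this assumes the behavior on this branch, with made-up inputs:

import numpy as np
import xarray as xr

foo = xr.DataArray(np.zeros(3), dims=['x'], name='foo')
bar = xr.DataArray(np.ones(3), dims=['x'], name='bar')

# The input names differ, so the result is now named None instead of 'foo'.
assert xr.concat([foo, bar], dim='y').name is None

# When every input shares a name, that name is still preserved.
assert xr.concat([foo, bar.rename('foo')], dim='y').name == 'foo'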
28 changes: 18 additions & 10 deletions xarray/core/combine.py
@@ -6,6 +6,7 @@

 from . import utils
 from .alignment import align
+from .computation import result_name
 from .merge import merge
 from .variable import IndexVariable, Variable, as_variable
 from .variable import concat as concat_vars
@@ -323,16 +324,23 @@ def _dataarray_concat(arrays, dim, data_vars, coords, compat,
         raise ValueError('data_vars is not a valid argument when '
                          'concatenating DataArray objects')

-    datasets = []
-    for n, arr in enumerate(arrays):
-        if n == 0:
-            name = arr.name
-        elif name != arr.name:
-            if compat == 'identical':
-                raise ValueError('array names not identical')
-            else:
-                arr = arr.rename(name)
-        datasets.append(arr._to_temp_dataset())
+    name = result_name(arrays)
+    names = [arr.name for arr in arrays]
+    if compat == 'identical' and len(set(names)) != 1:
+        raise ValueError(
+            "compat='identical', but array names {!r} are not identical"
+            .format(names if len(names) <= 10 else sorted(set(names)))
+        )
+    datasets = [arr.rename(name)._to_temp_dataset() for arr in arrays]
+
+    if (
+        isinstance(dim, str)
+        and len(set(names) - {None}) == len(names)
+        and not any(dim in a.dims or dim in a.coords for a in arrays)
+    ):
+        # We're concatenating arrays with unique non-None names along
+        # a new dimension, so we use the existing names as coordinates.
+        dim = pd.Index(names, name=dim)

     ds = _dataset_concat(datasets, dim, data_vars, coords, compat,
                          positions)
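For context, ``result_name`` (imported above from ``xarray.core.computation``) chooses the output name with a pandas-style heuristic. The snippet below is only an approximation of that behavior for readers of this diff, not the actual implementation:

def result_name_sketch(objects):
    # If every input has the same name, keep it; otherwise fall back to None.
    names = {getattr(obj, 'name', None) for obj in objects}
    return names.pop() if len(names) == 1 else None

With a helper like that, the new branch above only has to decide whether the input names are unique (and not None) before reusing them as the coordinate labels for the new dimension.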
22 changes: 21 additions & 1 deletion xarray/tests/test_combine.py
@@ -248,7 +248,8 @@ def test_concat(self):

         # from dataset array:
         expected = DataArray(np.array([foo.values, bar.values]),
-                             dims=['w', 'x', 'y'], coords={'x': [0, 1]})
+                             dims=['w', 'x', 'y'],
+                             coords={'x': [0, 1], 'w': ['foo', 'bar']})
         actual = concat([foo, bar], 'w')
         assert_equal(expected, actual)
         # from iteration:
@@ -297,6 +298,25 @@ def test_concat_lazy(self):
         assert combined.shape == (2, 3, 3)
         assert combined.dims == ('z', 'x', 'y')

+    def test_concat_names_and_coords(self):
+        ds = Dataset({'foo': (['x', 'y'], np.random.random((2, 2))),
+                      'bar': (['x', 'y'], np.random.random((2, 2)))})
+        # Concat arrays with different names, new name is None
+        # and unique array names are used as coordinates
+        new = concat([ds.foo, ds.bar], dim='new')
+        assert new.name is None
+        assert (new.coords['new'] == ['foo', 'bar']).values.all()
+        # Get a useful error message for unexpectedly different names
+        with pytest.raises(ValueError) as err:
+            concat([ds.foo, ds.bar], dim='new', compat='identical')
+        assert err.value.args[0] == "compat='identical', " + \
+            "but array names ['foo', 'bar'] are not identical"
+        # Concat arrays with same name, name is preserved
+        # and non-unique names are not used as coords
+        foobar = ds.foo.rename('bar')
+        assert concat([foobar, ds.bar], dim='new').name == 'bar'
+        assert 'new' not in concat([foobar, ds.bar], dim='new').coords
+

class TestAutoCombine(object):

9 changes: 5 additions & 4 deletions xarray/tutorial.py
@@ -91,16 +91,17 @@ def open_dataset(name, cache=True, cache_dir=_default_cache_dir,

 def load_dataset(*args, **kwargs):
     """
-    `load_dataset` will be removed in version 0.12. The current behavior of
-    this function can be achieved by using `tutorial.open_dataset(...).load()`.
+    `load_dataset` is deprecated and will be removed in a future version.
+    The current behavior of this function can be achieved by using
+    `tutorial.open_dataset(...).load()`.

     See Also
     --------
     open_dataset
     """
     warnings.warn(
"load_dataset` will be removed in xarray version 0.12. The current "
"behavior of this function can be achived by using "
"load_dataset` will be removed in a future version of Xarray. "
"The current behavior of this function can be achived by using "
"`tutorial.open_dataset(...).load()`.",
         DeprecationWarning, stacklevel=2)
     return open_dataset(*args, **kwargs).load()
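For anyone hitting the deprecation warning above, the migration is a one-line change. A sketch, assuming the ``'air_temperature'`` dataset name used elsewhere in the xarray tutorial docs:

import xarray as xr

# Deprecated: xr.tutorial.load_dataset('air_temperature')
ds = xr.tutorial.open_dataset('air_temperature').load()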