Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a combine_attrs parameter to Dataset.merge #4895

Merged
merged 18 commits into from
Mar 6, 2021
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions doc/whats-new.rst
Original file line number Diff line number Diff line change
Expand Up @@ -65,13 +65,13 @@ New Features
contain missing values; 8x faster in our benchmark, and 2x faster than pandas.
(:pull:`4746`);
By `Maximilian Roos <https://github.com/max-sixty>`_.

- Performance improvement when constructing DataArrays. Significantly speeds up repr for Datasets with large number of variables.
By `Deepak Cherian <https://github.com/dcherian>`_
By `Deepak Cherian <https://github.com/dcherian>`_.
- add ``"drop_conflicts"`` to the strategies supported by the ``combine_attrs`` kwarg
(:issue:`4749`, :pull:`4827`).
By `Justus Magin <https://github.com/keewis>`_.
By `Deepak Cherian <https://github.com/dcherian>`_.
- allow passing ``combine_attrs`` to :py:meth:`Dataset.merge` (:pull:`4895`).
keewis marked this conversation as resolved.
Show resolved Hide resolved
By `Justus Magin <https://github.com/keewis>`_.
- :py:meth:`DataArray.swap_dims` & :py:meth:`Dataset.swap_dims` now accept dims
in the form of kwargs as well as a dict, like most similar methods.
By `Maximilian Roos <https://github.com/max-sixty>`_.
Expand Down
16 changes: 16 additions & 0 deletions xarray/core/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -3897,6 +3897,7 @@ def merge(
compat: str = "no_conflicts",
join: str = "outer",
fill_value: Any = dtypes.NA,
combine_attrs: str = "override",
) -> "Dataset":
"""Merge the arrays of two datasets into a single dataset.

Expand Down Expand Up @@ -3934,9 +3935,23 @@ def merge(
- 'left': use indexes from ``self``
- 'right': use indexes from ``other``
- 'exact': error instead of aligning non-equal indexes

fill_value : scalar or dict-like, optional
Value to use for newly missing values. If a dict-like, maps
variable names (including coordinates) to fill values.
combine_attrs : {"drop", "identical", "no_conflicts", "drop_conflicts", \
"override"}, default: "override"
String indicating how to combine attrs of the objects being merged:
dcherian marked this conversation as resolved.
Show resolved Hide resolved

- "drop": empty attrs on returned Dataset.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- "drop": empty attrs on returned Dataset.
- "drop": empty attrs on the returned Dataset.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was copied from merge and should be the same for all functions / methods that accept combine_attrs so I would change that in a new PR.

- "identical": all attrs must be the same on every object.
- "no_conflicts": attrs from all objects are combined, any that have
the same name must also have the same value.
- "drop_conflicts": attrs from all objects are combined, any that have
the same name but different values are dropped.
- "override": skip comparing and copy attrs from the first dataset to
the result.


Returns
-------
Expand All @@ -3956,6 +3971,7 @@ def merge(
compat=compat,
join=join,
fill_value=fill_value,
combine_attrs=combine_attrs,
)
return self._replace(**merge_result._asdict())

Expand Down
8 changes: 7 additions & 1 deletion xarray/core/merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -893,6 +893,7 @@ def dataset_merge_method(
compat: str,
join: str,
fill_value: Any,
combine_attrs: str,
) -> _MergeResult:
"""Guts of the Dataset.merge method."""
# we are locked into supporting overwrite_vars for the Dataset.merge
Expand Down Expand Up @@ -922,7 +923,12 @@ def dataset_merge_method(
priority_arg = 2

return merge_core(
objs, compat, join, priority_arg=priority_arg, fill_value=fill_value
objs,
compat,
join,
priority_arg=priority_arg,
fill_value=fill_value,
combine_attrs=combine_attrs,
)


Expand Down
31 changes: 31 additions & 0 deletions xarray/tests/test_merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -418,3 +418,34 @@ def test_merge_dataarray(self):
da = xr.DataArray(data=1, name="b")

assert_identical(ds.merge(da), xr.merge([ds, da]))

@pytest.mark.parametrize(
["combine_attrs", "attrs1", "attrs2", "expected_attrs", "expect_error"],
# don't need to test thoroughly
(
("drop", {"a": 0, "b": 1, "c": 2}, {"a": 1, "b": 2, "c": 3}, {}, False),
(
"drop_conflicts",
{"a": 0, "b": 1, "c": 2},
{"b": 2, "c": 2, "d": 3},
{"a": 0, "c": 2, "d": 3},
False,
),
("override", {"a": 0, "b": 1}, {"a": 1, "b": 2}, {"a": 0, "b": 1}, False),
("no_conflicts", {"a": 0, "b": 1}, {"a": 0, "b": 2}, None, True),
("identical", {"a": 0, "b": 1}, {"a": 0, "b": 2}, None, True),
),
)
def test_merge_combine_attrs(
self, combine_attrs, attrs1, attrs2, expected_attrs, expect_error
):
ds1 = xr.Dataset(attrs=attrs1)
ds2 = xr.Dataset(attrs=attrs2)

if expect_error:
with pytest.raises(xr.MergeError):
ds1.merge(ds2, combine_attrs=combine_attrs)
else:
actual = ds1.merge(ds2, combine_attrs=combine_attrs)
expected = xr.Dataset(attrs=expected_attrs)
assert_identical(actual, expected)