DataArray.apply is missing #1074

Open
burnpanck opened this issue Nov 2, 2016 · 9 comments

Comments

@burnpanck
Contributor

In essence, I'm looking for the functionality of xarray.core.utils.maybe_wrap_array. I guess you are waiting for #964 to unify such functionality?
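A minimal sketch of the requested behaviour. It assumes the goal is to run a plain NumPy function on the underlying values and restore the labels when the result keeps the input's shape. The helper name `apply_values` is hypothetical and not part of xarray. It uses the public `DataArray.copy(data=...)` rather than the internal `maybe_wrap_array`.

```python
import numpy as np
import xarray as xr


def apply_values(da, func, *args, **kwargs):
    """Hypothetical helper: apply an ndarray -> ndarray function to da.values.

    If the result has the same shape as the input, it is re-wrapped with the
    original dims, coords and attrs; otherwise the bare ndarray is returned.
    """
    result = np.asarray(func(da.values, *args, **kwargs))
    if result.shape == da.shape:
        # copy(data=...) keeps dims/coords/attrs but swaps in the new values
        return da.copy(data=result)
    return result


da = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)
wrapped = apply_values(da, np.square)  # same shape -> labelled DataArray
flat = apply_values(da, np.ravel)      # shape changes -> plain ndarray
```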

@shoyer
Member

shoyer commented Nov 4, 2016

We could try to get in a simple version of this before #964.

My only concern is that the functionality here is slightly different from Dataset.apply, which takes functions that map DataArray -> DataArray or DataArray -> numpy.ndarray.

For DataArray.apply, we really want something that maps numpy.ndarray -> numpy.ndarray, which would be inconsistent with Dataset.apply. I guess we could pass the original DataArray to the function, but we already have a .pipe method for that.
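To make the distinction concrete, `DataArray.pipe(func, *args, **kwargs)` calls `func(da, *args, **kwargs)`, so the function sees the labelled object. An ndarray -> ndarray function has to be given `.values` instead and returns a result without labels. The `anomaly` function below is an illustrative example, not part of xarray.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    dims=("station", "time"),
)


def anomaly(arr, dim):
    # Works on a DataArray: it uses a dimension *name*, so it needs the labels.
    return arr - arr.mean(dim)


# pipe passes the DataArray itself (plus any extra arguments) to the function.
labelled = da.pipe(anomaly, dim="time")

# An ndarray-level computation only sees bare values and returns a bare ndarray.
raw = np.subtract(da.values, da.values.mean(axis=1, keepdims=True))
```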

This is also a concern for #964, because any new xarray.apply function would have similar consistency issues with Dataset.apply.

Two possible solutions, neither of which is fully satisfying:

  • New keyword argument raw to Dataset.apply, defaulting to False. Works like the raw argument to DataFrame.apply. If raw=True, then Dataset.apply passes unlabeled arrays to the provided function, like DataArray.apply. This makes the difference a little less jarring.
  • Pick a new name for one of these uses of apply, e.g., apply_raw for this use case. xarray.apply_raw or DataArray.apply_raw is pretty verbose, though.
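A rough sketch of the first option, written as a standalone function rather than a method. `dataset_apply` and its `raw` argument are hypothetical; this is the proposal under discussion, not existing xarray API. With `raw=True`, each data variable's values are passed as a NumPy array and re-wrapped afterwards. This assumes the function preserves the shape.

```python
import numpy as np
import xarray as xr


def dataset_apply(ds, func, raw=False):
    """Hypothetical Dataset.apply with a pandas-style ``raw`` switch."""
    new_vars = {}
    for name, var in ds.data_vars.items():
        if raw:
            # func sees a plain ndarray; assumes it returns the same shape
            new_vars[name] = var.copy(data=func(var.values))
        else:
            # func sees the labelled DataArray, as in the existing Dataset.apply
            new_vars[name] = func(var)
    return xr.Dataset(new_vars, coords=ds.coords, attrs=ds.attrs)


ds = xr.Dataset(
    {"a": ("x", [1.0, 4.0, 9.0]), "b": ("x", [16.0, 25.0, 36.0])},
    coords={"x": [0, 1, 2]},
)
via_arrays = dataset_apply(ds, np.sqrt, raw=True)
via_labels = dataset_apply(ds, lambda v: v.mean("x"))
```

Keeping `raw=False` as the default would leave the existing DataArray-based behaviour unchanged for current callers.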

@burnpanck
Contributor Author

Aha! For my use-case, DataArray.pipe is perfectly fine; I just didn't know about it. I have to admit that I know nothing about pandas. Before I learned about xarray, pandas was not interesting to me at all. My datasets are often high-dimensional, which does not work well with pandas' orientation towards (one-dimensional) collections of observations. In that sense, I could relabel this issue (or create a new one) as a documentation problem. The API reference does not indicate the existence of DataArray.pipe at all (only Dataset.pipe, even though that one mentions it works on DataArrays too). Also, there could be a see-also link to pipe from apply. Shall I have a go at a PR?

@burnpanck
Contributor Author

As for the consistency concern, I wouldn't have expected that to be a big issue. I'd argue that most functions mapping np.ndarray -> np.ndarray will not mind receiving a DataArray instead. On the other hand, functions mapping DataArray -> np.ndarray would seldom prefer to receive the raw np.ndarray. So I see no use for the raw parameter (but then again, I do not know pandas and its use-cases), and my hypothetical DataArray.apply and the existing DataArray.pipe are essentially the same.
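The argument can be checked directly. NumPy ufuncs, and functions written only in terms of arithmetic and methods that both types share, accept a `DataArray` and return a labelled result. `normalize` is an illustrative function, not from the thread. Functions that call ndarray-specific APIs may still need `.values`, so treat this as a rule of thumb rather than a guarantee.

```python
import numpy as np
import xarray as xr


def normalize(a):
    # Written against the surface shared by ndarray and DataArray: arithmetic, min, max.
    return (a - a.min()) / (a.max() - a.min())


da = xr.DataArray([2.0, 4.0, 6.0], dims="t", coords={"t": [0, 1, 2]})

from_ndarray = normalize(da.values)  # plain ndarray in, ndarray out
from_dataarray = da.pipe(normalize)  # DataArray in, DataArray out (labels kept)
root = da.pipe(np.sqrt)              # NumPy ufuncs also return a DataArray
```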

@smartass101
Copy link

I think #1130 is related. I also think that apply is somewhat synonymous to pipe and is a lot more understandable for people without a pandas background. It would also be more consistent to have them both named the same on both Dataset and DataArray.

@shoyer
Copy link
Member

shoyer commented Nov 17, 2016

@burnpanck I missed your comments from a few weeks ago. Yes, please do make PRs to update the docs.

I am OK with adding DataArray.apply for consistency and discoverability, even if it's basically an alias of DataArray.pipe. And if we add a raw argument I suppose it should probably default to raw=False for both .apply methods.

@smartass101

Actually, I think that pipe should default to raw=False, if there is to be such a parameter at all. The reason is that one usually uses pipe to chain together functions, each of which usually expects a DataArray, and "downcasting" to ndarray often breaks such a chain. If you insist on having one method behave as if raw=True, then I think it should be apply, to be consistent with the Python apply function, which simply applies the function and nothing more.
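A small illustration of the chaining point. Each step below uses dimension names, so it only works if every step receives and returns a `DataArray`. The step functions are hypothetical examples. If a step returned `.values` instead, the next `.pipe` call would fail, because a NumPy array has no `.pipe` method and no named dimensions.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("lat", "time"),
    coords={"lat": [-10, 0, 10]},
)


def detrend_time(arr):
    # Subtract the per-latitude mean over time (needs the "time" name).
    return arr - arr.mean("time")


def scale(arr, factor):
    return arr * factor


def zonal_max(arr):
    # Reduce over "lat" by name.
    return arr.max("lat")


result = da.pipe(detrend_time).pipe(scale, factor=2.0).pipe(zonal_max)
```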

@stale
Copy link

stale bot commented Jan 24, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.
If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

@openSourcerer9000
Copy link

openSourcerer9000 commented Jul 2, 2021

Not stale! Why not follow the pandas convention of map vs apply for DataArray vs Dataset?
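For reference, later xarray releases deprecated `Dataset.apply` in favour of `Dataset.map`, which calls a function once per data variable. The short example below assumes a recent xarray version. `.pipe` still covers calling a function on a whole `DataArray`.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"a": ("x", [-1.0, 2.0]), "b": ("x", [3.0, -4.0])},
    coords={"x": [0, 1]},
)

# Dataset.map calls the function on each data variable (each a DataArray)
# and reassembles the results into a new Dataset.
magnitudes = ds.map(np.abs)

# For a single DataArray, pipe plays the corresponding role.
doubled = ds["a"].pipe(lambda arr: arr * 2)
```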

@aberges-grd

I want to apply a function over a given axis. This is well-defined functionality that pipe does not cover. (Well, you can select the axis inside the function you pass, but that is bad practice because the function then needs an extra parameter to select the axis; not very KISS, imo.)
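Two existing xarray routes cover this without threading an axis argument through user code. The sketch below assumes the function reduces along one axis. `DataArray.reduce(func, dim=...)` translates the dimension name into an `axis=` keyword for `func`. `xr.apply_ufunc` with `input_core_dims` moves the named dimension to the last axis, so `func` can always operate on `axis=-1`. `peak_to_peak` is an illustrative function.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.array([[1.0, 5.0, 3.0], [2.0, 2.0, 8.0]]),
    dims=("station", "time"),
)


def peak_to_peak(x, axis=-1):
    # Plain NumPy function: range of values along one axis.
    return x.max(axis=axis) - x.min(axis=axis)


# Option 1: reduce maps the dimension name to the axis keyword.
by_reduce = da.reduce(peak_to_peak, dim="time")

# Option 2: apply_ufunc moves "time" to the last axis and drops it afterwards,
# so the function can always work on axis=-1.
by_ufunc = xr.apply_ufunc(peak_to_peak, da, input_core_dims=[["time"]])
```

With `apply_ufunc`, the function itself never has to know which named dimension it is operating on.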


5 participants