Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Utility to restore original dimension order after apply_ufunc #1739

Open
shoyer opened this issue Nov 23, 2017 · 11 comments
Open

Utility to restore original dimension order after apply_ufunc #1739

shoyer opened this issue Nov 23, 2017 · 11 comments

Comments

@shoyer
Copy link
Member

shoyer commented Nov 23, 2017

This seems to be coming up quite a bit for wrapping functions that apply an operation along an axis, e.g., for interpolate in #1640 or rank in #1733.

We should either write a utility function to do this or consider adding an option to apply_ufunc.

@shoyer
Copy link
Member Author

shoyer commented Nov 24, 2017

I wonder if it would make sense to simply change the behavior of apply_ufunc so existing dimensions always get restored to their original location, and only new dimensions remain at the end. It seems like this is almost always desirable.

@jhamman
Copy link
Member

jhamman commented Nov 27, 2017

At this point, I think I'm +1 on either automatically restoring dimension order or providing some option to do so (with restoring as the default). I'm still getting used to apply_ufunc so I'm not sure I feel strongly enough to make this call yet though.

@shoyer
Copy link
Member Author

shoyer commented Nov 27, 2017

I started trying to implement this in apply_ufunc and then realized why I hadn't done so before: it's really not clear how to match up input/output arguments if there are more than one of each. We could do something special for there first input/output but that could be more confusing than helpful.

This is probably better reserved for the simpler apply_raw(#1618).

@jhamman
Copy link
Member

jhamman commented Dec 8, 2017

What if we just supplied an output_dims kwarg to apply_ufunc or apply_raw?

@shoyer
Copy link
Member Author

shoyer commented Dec 8, 2017

What if we just supplied an output_dims kwarg to apply_ufunc or apply_raw?

We could do that, but how to do we handle specifying output dimensions for a Dataset function? Maybe use a dict?

This is part of why a helper function starts to make sense to me.

@stale
Copy link

stale bot commented Nov 8, 2019

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically

@stale stale bot added the stale label Nov 8, 2019
@jhamman jhamman removed the stale label Nov 8, 2019
@max-sixty
Copy link
Collaborator

What do we think about attempting to enforce a dimension order throughout the dataset? That would solve these issues for free.

We already have some order in .dims & .sizes. Could we transpose all dimensions to that ordering after any operation?

(Maybe this doesn't need to be strictly enforced, but we'd at least set the expectation that dimensions could be reordered to that dimension order at any time)

Or are there uses to having different dimension order throughout a dataset?

In [45]: ds
Out[45]:
<xarray.Dataset>
Dimensions:  (lat: 25, lon: 53, time: 2920)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

In [46]: ds.dims
Out[46]: Frozen(SortedKeysDict({'lat': 25, 'time': 2920, 'lon': 53}))

In [47]: ds.sizes
Out[47]: Frozen(SortedKeysDict({'lat': 25, 'time': 2920, 'lon': 53}))

@shoyer
Copy link
Member Author

shoyer commented Nov 16, 2019

Consistent dimension ordering is usually a best practice, but I'm not sure we want to enforce it. If a user explicitly supplies two variables, with dimensions ('x', 'y') and ('y', 'x'), respectively, we would either have to raise an error or automatically transpose the second variable. Neither option sounds great to me.

As for the current order in ds.dims, right now we use sorted order. Mostly this is because I didn't want to need to use an OrderedDict for keeping track of dimensions on datasets, and until recently Python dict order was entirely arbitrary. Perhaps this could be relaxed now that Python's dict always preserves the order in which items are added.

@max-sixty
Copy link
Collaborator

we would either have to raise an error or automatically transpose the second variable. Neither option sounds great to me.

Agree, I was thinking we'd transpose the second one, but fair if you think too invasive

Perhaps this could be relaxed now that Python's dict always preserves the order in which items are added.

Ah OK. If we relaxed that, what would the order be? Just the order they were initially added, and so at least consistent through time (though not necessarily with the variables)?

@shoyer
Copy link
Member Author

shoyer commented Nov 16, 2019

we would either have to raise an error or automatically transpose the second variable. Neither option sounds great to me.

Agree, I was thinking we'd transpose the second one, but fair if you think too invasive

Maybe this would be fine. We already do automatic alignment, and this is really not that much different.

It would be a breaking change, though, so we would need to roll it out slowly.

Perhaps this could be relaxed now that Python's dict always preserves the order in which items are added.

Ah OK. If we relaxed that, what would the order be? Just the order they were initially added, and so at least consistent through time (though not necessarily with the variables)?

Yes, that's right.

One potential concern is that this would expose a detail of xarray's data model that would not be easy for users to control. I can imagine that we might have internal xarray methods that inadvertently change dimension order.

So maybe this would make sense only if we also do your other suggested change (enforcing a consistent dimension order throughout a Dataset).

@max-sixty
Copy link
Collaborator

OK great. We can meditate on it for a while, no great rush! Thanks for engaging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants