xarray contrib module #1850
Thanks for starting this issue @shoyer. One thing I would be interested to know is how sklearn and tensorflow balance code-quality and API consistency with low barrier to entry. For instance, most of the sklearn contrib packages provide classes which inherit from sklearn's |
I like the idea of regrouping contrib projects. I'd be +1 for the "separate repository" model, which indeed looks easier from a maintenance perspective. However, with this model it would probably be good to also follow some package naming convention (see #1447 for discussion) so that we could easily identify contrib projects in, e.g.,
I'd see xarray contrib packages mainly provide |
Some additional thoughts: One thing that I like with contrib modules "protected" within the xarray namespace is that it would really help us choose module names that are short, relevant and ideally the same that the However, it is likely that contrib modules may need domain-specific dependencies other than the ones used in xarray "core". With the |
I think domain specific dependencies are a pretty decisive argument in favor of the separate repository model. TensorFlow doesn't relax its code quality standards for contrib packages -- it's more about reducing guarantees of API stability or maintenance. That works OK for TensorFlow in part because the authors of most contrib packages are Google software engineers. |
I don't have any strong opinion about separate repos or contrib submodules, so long as there is some way to improve discoverability of methods. Having said that, many of the methods mentioned in #1288 are in the numpy namespace, and at least naively applicable to all domains. Would you consider numpy methods with semantics compatible with DataArrays and/or Datasets as appropriate to contribute to core xarray? |
I agree that the separate repository model is probably best. However, should it be in just one repository or in many? Using many repos would solve the domain-specific dependency problem, but the sklearn-contrib packages are not that discoverable IMO. I found two of them via Google on separate occasions before realizing that they were part of the same GitHub organization. |
One repository for all contrib projects would be hard to maintain if we allow very specific projects, like a little xarray extension to work with the 'xyz' GCM model (which seems to be a common case for extensions). That said, it doesn't prevent us from adding bigger, generic repositories like
Hence the suggestion to choose some convention for package naming, e.g., something similar to |
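A naming convention of the kind suggested above could make contrib packages discoverable programmatically. As a minimal sketch, assuming a hypothetical `xarray-contrib-` prefix (the thread does not settle on any particular convention, and the helper names below are illustrative only), installed packages that follow it could be listed with the standard library:

```python
from importlib import metadata

# Hypothetical prefix: the discussion does not fix an actual convention.
PREFIX = "xarray-contrib-"


def matches_convention(name, prefix=PREFIX):
    """Return True if a distribution name follows the assumed prefix."""
    # Treat case and underscores/hyphens as interchangeable for this check.
    return name.lower().replace("_", "-").startswith(prefix)


def installed_contrib_packages(prefix=PREFIX):
    """List installed distributions whose names follow the assumed prefix."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and matches_convention(name, prefix):
            names.add(name)
    return sorted(names)


print(installed_contrib_packages())
```

The same prefix check would work equally well as a search term on a package index, which is where a convention would help discoverability most.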
To make methods even more discoverable, we might also add the
>>> import xarray as xr
>>> import xscipy
>>> da = xr.DataArray(...)
>>> da.xscipy.method()
But maybe that's too much |
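For reference, the `da.xscipy.method()` pattern above can be built on xarray's accessor mechanism, `xr.register_dataarray_accessor`. The following is a minimal sketch, not an actual package: `xscipy` is the hypothetical name from the comment, and the `XScipyAccessor` class and its `demean` method are illustrative stand-ins for real contrib functionality.

```python
import numpy as np
import xarray as xr


# "xscipy" is the hypothetical accessor name used in the comment above.
@xr.register_dataarray_accessor("xscipy")
class XScipyAccessor:
    def __init__(self, xarray_obj):
        # xarray passes in the DataArray the accessor is attached to.
        self._obj = xarray_obj

    def demean(self, dim):
        """Subtract the mean along ``dim`` (illustrative method only)."""
        return self._obj - self._obj.mean(dim)


da = xr.DataArray(np.arange(6.0).reshape(2, 3), dims=("x", "y"))
result = da.xscipy.demean("y")
print(result.values)
```

In a real contrib package the registration would run when the package is imported, which is why the `import xscipy` line in the snippet above matters even though the name is not used directly.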
My 2-cents. I think we could consider setting up an Side note, we don't have to use it but I did grab the |
@maxim-lian There is a very short list of such packages hidden in the xarray documentation. In general, there are a ton of these |
Personally I'd rather have "awesome xarray" listed somewhere prominently in
the xarray docs, along with mentions inline in the docs anywhere where they
are particularly relevant. The very short list that is currently there is
based upon a handful of projects that I knew about a few years ago, but
it's definitely woefully out of date now.
On Fri, Feb 23, 2018 at 9:23 PM Noah D Brenowitz wrote:
@maxim-lian <https://github.com/maxim-lian> There is a very short list of
such packages hidden in the xarray documentation
<http://xarray.pydata.org/en/stable/internals.html?highlight=xgcm#extending-xarray>.
In general, there are a ton of these awesome-... repos floating around
the internet which just list the useful/related tools/libraries which are
related to ... . For example, there are repos out there like
awesome-python <https://github.com/vinta/awesome-python> and awesome-bash.
Maybe someone could start an awesome-xarray package.
|
FYI, we have started https://github.com/pangeo-data/awesome-open-climate-science. It is not xarray specific, but contains many xarray-related packages. Please contribute! |
Thanks @rabernat that awesome list looks pretty awesome. However, I would still advocate for a more centralized approach to this problem. For instance, the NCL has a huge library of contributed functions which they distribute along with the code. By now, I am sure that xarray users have basically reimplemented equivalents to all of these functions, but without a centralized home it is still too difficult to find or contribute new codes. For instance, I have a useful wrapper to I would be more than willing to volunteer for such an effort, but I think it needs to involve multiple people. Various individuals have tried to make such repos on their own, but none seem to have reached critical mass. For example, |
Just to add to the mix, we have our own package for spectra!
https://xrft.readthedocs.io/en/latest/
On Apr 4, 2019, at 5:04 PM, Noah D Brenowitz wrote:
Thanks @rabernat that awesome list looks pretty awesome.
However, I would still advocate for a more centralized approach to this problem. For instance, the NCL has a huge library of contributed functions which they distribute along with the code. By now, I am sure that xarray users have basically reimplemented equivalents to all of these functions, but without a centralized home it is still too difficult to find or contribute new codes.
For instance, I have a useful wrapper to scipy.ndimage that I use all the time, but it seems overkill to release/support a whole package for this one module. I would be much more likely to contribute a PR to a community run repository. I am also much more likely to use such a repo.
I would be more than willing to volunteer for such an effort, but I think it needs to involve multiple people. Various individuals have tried to make such repos on their own, but none seem to have reached critical mass. For example,
https://github.com/crusaderky/xarray_extras
https://github.com/fujiisoup/xr-scipy
I think there should be multiple maintainers, so that if one person drops out, there still appears to be activity on the repo.
|
A few comments:
|
For what it's worth, TensorFlow has decided that bundling contrib modules into TensorFlow as |
@teoliphant thanks for sharing your thoughts! I would be very happy to collaborate on what a protocol for labeled arrays in Python could look like. Xarray is one useful implementation of labeled arrays, but it's definitely not the only one. |
I'd also like to thank @teoliphant for weighing in! Bearing in mind the history of scipy, I agree that the xarray community doesn't need 100% centralization, but there should be some conglomeration. IMO, the current situation of "one graduate student/postdoc per package" is not sustainable. |
The approach we have been taking is to develop "micro-packages". We currently have three:
These packages share some common design principles. In particular, they are all fully lazy and dask-friendly, meaning that we can apply them to very large datasets (which is the main focus in our group). By keeping the packages small, they are more maintainable. Xgcm and Xrft probably have O(3) active contributors, primarily myself and grad students in my group. Small, but significantly different from 1. We use these packages heavily in everyday scientific work, so I know they are useful. I would love to combine forces on a larger effort. However, we have limited time and effort. For now, though, this situation doesn't seem too bad. It's kind of compatible with what @teoliphant was suggesting in his comment 1 above. I'm not sure that some mega xarray-contrib package would have critical mass to be sustainable either. |
To be clear, I think there is some optimal middle ground between the "mega xarray-contrib" package and the current situation. I think the "micro-package" approach works when the collection of micro-packages is being maintained by an active/permanent entity (e.g. Ryan's research group). On the other hand, postdocs and grad students are very likely to leave the field entirely within a few years, at which point they will probably stop maintaining their "micro-packages". |
@nbren12 - the key difference for our micro-packages is that the primary maintainer is me, not my grad students, and I'm not going anywhere for now. 😉 I still agree that there is probably a better way to organize all of this. Just trying to share our perspective as an xarray-centric small research group. |
The gentlest of bumps on this. Any updates or progress here?? 😄 A couple of us @NCAR ( Cc @kmpaul, @matt-long ) are interested in the outcome of this issue. |
@andersy005 what kind of update are you looking for? I assume you are about to implement some general functionality but want to know where to put it? |
This is correct. One of the things we've been exploring is a "general resample utility" that would both enable fluid translation between data at different temporal intervals (this is one of the use cases) and be aware of things like
We have a general, low-level prototype in https://github.com/coderepocenter/AxisUtilities. We think that it would be beneficial to have this functionality in xarray instead of it residing in yet another xarray-related package. For the time being, my main question is: where (in xarray) would something like this reside? Note: I am happy to open a separate issue to discuss the merits of having this functionality in xarray. Cc @maboualidev |
Over in #1288 @nbren12 wrote:
Yes, I agree that we should explore this. There are a lot of interesting projects building on xarray now but not great ways to discover them.
Are there other open source projects with a good model we should copy here?
tensorflow.contrib
This gives us two different models to consider. The first "separate repository" model might be easier/flexible from a maintenance perspective. Any preferences/thoughts?
There's also some nice overlap with the Pangeo project.