Exploratory proposal: convert dascore to a Pangeo/xarray project #27

d-chambers · 2022-05-23T16:44:34Z

d-chambers
May 23, 2022
Maintainer

It may be in our best interest to make dascore a Pangeo-affiliated project. This discussion lays out the proposal along with some of its advantages and disadvantages.

Motivation

The Pangeo project is an ecosystem of Python packages used for geoscience (especially climate science). Provided we use the correct data structures (discussed later), dascore can benefit from all the large-scale data processing tools of that ecosystem, including:

Dask integration
Straight-forward path for use in cloud environments
Compatibility with a variety of xarray libraries
Benefits of future improvements to xarray/dask
Lower development/maintenance load (because we simply rely on xarray for more)

The Pangeo community is very active, and a recent effort to improve ETL workflows has been funded by the NSF under the name Pangeo Forge.

Disadvantages

API changes

We would need to remove the custom DAS classes (Patch and Spool) and operate on the xarray objects DataArray and Dataset. This would break existing code, but the migration path should be straightforward. Rather than something like this:

import dascore
pa = dascore.get_example_patch()

out = (
    pa.decimate(8)  # decimate to reduce data volume by 8 along time dimension
    .detrend(dim='distance')  # detrend along distance dimension
    .pass_filter(time=(None, 10))  # apply a low-pass 10 Hz butterworth filter
)

It would look like this:

import dascore as dc
dar = dc.get_example_array()  # proposed function returning an xarray DataArray

out = (
    dar.dc.decimate(8)  # each call goes through the accessor namespace
    .dc.detrend(dim='distance')
    .dc.pass_filter(time=(None, 10))
)

Where we would use either 'dc' or 'das' as an accessor namespace.
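For readers unfamiliar with accessors, here is a minimal sketch of how such a namespace could be registered with xarray's `register_dataarray_accessor`. The `DASAccessor` class and its simplified `decimate` and `detrend` methods are hypothetical stand-ins for illustration, not dascore's implementation (the decimation here just subsamples without an anti-alias filter, and the detrend only removes the mean).

```python
import numpy as np
import xarray as xr


@xr.register_dataarray_accessor("dc")
class DASAccessor:
    """Hypothetical accessor; names and behavior are assumptions."""

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def decimate(self, factor, dim="time"):
        # Keep every `factor`-th sample along `dim` (no anti-alias filter).
        return self._obj.isel({dim: slice(None, None, factor)})

    def detrend(self, dim="distance"):
        # Constant detrend: subtract the mean along `dim`.
        return self._obj - self._obj.mean(dim)


# Synthetic stand-in for an example DAS array: 10 channels, 80 time samples.
dar = xr.DataArray(
    np.random.default_rng(0).normal(size=(10, 80)),
    dims=("distance", "time"),
    coords={"distance": np.arange(10.0), "time": np.arange(80) / 100.0},
)

# Every method returns a plain DataArray, so the namespace is repeated per call.
out = dar.dc.decimate(8).dc.detrend(dim="distance")
```

Because each accessor method returns an ordinary DataArray, users can freely mix these calls with native xarray methods in the same chain.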

We would also need to rewrite the dascore file parsers as xarray backends but, at first glance, this seems straightforward.
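As a rough idea of what a backend involves, the sketch below subclasses xarray's `BackendEntrypoint` and implements `open_dataset`. The `NpyDASBackend` class, the `.npy` file layout, and the index-based coordinates are all assumptions made for illustration; a real DAS format would parse its own metadata, and this version loads everything eagerly rather than lazily.

```python
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


class NpyDASBackend(BackendEntrypoint):
    """Hypothetical backend: reads a 2D .npy array as (distance, time)."""

    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        data = np.load(filename_or_obj)  # eager load; no lazy indexing here
        return xr.Dataset(
            {"data": (("distance", "time"), data)},
            coords={
                # Placeholder coordinates built from array indices.
                "distance": np.arange(data.shape[0], dtype=float),
                "time": np.arange(data.shape[1], dtype=float),
            },
        )

    def guess_can_open(self, filename_or_obj):
        return str(filename_or_obj).endswith(".npy")


# Write a small synthetic file and read it back through the backend.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_das.npy"
    np.save(path, np.zeros((4, 100)))
    ds = NpyDASBackend().open_dataset(path)
```

A backend class like this can also be passed to `xr.open_dataset` through its `engine` argument; it is called directly here to keep the sketch short.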

Steep learning curve

Personally, I find the xarray API functional enough, but it can be hard for beginners to grok, especially if they aren't familiar with pandas. We can mitigate this by providing simpler functions in the das namespace that do 90% of what most users need, so they only have to learn the xarray API to go beyond that point.

Eco-system lock-in

There was some discussion of using ray rather than dask, or jax rather than numpy, in dascore. If we adopt this proposal, these paths will effectively be closed unless the Pangeo packages go this route.

Next steps

I am going to spend some time working on a branch that implements the outlined changes then report back here.

d-chambers · 2022-05-24T16:53:40Z

d-chambers
May 24, 2022
Maintainer Author

I spent a few hours on this and found the xarray backend stuff a bit convoluted and confusing, and I wasn't able to get the lazy loading to work with dask yet. I am wondering if, without additional abstraction, this will make it too hard for people to add additional DAS backends, as I really don't want to write them all...

Also, even for lazy loading, it seems the entire coordinate arrays need to be loaded into memory and only the data arrays are lazy. For nearly square data this isn't a big deal, but for DAS data, where the time axis is easily orders of magnitude longer than the distance axis, it comes at a much greater cost.
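To put a rough number on that cost, here is a back-of-the-envelope calculation. The sampling rate and duration are assumed values chosen for illustration, not measurements from any particular interrogator. It compares a fully materialized time coordinate with the three values (start, step, length) that would be enough to describe an evenly sampled axis.

```python
SAMPLING_RATE_HZ = 1_000      # assumed sampling rate
DURATION_S = 24 * 60 * 60     # assumed record length: one day
BYTES_PER_VALUE = 8           # float64 or datetime64 coordinate values

n_time = SAMPLING_RATE_HZ * DURATION_S

# Every time stamp held in memory as an array element.
materialized_bytes = n_time * BYTES_PER_VALUE

# An evenly sampled axis described only by start, step, and length.
compact_bytes = 3 * BYTES_PER_VALUE

print(f"{n_time:,} time samples")
print(f"materialized coordinate: {materialized_bytes / 1e6:.1f} MB")
print(f"start/step/length: {compact_bytes} bytes")
```

Under these assumptions the time coordinate alone takes hundreds of megabytes before any data is read, whereas the distance coordinate for a few thousand channels would be only tens of kilobytes.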

Moreover, we will find it difficult to maintain metadata consistency if all the DataArray methods are mixed with the dascore methods.


d-chambers · 2023-07-21T22:38:44Z

d-chambers
Jul 21, 2023
Maintainer Author

I think this ship has long since sailed.

