apply_ufunc with dask="parallelized" and "allowed" #315

Open
2 of 25 tasks
dougiesquire opened this issue May 9, 2021 · 2 comments
@dougiesquire
Collaborator

A large number of xskillscore methods use xarray's apply_ufunc with dask="parallelized" to support dask arrays. When the wrapped function natively supports dask arrays, the preferred option is dask="allowed". See here for details.
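To illustrate the difference, here is a minimal sketch of the same wrapped function called with both settings. The `_mse` and `mse` functions below are simplified stand-ins written for this example, not the actual xskillscore implementations, and the array shapes and chunking are arbitrary.

```python
import dask.array as da
import numpy as np
import xarray as xr


def _mse(a, b, axis):
    # Only arithmetic and .mean() are used, both of which work lazily on
    # dask arrays, so this function is a candidate for dask="allowed".
    return ((a - b) ** 2).mean(axis=axis)


def mse(a, b, dim, dask="allowed"):
    # apply_ufunc moves the core dimension to the last axis, hence axis=-1.
    return xr.apply_ufunc(
        _mse,
        a,
        b,
        input_core_dims=[[dim], [dim]],
        kwargs={"axis": -1},
        dask=dask,
        output_dtypes=[float],
    )


rng = np.random.default_rng(0)
# The core dimension ("time") is kept in a single chunk, which
# dask="parallelized" requires.
a = xr.DataArray(
    da.from_array(rng.random((4, 6)), chunks=(2, 6)), dims=["x", "time"]
)
b = xr.DataArray(
    da.from_array(rng.random((4, 6)), chunks=(2, 6)), dims=["x", "time"]
)

# "allowed" hands the dask arrays straight to _mse.
lazy_allowed = mse(a, b, "time", dask="allowed")
# "parallelized" applies _mse to each block as a numpy array.
lazy_parallelized = mse(a, b, "time", dask="parallelized")
```

Both results stay lazy until computed and should agree numerically. The difference is in who builds the task graph: the wrapped function itself or xarray.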

This issue lists all current methods within xskillscore that use apply_ufunc and, for each method, tries to summarise how much work is involved in enabling dask="allowed".

xskillscore.contingency

  • gerrity_score : already dask="allowed"

xskillscore.deterministic

  • linslope : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • pearson_r : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • pearson_r_p_value : requires a slight refactor to how masked nans are reset here, since dask doesn't seem to be able to take the length of an empty array
  • effective_sample_size : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • pearson_r_eff_p_value : requires a slight refactor to how masked nans are reset here, since dask doesn't seem to be able to take the length of an empty array
  • spearman_r : need to wrap bottleneck.nanrankdata with dask.map_blocks or equivalent, although I don't know if this is any better than using dask="parallelized"
  • spearman_r_p_value : as above
  • spearman_r_eff_p_value : as above
  • r2 : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • me : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • rmse : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • mse : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • mae : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • median_absolute_error : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • mape : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
  • smape : all numpy functions used can ingest and return dask arrays so can simply switch dask="parallelized" for dask="allowed"
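For the spearman_r entries above, wrapping a numpy-only ranking function with dask's map_blocks might look roughly like the sketch below. `rank_last_axis` is a hypothetical stand-in for `bottleneck.nanrankdata(block, axis=-1)` so that the example only needs numpy and dask. Unlike the real function, it gives ordinal ranks with no tie averaging or NaN handling. `dask_rank` is likewise a name invented for this example.

```python
import dask.array as da
import numpy as np


def rank_last_axis(block):
    # Hypothetical stand-in for bottleneck.nanrankdata(block, axis=-1):
    # ordinal ranks starting at 1, no tie averaging, no NaN handling.
    return np.argsort(np.argsort(block, axis=-1), axis=-1) + 1.0


def dask_rank(x):
    # Ranking needs every value along the ranked axis at once, so that
    # axis is put into a single chunk before mapping over the blocks.
    x = x.rechunk({x.ndim - 1: -1})
    return da.map_blocks(rank_last_axis, x, dtype=float)


x = da.from_array(
    np.array([[3.0, 1.0, 2.0], [10.0, 30.0, 20.0]]), chunks=(1, 2)
)
ranks = dask_rank(x)  # still lazy; call .compute() to get numpy values
```

This is essentially what dask="parallelized" does internally, which is why it is unclear whether the manual wrapping would gain anything.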

xskillscore.probabilistic

  • crps_gaussian : this is a wrapper on properscoring which expects numpy arrays (and actually triggers compute with dask="allowed"). Getting dask="allowed" working properly would require a full refactor of properscoring
  • crps_quadrature : this is a wrapper on properscoring which expects numpy arrays (and actually triggers compute with dask="allowed"). Getting dask="allowed" working properly would require a full refactor of properscoring
  • crps_ensemble : this is a wrapper on properscoring which expects numpy arrays (and actually triggers compute with dask="allowed"). Getting dask="allowed" working properly would require a full refactor of properscoring
  • brier_score : this is a wrapper on properscoring which expects numpy arrays (and actually triggers compute with dask="allowed"). Getting dask="allowed" working properly would require a full refactor of properscoring
  • threshold_brier_score : this is a wrapper on properscoring which expects numpy arrays (and actually triggers compute with dask="allowed"). Getting dask="allowed" working properly would require a full refactor of properscoring
  • rank_histogram : need to wrap bottleneck.nanrankdata with dask.map_blocks or equivalent, although I don't know if this is any better than using dask="parallelized"
  • reliability : already dask="allowed"

xskillscore.resampling

  • resample_iterations_idx : use dask's moveaxis when the input is a dask array. This would be easily handled with duck array ops - see below.
@dougiesquire
Collaborator Author

As a general note, I'd suggest that we implement in xskillscore something like xarray's duck_array_ops module. This would make some of the suggestions above very easy to implement and read, and it could replace, for example, the _get_numpy_funcs function in xskillscore.deterministic. I'll try to open a PR for this when I next find some time.
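As a rough idea of what such a module could contain, here is a minimal sketch loosely modelled on the dispatch pattern in xarray's duck_array_ops. The helper names `is_dask_array` and `_dask_or_numpy` and the choice of wrapped functions are assumptions made for this example, not existing xskillscore code.

```python
import numpy as np

try:
    import dask.array as dask_array
except ImportError:  # dask is an optional dependency
    dask_array = None


def is_dask_array(x):
    # True only when dask is installed and x is a dask array.
    return dask_array is not None and isinstance(x, dask_array.Array)


def _dask_or_numpy(name):
    """Build a function that calls dask.array.<name> or numpy.<name>."""

    def func(*args, **kwargs):
        # If any positional argument is a dask array, use the dask
        # version, which also accepts numpy arrays among its inputs.
        if any(is_dask_array(a) for a in args):
            return getattr(dask_array, name)(*args, **kwargs)
        return getattr(np, name)(*args, **kwargs)

    func.__name__ = name
    return func


# Functions the metrics could import instead of choosing a namespace
# themselves.
moveaxis = _dask_or_numpy("moveaxis")
nanmean = _dask_or_numpy("nanmean")
sqrt = _dask_or_numpy("sqrt")
```

A metric would then call, for example, `moveaxis(x, 0, -1)` without checking the array type, and the result stays lazy whenever the input is a dask array.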

@dougiesquire
Collaborator Author

I should also point out that even for those functions that ostensibly can just be switched to dask="allowed", I think @ahuang11 encountered some issues when the forecasts, observations and weights are not all dask or all numpy arrays. Details here. We'll need to resolve these issues with the first PR.
