
Replace parallelized with allowed #221

Closed
wants to merge 7 commits

Conversation

@ahuang11 (Member) commented Nov 4, 2020

See benchmarks in comments:
#207

@aaronspring (Collaborator)

Can you run asv to show it is faster? There we have tests for different dataset sizes. Also, your benchmarks were only done on one machine.

@@ -137,7 +137,7 @@ def pearson_r(a, b, dim=None, weights=None, skipna=False, keep_attrs=False):
         weights,
         input_core_dims=input_core_dims,
         kwargs={"axis": -1, "skipna": skipna},
-        dask="parallelized",
+        dask="allowed",
Collaborator

I am still in favor of having dask as a global keyword, see xarray.set_config
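For illustration, a minimal sketch of what such a package-level option could look like (the OPTIONS/set_options names here are made up, not an existing xskillscore API):

OPTIONS = {"dask": "parallelized"}

def set_options(dask="parallelized"):
    """Set the apply_ufunc dask mode used by all metrics (hypothetical)."""
    if dask not in ("parallelized", "allowed", "forbidden"):
        raise ValueError(f"unknown dask mode: {dask!r}")
    OPTIONS["dask"] = dask

# Metric wrappers would then pass dask=OPTIONS["dask"] to xr.apply_ufunc
# instead of hard-coding the mode.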

Collaborator

~/Coding/xskillscore/asv_bench$ asv continuous -f 1.1 upstream/master HEAD -b deterministic.Compute_small.time_xskillscore_metric_small

Member Author


[ 70.83%] ··· ================================================ ========
                                     m
              ------------------------------------------------ --------
                     <function rmse at 0x7fe9d0353730>          failed
                   <function pearson_r at 0x7fe9d032c400>       failed
                      <function mae at 0x7fe9d0353840>          failed
                      <function mse at 0x7fe9d03537b8>          failed
               <function pearson_r_p_value at 0x7fe9d0353378>   failed
              ================================================ ========

Collaborator

Can do easy checks with asv dev -b deterministic.Compute_small.time_xskillscore_metric_small; must delete that encoding, it seems.

@ahuang11 (Member Author) commented Nov 4, 2020 via email

@ahuang11 (Member Author) commented Nov 4, 2020 via email

@ahuang11 (Member Author) commented Nov 5, 2020

asv continuous -f 1.1 origin/allowed HEAD -b deterministic
· Creating environments
· Discovering benchmarks
·· Uninstalling from conda-py3.6-bottleneck-dask-numba-numpy-xarray.
·· Installing 389df7dd into conda-py3.6-bottleneck-dask-numba-numpy-xarray.
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For project commit 389df7dd (round 1/2):
[  0.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[  8.33%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[  8.33%] ··· Running (deterministic.Compute_large.time_xskillscore_metric_large--).
[ 16.67%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 16.67%] ··· Running (deterministic.Compute_large_dask.time_xskillscore_metric_large_dask--)..
[ 25.00%] · For project commit 389df7dd (round 1/2):
[ 25.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[ 33.33%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[ 33.33%] ··· Running (deterministic.Compute_large.time_xskillscore_metric_large--).
[ 41.67%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 41.67%] ··· Running (deterministic.Compute_large_dask.time_xskillscore_metric_large_dask--)..
[ 50.00%] · For project commit 389df7dd (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[ 54.17%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[ 54.17%] ··· deterministic.Compute_large.peakmem_xskillscore_metric_large                                                                                                                                   ok
[ 54.17%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7f1f37f93730>           726M
                   <function pearson_r at 0x7f1f37f6e400>       1.01G
                      <function mae at 0x7f1f37f93840>           790M
                      <function mse at 0x7f1f37f937b8>           694M
               <function pearson_r_p_value at 0x7f1f37f93378>   1.01G
              ================================================ =======

[ 58.33%] ··· deterministic.Compute_large.time_xskillscore_metric_large                                                                                                                                      ok
[ 58.33%] ··· ================================================ ============
                                     m
              ------------------------------------------------ ------------
                     <function rmse at 0x7f1f37f93730>           582±7ms
                   <function pearson_r at 0x7f1f37f6e400>       1.24±0.01s
                      <function mae at 0x7f1f37f93840>           652±10ms
                      <function mse at 0x7f1f37f937b8>           522±10ms
               <function pearson_r_p_value at 0x7f1f37f93378>   1.37±0.01s
              ================================================ ============

[ 62.50%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 62.50%] ··· deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask                                                                                                                 1/5 failed
[ 62.50%] ··· ================================================ ========
                                     m
              ------------------------------------------------ --------
                     <function rmse at 0x7f1f37f93730>           660M
                   <function pearson_r at 0x7f1f37f6e400>        821M
                      <function mae at 0x7f1f37f93840>           696M
                      <function mse at 0x7f1f37f937b8>           670M
               <function pearson_r_p_value at 0x7f1f37f93378>   failed
              ================================================ ========

[ 66.67%] ··· deterministic.Compute_large_dask.time_xskillscore_metric_large_dask                                                                                                                    1/5 failed
[ 66.67%] ··· ================================================ ==========
                                     m
              ------------------------------------------------ ----------
                     <function rmse at 0x7f1f37f93730>          334±40ms
                   <function pearson_r at 0x7f1f37f6e400>       620±20ms
                      <function mae at 0x7f1f37f93840>          403±20ms
                      <function mse at 0x7f1f37f937b8>          309±60ms
               <function pearson_r_p_value at 0x7f1f37f93378>    failed
              ================================================ ==========

[ 70.83%] ··· deterministic.Compute_small.peakmem_xskillscore_metric_small                                                                                                                                   ok
[ 70.83%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7f1f37f93730>          15.3M
                   <function pearson_r at 0x7f1f37f6e400>       15.7M
                      <function mae at 0x7f1f37f93840>          15.3M
                      <function mse at 0x7f1f37f937b8>          15.3M
               <function pearson_r_p_value at 0x7f1f37f93378>   16.3M
              ================================================ =======

[ 75.00%] ··· deterministic.Compute_small.time_xskillscore_metric_small                                                                                                                                      ok
[ 75.00%] ··· ================================================ =============
                                     m
              ------------------------------------------------ -------------
                     <function rmse at 0x7f1f37f93730>           2.41±0.1ms
                   <function pearson_r at 0x7f1f37f6e400>        2.70±0.2ms
                      <function mae at 0x7f1f37f93840>           2.55±0.3ms
                      <function mse at 0x7f1f37f937b8>           2.44±0.2ms
               <function pearson_r_p_value at 0x7f1f37f93378>   3.00±0.08ms
              ================================================ =============

[ 75.00%] · For project commit 389df7dd (round 2/2):
[ 75.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[ 79.17%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[ 79.17%] ··· deterministic.Compute_large.peakmem_xskillscore_metric_large                                                                                                                                   ok
[ 79.17%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7f1f37f93730>           726M
                   <function pearson_r at 0x7f1f37f6e400>       1.01G
                      <function mae at 0x7f1f37f93840>           790M
                      <function mse at 0x7f1f37f937b8>           694M
               <function pearson_r_p_value at 0x7f1f37f93378>   1.01G
              ================================================ =======

[ 83.33%] ··· deterministic.Compute_large.time_xskillscore_metric_large                                                                                                                                      ok
[ 83.33%] ··· ================================================ ============
                                     m
              ------------------------------------------------ ------------
                     <function rmse at 0x7f1f37f93730>           738±50ms
                   <function pearson_r at 0x7f1f37f6e400>        1.27±0s
                      <function mae at 0x7f1f37f93840>           668±6ms
                      <function mse at 0x7f1f37f937b8>           509±3ms
               <function pearson_r_p_value at 0x7f1f37f93378>   1.37±0.01s
              ================================================ ============

[ 87.50%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 87.50%] ··· deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask                                                                                                                 1/5 failed
[ 87.50%] ··· ================================================ ========
                                     m
              ------------------------------------------------ --------
                     <function rmse at 0x7f1f37f93730>           665M
                   <function pearson_r at 0x7f1f37f6e400>        781M
                      <function mae at 0x7f1f37f93840>           691M
                      <function mse at 0x7f1f37f937b8>           660M
               <function pearson_r_p_value at 0x7f1f37f93378>   failed
              ================================================ ========

[ 91.67%] ··· deterministic.Compute_large_dask.time_xskillscore_metric_large_dask                                                                                                                    1/5 failed
[ 91.67%] ··· ================================================ ===========
                                     m
              ------------------------------------------------ -----------
                     <function rmse at 0x7f1f37f93730>           378±40ms
                   <function pearson_r at 0x7f1f37f6e400>        599±20ms
                      <function mae at 0x7f1f37f93840>          324±100ms
                      <function mse at 0x7f1f37f937b8>           267±30ms
               <function pearson_r_p_value at 0x7f1f37f93378>     failed
              ================================================ ===========

[ 95.83%] ··· deterministic.Compute_small.peakmem_xskillscore_metric_small                                                                                                                                   ok
[ 95.83%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7f1f37f93730>          15.4M
                   <function pearson_r at 0x7f1f37f6e400>       15.7M
                      <function mae at 0x7f1f37f93840>          15.4M
                      <function mse at 0x7f1f37f937b8>          15.4M
               <function pearson_r_p_value at 0x7f1f37f93378>   16.3M
              ================================================ =======

[100.00%] ··· deterministic.Compute_small.time_xskillscore_metric_small                                                                                                                                      ok
[100.00%] ··· ================================================ =============
                                     m
              ------------------------------------------------ -------------
                     <function rmse at 0x7f1f37f93730>           2.39±0.1ms
                   <function pearson_r at 0x7f1f37f6e400>       2.47±0.05ms
                      <function mae at 0x7f1f37f93840>           2.30±0.1ms
                      <function mse at 0x7f1f37f937b8>          2.30±0.06ms
               <function pearson_r_p_value at 0x7f1f37f93378>    2.85±0.2ms
              ================================================ =============


BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@ahuang11 (Member Author) commented Nov 5, 2020

asv continuous -f 1.1 upstream/master HEAD -b deterministic
· Creating environments
· Discovering benchmarks
· Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks)
[  0.00%] · For project commit 01761162 (round 1/2):
[  0.00%] ·· Building for conda-py3.6-bottleneck-dask-numba-numpy-xarray..
[  0.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[  8.33%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[  8.33%] ··· Running (deterministic.Compute_large.time_xskillscore_metric_large--).
[ 16.67%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 16.67%] ··· Running (deterministic.Compute_large_dask.time_xskillscore_metric_large_dask--)..
[ 25.00%] · For project commit 389df7dd (round 1/2):
[ 25.00%] ·· Building for conda-py3.6-bottleneck-dask-numba-numpy-xarray..
[ 25.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[ 33.33%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[ 33.33%] ··· Running (deterministic.Compute_large.time_xskillscore_metric_large--).
[ 41.67%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 41.67%] ··· Running (deterministic.Compute_large_dask.time_xskillscore_metric_large_dask--)..
[ 50.00%] · For project commit 389df7dd (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[ 54.17%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[ 54.17%] ··· deterministic.Compute_large.peakmem_xskillscore_metric_large                                                                                                                                   ok
[ 54.17%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7ff4652d3730>           726M
                   <function pearson_r at 0x7ff4652ad400>       1.01G
                      <function mae at 0x7ff4652d3840>           790M
                      <function mse at 0x7ff4652d37b8>           694M
               <function pearson_r_p_value at 0x7ff4652d3378>   1.01G
              ================================================ =======

[ 58.33%] ··· deterministic.Compute_large.time_xskillscore_metric_large                                                                                                                                      ok
[ 58.33%] ··· ================================================ ============
                                     m
              ------------------------------------------------ ------------
                     <function rmse at 0x7ff4652d3730>           588±4ms
                   <function pearson_r at 0x7ff4652ad400>       1.25±0.01s
                      <function mae at 0x7ff4652d3840>           668±6ms
                      <function mse at 0x7ff4652d37b8>           514±5ms
               <function pearson_r_p_value at 0x7ff4652d3378>   1.36±0.01s
              ================================================ ============

[ 62.50%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 62.50%] ··· deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask                                                                                                                 1/5 failed
[ 62.50%] ··· ================================================ ========
                                     m
              ------------------------------------------------ --------
                     <function rmse at 0x7ff4652d3730>           661M
                   <function pearson_r at 0x7ff4652ad400>        712M
                      <function mae at 0x7ff4652d3840>           688M
                      <function mse at 0x7ff4652d37b8>           657M
               <function pearson_r_p_value at 0x7ff4652d3378>   failed
              ================================================ ========

[ 66.67%] ··· deterministic.Compute_large_dask.time_xskillscore_metric_large_dask                                                                                                                    1/5 failed
[ 66.67%] ··· ================================================ ===========
                                     m
              ------------------------------------------------ -----------
                     <function rmse at 0x7ff4652d3730>           312±30ms
                   <function pearson_r at 0x7ff4652ad400>        548±80ms
                      <function mae at 0x7ff4652d3840>          266±100ms
                      <function mse at 0x7ff4652d37b8>           276±9ms
               <function pearson_r_p_value at 0x7ff4652d3378>     failed
              ================================================ ===========

[ 70.83%] ··· deterministic.Compute_small.peakmem_xskillscore_metric_small                                                                                                                                   ok
[ 70.83%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7ff4652d3730>          15.4M
                   <function pearson_r at 0x7ff4652ad400>       15.7M
                      <function mae at 0x7ff4652d3840>          15.3M
                      <function mse at 0x7ff4652d37b8>          15.3M
               <function pearson_r_p_value at 0x7ff4652d3378>   16.3M
              ================================================ =======

[ 75.00%] ··· deterministic.Compute_small.time_xskillscore_metric_small                                                                                                                                      ok
[ 75.00%] ··· ================================================ =============
                                     m
              ------------------------------------------------ -------------
                     <function rmse at 0x7ff4652d3730>          2.21±0.04ms
                   <function pearson_r at 0x7ff4652ad400>       2.53±0.09ms
                      <function mae at 0x7ff4652d3840>          2.23±0.03ms
                      <function mse at 0x7ff4652d37b8>          2.16±0.06ms
               <function pearson_r_p_value at 0x7ff4652d3378>    2.61±0.1ms
              ================================================ =============

[ 75.00%] · For project commit 01761162 (round 2/2):
[ 75.00%] ·· Building for conda-py3.6-bottleneck-dask-numba-numpy-xarray..
[ 75.00%] ·· Benchmarking conda-py3.6-bottleneck-dask-numba-numpy-xarray
[ 79.17%] ··· Setting up deterministic.py:95                                                                                                                                                                 ok
[ 79.17%] ··· deterministic.Compute_large.peakmem_xskillscore_metric_large                                                                                                                                   ok
[ 79.17%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7ff4652d3730>           726M
                   <function pearson_r at 0x7ff4652ad400>       1.01G
                      <function mae at 0x7ff4652d3840>           790M
                      <function mse at 0x7ff4652d37b8>           694M
               <function pearson_r_p_value at 0x7ff4652d3378>   1.01G
              ================================================ =======

[ 83.33%] ··· deterministic.Compute_large.time_xskillscore_metric_large                                                                                                                                      ok
[ 83.33%] ··· ================================================ ============
                                     m
              ------------------------------------------------ ------------
                     <function rmse at 0x7ff4652d3730>           593±9ms
                   <function pearson_r at 0x7ff4652ad400>       1.26±0.02s
                      <function mae at 0x7ff4652d3840>           688±10ms
                      <function mse at 0x7ff4652d37b8>           525±10ms
               <function pearson_r_p_value at 0x7ff4652d3378>   1.36±0.02s
              ================================================ ============

[ 87.50%] ··· Setting up deterministic.py:119                                                                                                                                                                ok
[ 87.50%] ··· deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask                                                                                                                         ok
[ 87.50%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7ff4652d3730>           883M
                   <function pearson_r at 0x7ff4652ad400>       1.25G
                      <function mae at 0x7ff4652d3840>           922M
                      <function mse at 0x7ff4652d37b8>           853M
               <function pearson_r_p_value at 0x7ff4652d3378>    1.3G
              ================================================ =======

[ 91.67%] ··· deterministic.Compute_large_dask.time_xskillscore_metric_large_dask                                                                                                                            ok
[ 91.67%] ··· ================================================ ==========
                                     m
              ------------------------------------------------ ----------
                     <function rmse at 0x7ff4652d3730>          374±50ms
                   <function pearson_r at 0x7ff4652ad400>       627±70ms
                      <function mae at 0x7ff4652d3840>          453±50ms
                      <function mse at 0x7ff4652d37b8>          358±50ms
               <function pearson_r_p_value at 0x7ff4652d3378>   711±80ms
              ================================================ ==========

[ 95.83%] ··· deterministic.Compute_small.peakmem_xskillscore_metric_small                                                                                                                                   ok
[ 95.83%] ··· ================================================ =======
                                     m
              ------------------------------------------------ -------
                     <function rmse at 0x7ff4652d3730>          15.3M
                   <function pearson_r at 0x7ff4652ad400>       15.7M
                      <function mae at 0x7ff4652d3840>          15.3M
                      <function mse at 0x7ff4652d37b8>          15.3M
               <function pearson_r_p_value at 0x7ff4652d3378>   16.2M
              ================================================ =======

[100.00%] ··· deterministic.Compute_small.time_xskillscore_metric_small                                                                                                                                      ok
[100.00%] ··· ================================================ =============
                                     m
              ------------------------------------------------ -------------
                     <function rmse at 0x7ff4652d3730>          2.19±0.07ms
                   <function pearson_r at 0x7ff4652ad400>        2.36±0.1ms
                      <function mae at 0x7ff4652d3840>          2.22±0.06ms
                      <function mse at 0x7ff4652d37b8>           2.18±0.1ms
               <function pearson_r_p_value at 0x7ff4652d3378>   2.48±0.04ms
              ================================================ =============

       before           after         ratio
     [01761162]       [389df7dd]
!            1.3G           failed      n/a  deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask(<function pearson_r_p_value at 0x7ff4652d3378>)
!        711±80ms           failed      n/a  deterministic.Compute_large_dask.time_xskillscore_metric_large_dask(<function pearson_r_p_value at 0x7ff4652d3378>)
-            853M             657M     0.77  deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask(<function mse at 0x7ff4652d37b8>)
-            883M             661M     0.75  deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask(<function rmse at 0x7ff4652d3730>)
-            922M             688M     0.75  deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask(<function mae at 0x7ff4652d3840>)
-           1.25G             712M     0.57  deterministic.Compute_large_dask.peakmem_xskillscore_metric_large_dask(<function pearson_r at 0x7ff4652ad400>)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@ahuang11 (Member Author) commented Nov 5, 2020

If I'm not mistaken, the first ASV benchmark was using my allowed branch, the second ASV benchmark was using the upstream master. Is this correct?

@aaronspring (Collaborator)

If I'm not mistaken, the first ASV benchmark was using my allowed branch, the second ASV benchmark was using the upstream master. Is this correct?

I think so, yes.

@ahuang11 (Member Author) commented Nov 6, 2020

It means it's faster?

@aaronspring (Collaborator)

It only shows benchmarks with at least a 10% (factor 1.1) performance change. Here it reports PERFORMANCE DECREASED, maybe because of the failing benchmarks, but from reading the table I would say that memory consumption decreased significantly while there was no timing improvement.

@aaronspring (Collaborator) commented Nov 8, 2020

https://stackoverflow.com/questions/51736172/whats-the-difference-between-dask-parallelized-and-dask-allowed-in-xarrays-app

Can we infer from this that all our functions should use allowed instead? Unsure.
Is this also true for the probabilistic functions?

@ahuang11 (Member Author) commented Nov 8, 2020 via email

@aaronspring (Collaborator)

Got environment errors. I will try again tomorrow.

@aaronspring (Collaborator) commented Nov 10, 2020

I only get asv dev running, but not asv run or asv continuous.
EDIT: Sorry, I won't get this done. My supercomputer's conda is somehow broken and reinstalling doesn't help.

@ahuang11 (Member Author) commented Nov 13, 2020

I've been testing my branch on an HPC with a huge dataset of shape (2, 4, 336, 361, 3, 720, 1, 5, 26). No hard metrics, but it seems to be much faster, presumably because with parallelized the data has to be distributed across cores and significantly more memory is used (especially since weighted skipna triggers eager computation).

@aaronspring (Collaborator) left a comment

Go ahead with this PR. It all sounds reasonable. Sorry for blocking this; I had hoped to get a nice asv benchmark for it but failed.

@ahuang11 (Member Author) commented Nov 13, 2020

Actually hold on. Let me do more testing; I think passing weights + skipna alongside dask='allowed' is causing it to error out.

@bradyrx (Collaborator) commented Nov 13, 2020

Sorry, just getting to this now. So my understanding is that dask='allowed' works if dask arrays can be passed through. Is it just because we're using simple, pure-numpy funcs that this works, since they can be applied to dask arrays? I.e., we don't need to use dask.array.sum() in this case? I guess your speedup proves that is the case. Just trying to understand for application to climpred and in other places.
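(For context, a small self-contained illustration of that point, not code from this PR: numpy ufuncs and many numpy functions dispatch to dask arrays and stay lazy.)

import numpy as np
import dask.array as da

x = da.random.random((1000, 1000), chunks=(250, 250))

# numpy ufuncs dispatch to dask, so the result is still a lazy dask array
lazy = np.sqrt((x ** 2).mean(axis=-1))
print(type(lazy))      # dask.array.core.Array, nothing computed yet
print(lazy.compute())  # computation only happens here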

@bradyrx (Collaborator) commented Nov 13, 2020

Actually hold on. Let me do more testing; I think passing weights + skipna alongside dask='allowed' is causing it to error out.

Yes, it would be good to be sure that works! I am also in favor of an asv test if possible.

@aaronspring, it would be good to do the config settings in another PR. It would be nice to see how to implement it in a lightweight package like this for future use in, e.g., climpred.

@lukelbd has quite the extensive config system as well at proplot (https://proplot.readthedocs.io/en/latest/configuration.html)

@aaronspring (Collaborator)

With the tests resolving and with the answer from the xarray discussion, I think we are good with this PR.

@ahuang11 (Member Author)

Had to revert some probabilistic metrics back to parallelized because properscoring probably has some non-numpy calls.

@ahuang11 (Member Author)

No idea why all the errors are back now. Maybe merge no_compute first, then deal with this.

@aaronspring (Collaborator)

What’s the status here? Didn’t we all agree that allowed was better?

@aaronspring mentioned this pull request on Jan 8, 2021
@ahuang11 (Member Author) commented Jan 9, 2021

No idea what's wrong with the lint stuff

@raybellwaves (Member)

Failing at

pytest xskillscore/tests/test_deterministic.py

@raybellwaves (Member)

Moved the dask keyword input to a variable.

Setting it to "parallelized", the tests run fine.

Similar to the probabilistic metrics, it may not work with all functions.

@raybellwaves (Member)

Probably have to play whack-a-mole to see which functions work with "allowed" and then benchmark those that do against "parallelized".

@raybellwaves marked this pull request as draft on January 15, 2021 14:53
@raybellwaves (Member)

Just making notes here.

Deterministic Metrics that don't work with allowed

  • pearson_r_p_value
  • pearson_r_eff_p_value
  • spearman_r
  • spearman_r_p_value
  • spearman_r_eff_p_value
  • me
  • rmse
  • mse
  • mae
  • mape
  • smape

Deterministic Metrics that do work with allowed (and parallelized)

  • pearson_r
  • r2
  • effective_sample_size
  • median_absolute_error

Probabilistic Metrics that don't work with allowed

  • crps_gaussian
  • crps_quadrature
  • crps_ensemble
  • brier_score
  • threshold_brier_score
  • rank_histogram

Probabilistic Metrics that do work with allowed (and not parallelized)

  • reliability

Contingency Metrics that do work with allowed (and not parallelized)

  • gerrity_score

@raybellwaves (Member)

Next steps: benchmark the four deterministic metrics.

@ahuang11 (Member Author)

I thought RMSE did work?

@raybellwaves (Member)

I thought RMSE did work?

The tests fail

@ahuang11 (Member Author) commented Jan 15, 2021 via email

@raybellwaves (Member)

Weird. I would expect it to work since it's only like a line of numpy code

Yeah. Looking at the metrics above, I can't make sense of what works and what doesn't.

@dougiesquire (Collaborator)

I thought RMSE did work?

The tests fail

@raybellwaves, how does it fail? Do you get an error message that you can share?

Using dask="allowed" simply tells apply_ufunc to pass the underlying (either dask or numpy array) array directly to the function. So, for it to work robustly, the function must be able to handle both dask and numpy arrays. Conveniently, numpy ufuncs can ingest and egress dask arrays (i.e. stay lazy), but other functions/operations obviously can't necessarily. Skimming through deterministic._rmse, it looks as though everything should be able to handle dask arrays, but if it's failing there may be an operation somewhere that can't.

I'm very late to the party here, but I do think this is a worthwhile exercise. There is added overhead when running dask="parallelized" to create a wrapper that allows the provided function to act on dask arrays.
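For anyone following along, a hedged sketch of the difference (an invented _mse helper, not xskillscore code):

import xarray as xr
import dask.array as da

def _mse(a, b, axis=-1):
    # plain numpy-style code; also works on dask arrays because every
    # operation here dispatches to dask
    return ((a - b) ** 2).mean(axis=axis)

a = xr.DataArray(da.random.random((100, 1000), chunks=(25, 1000)), dims=["x", "time"])
b = xr.DataArray(da.random.random((100, 1000), chunks=(25, 1000)), dims=["x", "time"])

# dask="allowed": the underlying dask arrays are handed straight to _mse
res_allowed = xr.apply_ufunc(
    _mse, a, b, input_core_dims=[["time"], ["time"]], dask="allowed"
)

# dask="parallelized": apply_ufunc wraps _mse so it is applied blockwise
# to numpy chunks, which adds the overhead mentioned above
res_parallelized = xr.apply_ufunc(
    _mse, a, b, input_core_dims=[["time"], ["time"]],
    dask="parallelized", output_dtypes=[float],
)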

@raybellwaves (Member)

Thanks for chiming in @dougiesquire.

I believe you can find the errors in the logs of the runs (https://github.com/xarray-contrib/xskillscore/pull/221/checks?check_run_id=1696533136) but there's a lot there. Best to fork Andrew's repo and dabble.

Just change rmse (https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/core/deterministic.py#L843) to "allowed", then run the test

$ pytest xskillscore/tests/test_deterministic.py (https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/tests/test_deterministic.py)

It fails on test_distance_metrics_xr_dask (https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/tests/test_deterministic.py#L217)

I'll summarize below

     797 passed
       5 failed
         - xskillscore/tests/test_deterministic.py:212 test_distance_metrics_xr_dask[True-True-True-time-metrics2]
         - xskillscore/tests/test_deterministic.py:212 test_distance_metrics_xr_dask[True-True-True-lat-metrics2]
         - xskillscore/tests/test_deterministic.py:212 test_distance_metrics_xr_dask[True-True-True-lon-metrics2]
         - xskillscore/tests/test_deterministic.py:212 test_distance_metrics_xr_dask[True-True-True-dim3-metrics2]
         - xskillscore/tests/test_deterministic.py:212 test_distance_metrics_xr_dask[True-True-True-dim4-metrics2]

So it fails under the combination of has_nan, skipna and weight_bool, i.e. the data has nans, we skip them, and we also weight the calculation.

Moving up the Traceback

======================================================================================== short test summary info ========================================================================================
FAILED xskillscore/tests/test_deterministic.py::test_distance_metrics_xr_dask[True-True-True-time-metrics2] - IndexError: too many indices for array: array is 3-dimensional, but 8 were indexed
FAILED xskillscore/tests/test_deterministic.py::test_distance_metrics_xr_dask[True-True-True-lat-metrics2] - IndexError: too many indices for array: array is 3-dimensional, but 24 were indexed
FAILED xskillscore/tests/test_deterministic.py::test_distance_metrics_xr_dask[True-True-True-lon-metrics2] - IndexError: too many indices for array: array is 3-dimensional, but 24 were indexed
FAILED xskillscore/tests/test_deterministic.py::test_distance_metrics_xr_dask[True-True-True-dim3-metrics2] - IndexError: too many indices for array: array is 3-dimensional, but 24 were indexed
FAILED xskillscore/tests/test_deterministic.py::test_distance_metrics_xr_dask[True-True-True-dim4-metrics2] - IndexError: too many indices for array: array is 3-dimensional, but 24 were indexed
xskillscore/core/deterministic.py:836: in rmse
    return xr.apply_ufunc(
../../../local/bin/anaconda3/envs/xskillscore-dev/lib/python3.8/site-packages/xarray/core/computation.py:1128: in apply_ufunc
    return apply_dataarray_vfunc(
../../../local/bin/anaconda3/envs/xskillscore-dev/lib/python3.8/site-packages/xarray/core/computation.py:271: in apply_dataarray_vfunc
    result_var = func(*data_vars)
../../../local/bin/anaconda3/envs/xskillscore-dev/lib/python3.8/site-packages/xarray/core/computation.py:724: in apply_variable_ufunc
    result_data = func(*input_data)
xskillscore/core/np_deterministic.py:534: in _rmse
    a, b, weights = _match_nans(a, b, weights)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

a = dask.array<where, shape=(12, 4, 5), dtype=float64, chunksize=(12, 4, 5), chunktype=numpy.ndarray>
b = dask.array<where, shape=(12, 4, 5), dtype=float64, chunksize=(12, 4, 5), chunktype=numpy.ndarray>
weights = array([[[1.        , 1.        , 1.        , 1.        , 1.        ],
        [0.54030231, 0.54030231, 0.54030231, 0.5....41614684, 0.41614684, 0.41614684, 0.41614684],
        [0.9899925 , 0.9899925 , 0.9899925 , 0.9899925 , 0.9899925 ]]])

    def _match_nans(a, b, weights):
        """
        Considers missing values pairwise. If a value is missing
        in a, the corresponding value in b is turned to nan, and
        vice versa.
    
        Returns
        -------
        a, b, weights : ndarray
            a, b, and weights (if not None) with nans placed at
            pairwise locations.
    
        """
        if np.isnan(a).any() or np.isnan(b).any():
            # Avoids mutating original arrays and bypasses read-only issue.
            a, b = a.copy(), b.copy()
            # Find pairwise indices in a and b that have nans.
            idx = np.logical_or(np.isnan(a), np.isnan(b))
            a[idx], b[idx] = np.nan, np.nan
            # https://github.com/xarray-contrib/xskillscore/issues/168
            if isinstance(weights, np.ndarray):
                if weights.shape:  # not None
                    weights = weights.copy()
>                   weights[idx] = np.nan
E                   IndexError: too many indices for array: array is 3-dimensional, but 24 were indexed

xskillscore/core/np_deterministic.py:46: IndexError

This is failing here, where the weights array is populated with nans (https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/core/np_deterministic.py#L46).

What's interesting is that there is a warning here:

  /home/ray/Documents/PYTHON_dev/xskillscore/xskillscore/core/np_deterministic.py:46: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
    weights[idx] = np.nan

Run the test that fails by hand (WIP):

import numpy as np
import pytest
import xarray as xr

np.random.seed(42)

PERIODS = 12  # not in the original snippet; inferred from the (12, 4, 5) array shape in the traceback above
times = xr.cftime_range(start="2000", periods=PERIODS, freq="D")
lats = np.arange(4)
lons = np.arange(5)
data = np.random.rand(len(times), len(lats), len(lons))

a = xr.DataArray(
    data,
    coords=[times, lats, lons],
    dims=["time", "lat", "lon"],
)

members = np.arange(3)
data = np.random.rand(len(members), len(times), len(lats), len(lons))

b = xr.DataArray(
    data,
    coords=[members, times, lats, lons],
    dims=["member", "time", "lat", "lon"],
).isel(member=0, drop=True)


a_nan = a.copy().where(a < 0.5)

b_nan = b.copy().where(b < 0.5)
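A hypothetical continuation of the sketch above (the weights construction is an assumption, not the exact test parametrization): with rmse switched to dask="allowed", chunking a_nan/b_nan while keeping the weights numpy-backed should reproduce the IndexError.

import xskillscore as xs

weights = xr.ones_like(a) * np.cos(a.lat)  # in-memory (numpy-backed) weights, assumed shape

a_dask = a_nan.chunk()
b_dask = b_nan.chunk()

# dask-backed data + numpy weights + skipna=True is the failing combination
xs.rmse(a_dask, b_dask, "time", weights=weights, skipna=True)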

@raybellwaves (Member)

Created #245

@dougiesquire (Collaborator)

Thanks @raybellwaves. Taking a closer look.

Interestingly, rmse works fine with dask="allowed" when weights is also a dask array. But there is a line in test_distance_metrics_xr_dask (https://github.com/xarray-contrib/xskillscore/blob/master/xskillscore/tests/test_deterministic.py#L235) that loads the weights prior to computing the rmse, and this causes the failure.

That is,

xs.rmse(a_dask, b_dask, dim, weights_dask, skipna=True)

works fine, but

xs.rmse(a_dask, b_dask, dim, weights, skipna=True)

fails with the IndexError you saw above. Still investigating why this is.

@dougiesquire (Collaborator) commented Jan 16, 2021

Okay, so there seems to be an issue with boolean indexing a numpy array using a dask array. This happens in xs.core.np_deterministic._match_nans() when a and b are dask arrays and weights is a numpy array (which is enforced in test_distance_metrics_xr_dask).

A simple example of the issue:

import numpy as np
import dask.array as da

data = np.random.rand(10, 10)

# This works
idx = data < 0.5
data[idx] = np.nan

# This produces an IndexError
idx_dask = da.from_array(idx)
data[idx_dask] = np.nan

This is probably the expected behaviour for dask/numpy. Certainly, indexing a dask array with numpy boolean indices also fails, though in this reverse case the error message is more interpretable. We could add a catch into _match_nans that computes the indices if weights is in memory? Or we could enforce earlier on that a, b and weights are all the same array type.

Or I could open an issue with the dask dev and see what they say?
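For reference, a minimal sketch of the first option, i.e. a catch inside _match_nans (illustration only, not the actual xskillscore code):

import numpy as np
import dask.array as da

def _pairwise_nan_index(a, b, weights):
    """Boolean index of pairwise nans, safe to assign into numpy weights."""
    idx = np.logical_or(np.isnan(a), np.isnan(b))
    if isinstance(weights, np.ndarray) and isinstance(idx, da.Array):
        # a numpy array cannot be indexed with a dask boolean mask,
        # so materialize the index when weights lives in memory
        idx = idx.compute()
    return idx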

@ahuang11 (Member Author)

I think the suggestion to "enforce earlier on that a, b and weights are all the same array type" is the most reasonable.

@aaronspring (Collaborator)

What about this PR? Any chance of merging? @raybellwaves @ahuang11

@ahuang11 (Member Author) commented May 7, 2021

Feel free to move forward with it, or make a new PR if that helps facilitate it.

@dougiesquire (Collaborator)

Just started taking a look at this now and I'm struggling a little to work out what is going on (it doesn't help that all the test logs have expired).

As I've said above, I think the move to dask='allowed' in favour of dask='parallelized' is a good idea because the latter comes with additional overhead and will tend to be slower. But, depending on the xskillscore method in question, the change may range from being as simple as switching the keywords, to requiring a full refactor of the apply_ufunc'd function. Therefore, I wonder if it may be easier to implement this in more bite-sized pieces?

What do folks think about closing this PR and me opening another issue listing all uses of apply_ufunc with dask='parallelized' and a quick summary of the work entailed to get dask='allowed' running? Then we can start targeting and benchmarking specific methods as dedicated PRs?

@raybellwaves (Member) commented May 9, 2021

Yes, please do! @ahuang11 said he was happy for us to close and implement.

I agree. I think we can expose the dask arg in all our functions. Currently a user (or we) can't change it at run time; it's hard-coded to parallelized. Once we expose it we can benchmark both methods and then choose the best default value.

@aaronspring is pretty savvy with asv if any help is required with benchmarking.

Could start with one metric: https://github.com/xarray-contrib/xskillscore/blob/main/xskillscore/core/deterministic.py#L800
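A rough, self-contained sketch of that idea (hypothetical wrapper, not the current xskillscore signature): the apply_ufunc dask mode becomes a keyword argument defaulting to the current behaviour, so both values can be benchmarked at run time.

import numpy as np
import xarray as xr

def _rmse_np(a, b, axis=-1, skipna=False):
    # plain numpy implementation; nanmean handles skipna
    mean = np.nanmean if skipna else np.mean
    return np.sqrt(mean((a - b) ** 2, axis=axis))

def rmse(a, b, dim, skipna=False, dask="parallelized"):
    """RMSE over ``dim`` with the apply_ufunc dask mode exposed."""
    kwargs = dict(
        input_core_dims=[[dim], [dim]],
        kwargs={"axis": -1, "skipna": skipna},
        dask=dask,  # "parallelized" (current behaviour) or "allowed"
    )
    if dask == "parallelized":
        kwargs["output_dtypes"] = [float]  # needed when xarray builds the blockwise wrapper
    return xr.apply_ufunc(_rmse_np, a, b, **kwargs)

asv could then parametrize over dask="parallelized" and dask="allowed" for the same metric.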

@dougiesquire (Collaborator)

Closing and moving conversation to #315
