
Benchmarking Growing Season Length #3

Merged — 1 commit merged into master on Jan 17, 2020
Conversation

aulemahal
Contributor

The current computation of "growing season length" in xclim uses an enormous amount of memory and usually fails on large datasets. I tested another method to compute it; the results are good, though less impressive than the last two benchmarks made this way.

Two methods:

  • The current implementation with small changes.
  • Using xc.run_length.first_run calls.

For the second case, I tested many different versions to try to pinpoint what was responsible for the memory consumption. The best variant is exp_firstruncheck.
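For illustration, the core "first run" operation — finding the first index at which a condition holds for at least `window` consecutive days — can be sketched in plain NumPy. This is a simplified stand-in for the logic behind `xclim.indices.run_length.first_run`, not xclim's actual implementation; the 5 °C threshold and 6-day window below follow the usual growing-season definition and are assumptions for the example:

```python
import numpy as np

def first_run(cond, window):
    """Index of the first day that starts a run of at least `window`
    consecutive True values in `cond`, or None if no such run exists."""
    # Rolling count of True values over a `window`-day sliding window.
    kernel = np.ones(window, dtype=int)
    counts = np.convolve(cond.astype(int), kernel, mode="valid")
    hits = np.nonzero(counts == window)[0]
    return int(hits[0]) if hits.size else None

# Hypothetical daily mean temperatures (degrees C) for part of a year.
tas = np.array([2, 3, 6, 7, 8, 6, 9, 7, 6, 4, 3])
start = first_run(tas > 5, window=6)  # first day of a 6-day run above 5 C
print(start)  # -> 2
```

The season's end can be found the same way by applying `first_run` to the opposite condition (runs below the threshold) on the part of the year after the start.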

Graphs:

  1. Small chunks (50x50) and many years (99).
    (figure: growing_season_99years_50x50_notimechunks)

  2. Large chunks (200x200) and fewer years (50).
    (figure: growing_season_50years_200x200_notimechunks)

The conclusion is that the default version, with small tweaks, can be sped up and made to use less memory. However, the first_run method, while slower, consumes far less memory and does so more stably.

I have yet to test with data whose chunks are smaller than a year. More to come.

@aulemahal aulemahal added the enhancement New feature or request label Jan 16, 2020
@aulemahal aulemahal self-assigned this Jan 16, 2020
@aulemahal
Contributor Author

@tlogan2000 With firstruncheck, I am currently computing the growing season length of a full generic scenario and the memory is stable at 10-12 GB (9-10% of doris) and I estimate a computation time of 25-30 min.

@tlogan2000 tlogan2000 merged commit ed5b38c into master Jan 17, 2020
@tlogan2000
Contributor

Great. 25 minutes seems a bit long? What is the calculation time for a 'normal' indicator? In any case, at least we have a version that is memory-stable.

@sbiner

sbiner commented Jan 17, 2020 via email

@aulemahal
Contributor Author

aulemahal commented Jan 17, 2020

@sbiner I use memory_profiler. Pretty cool! I launch my script for each "experiment" with:

mprof run -C bench_gsl.py exp

The -C is for following all threads and child processes ("children"), which is necessary with dask. It can also produce figures directly, but I preferred writing my own plotting code in the script.
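As a lighter-weight alternative (my own suggestion, not what was used in these benchmarks), Python's standard-library tracemalloc can report the peak allocation of a single computation without an external profiler:

```python
import tracemalloc

def run_with_peak(func, *args, **kwargs):
    """Run `func` and return (result, peak bytes allocated while it ran)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

# Example: measure the peak allocation of building a large list.
result, peak = run_with_peak(lambda: [0] * 1_000_000)
print(f"peak: {peak / 1e6:.1f} MB")
```

Note the trade-off: tracemalloc only sees allocations made through Python's allocator in the current process, whereas mprof samples the whole process's resident memory and, with -C, its children — which is why mprof is the better fit for dask workers.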

@sbiner

sbiner commented Jan 17, 2020 via email

@aulemahal
Copy link
Contributor Author

@tlogan2000 According to my Portraits Climatiques update, 25 minutes seems normal for a 2-variable indicator (tas is computed from tasmin and tasmax).
