
relocate data loading in standardize-gcm #179

Merged

Conversation


@emileten emileten commented Feb 16, 2022

@brews there is more detailed background information in ClimateImpactLab/downscaleCMIP6#574; here is a summary:

This PR directly modifies a previous one that had breaking changes: #151.
Back then, the change was to load the data into memory in dodola.services.clean_cmip6, to fix chunking problems in the case where, in the underlying dodola.core.standardize_gcm, calendar == 360_day.

We observed in ClimateImpactLab/downscaleCMIP6#574 that the change in #151 broke runs whose data is too large for our nodes, as well as the downstream computations.

My suggested fix here is simply to move that data-loading operation to where it is needed. It turns out that models with high-resolution data do not use a 360-day calendar. That is only luck, but I find this quick, cheap fix appealing. And loading operations are arguably part of dodola.core, since all the code inside it operates on xarray datasets.
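As a rough illustration of the relocation (the class and function bodies below are stand-ins, not dodola's actual code), the eager load call moves out of the service layer and into the core function, where it runs only on the 360-day-calendar branch:

```python
# Minimal sketch of the fix's shape, using a toy stand-in for an
# xarray/dask Dataset (dodola's real code operates on xarray objects).

class Dataset:
    def __init__(self, calendar):
        self.calendar = calendar
        self.in_memory = False  # lazily backed until .load() is called

    def load(self):
        self.in_memory = True  # eager load: whole array into RAM
        return self

def clean_cmip6(ds):
    # Before the fix, ds.load() happened here, unconditionally,
    # for every model regardless of its calendar or resolution.
    return standardize_gcm(ds)

def standardize_gcm(ds):
    # After the fix: load only where it is actually needed, i.e. for
    # 360_day-calendar models, which happen to be lower resolution.
    if ds.calendar == "360_day":
        ds = ds.load()
    return ds
```

With this shape, a high-resolution model on a standard calendar passes through `clean_cmip6` without ever being loaded eagerly, while the 360-day case still gets the in-memory data its calendar conversion requires.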

@emileten emileten self-assigned this Feb 16, 2022
@emileten emileten added the bug Something isn't working label Feb 16, 2022
@emileten emileten changed the title relocate data loading relocate data loading in standardize-gcm Feb 16, 2022
@emileten emileten requested a review from brews February 16, 2022 12:11
@brews (Member) left a comment

Thanks for the quick fix, @emileten.

I have one change/confirmation before this merges. It's noted in an inline comment.

I'm curious where exactly we get the memory explosion with EC-Earth3 input... we may not have time to profile it properly. Have you already looked into this?

Once this merges, we want to manually check that dodola:dev works properly with 360-day-calendar GCM input as well as with input from EC-Earth3.

Inline comment on dodola/core.py (outdated, resolved)
Co-authored-by: Brewster Malevich <sbmalev@gmail.com>
@emileten
Contributor Author


@brews Thanks for reviewing!

Yes, I looked at this. It was only a quick manual profiling pass, but these are the two lines that trigger the memory blow-up:

ds_cleaned["pr"] = ds_cleaned["pr"] * mmday_conversion

ds_converted = xclim_remove_leapdays(ds_cleaned)

Which is not surprising if the data is loaded into memory.
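To illustrate why (a toy numpy example, not dodola code; the conversion factor is assumed to be the standard seconds-per-day one): once the data is eagerly in memory, an operation like the `pr` unit conversion materializes a second full-size array, instead of staying lazy and being evaluated chunk by chunk as it would with a dask-backed xarray dataset.

```python
import numpy as np

# Stand-in for an eagerly loaded precipitation field; the real arrays
# are far larger, so the extra copy below is what exhausts the node.
pr = np.ones((4, 3), dtype="float64")
mmday_conversion = 86400.0  # assumed factor: kg m-2 s-1 -> mm/day

# With in-memory numpy data, the multiply allocates a brand-new array
# of the same size; with lazy dask arrays it would only add a node to
# the task graph, deferring allocation to chunked computation.
converted = pr * mmday_conversion

assert converted is not pr  # a full extra copy now lives in memory
```

The same reasoning applies to `xclim_remove_leapdays`: on in-memory data it produces another near-full-size copy rather than a lazy view.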

@emileten
Contributor Author

emileten commented Feb 17, 2022

I checked with dodola:dev that this PR:

  • fixes our problem: this successful workflow repeats a standardize step from the one that had failed.

  • is backward compatible: this successful workflow repeats a standardize step from one that had succeeded.
