-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
relocate data loading in standardize-gcm #179
relocate data loading in standardize-gcm #179
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick fix, @emileten.
I have one change/confirmation before this merges. It's noted with an inline-comment.
I'm curious where exactly we have the memory explosion with EC-Earth3 input... we may not have time to profile that properly. I don't know if you've already looked into this?
Once this merges we want to be sure to manually check that dodola:dev works properly with 360 calendar GCM input as well as with input from EC-Earth3.
Co-authored-by: Brewster Malevich <sbmalev@gmail.com>
@brews Thanks for reviewing ! Yes I looked at this, it's a quick manual profiling, but those are the two lines that trigger this memory blow up :
Which is not surprising, if the data is loaded in memory. |
@brews there is more detailed background information in ClimateImpactLab/downscaleCMIP6#574, but here I summarize it :
This PR directly modifies a previous one that had breaking changes : #151
Back then, the change was to load the data in memory in
dodola.services.clean_cmip6
, to fix chunking problems in the case where in the underlyingdodola.core.standardize_gcm
,calendar == 360_day
.We observe in ClimateImpactLab/downscaleCMIP6#574 that this PR broke runs with data that's too large for our nodes and the downstream computations.
My suggested fix here is to simply move that data loading operation to where it's needed. It turns out that models with high resolution data do not have a 360 days calendar. It's only luck, but I find this quick, cheap fix appealing. And loading operations are arguably part of
dodola.core
, since all the code inside of it relies on xarray datasets.