Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hack fix to validate OOM errors #181

Merged
merged 4 commits into from
Feb 18, 2022

Conversation

brews
Copy link
Member

@brews brews commented Feb 18, 2022

Fixes dodola validate-dataset failing with out-of-memory errors on small workers without external dask cluster. This was especially a problem on quality-control checks to 0.25 degree grid input data.

This solution is somewhat hackish in that we're reading and subsetting the input data to year available year in dodola.services — essentially refactoring the larger validation flow from dodola.core to dodola.services so that we can keep all the storage I/O implementation details out of dodola.core. This is an adaptation of the solution to used in this temporary fix.

@brews brews added the bug Something isn't working label Feb 18, 2022
@brews brews requested a review from emileten February 18, 2022 02:11
@brews brews self-assigned this Feb 18, 2022
@emileten
Copy link
Contributor

Thanks @brews ! I will have a look at this a little later if that's fine.

@emileten
Copy link
Contributor

emileten commented Feb 18, 2022

@brews I am a bit confused by the change history here.

It seems that what you moved to dodola.services here is something we've been using for some time already, but that was hard code in this yaml : https://github.com/ClimateImpactLab/downscaleCMIP6/blob/18cf791d4c169e707d4ae072cd4c217f65658421/workflows/templates/qualitycontrol-check-cmip6.yaml#L71. Am I right ?

And today you took the time to clean things up, i.e, to have something in services that the workflow yamls can directly use ?

@brews
Copy link
Member Author

brews commented Feb 18, 2022

@emileten yeah. Even if this is a bit of a hacky solution I'd prefer this over the argo script we were working with because 1) it simplifies the argo Workflow, and 2) it gets basic, regular testing here. This is a pretty important and often used bit of code. We also have a much better idea of what we want it to do.

Over the next couple of days/weeks we'll slowly clean up scripts/prototypes in the argo workflows and try to consolidate it into dodola. I try to do it when I have the time.

@brews
Copy link
Member Author

brews commented Feb 18, 2022

...and the hard-coded script you pointed to was originally written as a quick work-around to this memory bug in #126.

@brews brews merged commit 696938a into ClimateImpactLab:main Feb 18, 2022
@brews brews deleted the fix_validate_memory branch February 18, 2022 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dodola validate-dataset reads in entire zarr store?
2 participants