HadGEM3-GC31-MM tasmin, tasmax, pr e2e runs fail at cleaning due to OOM error #589
Comments
This model has a slightly higher resolution than …
Ah I see.
@brews I think, if we're willing to solve that, the cheapest way is to rechunk the data before and after the standardizing step. What do you think? Tagging you in particular because you had an opinion regarding this here: ClimateImpactLab/dodola#150. What I am thinking of though, in contrast to what I had suggested back then, is not to rechunk within dodola's standardize_gcm function but in separate workflow steps, before and after -- like we do in other parts of the workflow -- using the rechunk workflow template.
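A minimal sketch of what such a standalone rechunking step could look like; the store paths, dimension names (time/lat/lon), and chunk sizes below are assumptions for illustration, and a real step would presumably go through the existing rechunk workflow template rather than ad-hoc Python:

```python
import xarray as xr

# Hypothetical input/output stores -- not the actual bucket layout.
IN_STORE = "gs://example-bucket/raw/HadGEM3-GC31-MM/tasmax.zarr"
OUT_STORE = "gs://example-bucket/rechunked/HadGEM3-GC31-MM/tasmax.zarr"

ds = xr.open_zarr(IN_STORE)

# Small time slabs, full spatial extent: the cleaning/standardizing step
# then never has to hold the whole record in memory at once.
ds = ds.chunk({"time": 365, "lat": -1, "lon": -1})

# Drop the stale on-disk chunk encoding so to_zarr writes the new chunking.
for name, var in ds.variables.items():
    var.encoding.pop("chunks", None)

ds.to_zarr(OUT_STORE, mode="w")
```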
@emileten Hmmm.... as you likely know, the hard thing here is that we clean and standardize the raw data so that it's in good enough shape (i.e. it's consistent enough) for us to do things like rechunking without failing on issues like unexpected variable/coord/dim names, etc. And it wasn't a problem until now because the standardizing step was relatively cheap. The other thing is that I think rechunking is often done after 1x1 regridding... so the data is a conveniently small, standardized size that fits in memory when it happens — making it a relatively fast and reliable operation. Have you just run the rechunk workflowtemplate on the raw HadGEM3-GC31-MM? Can you get it in the needed chunks without OOM errors? (I realize I don't even know HadGEM3-GC31-MM's native size on disk or resolution.)
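For what it's worth, the native-size/resolution question can be answered quickly from the raw store; a sketch assuming a hypothetical store path and CMIP6-style dimension names:

```python
import xarray as xr

ds = xr.open_zarr("gs://example-bucket/raw/HadGEM3-GC31-MM/tasmax.zarr")  # hypothetical path

print(ds.sizes)                              # e.g. {'time': ..., 'lat': 324, 'lon': 432}
print(ds["tasmax"].encoding.get("chunks"))   # chunk shape as written on disk
print(f"{ds.nbytes / 2**30:.1f} GiB if fully loaded (uncompressed)")
```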
@brews thanks! I haven't. That's a good suggestion, let me at least try to do that and see what it gives.
Hm, yes indeed it's not completely straightforward. I need to play a bit with these spatial bounds just like we do here, but unfortunately before 'standardizing' the data... Getting this error in this workflow:
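One hedged guess at what "playing with the spatial bounds" before standardizing could look like: drop (or eagerly load) the bounds variables so they don't constrain the rechunk. The store path is hypothetical and the variable names below are the conventional CMIP6 ones, which may not match this raw store:

```python
import xarray as xr

ds = xr.open_zarr("gs://example-bucket/raw/HadGEM3-GC31-MM/tasmax.zarr")  # hypothetical path

# lat_bnds / lon_bnds / time_bnds are the conventional CMIP6 names; the raw
# HadGEM3-GC31-MM store may spell them differently.
bounds_vars = [v for v in ds.variables if str(v).endswith(("_bnds", "_bounds"))]
ds = ds.drop_vars(bounds_vars)   # or ds[bounds_vars].load() if they must be kept

ds = ds.chunk({"time": 365, "lat": -1, "lon": -1})
```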
Yeah, and I'm pretty certain that this would error from other raw GCM input, too. You know more about the 360-day calendar conversion implementation than I do, @emileten. Do you feel like this particular conversion is something that we might be able to make chunk-friendly, or do you feel like this is too much of a pain? (...I might have already asked you this for another issue...)
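For context on the chunk-friendliness question, this is roughly the shape such a conversion takes with xarray's built-in convert_calendar (xarray >= 0.20). This is only an illustration, not dodola's actual implementation; the store paths, target calendar, and chunk sizes are assumptions:

```python
import xarray as xr

ds = xr.open_zarr("gs://example-bucket/cleaned/HadGEM3-GC31-MM/tasmax.zarr")  # hypothetical path

# The 360_day -> 365_day conversion works along the time axis, so keep time
# in one chunk but split space; the operation can then stay lazy over lat/lon.
ds = ds.chunk({"time": -1, "lat": 54, "lon": 54})

# align_on is required when converting from a 360_day calendar.
converted = ds.convert_calendar("noleap", align_on="year")
converted.to_zarr("gs://example-bucket/converted/HadGEM3-GC31-MM/tasmax.zarr", mode="w")
```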
@brews yes, I think it would be a lot of work. We decided to abandon these models.
Thanks, @emileten. I also tried bumping the container memory for this standardizing step up from ~40GiB to 68GiB and it still gets an OOM error, so a small memory bump wasn't a quick fix, either.
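A back-of-the-envelope check of why a modest bump doesn't get there; the record length and dtypes below are assumptions, only the grid size comes from the issue:

```python
n_lat, n_lon = 324, 432          # from the issue description
n_days = 150 * 360               # assumed 1950-2100 record on a 360-day calendar
gib = n_lat * n_lon * n_days / 2**30

print(f"float32: ~{gib * 4:.0f} GiB, float64: ~{gib * 8:.0f} GiB per variable")
# float64 intermediates plus a copy or two during cleaning/standardizing
# push this well past a 68 GiB container if nothing stays chunked.
```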
Yea, this is a lot of data and we're doing various things with it, in particular these 360-day calendar conversions.
Workflows:
https://argo.cildc6.org/workflows/default/e2e-hadgem3-gc31-mm-tasmax-c9v8q?tab=workflow
https://argo.cildc6.org/workflows/default/e2e-hadgem3-gc31-mm-pr-d8v6x?tab=workflow
This model has a particularly high resolution, with 324 latitude bands and 432 longitude bands (about five times as many grid cells), versus 144 and 192 respectively for its low-resolution equivalent HadGEM3-GC31-LL, which we ran successfully.

Blocks progress on #586, #587, #225