Annual statistic on yearly chunks to keep memory low #792
Conversation
Reduces the memory use by 50% in my test case 🍺
I'm getting the following error:
The error is from this recipe: https://github.com/ESMValGroup/ESMValTool-AR6/blob/ar6_chap_3_ocean_figures/esmvaltool/recipes/recipe_ocean_heat_content_full.yml (note that this is a private branch).
cheers @ledm - I had forgotten to re-sort the time points after applying the …
Nice! thanks @valeriupredoi
I am currently running a test with the two ACCESS models you pointed me to (two datasets, 70 years each) and will compare how much memory we save (scalability is always an issue); it'd be good if you could run your mega 39G recipe to see if there's any visible improvement 🍺
OK some info on memory optimization and scalability:
so it seems this would help only in the case of <=30 years, but the gain for 10-20 years is pretty good at 40-50%, and it scales with the x-y size of the data; @bjlittle it'd be nice if you guys could implement a lazy version of …
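(For reference, one way to obtain such peak-memory numbers - a hedged sketch using the third-party `memory_profiler` package; the file path and the choice of statistic are placeholders, not the profiling method used in this thread:)

```python
import iris
from memory_profiler import memory_usage  # pip install memory-profiler

def run():
    # Hypothetical input; substitute any large NetCDF dataset.
    cube = iris.load_cube("some_large_file.nc")
    # Realise the result so the statistic's peak memory is captured.
    return cube.collapsed('time', iris.analysis.MEAN).data

# Sample this process's memory while run() executes; report the peak.
peak = max(memory_usage((run, (), {})))
print(f"peak memory: {peak:.0f} MiB")
```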
found and fixed the mask bug, thanks to @ledm who alerted me this morning that his Ocean data was not masked anymore; testing done with this little script, which gives absolutely identical results between the master branch and the optimized-mem branch:

```python
import iris
import numpy as np

# master branch
c1 = iris.load_cube("Om/recipe_sections_Ophelie_20200925_105437/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_historical_r1i1p1f1_so_1860-1870.nc")
c2 = iris.load_cube("Om/recipe_sections_Ophelie_20200925_105437/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_piControl_r1i1p1f1_so_960-970.nc")

# optimized mem branch
c3 = iris.load_cube("Ob/recipe_sections_Ophelie_20200925_114041/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_historical_r1i1p1f1_so_1860-1870.nc")
c4 = iris.load_cube("Ob/recipe_sections_Ophelie_20200925_114041/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_piControl_r1i1p1f1_so_960-970.nc")

# figure out if different:
# total data elements / total unmasked data elements
print(np.prod(c1.data.shape), c1.data.count())
print(np.prod(c2.data.shape), c2.data.count())
print(np.prod(c3.data.shape), c3.data.count())
print(np.prod(c4.data.shape), c4.data.count())

# means should agree between the two branches
print(np.ma.mean(c1.data), np.ma.mean(c3.data))
print(np.ma.mean(c2.data), np.ma.mean(c4.data))
```

The memory profile seems to be unaffected, and stands as in the earlier comment.
I think …
We were making annual means using iris.analysis.MEAN (right @valeriupredoi?) and it was causing memory issues on large files. Maybe it can only get so lazy?
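(For reference, whether an aggregation stayed lazy can be checked directly with iris's `has_lazy_data()` - a small sketch, with a hypothetical file path:)

```python
import iris
import iris.coord_categorisation

cube = iris.load_cube("some_file.nc")  # hypothetical path
iris.coord_categorisation.add_year(cube, 'time')
print(cube.has_lazy_data())            # True: loading is lazy

annual = cube.aggregated_by('year', iris.analysis.MEAN)
# False here would mean the aggregation realised the data in memory.
print(annual.has_lazy_data())
```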
indeed both @bouweandela and @ledm are right - the …
Could you share the recipe that causes the problem? And the version of iris you were using?
Not necessarily. Dask chops arrays up into chunks. These chunks are then processed in parallel; the number of chunks that gets processed at the same time is probably configurable, and by default equal to the number of logical (CPU) cores you have in the machine. The actual memory use will thus depend on 1) how much memory is needed to process a single chunk and 2) how many chunks you are processing at the same time. Typically the amount of memory needed per chunk is several times the size of the chunk, depending on how well the code you're running was written. You can check the chunk dimensions using something like the snippet below.
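(The original snippet did not survive in this thread; a hedged equivalent using iris's public `lazy_data()` API would be - file path hypothetical:)

```python
import iris

cube = iris.load_cube("some_file.nc")  # hypothetical path
lazy = cube.lazy_data()                # the underlying dask array; nothing is realised
print(lazy.chunks)                     # chunk layout along each dimension
print(lazy.chunksize)                  # shape of a single chunk
print(lazy.nbytes / 2**20, "MiB if fully realised")
```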
A common "gotcha" is to think that this depends only on how many parallel computations you are doing, but it is quite common to have issues because each serial computation needs more than one chunk - maybe even all of them, if you are not lucky. Please make sure that this is not the case for the worst dataset.
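(A minimal illustration of that gotcha with plain dask - the shape and chunking here are made up for the example: a reduction over the full time axis needs every chunk along that axis to produce even one output element.)

```python
import dask.array as da

# 100 years of monthly data, chunked along time only -> 10 chunks
arr = da.random.random((1200, 180, 360), chunks=(120, 180, 360))

# Every output pixel of a time mean depends on all 10 time chunks,
# so this single (serial) reduction has to touch each one of them.
time_mean = arr.mean(axis=0)
print(time_mean.shape)  # (180, 360)
print(arr.chunks)       # ((120, 120, ..., 120), (180,), (360,))
```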
as I understand it, the number of chunks is set by Iris and is hardcoded; in any case, as per my previous comments, I don't think this implementation helps when working with datasets that are large in time (many years) - that's why I kept asking @ledm to give me a couple of comparison numbers for memory. There is, however, a consistent and significant improvement for <30 years, and something we would want to consider for other statistics too; ESMValTool is not run only on big-mem machines and for huge datasets, and improving memory consumption for small-scale runs benefits the user who just wants to run something smaller on their laptop.
```diff
@@ -327,7 +327,28 @@ def annual_statistics(cube, operator='mean'):

     if not cube.coords('year'):
         iris.coord_categorisation.add_year(cube, 'time')
-    return cube.aggregated_by('year', operator)
+    years = sorted(list(set(cube.coord("year").points)))
```
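(Only the first added line of the new hunk is shown above. For orientation, a hedged sketch of what computing the statistic on yearly chunks could look like - everything past the `years = ...` line is an assumption rather than the actual PR code, and `operator` is assumed to already be an iris aggregator such as `iris.analysis.MEAN`:)

```python
import iris
import iris.coord_categorisation
from iris.cube import CubeList

def annual_statistics(cube, operator):
    """Annual statistic computed one year at a time to cap peak memory."""
    if not cube.coords('year'):
        iris.coord_categorisation.add_year(cube, 'time')
    years = sorted(list(set(cube.coord("year").points)))
    # Aggregate each year separately, then stitch the pieces back together;
    # iterating over sorted years keeps the resulting time points ordered
    # (cf. the re-sort remark earlier in the thread).
    chunks = CubeList(
        cube.extract(iris.Constraint(year=year)).aggregated_by('year', operator)
        for year in years
    )
    return chunks.concatenate_cube()
```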
Why are you not using something like this?

```python
cube = CubeList([year.aggregated_by('year', operator) for year in cube.slices_over('year')]).merge_cube()
```

Is there a performance reason?
not sure; I'd rather not, since that looks a bit clunky to me - and I'd rather not repeat all the memory profiling tests for a rewrite of the same functionality 😁
I was just curious, as you wrote a lot of code for something that can be a one-liner
@jvegasbsc @bouweandela can I please get some sort of comment on this one - do you guys think a rerun of the memory profiling to show the improvement would be needed, or can we merge as it is now? 🍺
Hi V, I think it would be better to contribute this improvement to iris, because more people can benefit from it when it's there, and it will keep our own code short and easy to maintain.
@valeriupredoi Reading through all the comments again, I'm a bit lost as to what this pull request actually solves. Could you explain it more clearly in the pull request description? I see mentions of lower memory use and slower performance, but only for small datasets; for large datasets, memory use (and performance) seems unaffected. Did I understand this correctly?
indeed for large datasets there is no visible increase in performance from my tests, nor is there a decrease, so I think this would be useful for smaller-scale diagnostics. I am OK with it if you guys deem it not useful enough, but I think we can gain a bit in the cases of smaller diags/datasets - which are quite common in the users' community 🍺
But memory use is a problem for large datasets, not for small ones, so in that case I would prefer to keep the old code, because it's much simpler, so there is a smaller chance of bugs and therefore lower maintenance.
totally agree, but there is a benefit from this in that people can run smaller analyses on smaller-memory machines (like their laptops) - like I said, it's up to you to decide the fate of this; I am not happy that it doesn't improve memory with larger datasets.
Since …
Notes on memory consumption and scalability
Before you start, please read our contribution guidelines.
Tasks

- Use `yamllint` to check that your YAML files do not contain mistakes

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.
Closes #issue_number