
Annual statistic on yearly chunks to keep memory low #792

Closed
wants to merge 9 commits

Conversation

valeriupredoi (Contributor) commented Sep 22, 2020

Notes on memory consumption and scalability

Before you start, please read our contribution guidelines.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • Please use yamllint to check that your YAML files do not contain mistakes
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


Closes #issue_number

valeriupredoi (Contributor, Author) commented Sep 22, 2020

Reduces memory use by 50% in my test case 🍺

ledm (Contributor) commented Sep 22, 2020

I'm getting the following error:

2020-09-22 12:50:36,192 UTC [31034] INFO    Maximum memory used (estimate): 39.0 GB
2020-09-22 12:50:36,193 UTC [31034] INFO    Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur.
2020-09-22 12:50:36,195 UTC [31034] ERROR   Program terminated abnormally, see stack trace below for more information
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 730, in _run_task
    output_files = task.run()
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 242, in run
    self.output_files = self._run(input_files)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 428, in _run
    product.apply(step, self.debug)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 296, in apply
    self.cubes = preprocess(self.cubes, step, **self.settings[step])
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 241, in preprocess
    result.append(_run_preproc_function(function, item, settings))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 224, in _run_preproc_function
    return function(items, **kwargs)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/_time.py", line 336, in annual_statistics
    annual_mean_cube.coord("time").points = [
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1964, in points
    self._new_points_requirements(points)
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1952, in _new_points_requirements
    raise ValueError(emsg.format(self.name(), self.__class__.__name__))
ValueError: The 'time' DimCoord points array must be strictly monotonic.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_main.py", line 430, in run
    fire.Fire(ESMValTool())
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs) 
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_main.py", line 407, in run
    process_recipe(recipe_file=recipe, config_user=cfg)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_main.py", line 98, in process_recipe
    recipe.run()
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_recipe.py", line 1358, in run
    run_tasks(self.tasks,
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 650, in run_tasks
    _run_tasks_parallel(tasks, max_parallel_tasks)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 695, in _run_tasks_parallel
    _copy_results(task, running[task]) 
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 718, in _copy_results
    task.output_files, updated_products = future.get()
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 730, in _run_task
    output_files = task.run()
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 242, in run
    self.output_files = self._run(input_files)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 428, in _run
    product.apply(step, self.debug)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 296, in apply
    self.cubes = preprocess(self.cubes, step, **self.settings[step])
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 241, in preprocess
    result.append(_run_preproc_function(function, item, settings))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 224, in _run_preproc_function
    return function(items, **kwargs)   
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/_time.py", line 336, in annual_statistics
    annual_mean_cube.coord("time").points = [
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1964, in points
    self._new_points_requirements(points)
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1952, in _new_points_requirements
    raise ValueError(emsg.format(self.name(), self.__class__.__name__))
ValueError: The 'time' DimCoord points array must be strictly monotonic.

ledm (Contributor) commented Sep 22, 2020

valeriupredoi (Contributor, Author)

Cheers @ledm - I had forgotten to re-sort the time points after applying set() - sneaky bugger 🍺

ledm (Contributor) commented Sep 22, 2020

Nice! thanks @valeriupredoi

valeriupredoi (Contributor, Author)

I am currently running a test with the two ACCESS models you pointed me to (2 datasets, 70 years each) and will compare how much memory we save (scalability is always an issue). It'd be good if you could run your mega 39 GB recipe to see if there's any visible improvement 🍺

valeriupredoi (Contributor, Author) commented Sep 22, 2020

OK some info on memory optimization and scalability:

  • for 10 years (x2 models) I can see a 1.7G/2.7G = 40% improvement in memory,
  • for 20 years (x2 models) I can see a 1.8G/3.0G = 40% improvement in memory,
  • for 30 years (x2 models) I can see a 2.8G/3.3G = 15% improvement in memory,
  • for 40 years almost no visible improvement

so it seems this would help only for runs of <=30 years, but the gain for 10-20 years is pretty good at 40-50% and it scales with the x-y size of the data; @bjlittle it'd be nice if you guys could implement a lazy version of aggregated_by() in the future 🍺

EDIT: aggregated_by() actually returns a lazy cube (my bad @bjlittle, but you guys should remove the note in the iris documentation that says it doesn't perform a lazy operation). I also measured the memory intake at higher numbers of years (60-100) and indeed, this method neither improves nor worsens the memory consumption there; it makes the run some 20-30% slower, but that timing is highly inaccurate because the node is a shared-memory one and spends a lot of time waiting for available memory.
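For reference, a minimal sketch of the per-year idea (not the exact PR code; the function name and the merge step in particular are simplified):

import iris
import iris.analysis
import iris.coord_categorisation
from iris.cube import CubeList


def annual_statistic_by_year(cube, operator=iris.analysis.MEAN):
    """Compute an annual statistic one year at a time to limit peak memory."""
    if not cube.coords('year'):
        iris.coord_categorisation.add_year(cube, 'time')
    years = sorted(set(cube.coord('year').points))
    annual_cubes = CubeList()
    for year in years:
        # Extract a single year and aggregate it on its own, so only one
        # year of data needs to be realised at any time.
        yearly_cube = cube.extract(iris.Constraint(year=year))
        annual_cubes.append(yearly_cube.aggregated_by('year', operator))
    # Stitch the per-year results back together along the time dimension.
    return annual_cubes.concatenate_cube()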

valeriupredoi changed the title from "do the annual stat on yearly chunks so mem is low" to "Annual statistic on yearly chunks to keep memory low" on Sep 22, 2020
valeriupredoi (Contributor, Author)

Found and fixed a mask bug, thanks to @ledm who alerted me this morning that his ocean data was not masked anymore. Testing was done with this little script, which gives absolutely identical results compared to the master branch functionality:

import iris
import numpy as np

# master branch
c1 = iris.load_cube("Om/recipe_sections_Ophelie_20200925_105437/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_historical_r1i1p1f1_so_1860-1870.nc")
c2 = iris.load_cube("Om/recipe_sections_Ophelie_20200925_105437/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_piControl_r1i1p1f1_so_960-970.nc")

# optimized mem branch
c3 = iris.load_cube("Ob/recipe_sections_Ophelie_20200925_114041/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_historical_r1i1p1f1_so_1860-1870.nc")
c4 = iris.load_cube("Ob/recipe_sections_Ophelie_20200925_114041/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_piControl_r1i1p1f1_so_960-970.nc")

# figure out if different
# total data elements / total unmasked data elements
print(np.prod(c1.data.shape), c1.data.count())
print(np.prod(c2.data.shape), c2.data.count())
print(np.prod(c3.data.shape), c3.data.count())
print(np.prod(c4.data.shape), c4.data.count())
print(np.ma.mean(c1.data), np.ma.mean(c3.data))
print(np.ma.mean(c2.data), np.ma.mean(c4.data))

The memory profile seems to be unaffected and stands as in the comment above.

bouweandela (Member)

I think Cube.aggregated_by is lazy if the aggregator provided supports lazy aggregation. E.g. iris.analysis.MEAN would be lazy, while some of the other statistics are not.
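A quick way to check this in practice (just a sketch; the file name is a placeholder):

import iris
import iris.analysis
import iris.coord_categorisation

# Hypothetical input file, purely for illustration.
cube = iris.load_cube("tas_Amon_example_historical.nc")
iris.coord_categorisation.add_year(cube, "time")

print(cube.has_lazy_data())    # True: the data has not been realised yet
annual = cube.aggregated_by("year", iris.analysis.MEAN)
print(annual.has_lazy_data())  # True if the aggregation was performed lazily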

ledm (Contributor) commented Sep 29, 2020

> I think Cube.aggregated_by is lazy if the aggregator provided supports lazy aggregation. E.g. iris.analysis.MEAN would be lazy, while some of the other statistics are not.

We were making annual means using iris.analysis.MEAN (right @valeriupredoi?) and it was causing memory issues on large files. Maybe it can only get so lazy?

valeriupredoi (Contributor, Author) commented Sep 29, 2020

Indeed, both @bouweandela and @ledm are right - the MEAN aggregator is lazy, but as Lee points out, lazy means it takes much less memory but still chews up a ton of memory if the data is huge. This implementation buys memory performance at the expense of time performance (although I can't put a decisive number on the latter, since I use JASMIN and those shared nodes are anything but a reliable source for timing). What baffles me is that from my tests I expect an insignificant memory improvement for large datasets with lots of years, but a very solid improvement for datasets of 30-odd years - yet @ledm has seen good improvement for his runs with hundreds of years too, so maybe I'm missing something obvious 😕

bouweandela (Member)

> We were making annual means using iris.analysis.MEAN (right @valeriupredoi?) and it was causing memory issues on large files. Maybe it can only get so lazy?

Could you share the recipe that causes the problem? And the version of iris you were using? aggregated_by is lazy since iris v2.3.

bouweandela (Member)

> lazy means it takes much less memory but still chews up a ton of memory if the data is huge.

Not necessarily. Dask chops arrays up into chunks. These chunks are then processed in parallel; the number of chunks that gets processed at the same time is probably configurable and by default equal to the number of logical (CPU) cores in the machine. The actual memory use will thus depend on 1) how much memory is needed to process a single chunk and 2) how many chunks you are processing at the same time. Typically the amount of memory needed per chunk is several times the size of the chunk, depending on how well the code you're running was written. You can check the chunk dimensions using something like print(cube.lazy_data().chunks) and use that to compute the in-memory size of a single chunk.
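For example, along these lines (a sketch, assuming cube still holds lazy data):

import numpy as np

lazy = cube.lazy_data()
print(lazy.chunks)  # tuple of per-dimension chunk lengths

# Estimate the in-memory footprint of the largest chunk.
largest_chunk_elements = np.prod([max(lengths) for lengths in lazy.chunks])
largest_chunk_mib = largest_chunk_elements * lazy.dtype.itemsize / 1024 ** 2
print(f"largest chunk is about {largest_chunk_mib:.1f} MiB")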

@jvegreg
Copy link
Contributor

jvegreg commented Oct 1, 2020

> 2) how many chunks you are processing at the same time.

A common "gotcha" is to think that this depends only in how many parallel computations are you doing, but is quite common to have issues because each serial computation needs more than one chunk, even all of them if you are not lucky. Please, make sure that this is not the case for the worst dataset

valeriupredoi (Contributor, Author)

As I understand it, the number of chunks is set by Iris and is hardcoded. In any case, as per my previous comments, I don't think this implementation helps when working with datasets that are large in time (many years); that's why I kept asking @ledm to give me a couple of comparison numbers for memory. There is, however, a consistent and significant improvement for <30 years, and it is something we would want to consider for other statistics too. ESMValTool is not run only on big-memory machines and huge datasets; improving memory consumption for small-scale runs benefits the user who just wants to run something smaller on their laptop.

@@ -327,7 +327,28 @@ def annual_statistics(cube, operator='mean'):

     if not cube.coords('year'):
         iris.coord_categorisation.add_year(cube, 'time')
-    return cube.aggregated_by('year', operator)
+    years = sorted(list(set(cube.coord("year").points)))
jvegreg (Contributor) commented Oct 1, 2020

Why are you not using something like this?

cube = CubeList([year.aggregated_by('year', operator) for year in cube.slices_over('year')]).merge_cube()

Is there a performance reason?

valeriupredoi (Contributor, Author)

Not sure; that looks a bit clunky to me, and I'd rather not repeat all the memory profiling tests for a rewrite of the same functionality 😁

jvegreg (Contributor)

I was just curious, as you wrote a lot of code for something that can be a one-liner.

valeriupredoi (Contributor, Author)

@jvegasbsc @bouweandela can I please get some sort of comment on this one - do you think a rerun of the memory profiling to show the improvement would be needed, or can we merge as it is now? 🍺

bouweandela (Member)

Hi V, I think it would be better to contribute this improvement to iris, because more people can benefit from it when it's there and it will keep our own code short and easy to maintain.

bouweandela (Member)

@valeriupredoi Reading through all the comments again, I'm a bit lost as to what this pull request actually solves. Could you explain it more clearly in the pull request description? I see mentions of lower memory use and slower performance, but only for small datasets; for large datasets memory use (and performance) seems unaffected. Did I understand this correctly?

valeriupredoi (Contributor, Author)

> @valeriupredoi Reading through all the comments again, I'm a bit lost as to what this pull request actually solves. Could you explain it more clearly in the pull request description? I see mentions of lower memory use and slower performance, but only for small datasets; for large datasets memory use (and performance) seems unaffected. Did I understand this correctly?

Indeed, for large datasets there is no visible increase in performance from my tests, nor is there a decrease, so I think this would be useful for smaller-scale diagnostics. I am OK with it if you deem it not useful enough, but I think we can gain a bit in the case of smaller diags/datasets - which is quite common in the users' community 🍺

bouweandela (Member)

But memory use is a problem for large datasets, not for small ones, so in that case I would prefer to keep the old code: it's much simpler, so there is a smaller chance of bugs and therefore lower maintenance.

valeriupredoi (Contributor, Author) commented Mar 12, 2021

Totally agree, but there is a benefit from this in that people can run smaller analyses on smaller-memory machines (like their laptops). Like I said, it's up to you to decide the fate of this; I am not happy that it doesn't improve memory with larger datasets.

schlunma (Contributor) commented Apr 2, 2024

Since aggregated_by is now lazy for most operators and dask distributed schedulers can be used when running ESMValTool, I think this can be closed. Please reopen if necessary.

schlunma closed this on Apr 2, 2024