
Annual statistic on yearly chunks to keep memory low #792

Closed
wants to merge 9 commits

Conversation

valeriupredoi (Contributor) commented Sep 22, 2020

Notes on memory consumption and scalability

Before you start, please read our contribution guidelines.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • Please use yamllint to check that your YAML files do not contain mistakes
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


Closes #issue_number

valeriupredoi (Contributor, Author) commented Sep 22, 2020

Reduces memory use by 50% in my test case 🍺

ledm (Contributor) commented Sep 22, 2020

I'm getting the following error:

2020-09-22 12:50:36,192 UTC [31034] INFO    Maximum memory used (estimate): 39.0 GB
2020-09-22 12:50:36,193 UTC [31034] INFO    Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur.
2020-09-22 12:50:36,195 UTC [31034] ERROR   Program terminated abnormally, see stack trace below for more information
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 730, in _run_task
    output_files = task.run()
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 242, in run
    self.output_files = self._run(input_files)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 428, in _run
    product.apply(step, self.debug)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 296, in apply
    self.cubes = preprocess(self.cubes, step, **self.settings[step])
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 241, in preprocess
    result.append(_run_preproc_function(function, item, settings))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 224, in _run_preproc_function
    return function(items, **kwargs)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/_time.py", line 336, in annual_statistics
    annual_mean_cube.coord("time").points = [
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1964, in points
    self._new_points_requirements(points)
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1952, in _new_points_requirements
    raise ValueError(emsg.format(self.name(), self.__class__.__name__))
ValueError: The 'time' DimCoord points array must be strictly monotonic.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_main.py", line 430, in run
    fire.Fire(ESMValTool())
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs) 
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_main.py", line 407, in run
    process_recipe(recipe_file=recipe, config_user=cfg)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_main.py", line 98, in process_recipe
    recipe.run()
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_recipe.py", line 1358, in run
    run_tasks(self.tasks,
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 650, in run_tasks
    _run_tasks_parallel(tasks, max_parallel_tasks)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 695, in _run_tasks_parallel
    _copy_results(task, running[task]) 
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 718, in _copy_results
    task.output_files, updated_products = future.get()
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 730, in _run_task
    output_files = task.run()
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/_task.py", line 242, in run
    self.output_files = self._run(input_files)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 428, in _run
    product.apply(step, self.debug)
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 296, in apply
    self.cubes = preprocess(self.cubes, step, **self.settings[step])
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 241, in preprocess
    result.append(_run_preproc_function(function, item, settings))
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/__init__.py", line 224, in _run_preproc_function
    return function(items, **kwargs)   
  File "/home/users/ldemora/workspace/ESMValTool_AR6/ESMValCore/esmvalcore/preprocessor/_time.py", line 336, in annual_statistics
    annual_mean_cube.coord("time").points = [
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1964, in points
    self._new_points_requirements(points)
  File "/home/users/ldemora/anaconda3_20190821/envs/ar6/lib/python3.8/site-packages/iris/coords.py", line 1952, in _new_points_requirements
    raise ValueError(emsg.format(self.name(), self.__class__.__name__))
ValueError: The 'time' DimCoord points array must be strictly monotonic.

ledm (Contributor) commented Sep 22, 2020

valeriupredoi (Contributor, Author)

Cheers @ledm - I had forgotten to re-sort the time points after applying set() - sneaky bugger 🍺

ledm (Contributor) commented Sep 22, 2020

Nice! thanks @valeriupredoi

valeriupredoi (Contributor, Author)

I am currently running a test with the two ACCESS models you pointed me to (2 datasets, 70 years each) and will compare how much memory we save (scalability is always an issue). It'd be good if you could run your mega 39 GB recipe to see if there's any visible improvement 🍺

valeriupredoi (Contributor, Author) commented Sep 22, 2020

OK some info on memory optimization and scalability:

  • for 10 years (x2 models) I can see a 1.7G/2.7G = 40% improvement in memory,
  • for 20 years (x2 models) I can see a 1.8G/3.0G = 40% improvement in memory,
  • for 30 years (x2 models) I can see a 2.8G/3.3G = 15% improvement in memory,
  • for 40 years almost no visible improvement

so it seems this would help only for runs of <=30 years, but the gain for 10-20 years is pretty good at 40-50% and it scales with the x-y size of the data; @bjlittle it'd be nice if you guys could implement a lazy version of aggregated_by() in the future 🍺

EDIT: aggregated_by() actually returns a lazy cube (my bad @bjlittle, but you guys should remove the note in the iris documentation that says it doesn't perform a lazy operation). I also measured the memory intake at higher numbers of years (60-100) and indeed, this method neither improves nor worsens the memory consumption there; it makes the run some 20-30% slower, but that timing is highly inaccurate because the node is a shared-memory one and spends a lot of time waiting for available memory.
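For reference, a minimal sketch of the per-year idea (not the exact PR code; the function name and the merge step in particular are simplified):

import iris
import iris.analysis
import iris.coord_categorisation
from iris.cube import CubeList


def annual_statistic_by_year(cube, operator=iris.analysis.MEAN):
    """Compute an annual statistic one year at a time to limit peak memory."""
    if not cube.coords('year'):
        iris.coord_categorisation.add_year(cube, 'time')
    years = sorted(set(cube.coord('year').points))
    annual_cubes = CubeList()
    for year in years:
        # Extract a single year and aggregate it on its own, so only one
        # year of data needs to be realised at any time.
        yearly_cube = cube.extract(iris.Constraint(year=year))
        annual_cubes.append(yearly_cube.aggregated_by('year', operator))
    # Stitch the per-year results back together along the time dimension.
    return annual_cubes.concatenate_cube()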

valeriupredoi changed the title from "do the annual stat on yearly chunks so mem is low" to "Annual statistic on yearly chunks to keep memory low" on Sep 22, 2020
valeriupredoi (Contributor, Author)

Found and fixed a mask bug, thanks to @ledm who alerted me this morning that his ocean data was not masked anymore. Testing was done with this little script, which gives absolutely identical results compared to the master branch functionality:

import iris
import numpy as np

# master branch
c1 = iris.load_cube("Om/recipe_sections_Ophelie_20200925_105437/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_historical_r1i1p1f1_so_1860-1870.nc")
c2 = iris.load_cube("Om/recipe_sections_Ophelie_20200925_105437/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_piControl_r1i1p1f1_so_960-970.nc")

# optimized mem branch
c3 = iris.load_cube("Ob/recipe_sections_Ophelie_20200925_114041/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_historical_r1i1p1f1_so_1860-1870.nc")
c4 = iris.load_cube("Ob/recipe_sections_Ophelie_20200925_114041/preproc/diag_transect_Pacific/so/CMIP6_ACCESS-CM2_Omon_piControl_r1i1p1f1_so_960-970.nc")

# figure out if different
# total data elements / total unmasked data elements
print(np.prod(c1.data.shape), c1.data.count())
print(np.prod(c2.data.shape), c2.data.count())
print(np.prod(c3.data.shape), c3.data.count())
print(np.prod(c4.data.shape), c4.data.count())
print(np.ma.mean(c1.data), np.ma.mean(c3.data))
print(np.ma.mean(c2.data), np.ma.mean(c4.data))

The memory profile seems to be unaffected and stands as in the comment above.

bouweandela (Member)

I think Cube.aggregated_by is lazy if the aggregator provided supports lazy aggregation. E.g. iris.analysis.MEAN would be lazy, while some of the other statistics are not.
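A quick way to check this in practice (just a sketch; the file name is a placeholder):

import iris
import iris.analysis
import iris.coord_categorisation

# Hypothetical input file, purely for illustration.
cube = iris.load_cube("tas_Amon_example_historical.nc")
iris.coord_categorisation.add_year(cube, "time")

print(cube.has_lazy_data())    # True: the data has not been realised yet
annual = cube.aggregated_by("year", iris.analysis.MEAN)
print(annual.has_lazy_data())  # True if the aggregation was performed lazily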

ledm (Contributor) commented Sep 29, 2020

> I think Cube.aggregated_by is lazy if the aggregator provided supports lazy aggregation. E.g. iris.analysis.MEAN would be lazy, while some of the other statistics are not.

We were making annual means using iris.analysis.MEAN (right @valeriupredoi?) and it was causing memory issues on large files. Maybe it can only get so lazy?

valeriupredoi (Contributor, Author) commented Sep 29, 2020

Indeed, both @bouweandela and @ledm are right - the MEAN aggregator is lazy, but as Lee points out, lazy means it takes much less memory but still chews up a ton of memory if the data is huge. This implementation buys memory performance at the expense of time performance (although I can't put a decisive number on the latter, since I use JASMIN and those shared nodes are anything but a reliable source for timing). What baffles me is that from my tests I expect an insignificant memory improvement for large datasets with lots of years, but a very solid improvement for datasets of 30-odd years - yet @ledm has seen good improvement for his runs with hundreds of years too, so maybe I'm missing something obvious 😕

bouweandela (Member)

> We were making annual means using iris.analysis.MEAN (right @valeriupredoi?) and it was causing memory issues on large files. Maybe it can only get so lazy?

Could you share the recipe that causes the problem? And the version of iris you were using? aggregated_by is lazy since iris v2.3.

bouweandela (Member)

> lazy means it takes much less memory but still chews up a ton of memory if the data is huge.

Not necessarily. Dask chops arrays up into chunks. These chunks are then processed in parallel; the number of chunks that gets processed at the same time is probably configurable and by default equal to the number of logical (CPU) cores in the machine. The actual memory use will thus depend on 1) how much memory is needed to process a single chunk and 2) how many chunks you are processing at the same time. Typically the amount of memory needed per chunk is several times the size of the chunk, depending on how well the code you're running was written. You can check the chunk dimensions using something like print(cube.lazy_data().chunks) and use that to compute the in-memory size of a single chunk.
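For example, along these lines (a sketch, assuming cube still holds lazy data):

import numpy as np

lazy = cube.lazy_data()
print(lazy.chunks)  # tuple of per-dimension chunk lengths

# Estimate the in-memory footprint of the largest chunk.
largest_chunk_elements = np.prod([max(lengths) for lengths in lazy.chunks])
largest_chunk_mib = largest_chunk_elements * lazy.dtype.itemsize / 1024 ** 2
print(f"largest chunk is about {largest_chunk_mib:.1f} MiB")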

@jvegreg
Copy link
Contributor

jvegreg commented Oct 1, 2020

> 2) how many chunks you are processing at the same time.

A common "gotcha" is to think that this depends only in how many parallel computations are you doing, but is quite common to have issues because each serial computation needs more than one chunk, even all of them if you are not lucky. Please, make sure that this is not the case for the worst dataset

valeriupredoi (Contributor, Author)

As I understand it, the number of chunks is set by Iris and is hardcoded. In any case, as per my previous comments, I don't think this implementation helps when working with datasets that are large in time (many years); that's why I kept asking @ledm to give me a couple of comparison numbers for memory. There is, however, a consistent and significant improvement for <30 years, and it is something we would want to consider for other statistics too. ESMValTool is not run only on big-memory machines and huge datasets; improving memory consumption for small-scale runs benefits the user who just wants to run something smaller on their laptop.

@@ -327,7 +327,28 @@ def annual_statistics(cube, operator='mean'):

     if not cube.coords('year'):
         iris.coord_categorisation.add_year(cube, 'time')
-    return cube.aggregated_by('year', operator)
+    years = sorted(list(set(cube.coord("year").points)))
jvegreg (Contributor) commented Oct 1, 2020

Why are you not using something like this?

cube = CubeList([year.aggregated_by('year', operator) for year in cube.slices_over('year')]).merge_cube()

Is there a performance reason?

valeriupredoi (Contributor, Author)

Not sure; that looks a bit clunky to me, and I'd rather not repeat all the memory profiling tests for a rewrite of the same functionality 😁

jvegreg (Contributor)

I was just curious, as you wrote a lot of code for something that can be a one-liner.

valeriupredoi (Contributor, Author)

@jvegasbsc @bouweandela can I please get some sort of comment on this one - do you think a rerun of the memory profiling to show the improvement would be needed, or can we merge as it is now? 🍺

bouweandela (Member)

Hi V, I think it would be better to contribute this improvement to iris, because more people can benefit from it when it's there and it will keep our own code short and easy to maintain.

bouweandela (Member)

@valeriupredoi Reading through all the comments again, I'm a bit lost as to what this pull request actually solves. Could you explain it more clearly in the pull request description? I see mentions of lower memory use and slower performance, but only for small datasets; for large datasets memory use (and performance) seems unaffected. Did I understand this correctly?

valeriupredoi (Contributor, Author)

> @valeriupredoi Reading through all the comments again, I'm a bit lost as to what this pull request actually solves. Could you explain it more clearly in the pull request description? I see mentions of lower memory use and slower performance, but only for small datasets; for large datasets memory use (and performance) seems unaffected. Did I understand this correctly?

Indeed, for large datasets there is no visible increase in performance from my tests, nor is there a decrease, so I think this would be useful for smaller-scale diagnostics. I am OK with it if you deem it not useful enough, but I think we can gain a bit in the case of smaller diags/datasets - which is quite common in the users' community 🍺

bouweandela (Member)

But memory use is a problem for large datasets, not for small ones, so in that case I would prefer to keep the old code: it's much simpler, so there is a smaller chance of bugs and therefore lower maintenance.

valeriupredoi (Contributor, Author) commented Mar 12, 2021

Totally agree, but there is a benefit from this in that people can run smaller analyses on smaller-memory machines (like their laptops). Like I said, it's up to you to decide the fate of this; I am not happy that it doesn't improve memory with larger datasets.

schlunma (Contributor) commented Apr 2, 2024

Since aggregated_by is now lazy for most operators and dask distributed schedulers can be used when running ESMValTool, I think this can be closed. Please reopen if necessary.

schlunma closed this on Apr 2, 2024