
Use dask to reduce memory consumption of extract_levels for masked data #776

Merged (7 commits, Oct 5, 2020)

Conversation

valeriupredoi
Contributor

@valeriupredoi valeriupredoi commented Sep 11, 2020

Before you start, please read our contribution guidelines.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


partially addressing #775

This will mostly be appreciated by the ocean people: I measured a memory consumption reduction of about 30% for level selection.

@valeriupredoi valeriupredoi added enhancement New feature or request preprocessor Related to the preprocessor labels Sep 11, 2020
@valeriupredoi
Contributor Author

can one of you @bouweandela @jvegasbsc @schlunma please review this? It'd be useful to have it in the release 👍 🍺

Member

@bouweandela bouweandela left a comment


Could you add a test like this one, but with lazy input and output data?

    def test_interpolation__linear(self):
        levels = [0.5, 1.5]
        scheme = 'linear'
        result = extract_levels(self.cube, levels, scheme)
        expected = np.array(
            [[[[2., 3.], [4., 5.]], [[6., 7.], [8., 9.]]],
             [[[14., 15.], [16., 17.]], [[18., 19.], [20., 21.]]]])
        self.assert_array_equal(result.data, expected)
        self.shape[self.z_dim] = len(levels)
        self.assertEqual(result.shape, tuple(self.shape))

@bouweandela
Member

Oh wait, that won't work, because this doesn't make the process completely lazy, right?

@valeriupredoi
Contributor Author

> Oh wait, that won't work, because this doesn't make the process completely lazy, right?

yes, it's a half-lazy process due to vinterp not returning a lazy object
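A minimal sketch of the general pattern being discussed (this is an illustration with plain dask and numpy, not the ESMValCore code; interpolate_block is a hypothetical stand-in for an eager routine like stratify.interpolate): an eager, non-lazy function can still be applied chunk-by-chunk with da.map_blocks, so only one chunk of the masked data is realized in memory at a time, even though the function itself is not lazy.

```python
import dask.array as da
import numpy as np


def interpolate_block(block):
    # Stand-in for a non-lazy routine: it receives a realized
    # (masked) numpy array for one chunk and returns a numpy array.
    return block * 2.0


# Lazy masked source data, chunked along the leading axis.
data = da.arange(16.0, chunks=4).reshape(4, 2, 2)
masked = da.ma.masked_less(data, 2.0)

# map_blocks calls the eager function once per chunk; peak memory is
# roughly one chunk, not the whole array.
result = da.map_blocks(interpolate_block, masked, dtype=masked.dtype)

out = result.compute()  # masked values stay masked; the rest are doubled
```

This is how a partially lazy pipeline can still cut peak memory: the graph is lazy even if the per-chunk kernel is not.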

@valeriupredoi
Contributor Author

@bouweandela 👍 or 👎

@bouweandela
Member

> @bouweandela +1 or -1

Please see #776 (comment)

@bouweandela bouweandela changed the title Use dask for vertical_interpolate in _regrid.py Use dask to reduce memory consumption of extract_levels for masked data Oct 2, 2020
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>

    with mock.patch(
            'stratify.interpolate', return_value=new_data) as mocker:
        # first test lazy
        loaded_cube = iris.load_cube(self.filename)
Member

To make a cube with realized data lazy, you can do

    cube.data = cube.lazy_data()

or

    cube.data = da.asarray(cube.data, chunks=(1, 2, 3, 4))

if you want more control over how the lazy array is created. There is no need to first save to file.
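For illustration, the da.asarray pattern suggested above can be seen with a plain numpy array (no iris cube is needed to see the mechanics; the array and chunk sizes here are arbitrary examples): the result is a lazy dask array with the chosen chunking, and nothing is computed until it is realized.

```python
import dask.array as da
import numpy as np

# A realized array, standing in for cube.data after an eager load.
realized = np.arange(24.0).reshape(2, 3, 4)

# Equivalent of `cube.data = da.asarray(cube.data, chunks=...)`:
lazy = da.asarray(realized, chunks=(1, 3, 4))

# Still lazy: chunks control how much data is in memory at once.
assert isinstance(lazy, da.Array)
assert lazy.chunks == ((1, 1), (3,), (4,))

# Values are identical once realized.
np.testing.assert_array_equal(lazy.compute(), realized)
```

The chunks argument is the knob for the memory/scheduling trade-off: smaller chunks mean lower peak memory but more task overhead.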

Contributor Author

Cool, I didn't know that! But I wanted to replicate the actual conditions the function runs in anyway.

@bouweandela bouweandela merged commit 4f8ed70 into master Oct 5, 2020
@bouweandela bouweandela deleted the optimize_vertical_interpolate branch October 5, 2020 13:13