
Use dask to reduce memory consumption of extract_levels for masked data #776

Merged (7 commits, Oct 5, 2020)

Conversation

valeriupredoi
Contributor

@valeriupredoi valeriupredoi commented Sep 11, 2020

Before you start, please read our contribution guidelines.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


partially addressing #775

This will mostly be appreciated by the ocean people: I measured a memory consumption reduction of about 30% for level selection.

@valeriupredoi valeriupredoi added enhancement New feature or request preprocessor Related to the preprocessor labels Sep 11, 2020
@valeriupredoi
Contributor Author

can one of you @bouweandela @jvegasbsc @schlunma please review this? It'd be useful to have it in the release 👍 🍺

Member

@bouweandela bouweandela left a comment


Could you add a test like this one, but with lazy input and output data?

    def test_interpolation__linear(self):
        levels = [0.5, 1.5]
        scheme = 'linear'
        result = extract_levels(self.cube, levels, scheme)
        expected = np.array(
            [[[[2., 3.], [4., 5.]], [[6., 7.], [8., 9.]]],
             [[[14., 15.], [16., 17.]], [[18., 19.], [20., 21.]]]])
        self.assert_array_equal(result.data, expected)
        self.shape[self.z_dim] = len(levels)
        self.assertEqual(result.shape, tuple(self.shape))

@bouweandela
Member

Oh wait, that won't work, because this doesn't make the process completely lazy, right?

@valeriupredoi
Contributor Author

> Oh wait, that won't work, because this doesn't make the process completely lazy, right?

yes, it's a half-lazy process due to vinterp not returning a lazy object
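A minimal sketch of the general pattern being discussed (this is an illustration with plain dask and numpy, not the ESMValCore code; interpolate_block is a hypothetical stand-in for an eager routine like stratify.interpolate): an eager, non-lazy function can still be applied chunk-by-chunk with da.map_blocks, so only one chunk of the masked data is realized in memory at a time, even though the function itself is not lazy.

```python
import dask.array as da
import numpy as np


def interpolate_block(block):
    # Stand-in for a non-lazy routine: it receives a realized
    # (masked) numpy array for one chunk and returns a numpy array.
    return block * 2.0


# Lazy masked source data, chunked along the leading axis.
data = da.arange(16.0, chunks=4).reshape(4, 2, 2)
masked = da.ma.masked_less(data, 2.0)

# map_blocks calls the eager function once per chunk; peak memory is
# roughly one chunk, not the whole array.
result = da.map_blocks(interpolate_block, masked, dtype=masked.dtype)

out = result.compute()  # masked values stay masked; the rest are doubled
```

This is how a partially lazy pipeline can still cut peak memory: the graph is lazy even if the per-chunk kernel is not.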

@valeriupredoi
Contributor Author

@bouweandela 👍 or 👎

@bouweandela
Member

> @bouweandela +1 or -1

Please see #776 (comment)

@bouweandela bouweandela changed the title Use dask for vertical_interpolate in _regrid.py Use dask to reduce memory consumption of extract_levels for masked data Oct 2, 2020
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>

    with mock.patch(
            'stratify.interpolate', return_value=new_data) as mocker:
        # first test lazy
        loaded_cube = iris.load_cube(self.filename)
Member

To make a cube with realized data lazy, you can do

    cube.data = cube.lazy_data()

or

    cube.data = da.asarray(cube.data, chunks=(1, 2, 3, 4))

if you want more control over how the lazy array is created. There is no need to first save to file.
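For illustration, the da.asarray pattern suggested above can be seen with a plain numpy array (no iris cube is needed to see the mechanics; the array and chunk sizes here are arbitrary examples): the result is a lazy dask array with the chosen chunking, and nothing is computed until it is realized.

```python
import dask.array as da
import numpy as np

# A realized array, standing in for cube.data after an eager load.
realized = np.arange(24.0).reshape(2, 3, 4)

# Equivalent of `cube.data = da.asarray(cube.data, chunks=...)`:
lazy = da.asarray(realized, chunks=(1, 3, 4))

# Still lazy: chunks control how much data is in memory at once.
assert isinstance(lazy, da.Array)
assert lazy.chunks == ((1, 1), (3,), (4,))

# Values are identical once realized.
np.testing.assert_array_equal(lazy.compute(), realized)
```

The chunks argument is the knob for the memory/scheduling trade-off: smaller chunks mean lower peak memory but more task overhead.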

Contributor Author

Cool, I didn't know that! But I wanted to replicate the actual conditions the function runs in anyway.

@bouweandela bouweandela merged commit 4f8ed70 into master Oct 5, 2020
@bouweandela bouweandela deleted the optimize_vertical_interpolate branch October 5, 2020 13:13