Change volume stats to handle and output masked array result #618

valeriupredoi · 2020-04-20T12:49:36Z

Before you start, please read CONTRIBUTING.md.

Tasks

Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
This pull request has a descriptive title that can be used in a changelog
Add unit tests
Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
If writing a new/modified preprocessor function, please update the documentation
Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.

no, it's just being petty, it can bugger off

Please use yamllint to check that your YAML files do not contain mistakes
If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.

Closes #611

sloosvel · 2020-04-21T08:51:34Z

Thanks @valeriupredoi . I'm still getting NaNs though. This is what the last elements of column and depth_volume look like respectively when debugging:

column:
72:array(0.13211952, dtype=float32)
73:array(0.13278724, dtype=float32)
74:masked_array(data=--, mask=True, fill_value=1e+20, dtype=float64)

depth_volume:
72:4726634600000000.0
73:3180468400000000.0
74:masked

I think it's because of the masked depth_volume, because I changed the last value to 0 just to test with a non-masked weight and the results I got are not nans anymore. So it looks like np.ma.average is not able to handle masked weights after all.

valeriupredoi · 2020-04-21T09:25:17Z

@sloosvel indeed, np.ma.average() is being silly:

In [10]: np.ma.average(np.ma.array([1, 2, 3, 5]), weights=np.ma.array([2,2,2,2], mask=[True, True, True, False]))
Out[10]: 5.0

In [11]: np.ma.average(np.ma.array([1, 2, 3, 5]), weights=np.ma.array([2,2,2,2], mask=[True, True, True, True]))
Out[11]: nan

and it looks like the logic is actually correct according to the documentation, since the statistic is computed using sum(weights) as the denominator (obv, weighted mean heh), so in the case of a fully masked weights array we are a bit stuck. We should probably perform a simple mean in that case, since the weights are completely irrelevant, but that could assign a bit more weight to that particular column. I am not sure how to approach this from the statistical point of view - @ledm any advice?
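
For illustration, the two cases above can be written out as an explicit weighted mean. This is only a sketch: the helper name `weighted_mean_or_masked` is hypothetical and not part of ESMValCore, and returning `np.ma.masked` when no weight is usable is just one of the options under discussion, not a settled choice.

```python
import numpy as np


def weighted_mean_or_masked(values, weights):
    """Weighted mean over points where value and weight are both unmasked.

    Hypothetical helper for illustration, not ESMValCore code.
    """
    # A point only contributes if neither its value nor its weight is masked.
    usable = ~(np.ma.getmaskarray(values) | np.ma.getmaskarray(weights))
    if not usable.any():
        # No usable weight: sum(weights) has nothing to sum, so there is
        # no meaningful denominator. Return masked instead of nan.
        return np.ma.masked
    vals = np.ma.getdata(values)[usable].astype(float)
    wgts = np.ma.getdata(weights)[usable].astype(float)
    return float(np.sum(vals * wgts) / np.sum(wgts))


data = np.ma.array([1, 2, 3, 5])
one_weight = np.ma.array([2, 2, 2, 2], mask=[True, True, True, False])
no_weight = np.ma.array([2, 2, 2, 2], mask=[True, True, True, True])

partly_masked = weighted_mean_or_masked(data, one_weight)  # 5.0
fully_masked = weighted_mean_or_masked(data, no_weight)    # np.ma.masked
```

Falling back to a simple mean instead would only change the branch taken when no weight is usable.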

valeriupredoi · 2020-04-21T09:26:39Z

or we can just completely mask that column?

valeriupredoi · 2020-04-21T09:54:20Z

OK I masked it completely to get rid of nan's

sloosvel · 2020-04-21T11:37:03Z

I think the routine is computing the 2D average first and then the average over all levels. So the output of averaging column should be the total average. What these last commits seem to be doing is just changing the total result from nan to masked, which would mean that the output would still be empty.

Anyway, I think it's a bit strange to have a completely masked level in volcello.

valeriupredoi · 2020-04-21T11:48:42Z

So that's the average volume per time point, computed as an average over a set of columns at different depths, weighted by each column's volume. The code changes mask those values that have masked weights, which is equivalent to saying that if the column volume is not viable as a weight there (coz it's masked), I may not have enough information about that particular column. Extrapolating this: if all my columns have masked values, I don't have enough information to compute an average volume at that time point, so I am masking it. I think it's fine.

sloosvel · 2020-04-23T07:39:54Z

I don't know, that seems overly complicated to me. I took a closer look and it turns out that depth_volume is a list. I changed its type to np.ma.array and I don't get nans anymore.

valeriupredoi · 2020-04-23T11:12:02Z

@sloosvel this is the generalized approach. Note that even if depth_volume is a masked array, you will still get nans when all of its values are masked, i.e. np.ma.average(np.ma.array([1, 2, 3, 5]), weights=np.ma.array([2,2,2,2], mask=[True, True, True, True])) returns nan

valeriupredoi · 2020-04-30T11:05:26Z

any news on this @sloosvel 🍺

sloosvel · 2020-04-30T14:23:11Z

I am not sure if a case in which all depth_volume values are masked should be taken into account. That would mean that the volcello files consist of only masked values. Which would make the data not really useful.

There are several models that have the last layer of the ocean data fully masked. This routine does not handle that case well, because the depth_volume used in np.average (or np.ma.average) has a masked value in the last level and is declared as a list instead of a masked array. If the type is changed to a masked array before the final averaging, I get proper results instead of a result with just NaN values.
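
A minimal sketch of the idea described above, assuming per-level values shaped like the debugging output earlier in this thread. The helper `as_masked_array` is hypothetical and is not the code committed in this pull request; it builds the mask explicitly so that the conversion does not depend on how NumPy interprets a list containing masked elements.

```python
import numpy as np


def as_masked_array(items):
    """Turn a list of scalars or 0-d arrays, some masked, into a masked array.

    Hypothetical helper for illustration only.
    """
    mask = [bool(np.ma.is_masked(item)) for item in items]
    # Masked entries get a placeholder value; the mask hides it.
    data = [0.0 if hidden else float(item)
            for item, hidden in zip(items, mask)]
    return np.ma.array(data, mask=mask)


# The deepest level is fully masked in both lists.
column = [
    np.array(0.13211952, dtype=np.float32),
    np.array(0.13278724, dtype=np.float32),
    np.ma.masked_array(data=0.0, mask=True),
]
depth_volume = [4726634600000000.0, 3180468400000000.0, np.ma.masked]

column_ma = as_masked_array(column)
depth_volume_ma = as_masked_array(depth_volume)

# With both inputs as masked arrays, the masked level drops out of the
# weighted mean and the two remaining levels determine the result.
total = np.ma.average(column_ma, weights=depth_volume_ma)
```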

valeriupredoi · 2020-04-30T14:43:08Z

cool! then apply whatever changes are needed here, job done by the sounds of it 🍺

jvegreg · 2020-05-05T14:31:10Z

@ledm @bouweandela Can you please review this? It is blocking us

valeriupredoi

ok by me, shame I can't approve my own PR that is in fact not my PR 😁 @sloosvel does your implementation sort out all them nans?

sloosvel · 2020-05-05T14:55:04Z

I added a test for nans and it looks like it's passing

valeriupredoi · 2020-05-05T15:11:29Z

Indeed it do 😁 What's with all the deleted blank lines? New Flake8 conventions for docstrings? - there has to be a blank line between docstring body and eg Parameters----

sloosvel · 2020-05-06T09:43:06Z

I copy-pasted code and probably messed up the formatting while doing so, will correct now.

bouweandela · 2020-05-19T11:35:40Z

> @ledm @bouweandela Can you please review this? It is blocking us

@jvegasbsc Please feel free to help out with reviewing, it is impossible for me to review every single pull request in a timely manner.

tests/unit/preprocessor/_volume/test_volume.py

sloosvel · 2020-05-20T08:57:36Z

What if all values for a particular time step are masked? Are you sure that never happens?

@bouweandela Well if you want to consider that case, I think the easiest is to save the result in line 275 as np.ma.array(result) instead of np.array(result), let me know what you think. I don't know whether this happens often or rarely. It's not the case for which I opened the issue. The main problem that I was having is that the average was being computed with lists that were not handling the presence of a masked value in the volume well. So I changed the implementation to use masked arrays instead.

ya that was my previous implementation that looked a bit overkill (inc a test for it) but @sloosvel assures me that can't happen

@valeriupredoi Your implementation was not working for the datasets that I was testing, because the average was still being computed with lists instead of masked arrays, so the result was a NaN that was getting masked at the very end. I cannot have average_volume = NaN (which is what happens when using master) or average_volume = masked (which is what happens when using your implementation) for all ocean variables in all of our EC-Earth experiments. It does not make sense and it's not the result that we get using other old packages in the department.

bouweandela · 2020-05-20T09:20:53Z

> Well if you want to consider that case I think the easiest is to save the result in line 275 in np.ma.array(result) instead of np.array(result), let me know what you think.

Yes, that would be a good solution.
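
To make the fully masked time step case concrete, here is a rough sketch under stated assumptions: the (time, level) shapes, the variable names and the numbers are all hypothetical, and this is not the change made at line 275. It only shows how a time step without any usable weight can end up masked in the output rather than NaN.

```python
import numpy as np

# Hypothetical (time, level) data; the second time step is fully masked.
values = np.ma.array(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    mask=[[False, False, True], [True, True, True]],
)
volumes = np.ma.array(
    [[2.0, 2.0, 4.0], [2.0, 2.0, 4.0]],
    mask=[[False, False, True], [True, True, True]],
)

# Give zero weight to every point whose value or volume is masked.
usable = ~(np.ma.getmaskarray(values) | np.ma.getmaskarray(volumes))
weights = np.where(usable, np.ma.getdata(volumes), 0.0)
weight_sum = weights.sum(axis=1)
weighted_sum = (np.ma.getdata(values) * weights).sum(axis=1)

# Time steps without any usable weight stay masked instead of becoming NaN.
no_weight = weight_sum == 0.0
result = np.ma.array(
    weighted_sum / np.where(no_weight, 1.0, weight_sum),
    mask=no_weight,
)
```

Here the first time step averages its two unmasked levels and the second one is returned as a masked element.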

bouweandela · 2020-05-20T10:00:31Z

@mattiarighi Could you please test and merge when you're happy with the result?

mattiarighi · 2020-05-22T09:29:55Z

@sloosvel can you provide a test case (specific model and variable(s) to test)?

sloosvel · 2020-05-27T08:44:38Z

@mattiarighi Sorry for the delay, our data is on its way to being published but it's not available yet. The HadGEM3-GC31-LL historical data could be another example of a volcello with a fully masked level, but I'm having concatenation errors with this dataset.

mattiarighi · 2020-05-27T08:47:03Z

The changes in code look fine to me.
If this solves your issue please approve the PR and I will merge.

valeriupredoi added 2 commits April 20, 2020 13:47

added masked results tests

cb755e7

masked output of volume stats

e04f2a8

valeriupredoi added the enhancement New feature or request label Apr 20, 2020

valeriupredoi requested review from ledm and sloosvel April 20, 2020 12:49

removed redundant equality

4aeb453

valeriupredoi mentioned this pull request Apr 20, 2020

Empty cubes after volume_statistics for certain datasets #611

Closed

valeriupredoi changed the title ~~Change volume stats to handle and outpt masked array result~~ Change volume stats to handle and output masked array result Apr 20, 2020

valeriupredoi requested a review from bouweandela April 20, 2020 14:50

masking the nan resulted from fully masked weights array

9a04909

valeriupredoi added 3 commits April 21, 2020 11:13

fixed codacy

e128de0

nicer implementation to remove nans

4bc98e2

added test for nans

c5ea0bf

sloosvel and others added 2 commits April 30, 2020 18:31

Convert lists to masked arrays

0adcaf1

Merge branch 'master' of https://github.com/ESMValGroup/ESMValCore in…

73f55c9

…to change_volume_stats_to_masked_result

Merge branch 'master' of https://github.com/ESMValGroup/ESMValCore in…

9850680

…to change_volume_stats_to_masked_result

valeriupredoi commented May 5, 2020

Fix format

bf93eba

bouweandela reviewed May 19, 2020

Consider fully masked timesteps

af0cd9b

bouweandela approved these changes May 20, 2020

valeriupredoi mentioned this pull request May 26, 2020

Make preprocessor function volume_statistics lazy #647

Closed

sloosvel approved these changes May 27, 2020

mattiarighi merged commit c80c8e4 into master May 27, 2020

mattiarighi deleted the change_volume_stats_to_masked_result branch May 27, 2020 08:55

bouweandela added the preprocessor Related to the preprocessor label Jul 2, 2020
