Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check horizontal grid before regridding #507

Merged
merged 23 commits into from
May 14, 2021

Conversation

BenMGeo
Copy link

@BenMGeo BenMGeo commented Feb 21, 2020

Check if horizontal grid of cube is same as of target grid before regridding. Exits on first difference.
Note: No warning or feedback, if regridding is skipped as there is no logging functionality implemented in file yet.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.

Closes #485

Benjamin Müller added 3 commits February 21, 2020 09:00
Check if horizontal grid of cube is same as of target grid before regridding. Exits on first difference.
Note: No warning or feedback, if regridding is skipped as there is no logging functionality implemented in file yet.
@BenMGeo BenMGeo self-assigned this Feb 21, 2020
@BenMGeo
Copy link
Author

BenMGeo commented Feb 21, 2020

I'm not sure why the tests fail. Is it, because the tests assume different coordinate names?

@valeriupredoi
Copy link
Contributor

valeriupredoi commented Feb 21, 2020

@BenMGeo the test is failing beause the first conditional in your function is failing ie cube.coord(coord).shape is a MagicMock object eg <Mock name='mock.coord().shape' id='140630128663888'> that differs from cube1 to cube2. In any case, please don't compare bounds at regrid level, especially horizontal regrid level, and here's why:

  • cubes that get passed to the regrid module already have been checked on bounds existence by the CMOR checks and fixes module
  • your conditional is way too strict ie
                if (c1bndval != c2bndval
                        and not np.allclose(c1bndval, c2bndval)):

will fail many times since bounds are not always equal and the differences can be of order 1 that will immediately fail allclose test since allclose is sensitive to <1e-5 see https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html

  • regridding should happen even if latitude or longitude are not x-y coord names -> your first conditional will return coord not found error
  • as a good practice, if we really want to allow for bounds checks we should introduce it as a separate regrid preprocessor option that can be turned on or off by the user, and executed part of the regrid wrapper in preprocessor/__init__.py

🍺

Copy link
Contributor

@valeriupredoi valeriupredoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please see my comment 🍺

@valeriupredoi
Copy link
Contributor

right - so I totally misinterpreted the purpose of this PR - just saw the issue related to it 😁 For that case there is no need to go this deep in checks (note that the comments above still stand but they are irrelevant wrt to the purpose of the PR) - you can just check if cube == target_grid provided an extra check that target_grid is a cube 🍺

Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if you could also add a unit test for the new functionality

@bouweandela
Copy link
Member

bouweandela commented Feb 24, 2020

I'm not sure why the tests fail. Is it, because the tests assume different coordinate names?

The faling test is trying to check that the following code is executed and the coord method is called with the correct arguments:

xcoord = target_grid.coord(axis='x', dim_coords=True)
ycoord = target_grid.coord(axis='y', dim_coords=True)

with the correct arguments as on the line above. The reason that the test fails, is that you have added an extra call to the coord method with a different argument 'latitude' here:
for coord in ['latitude', 'longitude']:
# Compare shapes.
if cube1.coord(coord).shape != cube2.coord(coord).shape:

@BenMGeo
Copy link
Author

BenMGeo commented Feb 25, 2020

I'm not sure why the tests fail. Is it, because the tests assume different coordinate names?

The faling test is trying to check that the following code is executed and the coord method is called with the correct arguments:

xcoord = target_grid.coord(axis='x', dim_coords=True)
ycoord = target_grid.coord(axis='y', dim_coords=True)

with the correct arguments as on the line above. The reason that the test fails, is that you have added an extra call to the coord method with a different argument 'latitude' here:

for coord in ['latitude', 'longitude']:
# Compare shapes.
if cube1.coord(coord).shape != cube2.coord(coord).shape:

So, I assumed that usually the preprocessor already checked for these specific coord names before. I could simply add x and y, as alternative. Reading the coordinates is an issue as we would also read time and z-coord, if available.

@BenMGeo
Copy link
Author

BenMGeo commented Feb 25, 2020

right - so I totally misinterpreted the purpose of this PR - just saw the issue related to it grin For that case there is no need to go this deep in checks (note that the comments above still stand but they are irrelevant wrt to the purpose of the PR) - you can just check if cube == target_grid provided an extra check that target_grid is a cube beer

This does not catch the case that a self defined target_grid is fitting the cube in the function "accidentally". This is why I tried to include the checks on a different level.

@BenMGeo
Copy link
Author

BenMGeo commented Feb 25, 2020

It would be nice if you could also add a unit test for the new functionality

Will do. But I wanted to understand the failing of the current tests before.

Benjamin Müller and others added 3 commits February 25, 2020 07:11
Co-Authored-By: Bouwe Andela <bouweandela@users.noreply.github.com>
@bouweandela
Copy link
Member

So, I assumed that usually the preprocessor already checked for these specific coord names before.

Yes, the preprocessor has checked that the coordinates have the correct names, so I think it's probably fine to use the names to find the horizontal coordinates.

I could simply add x and y, as alternative.

It would be good if you could have a careful look at the failing test. While simply using the same arguments in the second call to the coord method might make the test pass, I'm not sure it will still test what the author of the test intended, i.e. that this code is correctly executed:

xcoord = target_grid.coord(axis='x', dim_coords=True)
ycoord = target_grid.coord(axis='y', dim_coords=True)

@mattiarighi mattiarighi added the preprocessor Related to the preprocessor label Feb 26, 2020
new grid checker for avoiding horizontal regriding needs additional calls
@BenMGeo BenMGeo requested a review from valeriupredoi March 9, 2020 13:54
Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ben,

Thanks for adding a first test. I changed it a bit, so the check if the grids are the same always returns False in the existing tests. If you would like more information on mocking, you can have a look at the unittest.mock documentation. It may seem daunting at first, but actually it's not that complicated, the basic idea is that you replace parts of your code with Mock objects, which do nothing at all (except maybe return a value, specified through the side_effect argument/attribute). Mocking can help to make tests more modular, that is why it is a popular technique.

It would be really great if you could also add a test that shows that cubes with a different grid actually get regridded to the target grid, while cubes with an identical grid are not regridded at all. There should be no need for mocking to do that ;-)

@BenMGeo
Copy link
Author

BenMGeo commented Mar 23, 2020

It would be really great if you could also add a test that shows that cubes with a different grid actually get regridded to the target grid, while cubes with an identical grid are not regridded at all. There should be no need for mocking to do that ;-)

I thought that this is already included in the tests that are there at the moment (excluding shifted bounds).

@bouweandela
Copy link
Member

I thought that this is already included in the tests that are there at the moment (excluding shifted bounds).

I checked and indeed there are tests that regridding works in tests/integration/preprocessor/_regrid/test_regrid.py. However, for the new functionality you added, i.e. regridding is skipped if the grids are the same, there is of course no test yet. Maybe you could add it there?

@bouweandela
Copy link
Member

Friendly ping @BenMGeo Any plans to complete this pull request?

@BenMGeo
Copy link
Author

BenMGeo commented Jun 15, 2020

Friendly ping @BenMGeo Any plans to complete this pull request?

Planned for the next 2 weeks while processing huge data. hopefully can implement reasonable tests in the end.

@stefsmeets stefsmeets assigned stefsmeets and unassigned BenMGeo Apr 26, 2021
@stefsmeets
Copy link
Contributor

Hi @BenMGeo , this PR has not seen much activity lately. No worries! Since it is so close to being finished, I hope you do not mind that I help out to get it over the finish line! 👍

@stefsmeets
Copy link
Contributor

Hi @bouweandela and @valeriupredoi , from my understanding there are still two open issues with this PR:

It would be really great if you could also add a test that shows that cubes with a different grid actually get regridded to the target grid, while cubes with an identical grid are not regridded at all. There should be no need for mocking to do that ;-)

I have added a test for this case (test_regrid_is_skipped_if_grids_are_the_same).

Yes, that should be fine. Or you could just make one of the coordinates of equal shape but with different bounds, then you also have this case covered.

I added a test case with the same shape, but different bounds both lat and lon.

I also refactored the new tests a bit to use pytest instead of the old unittest library. Could you have a look whether your concerns have been addressed?

@stefsmeets stefsmeets requested a review from bouweandela April 26, 2021 14:45
Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @BenMGeo and thanks for wrapping this up @stefsmeets! 🥳

@bouweandela bouweandela merged commit b46829e into master May 14, 2021
@bouweandela bouweandela deleted the BenMGeo-skip-regrid-if-same branch May 14, 2021 10:06
@BenMGeo
Copy link
Author

BenMGeo commented May 14, 2021

Thanks Peeps for finishing this.
I'm sorry that I answer so late. Hard time to contribute further, these days :(

senesis pushed a commit that referenced this pull request Jun 7, 2021
…grid (#507)

Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl>
zklaus pushed a commit that referenced this pull request Jun 11, 2021
* Add basic support for variable mappings

* Add first era5 mapping

* Find files for CMIP6 DCPP startdates (#771)

* First attempte

* Do not require start and end years, add them later

* Correct condition

* Avoid key error in fx variables

* Consider two possible paths

* Fix function name

* Fix variable name

* Avoid duplicates in filename

* Add test for startdate expansion

* Add test for the replace tags method

* Rename tag

* Add documentation

* Allow to load subexps per timerange or as a whole

* Fix condition

* Remove 'all_years' functionality

* Fix conditions

* Fix flake

* Remove whitespace

Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>

* Skip regridding if the target grid is almost identical to the source grid (#507)

Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl>

* Fixes for sos and siconc of BCC models (#1090)

* sos and siconc fixed

* tests added

* test fixed

* fix flake8

* fix flake8

* fix codacy issue

* Pin cf-units and fix tests (cf-units>=2.1.5) (#1140)

* pin cf-units

* pin cf-units

* fix test

* fix test

* Handle IPSL-CM6 (the feature won't actually work without #1124)

* class Huss inherits from cass Tas. Also : Fix codacy diags.

* Replace os.system() by subprocess.run()

* Fix flake8 diags

* var_mapping -> extra_facets

* Rename _config/variable_details to _config/extra_facets

* Fix doc re. lack of 'output_file as a dict', and choice of native6

* Fix codacy diags in ipsl_cm6.py

* Use project IPSLCM to handle IPSL-CM6

* Implement changes according to Bouwe's review, 2021/06/07 (except unit tests)

* Add unit tests for _fixes/ipslcm/ipsl_cm6.py

* delete esmvalcore/cmor/_fixes/native6/ipsl_cm6.py

* Delete old file esmvalcore/_config/extra_facets/native6-ipsl-cm6-mappings.yml

* Restore main versions for _recipe.py and cmor_fixes/fix.py

* Restore main version for _recipe.py

* Delete extraneous era5-mappings.yml

* Avoid using mapping_key when calling fix.get_cube_from_list()

* Empty change in fix.py for forcing codacy to re-scan it

* Polish doc

* Polish doc again

* Again...

* and again ...

* Fix typo in comment

* Fixes according to @zklaus review

* Reduce formatting changes

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/develop/fixing_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/quickstart/find_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Update doc/quickstart/find_data.rst

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

* Minor formatting improvements

* Organize mapping file in each realm in two sections (CMIP6 and IPSL)

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>
Co-authored-by: sloosvel <45196700+sloosvel@users.noreply.github.com>
Co-authored-by: Javier Vegas-Regidor <javier.vegas@bsc.es>
Co-authored-by: Benjamin Müller <b.mueller@iggf.geo.uni-muenchen.de>
Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl>
Co-authored-by: Stef Smeets <s.smeets@esciencecenter.nl>
Co-authored-by: Rémi Kazeroni <70641264+remi-kazeroni@users.noreply.github.com>
Co-authored-by: Valeriu Predoi <valeriu.predoi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preprocessor Related to the preprocessor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

regrid preprocessor regrids dataset though it has the same grid specification as target grid
5 participants