Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix units after derivation #754

Merged
merged 12 commits into from
Oct 9, 2020
Merged

Fix units after derivation #754

merged 12 commits into from
Oct 9, 2020

Conversation

schlunma
Copy link
Contributor

@schlunma schlunma commented Aug 19, 2020

Adds checks if the units after executing a derivation script are compatible to the CMOR table and converts the units if necessary. Since not all derivation scripts have unit tests, I do not know if all of these still work. However, the changes in this PR only raise an exception if the units are not convertible (e.g. m and kg) and in this case the tool should definitely not silently set the units to the excepted ones like it does right now.

Tasks

  • Create an issue to discuss what you are going to do, if you haven't done so already (and add the link at the bottom)
  • This pull request has a descriptive title that can be used in a changelog
  • Add unit tests
  • Public functions should have a numpy-style docstring so they appear properly in the API documentation. For all other functions a one line docstring is sufficient.
  • If writing a new/modified preprocessor function, please update the documentation
  • Circle/CI tests pass. Status can be seen below your pull request. If the tests are failing, click the link to find out why.
  • Codacy code quality checks pass. Status can be seen below your pull request. If there is an error, click the link to find out why. If you suspect Codacy may be wrong, please ask by commenting.
  • Please use yamllint to check that your YAML files do not contain mistakes
  • If you make backward incompatible changes to the recipe format, make a new pull request in the ESMValTool repository and add the link below

If you need help with any of the tasks above, please do not hesitate to ask by commenting in the issue or pull request.


Closes #750

@schlunma schlunma added bug Something isn't working variable derivation Related to variable derivation functions labels Aug 19, 2020
@schlunma schlunma self-assigned this Aug 19, 2020
@bouweandela
Copy link
Member

It would be good to test if the currently implemented derivation scripts actually work, instead of just adding some warnings and leaving the problems for our users. Maybe you could update the recipe examples/recipe_preprocessor_derive_test.yml so it includes all currently implemented derivation functions and then we can at least test it for for both a CMIP5 and a CMIP6 dataset.

@schlunma
Copy link
Contributor Author

Good idea! I totally forgot that this recipe existed. I will update it and test all variables for CMIP5 and CMIP6 models.

@schlunma
Copy link
Contributor Author

schlunma commented Aug 21, 2020

I updated the recipe in ESMValGroup/ESMValTool#1790 with the new derived variables and added CMIP6 models.

I had to fix several issues in the derivation scripts (but only one is related to this PR):

  • Compared to CMIP5, some standard_names of the required variables changed in CMIP6. Since some derivation scripts used this as a constraint for extracting cubes, these failed in CMIP5. I replaced these constraints with var_name_constraint.
  • In other cases, the short_names of the required variables changed from CMIP5 to CMIP6. Since the automatic conversion does not work right now (see Improve variable alias managament #595), I adapted the get_required function of the affected variables.
  • For toz, the desired units are Dobson units. Those can be either 0.4462 mmol m-2 or 2.687e20 m-2. Since cf_units uses the first definition and we used the latter, the derivation of toz in this PR raised a ValueError due to incompatible conversions. I fixed this in the derivation script.
  • For ohc I could not test CMIP6 data due to the lack of datasets that provide volcello and thetao.
  • For amoc, the required variables is msftmyz which does not exist in CMIP6. Since I do not know the CMIP6 equivalent, I skipped this for CMIP6. There is already some discussion about this in Problems with msftyz data #359, maybe @ledm or @zklaus can comment on this.

Please note that I only tested if the test recipe ran successful. I did not check if the resulting data is scientifically correct.

@bouweandela
Copy link
Member

I updated the recipe in ESMValGroup/ESMValTool#1790 with the new derived variables and added CMIP6 models.

Great!

For ohc I could not test CMIP6 data due to the lack of datasets that provide volcello and thetao.

A synda search for volcello and thetao in CMIP6 gives me quite a few results:

synda search project=CMIP6 variable_id=thetao

For amoc, the required variables is msftmyz which does not exist in CMIP5.

The variable is defined in the CMIP5 Omon table. A synda search for msftmyz in CMIP5 also gives some results:

synda search project=CMIP5 variable=msftmyz

@schlunma
Copy link
Contributor Author

A synda search for volcello and thetao in CMIP6 gives me quite a few results:

synda search project=CMIP6 variable_id=thetao

I found NorCPM1 (that was the only one with volcello for the historical run) and added it to the recipe.

For amoc, the required variables is msftmyz which does not exist in CMIP5.

The variable is defined in the CMIP5 Omon table. A synda search for msftmyz in CMIP5 also gives some results:

synda search project=CMIP5 variable=msftmyz

Sorry, I meant CMIP6. CMIP5 is already included in the recipe.

Copy link
Member

@bouweandela bouweandela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a small suggestion for writing the tests in a nicer way. Only if you have time.

@bouweandela
Copy link
Member

@axel-lauer Do you think it would be possible to find someone to do a scientific review of this pull request? That would require running the recipe in ESMValGroup/ESMValTool#1790 and having a look at the results.

It would be great if this could still be included in v2.1, but the deadline is already this Monday, so maybe it will have to wait for the next release.

@schlunma
Copy link
Contributor Author

schlunma commented Oct 2, 2020

Change the derivation of vegfrac and ohc accoring to this comment, but did not test though.

@bouweandela
Copy link
Member

@axel-lauer Would you be able to do a final test of the changes you proposed, so we can still include this in v2.1? We're doing the feature freeze today, but since this is a bugfix, we could still include it if it gets approved soon.

@axel-lauer
Copy link
Contributor

I'll test the changes and report back soon...

@axel-lauer
Copy link
Contributor

There were two corrections introduced: och and vegfrac. The changes to ohc look good, the values for CMIP6 models are now in the expected range. vegfrac still looks weird (e.g. vegfrac=100% in Antarctica). This is not related to this PR or the corrections introduced but I think it would be good to get this fixed now.

vegfrac is currently calculated as 100.0 - baresoilfrac (in %)

The CMOR tables (CMIP5) say:
baresoilFrac = fraction of entire grid cell that is covered by bare soil.
residualFrac = Fraction of Grid Cell that is Land but Neither Vegetation-Covered nor Bare Soil

From this, I would presume vegfrac should actually be calculated as follows:

vegfrac = 100.0 - baresoilfrac - residualfrac

@ledm , I think you introduced vegfrac. Can you confirm that this would be correct? Please let us know what you think s soon as possible.

@axel-lauer
Copy link
Contributor

@ledm do you hav an opinion on correcting vegfrac? If not, I propose to change the calculation of vegfrac to what I think is correct:

vegfrac = 100.0 - baresoilfrac - residualfrac

@schlunma could you implement this so we can get this merged?

@schlunma
Copy link
Contributor Author

schlunma commented Oct 8, 2020

Fixed vegfrac (I also adapted the recipe in ESMValGroup/ESMValTool#1790). The output looks better now, but there are weird values at the coast of Antarctica and Greenland:

vegfrac

While testing this I encountered an issue with the CMOR checker. residualFrac has the coordinate typeresidual which has no standard name in the CMOR table, i.e. ''. However, since iris gives None for an empty standard name, the checker tool fails with

type: standard_name should be , not None

I added a fix for this:

                 try:
                     cube_coord = self._cube.coord(var_name=coordinate.out_name)
-                    if cube_coord.standard_name != coordinate.standard_name:
+                    if (cube_coord.standard_name is None and
+                            coordinate.standard_name == ''):
+                        pass
+                    elif cube_coord.standard_name != coordinate.standard_name:
                         self.report_critical(
                             self._attr_msg,
                             coordinate.out_name,

This isn't really nice but does the job. The optimal solution here would be when iris.cube.Cube.standard_name == '' evaluates True when the standard name is not set (i.e. is None).

@bouweandela
Copy link
Member

This isn't really nice but does the job. The optimal solution here would be when iris.cube.Cube.standard_name == '' evaluates True when the standard name is not set (i.e. is None).

I think the best solution would be to change the way we read the cmor tables so we read this in as None instead of as '', but let's do that for the next release.

@axel-lauer Would you be able to test the latest changes?

@axel-lauer
Copy link
Contributor

I tested the latest changes and it works. As @schlunma already noticed, however, there are spurious values along the coast lines. I presume these are caused by us assuming a land fraction of 100% for all land grid cells, i.e.:

vegfrac = 100% - baresoil - residualfrac

I think to get rid of those spurious coastal values, we would have to use the "real" land fraction and calculate vegfrac as:

vegfrac = landfrac - baresoil - residualfrac

Here is the "corrected" example from @schlunma. This now looks fine to me.

vegfrac

As "landfrac" I used the atmospheric fx file "sftlf". I guess to generalize this, a check would be needed if baresoil / residualfrac and sftlf are on the same grid.

@axel-lauer
Copy link
Contributor

Thanks @schlunma! I tested this and it works. I would leave it like this for now. Let's get this merged then!

@bouweandela bouweandela merged commit 04c029d into master Oct 9, 2020
@bouweandela bouweandela deleted the fix_units_after_derivation branch October 9, 2020 14:25
valeriupredoi pushed a commit that referenced this pull request Oct 9, 2020
* Added check for units at the end of derivation function

* Fixed tests for variable derivation interface

* Adapted several derivation scripts

* Adapted sm derivation script for CMIP6

* Fixed derivation of vegfrac and ohc

* Fixed CMOR checker when coordinate standard_name is empty
@valeriupredoi
Copy link
Contributor

good stuff, guys! we have just cherry-picked this PR in the 2.1 release 🍺

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working variable derivation Related to variable derivation functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Wrong units after derivation of variable
4 participants