Problems with msftyz data #359

Open
zklaus opened this issue Nov 6, 2019 · 4 comments
@zklaus

zklaus commented Nov 6, 2019

In #310 @ledm wrote:

I've been looking at what data is available and I've seen that the following variables have CMIP6 model data in the historical r1* ensemble member:

* `msftmyz` : GISS-E2-1-G, GISS-E2-1-G

* `msftmz` : CanESM5, CESM2, CESM2-WACCM, SAM0-UNICON

* `msftyz` : CNRM-CM6-1, CNRM-CM6-1, CNRM-ESM2-1, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, EC-Earth3-Veg, IPSL-CM6A-LR, HadGEM3-GC31-LL, UKESM1-0-LL

and later

I'm finding a whole series of problems with CMIP6 model data for msftyz. No single CMIP6 dataset can be read without fixes, and some of the changes required are rather extensive.

I had a look at the available data on mistral for msftyz. Removing duplicates from the above list we have

msftyz

  • CNRM-CM6-1
    Loads fine in iris
  • CNRM-ESM2-1
    Loads fine in iris
  • EC-Earth3-Veg
    Loads fine in iris
  • IPSL-CM6A-LR
    Uses an old version of the data request (see Deal with different data_specs_versions in CMIP6 #159)
  • HadGEM3-GC31-LL
    Loads fine in iris
  • UKESM1-0-LL
    Loads fine in iris

Before investigating further, @ledm, what are the problems you are encountering?

@ledm
Contributor

ledm commented Nov 6, 2019

As of right now, I have written fixes for all the CMIP6 models. They are available here:
https://github.com/ESMValGroup/ESMValCore/compare/development_amoc_cmip6?expand=1

The fixes are:

  • CNRM models:

    • uses latitude instead of grid_latitude
    • latitude points were integers from 0 to 291 instead of floats between -90 and 90.
    • set region coord var_name to basin.
    • depth coordinate called Vertical W levels instead of depth.
  • EC earth models:

    • set region coord var_name to basin.
  • GISS models:

    • Firstly, they produced msftmyz instead of msftyz. My solution was to create symbolic links with the correct file name. This is not a real solution for a distributed dataset.
    • With that in mind, I changed the cube's standard name, long name and variable_id.
    • This model uses latitude instead of grid_latitude
    • set region coord var_name to basin.
  • HadGEM3 and UKESM:

    • set region coord var_name to basin.
  • IPSL:

    • msftyz has a dimension sized 1, so we need to use iris.util.squeeze to fix it and remove the longitude dimension.
    • depth coordinate called Vertical W levels instead of depth.
    • deleted the entire basin dimension, as there are many things wrong there.
    • create a new basin dimension and add it in.

@zklaus
Author

zklaus commented Nov 7, 2019

I know this is rather complicated, but we have to take into account the official CMIP6 Model Output Requirements.

There we find (Requirements for coordinate variables, note 12; page 7)

A few variables are functions of simple index dimensions, recorded in files as “basin”, “line”, “type”, “landuse”, or “soilpools” (these are the altLabels or out_names) which require definition of an auxiliary coordinate variable named “sector” that is pointed to by a “coordinates” attribute attached to the variable. The “sector” auxiliary coordinate will be netCDF type NC_CHAR and will formally be two-dimensional with its second dimension’s size (“strlen”) set to the maximum string length. For each of these auxiliary coordinate variables, a long_name attribute and, when defined, the standard_name attribute should be stored as indicated by “title” and the “standardName” in the data request’s grids section and by the “long_name” and “standard_name” in the CMIP6_coordinate.json file. The values and order of the labels stored in these auxiliary variables should be consistent with the lists specified by “requested” in the data request’s grids section and in the CMIP6_coordinate.json file.
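
To make the layout described in that note a bit more concrete, here is a minimal plain-Python sketch of how a list of labels maps onto a two-dimensional character array whose second dimension is "strlen". The helper names and the label list are assumptions for illustration (only atlantic_arctic_ocean appears in this thread), and real files would hold this as an NC_CHAR variable rather than nested lists.

```python
# Hypothetical label list; only "atlantic_arctic_ocean" is taken from this thread.
labels = ["atlantic_arctic_ocean", "indian_pacific_ocean", "global_ocean"]


def encode_sector(labels):
    """Pack labels into rows of single bytes, null-padded to a common strlen."""
    strlen = max(len(label) for label in labels)
    return [
        [bytes([b]) for b in label.encode("ascii").ljust(strlen, b"\x00")]
        for label in labels
    ]


def decode_sector(char_array):
    """Join each row of single bytes and strip the padding again."""
    return [
        b"".join(row).rstrip(b"\x00 ").decode("ascii") for row in char_array
    ]


sector = encode_sector(labels)   # shape: (basin, strlen)
strlen = len(sector[0])          # maximum string length
decoded = decode_sector(sector)  # back to the original labels
```

The point is only that the labels live in an auxiliary variable of shape (basin, strlen), while the basin dimension itself stays a plain index.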

Related is also (Requirements for output variables, note 11; page 11); however, (Requirements for output variables, note 13.f; page 13) talks about (Ofx, basin), i.e. the variable, not the dimension basin.

What the data should look like is illustrated by (Example 4, page 25).

So we see that both EC-Earth3 and the UK models are correct. Your fixes should not be applied: a variable with the same name as a dimension is generally a (dimension) coordinate variable, but that cannot be the case here because dimension coordinates must be numeric and monotonic.

The Good News

It's actually very simple to work with this data. Just do a

c.extract(iris.Constraint(region='atlantic_arctic_ocean'))

to get at it.

I will look at the other model families now.

@zklaus
Author

zklaus commented Nov 7, 2019

The other models do seem to need some fixes, but they are too involved for me to sort out right now.
Just some remarks:

General

We have to keep the concepts of var_name (such as lev), long_name (such as Vertical W levels), and standard_name (such as depth) clearly apart. Then there are CMIP6 dimensions, which are not to be confused with CF/netCDF dimensions. CMIP6 dimensions link variables with coordinates, but this link is not always unique.
The pertinent example for this is olevel which refers to any form of ocean levels and looking at CMIP6_coordinate.json we find that there are actually 3 completely different ocean coordinates defined that can stand in for olevel.

In your fix you write

basin = cube.coord('Vertical W levels')
basin.var_name = 'depth'

but this does not rename the coordinate from "Vertical W levels" to "depth"; it assigns a new var_name, overwriting the previous one.
Whether this is the correct var_name is model specific (see below), but it has little to do with the long_name.

Generally speaking, olevel coordinates have an out_name (that's what the var_name is called in the CMIP tables) of lev.

So how do we get the coordinate we want in the most general way?
Use

cube.coord(axis='Z')

This will apply all the (rather involved) logic laid out in the CF conventions to identify the vertical coordinate.

CNRM

latitude coordinate

You are quite right in your approach to fix it. But I wonder: how did you come up with the new latitudes? Considering that the full grid (cf. e.g. (Omon, thetao)) has the same number of lats (294) and those go from -80 to 90, it seems a bit surprising that the same latitudes run from -80 to 70.
Also, don't forget the bounds.

vertical coordinate

I think the original var_name of lev is correct and should not be changed.
It seems the most appropriate CMIP6 dimension is depth_coord. Looking at the NEMO manual, W levels suggest olevhalf, with depth_coord_half as an alternative. But really, this is something for the modellers to mull over.

GISS

Let's leave it for now.

IPSL

squeezing is good

vertical coordinate

What I said above about CNRM largely applies here as well. IPSL is slightly worse, though, with the wrong var_name of olevel, so maybe change that to lev.

basin coordinate

Your idea is correct, I believe. However, the result should be the same as what we already find in EC-Earth and the UK models.

@mattiarighi mattiarighi added the cmor Related to the CMOR standard label Jan 7, 2020
@ledm
Contributor

ledm commented Jul 15, 2020

I'd like to revive this issue. I've encountered further trouble with the derived variable amoc with CMIP6. I suspect that the fixes in https://github.com/ESMValGroup/ESMValCore/compare/development_amoc_cmip6?expand=1 may be outdated.
