Streamline observations download #1657

Merged
merged 222 commits into from
Feb 25, 2022
222 commits
2975dcb
Update docker and create action
Aug 20, 2020
aee817c
Remove Julia
Aug 20, 2020
336eb1f
Update dockerfiles
Aug 24, 2020
812d63d
Update dockerfile
Aug 24, 2020
13c54c0
First refactor of download scripts
May 12, 2020
c133be8
Add support for ESACCI data through ftp
May 12, 2020
8194493
Add progress bar to ftp and esacci_fire
May 12, 2020
11e68b4
Fix fire and improve ftp progress bar
May 13, 2020
39dd435
Added wget download support
May 13, 2020
ff6fb40
More work on NASA datasets
May 13, 2020
c67c923
Fix flake8 and reorganize
May 13, 2020
5c1a814
Add berkeley earth downloader
May 13, 2020
554fcc1
Added calipso
May 13, 2020
103c1a5
Add ESACCI-OC download
May 13, 2020
bb49b1c
Add ncep and common base class
May 13, 2020
a1cda97
Add progressbar2 dependency
May 13, 2020
e7f5043
Fix flake8 tests
May 14, 2020
c46a388
Add data command to ESMValTool
May 14, 2020
32ce996
Reorganize and rename formatters
May 14, 2020
0fbe69f
Fix tests
May 14, 2020
12db72d
Fix tests
May 15, 2020
28ed187
Refactor download and format
May 26, 2020
76261e4
Add install and fix some scripts
May 27, 2020
1b1263f
Fix BerkeleyEarth
Aug 18, 2020
acb5caf
Add APHRO-MA downloader
Aug 19, 2020
7cb30c5
Fix CALIPSO-GOCCP
Aug 19, 2020
ca61c59
Add AURA-TES downloader
Aug 19, 2020
43d6fd0
Updated CDS datasets
Aug 20, 2020
cdef8ff
Update CowtanWay
Aug 31, 2020
e551616
Add CRU downloads
Aug 31, 2020
d9833e1
Add ESACCI-Aerosol
Aug 31, 2020
c9bcefd
Fix get_year
Aug 31, 2020
0d9712d
Fix ESACCI-Cloud
Aug 31, 2020
165c857
Add Duveiller2018
Aug 31, 2020
cbb2115
Add CT2019
Aug 31, 2020
83d9805
Add Eppley-VGPM-MODIS
Aug 31, 2020
67bfa07
Fix flake8
Aug 31, 2020
6da93c4
Add lots of datasets
Sep 1, 2020
ea4440b
Add HALOE
Sep 1, 2020
c24ea1d
Add HadCRUT and HadISST
Sep 1, 2020
2074213
Add GPCC
Sep 1, 2020
1141977
Add ISCCP-FH
Sep 1, 2020
d204c24
Add Landschuetzer2016
Sep 1, 2020
50a6e6c
Add LandFlux-EVAL
Sep 1, 2020
c00e3a3
Fix Flake8
Sep 1, 2020
2857de3
Add PERSIANN
Sep 1, 2020
fa0c45c
Add PHC
Sep 1, 2020
d6f156a
Add WOA
Sep 1, 2020
d00f174
Add REGEN
Sep 1, 2020
e295666
Fix flake8
Sep 4, 2020
7cac5e2
Move and reorganize
Sep 14, 2020
729368c
Added info for a bunch of new datasets
Sep 14, 2020
8a8da80
Finished added info for all datasets
Sep 15, 2020
6f07581
Fix Flake8
Sep 15, 2020
98ff5b2
Improve and fix tests
Sep 15, 2020
6eda8c7
Fix a bunch of codacy issues
Sep 15, 2020
5c49fdb
Fix codacy issues
Sep 16, 2020
c6e19b3
Improve doc
Sep 16, 2020
62b6572
Improve doc
Sep 17, 2020
b20fbf5
Improve doc
Sep 17, 2020
e0a3cec
Merge remote-tracking branch 'origin/master' into refactor_downloads
Sep 18, 2020
825eaec
Add ESACCI-SOS
Sep 21, 2020
806135a
Make ESACCI_SOS use OBS6
Sep 29, 2020
ab31ade
Fix data info
Oct 19, 2020
c01b4bd
Fix paths
Oct 19, 2020
c24b6f2
Fix utilities
Oct 19, 2020
8f158d6
Merge branch 'master' of https://github.com/ESMValGroup/ESMValTool in…
Nov 25, 2020
84a0255
Make utils public and add info for last dataset
Nov 25, 2020
d88506a
Fix flake8
Nov 25, 2020
5b111cb
Fix esrl cmor script
Nov 25, 2020
4958bbf
Test datasets info
Nov 25, 2020
35ebbf1
Pass pre-commit hooks
Nov 25, 2020
31d8fc5
Pre-commit datasets
Nov 25, 2020
78e57a3
Pre commit formatters passed
Nov 25, 2020
5907fb8
Merge branch 'master' of https://github.com/ESMValGroup/ESMValTool in…
Jan 19, 2021
467ed49
Remove unwanted changes
Jan 19, 2021
48594b1
Add latest changes
Jan 19, 2021
efffda8
Remove unwanted file
Jan 19, 2021
882074b
Fix some tests
Jan 19, 2021
1fa5dd3
Merge branch 'main' of https://github.com/ESMValGroup/ESMValTool into…
Aug 4, 2021
0b97bca
Format files
Aug 4, 2021
65a94fe
Fix some tests
Aug 4, 2021
997d373
Fix CDS-SATELLITE-ALBEDO
Aug 5, 2021
ff04320
Fix tests
Aug 5, 2021
9c58535
Fix format
Sep 2, 2021
974d240
Add source to displayed info
Sep 2, 2021
b3dfee1
Fix overwrite with wget
Sep 2, 2021
38acdd2
Hide some name errors
Sep 2, 2021
8ddfa55
Fix ESACCI Aerosol download
Sep 23, 2021
ae085d7
Merge branch 'main' of https://github.com/ESMValGroup/ESMValTool into…
Sep 23, 2021
b8a7dcc
Make dataset list mandatory
Sep 28, 2021
53bd898
Merge branch 'main' of https://github.com/ESMValGroup/ESMValTool into…
Nov 29, 2021
246cfba
Fix GLODAP and flake8
Nov 29, 2021
60899b3
Fix GCP
Nov 29, 2021
9c37eb7
Add HadCRUT5 downloader
Nov 29, 2021
6d8697d
Add last datasets to the datasets.yml
Nov 29, 2021
fb5444c
Fix path for tests
Nov 29, 2021
7857d80
Fix WOA
Nov 29, 2021
fa17d7b
First bunch of fixes for the tier issue
Nov 29, 2021
9803179
Fix NDP and other datasets
Nov 30, 2021
1a220a2
Fix OSI-450 cmorization and add downloader
Nov 30, 2021
7696958
Fix MTE
Nov 30, 2021
32068e0
Fix cmor interface
Nov 30, 2021
b04802b
Fix schema
Nov 30, 2021
ba97bef
Fix prepare
Nov 30, 2021
8bb1271
Fix doc
Nov 30, 2021
03c8de4
Add tests for downloader interfaces
Nov 30, 2021
2749033
Fix data download interface test
Dec 1, 2021
c1f899a
Fix default start and end dates in downloaders
Dec 1, 2021
645e557
Fix PATMOS-x and PERSIANN-CDR downloads
Dec 1, 2021
726f55d
Fix some review comments
Dec 1, 2021
9ff1881
Fix some review comments
Dec 1, 2021
4922839
Fixes for ESACCI-OZONE
Dec 2, 2021
5cb1fd8
Fixes interface calls for NCL
Dec 2, 2021
165e6ae
Removed oudated and not working downloaders
Dec 2, 2021
47d91bf
Fix temp folder name according to command called
Dec 2, 2021
a105880
FIx NDP
Dec 6, 2021
88866d9
Fix NDP doc
Dec 6, 2021
ca5630b
Merge branch 'main' of https://github.com/ESMValGroup/ESMValTool into…
Dec 6, 2021
9828c31
Fix hardcode tier in NASA downloader
Dec 6, 2021
6939c75
Update esmvaltool/cmorizers/data/formatters/datasets/cds_satellite_so…
Dec 7, 2021
1274e86
Apply comments from Remi review
Dec 7, 2021
df88fa3
Do not gues_bounds if they are already there
Dec 7, 2021
131ccf2
Fix NDP
Dec 7, 2021
44d2937
Fix CDS Soil Moisture
Dec 7, 2021
501cd88
Set default file_pattern for CDS
Dec 7, 2021
1691332
Merge branch 'main' of https://github.com/ESMValGroup/ESMValTool into…
Dec 7, 2021
ce01a65
Fix ESACCI-SOS
Dec 7, 2021
15f924c
Fix CDS-XCH4
Dec 7, 2021
baf22d5
Fix UERRA
Dec 7, 2021
3617d02
Address Remi comments
Dec 7, 2021
b5383cf
Explicit imports
Dec 7, 2021
d2278bb
Skip certificate check in NASA datasets
Dec 7, 2021
1135e83
Fix ESACCI-OZONE download
Dec 7, 2021
5556e12
Fix ESACCI-OZONE coordinate
Dec 7, 2021
668eb9f
changed cmorizer paths in recipe docs
Dec 8, 2021
2ec941b
writing cmorizer instructions updated
Dec 9, 2021
ad35405
Update dataset.rst
remi-kazeroni Dec 9, 2021
c9a5dbd
obtaining input doc updated
Dec 9, 2021
fed8c5d
making dataset doc updated
Dec 9, 2021
8fe867f
Merge branch 'refactor_downloads' of https://github.com/ESMValGroup/E…
Dec 10, 2021
feb1613
Update dataset.rst
remi-kazeroni Dec 10, 2021
63a8097
Update dataset.rst
remi-kazeroni Dec 10, 2021
fea8ab1
Update input.rst
remi-kazeroni Dec 10, 2021
37b51d3
Update dataset.rst
remi-kazeroni Dec 10, 2021
f4b7d5f
Fix Remi comments
Dec 10, 2021
0631027
Update esmvaltool/cmorizers/data/downloaders/datasets/isccp_fh.py
Dec 10, 2021
7259d67
Update esmvaltool/cmorizers/data/datasets.yml
Dec 10, 2021
6b16ec6
Update esmvaltool/cmorizers/data/formatters/datasets/era_interim.py
Dec 10, 2021
6c406ad
Fix utilities.ncl
Dec 10, 2021
5bafa5d
Add download instructions
Dec 10, 2021
26d8d0e
Update doc/sphinx/source/community/dataset.rst
remi-kazeroni Dec 10, 2021
5dd8e32
Apply suggestions from Javi
remi-kazeroni Dec 10, 2021
4a42db7
Update dataset.rst
remi-kazeroni Dec 10, 2021
81c7102
Update dataset.rst
remi-kazeroni Dec 10, 2021
38a52b1
Update input.rst
remi-kazeroni Dec 10, 2021
e3769de
headers fixed
Dec 15, 2021
984ada3
typo fixed
Dec 16, 2021
b16eb76
function doc updated
Dec 16, 2021
c7ffb73
Update cds_satellite_albedo.py
remi-kazeroni Dec 16, 2021
2b4a37d
esacci-lst cmorizer added
Jan 26, 2022
25eef89
Merge branch 'main' into refactor_downloads
Jan 26, 2022
25c48d6
fix flake8
Jan 26, 2022
92a66e3
fix woa formatter
Jan 27, 2022
e7f9255
decrease yamllint errors
Jan 27, 2022
835b11a
delete unused file
Jan 27, 2022
8b63a79
Merge branch 'main' into refactor_downloads
Jan 28, 2022
c5dbf38
update esacci-oc downloader
Jan 28, 2022
39ff992
cleanup
Jan 28, 2022
3fee425
note on netrc usage
Jan 31, 2022
74280db
Update input.rst
remi-kazeroni Jan 31, 2022
febb188
docstring fixed
Feb 1, 2022
0a1c30a
codespell issues fixed
Feb 2, 2022
5441a85
remove unused args
Feb 2, 2022
82f5f0f
fix formatting issues
Feb 2, 2022
bb52bb0
docsring fixed
Feb 4, 2022
74a21eb
formatting fixed
Feb 4, 2022
2932100
fix some pylint
Feb 4, 2022
af437cb
fix docstring
Feb 4, 2022
90fe442
Improve some formatting
Feb 4, 2022
b2acb8b
fix f-string in commons
Feb 4, 2022
03cf0b7
Add explicit parameter names
Feb 4, 2022
7e3c632
Remove local disables for invalid name
Feb 4, 2022
df9d535
Add explicit parameter names
Feb 4, 2022
1163bbf
Remove local disables for invalid-name
Feb 4, 2022
42da3f5
improve formatting in formatter scripts
Feb 4, 2022
02c2087
Merge branch 'refactor_downloads' of https://github.com/ESMValGroup/E…
Feb 4, 2022
017b061
fix import
Feb 7, 2022
d0307d7
Disable some pylint messages to be able to check for more relevant ones
Feb 8, 2022
488de84
Disable pylint in prospector since it is run on its own
Feb 8, 2022
4c10edc
Simplify formatting of pylint config (no change in config)
Feb 8, 2022
a8b7eea
Move pylint config from pyproject.toml to .pylintrc
Feb 8, 2022
c709dc0
Disable duplicate-code which detects a lot of similar lines
Feb 8, 2022
1aaf8cf
Simplify prospector config formatting
Feb 8, 2022
92b6758
Align prospector check with yapf formatting
Feb 8, 2022
d5a8a25
Simple fixes
Feb 8, 2022
2f722a8
Fix variable names in eppley cmorizer
Feb 8, 2022
69efc96
Fix variable names in esacci_oc cmorizer
Feb 8, 2022
f63e12b
Merge branch 'main' into refactor_downloads
Feb 8, 2022
ce96e89
change default start date
Feb 9, 2022
43d2a5f
fix download dir structure
Feb 9, 2022
6c4e47f
remove file cleaning from cmorizers
Feb 9, 2022
6ed7c2e
fix flake8
Feb 9, 2022
4b75630
fix some codacy issues
Feb 9, 2022
7e38dde
Fixed Scripps-CO2-KUM for new version
schlunma Feb 9, 2022
6e88021
reduce codacy issues
Feb 10, 2022
fb9574e
fix flake8 in phc
Feb 10, 2022
631a2e6
Update cds_uerra.py
remi-kazeroni Feb 10, 2022
da3fbf7
fix config in formatter class
Feb 11, 2022
6e2f3f9
fix cube loading
Feb 17, 2022
9c76899
Merge branch 'main' into refactor_downloads
Feb 17, 2022
b80684a
parse date improvements
Feb 17, 2022
58dd382
fix codacy issues
Feb 18, 2022
50b20ed
fix codacy esacci_watervapour
Feb 18, 2022
16c9645
Merge branch 'main' into refactor_downloads
Feb 18, 2022
6db278d
change prepare behaviour
Feb 22, 2022
acacebb
Update esmvaltool/cmorizers/data/formatters/datasets/esacci_sst.py
remi-kazeroni Feb 24, 2022
c6d715a
fix and align exceptions
Feb 24, 2022
7e8b14c
warning message added
Feb 24, 2022
1390702
exception for unsupported datetime formats addedd
Feb 25, 2022
1a2d1a7
revert pylint and prospector changes
Feb 25, 2022
3dd2e1c
fix exception
Feb 25, 2022

39 changes: 33 additions & 6 deletions doc/sphinx/source/community/dataset.rst
@@ -15,13 +15,15 @@ Dataset documentation
The documentation required for a CMORizer script is the following:

- Make sure that the new dataset is added to the list of
:ref:`supported_datasets`
:ref:`supported_datasets` and to the file datasets.yml_.
- The in code documentation should contain clear instructions on how to obtain
the data
the data.
- A BibTeX file named ``<dataset>.bibtex`` defining the reference for the new
dataset should be placed in the directory ``esmvaltool/references/``, see
:ref:`adding_references` for detailed instructions.

.. _datasets.yml: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/datasets.yml

For more general information on writing documentation, see :ref:`documentation`.

.. _dataset-test:
@@ -37,7 +39,9 @@ To test a pull request for a new CMORizer script:

#. Download the data following the instructions included in the script and place
it in the ``RAWOBS`` path specified in your ``config-user.yml``
#. Run the CMORizer script by running ``cmorize_obs -c <config-file> -o <dataset>``
#. If available, use the downloading script by running
``esmvaltool data download --config_file <config-file> <dataset>``
#. Run the cmorization by running ``esmvaltool data format <config-file> <dataset>``
#. Copy the resulting data to the ``OBS`` (for CMIP5 compliant data) or ``OBS6``
(for CMIP6 compliant data) path specified in your
``config-user.yml``
@@ -74,6 +78,8 @@ Dataset description
Check that the new dataset has been added to the table of observations defined in
the ESMValTool user’s guide in section :ref:`inputdata`
(generated from ``doc/sphinx/source/input.rst``).
Check that the new dataset has also been added to the file `datasets.yml
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/datasets.yml>`__.

BibTeX info file
----------------
@@ -87,11 +93,24 @@ recipe_check_obs.yml
Check that the new dataset has been added to the testing recipe
``esmvaltool/recipes/examples/recipe_check_obs.yml``

Downloader script
-----------------

If present, check that the new downloader script
``esmvaltool/cmorizers/data/downloaders/datasets/<dataset>.py``
meets standards.
This includes the following items:

* Code quality checks

1. Code quality
2. No Codacy errors reported

CMORizer script
---------------

Check that the new CMORizer script
``esmvaltool/cmorizers/obs/cmorize_obs_<dataset>.{py,ncl}``
``esmvaltool/cmorizers/data/formatters/datasets/<dataset>.{py,ncl}``
meets standards.
This includes the following items:

@@ -110,13 +129,21 @@ Config file
-----------

If present, check config file ``<dataset>.yml`` in
``esmvaltool/cmorizers/obs/cmor_config/`` for correctness.
``esmvaltool/cmorizers/data/cmor_config/`` for correctness.
Use ``yamllint`` to check for syntax errors and common mistakes.

Run downloader script
---------------------

If available, make sure the downloader script is working by running
``esmvaltool data download --config_file <config-file> <dataset>``


Run CMORizer
------------

Make sure CMORizer is working by running ``cmorize_obs -c <config-file> -o <dataset>``
Make sure CMORizer is working by running
``esmvaltool data format --config_file <config-file> <dataset>``

Check output of CMORizer
------------------------
124 changes: 91 additions & 33 deletions doc/sphinx/source/develop/dataset.rst
@@ -12,6 +12,7 @@ data set for the use in ESMValTool.
| `1. Check if your variable is CMOR standard`_
| `2. Edit your configuration file`_
| `3. Store your dataset in the right place`_
| `3.1 Downloader script (optional)`_
| `4. Create a cmorizer for the dataset`_
| `4.1 Cmorizer script written in python`_
| `4.2 Cmorizer script written in NCL`_
@@ -75,14 +76,62 @@ for downloading (e.g. providing contact information, licence agreements)
and using the observations. The unformatted (raw) observations
should then be stored in the appropriate one of these three folders.

For each additional dataset, an entry needs to be made to the file
`datasets.yml
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/datasets.yml>`_.
The dataset entry should contain:

- the correct ``tier`` information;
- the ``source`` of the raw data;
- the ``last_access`` date;
- the ``info`` that explains how to download the data.
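As an illustration, the four required fields can be checked with a short Python sketch. It assumes that one entry of ``datasets.yml`` has already been loaded (for example with a YAML parser) into a plain dictionary keyed by field name; the helper ``missing_fields`` and the example values are hypothetical and not part of ESMValTool.

```python
"""Hypothetical check that a datasets.yml entry has the required fields."""

# Fields listed above as mandatory for every dataset entry.
REQUIRED_FIELDS = ("tier", "source", "last_access", "info")


def missing_fields(entry):
    """Return the required fields that are absent or empty in ``entry``."""
    return [field for field in REQUIRED_FIELDS
            if entry.get(field) in (None, "")]


# Example entry; the values are placeholders, not real dataset metadata.
example_entry = {
    "tier": 3,
    "source": "https://example.org/raw-data/",
    "last_access": "2021-12-07",
    "info": "Download the monthly files and place them in RAWOBS.",
}

if __name__ == "__main__":
    print(missing_fields(example_entry))  # prints []
    print(missing_fields({"tier": 2}))  # prints ['source', 'last_access', 'info']
```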

Note that these fields should be identical to the content of the header
of the cmorizing script (see Section `4. Create a cmorizer for the dataset`_).

Review comment (contributor, PR author): Do we need to keep that info in the header of the cmorizer script?

Reply (contributor): That is a good question. I don't have a strong opinion. Maybe having the info in one place (datasets.yml) is enough. But that needs a bit of extra work to finish the PR.

3.1 Downloader script (optional)
--------------------------------

A Python script can be written to download raw observations
from source and store the data in the appropriate tier subdirectory of the
folder ``RAWOBS`` automatically.
There are many downloading scripts available in
`/esmvaltool/cmorizers/data/downloaders/datasets/
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/downloaders/datasets/>`_
where several data download mechanisms are provided:

- A ``wget``-based downloader for HTTP(S) downloads, with a variant specific to NASA datasets.
- An FTP downloader, with a variant specific to ESACCI datasets available from CEDA.
- A Climate Data Store downloader based on ``cdsapi``.

Note that the name of this downloading script has to be identical to the
name of the dataset.
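The sketch below illustrates what a ``wget``-based download boils down to: choosing the target folder inside ``RAWOBS`` and assembling a ``wget`` call that saves files there. The ``RAWOBS/Tier<N>/<dataset>`` layout, the helper names and the example URL are assumptions for illustration; the actual downloader classes in ESMValTool may work differently. The command is only built here, not executed.

```python
"""Hypothetical sketch of a wget-based download into RAWOBS."""
from pathlib import Path


def raw_obs_dir(rawobs, tier, dataset):
    """Return the assumed raw-data folder RAWOBS/Tier<tier>/<dataset>."""
    return Path(rawobs) / f"Tier{tier}" / dataset


def wget_command(url, target_dir, skip_certificate_check=False):
    """Build (but do not run) a wget command storing ``url`` in ``target_dir``."""
    cmd = ["wget", "--no-verbose", f"--directory-prefix={target_dir}"]
    if skip_certificate_check:
        # Some servers (e.g. certain NASA hosts) may need this flag.
        cmd.append("--no-check-certificate")
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    target = raw_obs_dir("/data/RAWOBS", 3, "EXAMPLE-DATASET")
    print(wget_command("https://example.org/data/file_200301.nc", target))
    # To actually download, create the folder with
    # target.mkdir(parents=True, exist_ok=True) and pass the list to
    # subprocess.run(cmd, check=True).
```

Building the command as a list (rather than a single string) avoids shell quoting issues if it is later passed to ``subprocess.run``.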

Depending on the source server, the downloading script needs to contain paths to
raw observations, filename patterns and various necessary fields to retrieve
the data.
Default ``start_date`` and ``end_date`` can be provided in cases where raw data
are stored in daily, monthly, and yearly files.
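For monthly files, the role of ``start_date`` and ``end_date`` can be sketched as expanding a filename pattern over every month in the range. The pattern syntax (``{year}``, ``{month}``) and the function name are hypothetical; real downloaders define their own patterns.

```python
"""Hypothetical expansion of a monthly filename pattern over a date range."""
from datetime import datetime


def monthly_filenames(pattern, start_date, end_date):
    """Return one filename per month from start_date to end_date (inclusive)."""
    names = []
    year, month = start_date.year, start_date.month
    while (year, month) <= (end_date.year, end_date.month):
        names.append(pattern.format(year=year, month=month))
        # Advance one month, rolling over to January of the next year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


if __name__ == "__main__":
    files = monthly_filenames("example_{year}{month:02d}.nc",
                              datetime(2003, 11, 1), datetime(2004, 2, 1))
    print(files)
    # prints ['example_200311.nc', 'example_200312.nc',
    #         'example_200401.nc', 'example_200402.nc']
```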

The downloading script for the given dataset can be run with:

.. code-block:: console

esmvaltool data download --config_file <config-user.yml> <dataset-name>

.. note::
The options ``--start`` and ``--end`` can be added to the command above to
restrict the download of raw data to a time range. They will be ignored if a specific dataset
does not support it (e.g. because it is provided as a single file).

4. Create a cmorizer for the dataset
====================================

There are many cmorizing scripts available in `/esmvaltool/cmorizers/obs/
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/>`_
There are many cmorizing scripts available in
`/esmvaltool/cmorizers/data/formatters/datasets/
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/formatters/datasets/>`_
where solutions to many kinds of format issues with observational data are
addressed. Most of these scripts are written in NCL at the moment, but more
and more examples for Python-based cmorizing scripts become available.
addressed. These scripts are either written in Python or in NCL.

.. note::
NCL support will terminate soon, so new cmorizer scripts should preferably be
@@ -99,16 +148,17 @@ and one written in NCL, are explained in more detail.
-------------------------------------

Find here an example of a cmorizing script, written for the ``MTE`` dataset
that is available at the MPI for Biogeochemistry in Jena: `cmorize_obs_mte.py
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/cmorize_obs_mte.py>`_.
that is available at the MPI for Biogeochemistry in Jena: `mte.py
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/formatters/datasets/mte.py>`_.

All the necessary information about the dataset to write the filename
correctly, and which variable is of interest, is stored in a separate
configuration file: `MTE.yml
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/cmor_config/MTE.yml>`_
in the directory ``ESMValTool/esmvaltool/cmorizers/obs/cmor_config/``. Note
that the name of this configuration file has to be identical to the name of
your data set. It is recommended that you set ``project`` to ``OBS6`` in the
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/cmor_config/MTE.yml>`_
in the directory ``ESMValTool/esmvaltool/cmorizers/data/cmor_config/``. Note
that both the name of this configuration file and the cmorizing script have to be
identical to the name of your dataset.
It is recommended that you set ``project`` to ``OBS6`` in the
configuration file. That way, the variables defined in the CMIP6 CMOR table,
augmented with the custom variables described above, are available to your script.
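The naming rule can be sketched as follows. Judging from file names in this pull request (e.g. dataset ``ISCCP-FH`` with ``isccp_fh.py``, and ``MTE`` with ``MTE.yml`` and ``mte.py``), script names appear to be the lowercase dataset name with dashes replaced by underscores, while the configuration file keeps the dataset name. Treat this mapping as an assumption and the helper as hypothetical.

```python
"""Hypothetical mapping from a dataset name to its expected file names."""


def expected_files(dataset, language="py"):
    """Return the file names a new dataset is expected to use (assumed rule)."""
    # Observed convention: lowercase, dashes replaced by underscores.
    module = dataset.lower().replace("-", "_")
    files = {
        # formatters/datasets/<module>.py or .ncl
        "formatter": f"{module}.{language}",
        # downloaders/datasets/<module>.py (optional)
        "downloader": f"{module}.py",
    }
    if language == "py":
        # Python formatters read settings from cmor_config/<dataset>.yml,
        # as in the MTE example; NCL formatters define them in the script.
        files["config"] = f"{dataset}.yml"
    return files


if __name__ == "__main__":
    print(expected_files("MTE"))
    # prints {'formatter': 'mte.py', 'downloader': 'mte.py', 'config': 'MTE.yml'}
```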

@@ -124,28 +174,31 @@ If a single dataset has more than one reference,
it is possible to add tags as a list e.g. ``reference: ['tag1', 'tag2']``.
The third part in the configuration file defines the variables that are supposed to be cmorized.

The actual cmorizing script ``cmorize_obs_mte.py`` consists of a header with
The actual cmorizing script ``mte.py`` consists of a header with
information on where and how to download the data, and noting the last access
of the data webpage.

The main body of the CMORizer script must contain a function called

.. code-block:: python

def cmorization(in_dir, out_dir, cfg, config_user):
def cmorization(in_dir, out_dir, cfg, cfg_user, start_date, end_date):

with this exact call signature. Here, ``in_dir`` corresponds to the input
directory of the raw files, ``out_dir`` to the output directory of final
reformatted data set and ``cfg`` to the configuration dictionary given by
the ``.yml`` configuration file. The return value of this function is ignored. All
reformatted data set, ``cfg`` to the dataset-specific configuration file,
``cfg_user`` to the user configuration file, ``start_date`` to the start
of the period to format, and ``end_date`` to the end of the period to format.
If not needed, the last three arguments can be ignored using underscores.
The return value of this function is ignored. All
the work, i.e. loading of the raw files, processing them and saving the final
output, has to be performed inside its body. To simplify this process, ESMValTool
provides a set of predefined utilities.py_, which can be imported into your CMORizer
by

.. code-block:: python

from . import utilities as utils
from esmvaltool.cmorizers.data import utilities as utils

Apart from a function to easily save data, this module contains different kinds
of small fixes to the data attributes, coordinates, and metadata which are
@@ -157,16 +210,16 @@ that code style). For example, the function ``_get_filepath`` converts the raw
filepath to the correct one and the function ``_extract_variable`` extracts and
saves a single variable from the raw data.

.. _utilities.py: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/utilities.py
.. _utilities.py: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/utilities.py


4.2 Cmorizer script written in NCL
----------------------------------

Find here an example of a cmorizing script, written for the ``ESACCI XCH4``
dataset that is available on the Copernicus Climate Data Store:
`cmorize_obs_cds_xch4.ncl
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/cmorize_obs_cds_xch4.ncl>`_.
`cds_xch4.ncl
<https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/formatters/datasets/cds_xch4.ncl>`_.

The first part of the script collects all the information about the dataset
that are necessary to write the filename correctly and to understand which
@@ -183,20 +236,14 @@ CMOR_TABLE.
through the loading of the script interface.ncl_. There are similar
functions available for python scripts.

.. _interface.ncl: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/interface.ncl
.. _interface.ncl: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/formatters/interface.ncl

.. _utilities.ncl: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/obs/utilities.ncl
.. _utilities.ncl: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/cmorizers/data/formatters/utilities.ncl

In the second part of the script each variable defined in ``VAR`` is separately
extracted from the original data file and processed. Most parts of the code are
commented, and therefore it should be easy to follow. ESMValTool provides a set
of predefined utilities.ncl_, which can be imported into your CMORizer
by

.. code-block:: NCL

loadscript(getenv("esmvaltool_root") + "/esmvaltool/cmorizers/obs/utilities.ncl")

of predefined utilities.ncl_, which are imported by default into your CMORizer.
This module contains different kinds of small fixes to the data attributes,
coordinates, and metadata which are necessary for the data field to be
CMOR-compliant.
@@ -208,20 +255,31 @@ The cmorizing script for the given dataset can be run with:

.. code-block:: console

cmorize_obs -c <config-user.yml> -o <dataset-name>
esmvaltool data format --config_file <config-user.yml> <dataset-name>


.. note::

The output path given in the configuration file is the path where
your cmorized dataset will be stored. The ESMValTool will create a folder
with the correct tier information (see Section `2. Edit your configuration file`_) if that tier folder is not
already available, and then a folder named after the data set. In this
folder the cmorized data set will be stored as a netCDF file.
with the correct tier information
(see Section `2. Edit your configuration file`_) if that tier folder is not
already available, and then a folder named after the dataset.
In this folder the cmorized data set will be stored as a NetCDF file.
The cmorized dataset will be automatically moved to the correct tier
subfolder of your OBS or OBS6 directory if the option
``--install=True`` is used in the command above and no such directory
was already created.

If your run was successful, one or more NetCDF files are produced in your
output directory.

If a downloading script is available for the dataset, the downloading and
the cmorizing scripts can be run in a single command with:

.. code-block:: console

esmvaltool data prepare --config_file <config-user.yml> <dataset-name>

6. Naming convention of the observational data files
====================================================
@@ -265,8 +323,8 @@ The different parts of the name are explained in more detail here:
(``mon``);
- xch4: Is the name of the variable. Each observational data file is supposed
to only include one variable per file;
- 200301-201612: Is the period the dataset spans with ``200301`` being the
start year and month, and ``201612`` being the end year and month;
- 200301-201812: Is the period the dataset spans with ``200301`` being the
start year and month, and ``201812`` being the end year and month;
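The period field can be split into its start and end year and month with a few lines of Python. This sketch handles only the ``YYYYMM-YYYYMM`` form used in the example above; the function name is hypothetical, and other frequencies (e.g. daily or yearly data) would use date strings of a different length.

```python
"""Hypothetical parser for the YYYYMM-YYYYMM period field of a file name."""


def parse_period(period):
    """Split 'YYYYMM-YYYYMM' into ((start_year, start_month), (end_year, end_month))."""
    start, end = period.split("-")
    if len(start) != 6 or len(end) != 6 or not (start + end).isdigit():
        raise ValueError(f"Unsupported period format: {period!r}")
    return (int(start[:4]), int(start[4:])), (int(end[:4]), int(end[4:]))


if __name__ == "__main__":
    print(parse_period("200301-201812"))  # prints ((2003, 1), (2018, 12))
```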

.. note::
There is a different naming convention for ``obs4MIPs`` data (see the exact
54 changes: 51 additions & 3 deletions doc/sphinx/source/input.rst
@@ -36,6 +36,7 @@ or, if you need longer term access or more computational resources, the

If the options above are not available to you, ESMValTool also offers a feature
to make it easy to download CMIP6, CMIP5, CMIP3, CORDEX, and obs4MIPs from ESGF.
ESMValTool also supports downloading some observational datasets directly from their source.

The chapter in the ESMValCore documentation on
:ref:`finding data <esmvalcore:findingdata>` explains how to
@@ -71,7 +72,54 @@ Observations

Observational and reanalysis products in the standard CF/CMOR format used in CMIP and required by the ESMValTool are available via the obs4MIPs and ana4mips projects at the ESGF (e.g., https://esgf-data.dkrz.de/projects/esgf-dkrz/). Their use is strongly recommended, when possible.

Other datasets not available in these archives can be obtained by the user from the respective sources and reformatted to the CF/CMOR standard. ESMValTool currently support two ways to perform this reformatting (aka 'CMORization'). The first is to use a CMORizer script to generate a local pool of reformatted data that can readily be used by the ESMValTool. The second way is to implement specific 'fixes' for your dataset. In that case, the reformatting is performed 'on the fly' during the execution of an ESMValTool recipe (note that one of the first preprocessor tasks is 'CMOR checks and fixes'). Below, both methods are explained in more detail.
Other datasets not available in these archives can be obtained by the user from the respective sources
and reformatted to the CF/CMOR standard.
The list of datasets supported by ESMValTool can be obtained with:

.. code-block:: bash

esmvaltool data list

Datasets for which auto-download is supported can be downloaded with:

.. code-block:: bash

esmvaltool data download --config_file [CONFIG_FILE] [DATASET_LIST]

Note that all Tier3 and some Tier2 datasets for which auto-download is supported
require authentication. In such cases, enter your credentials in your
``~/.netrc`` file as explained
`here <https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html>`_.

An entry in the ``~/.netrc`` file should look like:

.. code-block:: bash

machine [server_name] login [user_name] password [password]

Make sure that the permissions of the ``~/.netrc`` file are set so only you and administrators
can read it, i.e.

.. code-block:: bash

chmod 600 ~/.netrc
ls -l ~/.netrc

The latter command should show ``-rw-------``.
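To confirm that an entry can be read and that the permissions are strict, Python's standard ``netrc`` and ``stat`` modules can be used. This is a sketch, not part of ESMValTool; the helper name is hypothetical, ``example.org`` stands in for the real server name, and the permission check assumes a POSIX system.

```python
"""Check a .netrc entry and its permissions with the standard library."""
import netrc
import os
import stat


def check_netrc(host, path=None):
    """Return (login, password, is_private) for ``host``.

    ``is_private`` is True when the file mode is exactly 600 (rw-------).
    """
    path = path or os.path.join(os.path.expanduser("~"), ".netrc")
    auth = netrc.netrc(path).authenticators(host)
    if auth is None:
        raise KeyError(f"No entry for {host!r} in {path}")
    login, _account, password = auth
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return login, password, mode == 0o600
```

After the ``chmod 600`` step above, ``check_netrc("[server_name]")`` should return your login, your password and ``True``; avoid printing the password in shared logs.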

For other datasets, downloading instructions can be obtained with:

.. code-block:: bash

esmvaltool data info [DATASET]

ESMValTool currently supports two ways to perform this reformatting (aka 'CMORization').
The first is to use a CMORizer to generate a local pool of reformatted data that can
readily be used by the ESMValTool.
The second way is to implement specific 'fixes' for your dataset.
In that case, the reformatting is performed 'on the fly' during the execution of an ESMValTool
recipe (note that one of the first preprocessor tasks is 'CMOR checks and fixes').
Below, both methods are explained in more detail.

Using a CMORizer script
-----------------------
@@ -89,7 +137,7 @@ To CMORize one or more datasets, run:

.. code-block:: bash

cmorize_obs -c [CONFIG_FILE] -o [DATASET_LIST]
esmvaltool data format --config_file [CONFIG_FILE] [DATASET_LIST]

The path to the raw data to be CMORized must be specified in the
:ref:`user configuration file<config-user>` as RAWOBS.
@@ -117,7 +165,7 @@ may be ``sat`` (satellite data), ``reanaly`` (reanalysis data),
``ground`` (ground observations), ``clim`` (derived climatologies),
``campaign`` (aircraft campaign).

At the moment, cmorize_obs supports Python and NCL scripts.
At the moment, ``esmvaltool data format`` supports Python and NCL scripts.

.. _cmorization_as_fix:
