Add xCDAT tutorial datasets and update gallery notebooks #705
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff            @@
##              main      #705   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files           15        16    +1
  Lines         1621      1658   +37
=========================================
+ Hits          1621      1658   +37
For some of these examples, we probably need to host some ESGF datasets in a […] The added benefit of this approach is that we can use real-world datasets, and it can help standardize our approach to testing.
My proposed solution
# Gentle Introduction
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
# xCDAT utilities
* "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_187001_189412.nc"
* "https://esgf-data2.llnl.gov/thredds/dodsC/user_pub_work/E3SM/1_0/amip_1850_aeroF/1deg_atm_60-30km_ocean/atmos/180x360/time-series/mon/ens2/v3/TS_189501_191912.nc"
# Spatial Averaging
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/pr/gn/v20200605/pr_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
# Temporal Averaging
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
* "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc"
# Climatologies and departures
* "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
# This dataset should not be downloaded. We can subset
* "http://esgf.nci.org.au/thredds/dodsC/master/CMIP6/CMIP/CSIRO/ACCESS-ESM1-5/historical/r10i1p1f1/3hr/tas/gn/v20200605/tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000.nc"
# Horizontal regridding
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CCCma/CanESM5/historical/r13i1p1f1/Amon/tas/gn/v20190429/tas_Amon_CanESM5_historical_r13i1p1f1_gn_185001-201412.nc"
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/abrupt-4xCO2/r1i1p1f1/day/tas/gr2/v20180701/tas_day_GFDL-CM4_abrupt-4xCO2_r1i1p1f1_gr2_00010101-00201231.nc"
# Vertical regridding
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/so/gn/v20190308/so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NCAR/CESM2/historical/r1i1p1f1/Omon/thetao/gn/v20190308/thetao_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412.nc"
* "http://aims3.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/NOAA-GFDL/GFDL-CM4/abrupt-4xCO2/r1i1p1f1/day/tas/gr2/v20180701/tas_day_GFDL-CM4_abrupt-4xCO2_r1i1p1f1_gr2_00010101-00201231.nc"
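Most of the URLs above end in a file name that follows the CMIP6 pattern `<variable>_<table>_<source>_<experiment>_<member>_<grid>_<time range>.nc`. As a rough sketch of how those parts can be pulled out (for instance, to derive short dataset names), the helper below splits the file name on underscores. `CmipFilename` and `parse_cmip6_url` are hypothetical names used only for illustration, and the E3SM `TS_...` files listed under "xCDAT utilities" do not follow this pattern, so they are rejected.

```python
from typing import NamedTuple


class CmipFilename(NamedTuple):
    """Parts of a CMIP6-style file name (hypothetical helper type)."""

    variable: str
    table: str
    source: str
    experiment: str
    member: str
    grid: str
    time_range: str


def parse_cmip6_url(url: str) -> CmipFilename:
    """Split the file name at the end of a URL into its seven CMIP6 parts."""
    filename = url.rsplit("/", 1)[-1]
    stem = filename[:-3] if filename.endswith(".nc") else filename
    parts = stem.split("_")
    if len(parts) != 7:
        # File names that do not follow the CMIP6 pattern end up here.
        raise ValueError(f"Unexpected CMIP6 file name: {filename}")
    return CmipFilename(*parts)


info = parse_cmip6_url(
    "https://esgf-data1.llnl.gov/thredds/dodsC/css03_data/CMIP6/CMIP/CSIRO/"
    "ACCESS-ESM1-5/historical/r10i1p1f1/Amon/tas/gn/v20200605/"
    "tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412.nc"
)
print(info.variable, info.table, info.source, info.time_range)
```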
Hey @xCDAT/core-developers, I finally finished this PR. This PR updates the Jupyter Notebooks to use datasets from the new repository, xCDAT/xcdat-data. It contains the same datasets previously sourced from ESGF but with reduced file sizes by subsetting on time or lat/lon. Most plots should remain the same or similar to before.
My self-review checks out and I plan on merging by the end of the week. If anybody has time in the next few days, a quick review of the diffs would be great. Otherwise, I'll proceed with merging to move on.
XARRAY_DATASETS = list(file_formats.keys()) + ["era5-2mt-2019-03-uk.grib"]
XCDAT_DATASETS: Dict[str, str] = {
    # Monthly precipitation data from the ACCESS-ESM1-5 model.
    "pr_amon_access": "pr_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412_subset.nc",
    # Monthly ocean salinity data from the CESM2 model.
    "so_omon_cesm2": "so_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412_subset.nc",
    # Monthly near-surface air temperature from the ACCESS-ESM1-5 model.
    "tas_amon_access": "tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412_subset.nc",
    # 3-hourly near-surface air temperature from the ACCESS-ESM1-5 model.
    "tas_3hr_access": "tas_3hr_ACCESS-ESM1-5_historical_r10i1p1f1_gn_201001010300-201501010000_subset.nc",
    # Monthly near-surface air temperature from the CanESM5 model.
    "tas_amon_canesm5": "tas_Amon_CanESM5_historical_r13i1p1f1_gn_185001-201412_subset.nc",
    # Monthly ocean potential temperature from the CESM2 model.
    "thetao_omon_cesm2": "thetao_Omon_CESM2_historical_r1i1p1f1_gn_185001-201412_subset.nc",
    # Monthly cloud fraction data from the E3SM-2-0 model.
    "cl_amon_e3sm2": "cl_Amon_E3SM-2-0_historical_r1i1p1f1_gr_185001-189912_subset.nc",
    # Monthly air temperature data from the E3SM-2-0 model.
    "ta_amon_e3sm2": "ta_Amon_E3SM-2-0_historical_r1i1p1f1_gr_185001-189912_subset.nc",
}


def open_dataset(
    name: str,
    cache: bool = True,
    cache_dir: None | str | os.PathLike = DEFAULT_CACHE_DIR_NAME,
    add_bounds: List[CFAxisKey] | Tuple[CFAxisKey, ...] | None = ("X", "Y"),
    **kargs,
) -> xr.Dataset:
    """
    Open a dataset from the online repository (requires internet).

    This function is mostly based on ``xarray.tutorial.open_dataset()`` with
    some modifications, including adding missing bounds to the dataset.

    If a local copy is found then always use that to avoid network traffic.

    Available xCDAT datasets:

    * ``"pr_amon_access"``: Monthly precipitation data from the ACCESS-ESM1-5 model.
    * ``"so_omon_cesm2"``: Monthly ocean salinity data from the CESM2 model.
    * ``"tas_amon_access"``: Monthly near-surface air temperature from the ACCESS-ESM1-5 model.
    * ``"tas_3hr_access"``: 3-hourly near-surface air temperature from the ACCESS-ESM1-5 model.
    * ``"tas_amon_canesm5"``: Monthly near-surface air temperature from the CanESM5 model.
    * ``"thetao_omon_cesm2"``: Monthly ocean potential temperature from the CESM2 model.
    * ``"cl_amon_e3sm2"``: Monthly cloud fraction data from the E3SM-2-0 model.
    * ``"ta_amon_e3sm2"``: Monthly air temperature data from the E3SM-2-0 model.

    Parameters
    ----------
    name : str
        Name of the file containing the dataset.
        e.g. 'tas_amon_access'
    cache_dir : path-like, optional
        The directory in which to search for and write cached data.
    cache : bool, optional
        If True, then cache data locally for use on subsequent calls
    add_bounds : List[CFAxisKey] | Tuple[CFAxisKey] | None, optional
        List or tuple of axis keys for which to add bounds, by default
        ("X", "Y").
    **kargs : dict, optional
        Passed to ``xcdat.open_dataset``.
    """
    try:
        import pooch
    except ImportError as e:
        raise ImportError(
            "tutorial.open_dataset depends on pooch to download and manage datasets."
            " To proceed please install pooch."
        ) from e

    # Avoid circular import in __init__.py
    from xcdat.dataset import open_dataset

    logger = pooch.get_logger()
    logger.setLevel("WARNING")

    cache_dir = _construct_cache_dir(cache_dir)

    filename = XCDAT_DATASETS.get(name)
    if filename is None:
        raise ValueError(
            f"Dataset {name} not found. Available xcdat datasets are: {XCDAT_DATASETS.keys()}"
        )

    path = pathlib.Path(filename)
    url = f"{base_url}/raw/{version}/{path.name}"

    headers = {"User-Agent": f"xcdat {sys.modules['xcdat'].__version__}"}
    downloader = pooch.HTTPDownloader(headers=headers)

    filepath = pooch.retrieve(
        url=url, known_hash=None, path=cache_dir, downloader=downloader
    )
    ds = open_dataset(filepath, **kargs, add_bounds=add_bounds)

    if not cache:
        ds = ds.load()
        pathlib.Path(filepath).unlink()

    return ds
The new `tutorial.py` module with `xcdat.tutorial.open_dataset()`.
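When `cache` is false, the function in the diff loads the dataset into memory and then deletes the downloaded file. The sketch below shows that read-then-discard pattern using only the standard library and plain bytes in place of an xarray dataset. `read_and_maybe_discard` is a hypothetical helper for illustration and is not part of xcdat.

```python
import pathlib
import tempfile


def read_and_maybe_discard(filepath: pathlib.Path, cache: bool = True) -> bytes:
    """Read a downloaded file; delete it afterwards when caching is disabled."""
    # Read everything into memory first, so deleting the file is safe.
    data = filepath.read_bytes()
    if not cache:
        filepath.unlink()
    return data


with tempfile.TemporaryDirectory() as tmp:
    cached = pathlib.Path(tmp, "cached.bin")
    transient = pathlib.Path(tmp, "transient.bin")
    cached.write_bytes(b"abc")
    transient.write_bytes(b"xyz")

    cached_data = read_and_maybe_discard(cached, cache=True)
    transient_data = read_and_maybe_discard(transient, cache=False)

    # The cached file stays on disk; the transient one is removed.
    cached_exists = cached.exists()
    transient_exists = transient.exists()
```

The ordering matters: the data has to be fully in memory before the file is unlinked, which is presumably why the diff calls `ds.load()` before removing the file.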
@tomvothecoder From a very quick look, I don't see any obvious issues! The notebooks look good to me. It's great to leverage xarray's sample datasets so we don't have to maintain our own. Thank you for your work on this PR!
Thanks for the review, @lee1043! I actually decided to create xCDAT sample datasets (https://github.com/xCDAT/xcdat-data), which contain the same ESGF datasets but subsetted. This allows us to keep the same examples in the notebooks. I found that using the xarray sample datasets resulted in more significant changes to the notebooks.
@tomvothecoder If maintaining our own sample datasets is not a huge effort, I am not opposed to that. Thanks a lot!
Description
This PR updates the Jupyter Notebooks to use datasets from the new repository, xCDAT/xcdat-data. This repository contains the same datasets previously sourced from ESGF but with reduced file sizes by subsetting on time or lat/lon. Most plots should remain the same or similar to before.
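As a rough illustration of how a short dataset name could map to a download URL in that repository, the sketch below mirrors the lookup and the `{base_url}/raw/{version}/{filename}` pattern from the diff. The values of `BASE_URL` and `VERSION` are assumptions (the diff does not show how `base_url` and `version` are defined), and `resolve_url` is a hypothetical helper, not an xcdat function.

```python
from typing import Dict

# Assumed placeholder values: the diff does not show the real definitions.
BASE_URL = "https://github.com/xCDAT/xcdat-data"
VERSION = "main"

# Two entries copied from the XCDAT_DATASETS mapping in the diff.
DATASETS: Dict[str, str] = {
    "pr_amon_access": "pr_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412_subset.nc",
    "tas_amon_access": "tas_Amon_ACCESS-ESM1-5_historical_r10i1p1f1_gn_185001-201412_subset.nc",
}


def resolve_url(name: str, base_url: str = BASE_URL, version: str = VERSION) -> str:
    """Map a short dataset name to its download URL, or raise ValueError."""
    filename = DATASETS.get(name)
    if filename is None:
        # Join the keys so the message lists plain names.
        available = ", ".join(sorted(DATASETS))
        raise ValueError(f"Dataset {name!r} not found. Available datasets: {available}")
    return f"{base_url}/raw/{version}/{filename}"


url = resolve_url("tas_amon_access")
print(url)
```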
Related Issues
Changes Implemented
* `xcdat.tutorial` module with the `xcdat.tutorial.open_dataset()` function, modeled after `xarray.tutorial.open_dataset()`.
* `xcdat.tutorial.open_dataset()` in the API reference documentation.
* `pooch` as an optional dependency, updating:
  * `conda-env/dev.yml`
  * `pyproject.toml`
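Because pooch is optional, the import has to be deferred until the tutorial function is actually called and has to fail with an actionable message. A minimal sketch of that guard, generalized to any module name, is shown below. `require_optional` is a hypothetical helper and is not how xcdat structures its code; the diff inlines the `try`/`except` instead.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency, raising a helpful ImportError if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{purpose} depends on {module_name}. "
            f"To proceed please install {module_name}."
        ) from e


# A standard-library module is used here so the demo runs anywhere.
json_module = require_optional("json", "this demo")
print(json_module.dumps({"a": 1}))
```

Deferring the import this way keeps `import xcdat` working for users who never call the tutorial function and have not installed pooch.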
Notebooks Checklist
climatology-and-departures.ipynb
general-utilities.ipynb
introduction-to-xcdat.ipynb
parallel-computing-with-dask.ipynb
regridding-horizontal.ipynb
regridding-vertical.ipynb
spatial-average.ipynb
temporal-average.ipynb
introduction-to-xcdat.ipynb
Review
Please go through each notebook and compare them side-by-side.
* `main`: https://xcdat.readthedocs.io/en/main/gallery
Checklist
If applicable: