Cleaned up script with load testing.
jbusecke committed Jun 3, 2024
1 parent 3baa6a9 commit 5b3b8eb
Showing 4 changed files with 24 additions and 3 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -158,3 +158,4 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
combined_full.json
10 changes: 7 additions & 3 deletions README.md
@@ -13,12 +13,17 @@ We aim to do this:

1. Install the required dependencies via pip
```
mamba create -n esgf-virtual-zarr-data-access python=3.11
mamba activate esgf-virtual-zarr-data-access
pip install -r requirements.txt
```

2. Modify the urls, and the output json filename in `virtual-zarr-script.py`, and run.
2. Modify the urls, and the output json filename in `virtual-zarr-script.py`, and run the script.
```
python virtual-zarr-script.py
```

3. Check that the generated JSON file is readable with xarray
3. Check that the generated JSON file is readable with xarray and average the full dataset (this is also done in the script)

```python
import xarray as xr
@@ -28,7 +33,6 @@ ds = xr.open_dataset(
    chunks={},
)
ds.mean().load() # test that all chunks can be accessed.

```

## Goals
2 changes: 2 additions & 0 deletions requirements.txt
@@ -3,3 +3,5 @@ kerchunk
xarray
requests
aiohttp
tqdm
dask
14 changes: 14 additions & 0 deletions virtual-zarr-script.py
@@ -1,6 +1,7 @@
from tqdm.auto import tqdm
from virtualizarr import open_virtual_dataset
import xarray as xr
from dask.diagnostics import ProgressBar

urls = [
    "http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_201501-201912.nc",
@@ -27,3 +28,16 @@
    combine_attrs="drop_conflicts",
)
combined_vds.virtualize.to_kerchunk(json_filename, format="json")

# Test load and print the mean of the output
print(f"Loading the mean of the virtual dataset from {json_filename=}")

ds = xr.open_dataset(
    json_filename,
    engine='kerchunk',
    chunks={},
)
print(f"Dataset before mean: {ds}")
with ProgressBar():
    ds_mean = ds.mean().load()
print(ds_mean)
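
The load test above reads every chunk over the network, which can be slow. A cheaper first check is to inspect the generated reference file itself. The following is a minimal sketch, not part of this commit. It assumes the output follows the kerchunk JSON layout, in which entries are either inline strings or lists of the form `[url, offset, length]`, optionally nested under a top-level `"refs"` key. The helper names `summarize_references` and `summarize_reference_file` are hypothetical.

```python
import json
from collections import Counter


def summarize_references(reference_doc):
    """Count inline entries and remote chunk references per source URL.

    Assumes the kerchunk reference layout: a mapping of keys to either
    an inline string or a list whose first element is the source URL.
    """
    # Version 1 files nest entries under "refs"; fall back to a flat mapping.
    refs = reference_doc.get("refs", reference_doc)
    inline_entries = 0
    chunks_per_url = Counter()
    for key, value in refs.items():
        if isinstance(value, list):
            # Remote reference: [url] or [url, offset, length]
            chunks_per_url[value[0]] += 1
        else:
            # Inline content such as .zgroup, .zarray, or .zattrs metadata
            inline_entries += 1
    return {
        "inline_entries": inline_entries,
        "chunks_per_url": dict(chunks_per_url),
    }


def summarize_reference_file(path):
    """Load a reference JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_references(json.load(f))
```

Calling `summarize_reference_file(json_filename)` after the `to_kerchunk` step should list every URL in `urls` with a nonzero chunk count. A missing URL would suggest that a file was dropped during the combine step.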
