---
date:
  created: 2025-02-21
authors:
  - taddyb
---

# Coding Blog: How to access NWM forcings using VirtualiZarr

One of the most underrated aspects of NOAA's weather data is how much of it is published on a daily basis. The National Water Model (Cosgrove et al. 2024) produces nine operational configurations, each with a different look-back period, forecast range, and domain extent. The number of .nc files output to S3 object storage is in the tens of thousands... per day!

While this amount of data is monumental for machine learning and other hydrological analyses, it's cumbersome to read every .nc file individually or to download it all to disk. That is where [VirtualiZarr](https://github.com/zarr-developers/VirtualiZarr) comes into play: it allows existing .nc files and structures to be viewed as Zarr stores / xarray datasets without having to duplicate any data!

Below is a tutorial on how you can use VirtualiZarr to read a forecast from the National Water Model as a single Zarr store for your own studies.

<!-- more -->

### Installing dependencies

The following dependencies will be used, with Python >= 3.11:

```
dask==2025.2.0
distributed==2025.2.0
xarray==2025.1.2
s3fs==2025.2.0
virtualizarr==1.3.1
h5py==3.13.0
cubed-xarray==0.0.7
h5netcdf==1.5.0
```

You can either put these into a `requirements.txt` file or install them one by one via your package manager.
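
For example, a typical install with pip, assuming the list above is saved as `requirements.txt`:

```
pip install -r requirements.txt
```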

### Figure out what data you'll be using from the National Water Model

Given how much data exists out there from the National Water Model, you'll have to be explicit about what you're trying to access. You can check out the [National Water Model Bucket](https://noaa-nwm-pds.s3.amazonaws.com/index.html) yourself, but the following information needs to be known (these values become Python variables in the snippet after this list):

- `date`
    - The datetime we're reading
    - ex: `20250216`
- `forecast_type`
    - The forecast issued by the National Water Model
    - ex: `short_range`
- `initial_time`
    - The time at which the forecast was issued (in Zulu time)
    - ex: `t00z`
- `variable`
    - The variable we're looking to read
    - ex: `channel_rt`
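
As a minimal setup sketch, those example values map directly onto the Python variables used in the glob pattern below (the values here are illustrative; swap in your own forecast):

```python
# Illustrative parameters for one short-range forecast cycle
date = "20250216"              # issue date, YYYYMMDD
forecast_type = "short_range"
initial_time = "t00z"          # forecast cycle, in Zulu time
variable = "channel_rt"        # streamflow output files
```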

### Query your files

Since this data is read from the cloud, we'll need to create our paths to s3. Below shows how we create the URLs to our datasets dynamically.

```python
import fsspec

# Anonymous (public) access to the NOAA NWM bucket
fs = fsspec.filesystem("s3", anon=True)
file_pattern = f"s3://noaa-nwm-pds/nwm.{date}/{forecast_type}/nwm.{initial_time}.{forecast_type}.{variable}.*.nc"
noaa_files = fs.glob(file_pattern)
noaa_files = sorted(["s3://" + f for f in noaa_files])

# Storage options passed through to fsspec when each file is opened
so = dict(anon=True, default_fill_cache=False, default_cache_type="none")
```
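
As a quick sanity check, you can confirm the glob found the expected files. The exact key shown in the comment is an assumption based on the bucket's naming convention; a short-range cycle typically yields 18 hourly lead times:

```python
print(len(noaa_files))  # e.g. 18 for short_range
print(noaa_files[0])
# e.g. s3://noaa-nwm-pds/nwm.20250216/short_range/nwm.t00z.short_range.channel_rt.f001.conus.nc
```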

### Read your stores from the cloud

Now that our s3 links are created, let's spin up a Dask cluster to read our files from the cloud in parallel.

```python
import xarray as xr
from dask.distributed import LocalCluster
from virtualizarr import open_virtual_dataset

def _process(url: str, so: dict) -> xr.Dataset:
    """Open a single NWM file as a virtual (reference-only) dataset."""
    vds = open_virtual_dataset(
        url,
        drop_variables="crs",
        indexes={},
        reader_options={"storage_options": so},
    )
    return vds

client_settings: dict[str, int | str] = {
    "n_workers": 9,
    "memory_limit": "2GiB",
}

cluster = LocalCluster(**client_settings)
client = cluster.get_client()

# Submit one task per file, then gather the resulting virtual datasets
# (`noaa_files` and `so` come from the previous snippet)
futures = []
for url in noaa_files:
    future = client.submit(_process, url, so)
    futures.append(future)

virtual_datasets = client.gather(futures)
```

*Notice:* we drop the `crs` variable from these NetCDF files, as its string byte type throws an error when virtualized.
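
Optionally, you can inspect one of the returned datasets before combining them; it should report lightweight virtual (manifest-backed) arrays rather than loaded data:

```python
# Each element references chunk locations only; no chunk data has been downloaded
print(virtual_datasets[0])
```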

### Build your VirtualiZarr store

Now that our virtual datasets are created, let's combine them into a single virtual reference and save the output locally to a kerchunk JSON file.
107+
108+
```python
109+
virtual_datasets[0].virtualize.to_kerchunk("NWM.json", format="json")
110+
```

Outputs can be read back through the following (note that the `kerchunk` package must also be installed, as it provides the `kerchunk` engine used here):

```python
import os

import xarray as xr

os.environ["AWS_REGION"] = "us-east-1"
# Tell the kerchunk backend how to reach the referenced chunks on s3
storage_options = dict(remote_protocol="s3", remote_options=dict(anon=True))
ds = xr.open_dataset(
    "NWM.json",
    engine="kerchunk",
    backend_kwargs={"storage_options": storage_options},
)
# ACCET is available when the store was built from the NWM `land` files
ds.ACCET.plot(vmin=-1, vmax=ds.ACCET.max().values)
```

<p align="center">
  <img src="https://github.com/DeepGroundwater/DeepGroundwater.github.io/blob/master/docs/blog/posts/pics/nwm_accet.png?raw=true" alt="Accumulated Total ET" width="500"/>
  <br>
  <em>Figure 1: Plotted accumulated total ET for CONUS</em>
</p>

That's it! You now have a collection of many .nc files in one xarray dataset that can be read into your hydrologic analysis.

For the full demo and the code package we put together, check out our demo repo: [DeepGroundwater NWM Batcher](https://github.com/DeepGroundwater/nwm_batcher/blob/master/examples/read_short_range.ipynb). A snippet is shown below:

```python
virtual_datasets = nwm_batcher.read(
    date="20250516",
    forecast_type="short_range",
    initial_time="t00z",
    variable="land",
    data_variable="ACCET",
    coordinates=["time", "reference_time", "x", "y"],
)
```
