Merge pull request #761 from norlandrhagen/read_zarr_ex
Adds open zarr + rechunk example to feedstock
norlandrhagen authored Oct 23, 2024
2 parents 24694a9 + 42cfccb commit a257b85
Showing 5 changed files with 57 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/composition/examples/gpcp-rechunk.md
@@ -0,0 +1,5 @@
# GPCP Rechunk


```{literalinclude} ../../../examples/feedstock/gpcp_rechunk.py
```
9 changes: 9 additions & 0 deletions docs/composition/styles.md
@@ -27,6 +27,15 @@ the recipe pipeline will contain at a minimum the following transforms applied t
* {class}`pangeo_forge_recipes.transforms.ConsolidateDimensionCoordinates`: consolidate the Dimension Coordinates for dataset read performance.
* {class}`pangeo_forge_recipes.transforms.ConsolidateMetadata`: calls Zarr's convenience function to consolidate metadata.
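For readers unfamiliar with metadata consolidation, here is a minimal sketch of what that step produces, assuming Zarr's v2 on-disk layout: every `.zgroup`, `.zarray`, and `.zattrs` JSON document is gathered into a single `.zmetadata` key, so a reader such as xarray can fetch one object instead of one per array. The `consolidate` function and the in-memory `dict` store are hypothetical stand-ins for illustration, not `zarr.consolidate_metadata` itself.

```python
import json

# Metadata documents in the Zarr v2 layout (assumption: v2, not v3).
METADATA_SUFFIXES = (".zgroup", ".zarray", ".zattrs")


def consolidate(store: dict[str, bytes]) -> dict[str, bytes]:
    """Hypothetical helper: copy all metadata documents into one `.zmetadata` key."""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.endswith(METADATA_SUFFIXES)
    }
    consolidated = {"zarr_consolidated_format": 1, "metadata": metadata}
    store[".zmetadata"] = json.dumps(consolidated).encode()
    return store


# Toy store: one group, one array with attributes, and one chunk of fake data.
store = {
    ".zgroup": b'{"zarr_format": 2}',
    "precip/.zarray": b'{"shape": [9226], "chunks": [9226]}',
    "precip/.zattrs": b'{"units": "mm/day"}',
    "precip/0": b"\x00\x01",  # chunk bytes are not metadata and are skipped
}
consolidate(store)
print(sorted(json.loads(store[".zmetadata"])["metadata"]))
# prints ['.zgroup', 'precip/.zarray', 'precip/.zattrs']
```

Chunk keys such as `precip/0` are left untouched; only the small JSON documents are duplicated into the consolidated key.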

### Open existing Zarr Store
* {class}`pangeo_forge_recipes.transforms.OpenWithXarray` supports opening existing Zarr stores, which is useful for rechunking a Zarr store into a different chunking scheme.
For an example recipe, see {doc}`examples/gpcp-rechunk`.
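To see why a rechunk can pay off, the sketch below counts the chunks a reader must fetch to pull the full time series at a single grid point, before and after applying the `target_chunks` used in the GPCP rechunk recipe. The dimension sizes and the one-day-per-chunk source layout are hypothetical values chosen for illustration, not measurements of the GPCP store.

```python
import math

# Hypothetical dimension sizes and source chunking, for illustration only.
DIMS = {"time": 9226, "latitude": 180, "longitude": 360, "nv": 2}
SOURCE_CHUNKS = {"time": 1, "latitude": 180, "longitude": 360, "nv": 2}
# Target chunking taken from the gpcp_rechunk recipe's StoreToZarr call.
TARGET_CHUNKS = {"time": 9226, "latitude": 16, "longitude": 36, "nv": 2}


def n_chunks(size: int, chunk: int) -> int:
    """Number of chunks needed to cover a dimension of length `size`."""
    return math.ceil(size / chunk)


def total_chunks(dims: dict[str, int], chunks: dict[str, int]) -> int:
    """Total number of chunks in the store's chunk grid."""
    return math.prod(n_chunks(size, chunks[dim]) for dim, size in dims.items())


def point_timeseries_reads(dims: dict[str, int], chunks: dict[str, int]) -> int:
    """Chunks touched when reading every time step at one (lat, lon) point.

    Along `time` every chunk is needed; along each other dimension the point
    falls in exactly one chunk.
    """
    return n_chunks(dims["time"], chunks["time"])


for name, chunks in [("source", SOURCE_CHUNKS), ("target", TARGET_CHUNKS)]:
    print(name, total_chunks(DIMS, chunks), point_timeseries_reads(DIMS, chunks))
# prints:
# source 9226 9226
# target 120 1
```

The trade-off runs the other way for map-style reads: under these assumed sizes, a single time step spans all 120 target chunks instead of one source chunk.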






```{tip}
If using the {class}`pangeo_forge_recipes.transforms.ConsolidateDimensionCoordinates` transform, make sure to chain on the {class}`pangeo_forge_recipes.transforms.ConsolidateMetadata` transform to your recipe.
40 changes: 40 additions & 0 deletions examples/feedstock/gpcp_rechunk.py
@@ -0,0 +1,40 @@
# Example recipe to demonstrate reading from an existing Zarr store and
# writing a new Zarr store with a different chunking structure.


import apache_beam as beam
import zarr

from pangeo_forge_recipes.patterns import FileType, pattern_from_file_sequence
from pangeo_forge_recipes.transforms import (
    ConsolidateDimensionCoordinates,
    ConsolidateMetadata,
    OpenWithXarray,
    StoreToZarr,
)

pattern = pattern_from_file_sequence(
    ["https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/gpcp-feedstock/gpcp.zarr"],
    concat_dim="time",
)


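def test_ds(store: zarr.storage.FSStore) -> zarr.storage.FSStore:
    """Check that the rechunked store can be reopened with xarray."""
    import xarray as xr

    # An xarray Dataset is truthy only if it has at least one data variable.
    assert xr.open_dataset(store, engine="zarr", chunks={})
    return store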


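recipe = (
    # A single input: the existing Zarr store listed in `pattern`.
    beam.Create(pattern.items())
    # chunks={} opens the store lazily with dask, using its existing chunks.
    | OpenWithXarray(file_type=FileType("zarr"), xarray_open_kwargs={"chunks": {}})
    # Write a new store with the target chunking scheme.
    | StoreToZarr(
        store_name="gpcp_rechunked.zarr",
        target_chunks={"time": 9226, "latitude": 16, "longitude": 36, "nv": 2},
        combine_dims=pattern.combine_dim_keys,
    )
    | ConsolidateDimensionCoordinates()
    | ConsolidateMetadata()
    # Verify that the output store can be reopened.
    | "Test dataset" >> beam.Map(test_ds)
)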
2 changes: 2 additions & 0 deletions examples/feedstock/meta.yaml
@@ -1,6 +1,8 @@
recipes:
  - id: "gpcp-from-gcs"
    object: "gpcp_from_gcs:recipe"
  - id: "gpcp-rechunk"
    object: "gpcp_rechunk:recipe"
  - id: "gpcp-from-gcs-dynamic-chunks"
    object: "gpcp_from_gcs_dynamic_chunks:recipe"
  - id: "noaa-oisst"
1 change: 1 addition & 0 deletions tests/test_integration.py
@@ -117,6 +117,7 @@ def test_integration(confpath_option: str, recipe_id: str, request):
        "hrrr-kerchunk-concat-valid-time": "Can't serialize drop_unknown callback function.",
        "narr-opendap": "Hangs for unknown reason. Requires further debugging.",
        "terraclimate": "Hangs for unknown reason. Requires further debugging.",
        "gpcp-rechunk": "Unknown failure in integration tests.",
    }
    if recipe_id in xfails:
        pytest.xfail(xfails[recipe_id])