
Add reference HDF recipe type #174

Merged: 26 commits, Aug 31, 2021

Conversation

martindurant
Contributor

No description provided.

@rabernat
Contributor

The existing test fixtures, e.g.

@pytest.fixture(scope="session", params=["D", "2D"])
def netcdf_local_paths(daily_xarray_dataset, tmpdir_factory, request):
    """Return a list of paths pointing to netcdf files."""

Provide netCDF files that you could use to test this recipe...whenever you're ready.

Let me know how I can help here.

@martindurant martindurant marked this pull request as ready for review August 2, 2021 17:58
@martindurant
Contributor Author

Note the interesting facet here: the recipe produces both a combined JSON file for the dataset, and an intake YAML file that knows how to load it.

(NB: TODOs could become issues after merger, but not ones that I intend
to work on immediately)
@martindurant
Contributor Author

@rabernat , any thoughts here?
I'd like to get this in and have a go at making a recipe, which in turn should be a good test case if we factor out the path and chunk handling logic.

@rabernat (Contributor) left a comment

This looks like a great start!

My main observation is that you are assuming access to a shared global filesystem. This works for the test environment but won't work in the cloud.

It would be great if you could push all I/O through the classes and functions in storage.py, e.g. MetadataTarget, etc. That will allow these recipes to easily plug in to the bakeries. I know this may seem inconvenient, but I think there are advantages to having all I/O happen in one place.


def iter_chunks(self) -> Iterable[Hashable]:
    """Iterate over all target chunks."""
    return fsspec.open_files(self.netcdf_url, **self.netcdf_storage_options)
Contributor:

What does this return? Open file objects?

Would be nice to plug in FilePattern here.

Contributor Author:

It returns an OpenFiles instance, which will open all contained files when entered via a with block, or can be iterated to expose individual OpenFile instances.
What would we do with a FilePattern? We don't have logic to infer coordinates or add them into the chunking scheme. It would be nice in the long run, but tricky.
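For illustration, a minimal sketch of both usage modes of OpenFiles, using fsspec's in-memory filesystem (the paths and file contents here are made up):

```python
import fsspec

# Create two small files on fsspec's in-memory filesystem (made-up data).
fs = fsspec.filesystem("memory")
fs.pipe("/data/a.bin", b"aaa")
fs.pipe("/data/b.bin", b"bbb")

# open_files returns an OpenFiles container, not file objects directly.
ofs = fsspec.open_files("memory://data/*.bin", mode="rb")

# Entering the context opens all contained files at once...
with ofs as files:
    contents = [f.read() for f in files]

# ...or iterate to get individual OpenFile instances, opened one at a time.
sizes = []
for of in ofs:
    with of as f:
        sizes.append(len(f.read()))
```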

Contributor:

What would we do with a FilePattern?

Mostly the point is to align with the rest of Pangeo Forge.

I would recommend using this instead

def iter_inputs(self) -> Iterator[InputKey]:
    for input_key in self.file_pattern:
        yield input_key

and then using the input key to look up the URL from the FilePattern object.
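A sketch of that lookup, with a plain dict standing in for the FilePattern (the keys, URLs, and helper here are hypothetical; a real FilePattern supports similar key-based indexing):

```python
from typing import Dict, Iterator, Tuple

# Hypothetical stand-in: maps input keys to file URLs, as a FilePattern does.
InputKey = Tuple[int, ...]
file_pattern: Dict[InputKey, str] = {
    (0,): "s3://bucket/file_000.nc",
    (1,): "s3://bucket/file_001.nc",
}

def iter_inputs(pattern: Dict[InputKey, str]) -> Iterator[InputKey]:
    """Yield every input key, mirroring the suggested iter_inputs method."""
    yield from pattern

# Each key can then be used to look up the corresponding URL.
urls = [file_pattern[key] for key in iter_inputs(file_pattern)]
```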

@rabernat
Contributor

FYI working on this today.

@rabernat
Contributor

@martindurant - I'm working on making a tutorial example for this recipe. I decided to use the CMIP6 netCDF files stored in this directory: s3://esgf-world/CMIP6/OMIP/NOAA-GFDL/GFDL-CM4/omip1/r1i1p1f1/Omon/thetao/gr/v20180701/. I've successfully generated the reference file, but I can't read it back using intake. I'm wondering if you could help me debug this.

The files are here

import s3fs
fs = s3fs.S3FileSystem(anon=True)
base_path = 's3://esgf-world/CMIP6/OMIP/NOAA-GFDL/GFDL-CM4/omip1/r1i1p1f1/Omon/thetao/gr/v20180701/'
all_paths = fs.ls(base_path)

I'm creating the following FilePattern

from pangeo_forge_recipes.patterns import pattern_from_file_sequence
pattern = pattern_from_file_sequence(['s3://' + path for path in all_paths], 'time')

And the following recipe

from pangeo_forge_recipes.recipes.reference_hdf_zarr import ReferenceHDFRecipe
rec = ReferenceHDFRecipe(
    pattern,
    xarray_concat_args={"dim": "time"}  # note that this could be discovered automatically from the pattern
)

I assigned storage as follows

import tempfile
from fsspec.implementations.local import LocalFileSystem
from pangeo_forge_recipes.storage import FSSpecTarget, MetadataTarget

fs_local = LocalFileSystem()

cache_dir = tempfile.TemporaryDirectory()
metadata_cache = MetadataTarget(fs_local, cache_dir.name)

target_dir = tempfile.TemporaryDirectory()
target = FSSpecTarget(fs_local, target_dir.name)

rec.target = target
rec.metadata_cache = metadata_cache

And executed as follows

from dask.diagnostics import ProgressBar
delayed = rec.to_dask()
with ProgressBar():
    # took 1 hr 6 min on my computer 😱
    delayed.compute()

The files it produced (reference.json and reference.yaml) are saved in this gist.

I am able to open the file using fsspec + xarray

import fsspec
import xarray as xr
m = fsspec.get_mapper(
    "reference://",
    fo=target._full_path("reference.json"),
    target_protocol="file",
    remote_protocol="s3",
    skip_instance_cache=True,
)
ds = xr.open_dataset(m, engine='zarr', backend_kwargs={'consolidated': False}, chunks={})
print(ds)
<xarray.Dataset>
Dimensions:    (lat: 180, time: 3600, bnds: 2, lev: 35, lon: 360)
Coordinates:
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lev        (lev) float64 2.5 10.0 20.0 32.5 ... 5e+03 5.5e+03 6e+03 6.5e+03
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * time       (time) object 1708-01-17 03:30:00 ... 1720-07-17 08:30:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (time, lat, bnds) float64 dask.array<chunksize=(240, 180, 2), meta=np.ndarray>
    lev_bnds   (time, lev, bnds) float64 dask.array<chunksize=(240, 35, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 dask.array<chunksize=(240, 360, 2), meta=np.ndarray>
    thetao     (time, lev, lat, lon) float32 dask.array<chunksize=(1, 18, 90, 180), meta=np.ndarray>
    time_bnds  (time, bnds) object dask.array<chunksize=(1, 2), meta=np.ndarray>
Attributes: (12/45)
    Conventions:           CF-1.7 CMIP-6.0 UGRID-1.0
    _nc3_strict:           1
    activity_id:           OMIP
    branch_method:         none provided
    branch_time_in_child:  0.0
    comment:               Experiment name = OM4p25_IAF_BLING_csf_rerun\nFor ...
    ...                    ...
    table_id:              Omon
    title:                 NOAA GFDL GFDL-CM4 model output prepared for CMIP6...
    tracking_id:           hdl:21.14100/97e4edf3-22e7-4e5f-831a-f2a671b7094f
    variable_id:           thetao
    variant_info:          N/A
    variant_label:         r1i1p1f1

🎉 This is great!

However, I can't get the intake method to work. For reference, the intake catalog is just

sources:
  data:
    args:
      chunks: {}
      consolidated: false
      storage_options:
        fo: /var/folders/n8/63q49ms55wxcj_gfbtykwp5r0000gn/T/tmp69uw_f7l/reference.json
        remote_options: {}
        remote_protocol: s3
        skip_instance_cache: true
        target_options: {}
        target_protocol: file
      urlpath: reference://
    description: ''
    driver: intake_xarray.xzarr.ZarrSource
import intake
cat = intake.open_catalog(target._full_path("reference.yaml"))
cat.data.read()

Gives this error

ValueError: cannot reshape array of size 360 into shape (240,180,2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/n8/63q49ms55wxcj_gfbtykwp5r0000gn/T/ipykernel_17531/3745362929.py in <module>
      1 import intake
      2 cat = intake.open_catalog(target._full_path("reference.yaml"))
----> 3 cat.data.read()

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/intake_xarray/base.py in read(self)
     38         """Return a version of the xarray with all the data in memory"""
     39         self._load_metadata()
---> 40         return self._ds.load()
     41 
     42     def read_chunked(self):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    863 
    864             # evaluate all the dask arrays simultaneously
--> 865             evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    866 
    867             for k, data in zip(lazy_data, evaluated_data):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    566         postcomputes.append(x.__dask_postcompute__())
    567 
--> 568     results = schedule(dsk, keys, **kwargs)
    569     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    570 

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     77             pool = MultiprocessingPoolExecutor(pool)
     78 
---> 79     results = get_async(
     80         pool.submit,
     81         pool._max_workers,

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    515                             _execute_task(task, data)  # Re-execute locally
    516                         else:
--> 517                             raise_exception(exc, tb)
    518                     res, worker_id = loads(res_info)
    519                     state["cache"][key] = res

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
    323     if exc.__traceback__ is not tb:
    324         raise exc.with_traceback(tb)
--> 325     raise exc
    326 
    327 

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    221     try:
    222         task, data = loads(task_info)
--> 223         result = _execute_task(task, data)
    224         id = get_id()
    225         result = dumps((result, id))

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/dask/array/core.py in getter(a, b, asarray, lock)
    103         # `is_arraylike` evaluates to `True` in that case.
    104         if asarray and (not is_arraylike(c) or isinstance(c, np.matrix)):
--> 105             c = np.asarray(c)
    106     finally:
    107         if lock:

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    352 
    353     def __array__(self, dtype=None):
--> 354         return np.asarray(self.array, dtype=dtype)
    355 
    356     def __getitem__(self, key):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    516 
    517     def __array__(self, dtype=None):
--> 518         return np.asarray(self.array, dtype=dtype)
    519 
    520     def __getitem__(self, key):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    417     def __array__(self, dtype=None):
    418         array = as_indexable(self.array)
--> 419         return np.asarray(array[self.key], dtype=None)
    420 
    421     def transpose(self, order):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/coding/variables.py in __array__(self, dtype)
     68 
     69     def __array__(self, dtype=None):
---> 70         return self.func(self.array)
     71 
     72     def __repr__(self):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/coding/variables.py in _apply_mask(data, encoded_fill_values, decoded_fill_value, dtype)
    135 ) -> np.ndarray:
    136     """Mask all matching values in a NumPy arrays."""
--> 137     data = np.asarray(data, dtype=dtype)
    138     condition = False
    139     for fv in encoded_fill_values:

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/core/indexing.py in __array__(self, dtype)
    417     def __array__(self, dtype=None):
    418         array = as_indexable(self.array)
--> 419         return np.asarray(array[self.key], dtype=None)
    420 
    421     def transpose(self, order):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/xarray/backends/zarr.py in __getitem__(self, key)
     73         array = self.get_array()
     74         if isinstance(key, indexing.BasicIndexer):
---> 75             return array[key.tuple]
     76         elif isinstance(key, indexing.VectorizedIndexer):
     77             return array.vindex[

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/core.py in __getitem__(self, selection)
    660 
    661         fields, selection = pop_fields(selection)
--> 662         return self.get_basic_selection(selection, fields=fields)
    663 
    664     def get_basic_selection(self, selection=Ellipsis, out=None, fields=None):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/core.py in get_basic_selection(self, selection, out, fields)
    785                                                 fields=fields)
    786         else:
--> 787             return self._get_basic_selection_nd(selection=selection, out=out,
    788                                                 fields=fields)
    789 

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/core.py in _get_basic_selection_nd(self, selection, out, fields)
    828         indexer = BasicIndexer(selection, self)
    829 
--> 830         return self._get_selection(indexer=indexer, out=out, fields=fields)
    831 
    832     def get_orthogonal_selection(self, selection, out=None, fields=None):

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/core.py in _get_selection(self, indexer, out, fields)
   1123             # allow storage to get multiple items at once
   1124             lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1125             self._chunk_getitems(lchunk_coords, lchunk_selection, out, lout_selection,
   1126                                  drop_axes=indexer.drop_axes, fields=fields)
   1127 

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/core.py in _chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
   1837         for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
   1838             if ckey in cdatas:
-> 1839                 self._process_chunk(
   1840                     out,
   1841                     cdatas[ckey],

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/core.py in _process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection, partial_read_decode)
   1743         except ArrayIndexError:
   1744             cdata = cdata.read_full()
-> 1745         chunk = self._decode_chunk(cdata)
   1746 
   1747         # select data from chunk

/opt/miniconda3/envs/pangeo-forge-recipes/lib/python3.9/site-packages/zarr/core.py in _decode_chunk(self, cdata, start, nitems, expected_shape)
   1990         # ensure correct chunk shape
   1991         chunk = chunk.reshape(-1, order='A')
-> 1992         chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
   1993 
   1994         return chunk

ValueError: cannot reshape array of size 360 into shape (240,180,2)

Any idea what might be going on?

@martindurant
Contributor Author

I think I did exactly the same as you and got reasonable output:

In [16]: s = intake_xarray.xzarr.ZarrSource("reference://", storage_options=dict(fo="reference.json", remote_protocol="s3", chunks={}), consolidated=False)

In [17]: s.to_dask()
Out[17]:
<xarray.Dataset>
Dimensions:    (lat: 180, time: 3600, bnds: 2, lev: 35, lon: 360)
Coordinates:
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lev        (lev) float64 2.5 10.0 20.0 32.5 ... 5e+03 5.5e+03 6e+03 6.5e+03
  * lon        (lon) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
  * time       (time) object 1708-01-17 03:30:00 ... 1720-07-17 08:30:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds   (time, lat, bnds) float64 dask.array<chunksize=(240, 180, 2), meta=np.ndarray>
    lev_bnds   (time, lev, bnds) float64 dask.array<chunksize=(240, 35, 2), meta=np.ndarray>
    lon_bnds   (time, lon, bnds) float64 dask.array<chunksize=(240, 360, 2), meta=np.ndarray>
    thetao     (time, lev, lat, lon) float32 dask.array<chunksize=(1, 18, 90, 180), meta=np.ndarray>
    time_bnds  (time, bnds) object dask.array<chunksize=(1, 2), meta=np.ndarray>
Attributes: (12/45)
    Conventions:           CF-1.7 CMIP-6.0 UGRID-1.0
    _nc3_strict:           1
    activity_id:           OMIP
    branch_method:         none provided
    branch_time_in_child:  0.0
    comment:               Experiment name = OM4p25_IAF_BLING_csf_rerun\nFor ...
    ...                    ...
    table_id:              Omon
    title:                 NOAA GFDL GFDL-CM4 model output prepared for CMIP6...
    tracking_id:           hdl:21.14100/97e4edf3-22e7-4e5f-831a-f2a671b7094f
    variable_id:           thetao
    variant_info:          N/A
    variant_label:         r1i1p1f1

or with the catalog file (having edited the url to "{{ CATALOG_DIR }}/reference.json", which is, in retrospect, what makes sense anyway).

In [30]: cat = intake.open_catalog("reference.yml")

In [31]: cat.data.to_dask()
Out[31]:
<xarray.Dataset>
Dimensions:    (lat: 180, time: 3600, bnds: 2, lev: 35, lon: 360)
Coordinates:
  * lat        (lat) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
  * lev        (lev) float64 2.5 10.0 20.0 32.5 ... 5e+03 5.5e+03 6e+03 6.5e+03
...

Doubtless my environment doesn't quite match yours?

@rabernat
Contributor

Thanks for the quick reply Martin!

Yes, this does work for me. 👍

s = intake_xarray.xzarr.ZarrSource(
    "reference://",
    storage_options=dict(
        fo=target._full_path("reference.json"),
        remote_protocol="s3",
        chunks={}
    ),
    consolidated=False)
s.to_dask()

And so does this!

import intake
cat = intake.open_catalog(target._full_path("reference.yaml"))
cat.data.to_dask()

The difference is that before I was calling cat.data.read(). This suggests that there may be some more subtle error hiding in the chunks.

@rabernat
Contributor

And I found it...

import intake
cat = intake.open_catalog(target._full_path("reference.yaml"))
ds = cat.data.to_dask()
ds.lat_bnds.values
# -> ValueError: cannot reshape array of size 360 into shape (240,180,2)

@martindurant
Contributor Author

OK, so there is a bug in reference-maker's combine step:
the lat_bnds variable (for example) was not a function of time. When combined, it might have been reasonable for it to become a function of time, (time, lat, bnds), and this would be the right thing to do for some variables (particularly if there is ONE time per file and no other variable is a function of time).

In this case, the variable is constant and should not have gained the extra dimension. In MultiZarrToZarr, the variable should have fallen under case c). I can run the debugger to see why that didn't happen - do you have the original reference files around?

A note:
a (360,) array can indeed be reshaped to (180, 2) and broadcast into (240, 180, 2), since 360 = 180 x 2. That's what I would have expected to happen.
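For illustration, that reshape-and-broadcast expectation checked directly with numpy (the values are made up):

```python
import numpy as np

# A (360,) array, standing in for the single-file lat_bnds values.
a = np.arange(360.0)

# Reshape to (180, 2), then broadcast along a new leading time axis,
# giving the (240, 180, 2) chunk shape without copying the data.
b = np.broadcast_to(a.reshape(180, 2), (240, 180, 2))
```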

@martindurant
Contributor Author

I think that in this specific case, you needed the extra argument at the creation of the aggregate JSON file.

In general, you can indeed override any specific arguments when instantiating an intake source.

ds = cat.data(chunks={"time": 1200}).to_dask()

Unfortunately, if you wanted to update an argument that was embedded within another structure, such as one of the components of storage_options, it would take more work - there is no obvious API for this.
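One generic workaround (a hypothetical helper, not an intake API) is to deep-merge the overrides into the entry's nested argument dict before re-instantiating the source:

```python
# Hypothetical helper, not an intake API: recursively merge override values
# into a nested argument dict, returning a copy and leaving the original intact.
def deep_update(base: dict, updates: dict) -> dict:
    out = dict(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_update(out[key], value)
        else:
            out[key] = value
    return out

# e.g. change just remote_protocol inside storage_options
args = {
    "urlpath": "reference://",
    "storage_options": {"fo": "reference.json", "remote_protocol": "s3"},
}
new_args = deep_update(args, {"storage_options": {"remote_protocol": "gs"}})
```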

@cisaacstern
Member

Will this close #70?

@martindurant
Contributor Author

@cisaacstern - yes and no. This kind of recipe certainly is one way of generating metadata, as described in the issue. I still think it's a valid discussion, though, unless we can convince ourselves that reference-maker does/can give all the metadata that we might want.


@rabernat
Contributor

Just a note on some changes I've made: I feel like we don't want to have optional dependencies for pangeo-forge-recipes right now. This introduces unnecessary complexity in our test suite and CI. So here I would just prefer to have fsspec-reference-maker, intake, and h5py become required dependencies. How does that sound?

@martindurant
Contributor Author

Intake/intake-xarray is only required for testing here, so it doesn't need to be in the formal requirements.

You should be warned that the long-term plan for reference-maker is to support many file types (e.g., grib2), and you probably don't want to add all of those dependencies. h5py should be the most common one, though, and we already use it. reference-maker will probably not depend on the various format-specific packages directly, only on xarray and zarr.

ci/py3.8.yml Outdated
@@ -40,4 +40,4 @@ dependencies:
- pip
- pip:
- pytest-timeout
- git+https://github.com/intake/fsspec-reference-maker
- sspec-reference-maker
Contributor Author:

missing "f"

@rabernat
Contributor

There is some weird stuff in the rtd build...

  Downloading prefect-0.5.2-py3-none-any.whl (219 kB)
  Downloading prefect-0.5.1-py3-none-any.whl (217 kB)
  Downloading prefect-0.5.0-py3-none-any.whl (208 kB)
INFO: pip is looking at multiple versions of fsspec to determine which version is compatible with other requirements. This could take a while.
Collecting fsspec
  Downloading fsspec-2021.6.1-py3-none-any.whl (115 kB)
INFO: pip is looking at multiple versions of marshmallow to determine which version is compatible with other requirements. This could take a while.
  Downloading fsspec-2021.6.0-py3-none-any.whl (114 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  Downloading fsspec-2021.5.0-py3-none-any.whl (111 kB)
  Downloading fsspec-2021.4.0-py3-none-any.whl (108 kB)
  Downloading fsspec-0.9.0-py3-none-any.whl (107 kB)
  Downloading fsspec-0.8.7-py3-none-any.whl (103 kB)
  Downloading fsspec-0.8.6-py3-none-any.whl (103 kB)
INFO: pip is looking at multiple versions of fsspec to determine which version is compatible with other requirements. This could take a while.
  Downloading fsspec-0.8.5-py3-none-any.whl (98 kB)
  Downloading fsspec-0.8.4-py3-none-any.whl (91 kB)
  Downloading fsspec-0.8.3-py3-none-any.whl (88 kB)
  Downloading fsspec-0.8.2-py3-none-any.whl (87 kB)
  Downloading fsspec-0.8.1-py3-none-any.whl (87 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. If you want to abort this run, you can press Ctrl + C to do so. To improve how pip performs, tell us what happened here: https://pip.pypa.io/surveys/backtracking
  Downloading fsspec-0.8.0-py3-none-any.whl (85 kB)
  Downloading fsspec-0.7.4-py3-none-any.whl (75 kB)
  Downloading fsspec-0.7.3-py3-none-any.whl (70 kB)
  Downloading fsspec-0.7.2-py3-none-any.whl (67 kB)
  Downloading fsspec-0.7.1-py3-none-any.whl (66 kB)
  Downloading fsspec-0.7.0-py3-none-any.whl (66 kB)
  Downloading fsspec-0.6.3-py3-none-any.whl (65 kB)
  Downloading fsspec-0.6.2-py3-none-any.whl (62 kB)
  Downloading fsspec-0.6.1-py3-none-any.whl (62 kB)
  Downloading fsspec-0.6.0-py3-none-any.whl (61 kB)
  Downloading fsspec-0.5.2.tar.gz (64 kB)
  Downloading fsspec-0.5.1-py3-none-any.whl (56 kB)
  Downloading fsspec-0.4.5.tar.gz (61 kB)
  Downloading fsspec-0.4.4.tar.gz (58 kB)
  Downloading fsspec-0.4.3.tar.gz (58 kB)
  Downloading fsspec-0.4.2.tar.gz (58 kB)
  Downloading fsspec-0.4.1.tar.gz (57 kB)
  Downloading fsspec-0.4.0.tar.gz (57 kB)
  Downloading fsspec-0.3.6.tar.gz (57 kB)
  Downloading fsspec-0.3.5.tar.gz (57 kB)
  Downloading fsspec-0.3.4.tar.gz (57 kB)
  Downloading fsspec-0.3.3.tar.gz (56 kB)
  Downloading fsspec-0.3.2.tar.gz (56 kB)
  Downloading fsspec-0.3.1.tar.gz (56 kB)
  Downloading fsspec-0.3.0.tar.gz (54 kB)
  Downloading fsspec-0.2.3.tar.gz (54 kB)
  Downloading fsspec-0.2.2.tar.gz (54 kB)
  Downloading fsspec-0.2.1.tar.gz (51 kB)
  Downloading fsspec-0.2.0.tar.gz (47 kB)
  Downloading fsspec-0.1.4.tar.gz (46 kB)
  Downloading fsspec-0.1.3.tar.gz (46 kB)
  Downloading fsspec-0.1.2.tar.gz (34 kB)
  Downloading fsspec-0.1.1.tar.gz (34 kB)
  Downloading fsspec-0.1.0.tar.gz (34 kB)
INFO: pip is looking at multiple versions of sphinx-autodoc-typehints to determine which version is compatible with other requirements. This could take a while.
Collecting sphinx-autodoc-typehints
  Downloading sphinx_autodoc_typehints-1.11.1-py3-none-any.whl (8.7 kB)
  Downloading sphinx_autodoc_typehints-1.11.0-py3-none-any.whl (8.5 kB)
  Downloading sphinx_autodoc_typehints-1.10.3-py3-none-any.whl (8.4 kB)
  Downloading sphinx_autodoc_typehints-1.10.2-py3-none-any.whl (8.4 kB)


Command killed due to excessive memory consumption

Maybe we should be pinning dependencies more?

@rabernat
Contributor

Something about the ci changes has made it take forever to build the python 3.8 environment. I'm going to try unpinning some versions and see if it improves.

https://github.com/pangeo-forge/pangeo-forge-recipes/runs/3424372828?check_suite_focus=true

@rabernat (Contributor) left a comment

@martindurant - from my point of view this is good to go. Would appreciate your check on the items below.

Also your feedback on the tutorial and docs for the Reference recipe would be great.

@rabernat
Copy link
Contributor

@cisaacstern if you're interested, a review would be helpful here. When you approve, we can merge this PR and add our second Recipe class to the package! 🎉 🎊 Having other Recipe classes is very important because it will help us generalize some of the code patterns. For example, I'm very pleased we can reuse both the FilePattern and the Target stuff here.

@cisaacstern (Member) left a comment

Very cool.

My main question/suggestion is around the name of the new recipe class. Comment about that provided in-line on the recipe module.

from pangeo_forge_recipes.recipes.base import BaseRecipe
from pangeo_forge_recipes.recipes.xarray_zarr import XarrayZarrRecipe
Member:

Given that the ReferenceHDFRecipe test module is called test_reference.py, should this module be renamed to test_xarray_zarr.py?

Contributor:

Agree.

@rabernat (Contributor), Aug 31, 2021:

We can fix this in #167, which also includes test refactors.

@martindurant
Contributor Author

I don't see anything remaining to be done here.

@rabernat rabernat merged commit 690594a into pangeo-forge:master Aug 31, 2021
@martindurant martindurant deleted the references branch August 31, 2021 18:13