Bug: Unable to specify list for urlpath with NetCDFSource + fsspec #98
Comments
Thanks for the detailed reports @pbranson. It looks like we should definitely add some additional tests in this library to try and catch these issues. I think an open question right now is whether fsspec or xarray should handle the loading. Hopefully @martindurant can provide some guidance here. With some backend refactoring in xarray underway, I'm not sure of the best way forward, as there seem to be incompatibilities between how the data is stored (Zarr, netCDF, TIFF) and what code ultimately handles I/O (zarr, h5netcdf, rasterio) given either URLs or fsspec objects. See pydata/xarray#4823 (comment).
xarray should do this in the long term, but the storage backend implementation has changed a lot. The PR that is still not merged now only deals with the zarr engine, precisely because it's not clear which engine can accept what kind of input, and there are no tests for any of it.
Yeah, I'm not sure of the best solution here. It would seem that the solution would be to revert NetCDFSource to defer to xarray, but where does that leave catalogs of netCDF files accessed from an HTTP server? Maybe we could add an engine argument that passes urlpath as str for engine='netcdf'? Or add some of the features that are in NetCDFSource (like a list or pattern of URLs) to OpenDAPSource?
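To make the engine-based idea above a bit more concrete, here is a minimal sketch of how a source could decide between handing plain URL strings to xarray and opening the files through fsspec first. Everything in it is hypothetical: `plan_open`, the two mode labels, and the contents of `STRING_ONLY_ENGINES` are illustrative assumptions, not part of intake-xarray, and which engines really need plain strings would have to be confirmed against xarray.

```python
from typing import List, Optional, Sequence, Tuple, Union

# Assumption: engines listed here want plain path/URL strings rather than
# file-like objects. The membership of this set is illustrative only.
STRING_ONLY_ENGINES = {"netcdf4", "pydap"}


def plan_open(
    urlpath: Union[str, Sequence[str]],
    engine: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Hypothetical helper: normalise urlpath and pick a loading strategy.

    Returns a (mode, paths) pair where mode is either "pass_strings"
    (let xarray handle the URLs itself) or "open_with_fsspec"
    (open file-like objects via fsspec and pass those to xarray).
    """
    # Accept both a single string and a list/tuple of strings.
    paths = [urlpath] if isinstance(urlpath, str) else list(urlpath)
    if not paths:
        raise ValueError("urlpath must contain at least one entry")

    mode = "pass_strings" if engine in STRING_ONLY_ENGINES else "open_with_fsspec"
    return mode, paths
```

With this sketch, `plan_open(["a.nc", "b.nc"], engine="h5netcdf")` would choose the fsspec route for both files, while `plan_open("a.nc", engine="netcdf4")` would leave the string untouched for xarray.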
import intake_thredds
intake_thredds.__version__  # '2021.6.16'
intake_thredds.THREDDSMergedSource(
    url='simplecache::https://dapds00.nci.org.au/thredds/catalog/rr6/oceanmaps_datasets/version_3.3/analysis/eta/catalog.xml',
    path=['ocean_an00_2020052*12_eta.nc'],
    driver='netcdf',
).to_dask()
<xarray.Dataset>
Dimensions:      (xt_ocean: 3600, yt_ocean: 1500, Time: 9, nv: 2)
Coordinates:
  * xt_ocean     (xt_ocean) float64 0.05 0.15 0.25 0.35 ... 359.8 359.9 360.0
  * yt_ocean     (yt_ocean) float64 -74.95 -74.85 -74.75 ... 74.75 74.85 74.95
  * Time         (Time) datetime64[ns] 2020-05-22 2020-05-23 ... 2020-05-30
  * nv           (nv) float64 1.0 2.0
Data variables:
    eta_t        (Time, yt_ocean, xt_ocean) float32 dask.array<chunksize=(1, 1500, 3600), meta=np.ndarray>
    average_T1   (Time) datetime64[ns] dask.array<chunksize=(1,), meta=np.ndarray>
    average_T2   (Time) datetime64[ns] dask.array<chunksize=(1,), meta=np.ndarray>
    average_DT   (Time) timedelta64[ns] dask.array<chunksize=(1,), meta=np.ndarray>
    Time_bounds  (Time, nv) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray>
@pbranson is this still an issue for you?
Thanks @aaronspring, yes, this isn't so much of a problem now that intake-thredds has been developed further.
I have encountered a bug with a recent change in either intake-xarray or fsspec where passing a list for the urlpath to intake-xarray fails. This worked previously. Sorry, I haven't had time to dig further, but thought I would log it.
A minimum reproducible example is:
Which outputs:
Relevant library versions:
cc @martindurant @scottyhq