-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Opening http remote files still not working with defaults #135
Comments
The fsspec.open feels very inconsistent. Ryan went on a super deep dive looking into behavior across https/local etc. |
God how annoying. Literally what is the point of fsspec if it doesn't actually specify a common behaviour across different filesystems? It really all is a symptom of this fsspec/filesystem_spec#1446. What can we do about it in virtualizarr though? We could try to test all the filesystems we might care about (again fsspec should really be doing that...) We could maintain our own little |
The problem is in VirtualiZarr/virtualizarr/kerchunk.py Lines 56 to 58 in 87221ea
That's defining It's technically API-breaking, but I'd recommend just setting that to |
Oh thanks @TomAugspurger ! I thought I fixed that bug in a commit I added to #126 but I obviously missed another occurrence of the default s3-specific values. It would still be great to have tests that this all works on the main types of storage we care about. |
I like this idea, but maybe I should take a backseat, since I have the tendency to always write massive end-to-end tests 😆. |
Continuation of #126
Over at https://github.com/jbusecke/esgf-virtual-zarr-data-access I am maintaining a simple example how to virtualize CMIP6 datasets.
This is a minimal example to reproduce the bug:
this works as intended, but if I leave out the
reader_options={}
it failsFile ~/miniforge3/envs/esgf-virtual-zarr-data-access/lib/python3.11/site-packages/fsspec/asyn.py:118, in sync_wrapper..wrapper(*args, **kwargs)
115 @functools.wraps(func)
116 def wrapper(*args, **kwargs):
117 self = obj or args[0]
--> 118 return sync(self.loop, func, *args, **kwargs)
File ~/miniforge3/envs/esgf-virtual-zarr-data-access/lib/python3.11/site-packages/fsspec/asyn.py:103, in sync(loop, func, timeout, *args, **kwargs)
101 raise FSTimeoutError from return_result
102 elif isinstance(return_result, BaseException):
--> 103 raise return_result
104 else:
105 return return_result
File ~/miniforge3/envs/esgf-virtual-zarr-data-access/lib/python3.11/site-packages/fsspec/asyn.py:56, in _runner(event, coro, result, timeout)
54 coro = asyncio.wait_for(coro, timeout=timeout)
55 try:
---> 56 result[0] = await coro
57 except Exception as ex:
58 result[0] = ex
File ~/miniforge3/envs/esgf-virtual-zarr-data-access/lib/python3.11/site-packages/fsspec/implementations/http.py:435, in HTTPFileSystem._info(self, url, **kwargs)
432 except Exception as exc:
433 if policy == "get":
434 # If get failed, then raise a FileNotFoundError
--> 435 raise FileNotFoundError(url) from exc
436 logger.debug("", exc_info=exc)
438 return {"name": url, "size": None, **info, "type": "file"}
FileNotFoundError: http://aims3.llnl.gov/thredds/fileServer/css03_data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710/tas_Amon_MPI-ESM1-2-HR_ssp126_r1i1p1f1_gn_201501-201912.nc
I suspect that there is still something going wrong with the default keywords logic, as I raised back here?
Wondering if we should add some end to end tests with small files on every supported filesystem?
The text was updated successfully, but these errors were encountered: