Support opening files over http #121
Comments
@jbusecke or anyone else wanna take this one? Don't wanna step on any toes. Probably related
I was messing around with this last week. Only tricky thing is the
diff --git a/virtualizarr/utils.py b/virtualizarr/utils.py
index 6ba7105..f332163 100644
--- a/virtualizarr/utils.py
+++ b/virtualizarr/utils.py
@@ -2,6 +2,8 @@ from __future__ import annotations
from typing import TYPE_CHECKING, Optional
+from pandas import read_csv
+
if TYPE_CHECKING:
from fsspec.implementations.local import LocalFileOpener
from s3fs.core import S3File
@@ -54,11 +56,15 @@ def _fsspec_openfile_from_filepath(
# using dict merge operator to add in defaults if keys are not specified
storage_options = s3_anon_defaults | storage_options
- fpath = fsspec.filesystem(protocol, **storage_options).open(filepath)
-
else:
- raise NotImplementedError(
- "Only local and s3 file protocols are currently supported"
+ storage_options = (
+ reader_options.get("storage_options") if reader_options else {}
)
+ fpath = fsspec.filesystem(protocol, **storage_options).open(filepath)
+
+ # else:
+ # raise NotImplementedError(
+ # "Only local and s3 file protocols are currently supported"
+ # )
return fpath
diff --git a/virtualizarr/xarray.py b/virtualizarr/xarray.py
index 80fce37..9ea0170 100644
--- a/virtualizarr/xarray.py
+++ b/virtualizarr/xarray.py
@@ -108,6 +108,7 @@ def open_virtual_dataset(
vds_refs = kerchunk.read_kerchunk_references_from_file(
filepath=filepath,
filetype=filetype,
+ reader_options=reader_options,
)
virtual_vars = virtual_vars_from_kerchunk_refs(
vds_refs,
With that, I think I got HTTPS URLs (to Azure Blob Storage) working:
import virtualizarr
import planetary_computer
import xarray as xr
import virtualizarr.kerchunk
urls = [
"https://nasagddp.blob.core.windows.net/nex-gddp-cmip6/NEX/GDDP-CMIP6/UKESM1-0-LL/ssp585/r1i1p1f2/tas/tas_day_UKESM1-0-LL_ssp585_r1i1p1f2_gn_2100.nc",
"https://nasagddp.blob.core.windows.net/nex-gddp-cmip6/NEX/GDDP-CMIP6/UKESM1-0-LL/ssp585/r1i1p1f2/pr/pr_day_UKESM1-0-LL_ssp585_r1i1p1f2_gn_2100.nc"
]
urls = [planetary_computer.sign(url) for url in urls]
ds1 = virtualizarr.open_virtual_dataset(urls[0], reader_options={"storage_options": {}}, filetype=virtualizarr.kerchunk.FileType.netcdf4)
ds2 = virtualizarr.open_virtual_dataset(urls[1], reader_options={"storage_options": {}}, filetype=virtualizarr.kerchunk.FileType.netcdf4)
ds = xr.combine_by_coords([ds1, ds2], join="exact", combine_attrs="drop_conflicts")
ds
That's super nice @TomAugspurger. Could you make a PR out of that branch of yours? I'd love to try this for some testing, and just want to avoid duplication.
I won't have time in the next few weeks, but feel free to take whatever you want 🙂
Tangentially related: hopefully virtualizarr can be deliberate about exactly where IO can occur (ideally isolated to scanning source files). The majority of my headaches with running data pipelines comes from IO, and if that can be isolated / minimized we'll be better off for it.
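To make the "isolate IO to scanning" idea concrete, here is a toy sketch. It is not how virtualizarr or kerchunk are implemented: `ChunkRef`, `scan_source`, and `concat_refs` are hypothetical names, and the fixed-size chunking is only a stand-in for real metadata parsing. The point is just that one function touches the file, and everything afterwards works on plain records.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Callable


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical record of a byte range inside a source file."""
    path: str
    offset: int
    length: int


def scan_source(
    path: str, opener: Callable[[str], BinaryIO], chunk_size: int
) -> list[ChunkRef]:
    """The only function here that performs IO: it measures the file once."""
    with opener(path) as f:
        size = f.seek(0, 2)  # seek to the end; returns the file size in bytes
    return [
        ChunkRef(path, offset, min(chunk_size, size - offset))
        for offset in range(0, size, chunk_size)
    ]


def concat_refs(*ref_lists: list[ChunkRef]) -> list[ChunkRef]:
    """Pure function: combines references without opening any file."""
    return [ref for refs in ref_lists for ref in refs]


# Usage with an in-memory stand-in for a remote file (10 bytes, 4-byte chunks).
fake_opener = lambda path: io.BytesIO(b"x" * 10)
refs_a = scan_source("mem://a", fake_opener, chunk_size=4)
refs_b = scan_source("mem://b", fake_opener, chunk_size=4)
combined = concat_refs(refs_a, refs_b)
```

With this split, a flaky network only ever affects `scan_source`, so retries and credentials can be handled in one place.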
I have another usecase and associated feature request.
I want to prototype virtualizarring some of these datasets, but for that we need to be able to support opening files via HTTP protocol in addition to S3 and local.
Here is an MRE:
Gives:
It seems that we might just have to add a new case here to make this work?
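A rough sketch of what such a new case could look like, written as a standalone helper rather than a patch to the actual function. Everything here is an assumption for illustration: `resolve_protocol_and_options` is a hypothetical name, the protocol is guessed with `urllib.parse` (the real code may detect it differently), and the anonymous-S3 defaults are assumed values.

```python
from urllib.parse import urlparse

# Assumed defaults for anonymous S3 access; check virtualizarr/utils.py for the real ones.
S3_ANON_DEFAULTS = {"key": "", "secret": "", "anon": True}


def resolve_protocol_and_options(filepath, reader_options=None):
    """Hypothetical helper: choose an fsspec protocol and storage options for a path."""
    scheme = urlparse(filepath).scheme
    # No scheme (or a one-letter Windows drive such as "C:") is treated as a local file.
    protocol = scheme if len(scheme) > 1 else "file"

    # Copy the caller's options so the input dict is never mutated.
    storage_options = dict((reader_options or {}).get("storage_options") or {})

    if protocol == "s3":
        # Caller-supplied keys win over the anonymous defaults.
        storage_options = S3_ANON_DEFAULTS | storage_options

    # Any other protocol (http, https, ...) just passes storage_options through.
    return protocol, storage_options


# The result would then be handed to fsspec, roughly:
#   fsspec.filesystem(protocol, **storage_options).open(filepath)
```

This mirrors the idea of dropping the `NotImplementedError` branch and letting fsspec decide whether it knows the protocol, instead of keeping an allow-list.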