Make fsspec_open_kwargs, query_string_secrets, & is_opendap attributes of FilePattern #167

Merged
merged 107 commits into from
Sep 2, 2021
Changes from 7 commits
Commits
8245088
add input_fname var to cache_file function
cisaacstern Jul 29, 2021
1175af4
add query_string_secrets param to XarrayZarrRecipe, pass through to c…
cisaacstern Jul 29, 2021
ee77eba
lint
cisaacstern Jul 29, 2021
45e661d
don't need default arg in storage.py bc it's enforced by XarrayZarrRe…
cisaacstern Jul 29, 2021
62904b6
dataclass param lines don't terminate in commas
cisaacstern Jul 29, 2021
5d496e3
update tests for query string secrets
cisaacstern Jul 31, 2021
688135c
lint tests
cisaacstern Jul 31, 2021
3680a34
parse + unparse url in _add_query_string_secrets
cisaacstern Aug 4, 2021
78e2af3
wrap fsspec.open with _get_opener
cisaacstern Aug 4, 2021
ea858a0
remove fake_secrets fixture
cisaacstern Aug 4, 2021
c42151e
correct & simplify additions to test_storage.py
cisaacstern Aug 4, 2021
d276d1a
_get_opener always takes mode='rb'
cisaacstern Aug 4, 2021
6874ec6
lint
cisaacstern Aug 4, 2021
4ca70fa
simplify token string
cisaacstern Aug 4, 2021
91a16ce
add test for multiple params in query string
cisaacstern Aug 4, 2021
76c7ced
cleaner to make secrets conditional on 'path' rather than 'file_paths'
cisaacstern Aug 4, 2021
de679dc
remove fake token from fixtured path if present
cisaacstern Aug 4, 2021
a158c04
use parse_qs + urlencode to allow secrets to be a dict
cisaacstern Aug 4, 2021
a4c32ab
update type hint and docstring in xarray_zarr to reflect new secrets …
cisaacstern Aug 4, 2021
25a67a2
update secrets to dict in test_storage.py
cisaacstern Aug 4, 2021
a34b31e
refactor existing tests for click cli
cisaacstern Aug 4, 2021
85d317d
add click as known_third_party
cisaacstern Aug 4, 2021
c530e8e
lint
cisaacstern Aug 4, 2021
afe9ed0
make items_per_file a fixture
cisaacstern Aug 4, 2021
ba798e0
refactor 'local_paths' and 'by_variable' into single fixture
cisaacstern Aug 4, 2021
56049c9
simplify conditional tuple unpacking in conftest.py
cisaacstern Aug 4, 2021
9348eee
remove http from test_file_opener, to try new fixture
cisaacstern Aug 4, 2021
7095f34
update netcdf_http_paths for new fixture
cisaacstern Aug 4, 2021
31beca6
conditional return for netcdf_http_paths
cisaacstern Aug 4, 2021
48ef594
add http paths back into test_file_opener params
cisaacstern Aug 4, 2021
f85c15a
refactor test_fixtures.py for new fixture
cisaacstern Aug 4, 2021
f3e2b49
remove TODO comment re: http fixture b/c complete
cisaacstern Aug 4, 2021
61e4396
refactor sequential_recipe fixture
cisaacstern Aug 5, 2021
7f4370a
remove now nonexistent multi_variable recipe from tests
cisaacstern Aug 5, 2021
92078e9
lint
cisaacstern Aug 5, 2021
f7f8961
Merge remote-tracking branch 'upstream/master' into scrub-tokens
cisaacstern Aug 24, 2021
315c292
conditional file_pattern in subset test
cisaacstern Aug 24, 2021
116cac7
refactor conditional file_pattern logic into helper func
cisaacstern Aug 24, 2021
1161c9e
remove 'sequential' from recipe fixture name
cisaacstern Aug 25, 2021
635d0b7
remove test_recipes redundancy
cisaacstern Aug 25, 2021
829692b
paths/urls are all that's needed in text_fixtures
cisaacstern Aug 25, 2021
60713c5
netcdf_paths should always return 4-tuple
cisaacstern Aug 25, 2021
056b6a2
try to refactor tests a bit
rabernat Aug 26, 2021
25eb0c6
make start server func, add http basic auth test
cisaacstern Aug 26, 2021
636f03f
add basic auth test to test_file_opener params
cisaacstern Aug 26, 2021
3999d56
add query string fixture
cisaacstern Aug 27, 2021
dddaf8f
check for query string on server side
cisaacstern Aug 27, 2021
b3dd635
add query string test
cisaacstern Aug 27, 2021
2a6f3eb
secrets was missing from file_opener in test
cisaacstern Aug 27, 2021
4732062
refactor 3 http fixtures into single fixture
cisaacstern Aug 27, 2021
b485ba8
change test params to reflect fixture refactor
cisaacstern Aug 27, 2021
d2b2f47
add aiohttp as known_third_party (used in auth test)
cisaacstern Aug 27, 2021
3646d51
lint conftest.py
cisaacstern Aug 27, 2021
c9164fc
add requests as known_third_party (used in auth test)
cisaacstern Aug 27, 2021
4ef3716
Merge remote-tracking branch 'upstream/master' into scrub-tokens
cisaacstern Aug 27, 2021
1236ada
Skip auth and query string tests in test_fixtures.py
cisaacstern Aug 27, 2021
b01308f
Merge remote-tracking branch 'upstream/master' into scrub-tokens
cisaacstern Aug 27, 2021
08e4ae1
add netcdf_http_file_pattern fixture
cisaacstern Aug 28, 2021
79f7d25
add test_recipe_http_caching_copying test
cisaacstern Aug 28, 2021
2e6422f
add auth kwargs where missing in xarray_zarr.py
cisaacstern Aug 28, 2021
81b7c2a
mypy lint
cisaacstern Aug 28, 2021
f647ba2
pass auth kwargs to cache_input_metadata
cisaacstern Aug 28, 2021
ba9f790
fix path_format for http multivariate patterns
cisaacstern Aug 30, 2021
48a96ff
add attribures to FilePattern
cisaacstern Aug 30, 2021
a24c9bd
remove fsspec_open_kwargs as XarrayZarrRecipe kwarg
cisaacstern Aug 30, 2021
8501c2c
remove query_string_secrets as XarrayZarrRecipe kwarg
cisaacstern Aug 30, 2021
39bc994
assign & pass auth kwargs from path fixtures
cisaacstern Aug 31, 2021
710011e
assign auth kwargs from path fixtures in test_storage.py
cisaacstern Aug 31, 2021
7f01b40
combine always == 'by_coords'
cisaacstern Aug 31, 2021
e8999f8
use presence of auth kwargs as skip trigger in fixture test
cisaacstern Aug 31, 2021
0cc0b0d
lint line length in fixture test
cisaacstern Aug 31, 2021
0044348
move fsspec_open_kwargs and query_string_secrets to FilePattern **kwargs
cisaacstern Aug 31, 2021
404b0ac
edit make_file_pattern to reflect new FilePattern **kwargs
cisaacstern Aug 31, 2021
a1c43f3
remove roundabout auth kwargs checking in test_recipes.py
cisaacstern Aug 31, 2021
03de7fd
make is_opendap an attribute of FilePattern
cisaacstern Aug 31, 2021
de54dde
update XarrayZarrRecipe to reflect is_opendap as FilePattern attribute
cisaacstern Aug 31, 2021
8faf935
Merge remote-tracking branch 'upstream/master' into scrub-tokens
cisaacstern Aug 31, 2021
608cdea
refactor test_patterns with fixtures
cisaacstern Aug 31, 2021
3009731
is_opendap and fsspec_open_kwargs are mutually exclusive
cisaacstern Aug 31, 2021
cc0f9b5
test new FilePattern attributes and __init__ ValueErrors
cisaacstern Aug 31, 2021
f455fe1
lint
cisaacstern Aug 31, 2021
1d1c8bd
revert local path fixture name
cisaacstern Aug 31, 2021
62ec589
update test_references for path fixture refactor
cisaacstern Aug 31, 2021
548fc82
complete reversion of local fixture name
cisaacstern Aug 31, 2021
08e0136
terraclimate tutorial: remove deprecated mentions of fsspec_open_kwargs
cisaacstern Aug 31, 2021
fad5739
netcdf sequential tutorial: remove deprecated mentions of fsspec_open…
cisaacstern Aug 31, 2021
f799309
re-run cmip6-recipe.ipynb with fsspec_open_kwargs passed to FilePattern
cisaacstern Aug 31, 2021
3822c0c
opendap subset tutorial: remove deprecated mentions of fsspec_open_kw…
cisaacstern Sep 1, 2021
fc25fa4
multi variable tutorial: remove deprecated mentions of fsspec_open_kw…
cisaacstern Sep 1, 2021
2c5cc19
reset kernelspec for cmip6-recipe.ipynb
cisaacstern Sep 1, 2021
3573d1e
fix nitems_per_file typo in file pattern docs
cisaacstern Sep 1, 2021
1c7450d
add narrative docs for new FilePattern attrs
cisaacstern Sep 1, 2021
f4b2bb4
add HTTP authentication examples to docs
cisaacstern Sep 1, 2021
d0982fa
Update docs/recipe_user_guide/file_patterns.md
cisaacstern Sep 1, 2021
0b5f659
Merge remote-tracking branch 'charles/scrub-tokens' into scrub-tokens
cisaacstern Sep 1, 2021
efbf335
make all FilePattern.__init__ kwargs explicit
cisaacstern Sep 1, 2021
5f750f4
clean up redundant control flow for assertions in test_patterns
cisaacstern Sep 1, 2021
7117a66
for optional FilePattern kwargs, set default to None
cisaacstern Sep 1, 2021
72da50f
remove test_recipe_caching_copying redundancy with lazy fixture
cisaacstern Sep 1, 2021
4b2ae4a
define make_netcdf_local_paths function
cisaacstern Sep 2, 2021
f6b9774
refactor file pattern fixtures to distinguish between sequential and …
cisaacstern Sep 2, 2021
9f132ae
pass sequential file pattern fixture in test_references.py
cisaacstern Sep 2, 2021
725f173
refactor xarray_zarr recipe fixtures using make_netCDFtoZarr_recipe f…
cisaacstern Sep 2, 2021
5b11533
fix typo in conftest.py
cisaacstern Sep 2, 2021
a59fbd7
add sequential only recipe for test_lock_timeout
cisaacstern Sep 2, 2021
9578870
lint
cisaacstern Sep 2, 2021
6946ca1
lint 2
cisaacstern Sep 2, 2021
5 changes: 4 additions & 1 deletion pangeo_forge_recipes/recipes/xarray_zarr.py
@@ -156,6 +156,7 @@ def cache_input(
     delete_input_encoding: bool,
     process_input: Optional[Callable[[xr.Dataset, str], xr.Dataset]],
     metadata_cache: Optional[MetadataTarget],
+    query_string_secrets: Optional[str],
     is_opendap=bool,
 ) -> None:
     if cache_inputs:
@@ -165,7 +166,7 @@ def cache_input(
             raise ValueError("input_cache is not set.")
         logger.info(f"Caching input '{input_key!s}'")
         fname = file_pattern[input_key]
-        input_cache.cache_file(fname, **fsspec_open_kwargs)
+        input_cache.cache_file(fname, query_string_secrets, **fsspec_open_kwargs)

     if cache_metadata:
         return cache_input_metadata(
@@ -703,6 +704,7 @@ class XarrayZarrRecipe(BaseRecipe):
     lock_timeout: Optional[int] = None
     subset_inputs: SubsetSpec = field(default_factory=dict)
     is_opendap: bool = False
+    query_string_secrets: Optional[str] = None

     # internal attributes not meant to be seen or accessed by user
     _concat_dim: str = field(default_factory=str, repr=False, init=False)
@@ -846,6 +848,7 @@ def cache_input(self) -> Callable[[Hashable], None]:
             process_input=self.process_input,
             metadata_cache=self.metadata_cache,
             is_opendap=self.is_opendap,
+            query_string_secrets=self.query_string_secrets,
         )

     @property
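The hunks above add one field to the recipe dataclass and forward it as a keyword argument to the module-level `cache_input` function. The following is a minimal, self-contained sketch of that threading pattern, assuming the property binds its arguments with `functools.partial` (the diff does not show that line). `MiniRecipe`, `mini_cache_input`, and the example URL are hypothetical stand-ins, not the real `XarrayZarrRecipe` API.

```python
import functools
from dataclasses import dataclass
from typing import Callable, Optional


def mini_cache_input(input_key: str, *, query_string_secrets: Optional[str]) -> str:
    # Hypothetical stand-in for the module-level cache_input function:
    # it only builds the URL that would be requested.
    url = f"https://example.com/{input_key}.nc"
    return url + query_string_secrets if query_string_secrets else url


@dataclass
class MiniRecipe:
    # Hypothetical stand-in for the recipe dataclass; only the new field is modeled.
    query_string_secrets: Optional[str] = None

    @property
    def cache_input(self) -> Callable[[str], str]:
        # Bind the recipe-level setting once; callers supply only the input key.
        return functools.partial(
            mini_cache_input, query_string_secrets=self.query_string_secrets
        )


recipe = MiniRecipe(query_string_secrets="?token=abc")
result = recipe.cache_input("file_0")
default_result = MiniRecipe().cache_input("file_0")
```

Because the field defaults to `None`, a recipe created without secrets produces the same URL as before the change.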
9 changes: 7 additions & 2 deletions pangeo_forge_recipes/storage.py
@@ -147,7 +147,7 @@ def _full_path(self, path: str) -> str:
 class CacheFSSpecTarget(FlatFSSpecTarget):
     """Alias for FlatFSSpecTarget"""

-    def cache_file(self, fname: str, **open_kwargs) -> None:
+    def cache_file(self, fname: str, secrets: Optional[str], **open_kwargs) -> None:
         # check and see if the file already exists in the cache
         logger.info(f"Caching file '{fname}'")
         if self.exists(fname):
@@ -158,7 +158,8 @@ def cache_file(self, fname: str, **open_kwargs) -> None:
                 logger.info(f"File '{fname}' is already cached")
                 return

-        input_opener = fsspec.open(fname, mode="rb", **open_kwargs)
+        input_fname = fname if not secrets else _add_query_string_secrets(fname, secrets)
+        input_opener = fsspec.open(input_fname, mode="rb", **open_kwargs)
         target_opener = self.open(fname, mode="wb")
         logger.info(f"Coping remote file '{fname}' to cache")
         _copy_btw_filesystems(input_opener, target_opener)
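In the hunk above, the secret is added only to `input_fname`, which is what gets opened remotely, while the cached copy is still written and looked up under the secret-free `fname`. A minimal in-memory sketch of that separation follows. `MiniCache` and `fake_fetch` are hypothetical, and the sketch omits the size comparison that the real `cache_file` performs before skipping a download.

```python
from typing import Callable, Dict, List, Optional


class MiniCache:
    """Hypothetical in-memory stand-in for CacheFSSpecTarget."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # pretend remote opener
        self._store: Dict[str, bytes] = {}

    def exists(self, fname: str) -> bool:
        return fname in self._store

    def cache_file(self, fname: str, secrets: Optional[str] = None) -> None:
        if self.exists(fname):
            return  # already cached, so no remote request is made
        # The secret is used only for the remote request ...
        input_fname = fname if not secrets else fname + secrets
        data = self._fetch(input_fname)
        # ... while the cache key stays the secret-free name.
        self._store[fname] = data


requested: List[str] = []


def fake_fetch(url: str) -> bytes:
    # Record which URL was requested and return dummy bytes.
    requested.append(url)
    return b"payload"


cache = MiniCache(fake_fetch)
cache.cache_file("https://example.com/a.nc", "?token=abc")
cache.cache_file("https://example.com/a.nc", "?token=abc")  # second call is a no-op
```

Keeping the token out of the cache key means it never appears in stored file names, which matches the intent suggested by the branch name `scrub-tokens`.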
@@ -240,3 +241,7 @@ def _slugify(value: str) -> str:
     value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
     value = re.sub(r"[^.\w\s-]+", "_", value.lower())
     return re.sub(r"[-\s]+", "-", value).strip("-_")
+
+
+def _add_query_string_secrets(fname: str, secrets: str) -> str:
+    return fname + secrets
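Plain concatenation, as in `_add_query_string_secrets` above, breaks when `fname` already carries a query string. The later commit titles "parse + unparse url in _add_query_string_secrets" and "use parse_qs + urlencode to allow secrets to be a dict" suggest the helper was reworked along these lines. The sketch below is not the code from those commits; it is one possible standard-library version, using `parse_qsl` for simplicity, with a hypothetical public function name and example URLs.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_query_string_secrets(fname: str, secrets: Dict[str, str]) -> str:
    """Merge ``secrets`` into the query string of ``fname`` (illustrative helper)."""
    parsed = urlparse(fname)
    # Keep any parameters already present in the URL, then add the secrets.
    query = dict(parse_qsl(parsed.query))
    query.update(secrets)
    return urlunparse(parsed._replace(query=urlencode(query)))


plain = add_query_string_secrets("https://example.com/a.nc", {"token": "abc"})
merged = add_query_string_secrets("https://example.com/a.nc?foo=1", {"token": "abc"})
```

Taking a dict instead of a preformatted string also removes the need for callers to remember the leading `?` or `&`.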
12 changes: 12 additions & 0 deletions tests/conftest.py
@@ -160,6 +160,13 @@ def teardown():
     return all_urls, items_per_file


+@pytest.fixture(scope="session")
+def netcdf_http_paths_with_secrets(netcdf_http_paths, fake_secrets):
+    all_urls, items_per_file = netcdf_http_paths
+    all_urls = [url + fake_secrets for url in all_urls]
+    return all_urls, items_per_file
+
+
 @pytest.fixture()
 def tmp_target(tmpdir_factory):
     fs = fsspec.get_filesystem_class("file")()
@@ -175,6 +182,11 @@ def tmp_cache(tmpdir_factory):
     return cache


+@pytest.fixture(scope="session")
+def fake_secrets():
+    return "?a-pretend-api-token"
+
+
 @pytest.fixture()
 def tmp_metadata_target(tmpdir_factory):
     path = str(tmpdir_factory.mktemp("cache"))
24 changes: 20 additions & 4 deletions tests/test_storage.py
@@ -40,25 +40,41 @@ def test_metadata_target(tmp_metadata_target):


 @pytest.mark.parametrize(
-    "file_paths", [lazy_fixture("netcdf_local_paths"), lazy_fixture("netcdf_http_paths")]
+    "file_paths",
+    [
+        lazy_fixture("netcdf_local_paths"),
+        lazy_fixture("netcdf_http_paths"),
+        lazy_fixture("netcdf_http_paths_with_secrets"),
+    ],
 )
 @pytest.mark.parametrize("copy_to_local", [False, True])
 @pytest.mark.parametrize("use_cache, cache_first", [(False, False), (True, False), (True, True)])
 @pytest.mark.parametrize("use_dask", [True, False])
 @pytest.mark.parametrize("use_xarray", [True, False])
+@pytest.mark.parametrize("use_query_string_secrets", [True, False])
 def test_file_opener(
-    file_paths, tmp_cache, copy_to_local, use_cache, cache_first, dask_cluster, use_dask, use_xarray
+    file_paths,
+    tmp_cache,
+    fake_secrets,
+    copy_to_local,
+    use_cache,
+    cache_first,
+    dask_cluster,
+    use_dask,
+    use_xarray,
+    use_query_string_secrets,
 ):
     all_paths, _ = file_paths
     path = str(all_paths[0])
     cache = tmp_cache if use_cache else None
+    secrets = fake_secrets if use_query_string_secrets and file_paths[0][-1] == "?" else None

     def do_actual_test():
         if cache_first:
-            cache.cache_file(path)
+            cache.cache_file(path, secrets)
             assert cache.exists(path)
             details = cache.fs.ls(cache.root_path, detail=True)
-            cache.cache_file(path)
+            cache.cache_file(path, secrets)
             # check that nothing happened
             assert cache.fs.ls(cache.root_path, detail=True) == details

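In the test above, `file_paths[0]` is the `all_paths` sequence, so `file_paths[0][-1]` appears to select the last path in it rather than the last character of a URL, and comparing that to `"?"` would not detect a query string. One hedged alternative, shown only as an illustration and not as the fix adopted later in this PR, is to parse the individual `path` and check its query component. `has_query_string` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def has_query_string(path: str) -> bool:
    """Return True when ``path`` parses as a URL with a non-empty query component."""
    return bool(urlparse(path).query)


with_secret = has_query_string("http://127.0.0.1:8080/a.nc?a-pretend-api-token")
without_secret = has_query_string("http://127.0.0.1:8080/a.nc")
local_path = has_query_string("/tmp/data/a.nc")
```

This keeps the decision tied to the single path under test, in the spirit of the later commit titled "cleaner to make secrets conditional on 'path' rather than 'file_paths'".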