-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
listdir in HttpStore and use consolidated metadata #82
Conversation
Merge branch 'http' # Conflicts: # R/attributes.R # R/stores.R # R/zarr-array.R # R/zarr-group.R # man/Attributes.Rd # man/Store.Rd # man/ZarrArray.Rd # man/ZarrGroup.Rd # man/as.array.ZarrArray.Rd # man/sub-.ZarrArray.Rd # man/subset-.ZarrArray.Rd
@@ -39,7 +39,7 @@ jobs: | |||
uses: actions/cache@v2 | |||
with: | |||
path: ${{ env.R_LIBS_USER }}/* | |||
key: ${{ steps.get-version.outputs.os-version }}-${{ steps.get-version.outputs.r-version }}-${{ env.cache-version }}-deps | |||
key: ${{ hashFiles('DESCRIPTION') }}-${{ steps.get-version.outputs.os-version }}-${{ steps.get-version.outputs.r-version }}-${{ env.cache-version }}-deps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this ensures the cache gets cleared when DESCRIPTION changes
if(is_na(mode)) { | ||
mode <- "a" | ||
if(inherits(store, "HttpStore")) | ||
mode <- "r" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changes here get zarr_open working properly with HttpStore
@@ -245,6 +245,9 @@ normalize_store_arg <- function(store, storage_options=NA, mode=NA) { | |||
} | |||
|
|||
if(is.character(store)) { | |||
if(grepl("^https?://", store)) { | |||
return(HttpStore$new(store)) | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needed to explicitly check for http / https to get HttpStores to open with convenience functions
#' Get consolidated metadata if it exists. | ||
get_consolidated_metadata = function() { | ||
return(private$zmetadata) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just returns NULL unless the initialize function sets zmetadata to something.
} else { | ||
key <- item | ||
} | ||
key <- item_to_key(item) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was used in a couple places so I move it to utils
if(substr(url, length(url), length(url)) == "/") { | ||
private$url <- substr(url, 1, length(url)-1) | ||
if(substr(url, nchar(url), nchar(url)) == "/") { | ||
private$url <- substr(url, 1, nchar(url)-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just stuck out as obviously wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this
@@ -395,6 +418,8 @@ HttpStore <- R6::R6Class("HttpStore", | |||
opts = private$options, | |||
headers = private$headers | |||
) | |||
|
|||
private$zmetadata <- private$get_zmetadata() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replaces NULL with .zmetadata if it can be retrieved.
#' Set cache time of http requests. | ||
#' @param seconds number of seconds until cache is invalid -- 0 for no cache | ||
set_cache_time_seconds = function(seconds) { | ||
private$cache_time_seconds <- seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we support environment variables / setting / clearing with some global action? It would be nice to be able to manually modify cache behavior from way up the calling stack where you don't have access to modify the store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea - it could be left as a follow-up issue to complete or done here, either way is fine with me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm tight on time today -- I'll open an issue to follow up on it.
meta_bytes <- private$store$get_item(mkey) | ||
meta <- private$store$metadata_class$decode_array_metadata(meta_bytes) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put this here since this is where a list is retrieved and the zmeta cache is already in that form.
private$meta <- private$store$metadata_class$decode_group_metadata(meta_bytes) | ||
|
||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is mostly reindentation -- just trying to pull the info from zmeta.
|
||
test_that("http listdir and zmeta", { | ||
|
||
url<- "https://raw.githubusercontent.com/DOI-USGS/rnz/main/inst/extdata/bcsd.zarr" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we merge this PR, this url can switch to this repository.
R/stores.R
Outdated
key <- item_to_key(item) | ||
|
||
# per-session http get cache | ||
mem_get <- memoise::memoise(\(client, path) client$get(path), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this get re-memoized on each call to make_request()
? Can we create mem_get
in the constructor, and then store it as a variable on the instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question -- probably. I'll make that change quick.
if(substr(url, length(url), length(url)) == "/") { | ||
private$url <- substr(url, 1, length(url)-1) | ||
if(substr(url, nchar(url), nchar(url)) == "/") { | ||
private$url <- substr(url, 1, nchar(url)-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this
#' Set cache time of http requests. | ||
#' @param seconds number of seconds until cache is invalid -- 0 for no cache | ||
set_cache_time_seconds = function(seconds) { | ||
private$cache_time_seconds <- seconds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea - it could be left as a follow-up issue to complete or done here, either way is fine with me
As discussed in #79 -- this PR includes a rudimentary listdir for HttpStore that accesses and parses consolidated metadata.
I've introduced
vcr
for testing andmemoise
to avoid repeated requests to http resources.This PR now also includes use of consolidated metadata for httpstore and other reads. Testing could be a bit more complete, but I feel pretty good about it. Performance is WAY WAY better when trying to access all the metadata for a store.
Happy to refactor or rework this if the pattern I put in isn't in line with your vision here @keller-mark -- R6 is totally new to me so this is a bit hacky -- hopefully I am seeing the forest for the trees!