Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

listdir in HttpStore and use consolidated metadata #82

Merged
merged 21 commits into from
May 27, 2024

Conversation

dblodgett-usgs
Copy link
Collaborator

@dblodgett-usgs dblodgett-usgs commented May 21, 2024

As discussed in #79 -- this PR includes a rudimentary listdir for HttpStore that accesses and parses consolidated metadata.

I've introduced vcr for testing and memoise to avoid repeated requests to http resources.

This PR now also includes use of consolidated metadata for httpstore and other reads. Testing could be a bit more complete, but I feel pretty good about it. Performance is WAY WAY better when trying to access all the metadata for a store.

Happy to refactor or rework this if the pattern I put in isn't in line with your vision here @keller-mark -- R6 is totally new to me so this is a bit hacky -- hopefully I am seeing the forest for the trees!

@dblodgett-usgs dblodgett-usgs changed the title listdir in HttpStore listdir in HttpStore and use consolidated metadata May 23, 2024
@@ -39,7 +39,7 @@ jobs:
uses: actions/cache@v2
with:
path: ${{ env.R_LIBS_USER }}/*
key: ${{ steps.get-version.outputs.os-version }}-${{ steps.get-version.outputs.r-version }}-${{ env.cache-version }}-deps
key: ${{ hashFiles('DESCRIPTION') }}-${{ steps.get-version.outputs.os-version }}-${{ steps.get-version.outputs.r-version }}-${{ env.cache-version }}-deps
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ensures the cache gets cleared when DESCRIPTION changes

if(is_na(mode)) {
mode <- "a"
if(inherits(store, "HttpStore"))
mode <- "r"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes here get zarr_open working properly with HttpStore

@@ -245,6 +245,9 @@ normalize_store_arg <- function(store, storage_options=NA, mode=NA) {
}

if(is.character(store)) {
if(grepl("^https?://", store)) {
return(HttpStore$new(store))
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needed to explicitly check for http / https to get HttpStores to open with convenience functions

#' Get consolidated metadata if it exists.
get_consolidated_metadata = function() {
return(private$zmetadata)
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just returns NULL unless the initialize function sets zmetadata to something.

} else {
key <- item
}
key <- item_to_key(item)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was used in a couple places so I move it to utils

if(substr(url, length(url), length(url)) == "/") {
private$url <- substr(url, 1, length(url)-1)
if(substr(url, nchar(url), nchar(url)) == "/") {
private$url <- substr(url, 1, nchar(url)-1)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just stuck out as obviously wrong

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this

@@ -395,6 +418,8 @@ HttpStore <- R6::R6Class("HttpStore",
opts = private$options,
headers = private$headers
)

private$zmetadata <- private$get_zmetadata()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replaces NULL with .zmetadata if it can be retrieved.

#' Set cache time of http requests.
#' @param seconds number of seconds until cache is invalid -- 0 for no cache
set_cache_time_seconds = function(seconds) {
private$cache_time_seconds <- seconds
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we support environment variables / setting / clearing with some global action? It would be nice to be able to manually modify cache behavior from way up the calling stack where you don't have access to modify the store.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea - it could be left as a follow-up issue to complete or done here, either way is fine with me

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tight on time today -- I'll open an issue to follow up on it.

meta_bytes <- private$store$get_item(mkey)
meta <- private$store$metadata_class$decode_array_metadata(meta_bytes)
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put this here since this is where a list is retrieved and the zmeta cache is already in that form.

private$meta <- private$store$metadata_class$decode_group_metadata(meta_bytes)

}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mostly reindentation -- just trying to pull the info from zmeta.


test_that("http listdir and zmeta", {

url<- "https://raw.githubusercontent.com/DOI-USGS/rnz/main/inst/extdata/bcsd.zarr"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we merge this PR, this url can switch to this repository.

R/stores.R Outdated
key <- item_to_key(item)

# per-session http get cache
mem_get <- memoise::memoise(\(client, path) client$get(path),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this get re-memoized on each call to make_request()? Can we create mem_get in the constructor, and then store it as a variable on the instance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question -- probably. I'll make that change quick.

if(substr(url, length(url), length(url)) == "/") {
private$url <- substr(url, 1, length(url)-1)
if(substr(url, nchar(url), nchar(url)) == "/") {
private$url <- substr(url, 1, nchar(url)-1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this

#' Set cache time of http requests.
#' @param seconds number of seconds until cache is invalid -- 0 for no cache
set_cache_time_seconds = function(seconds) {
private$cache_time_seconds <- seconds
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea - it could be left as a follow-up issue to complete or done here, either way is fine with me

@keller-mark keller-mark merged commit 6920e83 into keller-mark:main May 27, 2024
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants