Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read/write in parallel #91

Merged
merged 11 commits into from
Jun 21, 2024
Merged

Read/write in parallel #91

merged 11 commits into from
Jun 21, 2024

Conversation

keller-mark
Copy link
Owner

@keller-mark keller-mark commented Jun 19, 2024

Unfortunately, the switch from crul::HttpClient to crul::HttpRequest + crul::AsyncVaried breaks the VCR request mocking due to ropensci/vcr#246

Fixes #83
Related to #31

To get tests passing again, depends on ropensci/crul#180

R/stores.R Outdated
# For some reason, the crul::HttpClient fails in parallel settings
# (when used inside foreach %dopar% loops). This alternative
# with HttpRequest and AsyncVaried seems to work.
# Reference: https://docs.ropensci.org/crul/articles/async.html
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there maybe an argument to switch to a more traditional http client here? Seems like there's a lot of complexity coming together.

Maybe another option is to have one simple pathway for metadata gets and another non-caching parallelized get utility for iterating over chunks?

@dblodgett-usgs
Copy link
Collaborator

Looks like a great step forward.

Big question is whether foreach is the right parallelization option? I've gotten a lot of mileage out of pbapply, which offers an out of the box progress bar for *apply functions. Looks like you are just doubling down on doParallel? Looks like they haven't had any development in two years and he last change was a package maintainer update.

The other thing is what this ends up looking like from a user-facing API and UX point of view. It would be nice to havea vignette showing what the expected use pattern is. Once this is merged, maybe I can work that up as a test of what you did here?

@keller-mark
Copy link
Owner Author

Big question is whether foreach is the right parallelization option? I've gotten a lot of mileage out of pbapply

I am not tied to foreach and doParallel - i will try pbapply instead, thank you for the pointer

The other thing is what this ends up looking like from a user-facing API and UX point of view. It would be nice to havea vignette showing what the expected use pattern is. Once this is merged, maybe I can work that up as a test of what you did here?

Yes definitely we will need documentation of the usage. The tests are currently the only place that shows this

options(pizzarr.parallel_read_enabled = TRUE)

@dblodgett-usgs
Copy link
Collaborator

Cool -- I'm not attached to any one parallelization scheme but it does seem that future is the one with the most community support. http://www.futureverse.org/ It looks like that community prefers https://progressr.futureverse.org/ which takes a different approach from pbapply, it requires a more tightly coupled implementation than pbapply. I just kind of fell into pbapply and have liked it because I can just use my typical pattern where I map to a list of jobs then apply to a result list that I then recombine and I don't have to do anything else to get a progress bar. If I had to switch to a different apply function it would be a totally trivial change.

e.g. https://github.com/DOI-USGS/hydroloom/blob/38ebe341ed9e25071adba2af0d6c4c3999caafd0/R/make_attribute_topology.R#L67

@keller-mark keller-mark merged commit efe90f7 into main Jun 21, 2024
2 of 4 checks passed
@keller-mark keller-mark mentioned this pull request Jun 22, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache control
2 participants