-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parallelise styling #277
Comments
Reference: #370. |
Could be on by default, works for me except for the output of incomplete lines (to indicate progress). I think we can solve this by printing complete lines after completion in the multicore case. We need to evaluate if it's worth to enable multiprocess on Windows. Check if |
Can you give me a heads-up on the current status here? I was planning to create a macro for tic (running on Travis CI for every build) that uses multicore support. I guess no one knows when purrr will have a native integration so using furrr seems to be a good alternative right now? |
Yes, progress bars are an issue. We could first also implement a verbose argument as suggested in #375 and turn verbose off for multicore support.
Why does the operating matter here and would you prefer Using furrr sounds like a good plan to me. The only thing I am not sure about is how to choose the strategy. Because in the help file, it says:
So not calling |
I did some more investigation on this.
get_wd_from_temp_dir <- function() {
withr::with_dir(tempdir(), {
future.apply::future_sapply(1:2, function(x) {
print(getwd())
})
})
}
future::plan(future::sequential)
get_wd_from_temp_dir() # works
#> [1] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/Rtmpoj656A"
#> [1] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/Rtmpoj656A"
#> [1] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/Rtmpoj656A"
#> [2] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/Rtmpoj656A"
future::plan(future::multisession)
getwd()
#> [1] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/RtmpdmE1EH/reprex34736a6dec94"
get_wd_from_temp_dir() # does not work
#> [1] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/RtmpdmE1EH/reprex34736a6dec94"
#> [1] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/RtmpdmE1EH/reprex34736a6dec94"
#> [1] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/RtmpdmE1EH/reprex34736a6dec94"
#> [2] "/private/var/folders/8l/fhmv3yj12kncddcjqwc19tkr0000gr/T/RtmpdmE1EH/reprex34736a6dec94" Created on 2019-08-03 by the reprex package (v0.3.0) |
We could probably work around that by converting paths to absolute paths at some point and not use |
I was going to say have you considered If you can define a working dir variable in package scope it should get passed to a child process in which it is referenced by default with some caveats it can also be done explicitly see tmpDir <- tempdir()
get_wd_from_temp_dir <- function() {
future.apply::future_sapply(1:2, function(x) {
withr::with_dir(tmpDir, {
print(getwd())
})
})
}
future::plan(future::sequential)
get_wd_from_temp_dir()
#> [1] "/tmp/RtmpfOVKI1"
#> [1] "/tmp/RtmpfOVKI1"
#> [1] "/tmp/RtmpfOVKI1" "/tmp/RtmpfOVKI1"
future::plan(future::multiprocess)
#> Warning: [ONE-TIME WARNING] Forked processing ('multicore') is disabled
#> in future (>= 1.13.0) when running R from RStudio, because it is
#> considered unstable. Because of this, plan("multicore") will fall
#> back to plan("sequential"), and plan("multiprocess") will fall back to
#> plan("multisession") - not plan("multicore") as in the past. For more
#> details, how to control forked processing or not, and how to silence this
#> warning in future R sessions, see ?future::supportsMulticore
get_wd_from_temp_dir()
#> [1] "/tmp/RtmpfOVKI1"
#> [1] "/tmp/RtmpfOVKI1"
#> [1] "/tmp/RtmpfOVKI1" "/tmp/RtmpfOVKI1" Created on 2019-08-25 by the reprex package (v0.3.0) |
I opened an issue: https://github.com/HenrikBengtsson/future.apply/issues/50 |
Regarding parallel backends: Consider the callr backend. It solves many issues which multicore/multisession face. |
As @RichardJActon suggests in #277 (comment), you could do: withr::with_dir(tempdir(), {
pwd <- getwd()
future.apply::future_sapply(1:2, function(x) {
setwd(pwd)
print(getwd())
})
}) This will work with all future backends that run on the localhost (e.g. You can robustify this and provide more informative error messages by using: get_wd <- function() {
structure(getwd(), hostname = Sys.info()[["nodename"]])
}
set_wd <- function(pwd) {
target_hostname <- attr(pwd, "hostname")
if (is.null(target_hostname)) {
stop("Cannot change directory. Argument 'pwd' lacks attribute 'hostname'.")
}
hostname <- Sys.info()[["nodename"]]
if (!identical(target_hostname, hostname)) {
stop(sprintf("Cannot change directory on %s. Target directory %s is on another machine (%s).", sQuote(hostname), sQuote(pwd), target_hostname))
}
if (!utils::file_test("-d", pwd)) {
stop(sprintf("No such directory on %s: %s", sQuote(hostname), sQuote(pwd)))
}
setwd(pwd)
} and then withr::with_dir(tempdir(), {
pwd <- get_wd()
future.apply::future_sapply(1:2, function(x) {
set_wd(pwd)
print(getwd())
})
}) e.g.
For a full explanation, see futureverse/future#363 (comment). |
Thanks for taking time fur such a detailed answer @HenrikBengtsson. As of now,
To make styling work with remote workers, I think we had to change this to:
I think I don't like the fact that we can write files only at the very end of the process (please correct me if this assumption is wrong). I think it's nice when you style many files and you see progress (on the console and with
This essentially would boil down to what @HenrikBengtsson suggested in #277 (comment). Just one edge case: What if a remote machine has the same node name and the directory to change to exists, will it fail or not? Because I think it should fail. Otherwise, it will have side effects on the remote machine (i.e. styling the files there instead of the localhost I think). |
No. The reason is that such features need to come in at the design level (as I mention in the corresponding future issue). That is major work to solve. I don't want to open up for half-baked thing before because then we're ending up with too many hacks and a dead end in the long run.
That's completely different and definitely not meant for others to use.
You'd need to add even more protections, e.g. write a unique file on master and verify that the worker sees it. |
I just tried a quick implementation of this in my fork https://github.com/RichardJActon/styler passing the directory to the workers and changing this line: https://github.com/RichardJActon/styler/blob/4a28e70617aaafbe9713c465983e562d9da8ee5f/R/transform-files.R#L23 to the furrr version which seems to work fine. The progress only shows up in chunks as the workers return though. |
Nice. Interested in a PR? Yeah I think that's not so desirable. Any idea how to fix it? Also, why |
Just my 2 cents while watching: {furrr} hasn't seen a commit in 2 years and I'd rely on {future.apply}, especially for pkg use. |
Using
|
Some new updates to {progressr} that are relevant here too: https://twitter.com/henrikbengtsson/status/1260336782421323777 |
@krlmlr any progress on that with the {callr} approach? |
Not yet. |
Functions like
style_pkg()
could be parallelised well, at least in principle - and speed up styling much.For pre-commit hook, we should probably disable this.
Edit: a related issue is caching of results to the end of improving speed. See #320.
The text was updated successfully, but these errors were encountered: