Gather without loading all dependencies at the same time #325
Thanks for bringing this up, @bart1. It can be tricky to set up workflows when you cannot load everything into memory. In your case, I would recommend that you not gather these massive simulation objects. Even if
library(drake)
library(magrittr)
sim_fun <- function(rep, ...) {
data.frame(x = rnorm(25), y = rnorm(25))
}
save_plot <- function(data, file) {
pdf(file)
plot(y ~ x, data = data)
dev.off()
}
plan <- drake_plan(
  sim = sim_fun("REP"),
  save_plot(sim_REP, file_out("sim_REP.pdf")),
  strings_in_dots = "literals"
) %>%
  evaluate_plan(wildcard = "REP", values = paste0("rep", 1:5)) %>%
  print
#> # A tibble: 10 x 2
#> target command
#> <chr> <chr>
#> 1 sim_rep1 "sim_fun(\"rep1\")"
#> 2 sim_rep2 "sim_fun(\"rep2\")"
#> 3 sim_rep3 "sim_fun(\"rep3\")"
#> 4 sim_rep4 "sim_fun(\"rep4\")"
#> 5 sim_rep5 "sim_fun(\"rep5\")"
#> 6 "\"sim_rep1.pdf\"" "save_plot(sim_rep1, file_out(\"sim_rep1.pdf\"))"
#> 7 "\"sim_rep2.pdf\"" "save_plot(sim_rep2, file_out(\"sim_rep2.pdf\"))"
#> 8 "\"sim_rep3.pdf\"" "save_plot(sim_rep3, file_out(\"sim_rep3.pdf\"))"
#> 9 "\"sim_rep4.pdf\"" "save_plot(sim_rep4, file_out(\"sim_rep4.pdf\"))"
#> 10 "\"sim_rep5.pdf\"" "save_plot(sim_rep5, file_out(\"sim_rep5.pdf\"))"
vis_drake_graph(drake_config(plan))
make(plan)
#> target sim_rep1
#> target sim_rep2
#> target sim_rep3
#> target sim_rep4
#> target sim_rep5
#> target file "sim_rep1.pdf"
#> target file "sim_rep2.pdf"
#> target file "sim_rep3.pdf"
#> target file "sim_rep4.pdf"
#> target file "sim_rep5.pdf" |
Just as a small addition, with the
require(tabulizer)
gatherCmd <- data.frame(
  target = "'comb.pdf'",
  command = paste(
    "merge_pdfs(c(",
    paste('file_in("sim_rep', 1:5, '.pdf")', sep = "", collapse = ","),
    "), file_out('comb.pdf'))"
  )
)
plan <- rbind(plan, gatherCmd) |
Nice! Have you heard of |
That looks like a nice package; I have used some alternatives to it before. Just a quick example:
library(drake)
library(magrittr)
library(ggplot2)
require(patchwork)
sim_fun <- function(rep, ...) {
data.frame(x = rnorm(25), y = rnorm(25))
}
sims <- drake_plan(sim = sim_fun("REP"), strings_in_dots = "literals") %>%
  evaluate_plan(wildcard = "REP", values = paste0("rep", 1:5)) %>%
  print
plots <- drake_plan(plt = ggplot(data = dataset__, aes(x = x, y = y)) + geom_point()) %>%
  plan_analyses(sims)
comb <- data.frame(
  target = "'plot.pdf'",
  command = paste("ggsave(", paste(plots$target, collapse = " + "), ", file=file_out('plot.pdf'))")
)
plan <- rbind(sims, plots, comb)
vis_drake_graph(drake_config(plan))
make(plan)
Would there be an alternative to generating the command with paste, or is that the most efficient approach for the time being? |
Maybe a new function like
plots
## # A tibble: 5 x 2
## target command
## <chr> <chr>
## 1 plt_sim_rep1 ggplot(data = sim_rep1, aes(x = x, y = y)) + geom_point()
## 2 plt_sim_rep2 ggplot(data = sim_rep2, aes(x = x, y = y)) + geom_point()
## 3 plt_sim_rep3 ggplot(data = sim_rep3, aes(x = x, y = y)) + geom_point()
## 4 plt_sim_rep4 ggplot(data = sim_rep4, aes(x = x, y = y)) + geom_point()
## 5 plt_sim_rep5 ggplot(data = sim_rep5, aes(x = x, y = y)) + geom_point()
reduce_plan(plots, op = "+", target = "reduced_target")
## # A tibble: 1 x 2
## target command
## <chr> <chr>
## reduced_target plt_sim_rep1 + plt_sim_rep2 + plt_sim_rep3 + plt_sim_rep4 + plt_sim_rep5
I would definitely welcome a pull request with the implementation. |
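For illustration only, the idea of folding a plan's targets into one command can be sketched in base R without pasting text together. The name reduce_plan_sketch and its arguments are hypothetical and are not drake's API; the sketch assumes a plan is a data frame with target and command columns, as in the printed plans above.

```r
# Hypothetical helper, not part of drake: fold all targets of a plan into a
# single command joined by a binary operator such as "+".
reduce_plan_sketch <- function(plan, target = "reduced_target", op = "+") {
  symbols <- lapply(as.character(plan$target), as.symbol)
  # Build the nested call ((a op b) op c) ... as a language object.
  expr <- Reduce(function(lhs, rhs) call(op, lhs, rhs), symbols)
  data.frame(
    target = target,
    # Collapse in case deparse() splits a very long command into pieces.
    command = paste(deparse(expr, width.cutoff = 500L), collapse = " "),
    stringsAsFactors = FALSE
  )
}

# Stand-in for the `plots` plan above; only the target names matter here.
plots <- data.frame(
  target = paste0("plt_sim_rep", 1:5),
  command = "ggplot(...)",  # placeholder text, never evaluated
  stringsAsFactors = FALSE
)
reduced <- reduce_plan_sketch(plots)
reduced$command
```

Building the command as a call and deparsing it avoids quoting mistakes that are easy to make with nested paste calls, at the cost of being slightly less obvious to read.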
Update: development drake now has a much friendlier (experimental) API. It is now easier to gather by specific groups. Details: https://ropenscilabs.github.io/drake-manual/plans.html#create-large-plans-the-easy-way. |
Edit: the link changed to https://ropenscilabs.github.io/drake-manual/plans.html#large-plans. |
I'm encountering the following issue. I run simulations using drake. At the end of each simulation I get quite a big R6 object that I use drake to store. Afterwards I want to generate a single PDF with exploratory plots for each simulation. Currently I gather all simulations in a list with gather and then plot from this list. This does not scale well, because loading all simulations at the same time runs into memory limitations. It also makes me store all simulations twice in the cache: once in the list and once individually.
This is some example code:
I guess the same problem would occur when one fits a lot of models (e.g. the gsp example) and wants to use the default plot function on each model.
I wondered if I'm missing something or if this is not possible. An alternative approach would be to first reduce all the simulations (for example, by making plots with ggplot per simulation or by extracting the necessary data), gather these, and then plot them. This is not always an easy option when preexisting plot functions are used.
I wonder if it would be useful/possible to have some kind of recursive version of gather that loads the cached objects one by one.
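As a rough sketch of the "one by one" idea, and not an existing drake feature, the loop below writes one page per stored object into a single PDF while keeping only one object in memory at a time. The helper plot_one_at_a_time and its load_fun and plot_fun arguments are hypothetical, and plain .rds files stand in for the cache so the example runs with base R alone.

```r
# Hypothetical helper: plot many stored objects into one PDF while holding
# only one of them in memory at a time.
plot_one_at_a_time <- function(targets, file, load_fun, plot_fun) {
  pdf(file)
  on.exit(dev.off(), add = TRUE)
  for (target in targets) {
    obj <- load_fun(target)  # load a single object
    plot_fun(obj, target)    # draw one page for it
    rm(obj)                  # drop it before loading the next one
  }
  invisible(file)
}

# Stand-in storage (an assumption for this sketch): one .rds file per simulation.
store <- tempfile("sims")
dir.create(store)
targets <- paste0("sim_rep", 1:5)
for (target in targets) {
  saveRDS(
    data.frame(x = rnorm(25), y = rnorm(25)),
    file.path(store, paste0(target, ".rds"))
  )
}

out <- plot_one_at_a_time(
  targets,
  file = file.path(store, "all_sims.pdf"),
  load_fun = function(target) readRDS(file.path(store, paste0(target, ".rds"))),
  plot_fun = function(sim, target) plot(y ~ x, data = sim, main = target)
)
```

With drake, load_fun could presumably be something like function(target) readd(target, character_only = TRUE). Targets read by name in this way may not be registered as dependencies of the plotting step, so treat that as an assumption to verify.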