Dynamic branching and file_out() directories #1141
Comments
Oh interesting, that seems risky then. I'll take your suggestion in #1140 and output to a single directory instead of multiple, but this could still result in problems. What do you think is the best way around this if your files need to be written? Aggregate first, and then create a function that writes the files by iterating over the aggregated target? Would this solve it? |
Maybe something like this?
|
I realize that won't work... as the output of |
#1141 (comment) is on the right track. Those dynamic targets should do the expensive work behind the files but not write the files themselves. Then a downstream target writes all the files. |
I can't quite get it to work, but if you have suggestions, let me know!
|
The
plan <- drake_plan(
  settings = c(1, 2, 3, 4),
  datasets = target(
    do_expensive_computation(settings),
    dynamic = map(settings)
  ),
  external_files = target(
    write_all_datasets_quickly(datasets, settings)
  )
)
Or if everything is fast, maybe just
plan <- drake_plan(
  files = c(1, 2, 3, 4),
  external_files = target(
    write_all_files_quickly(files)
  )
) |
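A minimal sketch of what a helper like write_all_datasets_quickly() might look like, assuming each element of datasets is a character vector and settings supplies the file stems. The function body, the out_dir argument, and the .txt extension are assumptions made for illustration; none of them come from drake or from the plan above.

```r
# Hypothetical helper: write every dataset to its own text file in one directory.
# `datasets` is assumed to hold one character value per element of `settings`.
write_all_datasets_quickly <- function(datasets, settings, out_dir = "all_datasets") {
  stopifnot(length(datasets) == length(settings))
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  # One path per setting, e.g. all_datasets/1.txt
  paths <- file.path(out_dir, paste0(settings, ".txt"))
  for (i in seq_along(paths)) {
    writeLines(as.character(datasets[[i]]), paths[i])
  }
  # Return the paths invisibly so a caller can inspect what was written.
  invisible(paths)
}

# Usage outside of drake, with made-up data and a temporary directory:
out_dir <- file.path(tempdir(), "all_datasets")
paths <- write_all_datasets_quickly(
  datasets = list("a", "b", "c", "d"),
  settings = c(1, 2, 3, 4),
  out_dir = out_dir
)
```

Because all of the writing happens in one ordinary function call, the dynamic sub-targets only need to return values, and the files are produced by a single downstream target.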
Thanks for the suggestion, in your example I guess you would just add a I'm just having a hard time with the
|
Yes, I agree.
What version of |
I just updated, here's the results:
|
I'll just add one more bit of code that could help:
|
I think that is the right approach. I am getting different results than you, but I think it is the right way to go.
library(drake)
dir.create("all_figures")
# Each sub-target returns the content for one file.
make_lines <- function() {
  "lines"
}
# Write content[[i]] to <dir>/<file_names[i]>.txt.
write_file <- function(content, file_names, dir) {
  for (i in seq_along(file_names)) {
    path <- paste0(dir, "/", file_names[i], ".txt")
    writeLines(content[[i]], path)
  }
}
plan <- drake_plan(
  file_names = c(1L, 2L, 3L, 4L),
  lines = target(
    make_lines(), dynamic = map(file_names)
  ),
  # A single downstream target writes every file into the file_out() directory.
  write_lines = write_file(lines, file_names, dir = file_out("all_figures"))
)
make(plan)
#> target file_names
#> dynamic lines
#> subtarget lines_0b3474bd
#> subtarget lines_b2a5c9b8
#> subtarget lines_71f311ad
#> subtarget lines_98cf3c11
#> aggregate lines
#> target write_lines
readLines("all_figures/1.txt")
#> [1] "lines"
make(plan)
#> All targets are already up to date.
write_file(readd(lines), readd(file_names), "all_figures")
readLines("all_figures/1.txt")
#> [1] "lines"
make(plan)
#> All targets are already up to date.
Created on 2020-01-21 by the reprex package (v0.3.0) |
Strange, but I think I found something that could help. I just tested your example on a Linux server and I got the same output as you. Previously I was just testing this simple case on my Mac in RStudio. I don't run anything on my Mac so it's I consider this issue solved for me. Are you aware of performance differences between the two platforms? If it's useful, on my Mac I am running |
I have noticed the occasional odd performance difference. Oddly enough,
proffer::pprof({
  make_lines <- function() {...}
  plan <- drake_plan(...)
  make(plan)
}) |
Thanks for the tip. Coincidentally, I just happened on another case where functionality is different on Mac vs Linux:
|
Would you open new issues for follow-up questions like this? I feel like we went off topic a few times. Do you have
library(drake)
library(tibble)
create_list <- function(i, a, b, c) {
  return(list(a = a, b = b, c = c))
}
plan <- drake_plan(
  run_list = tibble(
    i = 1:10,
  ),
  output = target(
    create_list(run_list$i, "a", "b", "c"), dynamic = map(run_list)
  )
)
make(plan)
#> In drake, consider r_make() instead of make(). r_make() runs make() in a fresh R session for enhanced robustness and reproducibility.
#> target run_list
#> dynamic output
#> subtarget output_982d7b12
#> subtarget output_2c5f0700
#> subtarget output_623f5045
#> subtarget output_4dd68315
#> subtarget output_1ebcff01
#> subtarget output_346c3e69
#> subtarget output_f1d97afc
#> subtarget output_d480acf2
#> subtarget output_94395b4c
#> subtarget output_da98694d
#> aggregate output
str(readd(output))
#> List of 30
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
#> $ a: chr "a"
#> $ b: chr "b"
#> $ c: chr "c"
Created on 2020-01-21 by the reprex package (v0.3.0)
Because dynamic targets are
library(drake)
library(tibble)
create_list <- function(i, a, b, c) {
  return(list(a = a, b = b, c = c))
}
plan <- drake_plan(
  run_list = tibble(
    i = 1:10,
  ),
  output = target(
    create_list(run_list$i, "a", "b", "c"), dynamic = map(run_list)
  )
)
make(plan)
#> target run_list
#> dynamic output
#> subtarget output_982d7b12
#> subtarget output_2c5f0700
#> subtarget output_623f5045
#> subtarget output_4dd68315
#> subtarget output_1ebcff01
#> subtarget output_346c3e69
#> subtarget output_f1d97afc
#> subtarget output_d480acf2
#> subtarget output_94395b4c
#> subtarget output_da98694d
#> aggregate output
str(readd(output, subtarget_list = TRUE))
#> List of 10
#> $ output_982d7b12:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_2c5f0700:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_623f5045:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_4dd68315:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_1ebcff01:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_346c3e69:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_f1d97afc:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_d480acf2:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_94395b4c:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ output_da98694d:List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
Created on 2020-01-21 by the reprex package (v0.3.0)
Either that or wrap each sub-target's value in another list.
library(drake)
library(tibble)
create_list <- function(i, a, b, c) {
  return(list(list(a = a, b = b, c = c)))
}
plan <- drake_plan(
  run_list = tibble(
    i = 1:10,
  ),
  output = target(
    create_list(run_list$i, "a", "b", "c"), dynamic = map(run_list)
  )
)
make(plan)
#> target run_list
#> dynamic output
#> subtarget output_982d7b12
#> subtarget output_2c5f0700
#> subtarget output_623f5045
#> subtarget output_4dd68315
#> subtarget output_1ebcff01
#> subtarget output_346c3e69
#> subtarget output_f1d97afc
#> subtarget output_d480acf2
#> subtarget output_94395b4c
#> subtarget output_da98694d
#> aggregate output
str(readd(output))
#> List of 10
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
#> $ :List of 3
#> ..$ a: chr "a"
#> ..$ b: chr "b"
#> ..$ c: chr "c"
Created on 2020-01-21 by the reprex package (v0.3.0) |
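The difference between the flat "List of 30" and the nested "List of 10" shown above can be reproduced by analogy in base R. This sketch uses c() to stand in for the aggregation step, which is an assumption made purely for illustration; drake's actual aggregation may differ in detail.

```r
# One sub-target value, shaped like the return value of create_list() above.
value <- list(a = "a", b = "b", c = "c")

# Concatenating ten such lists flattens them: 10 x 3 = 30 top-level elements.
flat <- do.call(c, rep(list(value), 10))
length(flat)

# Wrapping each value in another list keeps one top-level element per sub-target.
nested <- do.call(c, rep(list(list(value)), 10))
length(nested)
```

The extra list() layer is what the third example above adds inside create_list(); readd(output, subtarget_list = TRUE) gets a similar per-sub-target structure without changing the function.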
Yes I will open a new issue - sorry! |
Update: #1178 can combine dynamic branching with dynamic files. |
EDIT
Best solution yet: #1178
Description
When a file_out() directory is used in combination with dynamic branching, it is possible to fool drake into accepting an old set of files. file_out() is unique among all the triggers because it is part of the output of a target, not the input. That means that any time a target gets built, all its file_out() files are automatically declared up to date. I believe this is what causes the trouble.
There may be clever ways to work with hashes to invalidate sub-targets more readily. However, it is tricky because, again, a file_out() is an end product, and dynamic branching requires we compute sub-target hashes before building the sub-targets (the hash is part of the name).
I propose we throw an error if there are file_out() files for dynamic targets.
Related: #1140
Reproducible example
Created on 2020-01-20 by the reprex package (v0.3.0)
Expected result
readLines("all_figures/3") should return "lines2". Better to avoid file_out() + dynamic branching entirely.