-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve targets when changing plan's parameters - is it possible? #935
Comments
This happens because the names of the dependencies change when you switch library(drake)
plan <- drake_plan(
temp = target(
matrix(runif(10 ^ 3) + t, ncol = 5),
transform = map(t = c(2, 5))
)
)
config <- drake_config(plan)
make(plan)
#> target temp_2
#> target temp_5
outdated(config)
#> character(0)
plan <- drake_plan(
temp = target(
matrix(runif(10 ^ 3) + t, ncol = ncol),
transform = cross(
t = c(2, 5),
ncol = c(5, 2)
)
)
)
config <- drake_config(plan)
outdated(config)
#> [1] "temp_2_2" "temp_2_5" "temp_5_2" "temp_5_5" Created on 2019-07-10 by the reprex package (v0.3.0) If your original targets had started with "temp_5", they might have stayed up to date. library(drake)
plan <- drake_plan(
temp_5 = target(
matrix(runif(10 ^ 3) + t, ncol = 5),
transform = map(t = c(2, 5))
)
)
config <- drake_config(plan)
make(plan)
#> target temp_5_2
#> target temp_5_5
outdated(config)
#> character(0)
plan <- drake_plan(
temp = target(
matrix(runif(10 ^ 3) + t, ncol = ncol),
transform = cross(
t = c(2, 5),
ncol = c(5, 2),
.id = c(ncol, t) # Switch the order of ncol and t in the names.
)
)
)
config <- drake_config(plan)
outdated(config)
#> [1] "temp_2_2" "temp_2_5" Created on 2019-07-10 by the reprex package (v0.3.0) There is currently no way to rename a target in library(drake)
plan <- drake_plan(
temp = target(
matrix(runif(10 ^ 3) + t, ncol = 5),
transform = map(t = c(2, 5))
)
)
config <- drake_config(plan)
make(plan)
#> target temp_2
#> target temp_5
outdated(config)
#> character(0)
plan <- drake_plan(
temp = target(
matrix(runif(10 ^ 3) + t, ncol = ncol),
transform = cross(
ncol = c(5, 2), # Let's switch the order of ncol and t.
t = c(2, 5)
)
)
)
config <- drake_config(plan)
# ncol is the first index.
outdated(config)
#> [1] "temp_2_2" "temp_2_5" "temp_5_2" "temp_5_5"
# Let's rename the original targets.
for (ns in c("objects", "meta")) {
config$cache$duplicate("temp_2", "temp_5_2", namespace = ns)
config$cache$duplicate("temp_5", "temp_5_5", namespace = ns)
}
outdated(config)
#> [1] "temp_2_2" "temp_2_5"
make(plan)
#> target temp_2_2
#> target temp_2_5 Created on 2019-07-10 by the reprex package (v0.3.0) |
It's not an ideal solution because of the issue with downstream targets, but I think it would be worth wrapping this up in a |
|
Hmm... but the next |
Hi Will, Maybe instead of using the names # Without mapping the parameter ncol
plan <- drake_plan(
temp = target(
matrix(runif(10^3)+t, ncol = 5),
transform = map(t = c(2,5))
)
)
#Mapping the parameter ncol
plan2 <- drake_plan(
temp = target(
matrix(runif(10^3)+t, ncol = ncol),
transform = cross(t = c(2,5),
ncol = c(5, 2)
)
)
)
plan2$command %in% plan$command
TRUE TRUE FALSE FALSE The Since the code in a reproducible framework tends to be deterministic, if the functions didn't change, the number of operations didn't change, and the calls didn't change, the results should be the same. In pseudo-code: if( hash_function_up_to_target_not_changed = T) {
if(sequence_of_calls_not_changed = T){
if( call_function_not_changed = T){
target <- "up_to_date"
}
}
} So, this is maybe a way to prevent rebuilding targets just because the hyperparameters in the calls of cross or map changed.
For example: # Name is created in a agglomerative manner and
# becomes very long for long plans
plan <- drake_plan(
temp = target(
matrix(runif(10^3)+t, ncol = ncol),
transform = cross(t = c(2,5),
ncol = !!c(5:10)
)
)
,new_temp = target(
temp+t,
transform = map(temp,
t = !!c(2,5)
)
)
)
#Maybe instead of:
print("new_temp_2_temp_2_5L")
#Just some:
print("new_temp_1")
#For long plans? Maybe some numbering from oldest to newest, so targets didn't get renamed with changes in their parameter order? Just some ideas I had thinking about this. |
As you point out later, commands are not the only things that can invalidate targets. Lines 213 to 245 in 03c3d40
For more information, see https://ropenscilabs.github.io/drake-manual/triggers.html. I think it is important to decouple a target's name from all of these triggers, including the command, and I am not sure how else we could create an internal target name that is more stable than the human-readable one in the plan. (For hashes, we need the target's value, which may not exist yet.)
Yes. In transformations, you can use the library(drake)
drake_plan(
temp = target(
matrix(runif(10 ^ 3) + offset, ncol = ncol),
transform = cross(
offset = c(2, 5),
ncol = !!seq_len(2)
)
),
new_temp = target(
temp + offset,
transform = map(
temp,
offset = !!c(2, 5),
.id = c(offset, ncol)
)
)
)
#> # A tibble: 8 x 2
#> target command
#> <chr> <expr>
#> 1 temp_2_1L matrix(runif(10^3) + 2, ncol = 1L)
#> 2 temp_5_1L matrix(runif(10^3) + 5, ncol = 1L)
#> 3 temp_2_2L matrix(runif(10^3) + 2, ncol = 2L)
#> 4 temp_5_2L matrix(runif(10^3) + 5, ncol = 2L)
#> 5 new_temp_2_1L temp_2_1L + 2
#> 6 new_temp_5_1L temp_5_1L + 5
#> 7 new_temp_2_2L temp_2_2L + 2
#> 8 new_temp_5_2L temp_5_2L + 5
drake_plan(
temp = target(
matrix(runif(10 ^ 3) + offset, ncol = ncol),
transform = cross(
offset = c(2, 5),
ncol = !!seq_len(2),
.id = FALSE
)
),
new_temp = target(
temp + offset,
transform = map(
temp,
offset = !!c(2, 5),
.id = FALSE
)
)
)
#> # A tibble: 8 x 2
#> target command
#> <chr> <expr>
#> 1 temp matrix(runif(10^3) + 2, ncol = 1L)
#> 2 temp_2 matrix(runif(10^3) + 5, ncol = 1L)
#> 3 temp_3 matrix(runif(10^3) + 2, ncol = 2L)
#> 4 temp_4 matrix(runif(10^3) + 5, ncol = 2L)
#> 5 new_temp temp + 2
#> 6 new_temp_2 temp_2 + 5
#> 7 new_temp_3 temp_3 + 2
#> 8 new_temp_4 temp_4 + 5
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
settings <- expand.grid(offset = c(2, 5), ncol = seq_len(2)) %>%
mutate(id = offset + ncol)
drake_plan(
temp = target(
matrix(runif(10 ^ 3) + offset, ncol = ncol),
transform = map(.data = !!settings, .id = id)
),
new_temp = target(
temp + offset,
transform = map(temp, .id = id)
)
)
#> # A tibble: 8 x 2
#> target command
#> <chr> <expr>
#> 1 temp_3 matrix(runif(10^3) + 2, ncol = 1L)
#> 2 temp_6 matrix(runif(10^3) + 5, ncol = 1L)
#> 3 temp_4 matrix(runif(10^3) + 2, ncol = 2L)
#> 4 temp_7 matrix(runif(10^3) + 5, ncol = 2L)
#> 5 new_temp_3 temp_3 + 2
#> 6 new_temp_6 temp_6 + 5
#> 7 new_temp_4 temp_4 + 2
#> 8 new_temp_7 temp_7 + 5 Created on 2019-07-10 by the reprex package (v0.3.0) |
Random number generator seeds are problematic when it comes to renaming targets. Sorry, but now that I know this, I strongly discourage renaming targets. In fact, I plan to remove the |
Prework
drake
's code of conduct.drake-r-package
tag.Question
Is it possible to preserve targets already built when changing the call of parameters using Drake?
A small example below. The matrix with
ncol=5
is rebuilt when adding this parameter to thetransform
argument:The text was updated successfully, but these errors were encountered: