Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve targets when changing plan's parameters - is it possible? #935

Closed
3 tasks
nettoyoussef opened this issue Jul 10, 2019 · 7 comments
Closed
3 tasks

Comments

@nettoyoussef
Copy link

Prework

Question

Is it possible to preserve targets already built when changing the call of parameters using Drake?

A small example below. The matrix with ncol=5 is rebuilt when adding this parameter to the transform argument:

library(storr)
library(drake)

cache <- storr_rds(paste0(tempdir(), "/.drake"), compress = F)

# Without mapping the parameter ncol
plan <- drake_plan(
  
  temp = target(
                matrix(runif(10^3)+t, ncol = 5),
                transform = map(t = c(2,5))
        )
  )

# Configure plan
config<- drake_config(plan,
                      cache = cache
                      )

#Execute plan
make(config = config)

#Targets updated
vis_drake_graph(config)

#Mapping the parameter ncol
plan <- drake_plan(
  
  temp = target(
                matrix(runif(10^3)+t, ncol = ncol),
                transform = cross(t = c(2,5), 
                                  ncol = c(5, 2)
                                 )
        )
  )

#Configure new plan
config<- drake_config(plan,
                      cache = cache
                      )

#Previous built targets appear as outdated
vis_drake_graph(config)
@wlandau
Copy link
Member

wlandau commented Jul 10, 2019

This happens because the names of the dependencies change when you switch map() to cross().

library(drake)

plan <- drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + t, ncol = 5),
    transform = map(t = c(2, 5))
  )
)

config <- drake_config(plan)
make(plan)
#> target temp_2
#> target temp_5
outdated(config)
#> character(0)

plan <- drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + t, ncol = ncol),
    transform = cross(
      t = c(2, 5), 
      ncol = c(5, 2)
    )
  )
)

config <- drake_config(plan)
outdated(config)
#> [1] "temp_2_2" "temp_2_5" "temp_5_2" "temp_5_5"

Created on 2019-07-10 by the reprex package (v0.3.0)

If your original targets had started with "temp_5", they might have stayed up to date.

library(drake)

plan <- drake_plan(
  temp_5 = target(
    matrix(runif(10 ^ 3) + t, ncol = 5),
    transform = map(t = c(2, 5))
  )
)

config <- drake_config(plan)
make(plan)
#> target temp_5_2
#> target temp_5_5
outdated(config)
#> character(0)

plan <- drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + t, ncol = ncol),
    transform = cross(
      t = c(2, 5), 
      ncol = c(5, 2),
      .id = c(ncol, t) # Switch the order of ncol and t in the names.
    )
  )
)

config <- drake_config(plan)
outdated(config)
#> [1] "temp_2_2" "temp_2_5"

Created on 2019-07-10 by the reprex package (v0.3.0)

There is currently no way to rename a target in drake itself, but here is a workaround. One caveat: the renamed targets will be up to date, but not any downstream targets that depend on them.

library(drake)

plan <- drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + t, ncol = 5),
    transform = map(t = c(2, 5))
  )
)

config <- drake_config(plan)
make(plan)
#> target temp_2
#> target temp_5
outdated(config)
#> character(0)

plan <- drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + t, ncol = ncol),
    transform = cross(
      ncol = c(5, 2), # Let's switch the order of ncol and t.
      t = c(2, 5)
    )
  )
)

config <- drake_config(plan)

# ncol is the first index.
outdated(config)
#> [1] "temp_2_2" "temp_2_5" "temp_5_2" "temp_5_5"

# Let's rename the original targets.
for (ns in c("objects", "meta")) {
  config$cache$duplicate("temp_2", "temp_5_2", namespace = ns)
  config$cache$duplicate("temp_5", "temp_5_5", namespace = ns)
}

outdated(config)
#> [1] "temp_2_2" "temp_2_5"
make(plan)
#> target temp_2_2
#> target temp_2_5

Created on 2019-07-10 by the reprex package (v0.3.0)

@wlandau
Copy link
Member

wlandau commented Jul 10, 2019

It's not an ideal solution because of the issue with downstream targets, but I think it would be worth wrapping this up in a rename_targets() function.

@wlandau
Copy link
Member

wlandau commented Jul 10, 2019

duplicate_targets() could be a better name because both names become valid. (NB because of the way storr is organized, this does not actually duplicate the data.)

@wlandau
Copy link
Member

wlandau commented Jul 10, 2019

Hmm... but the next make() on the new target updates the new name and invalidates the old one. Going with rename_targets().

@nettoyoussef
Copy link
Author

nettoyoussef commented Jul 10, 2019

Hi Will,

Maybe instead of using the names drake would just compare the calls:

# Without mapping the parameter ncol
plan <- drake_plan(
  
  temp = target(
                matrix(runif(10^3)+t, ncol = 5),
                transform = map(t = c(2,5))
        )
  )

#Mapping the parameter ncol
plan2 <- drake_plan(
  
  temp = target(
                matrix(runif(10^3)+t, ncol = ncol),
                transform = cross(t = c(2,5), 
                                  ncol = c(5, 2)
                                 )
        )
  )

plan2$command %in% plan$command
TRUE  TRUE FALSE FALSE

The command column of the plan is identical, so in this case, it would work.
Better yet, if it separates the arguments to the call in a pairlist the argument order wouldn't matter.

Since the code in a reproducible framework tends to be deterministic, if the functions didn't change, the number of operations didn't change, and the calls didn't change, the results should be the same. In pseudo-code:

if( hash_function_up_to_target_not_changed = T) {
  
  if(sequence_of_calls_not_changed = T){
 
    if( call_function_not_changed = T){
    
       target <- "up_to_date"
   }
 }
}

So, this is maybe a way to prevent rebuilding targets just because the hyperparameters in the calls of cross or map changed.

There is currently no way to rename a target in drake itself, but here is a workaround. One caveat: the renamed targets will be up to date, but not any downstream targets that depend on them.

drake name scheming is great for short plans, but for large ones, the names for vis_drake_graph become unreadable. Is there a way to specifying another name scheme in drake_plan?

For example:

# Name is created in a agglomerative manner and 
# becomes very long for long plans
plan <- drake_plan(
  
  temp = target(
                matrix(runif(10^3)+t, ncol = ncol),
                transform = cross(t = c(2,5), 
                                  ncol = !!c(5:10)
                                 )
        )
  
  ,new_temp = target(
                    temp+t,
                transform = map(temp, 
                                t = !!c(2,5) 
                                 )
        )
    
    
  )

#Maybe instead of:
print("new_temp_2_temp_2_5L")

#Just some:
print("new_temp_1")

#For long plans? 

Maybe some numbering from oldest to newest, so targets didn't get renamed with changes in their parameter order?

Just some ideas I had thinking about this.

@nettoyoussef nettoyoussef changed the title Preserve parameter when changing plan - is it possible? Preserve targets when changing plan's parameters - is it possible? Jul 10, 2019
@wlandau
Copy link
Member

wlandau commented Jul 10, 2019

Maybe instead of using the names drake would just compare the calls...

As you point out later, commands are not the only things that can invalidate targets.

drake/R/exec-triggers.R

Lines 213 to 245 in 03c3d40

should_build_target <- function(target, meta, config) {
if (meta$imported) {
return(TRUE)
}
if (meta$missing) {
return(TRUE)
}
condition <- condition_trigger(target = target, meta = meta, config = config)
if (is.logical(condition)) {
return(condition)
}
if (identical(meta$trigger$command, TRUE)) {
if (command_trigger(target = target, meta = meta, config = config)) {
return(TRUE)
}
}
if (identical(meta$trigger$depend, TRUE)) {
if (depend_trigger(target = target, meta = meta, config = config)) {
return(TRUE)
}
}
if (identical(meta$trigger$file, TRUE)) {
if (file_trigger(target = target, meta = meta, config = config)) {
return(TRUE)
}
}
if (!is.null(meta$trigger$change)) {
if (change_trigger(target = target, meta = meta, config = config)) {
return(TRUE)
}
}
FALSE
}

For more information, see https://ropenscilabs.github.io/drake-manual/triggers.html.

I think it is important to decouple a target's name from all of these triggers, including the command, and I am not sure how else we could create an internal target name that is more stable than the human-readable one in the plan. (For hashes, we need the target's value, which may not exist yet.)

drake name scheming is great for short plans, but for large ones, the names for vis_drake_graph become unreadable. Is there a way to specifying another name scheme in drake_plan?

Yes. In transformations, you can use the .id argument to select which grouping variables become part of the names. Examples:

library(drake)

drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + offset, ncol = ncol),
    transform = cross(
      offset = c(2, 5), 
      ncol = !!seq_len(2)
    )
  ),
  new_temp = target(
    temp + offset,
    transform = map(
      temp, 
      offset = !!c(2, 5),
      .id = c(offset, ncol)
    )
  )
)
#> # A tibble: 8 x 2
#>   target        command                           
#>   <chr>         <expr>                            
#> 1 temp_2_1L     matrix(runif(10^3) + 2, ncol = 1L)
#> 2 temp_5_1L     matrix(runif(10^3) + 5, ncol = 1L)
#> 3 temp_2_2L     matrix(runif(10^3) + 2, ncol = 2L)
#> 4 temp_5_2L     matrix(runif(10^3) + 5, ncol = 2L)
#> 5 new_temp_2_1L temp_2_1L + 2                     
#> 6 new_temp_5_1L temp_5_1L + 5                     
#> 7 new_temp_2_2L temp_2_2L + 2                     
#> 8 new_temp_5_2L temp_5_2L + 5

drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + offset, ncol = ncol),
    transform = cross(
      offset = c(2, 5), 
      ncol = !!seq_len(2),
      .id = FALSE
    )
  ),
  new_temp = target(
    temp + offset,
    transform = map(
      temp, 
      offset = !!c(2, 5),
      .id = FALSE
    )
  )
)
#> # A tibble: 8 x 2
#>   target     command                           
#>   <chr>      <expr>                            
#> 1 temp       matrix(runif(10^3) + 2, ncol = 1L)
#> 2 temp_2     matrix(runif(10^3) + 5, ncol = 1L)
#> 3 temp_3     matrix(runif(10^3) + 2, ncol = 2L)
#> 4 temp_4     matrix(runif(10^3) + 5, ncol = 2L)
#> 5 new_temp   temp + 2                          
#> 6 new_temp_2 temp_2 + 5                        
#> 7 new_temp_3 temp_3 + 2                        
#> 8 new_temp_4 temp_4 + 5

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
settings <- expand.grid(offset = c(2, 5), ncol = seq_len(2)) %>%
  mutate(id = offset + ncol)

drake_plan(
  temp = target(
    matrix(runif(10 ^ 3) + offset, ncol = ncol),
    transform = map(.data = !!settings, .id = id)
  ),
  new_temp = target(
    temp + offset,
    transform = map(temp, .id = id)
  )
)
#> # A tibble: 8 x 2
#>   target     command                           
#>   <chr>      <expr>                            
#> 1 temp_3     matrix(runif(10^3) + 2, ncol = 1L)
#> 2 temp_6     matrix(runif(10^3) + 5, ncol = 1L)
#> 3 temp_4     matrix(runif(10^3) + 2, ncol = 2L)
#> 4 temp_7     matrix(runif(10^3) + 5, ncol = 2L)
#> 5 new_temp_3 temp_3 + 2                        
#> 6 new_temp_6 temp_6 + 5                        
#> 7 new_temp_4 temp_4 + 2                        
#> 8 new_temp_7 temp_7 + 5

Created on 2019-07-10 by the reprex package (v0.3.0)

@wlandau
Copy link
Member

wlandau commented Jul 17, 2019

Random number generator seeds are problematic when it comes to renaming targets. Sorry, but now that I know this, I strongly discourage renaming targets. In fact, I plan to remove the rename_targets() function, re #946.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants