New function `reduce_plan()` #326

wlandau · 2018-03-15T19:00:06Z

From #325 (comment). Thanks to @bart1 for the idea.

bart1 · 2018-03-15T19:51:11Z

I like the idea. I was wondering, though, whether reduce is the right name for such a function, since to me reduce implies gathering iteratively, which I guess is what I was originally looking for in #325. I suppose this expectation also comes from base R's Reduce, where ?base::Reduce states:

 ‘Reduce’ uses a binary function to successively combine the elements of a given vector and a possibly given initial value.

Would another synonym of gather, e.g. aggregate or collect, not be a better name?
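For reference, here is a minimal base R illustration of the semantics being described. By default Reduce() folds from the left, combining the running result with the next element at each step. The second half is only an illustration: applying the same left fold to target names produces the kind of nested command string discussed later in this thread.

```r
# Reduce() folds left: ((1 + 2) + 3) + 4
running <- Reduce(`+`, 1:4, accumulate = TRUE)
print(running)  # 1 3 6 10: the result after each successive combination

# The same left fold applied to target names builds a nested command string
nested <- Reduce(
  function(acc, nxt) paste0("(", acc, " + ", nxt, ")"),
  c("x_1", "x_2", "x_3")
)
print(nested)  # "((x_1 + x_2) + x_3)"
```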

wlandau-lilly · 2018-03-15T20:00:48Z

For reduce_plan(), I am actually thinking of a couple of different user-side options. One is to combine everything in one command.

x_plan
## # A tibble: 8 x 2
##   target command
##   <chr>  <chr>  
## 1 x_1    1      
## 2 x_2    2      
## 3 x_3    3      
## 4 x_4    4      
## 5 x_5    5      
## 6 x_6    6      
## 7 x_7    7      
## 8 x_8    8

reduce_plan(x_plan, target = "x_sum")
## # A tibble: 1 x 2
##   target command                                      
##   <chr>  <chr>                                        
## 1 x_sum  x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8
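The single-command option amounts to collapsing all target names with the operator. The sketch below uses a plain data frame as a stand-in for a drake plan, and naive_reduce() is a hypothetical helper for illustration, not drake's implementation.

```r
# Hypothetical stand-in for a drake plan: a data frame with target/command columns
x_plan <- data.frame(
  target = paste0("x_", 1:8),
  command = as.character(1:8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one new target whose command combines every target at once
naive_reduce <- function(plan, target, op = " + ") {
  data.frame(
    target = target,
    command = paste(plan$target, collapse = op),
    stringsAsFactors = FALSE
  )
}

naive_reduce(x_plan, target = "x_sum")
```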

Another is to do a pairwise reduction, which could be parallelized with the jobs argument to make(). I think we should consider similar functionality for #233 (cc @krlmlr).

reduce_plan(x_plan, target = "x_sum", pairwise = TRUE)
## # A tibble: 7 x 2
##   target   command          
##   <chr>    <chr>            
## 1 x_sum_1  x_1 + x_2        
## 2 x_sum_2  x_3 + x_4        
## 3 x_sum_3  x_5 + x_6        
## 4 x_sum_4  x_7 + x_8        
## 5 x_sum_5  x_sum_1 + x_sum_2
## 6 x_sum_6  x_sum_3 + x_sum_4
## 7 x_sum    x_sum_5 + x_sum_6
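One way to generate this pairwise layout is a queue: pop the first two names, combine them into a new intermediate target, and push that target onto the back. The sketch below is an assumption about the approach, written in base R; pairwise_reduce() and its arguments are hypothetical and only mirror the begin/op/end naming used later in this thread. With an odd number of targets the leftover name simply waits in the queue until it is paired with an intermediate result.

```r
# Hypothetical sketch of a queue-based pairwise reduction (not drake's code)
pairwise_reduce <- function(targets, target, begin = "(", op = " + ", end = ")") {
  queue <- targets              # names still waiting to be combined
  new_targets <- character(0)
  commands <- character(0)
  i <- 0
  while (length(queue) > 1) {
    i <- i + 1
    pair <- queue[1:2]          # take the first two names
    queue <- queue[-(1:2)]
    # The combination that empties the queue gets the final target name
    name <- if (length(queue) == 0) target else paste0(target, "_", i)
    new_targets <- c(new_targets, name)
    commands <- c(commands, paste0(begin, pair[1], op, pair[2], end))
    queue <- c(queue, name)     # the new target rejoins at the back
  }
  data.frame(target = new_targets, command = commands, stringsAsFactors = FALSE)
}

# Reproduces the 7-row layout above
pairwise_reduce(paste0("x_", 1:8), target = "x_sum", begin = "", end = "")
```

A single input target produces an empty data frame here, which a real implementation would probably want to handle explicitly.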

I am not sure a pairwise gather_plan() is appropriate all the time. It could be useful for gather_plan(gather = "c"), but gather_plan(gather = "list") would return a (nearly) binary tree instead of a flat list.
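The difference is easy to see in base R: combining pairwise with c() flattens back to the same vector, while combining pairwise with list() nests the results.

```r
# Pairwise combination with c() flattens back to the same vector...
flat_c     <- c(1, 2, 3, 4)
pairwise_c <- c(c(1, 2), c(3, 4))
identical(flat_c, pairwise_c)       # TRUE

# ...but pairwise combination with list() nests, producing a tree
flat_list     <- list(1, 2, 3, 4)
pairwise_list <- list(list(1, 2), list(3, 4))
identical(flat_list, pairwise_list) # FALSE
length(pairwise_list)               # 2 branches, not 4 elements
```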

wlandau-lilly · 2018-03-15T21:04:29Z

Update: I implemented a new reduce_plan() function in the new i326 branch. It should allow you to do naive and pairwise reductions with binary operators and arbitrary functions that take at least two arguments. It seems to work for both even and odd numbers of targets, but I will need to add some tests before I do a PR and merge it. reduce_plan() is convenient and small enough that I think we can roll it into the CRAN release of 5.1.0 next week.

I really like this feature. It generalizes gather_plan() and helps you avoid memory issues and super long commands.

library(drake)
x_plan <- evaluate_plan(drake_plan(x = VALUE), wildcard = "VALUE", values = 1:9)
x_plan
#> # A tibble: 9 x 2
#>   target command
#>   <chr>  <chr>  
#> 1 x_1    1      
#> 2 x_2    2      
#> 3 x_3    3      
#> 4 x_4    4      
#> 5 x_5    5      
#> 6 x_6    6      
#> 7 x_7    7      
#> 8 x_8    8      
#> 9 x_9    9
reduce_plan(x_plan, target = "x_sum", begin = "", end = "")
#> # A tibble: 1 x 2
#>   target command                                                          
#>   <chr>  <chr>                                                            
#> 1 x_sum  x_1  +  x_2  +  x_3  +  x_4  +  x_5  +  x_6  +  x_7  +  x_8  +  …
reduce_plan(x_plan, target = "x_sum", pairwise = TRUE)
#> # A tibble: 8 x 2
#>   target  command            
#>   <chr>   <chr>              
#> 1 x_sum_1 (x_1 + x_2)        
#> 2 x_sum_2 (x_3 + x_4)        
#> 3 x_sum_3 (x_5 + x_6)        
#> 4 x_sum_4 (x_7 + x_8)        
#> 5 x_sum_5 (x_9 + x_sum_1)    
#> 6 x_sum_6 (x_sum_2 + x_sum_3)
#> 7 x_sum_7 (x_sum_4 + x_sum_5)
#> 8 x_sum   (x_sum_6 + x_sum_7)
reduce_plan(
  x_plan,
  target = "x_sum",
  pairwise = TRUE,
  begin = "fun(", op = ", ", 
  end = ")"
)
#> # A tibble: 8 x 2
#>   target  command              
#>   <chr>   <chr>                
#> 1 x_sum_1 fun(x_1, x_2)        
#> 2 x_sum_2 fun(x_3, x_4)        
#> 3 x_sum_3 fun(x_5, x_6)        
#> 4 x_sum_4 fun(x_7, x_8)        
#> 5 x_sum_5 fun(x_9, x_sum_1)    
#> 6 x_sum_6 fun(x_sum_2, x_sum_3)
#> 7 x_sum_7 fun(x_sum_4, x_sum_5)
#> 8 x_sum   fun(x_sum_6, x_sum_7)
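As a sanity check on the odd-count case, the base R sketch below evaluates the eight commands from the last output in order. Here fun is a hypothetical binary function (plain addition) and x_1, ..., x_9 hold the values 1 through 9 as in x_plan; this only checks the arithmetic and is not how make() runs a plan.

```r
# Commands copied from the output above (begin = "fun(", op = ", ", end = ")")
plan <- data.frame(
  target = c(paste0("x_sum_", 1:7), "x_sum"),
  command = c(
    "fun(x_1, x_2)", "fun(x_3, x_4)", "fun(x_5, x_6)", "fun(x_7, x_8)",
    "fun(x_9, x_sum_1)", "fun(x_sum_2, x_sum_3)", "fun(x_sum_4, x_sum_5)",
    "fun(x_sum_6, x_sum_7)"
  ),
  stringsAsFactors = FALSE
)

env <- new.env()
env$fun <- function(a, b) a + b   # hypothetical binary function
for (i in 1:9) assign(paste0("x_", i), i, envir = env)  # x_1 = 1, ..., x_9 = 9

# Rows are already in dependency order, so evaluate them top to bottom
for (row in seq_len(nrow(plan))) {
  value <- eval(parse(text = plan$command[row]), envir = env)
  assign(plan$target[row], value, envir = env)
}

get("x_sum", envir = env)  # 45, the same as sum(1:9)
```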

krlmlr · 2018-03-15T22:15:52Z

How is this different from pack() in #304?

bart1 · 2018-03-15T22:53:57Z

That looks really nice. A quick example shows it also works well for, e.g., combining ggplots:

library(drake)
library(patchwork) # lets `+` combine ggplot objects
library(ggplot2)
library(magrittr)

# Four replicates of a random scatterplot
plots <- drake_plan(
  plot = ggplot(data = data.frame(x = rnorm(10), y = rnorm(10))) +
    geom_point(aes(x = x, y = y))
) %>%
  expand_plan(c("rep1", "rep2", "rep3", "rep4"))

# Reduce the replicate plots into one combined target
plotsGathered <- reduce_plan(plots, target = "fullPlot", begin = "", end = "")

plan <- rbind(
  plots,
  plotsGathered,
  drake_plan(ggsave(filename = file_out("test.pdf"), plot = fullPlot))
)
drake_graph(drake_config(plan))
make(plan)

One thing I wonder about is that it currently applies begin and end a lot (once for each pair):

> x_plan <- evaluate_plan(drake_plan(x = VALUE), wildcard = "VALUE", values = 1:9)
> reduce_plan(x_plan, target = "x_sum")
# A tibble: 1 x 2
  target command                                                            
  <chr>  <chr>                                                              
1 x_sum  ((((((((x_1 + x_2) + x_3) + x_4) + x_5) + x_6) + x_7) + x_8) + x_9)

I was thinking it might be better to apply them only once when pairwise = FALSE. Operators work pairwise anyway, and if I'm right the nesting does not give any significant memory reduction, since x_1 through x_9 are all loaded at the same time before the command runs (so for memory savings one should use pairwise = TRUE).

Something like this hypothetical example:

> reduce_plan(x_plan, target = "x_sum", begin = 'ggsave(', end = ', filename = file_out("test.pdf"))')
# A tibble: 1 x 2
  target         command
  <chr>          <chr>
1 "\"test.pdf\"" ggsave(x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9, filename = file_out(\"test.pdf\"))

On the other hand, the current version keeps the behavior consistent whether pairwise is TRUE or FALSE. I see advantages and disadvantages to either.
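To make the memory argument concrete: assuming, as described above, that all of a command's upstream targets are loaded before it runs, the number of distinct symbols in a command is a rough proxy for how many targets sit in memory at once. Base R's all.vars() counts them; the nested command is copied from the output above.

```r
naive    <- "x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9"
nested   <- "((((((((x_1 + x_2) + x_3) + x_4) + x_5) + x_6) + x_7) + x_8) + x_9)"
pairwise <- "(x_sum_6 + x_sum_7)"

# all.vars() lists the symbols a command refers to, i.e. its upstream targets
length(all.vars(parse(text = naive)))     # 9
length(all.vars(parse(text = nested)))    # 9: parentheses do not reduce the count
length(all.vars(parse(text = pairwise)))  # 2
```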

wlandau · 2018-03-16T01:35:29Z

It's a good point, but I think begin and end are important for each pair when pairwise = TRUE (now the default), so people can more easily define their own reductions that conserve memory. The version of begin and end you mentioned is much easier to do manually post hoc, and I hesitate to encumber the interface.
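A minimal sketch of that post hoc step, using a one-row data frame as a stand-in for flat reduce_plan() output; the target name save_plot is hypothetical.

```r
# Stand-in for a flat reduction (pairwise = FALSE, begin = "", end = "")
reduced <- data.frame(
  target = "x_sum",
  command = "x_1 + x_2 + x_3",
  stringsAsFactors = FALSE
)

# Post hoc: wrap the whole command once instead of wrapping every pair
wrapped <- reduced
wrapped$target <- "save_plot"  # hypothetical target name
wrapped$command <- paste0(
  "ggsave(filename = file_out(\"test.pdf\"), plot = ",
  reduced$command, ")"
)
wrapped$command
```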

wlandau added the "type: new feature", "difficulty: beginner", and "topic: api" labels Mar 15, 2018

wlandau-lilly added this to the CRAN release 5.1.0 milestone Mar 15, 2018

wlandau-lilly assigned wlandau-lilly and wlandau and unassigned wlandau-lilly Mar 15, 2018

This was referenced Mar 16, 2018 in #240 (Offload wildcard templating to the wildcard package, closed) and #327 (Add reduce_plan(), merged).

wlandau closed this as completed in c26eb38 Mar 16, 2018
