How to create a jagged cross() transform #697

Closed
htlin opened this issue Jan 31, 2019 · 4 comments

Comments


htlin commented Jan 31, 2019

I'd like to use a cross() transform, but I don't want the complete cross product. I want a jagged version instead, e.g.:

plan <- drake_plan(
  s_load = target(
    load_csv(group, rep),
    transform = cross(
      group = c("G1", "G2"),
      rep = c("R1", "R2", "R3", "R4", "R5", "R6")
    )
  )
)

For example, group G1 has reps R1-R6, but G2 only has R1-R4 (R5 and R6 are missing).
My function load_csv looks for an input file named Gx_Ry.csv, but G2_R5.csv and G2_R6.csv don't exist, so those two targets fail with file-not-found errors.
Any recommendations would be appreciated, thanks!

Member

wlandau commented Jan 31, 2019

Another nice one for the FAQ. Fortunately, this is straightforward if you create your own grid in advance and then use map().

library(drake)
library(tidyverse)
  
grid <- crossing(
  group = c("G1", "G2"),
  rep = c("R1", "R2", "R3", "R4", "R5", "R6")
) %>%
  filter(!(group == "G2" & rep %in% c("R5", "R6")))

drake_plan(
  s_load = target(
    load_csv(group, rep),
    transform = map(
      group = !!grid$group,
      rep = !!grid$rep
    )
  )
)
#> # A tibble: 10 x 2
#>    target           command                   
#>    <chr>            <chr>                     
#>  1 s_load_.G1._.R1. "load_csv(\"G1\", \"R1\")"
#>  2 s_load_.G1._.R2. "load_csv(\"G1\", \"R2\")"
#>  3 s_load_.G1._.R3. "load_csv(\"G1\", \"R3\")"
#>  4 s_load_.G1._.R4. "load_csv(\"G1\", \"R4\")"
#>  5 s_load_.G1._.R5. "load_csv(\"G1\", \"R5\")"
#>  6 s_load_.G1._.R6. "load_csv(\"G1\", \"R6\")"
#>  7 s_load_.G2._.R1. "load_csv(\"G2\", \"R1\")"
#>  8 s_load_.G2._.R2. "load_csv(\"G2\", \"R2\")"
#>  9 s_load_.G2._.R3. "load_csv(\"G2\", \"R3\")"
#> 10 s_load_.G2._.R4. "load_csv(\"G2\", \"R4\")"

Created on 2019-01-31 by the reprex package (v0.2.1.9000)
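
If loading the tidyverse just for the grid feels heavy, the same jagged grid can probably be built in base R. This is only a sketch of that idea, not something from drake itself: expand.grid() stands in for crossing(), and plain indexing stands in for filter(). The resulting grid$group and grid$rep can then be spliced into map() with !! exactly as in the plan above.

```r
# Full cross product of groups and replicates (base R, no tidyverse).
grid <- expand.grid(
  group = c("G1", "G2"),
  rep = c("R1", "R2", "R3", "R4", "R5", "R6"),
  stringsAsFactors = FALSE
)

# Drop the combinations that have no input file: G2 lacks R5 and R6.
grid <- grid[!(grid$group == "G2" & grid$rep %in% c("R5", "R6")), ]

# expand.grid() varies the first column fastest, so sort by group
# to get the same row order that crossing() produces.
grid <- grid[order(grid$group, grid$rep), ]
rownames(grid) <- NULL

nrow(grid)
```

The sort is optional. It only affects the order of the targets in the plan, not which targets exist.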

wlandau closed this as completed Jan 31, 2019
Author

htlin commented Jan 31, 2019

Nice! Thanks for the solution.
Another thought: can I make a target that finds all available files and then dynamically generates named targets accordingly (perhaps like yield in Python)?

Member

wlandau commented Feb 1, 2019

Sounds like #685, which many people have requested. In drake, the plan needs to be fully written out before you call make(), which may rule out what I think you are describing.

But if the files you mention are all available before you write the plan, then yes, you can write a plan whose target names are automatically generated.

library(drake)
files <- list.files("dir")
plan <- drake_plan(
  s_load = target(load_csv(file), transform = map(file = !!files))
)
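
If load_csv() still needs group and rep as separate arguments, one option is to parse them back out of the file names before writing the plan. The sketch below assumes the Gx_Ry.csv naming scheme from the original question. The files vector is a made-up stand-in for the result of list.files("dir"), used so the example runs without any files on disk.

```r
# Hypothetical listing; in practice this would come from list.files("dir").
files <- c("G1_R1.csv", "G1_R5.csv", "G2_R1.csv", "G2_R4.csv")

# Strip the directory and extension, then split "G1_R1" on the underscore.
stems <- tools::file_path_sans_ext(basename(files))
parts <- strsplit(stems, "_", fixed = TRUE)

# One row per existing file: first piece is the group, second is the rep.
grid <- data.frame(
  group = vapply(parts, `[`, character(1), 1),
  rep = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
grid
```

Because the grid only contains combinations that actually have a file, it is jagged by construction. Its columns can be spliced into map() with !!grid$group and !!grid$rep, as in the earlier comment. The files still have to exist when the plan is written, since the plan is fixed before make() runs.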

Member

wlandau commented Feb 7, 2019

#720 will make custom grids easier. Check this out:

library(drake)
library(tidyverse)

grid <- crossing(
  group = c("G1", "G2"),
  rep = c("R1", "R2", "R3", "R4", "R5", "R6")
) %>%
  filter(!(group == "G2" & rep %in% c("R5", "R6")))

drake_plan(
  s_load = target(
    load_csv(group, rep),
    transform = map(.data = !!grid)
  )
)
#> # A tibble: 10 x 2
#>    target           command             
#>    <chr>            <expr>              
#>  1 s_load_.G1._.R1. load_csv("G1", "R1")
#>  2 s_load_.G1._.R2. load_csv("G1", "R2")
#>  3 s_load_.G1._.R3. load_csv("G1", "R3")
#>  4 s_load_.G1._.R4. load_csv("G1", "R4")
#>  5 s_load_.G1._.R5. load_csv("G1", "R5")
#>  6 s_load_.G1._.R6. load_csv("G1", "R6")
#>  7 s_load_.G2._.R1. load_csv("G2", "R1")
#>  8 s_load_.G2._.R2. load_csv("G2", "R2")
#>  9 s_load_.G2._.R3. load_csv("G2", "R3")
#> 10 s_load_.G2._.R4. load_csv("G2", "R4")

Created on 2019-02-07 by the reprex package (v0.2.1.9000)
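
When the jaggedness is irregular, filtering a full cross product can get awkward. An alternative, sketched here in base R under the assumption that you know the available reps per group, is to list the reps for each group and stack them into a grid directly. The reps list is a hypothetical helper for illustration and is not part of drake.

```r
# Available replicates per group, listed explicitly (hypothetical helper).
reps <- list(
  G1 = paste0("R", 1:6),
  G2 = paste0("R", 1:4)
)

# Repeat each group name once per replicate and stack the replicates.
grid <- data.frame(
  group = rep(names(reps), times = lengths(reps)),
  rep = unlist(reps, use.names = FALSE),
  stringsAsFactors = FALSE
)

nrow(grid)
```

A data frame built this way should be usable with map(.data = !!grid) in the same way as the filtered grid above.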
