Cannot recover data frame inside plan after dynamic combining #1064
Comments
So glad to see these dynamic branching issues coming in so soon. Things work smoothest if all targets downstream of dynamic targets are also dynamic. Another
library(drake)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
compute_max_x <- function(df) {
  mutate(df, max_x = max(x))
}
df <- tibble(
  g = rep(c("a", "b", "c"), length.out = 10),
  x = runif(10)
)
plan <- drake_plan(
  group = df$g,
  df_group_max_splits = target(
    compute_max_x(df),
    dynamic = combine(df, .by = group)
  ),
  df_group_max_combined_list = target(
    bind_rows(df_group_max_splits),
    dynamic = combine(df_group_max_splits)
  ),
  df_overall_max = target(
    compute_max_x(df_group_max_combined_list),
    dynamic = map(df_group_max_combined_list)
  )
)
make(plan)
#> target group
#> dynamic df_group_max_splits
#> subtarget df_group_max_splits_5319b5d3
#> subtarget df_group_max_splits_7f723e65
#> subtarget df_group_max_splits_2e487914
#> aggregate df_group_max_splits
#> dynamic df_group_max_combined_list
#> subtarget df_group_max_combined_list_a98b3360
#> aggregate df_group_max_combined_list
#> dynamic df_overall_max
#> subtarget df_overall_max_5077f7a9
#> aggregate df_overall_max
readd(df_group_max_combined_list)
#> [[1]]
#> # A tibble: 10 x 3
#> g x max_x
#> <chr> <dbl> <dbl>
#> 1 a 0.617 0.617
#> 2 a 0.0521 0.617
#> 3 a 0.381 0.617
#> 4 a 0.566 0.617
#> 5 b 0.382 0.725
#> 6 b 0.399 0.725
#> 7 b 0.725 0.725
#> 8 c 0.600 0.600
#> 9 c 0.114 0.600
#> 10 c 0.375 0.600
readd(df_overall_max)
#> [[1]]
#> # A tibble: 10 x 3
#> g x max_x
#> <chr> <dbl> <dbl>
#> 1 a 0.617 0.725
#> 2 a 0.0521 0.725
#> 3 a 0.381 0.725
#> 4 a 0.566 0.725
#> 5 b 0.382 0.725
#> 6 b 0.399 0.725
#> 7 b 0.725 0.725
#> 8 c 0.600 0.725
#> 9 c 0.114 0.725
#> 10 c 0.375 0.725
Created on 2019-11-13 by the reprex package (v0.3.0)
Alternatively,
library(drake)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
compute_max_x <- function(df) {
  mutate(df, max_x = max(x))
}
df <- tibble(
  g = rep(c("a", "b", "c"), length.out = 10),
  x = runif(10)
)
plan <- drake_plan(
  group = df$g,
  df_group_max_splits = target(
    compute_max_x(df),
    dynamic = combine(df, .by = group)
  ),
  df_group_max_combined_list = target(
    bind_rows(df_group_max_splits),
    dynamic = combine(df_group_max_splits)
  ),
  df_overall_max = compute_max_x(readd(df_group_max_combined_list)[[1]])
)
make(plan)
#> target group
#> dynamic df_group_max_splits
#> subtarget df_group_max_splits_d7fb6b4e
#> subtarget df_group_max_splits_d7b62986
#> subtarget df_group_max_splits_30627286
#> aggregate df_group_max_splits
#> dynamic df_group_max_combined_list
#> subtarget df_group_max_combined_list_ecff515c
#> aggregate df_group_max_combined_list
#> target df_overall_max
readd(df_group_max_combined_list)
#> [[1]]
#> # A tibble: 10 x 3
#> g x max_x
#> <chr> <dbl> <dbl>
#> 1 a 0.567 0.629
#> 2 a 0.563 0.629
#> 3 a 0.629 0.629
#> 4 a 0.254 0.629
#> 5 b 0.653 0.846
#> 6 b 0.846 0.846
#> 7 b 0.282 0.846
#> 8 c 0.429 0.971
#> 9 c 0.971 0.971
#> 10 c 0.433 0.971
readd(df_overall_max)
#> # A tibble: 10 x 3
#> g x max_x
#> <chr> <dbl> <dbl>
#> 1 a 0.567 0.971
#> 2 a 0.563 0.971
#> 3 a 0.629 0.971
#> 4 a 0.254 0.971
#> 5 b 0.653 0.971
#> 6 b 0.846 0.971
#> 7 b 0.282 0.971
#> 8 c 0.429 0.971
#> 9 c 0.971 0.971
#> 10 c 0.433 0.971
Created on 2019-11-13 by the reprex package (v0.3.0)
Maybe the manual should discuss these issues. Not exactly sure where it fits in the flow of the current dynamic branching chapter. PRs to the manual always welcome (recently moved here).
I originally planned a dynamic |
Description
After using combine on a data frame, there is no documented method to recover the combined data frame from within the plan.
Reproducible example
Expected result
The data frame is split according to the grouping variable and calculation is performed correctly on each split (aside: using a function called combine to perform a splitting operation is counterintuitive):
Calling combine with no .by argument combines the results as per the docs (aside 2: as I mentioned in another comment, one expects that calling bind_rows on a list of data frames would return a data frame, as it would if we weren't inside a drake plan):
The df_overall_max target should extract the data frame inside the single-item list and call the function on the entire combined data frame:
What should have happened? Please be as specific as possible.
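The three steps described above (split by group, recombine, then compute over the whole data frame) can be sketched in plain dplyr, outside of any drake plan. This only illustrates the intended result, not how drake's dynamic branching works internally. The fixed seed and the intermediate names splits, combined and overall are assumptions added for the example.

```r
library(dplyr)

compute_max_x <- function(df) {
  mutate(df, max_x = max(x))
}

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
df <- tibble(
  g = rep(c("a", "b", "c"), length.out = 10),
  x = runif(10)
)

# Step 1: split by group and compute the per-group maximum.
splits <- lapply(split(df, df$g), compute_max_x)

# Step 2: bind the splits back into a single data frame.
combined <- bind_rows(splits)

# Step 3: recompute the maximum over the whole combined data frame.
overall <- compute_max_x(combined)
```

Here combined is an ordinary data frame rather than a single-item list, which is the behavior the description expects from bind_rows.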
If I add df2_class = class(df_group_max_combined_list) as a target and readd it out, I get "drake_dynamic" back instead of a list.
I've read through the dynamic branching chapter in the book and can't seem to find a way to continue working with a target after doing the combine .by + combine operation. The examples end with calling readd on the target containing the single-item list, so I'm not actually sure if this is supported.
On the other hand, I might very well be misunderstanding how combine is supposed to work, in which case this wouldn't be a bug but rather a need to expand the docs.
(Still, I'm very excited about the dynamic branching feature and hope to make greater use of it soon!)
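For reference, unwrapping the single-item list once it has been read back is straightforward in plain R. The sketch below uses a hypothetical helper, unwrap_single, which is not part of drake. It only covers the value returned outside the plan, and assumes nothing about how a dynamic target is represented inside one.

```r
# Hypothetical helper (not part of drake): return the sole element of a
# one-item list, and fail loudly for anything else.
unwrap_single <- function(x) {
  stopifnot(is.list(x), !is.data.frame(x), length(x) == 1)
  x[[1]]
}

# Stand-in for a value shaped like the readd() output above:
# a list holding exactly one data frame.
wrapped <- list(data.frame(g = c("a", "b"), x = c(0.1, 0.2)))
unwrapped <- unwrap_single(wrapped)
is.data.frame(unwrapped)
```

The explicit length check is a design choice: it turns a silently wrong result (for example, taking only the first of several splits) into an error.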
Session info