replicates within tar_map() [help] #173
Description
I have built a pipeline for processing data, training/evaluating models and making predictions. It makes heavy use of
library(targets)
library(tarchetypes)
library(tibble)
library(dplyr)

tar_option_set(
  packages = c("tidymodels"),
  seed = 1
)

# Simulate one dataset with two predictors and a binary outcome.
make_data <- function() {
  tibble(
    a = runif(100),
    b = runif(100),
    outcome = rbinom(n = 100, size = 1, prob = 0.75)
  )
}

# One row per dataset; `data` holds symbols that refer to upstream targets.
datasets <- tibble(
  data = syms(c("data_a", "data_b")),
  stratum = "outcome",
  names = c("a", "b")
)

# Static branching: one set of split/training/testing/folds targets per dataset.
data_pipeline <- tar_map(
  values = datasets,
  names = "names",
  tar_target(
    data_splits,
    initial_split(data, strata = stratum)
  ),
  tar_target(
    training_data,
    training(data_splits)
  ),
  tar_target(
    testing_data,
    testing(data_splits)
  ),
  tar_target(
    folds,
    vfold_cv(training_data)
  )
)

list(
  tar_target(
    data_a,
    make_data()
  ),
  tar_target(
    data_b,
    make_data()
  ),
  data_pipeline
)
I tried using |
Replies: 1 comment 3 replies
It's tough when there are different targets for so many different versions of the data: the raw data, splits, training, testing, and folds. For simulation studies, I often recommend making each branch its own end-to-end simulation replication: generate the data, run a model, and report compact metrics that can be summarized across reps. If needed, the command supplied to tar_map_rep() can share an upstream raw data object, and each simulation rep can split into a different set of folds and training/testing data. An example for clinical trial simulation is at https://github.com/wlandau/rpharma2023-pipeline.
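As a rough sketch of that idea (not taken from the linked repository), a _targets.R file along these lines might work. The helper run_one_rep() and its output columns are hypothetical, and the logistic regression is only a placeholder for whatever model is actually being evaluated. It assumes a tarchetypes version whose tar_map_rep() supports the columns argument, and that the command returns a data frame.

```r
# _targets.R (sketch; run_one_rep() and its output columns are hypothetical)
library(targets)
library(tarchetypes)
library(tibble)

tar_option_set(packages = c("rsample", "tibble"), seed = 1)

make_data <- function() {
  tibble(
    a = runif(100),
    b = runif(100),
    outcome = rbinom(n = 100, size = 1, prob = 0.75)
  )
}

# Hypothetical helper: one end-to-end replication.
# It draws a fresh stratified split of the shared raw data, fits a
# placeholder model on the training part, and returns a one-row data
# frame of compact metrics.
run_one_rep <- function(data) {
  splits <- rsample::initial_split(data, strata = outcome)
  train <- rsample::training(splits)
  test <- rsample::testing(splits)
  fit <- glm(outcome ~ a + b, family = binomial(), data = train)
  prob <- predict(fit, newdata = test, type = "response")
  tibble(
    n_train = nrow(train),
    n_test = nrow(test),
    accuracy = mean((prob > 0.5) == (test$outcome == 1))
  )
}

list(
  tar_target(data_a, make_data()),
  tar_target(data_b, make_data()),
  tar_map_rep(
    name = metrics,
    command = run_one_rep(data),
    values = tibble(
      data = rlang::syms(c("data_a", "data_b")),
      names = c("a", "b")
    ),
    names = tidyselect::any_of("names"),
    # Append only the label column to the output, not the data symbol.
    columns = tidyselect::any_of("names"),
    batches = 2,
    reps = 5
  )
)
```

With this layout, the splits, training data, testing data, and folds never become targets of their own. Each rep only stores its small metrics row, and tar_read(metrics) should return the combined rows across datasets and reps for summarizing.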