[help] Dynamically link functions used in a do.call() as dependencies for target branches
#1344
Help
Description
I am working on architecting a pipeline where we have a number of data files with a variety of potential parsers. There may end up being tons of data files, so declaring which parser each data file should get needs to happen dynamically. I would like to match a parser function to each data file in a dynamic branching target (I can do this) AND have that branch depend on the function it will apply (I have not figured this part out). I will separately handle files that do not have a matching parser, so please ignore that scenario in this use case. I have read a discussion thread that talks about setting dependencies when using a
library(targets)
library(tibble)
tar_dir({
  tar_script({
    tar_option_set()
    parser_typeA <- function(in_file) readRDS(in_file)[4,]
    parser_typeB <- function(in_file) readRDS(in_file)[1,]
    apply_parser <- function(parser_xwalk) {
      fxn <- parser_xwalk$parser_fxn
      args <- list(in_file = parser_xwalk[, 'in_file'])
      do.call(fxn, args)
    }
    list(
      # 1. Declare files to parse
      tar_target(files_to_parse,
                 c('my_data_typeA.rds', 'my_data_typeB.rds'),
                 format = 'file'),
      # 2. Create a crosswalk between the files and the parser they should use
      tar_target(parser_xwalk, data.frame(in_file = files_to_parse,
                                          parser_fxn = c('parser_typeA', 'parser_typeB'))),
      # 3. Apply each parser to the files based on the crosswalk
      # *The problem I am having* is that the parser functions are not dependencies
      # Because I don't know how many files I will end up with, I want to dynamically
      # match a file to a parser and have that branch depend on the function it uses.
      tar_target(parsed_files,
                 # I've tried `as.symbol(parser_xwalk$parser_fxn)` here but that didn't work.
                 apply_parser(parser_xwalk),
                 pattern = map(parser_xwalk))
    )
  })
  saveRDS(tibble(col1a = c(1:5), col2a = letters[1:5]), 'my_data_typeA.rds')
  saveRDS(tibble(col1b = c(1:5), col2b = letters[1:5]), 'my_data_typeB.rds')
  tar_make()
  tar_visnetwork()
})
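A minimal sketch of why the parsers drop out of the graph, assuming the targets package is installed: targets finds dependencies by static code analysis of the symbols that appear in commands and functions, so a parser that is named only inside a character string (as when do.call() receives the parser_fxn column) is never seen. The two wrapper functions below are hypothetical; tar_deps() reports what targets would detect for each.

```r
library(targets)

# Hypothetical wrapper that names the parser only inside a string,
# which is what happens when do.call() gets a value from parser_fxn
deps_string <- tar_deps(
  function(in_file) do.call("parser_typeA", list(in_file = in_file))
)

# Hypothetical wrapper that calls the parser through a symbol
deps_symbol <- tar_deps(
  function(in_file) parser_typeA(in_file)
)

print(deps_string)  # parser_typeA should not be listed
print(deps_symbol)  # parser_typeA should be listed
```

Presumably this is also why as.symbol(parser_xwalk$parser_fxn) does not help: the symbol is only created at run time, after the code analysis has already happened.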
Replies: 2 comments 4 replies
In your example, you hard-code the parser type in the
library(targets)
tar_option_set()
parsers <- list(
  parser_typeA = function(in_file) readRDS(in_file)[4,],
  parser_typeB = function(in_file) readRDS(in_file)[1,]
)
apply_parser <- function(parser_xwalk) {
  # Rebuild the parser function from its deparsed text
  fxn <- eval(parse(text = parser_xwalk$parser_fxn))
  args <- list(in_file = parser_xwalk[, "in_file"])
  do.call(fxn, args)
}
list(
  # 1. Declare files to parse
  tar_target(
    files_to_parse,
    c("my_data_typeA.rds", "my_data_typeB.rds"),
    format = "file"
  ),
  # 2. Create a crosswalk between the files and the parser they should use.
  #    Each parser is stored as a single string of deparsed code (a bare list
  #    of deparse() results would be split into extra columns by data.frame()).
  tar_target(
    parser_xwalk,
    data.frame(
      in_file = files_to_parse,
      parser_fxn = vapply(
        parsers,
        function(f) paste(deparse(f), collapse = "\n"),
        character(1)
      )
    )
  ),
  # 3. Apply each parser to the files based on the crosswalk
  tar_target(
    parsed_files,
    apply_parser(parser_xwalk),
    pattern = map(parser_xwalk)
  )
)
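The idea behind the sketch above can be checked in plain R without running the pipeline. The snippet below (hypothetical names, base R only) deparses each parser to one string, edits a single parser, and confirms that only that parser's string changes and that the string parses back into a working function. Because dynamic branches over parser_xwalk are tracked row by row, an unchanged row should keep its branch up to date.

```r
# Hypothetical parsers mirroring the ones in the sketch above
parsers <- list(
  parser_typeA = function(in_file) readRDS(in_file)[4, ],
  parser_typeB = function(in_file) readRDS(in_file)[1, ]
)

# Deparse each function to a single string (one string per crosswalk row)
fxn_text <- function(fxns) {
  vapply(fxns, function(f) paste(deparse(f), collapse = "\n"), character(1))
}

before <- fxn_text(parsers)

# Edit only parser_typeB
parsers$parser_typeB <- function(in_file) readRDS(in_file)[2, ]
after <- fxn_text(parsers)

# Names of the parsers whose text changed
changed <- names(before)[before != after]
print(changed)

# Round trip: turn the stored text back into a function and use it
tmp <- tempfile(fileext = ".rds")
saveRDS(data.frame(col1 = 1:5, col2 = letters[1:5]), tmp)
parser_b <- eval(parse(text = after[["parser_typeB"]]))
res <- parser_b(tmp)
print(res)
```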
It's tricky to dynamically branch over functions such that a change to one parser function does not invalidate all the branches of parsed_files. It's not elegant, but I think it will work if the actual function body becomes part of parser_xwalk, as opposed to the function name. Since functions can have brittle internals that change hashes unpredictably, the sketch above deparses them to text. This could lose information in the function closure injected by Vectorize(), purrr::safely(), etc., but it might work in your case if your parsers are simple enough.