Cascadia R Conference 2019 Update: the slides from Tiernan Martin’s talk can be downloaded here: drakepkg-slides-cascadiarconf2019.pdf
The goal of drakepkg is to demonstrate how a drake workflow can be organized as an R package.
Why do this? Because the package system in R provides a widely adopted
method of structuring, documenting, testing, and sharing R code. While
most R packages are general purpose, this approach applies the same
framework to a specific workflow (or set of workflows). It increases the
reproducibility of a complex workflow without requiring users to
recreate the workflow’s environment with a container image (although
that approach is compatible with drakepkg; see januz/drakepkg).
The drakepkg package is experimental in nature and currently requires
some inconvenient steps (see the drake manual, section 7.4, "Workflows
as R packages"); please use caution when applying this approach to your
own work.
You can install the released version of drakepkg from its GitHub repository with:
devtools::install_github("tiernanmartin/drakepkg")
The following table shows how each feature of a drake workflow is made accessible within an R package:

drake | R Package |
---|---|
plans, commands | functions (R/*.R) |
targets | stored in the cache (.drake/) |
input files, output files | internal data (inst/intdata/*), external data (inst/extdata/*), images and documents (inst/documents/*) |
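
R copies everything under inst/ to the top level of the installed package, so the files in the right-hand column can be located with system.file() once the package is installed. The sketch below illustrates that mechanism in base R only. It is not drakepkg's actual implementation: copy_pkg_dir() is a hypothetical helper, and it is demonstrated on the R directory of the base stats package purely so that it runs without drakepkg installed.

```r
# Hypothetical helper (not part of drakepkg): copy one top-level directory of
# an installed package into `dest` and return the path of the copy.
copy_pkg_dir <- function(pkg, dir_name, dest = getwd()) {
  # system.file() returns "" when the requested path does not exist
  src <- system.file(dir_name, package = pkg)
  if (!nzchar(src)) {
    stop("'", dir_name, "' not found in package '", pkg, "'")
  }
  # With recursive = TRUE and an existing `dest`, the directory itself is copied
  file.copy(src, dest, recursive = TRUE)
  file.path(dest, dir_name)
}

# Demonstration in a throwaway directory, using a package every R install has
dest <- file.path(tempdir(), "copy-sketch")
dir.create(dest, showWarnings = FALSE)
copied <- copy_pkg_dir("stats", "R", dest)
list.files(copied)
```

With drakepkg installed, the analogous call would presumably be copy_pkg_dir("drakepkg", "intdata"), assuming inst/intdata/ is installed as intdata/ as the table implies.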
The package comes with two example drake plans, both of which are
loosely based on the main example included in the drake package:
- An introductory plan: drakepkg::get_example_plan_simple()
- A plan that involves downloading external data: drakepkg::get_example_plan_external()
The first plan looks like this:
library(drake)
get_example_plan_simple()
#> # A tibble: 5 x 2
#> target command
#> <chr> <expr>
#> 1 raw_data readxl::read_excel(file_in("intdata/iris-internal.xlsx")) ~
#> 2 ready_data dplyr::mutate(raw_data, Species = forcats::fct_inorder(Specie~
#> 3 hist create_plot(ready_data) ~
#> 4 fit lm(Sepal.Width ~ Petal.Width + Species, ready_data) ~
#> 5 report write_html_report(hist, fit, knitr_in("documents/report-simpl~
Several commands used in the plan (e.g., create_plot(),
write_html_report()) are included as part of the drakepkg R package,
and so is the plan itself; the documentation for each of these functions
can be accessed using R’s help() function (for example,
help(get_example_plan_simple)).
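
To make this concrete, here is a minimal sketch of how a command and a plan could each be written as a documented package function. The names summarise_widths() and get_sketch_plan() are hypothetical and are not part of drakepkg; the only drake function used is drake::drake_plan(). In a real package, the roxygen2 comments would be turned into the help pages that help() displays.

```r
library(drake)

#' Mean sepal width per species (hypothetical command function)
#'
#' @param data A data frame with `Sepal.Width` and `Species` columns.
#' @return A data frame with one row per species.
summarise_widths <- function(data) {
  stats::aggregate(Sepal.Width ~ Species, data = data, FUN = mean)
}

#' Return a small example plan (hypothetical plan function)
#'
#' @return A drake plan: a tibble with one row per target.
get_sketch_plan <- function() {
  drake::drake_plan(
    raw_data = datasets::iris,
    width_summary = summarise_widths(raw_data)
  )
}

# Calling the plan function only builds the plan; nothing is run yet
plan <- get_sketch_plan()
plan
```

Keeping the plan behind a function means it is versioned, documented, and exported in the same way as the commands it calls.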
Once you have installed and loaded drakepkg, you can reproduce the
introductory plan’s workflow by performing the following steps:
- Copy the package’s directories and source code files into your working directory with the copy_drakepkg_files() function
- View the plan (get_example_plan_simple()) and then make it (make(get_example_plan_simple()))
- Access the plan’s targets using drake functions like readd() or loadd()
- View the HTML documents created by the workflow in the documents/ directory
# Step 1: copy the source code files into the working directory
copy_drakepkg_files()
# Step 2A: view the example plan
get_example_plan_simple()
#> # A tibble: 5 x 2
#> target command
#> <chr> <expr>
#> 1 raw_data readxl::read_excel(file_in("intdata/iris-internal.xlsx")) ~
#> 2 ready_data dplyr::mutate(raw_data, Species = forcats::fct_inorder(Specie~
#> 3 hist create_plot(ready_data) ~
#> 4 fit lm(Sepal.Width ~ Petal.Width + Species, ready_data) ~
#> 5 report write_html_report(hist, fit, knitr_in("documents/report-simpl~
# Step 2B: make the example plan
make(get_example_plan_simple())
#> All targets are already up to date.
# Step 3: examine the plan's targets
readd(fit)
#>
#> Call:
#> lm(formula = Sepal.Width ~ Petal.Width + Species, data = ready_data)
#>
#> Coefficients:
#>       (Intercept)        Petal.Width  Speciesversicolor
#>             3.236              0.781             -1.501
#>  Speciesvirginica
#>            -1.844
readd(hist)
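
Step 3 uses readd(), which returns a target's value. loadd() instead assigns the target into your session under its own name. The following self-contained sketch shows both on a tiny stand-in plan (an assumption made for illustration, not the package's example plan), run in a temporary directory so that its .drake/ cache stays out of your project.

```r
library(drake)

# Run in a throwaway directory so the cache does not touch a real project
sketch_dir <- file.path(tempdir(), "drake-sketch")
dir.create(sketch_dir, showWarnings = FALSE)
old_wd <- setwd(sketch_dir)

# Stand-in plan with two targets (not the package's own plan)
plan <- drake_plan(
  ready_data = datasets::iris,
  fit = lm(Sepal.Width ~ Petal.Width + Species, ready_data)
)
make(plan)

fit_value <- readd(fit)  # readd() returns the stored value
loadd(ready_data)        # loadd() creates an object named `ready_data`

setwd(old_wd)            # restore the original working directory
```

readd() is convenient for a quick look at a single target, while loadd() suits interactive work where several targets are needed by name.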
This example and others are available in the package vignette (vignette('drakepkg')).