---
title: "easyMake"
output: github_document
---
This package is a fork of the
[unmaintained](https://github.com/GShotwell/easyMake) easyMake
package.

easyMake is a proof of concept for a simple way to generate
Makefiles from an R data frame listing file dependencies. It
is not on CRAN, but you can install it with:
```
devtools::install_github("zauster/easyMake")
```

Make provides an incredibly powerful way to manage analysis
projects. By using a Makefile to specify the way in which the
files in your project depend on one another you can ensure that
your analysis is always up to date, and that files are not being
needlessly regenerated if nothing has changed.

Using a Makefile is one of the best things that you can do to
ensure that your analysis project is robust and reproducible.
Writing a Makefile, however, requires learning a new programming
paradigm, which is something many R users are uncomfortable with.
Since it is often easier to edit an existing Makefile than to
generate a new one from scratch, easyMake provides a set of tools
to quickly and easily set up your own Makefile.

easyMake is based on the principle that most R projects are built
around R scripts which execute various actions, and artifacts
which are the inputs and outputs to those scripts.

For instance, you might write a script which reads in a dataset,
alters it in some way, and then saves it as a new file for
another script to read. The Input -> Script -> Output structure
of many R projects lets us detect dependencies between files by
detecting which artifacts are read into each script. If a script
imports a file `data.csv` with `read.csv()` then `data.csv` is a
pre-requisite for that script. If it then saves it as
`data2.RData`, then we know that the script is itself a
pre-requisite for `data2.RData`.

This is rolled into the easyMake function `detect_dependencies()`
which reads all the R files in the working directory, identifies
which files they import and export, and then builds a dependency
edge list based on those relationships. The output of this
function looks like this:

```{r}
# Create the edge list manually; each row says that `file` depends on `pre_req`
dependencies <- data.frame(
  file    = c("analysis/file2.R", "analysis/markdown.Rmd", "mtcars.csv",
              "mtcars.RData", "analysis/file2.R"),
  pre_req = c("mtcars.csv", "mtcars.RData", "analysis/file1.R",
              "analysis/file2.R", "R/hello.R"),
  stringsAsFactors = FALSE  # keep file names as character strings
)
dependencies
```
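
To illustrate the kind of detection described above, here is a
minimal sketch in base R. It is not easyMake's implementation:
`find_io()`, the script name `clean.R`, and the short list of
read and write functions are all assumptions made for this
example. It looks for quoted file names inside a few common
import and export calls and turns them into edge-list rows.

```r
# Hypothetical helper, not part of easyMake: find quoted file names passed
# to a few common read/write calls in the text of one script.
find_io <- function(script_lines) {
  code <- paste(script_lines, collapse = "\n")
  # Return the first quoted string inside each call to one of `funs`
  grab <- function(funs) {
    pattern <- paste0("\\b(", paste(funs, collapse = "|"),
                      ")\\([^)]*?[\"']([^\"']+)[\"']")
    hits <- regmatches(code, gregexpr(pattern, code, perl = TRUE))[[1]]
    sub(pattern, "\\2", hits, perl = TRUE)
  }
  list(imports = grab(c("read\\.csv", "load", "readRDS")),
       exports = grab(c("write\\.csv", "save", "saveRDS")))
}

# Text of a hypothetical script called clean.R
script <- c('d <- read.csv("data.csv")',
            'd$x2 <- d$x * 2',
            'save(d, file = "data2.RData")')
io <- find_io(script)

# Imports are prerequisites of the script;
# the script is a prerequisite of its exports.
edges <- data.frame(
  file    = c(rep("clean.R", length(io$imports)), io$exports),
  pre_req = c(io$imports, rep("clean.R", length(io$exports))),
  stringsAsFactors = FALSE
)
edges
```

A pattern match like this only sees literal file names, which is
why the rules below matter.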

There are three rules to follow to make sure that
`detect_dependencies()` does a good job of identifying your file
structure:

- Use explicit file names in your file import and export
  statements. In other words, don't assign file names
  programmatically, but instead use the form
  `export(file, "filename.csv")`.
- Do not use the same names for a script's imports and exports.
  If a script reads `data.csv` it should not write to the file
  `data.csv` but instead write to a new file name like
  `data2.csv`. Even better, use meaningful names for imports and
  exports; this makes it easier to check that the generated
  Makefile works correctly.
- Scripts should be pure in the sense that they only communicate
  with the project through their imports and exports. A script
  should not rely on, nor produce, any objects which are stored
  in memory.
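
As a rough illustration of the rules above, the sketch below
shows the body of a script that follows them. The script name
`clean_data.R`, the file names, and the toy data are all
hypothetical. The setup lines exist only so the example runs on
its own in a temporary directory.

```r
# Setup for the illustration only: a temporary directory with a toy input file
old_wd <- setwd(tempdir())
write.csv(data.frame(id = 1:3, value = c(10, NA, 30)),
          "data.csv", row.names = FALSE)

# --- body of a hypothetical script, clean_data.R ---
raw <- read.csv("data.csv")            # import: explicit, literal file name
clean <- raw[!is.na(raw$value), ]      # works only on objects created here
write.csv(clean, "data_clean.csv",     # export: a new, meaningful file name
          row.names = FALSE)
# Harder to detect: a file name built at run time, for example
# write.csv(clean, paste0(prefix, ".csv"))
# --- end of script ---

setwd(old_wd)
```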

I recommend that you review and, where needed, edit the
dependency edge list to make sure that it is capturing all of
the project's dependencies. Once you have the graph you can do
two things with it:

## Turn it into a Makefile

The `easy_make()` function takes a dependency edge list and
generates a Makefile using the rules listed at the end of this
section. Running `easy_make(dependencies)` on the above
dependency edge list would produce the following Makefile:

```
all: analysis/markdown.Rmd

analysis/file2.R: mtcars.csv R/hello.R
	--touch analysis/file2.R

analysis/markdown.Rmd: mtcars.RData
	Rscript -e 'rmarkdown::render("analysis/markdown.Rmd")'

mtcars.csv: analysis/file1.R
	Rscript analysis/file1.R

mtcars.RData: analysis/file2.R
	Rscript analysis/file2.R
```

Makefiles are simple to construct and edit. The first line of
each rule names the target file and its prerequisites; the line
below it is the command that builds the target. When you build a
project using a Makefile, Make checks whether the target is
missing or whether any prerequisite was modified more recently
than the target. If so, it runs the command specified in the
rule. The easyMake package assumes that every command either
runs an R script or renders an R Markdown document. You can, of
course, run other commands as part of the Makefile; you just
need to add them yourself.
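
The timestamp check that Make performs can be sketched in R as
follows. This is only an illustration of the idea, since Make
does this for you; `needs_rebuild()` is a hypothetical helper
and the files are temporary stand-ins.

```r
# Hypothetical helper illustrating Make's timestamp rule
needs_rebuild <- function(target, pre_reqs) {
  if (!file.exists(target)) return(TRUE)          # a missing target is always built
  any(file.mtime(pre_reqs) > file.mtime(target))  # is any prerequisite newer?
}

# Two temporary files with explicitly set modification times
target  <- tempfile(fileext = ".RData")
pre_req <- tempfile(fileext = ".R")
file.create(target, pre_req)
now <- Sys.time()
Sys.setFileTime(target,  now - 60)   # target written a minute ago
Sys.setFileTime(pre_req, now)        # prerequisite edited just now
stale <- needs_rebuild(target, pre_req)   # TRUE: the prerequisite is newer

Sys.setFileTime(target, now + 60)    # pretend the target was regenerated later
fresh <- needs_rebuild(target, pre_req)   # FALSE: nothing to do
```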

`easy_make()` produces the Makefile using the following rules:

* If both the target and prerequisite are R files, touch the target file
* If only one is an R file, run the R file
* If one is an Rmd file, render it.
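
One possible reading of these rules is sketched below. It is not
the code inside `easy_make()`: the helper names, the order in
which the rules are checked, and the assumption of at most one R
script per non-script target are all choices made for this
example. The command strings are copied from the sample Makefile
above.

```r
# Hypothetical helpers, not easyMake's implementation
is_r   <- function(x) grepl("\\.[Rr]$", x)
is_rmd <- function(x) grepl("\\.Rmd$", x, ignore.case = TRUE)

recipe_for <- function(target, pre_reqs) {
  r_pre <- pre_reqs[is_r(pre_reqs)]
  if (is_rmd(target)) {
    # An Rmd target is rendered
    sprintf("Rscript -e 'rmarkdown::render(\"%s\")'", target)
  } else if (is_r(target) && length(r_pre) > 0) {
    # Target and a prerequisite are both R files: touch the target
    paste("--touch", target)
  } else if (is_r(target)) {
    # Only the target is an R file: run it
    paste("Rscript", target)
  } else {
    # Only the prerequisite is an R file: run it (assumes one such script)
    paste("Rscript", r_pre[1])
  }
}

dependencies <- data.frame(
  file    = c("analysis/file2.R", "analysis/markdown.Rmd", "mtcars.csv",
              "mtcars.RData", "analysis/file2.R"),
  pre_req = c("mtcars.csv", "mtcars.RData", "analysis/file1.R",
              "analysis/file2.R", "R/hello.R"),
  stringsAsFactors = FALSE
)

# Group prerequisites by target, then pick one command per target
by_target <- split(dependencies$pre_req, dependencies$file)
recipes <- vapply(names(by_target),
                  function(t) recipe_for(t, by_target[[t]]),
                  character(1))

# Print each rule as "target: prerequisites", then a tab-indented command
cat(sprintf("%s: %s\n\t%s\n", names(by_target),
            vapply(by_target, paste, character(1), collapse = " "),
            recipes), sep = "")
```

For the sample edge list this yields the same commands as the
Makefile shown earlier, though a real project may need different
precedence between the rules.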
## Produce a Dependency Graph

If you want to take a look at your project's dependencies, you
can use the `graph_dependencies()` function. This takes
the dependency edge list produced by `detect_dependencies()` and
a character vector containing the complete list of files, and
generates a picture of how those files depend on one another. A
sample graph might look like this:

```{r}
easyMake::graph_dependencies(dependencies)
```