---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
# `drake` R package Stan model example
The goal of this workflow is to validate a small Bayesian hierarchical model.
```{r, eval = FALSE}
y_i ~ iid Normal(alpha + x_i * beta, sigma^2)
alpha ~ Normal(0, 1)
beta ~ Normal(0, 1)
sigma ~ Uniform(0, 1)
```
We simulate multiple datasets from the model and fit the model on each dataset. For each model fit, we determine if the 50% credible interval of the regression coefficient `beta` contains the true value of `beta` used to generate the data. If we implemented the model correctly, roughly 50% of the models should recapture the true `beta` in 50% credible intervals.
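The simulation and coverage check described above can be sketched in base R. This is a minimal illustration, not the code in `R/functions.R`: the function names `simulate_data()` and `covers_beta()` are hypothetical, and drawing the covariate `x` from a standard normal is an assumption, since the model statement does not say how `x` is generated.

```r
# Hypothetical helpers for illustration; not taken from R/functions.R.

# Draw parameters from the priors, then simulate one dataset.
simulate_data <- function(n = 50) {
  alpha <- rnorm(1, mean = 0, sd = 1)
  beta <- rnorm(1, mean = 0, sd = 1)
  sigma <- runif(1, min = 0, max = 1)
  x <- rnorm(n)  # assumption: the source does not specify how x is drawn
  y <- rnorm(n, mean = alpha + x * beta, sd = sigma)
  list(x = x, y = y, alpha = alpha, beta = beta, sigma = sigma)
}

# Check whether the central credible interval computed from posterior
# draws of beta contains the true value used to simulate the data.
covers_beta <- function(beta_draws, beta_true, level = 0.5) {
  probs <- c((1 - level) / 2, 1 - (1 - level) / 2)
  bounds <- quantile(beta_draws, probs = probs, names = FALSE)
  bounds[1] < beta_true && beta_true < bounds[2]
}
```

In a full run, `beta_draws` would be the MCMC samples of `beta` from one Stan fit, and the mean of `covers_beta()` across all simulated datasets should land near 0.5.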
## The `drake` pipeline
The [`drake`](https://github.com/ropensci/drake) R package manages the workflow. It automatically skips steps of the pipeline when the results are already up to date, which is critical for Bayesian data analysis because it usually takes a long time to run Markov chain Monte Carlo. It also helps users understand and communicate this work with dependency graphs (see `r_vis_drake_graph()`).
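To make the skipping behavior concrete, here is a toy plan that is unrelated to the one in `R/plan.R`. The target names `data` and `avg` are made up for illustration, and the temporary cache is only there so the sketch does not write a `.drake/` folder into the working directory.

```r
library(drake)

# Toy plan, not the plan from R/plan.R: `avg` depends on `data`.
plan <- drake_plan(
  data = c(1, 2, 3, 4),
  avg = mean(data)
)

cache <- new_cache(path = tempfile())  # throwaway cache for this sketch

make(plan, cache = cache)  # first run: builds `data`, then `avg`
make(plan, cache = cache)  # second run: both targets are up to date

# Retrieve a stored result from the cache.
avg_value <- readd(avg, cache = cache)
```

Changing the command for `data` and calling `make()` again would rebuild `data` and, if its value changed, `avg` as well. `vis_drake_graph(plan)` draws the dependency graph for a plan like this one.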
## File structure
The files in this example are organized as follows.
```{r, eval = FALSE}
├── run.sh
├── run.R
├── _drake.R
├── sge.tmpl
├── R/
├──── packages.R
├──── functions.R
├──── plan.R
├── stan/
├──── model.stan
└── report.Rmd
```
File | Purpose
---|---
`run.sh` | Shell script to run `run.R` in a persistent background process. Works on Unix-like systems. Helpful for long computations on servers.
`run.R` | R script to run `r_make()`.
`_drake.R` | The special R script that powers functions `r_make()` and friends ([details here]()).
`sge.tmpl` | A [`clustermq`](https://github.com/mschubert/clustermq) template file to deploy targets in parallel to a Sun Grid Engine cluster.
`R/packages.R` | A custom R script loading the packages we need.
`R/functions.R` | A custom R script with user-defined functions.
`R/plan.R` | A custom R script that defines the `drake` plan.
`stan/model.stan` | The specification of our Stan model.
`report.Rmd` | An R Markdown report summarizing the results of the analysis.
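
As a rough guide to how these files fit together, a `_drake.R` script for `r_make()` typically loads the supporting scripts and ends with a call to `drake_config()`. The sketch below follows that pattern under the assumption that `R/plan.R` creates an object named `plan`; the actual `_drake.R` in this example may differ.

```r
# _drake.R (sketch): set up the session, then end with drake_config().
source("R/packages.R")   # load the packages the workflow needs
source("R/functions.R")  # define the user-written functions
source("R/plan.R")       # assumed to create an object called `plan`

# r_make() sources this file in a fresh R session and uses the
# configuration returned by the final expression.
drake_config(plan)
```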
## How to run
1. Install the packages mentioned in `R/packages.R`.
1. Run the `drake` pipeline by running either `run.R` or `run.sh` (the latter works on Unix-like systems only). This computation could take a while.
1. View the validation results in the output `report.html` file.
1. Make changes to the R code or Stan model, rerun the pipeline, and watch `drake` skip steps that are already up to date.
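
From an interactive R session started in the project root, the run-and-rerun cycle in the steps above might look like the following. It assumes `drake` and its `callr` dependency are installed and that `_drake.R` is in the working directory.

```r
# Run the pipeline in a clean background R session (uses _drake.R).
drake::r_make()

# After editing the R code or the Stan model, list the targets that
# are now out of date before rerunning.
drake::r_outdated()

# Rerun: targets that are still up to date are skipped.
drake::r_make()
```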
## Scale out
This computation is currently downsized for pedagogical purposes. To scale it up, open the `R/plan.R` script and increase the number of simulations (the number inside `seq_len()` in the `index` target).
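As an illustration of how the length of `index` controls the amount of work, the toy plan below maps a stand-in function over `index`. The function `run_one()` and the target `result` are hypothetical, and the use of dynamic branching is an assumption about how such a plan could be written, not a copy of `R/plan.R`. It requires a `drake` release that supports dynamic branching.

```r
library(drake)

# Hypothetical stand-in for one simulate-and-fit step.
run_one <- function(i) {
  i^2
}

plan <- drake_plan(
  index = seq_len(4),  # raise this number to run more simulations
  result = target(run_one(index), dynamic = map(index))
)

cache <- new_cache(path = tempfile())  # throwaway cache for this sketch
make(plan, cache = cache)              # one sub-target per element of index

# Load all sub-targets of `result`, combined into one vector.
results <- readd(result, cache = cache)
```

With this structure, increasing the number inside `seq_len()` only adds new sub-targets; the ones already computed stay up to date.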
## High-performance computing
You can run this project locally on your laptop or remotely on a cluster. The comments in the `_drake.R` file have specific directions. Details on high-performance computing are available in [this chapter of the manual](https://books.ropensci.org/drake/hpc.html).
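For orientation, a cluster configuration with `clustermq` on Sun Grid Engine usually involves settings like the ones below, placed at the end of `_drake.R` after the plan has been loaded. This is a sketch based on the `drake` manual's general pattern: the value `jobs = 4` is arbitrary, and the comments in this project's `_drake.R` remain the authoritative directions.

```r
# Assumed settings for illustration; adjust to your cluster.
options(
  clustermq.scheduler = "sge",     # use the Sun Grid Engine backend
  clustermq.template = "sge.tmpl"  # template file shipped with this example
)

# Ask drake to send targets to clustermq workers.
# `plan` is assumed to exist already, and jobs = 4 is an arbitrary choice.
drake_config(plan, parallelism = "clustermq", jobs = 4)
```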