forked from jrnold/bugs-examples-in-stan
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathcancer.Rmd
131 lines (112 loc) · 3.44 KB
/
cancer.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
# Cancer: difference in two binomial proportions {#cancer}
```{r cancer_setup,message=FALSE,cache=FALSE}
library("tidyverse")
library("rstan")
```
Two groups chosen to be random samples from subpopulations of lung-cancer patients and cancer-free individuals.[^cancer]
The scientific question of interest is the difference in the smoking habits between two groups.
The results of the survey are:
```{r cancer}
cancer <- tribble(
~group, ~n, ~smokers,
"Cancer patients", 86, 82,
"Control group", 86, 72
)
```
```{r echo=FALSE,results='asis'}
cancer
```
## Two Sample Binomial Model
In implementing this model, we have just two data points (cancer patients and
control group) and a binomial sampling model, in which the population
proportions of smokers in each group appear as parameters. Quantities of
interest such as the difference in the population proportions and the log of
the odds ratio are computed in the generated quantities section. Uniform priors
on the population proportions are used in this example.
$$
\begin{aligned}[t]
r_i &\sim \mathsf{Binomial}(n_i, \pi_i)
\end{aligned}
$$
Additionally the difference,
$$
\delta = \pi_1 - \pi_2 ,
$$
and the log-odds ratio,
$$
\lambda = \log\left(\frac{\pi_1}{1 - \pi_1}\right) - \log \left( \frac{\pi_2}{1 - \pi_2} \right) ,
$$
It places uniform priors (Beta priors) are placed on $\pi$,
$$
\begin{aligned}
\pi_i &\sim \mathsf{Beta}(1, 1)
\end{aligned}
$$
The difference between and log odds ratio are defined in the `generated quantities` block.
```{r}
cancer_data <- list(
r <- cancer$smokers,
n <- cancer$n,
# beta prior on pi
p_a = rep(1, 2),
p_b = rep(1, 2)
)
```
The Stan model for this is:
```{r cancer_mod1,results='hide',cache.extra=tools::md5sum("stan/cancer1.stan")}
cancer_mod1 <- stan_model("stan/cancer1.stan")
```
```{r echo=FALSE,results='asis'}
cancer_mod1
```
Now estimate the model:
```{r cancer_fit1,results='hide'}
cancer_fit1 <- sampling(cancer_mod1, cancer_data)
```
```{r}
cancer_fit1
```
## Binomial Logit Model of the Difference
An alternative parameterization directly models the difference in the population proportion.
$$
\begin{aligned}[t]
r_i &\sim \mathsf{Binomial}(n_i, \pi_i) \\
\pi_1 &= \frac{1}{1 + \exp(-(\alpha + \beta)} \\
\pi_2 &= \frac{1}{1 + \exp(-\alpha))}
\end{aligned}
$$
The parameters $\alpha$ and $\beta$ are given weakly informative priors on the log-odds scale,
$$
\begin{aligned}
\alpha &\sim N(0, 10)\\
\beta &\sim N(0, 2.5)
\end{aligned}
$$
```{r cancer_mod2,results='hide',cache.extra=tools::md5sum("stan/cancer2.stan")}
cancer_mod2 <- stan_model("stan/cancer2.stan")
```
```{r echo=FALSE,results='asis'}
cancer_mod2
```
Re-use `r` and `n` values from `cancer_data`, but add the appropriate values for the prior distributions.
```{r cancer_data2}
cancer_data2 <- within(cancer_data, {
p_a <- p_b <- NULL
a_loc <- b_loc <- 0
a_scale <- 10
b_scale <- 2.5
})
```
Sample from the model:
```{r cancer_fit2,results='hide'}
cancer_fit2 <- sampling(cancer_mod2, cancer_data2)
```
```{r}
cancer_fit2
```
## Questions
1. Expression the Binomial Logit model of the Difference as a regression
1. What number of success and failures is a `Beta(1,1)` prior equivalent to?
[^cancer]: This example is derived from Simon Jackman,
"[Cancer: difference in two binomial proportions](https://web-beta.archive.org/web/20070601000000*/http://jackman.stanford.edu:80/mcmc/cancer.odc)",
*BUGS Examples,* 2007-07-24, This examples comes from @JohnsonAlbert1999a, using data from @Dorn1954a.