---
title: "Bernoulli"
output: rmarkdown::html_vignette
highlight: pygments
vignette: >
  %\VignetteIndexEntry{Bernoulli}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "",
  fig.width = 12,
  fig.height = 8
)
```
## Background
So you have this model you (or your employers) really like and that has been extensively tested and validated.
A new batch of data arrives for you to analyse and you wonder how to best use the historical data ($D_0$) you have access to.
A little bit of searching on the internet suggests you might use a [power prior](https://projecteuclid.org/journals/statistical-science/volume-15/issue-1/Power-prior-distributions-for-regression-models/10.1214/ss/1009212673.full), which consists of raising the likelihood, $L(D_0 \mid \theta)$, to a power $a_0$, usually taken to be in $[0, 1]$.
Like so
$$ p(\theta \mid D_0) \propto L(D_0 \mid \theta)^{a_0} \pi(\theta), $$
where $\pi(\theta)$ is called the _initial_ prior for the parameter $\theta$.
So you code that up, only to realise you have no idea what $a_0$ should be.
What to do now?
A bit more searching returns [this](https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.6728) helpful review, Section 2.1 of which points you towards a **normalised power prior**:
$$ \tilde{p}(\theta, a_0 \mid D_0) = \frac{L(D_0 \mid \theta)^{a_0} \pi(\theta)}{\int_{\boldsymbol{\Theta}} L(D_0 \mid t)^{a_0} \pi(t)\,dt} \pi_A(a_0),$$
where $\pi_A(a_0)$ is a prior on $a_0$. This construction hinges on the quantity
$$ c(a_0) := \int_{\boldsymbol{\Theta}} L(D_0 \mid t)^{a_0} \pi(t)\,dt.$$
This normalised density can now be used to compute the joint posterior in the face of the new data, $D$:
$$ \tilde{p}(\theta, a_0 \mid D_0, D) \propto L(D \mid \theta)L(D_0 \mid \theta)^{a_0} \pi(\theta) \frac{\pi_A(a_0)}{c(a_0)}.$$
## Getting our hands dirty
In this vignette we will use the **npowerPrioR** package to implement the routines described in [Carvalho & Ibrahim (2021)](https://arxiv.org/abs/2004.14912) to reproduce the Bernoulli example (Scenario 1) in [Neuenschwander et al. (2009)](https://onlinelibrary.wiley.com/doi/10.1002/sim.3722).
The historical data consist of $N_0$ Bernoulli trials $x_{0i} \in \{0,1\}$.
Suppose there were $y_0 = \sum_{i=1}^{N_0}x_{0i}$ successes.
The model is
$$
\begin{align*}
\theta &\sim \operatorname{Beta}(c, d), \\
x_{0i} \mid \theta &\sim \operatorname{Bernoulli}(\theta).
\end{align*}
$$
Since the tempered likelihood is $L(D_0 \mid \theta)^{a_0} = \theta^{a_0 y_0}(1-\theta)^{a_0 (N_0 - y_0)}$, this leads to a Beta posterior distribution for $\theta$,
$$
\begin{equation}
p(\theta \mid N_0, y_0, a_0) \propto \theta ^{a_0 y_0 + c - 1} (1-\theta)^{a_0 (N_0 -y_0) + d - 1},
\end{equation}
$$
and hence ([Neuenschwander et al., 2009](https://onlinelibrary.wiley.com/doi/10.1002/sim.3722)):
$$
\begin{equation}
c(a_0) = \frac{\mathcal{B}(a_0 y_0 + c, a_0 (N_0 -y_0) + d)}{\mathcal{B}(c, d)},
\end{equation}
$$
where $\mathcal{B}(w, z) = \frac{\Gamma(w)\Gamma(z)}{\Gamma(w + z)}$.
So here we will be comparing the approximate power prior to its exact counterpart, since in this simple example we know $c(a_0)$ exactly -- but we'll pretend we don't.
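Since we will want the exact answer for the comparison at the end, here is a minimal R helper that evaluates $\log(c(a_0))$ straight from the Beta-function expression above. Note that `exact_lca0()` is our own illustrative function, not part of **npowerPrioR**:
```{r}
# Exact log(c(a_0)) for the Beta-Bernoulli model, computed from the
# Beta-function expression above (illustrative helper, not from npowerPrioR)
exact_lca0 <- function(a_0, y0, N0, cc, dd) {
  lbeta(a_0 * y0 + cc, a_0 * (N0 - y0) + dd) - lbeta(cc, dd)
}
exact_lca0(a_0 = 0.5, y0 = 20, N0 = 100, cc = 1, dd = 1)
```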
Now, let's load up the package:
```{r setup}
library(npowerPrioR)
library(rstan) # provides stan_model(), sampling() and extract() used below
```
Now let's have a look at what a program implementing the vanilla power prior looks like:
```{r}
vanilla.stan <- system.file("stan", "simple_Bernoulli_prior.stan",
                            package = "npowerPrioR")
```
```{r}
writeLines(readLines(vanilla.stan))
```
So, as you can see, this is just a regular Stan model, except that the log-likelihood gets multiplied by $a_0$ (equivalently, the likelihood is raised to the power $a_0$).
There are a few important things to notice here:
* The program must store the log-likelihood (`logL`) and its square (`logL_sq`) in the `transformed parameters` block. This is important for the routines in **npowerPrioR**;
* All of the component densities need to be written out in full, i.e. we always write
```
target += normal_lpdf(x | 0, 1);
```
instead of
```
x ~ normal(0, 1);
```
This is because the sampling statement (`~`) drops constant terms of the log-density, which would throw off the stored log-likelihood and, with it, the estimates of $c(a_0)$.
Now we can finally run our routine to estimate $c(a_0)$ using the bisection-type algorithm described in [Carvalho & Ibrahim (2021)](https://arxiv.org/abs/2004.14912).
```{r}
N_0 <- 100  # historical sample size
y_0 <- 20   # historical number of successes
N <- 100    # current sample size
y <- 20     # current number of successes
cc <- 1     # shape parameters of the Beta initial prior on theta
dd <- 1
nu <- 1     # shape parameters of the Beta prior on a_0, pi_A(a_0)
eta <- 1
prior <- suppressMessages(stan_model(vanilla.stan))
bb.data <- list(
  N0 = N_0,
  y0 = y_0,
  c = cc,
  d = dd,
  a_0 = NA  # placeholder; values are supplied by the grid-building routine
)
#####################
rstan_options(auto_write = TRUE)
options(mc.cores = 4)
epsilon <- 0.05  # precision parameter for the adaptive grid
J <- 20          # number of grid points at which to estimate log(c(a_0))
maxA <- 1        # largest a_0 value in the grid
```
```{r}
adaptive.ca0.estimates <- build_grid(compiled.model.prior = prior,
                                     eps = epsilon, M = maxA,
                                     J = J, v1 = 10, v2 = 10,
                                     stan.list = bb.data, pars = "theta")
warnings()
```
Let's look at what the product of our labour is:
```{r}
head(adaptive.ca0.estimates$result)
```
Nice! So we have estimated $\log(c(a_0))$ at a few well-chosen points.
Now we'll fit a generalised additive model (GAM) to emulate $\log(c(a_0))$ at any point we want:
```{r}
fit.gam <- mgcv::gam(lc_a0 ~ s(a0, k = J + 1), data = adaptive.ca0.estimates$result)
```
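As an optional sanity check, we can overlay the GAM emulator on the estimated points and, since this toy example lets us cheat, on the exact curve from the `exact_lca0()` helper we defined earlier:
```{r}
# The emulator (solid line) should track the estimated points closely;
# the exact curve (dashed) is only available because this example is a toy
ord <- order(adaptive.ca0.estimates$result$a0)
plot(lc_a0 ~ a0, data = adaptive.ca0.estimates$result,
     xlab = expression(a[0]), ylab = expression(log(c(a[0]))))
lines(adaptive.ca0.estimates$result$a0[ord], fitted(fit.gam)[ord], lwd = 2)
curve(exact_lca0(x, y0 = y_0, N0 = N_0, cc = cc, dd = dd), add = TRUE, lty = 2)
```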
Now we'll use predictions from the GAM to create a fine dictionary of pairs $(a_0, \log(c(a_0)))$.
We'll pick a grid size of $K = 20{,}000$.
```{r}
K <- 2e4
bb.data.forposterior <- list(
  N0 = N_0,
  y0 = y_0,
  c = cc,
  d = dd,
  nu = nu,
  eta = eta,
  N = N,
  y = y,
  K = K
)
pred_a0s <- seq(0, max(adaptive.ca0.estimates$result$a0), length.out = K)
a0_grid <- data.frame(a0 = pred_a0s,
                      lc_pred = predict(fit.gam, newdata = data.frame(a0 = pred_a0s)))
# the dictionary of (a_0, log(c(a_0))) pairs that gets passed to Stan
bb.data.forposterior$pred_grid_x <- a0_grid$a0
bb.data.forposterior$pred_grid_y <- a0_grid$lc_pred
```
These can be plugged into a modified Stan program.
Let us investigate its structure:
```{r}
approximate.stan <- system.file("stan", "simple_Bernoulli_posterior_normalised_approximate.stan",
                                package = "npowerPrioR")
```
```{r}
writeLines(readLines(approximate.stan))
```
As you can see, this modified program has a few key components:
* The `functions` block now includes functions that approximately compute $\log(c(a_0))$ from a dictionary (an R sketch of this lookup follows this list). Note that these functions are model-agnostic: you can copy-paste them into a program for your own model and they should work.
* The `model` block now has a line with `-approximate_ca0` in it (take note of the sign; it's important).
* The line dubbed "Likelihood" in the program implements the likelihood of the current data, $L(D \mid \theta)$.
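To make that lookup concrete, here is a rough R analogue of what the approximation does, assuming the dictionary is sorted by $a_0$. The actual Stan functions may differ in their details; this is only a sketch of the principle:
```{r}
# R sketch of the dictionary lookup: linear interpolation of log(c(a_0))
# between the two nearest grid points
approx_lca0_lookup <- function(a0, grid_x, grid_y) {
  approx(x = grid_x, y = grid_y, xout = a0)$y
}
approx_lca0_lookup(0.123,
                   bb.data.forposterior$pred_grid_x,
                   bb.data.forposterior$pred_grid_y)
```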
Now, let's compile
```{r}
approx.normalised.model <- suppressMessages(stan_model(approximate.stan))
```
and run our model
```{r}
approx.norm.posterior.bern <- sampling(approx.normalised.model,
                                       data = bb.data.forposterior,
                                       refresh = 500, iter = 4000)
```
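Before moving on, it's worth a quick glance at the posterior summaries and the usual MCMC diagnostics (`Rhat`, `n_eff`):
```{r}
print(approx.norm.posterior.bern, pars = c("theta", "a_0"))
```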
We'll come back to these results shortly.
To finish this analysis off, we will run both the unnormalised version of the power prior (mathematically wrong! see [Neuenschwander et al., 2009](https://onlinelibrary.wiley.com/doi/10.1002/sim.3722)) and the exactly normalised model, because in this situation we're blessed enough to know the correct $c(a_0)$ in closed form.
```{r}
unnormalised.stan <- system.file("stan", "simple_Bernoulli_posterior_unnormalised.stan",
                                 package = "npowerPrioR")
```
```{r}
unnorm.bern <- suppressMessages(stan_model(unnormalised.stan))
```
```{r}
unnorm.posterior.bern <- sampling(unnorm.bern, data = bb.data.forposterior,
                                  refresh = 500, iter = 4000)
```
And now, the exactly normalised posterior:
```{r}
exactly.normalised.stan <- system.file("stan", "simple_Bernoulli_posterior_normalised.stan",
                                       package = "npowerPrioR")
```
```{r}
exactly.normalised.model <- suppressMessages(stan_model(exactly.normalised.stan))
```
```{r}
norm.posterior.bern <- sampling(exactly.normalised.model,
                                data = bb.data.forposterior,
                                refresh = 500, iter = 4000)
```
Good. Now let's compare all of the estimates we have computed. First, a little bit of prep:
```{r}
# Eq. 8 in Neuenschwander et al. (2009): marginal posterior density of a_0
posterior_a0_Bernoulli <- function(a_0, y0, n0, y, n, cc, dd, eta, nu, log = FALSE) {
  term1 <- lgamma(a_0 * n0 + cc + dd) + lgamma(a_0 * y0 + y + cc) +
    lgamma(a_0 * (n0 - y0) + (n - y) + dd)
  term2 <- lgamma(a_0 * y0 + cc) + lgamma(a_0 * (n0 - y0) + dd) +
    lgamma(a_0 * n0 + n + cc + dd)
  term3 <- dbeta(a_0, shape1 = eta, shape2 = nu, log = TRUE)
  ans <- term1 - term2 + term3
  if (!log) ans <- exp(ans)
  return(ans)
}
post_a0 <- function(x) {
  posterior_a0_Bernoulli(a_0 = x, y0 = y_0, n0 = N_0,
                         y = y, n = N, cc = cc, dd = dd, eta = eta, nu = nu)
}
post_a0 <- Vectorize(post_a0)
Kp <- integrate(post_a0, 0, 1)$value
norm_post_a0 <- function(x) post_a0(x) / Kp
norm_post_a0 <- Vectorize(norm_post_a0)
```
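From this exact marginal density we can also compute the exact posterior mean of $a_0$, a handy yardstick for the MCMC output below:
```{r}
# Exact posterior mean of a_0, by numerical quadrature over [0, 1]
integrate(function(x) x * norm_post_a0(x), 0, 1)$value
```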
Let's combine the posterior estimates for $a_0$ under the various models:
```{r}
a0.unnorm <- extract(unnorm.posterior.bern, 'a_0')$a_0
a0.approx <- extract(approx.norm.posterior.bern, 'a_0')$a_0
a0.norm <- extract(norm.posterior.bern, 'a_0')$a_0
a0.dt <- data.frame(a0 = c(a0.unnorm, a0.norm, a0.approx),
                    normalisation = c(rep("none", length(a0.unnorm)),
                                      rep("exact", length(a0.norm)),
                                      rep("approximate", length(a0.approx))))
```
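Before plotting, a quick numerical comparison of the posterior means of $a_0$ under each normalisation, which we can set against the exact mean computed just above:
```{r}
# Posterior mean of a_0 under each normalisation scheme
aggregate(a0 ~ normalisation, data = a0.dt, FUN = mean)
```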
Now, a nice plot:
```{r}
library(ggplot2)
a0_dist <- ggplot(a0.dt, aes(x = a0, fill = normalisation, colour = normalisation)) +
  geom_density() +
  # dashed black line: the Beta(eta, nu) prior on a_0
  stat_function(fun = function(x) dbeta(x, eta, nu),
                geom = "line", colour = "black", linetype = "longdash") +
  # solid black line: the exact marginal posterior of a_0
  stat_function(fun = norm_post_a0,
                geom = "line", colour = "black", linetype = "solid") +
  facet_grid(normalisation ~ ., scales = "free") +
  scale_y_continuous("Density", expand = c(0, 0)) +
  scale_x_continuous(expression(a[0]), expand = c(0, 0)) +
  theme_bw(base_size = 20) +
  theme(legend.position = "bottom",
        legend.justification = "centre",
        legend.title = element_blank(),
        strip.background = element_blank(),
        strip.text.y = element_blank(),
        legend.margin = margin(0, 0, 0, 0),
        legend.box.margin = margin(0, 0, 0, 0))
a0_dist
```
Now let's look at the estimates of the success probability, $\theta$.
```{r}
unnorm.theta.dt <- data.frame(theta = extract(unnorm.posterior.bern, 'theta')$theta)
unnorm.theta.dt$normalisation <- "none"
approx.theta.dt <- data.frame(theta = extract(approx.norm.posterior.bern, 'theta')$theta)
approx.theta.dt$normalisation <- "approximate"
norm.theta.dt <- data.frame(theta = extract(norm.posterior.bern, 'theta')$theta)
norm.theta.dt$normalisation <- "exact"
par.posteriors <- rbind(unnorm.theta.dt, approx.theta.dt, norm.theta.dt)
# For a fixed a_0 = a0_star, the exact conditional posterior of theta is
# Beta(a_star, b_star); it is drawn below as a solid black reference line
a0_star <- 0.05
a_star <- a0_star * bb.data.forposterior$y0 + bb.data.forposterior$c +
  bb.data.forposterior$y
b_star <- a0_star * (bb.data.forposterior$N0 - bb.data.forposterior$y0) +
  bb.data.forposterior$d + (bb.data.forposterior$N - bb.data.forposterior$y)
theta_posterior <- ggplot(data = par.posteriors,
                          aes(x = theta, colour = normalisation, fill = normalisation)) +
  geom_density(alpha = .4) +
  stat_function(fun = function(x) dbeta(x, a_star, b_star),
                geom = "line", colour = "black", linetype = "solid") +
  scale_x_continuous(expression(theta), expand = c(0, 0)) +
  scale_y_continuous("Density", expand = c(0, 0)) +
  theme_bw(base_size = 20) +
  theme(legend.position = "bottom",
        legend.justification = "centre",
        legend.title = element_blank(),
        strip.background = element_blank(),
        strip.text.y = element_blank(),
        legend.margin = margin(0, 0, 0, 0),
        legend.box.margin = margin(0, 0, 0, 0))
theta_posterior
```
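To back the plot up with a number or two, here are the posterior means of $\theta$ under each scheme:
```{r}
# Posterior mean of theta under each normalisation scheme
aggregate(theta ~ normalisation, data = par.posteriors, FUN = mean)
```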
## Conclusion
In this vignette we have shown how to write a Stan program for a power prior analysis with fixed $a_0$ for use within **npowerPrioR**, and how to modify that program to include an approximate emulation function for the log-normalising constant, $\log(c(a_0))$.
We have also compared the approximately normalised power prior to the exactly normalised distribution in a situation where the true answer is known, and found that the approximation gives very good results, both for the marginal posterior of $a_0$ and for the marginal posterior of the parameter of interest, $\theta$.