forked from jrnold/bugs-examples-in-stan
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathtruncated.Rmd
123 lines (106 loc) · 3.96 KB
/
truncated.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
# Truncation: How does Stan deal with truncation?
```{r truncated_setup,message=FALSE}
library("tidyverse")
library("rstan")
```
Suppose we observed $y = (1, \dots, 9)$,[^truncated-source]
```{r}
y <- 1:9
```
These observations are drawn from a population distributed normal with unknown mean ($\mu$) and variance ($\sigma^2$), with the constraint that $y < 10$,
$$
\begin{aligned}[t]
y_i &\sim \mathsf{Normal}(\mu, \sigma^2) I(-\infty, 10) .
\end{aligned}
$$
With the censoring taken into account, the log likelihood is
$$
\log L(y; \mu, \sigma) = \sum_{i = 1}^n \left( \log \phi(y_i; \mu, \sigma^2) - \log\Phi(y_i; \mu, \sigma^2) \right)
$$
where $\phi$ is the normal distribution PDF, and $\Phi$ is the normal distribution $
The posterior of this model is not well identified by the data, so the mean, $\mu$, and scale, $\sigma$, are given informative priors based on the data,
$$
\begin{aligned}[t]
\mu &\sim \mathsf{Normal}(\bar{y}, s_y) ,\\
\sigma &\sim \mathsf{HalfCauchy}(0, s_y) .
\end{aligned}
$$
where $\bar{y}$ is the mean of $y$, and $s_y$ is the standard deviation of $y$. Alternatively, we could have standardized the data prior to estimation.
## Stan Model
See @Stan2016a, Chapter 11 "Truncated or Censored Data" for more on how Stan handles truncation and censoring.
In Stan the `T` operator used in sampling statement,
```
y ~ distribution(...) T[upper, lower];
```
is used to adjust the log-posterior contribution for truncation.
```{r truncate_mod,results='hide'}
truncate_mod <- stan_model("stan/truncated.stan")
```
```{r echo=FALSE,results='asis',cache=FALSE}
truncate_mod
```
## Estimation
```{r truncate_data}
truncate_data <- within(list(), {
y <- y
N <- length(y)
U <- 10
mu_mean <- mean(y)
mu_scale <- sd(y)
sigma_scale <- sd(y)
})
```
```{r truncate_fit1,results='hide',message=FALSE}
truncate_fit1 <- sampling(truncate_mod, data = truncate_data)
```
```{r}
truncate_fit1
```
We can compare these results to that of a model in which the truncation is not taken into account:
$$
\begin{aligned}[t]
y_i &\sim \mathsf{Normal}(\mu, \sigma^2), \\
\mu &\sim \mathsf{Normal}(\bar{y}, s_y) ,\\
\sigma &\sim \mathsf{HalfCauchy}(0, s_y) .
\end{aligned}
$$
```{r truncate_mod2,results='hide'}
truncate_mod2 <- stan_model("stan/normal.stan")
```
```{r echo=FALSE,results='asis',cache=FALSE}
truncate_mod2
```
```{r truncate_fit2,results='hide'}
truncate_fit2 <-
sampling(truncate_mod2, data = truncate_data)
```
```{r}
truncate_fit2
```
We can compare the densities for $\mu$ and $\sigma$ in each of these models.
```{r truncted_plot_density}
plot_density <- function(par) {
bind_rows(
tibble(value = rstan::extract(truncate_fit1, par = par)[[1]],
model = "truncated"),
tibble(value = rstan::extract(truncate_fit2, par = par)[[1]],
model = "non-truncated")
) %>%
ggplot(aes(x = value, colour = model, fill = model)) +
geom_density(alpha = 0.3) +
labs(x = eval(bquote(expression(.(as.name(par)))))) +
theme(legend.position = "bottom")
}
```
```{r truncate_plot_density_mu,fig.cap="Posterior density of $\\mu$ when estimated with and without truncation"}
plot_density("mu")
```
```{r truncate_plot_density_sigma,fig.cap="Posterior density of $\\sigma$ when estimated with and without truncation"}
plot_density("sigma")
```
## Questions
1. How are the densities of $\mu$ and $\sigma$ different under the two models? Why are they different?
1. Re-estimate the model with improper uniform priors for $\mu$ and $\sigma$. How do the posterior distributions change?
1. Suppose that the truncation points are unknown. Write a Stan model and estimate. See @Stan2016a, Section 11.2 "Unknown Truncation Points" for how to write such a model. How important is the prior you place on the truncation points?
[^truncated-source]: This example is derived from Simon Jackman. "Truncation: How does WinBUGS deal with truncation?" *BUGS Examples*, 2007-07-24,
[URL](https://web-beta.archive.org/web/20070724034035/http://jackman.stanford.edu:80/mcmc/SingleTruncation.odc).