forked from jrnold/bugs-examples-in-stan
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathbimodal.Rmd
97 lines (82 loc) · 2.84 KB
/
bimodal.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
# Bimodal: Extreme missingness in bivariate normal data {#bimodal}
```{r bimodal_setup,message=FALSE,cache=FALSE}
library("rstan")
library("tidyverse")
library("stringr")
```
Simple methods for dealing with missing data can run into trouble given pernicious patterns of missingness. A famous artificial data set designed to highlight this point was created by Gordon Murray, to show how an EM algorithm can run into problems [@Murray1977a,@DempsterLairdRubin1977a].
```{r Bimodal}
Bimodal <- tribble(
~ x1, ~ x2,
1, 1,
1, -1,
-1, 1,
-1, -1,
2, NA,
2, NA,
-2, NA,
-2, NA,
NA, 2,
NA, 2,
NA, -2,
NA, -2
)
```
```{r}
Bimodal
```
Assume bivariate normality, and that the means of the two variables are both
zero, but the variances and covariance are unknown. Inference about the
correlation coefficient $r$ between these two variables is not trivial in
this instance. The marginal complete-data likelihood for $r$ is not unimodal,
and has a saddle-point at zero, and two local maxima close to -1 and 1. A
Bayesian analysis (with uninformative priors) similarly recovers a bimodal
posterior density for the correlation coefficient; e.g., [@Tanner1996a,
@Congdon2007a].
```{r bimodal_mod,message=FALSE,results='hide',cache.extra=tools::md5sum("stan/bimodal.stan")}
bimodal_mod <- stan_model("stan/bimodal.stan")
```
```{r echo=FALSE}
bimodal_mod
```
You can ignore the **rstan** warning,
DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
X[i] ~ multi_normal(...)
since the left hand side is a simple linear relationship and no
Jacobian adjustment is needed.
All we did was replace index values in the transformed parameter.
```{r bimodal_data}
X_mat <- as.matrix(Bimodal)
X <- X_mat %>%
as_data_frame() %>%
mutate(.row = row_number()) %>%
gather(.col, value, -.row) %>%
mutate(.col = as.integer(str_replace(.col, "x", "")))
X_obs <- filter(X, !is.na(value))
X_miss <- filter(X, is.na(value))
bimodal_data <- within(list(), {
N <- nrow(X_mat)
x_obs <- X_obs$value
x_obs_row <- X_obs$.row
x_obs_col <- X_obs$.col
N_obs <- nrow(X_obs)
x_miss_row <- X_miss$.row
x_miss_col <- X_miss$.col
N_miss <- nrow(X_miss)
df <- 100
})
```
```{r bimodal_fit,message=FALSE,results='hide'}
bimodal_fit <- sampling(bimodal_mod, data = bimodal_data,
chains = 4)
```
```{r}
bimodal_fit
```
This example is derived from Simon Jackman, "Bimodal: Extreme missingness in
bivariate normal data",
[URL](https://web-beta.archive.org/web/20070724034055/http://jackman.stanford.edu:80/mcmc/bimodal.odc).