cancer.Rmd

# Cancer: difference in two binomial proportions {#cancer}

```{r cancer_setup,message=FALSE,cache=FALSE}
library("tidyverse")
library("rstan")
```

Two groups chosen to be random samples from subpopulations of lung-cancer patients and cancer-free individuals.[^cancer]
The scientific question of interest is the difference in the smoking habits between two groups.
The results of the survey are:
```{r cancer}
cancer <- tribble(
  ~group, ~n, ~smokers,
  "Cancer patients", 86, 82,
  "Control group", 86, 72
)
```
```{r echo=FALSE,results='asis'}
cancer
```

## Two Sample Binomial Model

In implementing this model, we have just two data points (cancer patients and
control group) and a binomial sampling model, in which the population
proportions of smokers in each group appear as parameters.  Quantities of
interest such as the difference in the population proportions and the log of
the odds ratio are computed in the generated quantities section. Uniform priors
on the population proportions are used in this example.

$$
\begin{aligned}[t]
r_i &\sim \mathsf{Binomial}(n_i, \pi_i)
\end{aligned}
$$
Additionally the difference,
$$
\delta = \pi_1 - \pi_2 ,
$$
and the log-odds ratio,
$$
\lambda = \log\left(\frac{\pi_1}{1 - \pi_1}\right) - \log \left( \frac{\pi_2}{1 - \pi_2} \right) ,
$$

It places uniform priors (Beta priors) are placed on $\pi$,
$$
\begin{aligned}
\pi_i &\sim \mathsf{Beta}(1, 1)
\end{aligned}
$$

The difference between and log odds ratio are defined in the `generated quantities` block.

```{r}
cancer_data <- list(
  r <- cancer$smokers,
  n <- cancer$n,
  # beta prior on pi
  p_a = rep(1, 2),
  p_b = rep(1, 2)
)
```

The Stan model for this is:
```{r cancer_mod1,results='hide',cache.extra=tools::md5sum("stan/cancer1.stan")}
cancer_mod1 <- stan_model("stan/cancer1.stan")
```
```{r echo=FALSE,results='asis'}
cancer_mod1
```

Now estimate the model:
```{r cancer_fit1,results='hide'}
cancer_fit1 <- sampling(cancer_mod1, cancer_data)
```
```{r}
cancer_fit1
```

## Binomial Logit Model of the Difference

An alternative parameterization directly models the difference in the population proportion.

$$
\begin{aligned}[t]
r_i &\sim \mathsf{Binomial}(n_i, \pi_i) \\
\pi_1 &= \frac{1}{1 + \exp(-(\alpha + \beta)} \\
\pi_2 &= \frac{1}{1 + \exp(-\alpha))}
\end{aligned}
$$
The parameters $\alpha$ and $\beta$ are given weakly informative priors on the log-odds scale,
$$
\begin{aligned}
\alpha &\sim N(0, 10)\\
\beta &\sim N(0, 2.5)
\end{aligned}
$$

```{r cancer_mod2,results='hide',cache.extra=tools::md5sum("stan/cancer2.stan")}
cancer_mod2 <- stan_model("stan/cancer2.stan")
```
```{r echo=FALSE,results='asis'}
cancer_mod2
```

Re-use `r` and `n` values from `cancer_data`, but add the appropriate values for the prior distributions.
```{r cancer_data2}
cancer_data2 <- within(cancer_data, {
  p_a <- p_b <- NULL
  a_loc <- b_loc <- 0
  a_scale <- 10
  b_scale <- 2.5
})
```

Sample from the model:
```{r cancer_fit2,results='hide'}
cancer_fit2 <- sampling(cancer_mod2, cancer_data2)
```
```{r}
cancer_fit2
```

## Questions

1.  Expression the Binomial Logit model of the Difference as a regression
1.  What number of success and failures is a `Beta(1,1)` prior equivalent to?

[^cancer]: This example is derived from Simon Jackman,
    "[Cancer: difference in two binomial proportions](https://web-beta.archive.org/web/20070601000000*/http://jackman.stanford.edu:80/mcmc/cancer.odc)",
    *BUGS Examples,* 2007-07-24, This examples comes from @JohnsonAlbert1999a, using data from @Dorn1954a.