Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
build_quarto_site(unfreeze = FALSE) for release 1.2.0
1 parent
bbc337b
commit 6d5d162
Showing 26 changed files with 1,821 additions and 77 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
{ | ||
"hash": "83d1c933472b25458b2eeb6efde3f062", | ||
"result": { | ||
"engine": "knitr", | ||
"markdown": "---\ntitle: \"Conditional Probability\"\nauthor: \"John Benninghoff\"\ndate: '2024-03-26'\ndate-modified: '2024-03-26'\ncategories: notes\norder: 105\noutput:\n html_notebook:\n theme:\n version: 5\n preset: bootstrap\n css: assets/extra.css\n pandoc_args: --shift-heading-level-by=1\n toc: yes\n toc_float:\n collapsed: no\n smooth_scroll: no\n---\n\n\nAn exploration of conditional probabilities in R, inspired by a 2015 blog [post](https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/) on the hot hand.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# no libraries\n```\n:::\n\n\n# Background\n\nI recently stumbled across a blog post from 2015,\n\"[Hey - guess what? There really is a hot hand!](https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/)\"\nThe article had some R code in it that was intriguing, exploring the following proposition the post\nquoted from a [paper](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2627354) it cited:\n\n> Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down\n> the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips\n> that immediately followed an outcome of heads, and compute the relative frequency of heads on\n> those flips. Because the coin is fair, Jack of course expects this conditional relative frequency\n> to be equal to the probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to\n> sample 1 million fair coins and flip each coin 4 times, observing the conditional relative\n> frequency for each coin, on average the relative frequency would be approximately 0.4.\n\nWhat? OK, so let's follow along with the R code. 
The first block runs the simulation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrep <- 1e6\nn <- 4\ndata <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))\nprob <- rep(NA, rep)\nfor (i in 1:rep) {\n heads1 <- data[i, 1:(n - 1)] == 1\n heads2 <- data[i, 2:n] == 1\n prob[i] <- sum(heads1 & heads2) / sum(heads1)\n}\n```\n:::\n\n\nThe second block naively calculates the average:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(mean(prob))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NaN\n```\n\n\n:::\n:::\n\n\nThis doesn't work since, as the post points out, \"sometimes the first three flips are tails, so the\nprobability is 0/0.\" Discarding these gets us the correct average, which is approximately 0.4 as\nthe quote predicts, and not the 0.5 that Jack expects:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(mean(prob, na.rm = TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.4050585\n```\n\n\n:::\n:::\n\n\nReading this and trying the code myself led me to ask, What the heck is going on here?\n\n# Huh?\n\nLet's follow along with the R code and try to work out why the conditional probability is 0.4.\n\n## Simulation\n\nLooking at the first part of the code:\n\n```r\nrep <- 1e6\nn <- 4\ndata <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))\n```\n\nThis code simulates flipping the coin 4 times in a row 1 million times and stores the results in a\nmatrix:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(data)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n [,1] [,2] [,3] [,4]\n[1,] 1 1 0 0\n[2,] 1 1 0 1\n[3,] 1 1 1 0\n[4,] 0 0 1 1\n[5,] 0 0 1 0\n[6,] 0 1 0 0\n```\n\n\n:::\n:::\n\n\nBy convention, 1 is heads and 0 is tails. 
If the coin is fair, we should expect the proportion of\nheads to be about 0.5.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(data)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.5000165\n```\n\n\n:::\n\n```{.r .cell-code}\nround(mean(data), 2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.5\n```\n\n\n:::\n:::\n\n\nWhile there is some expected variance, the proportion is approximately 0.5.\n\n## Calculation\n\nLooking at the second part of the code:\n\n```r\nprob <- rep(NA, rep)\nfor (i in 1:rep) {\n heads1 <- data[i, 1:(n - 1)] == 1\n heads2 <- data[i, 2:n] == 1\n prob[i] <- sum(heads1 & heads2) / sum(heads1)\n}\n```\n\nThis counts the relative frequency of heads immediately after heads, by finding heads in positions\n1-3 (`heads1`), comparing to heads in positions 2-4 (`heads2`), and calculating the proportion of\nheads after heads (`prob[i]`). To see how this works in practice, we can test all possible\ncombinations of heads and tails:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncalc_prob <- function(flips) {\n heads1 <- flips[1:(n - 1)] == 1\n heads2 <- flips[2:n] == 1\n sum(heads1 & heads2) / sum(heads1)\n}\n\ntest_data <- expand.grid(0:1, 0:1, 0:1, 0:1)\ntest_prob <- rep(NA, nrow(test_data))\n\nfor (i in seq_len(nrow(test_data))) {\n f <- test_data[i, ]\n input <- paste0(\"c(\", toString(f), \")\")\n test_prob[i] <- calc_prob(f)\n print(paste0(i, \": \", input, \" = \", test_prob[i]))\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"1: c(0, 0, 0, 0) = NaN\"\n[1] \"2: c(1, 0, 0, 0) = 0\"\n[1] \"3: c(0, 1, 0, 0) = 0\"\n[1] \"4: c(1, 1, 0, 0) = 0.5\"\n[1] \"5: c(0, 0, 1, 0) = 0\"\n[1] \"6: c(1, 0, 1, 0) = 0\"\n[1] \"7: c(0, 1, 1, 0) = 0.5\"\n[1] \"8: c(1, 1, 1, 0) = 0.666666666666667\"\n[1] \"9: c(0, 0, 0, 1) = NaN\"\n[1] \"10: c(1, 0, 0, 1) = 0\"\n[1] \"11: c(0, 1, 0, 1) = 0\"\n[1] \"12: c(1, 1, 0, 1) = 0.5\"\n[1] \"13: c(0, 0, 1, 1) = 1\"\n[1] \"14: c(1, 0, 1, 1) = 0.5\"\n[1] \"15: c(0, 1, 1, 1) = 1\"\n[1] \"16: c(1, 1, 
1, 1) = 1\"\n```\n\n\n:::\n:::\n\n\nThere are 16 possible sequences of 4 coin flips, with 5 possible outcomes: 0, 1/2, 2/3, 1, and\n0/0 (`NaN`).\n\n## How it works\n\nLooking at the sequences, the result starts to make sense. Averaging the conditional relative\nfrequency over all sequences where it is defined gives us:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(test_prob, na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.4047619\n```\n\n\n:::\n:::\n\n\nWhich is exactly 17/42, or approximately 0.4. As we simulate more sets of 4 flips, the simulated\naverage approaches this value.\n\nRecall that the [conditional probability](https://en.wikipedia.org/wiki/Conditional_probability)\nis: $\\large{P(A \\mid B) = \\frac{P(A \\cap B)}{P(B)}}$ (I had to look it up).\n\nIn this case, `mean(test_prob, na.rm = TRUE)` is not $P(A \\mid B)$ for any single flip, which is\nstill 0.5 for a fair coin. It is the average of the per-sequence relative frequency of heads after\nheads, taken over the 14 sequences with at least one head in the first three flips, with each\nsequence weighted equally regardless of how many heads it contains.\n", | ||
"supporting": [], | ||
"filters": [ | ||
"rmarkdown/pagebreak.lua" | ||
], | ||
"includes": {}, | ||
"engineDependencies": {}, | ||
"preserve": {}, | ||
"postProcess": true | ||
} | ||
} |
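The averaging step described in the frozen notebook above can be checked without a loop. The following is a minimal sketch, not part of the commit: the names `flips`, `freq`, `defined`, `exact`, and `pooled` are illustrative. It enumerates the 16 sequences, averages the per-sequence frequency over the sequences where it is defined, and compares that with pooling every heads-after-heads count across sequences.

```r
# Enumerate all 16 sequences of 4 flips (1 = heads, 0 = tails).
n <- 4
flips <- as.matrix(expand.grid(rep(list(0:1), n)))

# Heads in positions 1..3, and heads in the positions that follow them.
heads1 <- flips[, 1:(n - 1), drop = FALSE] == 1
heads2 <- flips[, 2:n, drop = FALSE] == 1

# Per-sequence relative frequency of heads after heads (NaN when 0/0).
freq <- rowSums(heads1 & heads2) / rowSums(heads1)

# Average over the sequences where the frequency is defined:
# every such sequence gets equal weight.
defined <- !is.nan(freq)
exact <- mean(freq[defined])

# Pooled alternative: every flip that follows a head gets equal weight.
pooled <- sum(heads1 & heads2) / sum(heads1)

print(c(exact = exact, pooled = pooled))
```

Under these assumptions the per-sequence average is 17/42 while the pooled proportion is 0.5, which suggests the gap comes from weighting sequences equally rather than weighting flips equally.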