diff --git a/README.md b/README.md index d321452..34767aa 100644 --- a/README.md +++ b/README.md @@ -52,6 +52,12 @@ Notebooks in this package: R](https://adv-r.hadley.nz/index.html), second edition, with comparisons to solutions from [Advanced R Solutions](https://advanced-r-solutions.rbind.io). +- [Conditional + Probability](https://jabenninghoff.github.io/rtraining/analysis/cond-prob.html) + (2024-03-26): An exploration of conditional probabilities in R, + inspired by a 2015 blog + [post](https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/) + on the hot hand. - [ggplot2 (Getting started)](https://jabenninghoff.github.io/rtraining/analysis/ggplot2-1.html) (2022-11-20): Workbook for completing quizzes and exercises from the diff --git a/_freeze/analysis/cond-prob/execute-results/html.json b/_freeze/analysis/cond-prob/execute-results/html.json new file mode 100644 index 0000000..2591df3 --- /dev/null +++ b/_freeze/analysis/cond-prob/execute-results/html.json @@ -0,0 +1,15 @@ +{ + "hash": "83d1c933472b25458b2eeb6efde3f062", + "result": { + "engine": "knitr", + "markdown": "---\ntitle: \"Conditional Probability\"\nauthor: \"John Benninghoff\"\ndate: '2024-03-26'\ndate-modified: '2024-03-26'\ncategories: notes\norder: 105\noutput:\n html_notebook:\n theme:\n version: 5\n preset: bootstrap\n css: assets/extra.css\n pandoc_args: --shift-heading-level-by=1\n toc: yes\n toc_float:\n collapsed: no\n smooth_scroll: no\n---\n\n\nAn exploration of conditional probabilities in R, inspired by a 2015 blog [post](https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/) on the hot hand.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# no libraries\n```\n:::\n\n\n# Background\n\nI recently stumbled across a blog post from 2015,\n\"[Hey - guess what? 
There really is a hot hand!](https://statmodeling.stat.columbia.edu/2015/07/09/hey-guess-what-there-really-is-a-hot-hand/)\"\nThe article had some R code in it that was intriguing, exploring the following proposition the post\nquoted from a [paper](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2627354) it cited:\n\n> Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down\n> the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips\n> that immediately followed an outcome of heads, and compute the relative frequency of heads on\n> those flips. Because the coin is fair, Jack of course expects this conditional relative frequency\n> to be equal to the probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to\n> sample 1 million fair coins and flip each coin 4 times, observing the conditional relative\n> frequency for each coin, on average the relative frequency would be approximately 0.4.\n\nWhat? OK, so let's follow along with the R code. 
The first block runs the simulation:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrep <- 1e6\nn <- 4\ndata <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))\nprob <- rep(NA, rep)\nfor (i in 1:rep) {\n heads1 <- data[i, 1:(n - 1)] == 1\n heads2 <- data[i, 2:n] == 1\n prob[i] <- sum(heads1 & heads2) / sum(heads1)\n}\n```\n:::\n\n\nThe second block naively calculates the average:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(mean(prob))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] NaN\n```\n\n\n:::\n:::\n\n\nThis doesn't work since, as the post points out, \"sometimes the first three flips are tails, so the\nprobability is 0/0.\" Discarding these gets us the correct average, which is approximately 0.4 and\nnot 0.5 as the quote predicts:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nprint(mean(prob, na.rm = TRUE))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.4050585\n```\n\n\n:::\n:::\n\n\nReading this and trying the code myself led me to ask, What the heck is going on here?\n\n# Huh?\n\nLet's follow along with the R code and try to work out why the conditional probability is 0.4.\n\n## Simulation\n\nLooking at the first part of the code:\n\n```r\nrep <- 1e6\nn <- 4\ndata <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))\n```\n\nThis code simulates flipping the coin 4 times in a row 1 million times and stores the results in a\nmatrix:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(data)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n [,1] [,2] [,3] [,4]\n[1,] 1 1 0 0\n[2,] 1 1 0 1\n[3,] 1 1 1 0\n[4,] 0 0 1 1\n[5,] 0 0 1 0\n[6,] 0 1 0 0\n```\n\n\n:::\n:::\n\n\nBy convention, 1 is heads and 0 is tails. 
If the coin is fair, we should expect the proportion of\nheads to be about 0.5.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(data)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.5000165\n```\n\n\n:::\n\n```{.r .cell-code}\nround(mean(data), 2)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.5\n```\n\n\n:::\n:::\n\n\nWhile there is some expected variance, the proportion is approximately 0.5.\n\n## Calculation\n\nLooking at the second part of the code:\n\n```r\nprob <- rep(NA, rep)\nfor (i in 1:rep) {\n heads1 <- data[i, 1:(n - 1)] == 1\n heads2 <- data[i, 2:n] == 1\n prob[i] <- sum(heads1 & heads2) / sum(heads1)\n}\n```\n\nThis counts the relative frequency of heads immediately after heads, by finding heads in positions\n1-3 (`heads1`), comparing to heads in positions 2-4 (`heads2`), and calculating the proportion of\nheads after heads (`prob[i]`). To see how this works in practice, we can test all possible\ncombinations of heads and tails:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncalc_prob <- function(flips) {\n heads1 <- flips[1:(n - 1)] == 1\n heads2 <- flips[2:n] == 1\n sum(heads1 & heads2) / sum(heads1)\n}\n\ntest_data <- expand.grid(0:1, 0:1, 0:1, 0:1)\ntest_prob <- rep(NA, nrow(test_data))\n\nfor (i in seq_len(nrow(test_data))) {\n f <- test_data[i, ]\n input <- paste0(\"c(\", toString(f), \")\")\n test_prob[i] <- calc_prob(f)\n print(paste0(i, \": \", input, \" = \", test_prob[i]))\n}\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"1: c(0, 0, 0, 0) = NaN\"\n[1] \"2: c(1, 0, 0, 0) = 0\"\n[1] \"3: c(0, 1, 0, 0) = 0\"\n[1] \"4: c(1, 1, 0, 0) = 0.5\"\n[1] \"5: c(0, 0, 1, 0) = 0\"\n[1] \"6: c(1, 0, 1, 0) = 0\"\n[1] \"7: c(0, 1, 1, 0) = 0.5\"\n[1] \"8: c(1, 1, 1, 0) = 0.666666666666667\"\n[1] \"9: c(0, 0, 0, 1) = NaN\"\n[1] \"10: c(1, 0, 0, 1) = 0\"\n[1] \"11: c(0, 1, 0, 1) = 0\"\n[1] \"12: c(1, 1, 0, 1) = 0.5\"\n[1] \"13: c(0, 0, 1, 1) = 1\"\n[1] \"14: c(1, 0, 1, 1) = 0.5\"\n[1] \"15: c(0, 1, 1, 1) = 1\"\n[1] \"16: c(1, 1, 
1, 1) = 1\"\n```\n\n\n:::\n:::\n\n\nThere are 16 possible combinations of 4 coin flips, with 5 possible outcomes: 0, 1/2, 2/3, 1, and\n0/0 (`NaN`). \n\n## How it works\n\nLooking at the permutations, the conditional probability starts to make sense. Calculating the\nconditional relative frequency for all permutations gives us:\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmean(test_prob, na.rm = TRUE)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] 0.4047619\n```\n\n\n:::\n:::\n\n\nWhich is approximately 0.4. As we repeat coin flips, the frequency approaches this value.\n\nRecall that the [conditional probability](https://en.wikipedia.org/wiki/Conditional_probability)\nis: $\\large{P(A \\mid B) = \\frac{P(A \\cap B)}{P(B)}}$ (I had to look it up).\n\nIn this case, we are trying to calculate the probability of heads ($A$) given heads occurring at\nleast once ($B$), which is equivalent to `mean(test_prob, na.rm = TRUE)`.\n", + "supporting": [], + "filters": [ + "rmarkdown/pagebreak.lua" + ], + "includes": {}, + "engineDependencies": {}, + "preserve": {}, + "postProcess": true + } +} \ No newline at end of file diff --git a/docs/LICENSE.html b/docs/LICENSE.html index fb633db..0dab3d0 100644 --- a/docs/LICENSE.html +++ b/docs/LICENSE.html @@ -2,7 +2,7 @@ - + @@ -152,6 +152,12 @@ Using Rcpp + + + + + + + + + + + + + + + + + + + +
  • Notebooks
  • -
    Categories
    All (14)
    advanced-r (4)
    exercises (9)
    ggplot2 (5)
    notes (4)
    reading (1)
    +
    Categories
    All (15)
    advanced-r (4)
    exercises (9)
    ggplot2 (5)
    notes (5)
    reading (1)
    @@ -340,7 +346,7 @@

    Notebooks


    diff --git a/docs/index.xml b/docs/index.xml index dc42a81..48b48a3 100644 --- a/docs/index.xml +++ b/docs/index.xml @@ -10,8 +10,445 @@ My notes and experiences learning R and RStudio, bundled as an R package (work-in-progress), and published to GitHub Pages using Quarto. en-us -quarto-1.4.551 -Thu, 21 Dec 2023 06:00:00 GMT +quarto-1.4.552 +Tue, 26 Mar 2024 05:00:00 GMT + + Conditional Probability + John Benninghoff + https://jabenninghoff.github.io/rtraining/analysis/cond-prob.html + An exploration of conditional probabilities in R, inspired by a 2015 blog
    post on the hot hand.

    +
    +
    # no libraries
    +
    +
    +

    Background

    +

    I recently stumbled across a blog post from 2015, “Hey - guess what? There really is a hot hand!” The article had some R code in it that was intriguing, exploring the following proposition the post quoted from a paper it cited:

    +
    +

    Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this conditional relative frequency to be equal to the probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample 1 million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.

    +
    +

    What? OK, so let’s follow along with the R code. The first block runs the simulation:

    +
    +
    rep <- 1e6
    +n <- 4
    +data <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))
    +prob <- rep(NA, rep)
    +for (i in 1:rep) {
    +  heads1 <- data[i, 1:(n - 1)] == 1
    +  heads2 <- data[i, 2:n] == 1
    +  prob[i] <- sum(heads1 & heads2) / sum(heads1)
    +}
    +
    +

    The second block naively calculates the average:

    +
    +
    print(mean(prob))
    +
    +
    [1] NaN
    +
    +
    +

    This doesn’t work since, as the post points out, “sometimes the first three flips are tails, so the probability is 0/0.” Discarding these gets us the correct average, which is approximately 0.4 and not 0.5 as the quote predicts:

    +
    +
    print(mean(prob, na.rm = TRUE))
    +
    +
    [1] 0.4050585
    +
    +
    +
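As a quick check on the 0/0 explanation, here is a smaller rerun of the simulation. It uses whole-matrix operations instead of the loop. The seed, the smaller run size (`n_rep`), and the vectorized `rowSums()` approach are my own assumptions, not the post's code. A 0/0 happens only when the first three flips are all tails, so roughly 1/8 of sequences should be discarded.

```r
# Sketch: vectorized version of the simulation (assumed seed and run size)
set.seed(42)
n_rep <- 1e5 # smaller than the post's 1e6 to keep it quick
n <- 4
data <- matrix(sample(c(0, 1), n_rep * n, replace = TRUE), nrow = n_rep, ncol = n)

# Logical matrices: heads in flips 1..(n - 1) and heads in flips 2..n
heads1 <- data[, 1:(n - 1)] == 1
heads2 <- data[, 2:n] == 1

# Per-sequence share of heads that were followed by heads; 0/0 gives NaN
prob <- rowSums(heads1 & heads2) / rowSums(heads1)

nan_share <- mean(is.nan(prob)) # should be close to 1/8 = 0.125
avg_prob <- mean(prob, na.rm = TRUE) # should be close to 0.4
print(c(nan_share = nan_share, avg_prob = avg_prob))
```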

    Reading this and trying the code myself led me to ask: what the heck is going on here?

    +
    +
    +

    Huh?

    +

    Let’s follow along with the R code and try to work out why the conditional probability is approximately 0.4.

    +
    +

    Simulation

    +

    Looking at the first part of the code:

    +
    rep <- 1e6
    +n <- 4
    +data <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))
    +

    This code simulates flipping the coin 4 times in a row 1 million times and stores the results in a matrix with one row per sequence and one column per flip:

    +
    +
    head(data)
    +
    +
         [,1] [,2] [,3] [,4]
    +[1,]    1    1    0    0
    +[2,]    1    1    0    1
    +[3,]    1    1    1    0
    +[4,]    0    0    1    1
    +[5,]    0    0    1    0
    +[6,]    0    1    0    0
    +
    +
    +

    By convention, 1 is heads and 0 is tails. If the coin is fair, we should expect the proportion of heads to be about 0.5.

    +
    +
    mean(data)
    +
    +
    [1] 0.5000165
    +
    +
    round(mean(data), 2)
    +
    +
    [1] 0.5
    +
    +
    +

    Apart from the expected sampling variation, the proportion is approximately 0.5.
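To put a number on that variation: with 4 million independent fair flips, the standard error of the proportion of heads is \(\sqrt{0.5 \times 0.5 / 4\,000\,000} = 0.00025\). This sketch plugs in the 0.5000165 printed above. The variable names are illustrative.

```r
# Standard error of a proportion from n_flips independent fair flips
n_flips <- 1e6 * 4
se <- sqrt(0.5 * (1 - 0.5) / n_flips) # 0.00025

# Distance of the observed proportion (printed above) from 0.5, in standard errors
observed <- 0.5000165
z <- (observed - 0.5) / se # about 0.066
print(c(se = se, z = z))
```

A deviation of well under one standard error is ordinary sampling noise, which is consistent with a fair coin.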

    +
    +
    +

    Calculation

    +

    Looking at the second part of the code:

    +
    prob <- rep(NA, rep)
    +for (i in 1:rep) {
    +  heads1 <- data[i, 1:(n - 1)] == 1
    +  heads2 <- data[i, 2:n] == 1
    +  prob[i] <- sum(heads1 & heads2) / sum(heads1)
    +}
    +

    For each sequence, this computes the relative frequency of heads immediately after heads: it marks heads in positions 1-3 (heads1) and heads in positions 2-4 (heads2), counts the positions where both are heads (heads1 & heads2), and divides by the number of heads in positions 1-3 (prob[i]). To see how this works in practice, we can test all possible combinations of heads and tails:

    +
    +
    calc_prob <- function(flips) {
    +  heads1 <- flips[1:(n - 1)] == 1
    +  heads2 <- flips[2:n] == 1
    +  sum(heads1 & heads2) / sum(heads1)
    +}
    +
    +test_data <- expand.grid(0:1, 0:1, 0:1, 0:1)
    +test_prob <- rep(NA, nrow(test_data))
    +
    +for (i in seq_len(nrow(test_data))) {
    +  f <- test_data[i, ]
    +  input <- paste0("c(", toString(f), ")")
    +  test_prob[i] <- calc_prob(f)
    +  print(paste0(i, ": ", input, " = ", test_prob[i]))
    +}
    +
    +
    [1] "1: c(0, 0, 0, 0) = NaN"
    +[1] "2: c(1, 0, 0, 0) = 0"
    +[1] "3: c(0, 1, 0, 0) = 0"
    +[1] "4: c(1, 1, 0, 0) = 0.5"
    +[1] "5: c(0, 0, 1, 0) = 0"
    +[1] "6: c(1, 0, 1, 0) = 0"
    +[1] "7: c(0, 1, 1, 0) = 0.5"
    +[1] "8: c(1, 1, 1, 0) = 0.666666666666667"
    +[1] "9: c(0, 0, 0, 1) = NaN"
    +[1] "10: c(1, 0, 0, 1) = 0"
    +[1] "11: c(0, 1, 0, 1) = 0"
    +[1] "12: c(1, 1, 0, 1) = 0.5"
    +[1] "13: c(0, 0, 1, 1) = 1"
    +[1] "14: c(1, 0, 1, 1) = 0.5"
    +[1] "15: c(0, 1, 1, 1) = 1"
    +[1] "16: c(1, 1, 1, 1) = 1"
    +
    +
    +

    There are 16 possible sequences of 4 coin flips, which produce 5 possible values: 0, 1/2, 2/3, 1, and 0/0 (NaN). The two 0/0 cases are the sequences whose first three flips are all tails.
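Counting how often each value occurs makes the pattern easier to see. This sketch reuses the post's calculation but passes `n` as an argument and uses `apply()` instead of the loop. Both changes are my own choices.

```r
# Relative frequency of heads immediately after heads for one sequence
calc_prob <- function(flips, n = length(flips)) {
  heads1 <- flips[1:(n - 1)] == 1
  heads2 <- flips[2:n] == 1
  sum(heads1 & heads2) / sum(heads1)
}

# All 16 sequences of 4 flips (1 = heads, 0 = tails), one per row
test_data <- expand.grid(0:1, 0:1, 0:1, 0:1)
test_prob <- apply(test_data, 1, calc_prob)

# How often each defined value occurs (table() drops NaN by default)
counts <- table(round(test_prob, 3))
print(counts) # 0: 6, 0.5: 4, 0.667: 1, 1: 3

# Number of sequences where the frequency is 0/0
n_undefined <- sum(is.nan(test_prob))
print(n_undefined) # 2
```

Six of the 14 defined sequences score 0, while only three score 1, which pulls the average below 0.5.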

    +
    +
    +

    How it works

    +

    Looking at all 16 sequences, the result starts to make sense. Averaging the conditional relative frequency over the 14 sequences where it is defined gives us:

    +
    +
    mean(test_prob, na.rm = TRUE)
    +
    +
    [1] 0.4047619
    +
    +
    +

    This is exactly 17/42, or approximately 0.405. As the number of simulated sequences grows, the simulated average approaches this value; the 1 million sequences above gave 0.4050585.
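One way to see where 0.4 comes from is to compare two summaries of the same 16 sequences. The first is the average of the per-sequence frequencies, which is what the code computes. The second is the ratio of the counts pooled across all sequences. This comparison is my own framing, not something from the post, and the variable names are illustrative.

```r
n <- 4
# All 2^n equally likely sequences as a 0/1 matrix, one row per sequence
seqs <- as.matrix(expand.grid(rep(list(0:1), n)))

heads1 <- seqs[, 1:(n - 1)] == 1 # heads in flips 1..n-1
heads2 <- seqs[, 2:n] == 1 # heads in flips 2..n
hh <- rowSums(heads1 & heads2) # heads immediately after heads, per sequence
h <- rowSums(heads1) # heads that have a following flip, per sequence

avg_of_ratios <- mean(hh / h, na.rm = TRUE) # 17/42, about 0.405
ratio_of_sums <- sum(hh) / sum(h) # 12/24 = 0.5
print(c(avg_of_ratios = avg_of_ratios, ratio_of_sums = ratio_of_sums))
```

Averaging per sequence gives every defined sequence the same weight. So c(1, 1, 1, 1), with three heads-after-heads, counts no more than c(1, 0, 0, 0), with one chance and no successes.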

    +

    Recall that the conditional probability is \(\large{P(A \mid B) = \frac{P(A \cap B)}{P(B)}}\) (I had to look it up).

    +

    In this case, we are trying to calculate the probability of heads immediately after heads (\(A\)) given heads occurring at least once in the first three flips (\(B\)), which is equivalent to mean(test_prob, na.rm = TRUE).
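To connect the formula to the code, here is a worked version that rests on two assumptions of mine. First, each of the 16 sequences has probability \(1/16\). Second, the quantity conditioned on \(B\) is the per-sequence frequency \(f(s)\) rather than a single event. \(B\) is the set of 14 sequences with at least one heads in the first three flips, and their frequencies sum to \(17/3\):

\[
E[f \mid B] = \frac{\sum_{s \in B} f(s)\,P(s)}{P(B)} = \frac{\frac{17}{3} \cdot \frac{1}{16}}{\frac{14}{16}} = \frac{17}{42} \approx 0.405
\]

Dividing by \(P(B)\) has the same effect as averaging over only the 14 defined sequences, which is what na.rm = TRUE does.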

    + + +
    +
    + + ]]> + notes + https://jabenninghoff.github.io/rtraining/analysis/cond-prob.html + Tue, 26 Mar 2024 05:00:00 GMT + Using Rcpp John Benninghoff diff --git a/docs/listings.json b/docs/listings.json index 99af14e..1102f4c 100644 --- a/docs/listings.json +++ b/docs/listings.json @@ -7,6 +7,7 @@ "/analysis/r-setup-log.html", "/analysis/FaultTree.html", "/analysis/using-Rcpp.html", + "/analysis/cond-prob.html", "/analysis/advanced-r-1.html", "/analysis/advanced-r-2.html", "/analysis/advanced-r-3.html", diff --git a/docs/search.json b/docs/search.json index 23a9c82..b78d6c5 100644 --- a/docs/search.json +++ b/docs/search.json @@ -4,14 +4,21 @@ "href": "changelog.html", "title": "Changelog", "section": "", - "text": "Maintenance updates" + "text": "Added Conditional Probability: An exploration of conditional probabilities in R, inspired by a 2015 blog post on the hot hand" + }, + { + "objectID": "changelog.html#rtraining-1.2.0", + "href": "changelog.html#rtraining-1.2.0", + "title": "Changelog", + "section": "", + "text": "Added Conditional Probability: An exploration of conditional probabilities in R, inspired by a 2015 blog post on the hot hand" }, { "objectID": "changelog.html#rtraining-1.1.10", "href": "changelog.html#rtraining-1.1.10", "title": "Changelog", - "section": "", - "text": "Maintenance updates" + "section": "rtraining 1.1.10", + "text": "rtraining 1.1.10\n\nMaintenance updates" }, { "objectID": "changelog.html#rtraining-1.1.9", @@ -926,13 +933,6 @@ "Using Rcpp" ] }, - { - "objectID": "LICENSE.html", - "href": "LICENSE.html", - "title": "MIT License", - "section": "", - "text": "MIT License\nCopyright (c) 2020 rtraining authors\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the 
Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\nTHE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE." - }, { "objectID": "analysis/r-books.html", "href": "analysis/r-books.html", @@ -1010,6 +1010,46 @@ "R Books" ] }, + { + "objectID": "LICENSE.html", + "href": "LICENSE.html", + "title": "MIT License", + "section": "", + "text": "MIT License\nCopyright (c) 2020 rtraining authors\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:\nThe above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.\nTHE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE." + }, + { + "objectID": "analysis/cond-prob.html", + "href": "analysis/cond-prob.html", + "title": "Conditional Probability", + "section": "", + "text": "An exploration of conditional probabilities in R, inspired by a 2015 blog post on the hot hand.\n# no libraries", + "crumbs": [ + "Changelog", + "Conditional Probability" + ] + }, + { + "objectID": "analysis/cond-prob.html#background", + "href": "analysis/cond-prob.html#background", + "title": "Conditional Probability", + "section": "Background", + "text": "Background\nI recently stumbled across a blog post from 2015, “Hey - guess what? There really is a hot hand!” The article had some R code in it that was intriguing, exploring the following proposition the post quoted from a paper it cited:\n\nJack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this conditional relative frequency to be equal to the probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample 1 million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.\n\nWhat? OK, so let’s follow along with the R code. 
The first block runs the simulation:\n\nrep <- 1e6\nn <- 4\ndata <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))\nprob <- rep(NA, rep)\nfor (i in 1:rep) {\n heads1 <- data[i, 1:(n - 1)] == 1\n heads2 <- data[i, 2:n] == 1\n prob[i] <- sum(heads1 & heads2) / sum(heads1)\n}\n\nThe second block naively calculates the average:\n\nprint(mean(prob))\n\n[1] NaN\n\n\nThis doesn’t work since, as the post points out, “sometimes the first three flips are tails, so the probability is 0/0.” Discarding these gets us the correct average, which is approximately 0.4 and not 0.5 as the quote predicts:\n\nprint(mean(prob, na.rm = TRUE))\n\n[1] 0.4050585\n\n\nReading this and trying the code myself led me to ask, What the heck is going on here?", + "crumbs": [ + "Changelog", + "Conditional Probability" + ] + }, + { + "objectID": "analysis/cond-prob.html#huh", + "href": "analysis/cond-prob.html#huh", + "title": "Conditional Probability", + "section": "Huh?", + "text": "Huh?\nLet’s follow along with the R code and try to work out why the conditional probability is 0.4.\n\nSimulation\nLooking at the first part of the code:\nrep <- 1e6\nn <- 4\ndata <- array(sample(c(0, 1), rep * n, replace = TRUE), c(rep, n))\nThis code simulates flipping the coin 4 times in a row 1 million times and stores the results in a matrix:\n\nhead(data)\n\n [,1] [,2] [,3] [,4]\n[1,] 1 1 0 0\n[2,] 1 1 0 1\n[3,] 1 1 1 0\n[4,] 0 0 1 1\n[5,] 0 0 1 0\n[6,] 0 1 0 0\n\n\nBy convention, 1 is heads and 0 is tails. 
If the coin is fair, we should expect the proportion of heads to be about 0.5.\n\nmean(data)\n\n[1] 0.5000165\n\nround(mean(data), 2)\n\n[1] 0.5\n\n\nWhile there is some expected variance, the proportion is approximately 0.5.\n\n\nCalculation\nLooking at the second part of the code:\nprob <- rep(NA, rep)\nfor (i in 1:rep) {\n heads1 <- data[i, 1:(n - 1)] == 1\n heads2 <- data[i, 2:n] == 1\n prob[i] <- sum(heads1 & heads2) / sum(heads1)\n}\nThis counts the relative frequency of heads immediately after heads, by finding heads in positions 1-3 (heads1), comparing to heads in positions 2-4 (heads2), and calculating the proportion of heads after heads (prob[i]). To see how this works in practice, we can test all possible combinations of heads and tails:\n\ncalc_prob <- function(flips) {\n heads1 <- flips[1:(n - 1)] == 1\n heads2 <- flips[2:n] == 1\n sum(heads1 & heads2) / sum(heads1)\n}\n\ntest_data <- expand.grid(0:1, 0:1, 0:1, 0:1)\ntest_prob <- rep(NA, nrow(test_data))\n\nfor (i in seq_len(nrow(test_data))) {\n f <- test_data[i, ]\n input <- paste0(\"c(\", toString(f), \")\")\n test_prob[i] <- calc_prob(f)\n print(paste0(i, \": \", input, \" = \", test_prob[i]))\n}\n\n[1] \"1: c(0, 0, 0, 0) = NaN\"\n[1] \"2: c(1, 0, 0, 0) = 0\"\n[1] \"3: c(0, 1, 0, 0) = 0\"\n[1] \"4: c(1, 1, 0, 0) = 0.5\"\n[1] \"5: c(0, 0, 1, 0) = 0\"\n[1] \"6: c(1, 0, 1, 0) = 0\"\n[1] \"7: c(0, 1, 1, 0) = 0.5\"\n[1] \"8: c(1, 1, 1, 0) = 0.666666666666667\"\n[1] \"9: c(0, 0, 0, 1) = NaN\"\n[1] \"10: c(1, 0, 0, 1) = 0\"\n[1] \"11: c(0, 1, 0, 1) = 0\"\n[1] \"12: c(1, 1, 0, 1) = 0.5\"\n[1] \"13: c(0, 0, 1, 1) = 1\"\n[1] \"14: c(1, 0, 1, 1) = 0.5\"\n[1] \"15: c(0, 1, 1, 1) = 1\"\n[1] \"16: c(1, 1, 1, 1) = 1\"\n\n\nThere are 16 possible combinations of 4 coin flips, with 5 possible outcomes: 0, 1/2, 2/3, 1, and 0/0 (NaN).\n\n\nHow it works\nLooking at the permutations, the conditional probability starts to make sense. 
Calculating the conditional relative frequency for all permutations gives us:\n\nmean(test_prob, na.rm = TRUE)\n\n[1] 0.4047619\n\n\nWhich is approximately 0.4. As we repeat coin flips, the frequency approaches this value.\nRecall that the conditional probability is: \\(\\large{P(A \\mid B) = \\frac{P(A \\cap B)}{P(B)}}\\) (I had to look it up).\nIn this case, we are trying to calculate the probability of heads (\\(A\\)) given heads occurring at least once (\\(B\\)), which is equivalent to mean(test_prob, na.rm = TRUE).", + "crumbs": [ + "Changelog", + "Conditional Probability" + ] + }, { "objectID": "analysis/ggplot2-5.html", "href": "analysis/ggplot2-5.html", @@ -1398,42 +1438,42 @@ { "objectID": "NEWS.html", "href": "NEWS.html", - "title": "rtraining 1.1.10", + "title": "rtraining 1.2.0", "section": "", - "text": "Maintenance updates" + "text": "Added Conditional Probability: An exploration of conditional probabilities in R, inspired by a 2015 blog post on the hot hand" }, { "objectID": "NEWS.html#new-features", "href": "NEWS.html#new-features", - "title": "rtraining 1.1.10", + "title": "rtraining 1.2.0", "section": "New Features", "text": "New Features\n\nmajor update: build-site has been replaced with an R function, build_analysis_site(), which retains all of the functionality of the old shell script. It is still considered Experimental, due to lack of test coverage and some features that are not implemented, but should work for projects with limited pkgdown customization. 
The update also includes a function to convert notebooks to html_document, to_document().\nbuild_analysis_site() will be migrated to rdev in a future release" }, { "objectID": "NEWS.html#new-content", "href": "NEWS.html#new-content", - "title": "rtraining 1.1.10", + "title": "rtraining 1.2.0", "section": "New Content", "text": "New Content\n\nR Setup Log: added notes on the package layout I use for “analysis” packages (will be converted to an rdev vignette in a future release)\nR Setup Log: added notes on my R Workflow\nR Training Log: updated with notes on my current book, R Packages" }, { "objectID": "NEWS.html#new-contentfeatures", "href": "NEWS.html#new-contentfeatures", - "title": "rtraining 1.1.10", + "title": "rtraining 1.2.0", "section": "New Content/Features", "text": "New Content/Features\n\nci(): run continuous integration tests locally: lint, R CMD check, and style (off by default).\ncheck_renv(): convenience function that runs renv status(), clean(), and optionally update() (on by default)." 
}, { "objectID": "NEWS.html#new-contentfeatures-1", "href": "NEWS.html#new-contentfeatures-1", - "title": "rtraining 1.1.10", + "title": "rtraining 1.2.0", "section": "New Content/Features", "text": "New Content/Features\n\nR Setup Log Notebook (r-setup-log.Rmd): My notes on my personal R setup\nminor updates to R Training Log\nstyle_all(): style all .R and .Rmd files in a project using styler\nlint_all(): lint all .R and .Rmd files in a project using lintr\nadd GitHub Actions for continuous integration testing" }, { "objectID": "NEWS.html#new-contentfeatures-2", "href": "NEWS.html#new-contentfeatures-2", - "title": "rtraining 1.1.10", + "title": "rtraining 1.2.0", "section": "New Content/Features", "text": "New Content/Features\n\nR Training Log Notebook (r-training-log.Rmd): Notes on learning R and RStudio\ntools/setup-r: shell script to install development packages to site repository on macOS + Homebrew\nbuild-site: build a website from a collection of R Notebooks (html_notebook) in notebooks/" }, diff --git a/docs/sitemap.xml b/docs/sitemap.xml index 510df92..f8cf123 100644 --- a/docs/sitemap.xml +++ b/docs/sitemap.xml @@ -10,67 +10,71 @@ https://jabenninghoff.github.io/rtraining/analysis/r-setup-log.html - 2024-03-10T15:50:34.723Z + 2024-03-10T16:36:55.928Z https://jabenninghoff.github.io/rtraining/analysis/advanced-r-3.html - 2024-03-10T15:14:34.413Z + 2024-03-10T16:36:55.924Z https://jabenninghoff.github.io/rtraining/analysis/FaultTree.html - 2024-03-10T15:15:48.920Z + 2024-03-10T16:36:55.925Z https://jabenninghoff.github.io/rtraining/analysis/ggplot2-1.html - 2024-03-10T15:16:14.661Z + 2024-03-10T16:36:55.925Z https://jabenninghoff.github.io/rtraining/analysis/ggplot2-2.html - 2024-03-10T15:16:45.063Z + 2024-03-10T16:36:55.926Z https://jabenninghoff.github.io/rtraining/analysis/ggplot2-4.html - 2024-03-10T15:18:15.297Z + 2024-03-10T16:36:55.927Z https://jabenninghoff.github.io/rtraining/analysis/using-Rcpp.html - 2024-03-10T15:41:50.933Z + 
2024-03-10T16:36:55.928Z + + + https://jabenninghoff.github.io/rtraining/analysis/r-books.html + 2024-03-10T16:36:55.927Z https://jabenninghoff.github.io/rtraining/LICENSE.html 2021-10-04T01:37:12.312Z - https://jabenninghoff.github.io/rtraining/analysis/r-books.html - 2024-03-10T15:52:07.811Z + https://jabenninghoff.github.io/rtraining/analysis/cond-prob.html + 2024-03-30T18:21:28.934Z https://jabenninghoff.github.io/rtraining/analysis/ggplot2-5.html - 2024-03-10T15:18:55.267Z + 2024-03-10T16:36:55.927Z https://jabenninghoff.github.io/rtraining/analysis/ggplot2-3.html - 2024-03-10T15:17:40.250Z + 2024-03-10T16:36:55.926Z https://jabenninghoff.github.io/rtraining/analysis/r-training-log.html - 2024-03-10T15:29:00.538Z + 2024-03-10T16:36:55.928Z https://jabenninghoff.github.io/rtraining/analysis/advanced-r-4.html - 2024-03-10T15:15:11.663Z + 2024-03-10T16:36:55.925Z https://jabenninghoff.github.io/rtraining/analysis/advanced-r-2.html - 2024-03-10T15:13:45.584Z + 2024-03-10T16:36:55.924Z https://jabenninghoff.github.io/rtraining/analysis/advanced-r-1.html - 2024-03-10T15:12:55.046Z + 2024-03-10T16:36:55.923Z https://jabenninghoff.github.io/rtraining/NEWS.html - 2024-03-10T15:12:25.615Z + 2024-03-30T18:30:31.750Z https://jabenninghoff.github.io/rtraining/index.html