|
18 | 18 | "cell_type": "markdown",
|
19 | 19 | "metadata": {},
|
20 | 20 | "source": [
|
21 |
| - "The negative binomial distribution is flexible with multiple possible formulations. For example, it can model the number of *trials* or *failures* in a sequence of independent Bernoulli trials with probability of success (or failure) $p$ until the $k$-th \"success\". If we want to model the number of trials until the $k$-th success, we can use the following definition:\n", |
| 21 | + "The negative binomial distribution is flexible and has multiple possible formulations. For example, it can model the number of *trials* or *failures* in a sequence of independent Bernoulli trials with probability of success (or failure) $p$ until the $k$-th \"success\". If we want to model the number of trials until the $k$-th success, the probability mass function (pmf) is:\n", |
22 | 22 | "\n",
|
23 | 23 | "$$\n",
|
24 |
| - "Y \\sim \\text{NB}(k, p)\n", |
| 24 | + "p(y | k, p)= \\binom{y - 1}{y-k}(1 -p)^{y - k}p^k\n", |
25 | 25 | "$$\n",
|
26 | 26 | "\n",
|
27 | 27 | "where $0 \\le p \\le 1$ is the probability of success in each Bernoulli trial, $k > 0$, usually integer, $y \\in \\{k, k + 1, \\cdots\\}$ and $Y$ is the number of trials until the $k$-th success.\n",
|
28 | 28 | "\n",
|
29 |
| - "The probability mass function (pmf) is \n", |
30 |
| - "\n", |
31 |
| - "$$\n", |
32 |
| - "p(y | k, p)= \\binom{y - 1}{y-k}(1 -p)^{y - k}p^k\n", |
33 |
| - "$$\n", |
34 |
| - "\n", |
35 | 29 | "In this case, since we are modeling the number of *trials* until the $k$-th success, $y$ starts at $k$ and can be any integer greater than or equal to $k$. If instead we want to model the number of *failures* before the $k$-th success, $Y$ counts failures, starts at $0$, and has a slightly different pmf:\n",
|
36 | 30 | "\n",
|
37 | 31 | "$$\n",
|
38 | 32 | "p(y | k, p)= \\binom{y + k - 1}{k-1}(1 -p)^{y}p^k\n",
|
39 | 33 | "$$\n",
|
40 | 34 | "\n",
|
41 |
| - "In this case, $y$ starts at $0$ and can be any integer greater than or equal to $0$. When modeling failures, $y$ starts at 0, when modeling trials, $y$ starts at $k$.\n", |
42 |
| - "\n", |
43 |
| - "\n" |
| 35 | + "In this case, $y$ can be any integer greater than or equal to $0$. In short: when modeling failures, $y$ starts at $0$; when modeling trials, $y$ starts at $k$." |
44 | 36 | ]
|
45 | 37 | },
|
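To make the two conventions concrete, here is a minimal sketch (standard library only) that codes both pmfs directly from the formulas above and checks that they differ only by a shift of $k$: observing $y$ trials is the same event as observing $y - k$ failures. The function names `pmf_trials` and `pmf_failures` and the values `k = 3`, `p = 0.4` are our own choices for illustration, and because `math.comb` needs integers, the sketch only covers integer $k$.

```python
from math import comb


def pmf_trials(y: int, k: int, p: float) -> float:
    """P(Y = y) where Y counts trials until the k-th success (support y >= k)."""
    if y < k:
        return 0.0
    return comb(y - 1, y - k) * (1 - p) ** (y - k) * p ** k


def pmf_failures(y: int, k: int, p: float) -> float:
    """P(Y = y) where Y counts failures before the k-th success (support y >= 0)."""
    if y < 0:
        return 0.0
    return comb(y + k - 1, k - 1) * (1 - p) ** y * p ** k


k, p = 3, 0.4  # hypothetical values for the sketch

# Trials = failures + k successes, so the two pmfs are the same up to a shift of k.
for failures in range(20):
    assert abs(pmf_trials(failures + k, k, p) - pmf_failures(failures, k, p)) < 1e-12

# Both pmfs sum to (approximately) 1 once the support is truncated far enough out.
total_trials = sum(pmf_trials(y, k, p) for y in range(k, 500))
total_failures = sum(pmf_failures(y, k, p) for y in range(0, 500))
print(round(total_trials, 6), round(total_failures, 6))
```

The shift check works because $\binom{y-1}{y-k} = \binom{y-1}{k-1}$, which is exactly the failures coefficient $\binom{y' + k - 1}{k - 1}$ evaluated at $y' = y - k$.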
46 | 38 | {
|
47 | 39 | "cell_type": "markdown",
|
48 | 40 | "metadata": {},
|
49 | 41 | "source": [
|
50 |
| - "These are not the only ways of defining the negative binomial distribution, there are plenty of options! One of the most interesting, and the one you see in [PyMC3](https://docs.pymc.io/api/distributions/discrete.html#pymc3.distributions.discrete.NegativeBinomial), the library we use in Bambi for the backend, is as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. Or in other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n", |
| 42 | + "These are not the only ways of defining the negative binomial distribution; there are plenty of options! One of the most interesting, and the one you see in [PyMC](https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.NegativeBinomial.html), the library we use in Bambi for the backend, is to define it as a continuous mixture. The negative binomial distribution describes a Poisson random variable whose rate is also a random variable (not a fixed constant!) following a gamma distribution. In other words, conditional on a gamma-distributed variable $\\mu$, the variable $Y$ has a Poisson distribution with mean $\\mu$.\n", |
51 | 43 | "\n",
|
52 | 44 | "Under this alternative definition, the pmf is\n",
|
53 | 45 | "\n",
|
|
96 | 88 | "cell_type": "markdown",
|
97 | 89 | "metadata": {},
|
98 | 90 | "source": [
|
99 |
| - "Scipy uses the number of *failures* until $k$ successes definition, therefore $y$ starts at 0. In the following plot, we have the probability of observing $y$ failures before we see $k=3$ successes. " |
| 91 | + "SciPy uses the \"number of *failures* before $k$ successes\" definition, so $y$ starts at 0. In the following plot, we have the probability of observing $y$ failures before we see $k=3$ successes. " |
100 | 92 | ]
|
101 | 93 | },
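The gamma-Poisson mixture view can be checked by simulation. A minimal sketch, assuming NumPy and SciPy are available: draw rates from a gamma distribution with shape `alpha` and mean `mu`, draw one Poisson count per rate, and compare the empirical frequencies with SciPy's `nbinom` at `n = alpha`, `p = alpha / (alpha + mu)`. The names `mu` and `alpha` and their values are assumptions of this example; it illustrates the mixture identity, not how PyMC implements its `NegativeBinomial` internally.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1234)

mu, alpha = 4.0, 2.5  # hypothetical mean and gamma shape for the sketch
n_draws = 200_000

# Step 1: Poisson rates from a gamma with shape alpha and mean mu.
# NumPy's gamma is parameterized by scale, so scale = mu / alpha (rate = alpha / mu).
rates = rng.gamma(shape=alpha, scale=mu / alpha, size=n_draws)

# Step 2: conditional on each rate, draw a Poisson count.
y = rng.poisson(rates)

# Closed form: the mixture is negative binomial in SciPy's "failures" convention
# with n = alpha (need not be an integer) and p = alpha / (alpha + mu).
nb = stats.nbinom(alpha, alpha / (alpha + mu))

support = np.arange(15)
empirical = np.bincount(y, minlength=15)[:15] / n_draws
theoretical = nb.pmf(support)
print(np.round(empirical, 3))
print(np.round(theoretical, 3))
print(y.mean(), nb.mean())
```

The mean of the mixture stays at `mu`, while its variance, `mu + mu**2 / alpha`, exceeds that of a Poisson with the same mean; smaller `alpha` means more overdispersion.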
|
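A short check, assuming NumPy and SciPy are installed, that `scipy.stats.nbinom(n, p)` follows the failures convention: its pmf matches the failures formula term by term for $k = 3$ (the value $p = 0.5$ is an arbitrary choice for the sketch). Passing `loc=k` shifts the support so it starts at $k$, which gives the trials version instead.

```python
from math import comb

import numpy as np
from scipy import stats

k, p = 3, 0.5  # k = 3 as in the plot; p is an arbitrary choice
y = np.arange(0, 15)

# Failures-before-the-k-th-success pmf, written out from the formula.
manual = np.array([comb(int(v) + k - 1, k - 1) * (1 - p) ** int(v) * p ** k for v in y])

# SciPy's nbinom(n, p) uses the same "failures" convention, so its support starts at 0.
scipy_pmf = stats.nbinom.pmf(y, k, p)
print(np.allclose(manual, scipy_pmf))  # True

# Shifting by loc=k gives the "trials until the k-th success" version, supported on k, k+1, ...
trials = stats.nbinom(k, p, loc=k)
print(trials.pmf(k), trials.pmf(k - 1))
```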
102 | 94 | {
|
|