From edf8e3ccc261c190cbf75e284dc9ec85506d91df Mon Sep 17 00:00:00 2001 From: Samuel Brand <48288458+SamuelBrand1@users.noreply.github.com> Date: Mon, 25 Mar 2024 22:45:58 +0000 Subject: [PATCH 1/7] Update description of discrete delay distributions --- model/equations.md | 44 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 38 insertions(+), 6 deletions(-) diff --git a/model/equations.md b/model/equations.md index ddf4ca21..e2a45fc1 100755 --- a/model/equations.md +++ b/model/equations.md @@ -56,15 +56,49 @@ Where $X$ and $Z$ are the design matrices for fixed and random effects, respecti ### Generation interval and delay to reporting time of reference +In our epidemiological modelling we represent connected events with delayed cause and effect as having delay distributions. Delay distributions represent the chance of paired events with a *primary* time $s$ and a *secondary* time $t \geq s$. Example distributions in this form: + +- **The generation interval.** This models the delay between the infector's infection time (primary) and the infectee's infection time (secondary). +- **The incubation period**. This models the delay between infection time (primary) and the onset-of-symptoms time (secondary). +- **The reporting delay.** This models the delay between an onset time/specimen time (primary) and the reporting time (secondary). + 1. The generation interval is the random time between the infection of an index infection and the infection of a secondary infection. 2. The reporting reference time delay is the random time between infection of an eventual case and the reference time of the case ascertainment (see [Epinowcast definition](https://package.epinowcast.org/dev/articles/model.html#decomposition-into-expected-final-notifications-and-report-delay-components)). -This is a discrete time model, likely to use daily dynamics. Therefore, the distributions of the random time intervals above must be expressed as discrete probability mass functions (PMFs) over discrete time lags. 
+We intend to use discrete time models, likely daily dynamics. However, delay distributions are often reported in the literature as *continuous distributions*, either because the underlying data was on a fine-grained scale or because of analytic convenience. Additionally, if we are making inference on these distributions rather than using literature estimates it might be more convenient to use a parametric form of a continuous distribution (e.g. a Log-Normal distribution). + +Apart from user defined probability mass functions (PMFs) as in [EpiSewer](https://github.com/adrian-lison/EpiSewer/blob/main/vignettes/model-definition.md), creating consistent usage of discrete distributions based on associated continuous distributions is discussed by Park et al[^1]. The approach in Park et al is to treat the continuous representation of the delay distribution as generating the discrete representation through *interval censoring*. Interval censoring happens when an event time (either primary, secondary or both) is only known to occur within an interval. + +[^1]: [Park, SW, et al *Medrxiv* 2024](https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1) + +### Interval censoring in days with uniform primary event time + +Most of our use-cases will use double censoring of events into days; that is both primary and secondary events are censored onto a day. In a slight abuse of notation, we can treat $s,t$ as determining days *and* the continuous time earliest time point in a day. Let the continuous delay distribution have a density function $f$. Then, as per Park *et al*, the probability that the secondary event time $S$ occurs in day $t$ (i.e. \$ S \\in \[t, t+1)\$), given that the primary event time $P$ occurred in day $s$ (i.e. $P\in[s, s+1)$) is, + +$$ +\mathbb{P}(S = t| P = s) = \int_s^{s+1} \int_t^{t+1} g_P(x) f(y-x) \text{d}y \text{d}x. 
+$$ + +Where $g_P(x)$ is the density of the primary time conditioned to be within $[s, s+1)$ and $f(\tau) = 0$ for $\tau < 0$ is understood. + +This equation is tricky to implement numerically for two reasons: -Options for discretisation: -- User defined PMF (see [EpiSewer](https://github.com/adrian-lison/EpiSewer/blob/main/vignettes/model-definition.md) or wastewater model) +- In general, double integrals are numerically unstable in a number of cases. +- $g_P$ is not specified. -- Discretized PMF from a continuous distribution for the generation interval, (see [preprint](https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1)). +One option, which was assessed as robust in Park *et al*, is to approximate $g_P$ as uniform within the interval $[s, s+1)$. Using this approximation we can rewrite, + +$$ +\mathbb{P}(S = t| P = s) = \int_0^{1} \int_{t-s}^{t -s +1} f(y-x) \text{d}y \text{d}x. +$$ + +Which shows that, as expected, the discrete delay probability only depends on the day difference $T = t-s = \tau$. Finally, we can swap the integrals and use the [PDF of sum random variables identity](https://en.wikipedia.org/wiki/Convolution_of_probability_distributions) to write, + +$$ +\mathbb{P}(T = \tau) = F_{T+U}(\tau+1) - F_{T+U}(\tau). +$$ + +Where $F_{T+U}$ is the cumulative probabilty function of the delay $T$ with density $f$ and $U \sim \mathcal{U}[0,1]$. Calculating $F_{T+U}$ for any analytical distribution and value of $\tau = 0, 1, 2,...$ is a _single integral_ which has stable numerical quadrature properties. See [here](https://github.com/CDCgov/Rt-without-renewal/blob/401e028600cecebc76682023eb215d31ead6326d/EpiAware/src/EpiAwareUtils/censored_pmf.jl#L63C1-L75C4) for an example implementation. 
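As a rough illustration of the single-integral form above, the following Python sketch evaluates $F_{T+U}(x) = \int_0^1 F(x-u)\,\text{d}u$ and differences it at consecutive days. It is not the linked Julia implementation: the function names, the Log-Normal parameters (`mu`, `sigma`) and the fixed-step midpoint rule (used here in place of adaptive quadrature) are all assumptions made for the example.

```python
import math


def lognormal_cdf(x, mu=1.0, sigma=0.5):
    """CDF F of an illustrative Log-Normal(mu, sigma) delay; F(x) = 0 for x <= 0."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def cdf_delay_plus_uniform(x, cdf, n=400):
    """Approximate F_{T+U}(x) = int_0^1 F(x - u) du with a composite midpoint rule."""
    h = 1.0 / n
    return h * sum(cdf(x - (i + 0.5) * h) for i in range(n))


def double_censored_pmf(cdf, max_delay):
    """Return [P(T = tau)] for tau = 0..max_delay, where
    P(T = tau) = F_{T+U}(tau + 1) - F_{T+U}(tau). The result is not renormalised."""
    edges = [cdf_delay_plus_uniform(float(k), cdf) for k in range(max_delay + 2)]
    return [edges[k + 1] - edges[k] for k in range(max_delay + 1)]


# Hypothetical usage: daily PMF for delays of 0..14 days.
pmf = double_censored_pmf(lognormal_cdf, max_delay=14)
print([round(p, 4) for p in pmf])
```

Any other continuous delay can be swapped in by passing a different `cdf` callable, provided it returns zero for negative arguments, matching the convention $f(\tau) = 0$ for $\tau < 0$.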
### Reporting delay between the time of reference and the time of report @@ -92,8 +126,6 @@ The [hazard](https://en.wikipedia.org/wiki/Proportional_hazards_model) of a surv $$h_{t,d} = P(\text{delay}=d|\text{delay} \geq d, W_{t,d}).$$ - - ## Signals ### Hospitalizations From ec32f75ffda34cd53f5537b5f00c88031e7a0694 Mon Sep 17 00:00:00 2001 From: Samuel Brand <48288458+SamuelBrand1@users.noreply.github.com> Date: Mon, 25 Mar 2024 22:48:20 +0000 Subject: [PATCH 2/7] remove double desciption --- model/equations.md | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/model/equations.md b/model/equations.md index e2a45fc1..f13b626f 100755 --- a/model/equations.md +++ b/model/equations.md @@ -62,22 +62,19 @@ In our epidemiological modelling we represent connected events with delayed caus - **The incubation period**. This models the delay between infection time (primary) and the onset-of-symptoms time (secondary). - **The reporting delay.** This models the delay between an onset time/specimen time (primary) and the reporting time (secondary). -1. The generation interval is the random time between the infection of an index infection and the infection of a secondary infection. -2. The reporting reference time delay is the random time between infection of an eventual case and the reference time of the case ascertainment (see [Epinowcast definition](https://package.epinowcast.org/dev/articles/model.html#decomposition-into-expected-final-notifications-and-report-delay-components)). - We intend to use discrete time models, likely daily dynamics. However, delay distributions are often reported in the literature as *continuous distributions*, either because the underlying data was on a fine-grained scale or because of analytic convenience. Additionally, if we are making inference on these distributions rather than using literature estimates it might be more convenient to use a parametric form of a continuous distribution (e.g. 
a Log-Normal distribution). Apart from user defined probability mass functions (PMFs) as in [EpiSewer](https://github.com/adrian-lison/EpiSewer/blob/main/vignettes/model-definition.md), creating consistent usage of discrete distributions based on associated continuous distributions is discussed by Park et al[^1]. The approach in Park et al is to treat the continuous representation of the delay distribution as generating the discrete representation through *interval censoring*. Interval censoring happens when an event time (either primary, secondary or both) is only known to occur within an interval. [^1]: [Park, SW, et al *Medrxiv* 2024](https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1) -### Interval censoring in days with uniform primary event time +### Interval censoring in days with uniform primary event time and right truncation Most of our use-cases will use double censoring of events into days; that is both primary and secondary events are censored onto a day. In a slight abuse of notation, we can treat $s,t$ as determining days *and* the continuous time earliest time point in a day. Let the continuous delay distribution have a density function $f$. Then, as per Park *et al*, the probability that the secondary event time $S$ occurs in day $t$ (i.e. \$ S \\in \[t, t+1)\$), given that the primary event time $P$ occurred in day $s$ (i.e. $P\in[s, s+1)$) is, $$ \mathbb{P}(S = t| P = s) = \int_s^{s+1} \int_t^{t+1} g_P(x) f(y-x) \text{d}y \text{d}x. -$$ +$$ Where $g_P(x)$ is the density of the primary time conditioned to be within $[s, s+1)$ and $f(\tau) = 0$ for $\tau < 0$ is understood. @@ -92,13 +89,33 @@ $$ \mathbb{P}(S = t| P = s) = \int_0^{1} \int_{t-s}^{t -s +1} f(y-x) \text{d}y \text{d}x. $$ -Which shows that, as expected, the discrete delay probability only depends on the day difference $T = t-s = \tau$. 
Finally, we can swap the integrals and use the [PDF of sum random variables identity](https://en.wikipedia.org/wiki/Convolution_of_probability_distributions) to write, +Which shows that, as expected, the discrete delay probability only depends on the day difference $T = t-s = \tau$. Finally, we can swap the integrals and use the [PDF of summed random variables identity](https://en.wikipedia.org/wiki/Convolution_of_probability_distributions) to write, $$ \mathbb{P}(T = \tau) = F_{T+U}(\tau+1) - F_{T+U}(\tau). $$ -Where $F_{T+U}$ is the cumulative probabilty function of the delay $T$ with density $f$ and $U \sim \mathcal{U}[0,1]$. Calculating $F_{T+U}$ for any analytical distribution and value of $\tau = 0, 1, 2,...$ is a _single integral_ which has stable numerical quadrature properties. See [here](https://github.com/CDCgov/Rt-without-renewal/blob/401e028600cecebc76682023eb215d31ead6326d/EpiAware/src/EpiAwareUtils/censored_pmf.jl#L63C1-L75C4) for an example implementation. +Where $F_{T+U}$ is the cumulative probability function of the delay $T$ with density $f$ and $U \sim \mathcal{U}[0,1]$. The vector $[\mathbb{P}(T = \tau)]_{\tau=0,1,\dots}$ is a discretised PMF associated with the continuous delay distribution for $S - P$. + +In applied modelling we need $p_d$ to be finite length, which we do by conditioning $T\leq T_{max}$ for some value of $T_{max}$; this is commonly called _right truncation_ of the distribution. The right truncated PMF we use in modelling given a continuous distribution for $S-P$ and $T_{max}$ is: + +$$ +p_d(\tau) = {\mathbb{P}(T = \tau) \over \sum_{\tau' = 0}^{T_{max}} \mathbb{P}(T = \tau')} \qquad \forall \tau = 0, \dots, T_{max}. +$$ + +Calculating $F_{T+U}$ for any analytical distribution and value of $\tau = 0, 1, 2,...$ is a _single integral_ which has stable numerical quadrature properties. 
See [here](https://github.com/CDCgov/Rt-without-renewal/blob/401e028600cecebc76682023eb215d31ead6326d/EpiAware/src/EpiAwareUtils/censored_pmf.jl#L63C1-L75C4) for an example implementation. + +### Left truncation for the generation interval + +It is typical to also condition on the delay between infector and infectee being at least one day; that is if $T$ models the generation interval delay then $T>0$. + +The reason for this is that if we allow zero delay infections, then consistently we should also model subsequent new infections from those new infections that also happen to occur with zero delay, and so on. This leads to requiring tracking of infection generations within a single time step. **If we consider same-day infection-infector events to be epidemiologically reasonable for a pathogen of interest it would be preferable to model using a shorter than daily time step.** + +For the discretised generation interval the pmf vector is, + +$$ +p_d(\tau) = {\mathbb{P}(T = \tau) \over \sum_{\tau' = 1}^{T_{max}} \mathbb{P}(T = \tau')} \qquad \forall \tau = 1, \dots, T_{max}. +$$ ### Reporting delay between the time of reference and the time of report From f209ae3001d68d94f42e9daaf18caac68de7a0d9 Mon Sep 17 00:00:00 2001 From: Samuel Brand <48288458+SamuelBrand1@users.noreply.github.com> Date: Mon, 25 Mar 2024 22:50:05 +0000 Subject: [PATCH 3/7] minor eq fix --- model/equations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/model/equations.md b/model/equations.md index f13b626f..185e7265 100755 --- a/model/equations.md +++ b/model/equations.md @@ -70,7 +70,7 @@ Apart from user defined probability mass functions (PMFs) as in [EpiSewer](https ### Interval censoring in days with uniform primary event time and right truncation -Most of our use-cases will use double censoring of events into days; that is both primary and secondary events are censored onto a day. 
In a slight abuse of notation, we can treat $s,t$ as determining days *and* the continuous time earliest time point in a day. Let the continuous delay distribution have a density function $f$. Then, as per Park *et al*, the probability that the secondary event time $S$ occurs in day $t$ (i.e. \$ S \\in \[t, t+1)\$), given that the primary event time $P$ occurred in day $s$ (i.e. $P\in[s, s+1)$) is, +Most of our use-cases will use double censoring of events into days; that is both primary and secondary events are censored onto a day. In a slight abuse of notation, we can treat $s,t$ as determining days *and* the continuous time earliest time point in a day. Let the continuous delay distribution have a density function $f$. Then, as per Park *et al*, the probability that the secondary event time $S$ occurs in day $t$ (i.e. $S \in [t, t+1)$), given that the primary event time $P$ occurred in day $s$ (i.e. $P\in[s, s+1)$) is, $$ \mathbb{P}(S = t| P = s) = \int_s^{s+1} \int_t^{t+1} g_P(x) f(y-x) \text{d}y \text{d}x. From 270b4ee27da3d153af60cefd3f3d1f2dd57e3b14 Mon Sep 17 00:00:00 2001 From: Samuel Brand <48288458+SamuelBrand1@users.noreply.github.com> Date: Mon, 25 Mar 2024 22:54:48 +0000 Subject: [PATCH 4/7] Update equations.md --- model/equations.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/model/equations.md b/model/equations.md index 185e7265..4a5b881f 100755 --- a/model/equations.md +++ b/model/equations.md @@ -99,9 +99,9 @@ Where $F_{T+U}$ is the cumulative probability function of the delay $T$ with den In applied modelling we need $p_d$ to be finite length, which we do by conditioning $T\leq T_{max}$ for some value of $T_{max}$; this is commonly called _right truncation_ of the distribution. The right truncated PMF we use in modelling given a continuous distribution for $S-P$ and $T_{max}$ is: -$$ -p_d(\tau) = {\mathbb{P}(T = \tau) \over \sum_{\tau' = 0}^{T_{max}} \mathbb{P}(T = \tau')} \qquad \forall \tau = 0, \dots, T_{max}. 
-$$ +```math +p_d(\tau) = \mathbb{P}(T = \tau) \Big/ \sum_{\tau' = 0}^{T_{max}} \mathbb{P}(T = \tau') \qquad \forall \tau = 0, \dots, T_{max}. +``` Calculating $F_{T+U}$ for any analytical distribution and value of $\tau = 0, 1, 2,...$ is a _single integral_ which has stable numerical quadrature properties. See [here](https://github.com/CDCgov/Rt-without-renewal/blob/401e028600cecebc76682023eb215d31ead6326d/EpiAware/src/EpiAwareUtils/censored_pmf.jl#L63C1-L75C4) for an example implementation. @@ -114,7 +114,7 @@ The reason for this is that if we allow zero delay infections, then consistently For the discretised generation interval the pmf vector is, $$ -p_d(\tau) = {\mathbb{P}(T = \tau) \over \sum_{\tau' = 1}^{T_{max}} \mathbb{P}(T = \tau')} \qquad \forall \tau = 1, \dots, T_{max}. +p_d(\tau) = \mathbb{P}(T = \tau) \Big/ \sum_{\tau' = 1}^{T_{max}} \mathbb{P}(T = \tau') \qquad \forall \tau = 1, \dots, T_{max}. $$ ### Reporting delay between the time of reference and the time of report From c77548e92121a576162de4c83b5f00579a58c2a1 Mon Sep 17 00:00:00 2001 From: Samuel Brand <48288458+SamuelBrand1@users.noreply.github.com> Date: Mon, 25 Mar 2024 22:58:05 +0000 Subject: [PATCH 5/7] update equations.md contents --- model/equations.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/model/equations.md b/model/equations.md index 4a5b881f..4b3d1b5e 100755 --- a/model/equations.md +++ b/model/equations.md @@ -9,7 +9,9 @@ Eventually, it should incorporate information about the software implementation - [Infections](#infections) - [Latent processes for reproductive number](#latent-processes-for-reproductive-number) - [Generation interval and delay to reporting time of reference](#generation-interval-and-delay-to-reporting-time-of-reference) - - [Reporting delay between the time of reference and the time of report](#reporting-delay-between-the-time-of-reference-and-the-time-of-report) + - [Interval censoring in days with uniform primary event time 
and right truncation](#interval-censoring-in-days-with-uniform-primary-event-time-and-right-truncation) + - [Left truncation for the generation interval](#left-truncation-for-the-generation interval) + - [Reporting delay between the time of reference and the time of report](#reporting-delay-between-the-time-of-reference-and-the-time-of-report) - [Signals](#signals) - [Hospitalizations](#hospitalizations) - [Wastewater](#wastewater) @@ -68,7 +70,7 @@ Apart from user defined probability mass functions (PMFs) as in [EpiSewer](https [^1]: [Park, SW, et al *Medrxiv* 2024](https://www.medrxiv.org/content/10.1101/2024.01.12.24301247v1) -### Interval censoring in days with uniform primary event time and right truncation +#### Interval censoring in days with uniform primary event time and right truncation Most of our use-cases will use double censoring of events into days; that is both primary and secondary events are censored onto a day. In a slight abuse of notation, we can treat $s,t$ as determining days *and* the continuous time earliest time point in a day. Let the continuous delay distribution have a density function $f$. Then, as per Park *et al*, the probability that the secondary event time $S$ occurs in day $t$ (i.e. $S \in [t, t+1)$), given that the primary event time $P$ occurred in day $s$ (i.e. $P\in[s, s+1)$) is, @@ -105,7 +107,7 @@ p_d(\tau) = \mathbb{P}(T = \tau) \Big/ \sum_{\tau' = 0}^{T_{max}} \mathbb{P}(T = Calculating $F_{T+U}$ for any analytical distribution and value of $\tau = 0, 1, 2,...$ is a _single integral_ which has stable numerical quadrature properties. See [here](https://github.com/CDCgov/Rt-without-renewal/blob/401e028600cecebc76682023eb215d31ead6326d/EpiAware/src/EpiAwareUtils/censored_pmf.jl#L63C1-L75C4) for an example implementation. 
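The right truncation step described in this section amounts to dropping every entry beyond $T_{max}$ and renormalising the remainder. A minimal Python sketch follows, assuming the untruncated probabilities $\mathbb{P}(T = \tau)$ are already available as a list; the function name `right_truncate` and the numbers in `raw` are made up for illustration only.

```python
def right_truncate(pmf, t_max):
    """Condition a discretised delay PMF on T <= t_max and renormalise.

    pmf[tau] holds the untruncated P(T = tau) for tau = 0, 1, ...
    """
    if not 0 <= t_max < len(pmf):
        raise ValueError("t_max must index into pmf")
    kept = pmf[: t_max + 1]
    total = sum(kept)
    if total <= 0.0:
        raise ValueError("no probability mass at or below t_max")
    return [p / total for p in kept]


# Illustrative (made-up) untruncated probabilities P(T = tau), tau = 0..5.
raw = [0.10, 0.30, 0.25, 0.15, 0.10, 0.05]
p_d = right_truncate(raw, t_max=3)
print(p_d)
```

The returned list has length $T_{max} + 1$ and sums to one, so it can be used directly as the finite-length vector $p_d$.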
-### Left truncation for the generation interval +#### Left truncation for the generation interval It is typical to also condition on the delay between infector and infectee being at least one day; that is if $T$ models the generation interval delay then $T>0$. @@ -117,7 +119,7 @@ $$ p_d(\tau) = \mathbb{P}(T = \tau) \Big/ \sum_{\tau' = 1}^{T_{max}} \mathbb{P}(T = \tau') \qquad \forall \tau = 1, \dots, T_{max}. $$ -### Reporting delay between the time of reference and the time of report +#### Reporting delay between the time of reference and the time of report The reporting delay is the random time between the time of reference of a case and the time of report when the data of that case becomes available to analysts (see [Epinowcast definition](https://package.epinowcast.org/dev/articles/model.html#decomposition-into-expected-final-notifications-and-report-delay-components)). From c5c635694b05a7c98210c05310c0aad185eafac8 Mon Sep 17 00:00:00 2001 From: Samuel Brand <48288458+SamuelBrand1@users.noreply.github.com> Date: Mon, 25 Mar 2024 22:58:42 +0000 Subject: [PATCH 6/7] Update equations.md From 4b685f9cbcf9b52c6c7ea584f4a95b9a8cbee43e Mon Sep 17 00:00:00 2001 From: Samuel Brand <48288458+SamuelBrand1@users.noreply.github.com> Date: Mon, 25 Mar 2024 22:59:11 +0000 Subject: [PATCH 7/7] fix contents --- model/equations.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/model/equations.md b/model/equations.md index 4b3d1b5e..f1e81c2c 100755 --- a/model/equations.md +++ b/model/equations.md @@ -10,7 +10,7 @@ Eventually, it should incorporate information about the software implementation - [Latent processes for reproductive number](#latent-processes-for-reproductive-number) - [Generation interval and delay to reporting time of reference](#generation-interval-and-delay-to-reporting-time-of-reference) - [Interval censoring in days with uniform primary event time and right 
truncation](#interval-censoring-in-days-with-uniform-primary-event-time-and-right-truncation) - - [Left truncation for the generation interval](#left-truncation-for-the-generation interval) + - [Left truncation for the generation interval](#left-truncation-for-the-generation-interval) - [Reporting delay between the time of reference and the time of report](#reporting-delay-between-the-time-of-reference-and-the-time-of-report) - [Signals](#signals) - [Hospitalizations](#hospitalizations)