Covariate Likelihood

The CovariateLikelihood component

The CovariateLikelihood class calculates the log-likelihood of a set of data points (a time series) given a population model. It computes the log-likelihood of each data point under the parameterised distribution assigned to the points, and sums these values.
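
As a rough sketch of the quantity described above, the sum can be written as follows. The notation is ours and is not taken from PhyDyn: n is the number of data points, t_i is the time of data point i, c(t_i) is the covariate value computed from the population trajectory at that time, and f_i is the density assigned to data point i.

```latex
\log L \;=\; \sum_{i=1}^{n} \log f_i\bigl(c(t_i)\bigr)
```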


A Covariate Likelihood element is made of the following components:

popmodel: A PopModelODE element, usually a reference to the population model used by the STreeLikelihood component used in the PhyDyn analysis.
data : A string made of rows and columns of data in table format, where the first column must correspond to time and the second column to the covariate value of interest.
covariate-expression: A mathematical expression used to calculate the covariate value of interest.
One of the following:
- covariate-distribution : The parametric distribution used to calculate the log-likelihood of each data point, OR
- distribution-expression : A mathematical expression describing the log-density function of the covariate distribution, for cases where the required distribution is not available as a covariate-distribution.

The Covariate Likelihood is usually used as a prior and in conjunction with PhyDyn's tree likelihood component.

Example: Prevalence

Let's say our PhyDyn analysis defines a population model (PopModelODE) containing a (say, non-deme) variable infections that keeps track of the cumulative number of infections over time. Let's assume that we have seroprevalence estimates for two dates, 2020.412 and 2020.6, with confidence intervals that correspond to a standard deviation of 0.5, and that the total population size is 1500000. The covariate likelihood component below calculates the log-likelihood of the prevalence computed from the trajectories of the population model with ID 'seirmodel', given the two data points:

<distribution id="seir.seroprevalencelh.t" spec="phydyn.covariate.CovariateLikelihood" 
    popmodel='@seirmodel' >
    <data>
        time,sp
        2020.412, 6.3
        2020.6, 6.5
    </data>
	<covariate-expression> (infections / 1500000)*100 </covariate-expression>
	<covariate-distribution spec="phydyn.covariate.distribution.Normal"  mean="sp" sigma="0.5"/>  
</distribution>

Note the following:

The data element generates 2 data points and, for each row/data-point, binds the values of the first and second column to variables time and sp, respectively.
The covariate-expression formula is used to calculate the value of prevalence at any given time point in our trajectory. In our example, the 'covariate-expression' is evaluated twice for each population trajectory, once per data point.
Each point is assigned the Normal distribution with sigma 0.5 and mean equal to the value of sp entered in the table e.g. the first point has N(6.3, 0.5).
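
The notes above can be illustrated with a small numeric sketch in Python. It is not PhyDyn code: the infection counts in infections_at are hypothetical values standing in for one population trajectory, and we assume the Normal density is evaluated at the computed prevalence with mean sp and sigma 0.5.

```python
import math

def normal_logpdf(x, mean, sigma):
    """Log-density of Normal(mean, sigma) evaluated at x."""
    return -math.log(sigma * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mean) / sigma) ** 2

POP_SIZE = 1_500_000  # total population size stated in the example

# (time, sp) rows, as in the data element.
data = [(2020.412, 6.3), (2020.6, 6.5)]

# HYPOTHETICAL cumulative infections of one trajectory at the two data times.
infections_at = {2020.412: 93_000.0, 2020.6: 99_000.0}

def covariate_loglik(data, infections_at, sigma=0.5):
    """Sum the per-point log-likelihoods over all data rows."""
    total = 0.0
    for time, sp in data:
        # Mirrors the covariate-expression: (infections / population) * 100
        prevalence = infections_at[time] / POP_SIZE * 100.0
        total += normal_logpdf(prevalence, mean=sp, sigma=sigma)
    return total

loglik = covariate_loglik(data, infections_at)
print(loglik)
```

With these made-up counts the computed prevalences are 6.2 and 6.6, so each point sits 0.1 away from its observed value sp.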
Let d

The data

Covariate Expression

The expression follows the same syntax used to write the matrix equations and should be written in terms of the variables used by the population model. Column names from the data table (for example popSize in the extended example) can also appear in it.

The Covariate Distribution and point likelihood

The Covariate Distribution Expression

<distribution id="seir.seroprevalencelh.t" spec="phydyn.covariate.CovariateLikelihood" 
    popmodel='@seirmodel' >
    <data>
        time,sp
        2020.412, 6.3
        2020.6, 6.5
    </data>
	<covariate-expression> (infections / 1500000)*100 </covariate-expression>
	<distribution-expression> -(log(sigma*sqrt(2*PI))) - 0.5*(spVal - sp)*(spVal-sp)/(sigma*sigma) </distribution-expression>  
 
</distribution>
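
As a sanity check, the distribution-expression above can be transcribed into Python and compared with a reference Normal log-density. This is only a sketch: the names spVal, sp and sigma are taken from the snippet, and we assume that spVal stands for the computed covariate value and sigma for the standard deviation, which the text does not state.

```python
import math
from statistics import NormalDist

def distribution_expression(spVal, sp, sigma):
    """Direct transcription of the distribution-expression formula."""
    return -(math.log(sigma * math.sqrt(2 * math.pi))) - 0.5 * (spVal - sp) * (spVal - sp) / (sigma * sigma)

def reference_logpdf(x, mean, sigma):
    """Log of the Normal density from the Python standard library."""
    return math.log(NormalDist(mu=mean, sigma=sigma).pdf(x))

# Illustrative (spVal, sp, sigma) triples; the spVal values are made up.
cases = [(6.2, 6.3, 0.5), (6.6, 6.5, 0.5), (5.0, 6.3, 0.45)]
for spVal, sp, sigma in cases:
    a = distribution_expression(spVal, sp, sigma)
    b = reference_logpdf(spVal, sp, sigma)
    print(spVal, sp, sigma, math.isclose(a, b, rel_tol=1e-9))
```

Under these assumptions the expression agrees with the log-density of N(sp, sigma), so it should behave like the Normal covariate-distribution of the first example.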

Extended Example: Prevalence

Let's now consider the case where the standard deviations of the normal distributions that describe each data point are provided as data, i.e. as a column in our table named sigma, and where the total population size for each data point (constant at 1500000 in our case) is provided in column popSize. The data element looks like this:

    <data>
        time,sp, sigma, popSize
        2020.412, 6.3, 0.45, 1500000
        2020.6, 6.5, 0.6, 1500000
    </data>

The corresponding Covariate Likelihood component is written as follows:

<distribution id="seir.seroprevalencelh.t" spec="phydyn.covariate.CovariateLikelihood" 
    popmodel='@seirmodel' >
	<covariate-expression> (infections / popSize)*100 </covariate-expression>
	<covariate-distribution spec="phydyn.covariate.distribution.Normal"  mean="sp" sigma="sigma"/>
    <data>
        time,sp, sigma, popSize
        2020.412, 6.3, 0.45, 1500000
        2020.6, 6.5, 0.6, 1500000
    </data>
</distribution>
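
The per-row binding of column names used in this extended example can be mimicked in a few lines of Python. This is an illustrative sketch, not PhyDyn's parser: parse_table and extended_loglik are hypothetical helpers, and the infection counts are made up to stand in for a population trajectory.

```python
import csv
import math

DATA = """
time,sp, sigma, popSize
2020.412, 6.3, 0.45, 1500000
2020.6, 6.5, 0.6, 1500000
"""

def parse_table(text):
    """Return one dict per data row, mapping column name to a float value."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    reader = csv.reader(lines, skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    return [dict(zip(header, (float(v) for v in row))) for row in reader]

def normal_logpdf(x, mean, sigma):
    """Log-density of Normal(mean, sigma) evaluated at x."""
    return -math.log(sigma * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mean) / sigma) ** 2

def extended_loglik(rows, infections_at):
    """Sum per-row log-likelihoods, using each row's own sigma and popSize."""
    total = 0.0
    for row in rows:
        # Mirrors the covariate-expression: (infections / popSize) * 100
        value = infections_at[row["time"]] / row["popSize"] * 100.0
        total += normal_logpdf(value, mean=row["sp"], sigma=row["sigma"])
    return total

rows = parse_table(DATA)
# HYPOTHETICAL cumulative infections of one trajectory at the two data times.
infections_at = {2020.412: 93_000.0, 2020.6: 99_000.0}
total = extended_loglik(rows, infections_at)
print(rows[0])
print(total)
```

Each row carries its own sigma and popSize, which is why both names can appear in the covariate-expression and in the distribution attributes.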