Merge pull request #179 from LCBC-UiO/dev

Getting close to rOpenSci submission
LCBC-UiO · Oct 20, 2023 · 1e6deab
2 parents 6be5bed + 2b585ae

Showing 51 changed files with 1,625 additions and 319 deletions.
diff --git a/.github/workflows/R-CMD-check.yaml b/.github/workflows/R-CMD-check.yaml
@@ -26,7 +26,7 @@ jobs:
     env:
       GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
       R_KEEP_PKG_SOURCE: yes
-      MYPKG_EXTENDED_TESTS: ${{contains(github.event.head_commit.message,
+      GALAMM_EXTENDED_TESTS: ${{contains(github.event.head_commit.message,
                                'run-extended')}}
 
     steps:
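
The renamed variable only has an effect if the test suite reads it. A minimal sketch of how that could look, assuming the workflow expression is exported as the string `"true"` or `"false"`; the helper `extended_tests_enabled()` is hypothetical and not part of this diff:

```r
# Hypothetical helper (not in the diff): TRUE only when the workflow
# exported GALAMM_EXTENDED_TESTS as the string "true" (any letter case).
extended_tests_enabled <- function() {
  identical(tolower(Sys.getenv("GALAMM_EXTENDED_TESTS")), "true")
}

# Simulate a commit message containing 'run-extended'.
Sys.setenv(GALAMM_EXTENDED_TESTS = "true")
enabled_when_true <- extended_tests_enabled()

# Simulate an ordinary commit.
Sys.setenv(GALAMM_EXTENDED_TESTS = "false")
enabled_when_false <- extended_tests_enabled()

# Inside a testthat file, one could then guard slow tests with
# testthat::skip_if_not(extended_tests_enabled())
```

An unset variable yields an empty string from `Sys.getenv()`, so the extended tests stay off by default on a local machine.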

diff --git a/NAMESPACE b/NAMESPACE
@@ -47,6 +47,7 @@ importFrom(stats,coef)
 importFrom(stats,deviance)
 importFrom(stats,family)
 importFrom(stats,fitted)
+importFrom(stats,formula)
 importFrom(stats,gaussian)
 importFrom(stats,logLik)
 importFrom(stats,nobs)

diff --git a/R/RcppExports.R b/R/RcppExports.R
@@ -1,6 +1,235 @@
 # Generated by using Rcpp::compileAttributes() -> do not edit by hand
 # Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
 
+#' Evaluate the deviance at given values of random effects
+#'
+#'
+#' @param parlist An object of class \code{parameters<T>} containing the
+#'   parameters at which to evaluate the marginal log-likelihood.
+#' @param datlist An object of class \code{data<T>} containing the data with
+#'   which to evaluate the marginal log-likelihood.
+#' @param lp Vector with linear predictor values, with elements of the same
+#'   type as the template \code{T}.
+#' @param modvec Reference to a vector of pointers to objects of class
+#'   \code{Model<T>}, containing the necessary functions specific to the
+#'   exponential families used in the model.
+#' @param solver A solver for sparse linear systems of type
+#'   \code{Eigen::SimplicialLDLT<Eigen::SparseMatrix<T> >}.
+#' @param phi Vector of dispersion parameters, one for each model family.
+#' @return Model deviance, after integrating out the random effects. This
+#'   corresponds to \eqn{-2} times the marginal log-likelihood.
+#' @noRd
+NULL
+
+#' @title Evaluate marginal log-likelihood
+#'
+#' @description
+#' Implements penalized iteratively reweighted least squares for finding
+#' conditional modes of random effects, and returns the resulting marginal
+#' log-likelihood. The template \code{T} will typically be one of
+#' \code{double}, \code{autodiff::dual1st}, or \code{autodiff::dual2nd}.
+#'
+#'
+#' @param parlist An object of class \code{parameters<T>} containing the
+#'   parameters at which to evaluate the marginal log-likelihood.
+#' @param datlist An object of class \code{data<T>} containing the data with
+#'   which to evaluate the marginal log-likelihood.
+#' @param modvec Reference to a vector of pointers to objects of class
+#'   \code{Model<T>}, containing the necessary functions specific to the
+#'   exponential families used in the model.
+#' @return An object of class \code{logLikObject<T>}. See its definition for
+#'   details.
+#'
+#' @noRd
+NULL
+
+#' @title Set up parameter and model family
+#'
+#' @description
+#' Templated wrapper function which sets up the necessary parameters to
+#' evaluate the marginal likelihood. The template type \code{T} will typically
+#' be one of \code{double}, \code{autodiff::dual1st}, or
+#' \code{autodiff::dual2nd}.
+#'
+#'
+#' @param y Double precision vector of response values.
+#' @param trials Double precision vector with number of trials. When trials
+#' are not applicable, e.g., with Gaussian or Poisson responses, this should
+#' be a vector of ones.
+#' @param X Fixed effect model matrix.
+#' @param Zt Transpose of random effect model matrix.
+#' @param Lambdat Lower Cholesky factor of random effect covariance matrix.
+#' @param beta Double precision vector of fixed effects.
+#' @param theta Double precision vector with the unique elements of
+#'   \code{Lambdat}.
+#' @param theta_mapping Integer vector mapping elements of \code{theta} to the
+#'   positions in \code{Lambdat}.
+#' @param u_init Double precision vector with initial values of random
+#'   effects. These random effects should be standardized.
+#' @param lambda Double precision vector of factor loadings.
+#' @param lambda_mapping_X Integer vector mapping elements of
+#'   \code{lambda} to elements of \code{X}, in row-major order.
+#' @param lambda_mapping_Zt List of integer vectors mapping elements of
+#'   \code{lambda} to non-zero elements of \code{Zt} assuming compressed
+#'   sparse column format is used. If \code{lambda_mapping_Zt_covs} is of
+#'   length zero, then each list element in \code{lambda_mapping_Zt} should be
+#'   of length one, and it will then be multiplied by the corresponding element
+#'   of \code{Zt}.
+#' @param lambda_mapping_Zt_covs List of double precision vectors. Must either
+#'   be of length zero, or the same length as \code{lambda_mapping_Zt}.
+#'   Each list element contains potential covariates that the elements of
+#'   \code{lambda_mapping_Zt} should be multiplied with. If the list is of
+#'   length 0, all elements of \code{lambda_mapping_Zt} are implicitly
+#'   multiplied by 1.
+#' @param weights Double precision vector of weights, used in heteroscedastic
+#'   models.
+#' @param weights_mapping Integer vector mapping the elements of \code{weights}
+#'   to the rows of \code{X}.
+#' @param family Vector of strings defining the family or families. Each
+#'   vector element must currently be one of \code{"gaussian"},
+#'   \code{"binomial"}, or \code{"poisson"}.
+#' @param family_mapping Integer vector mapping elements of \code{family} to
+#'   the rows of \code{X}.
+#' @param k Double precision vector with pre-computed constant term in the
+#'   log-likelihood for each element in \code{family}.
+#' @param maxit_conditional_modes Integer specifying the maximum number of
+#'   iterations in the penalized iteratively reweighted least squares algorithm
+#'   used to find the conditional modes of the random effects.
+#' @param lossvalue_tol Double precision scalar specifying the absolute
+#'   convergence criterion for the penalized iteratively reweighted least
+#'   squares algorithm used to find the conditional modes of the random
+#'   effects.
+#' @param reduced_hessian Boolean specifying whether the Hessian matrix of
+#'   second derivatives should be computed only with respect to \code{beta}
+#'   and \code{lambda}, in that order. This may be useful for getting a very
+#'   rough estimate of the inverse covariance matrix, when the full Hessian is
+#'   not positive definite.
+#'
+#' @return An \code{Rcpp::List} with the following elements. The element
+#' \code{logLik} will always be there, while the others will be there or not
+#' depending on the template type \code{T}.
+#'   * \code{logLik} Laplace approximate marginal log-likelihood at the
+#'     parameter values specified.
+#'   * \code{g} If \code{T} is \code{autodiff::dual1st} or
+#'     \code{autodiff::dual2nd}, the gradient is provided in this element as
+#'     a double precision vector.
+#'   * \code{H} If \code{T} is \code{autodiff::dual2nd}, the Hessian matrix
+#'     is provided in this element as a double precision matrix.
+#'   * \code{u} If \code{T} is \code{autodiff::dual2nd}, the conditional
+#'     modes of the standardized random effects are provided as a double
+#'     precision vector in this element.
+#'   * \code{V} If \code{T} is \code{autodiff::dual2nd}, the diagonal matrix
+#'     \eqn{V} with \eqn{b''(\nu_{i}) / \phi_{g(i)}} on the diagonal is
+#'     included in this element. See the paragraph below equation (13) in
+#'     \insertCite{sorensenLongitudinalModelingAgeDependent2023}{galamm} for
+#'     details.
+#'   * \code{phi} If \code{T} is \code{autodiff::dual2nd}, double precision
+#'     scalar containing the dispersion parameter of the model.
+#' @noRd
+NULL
+
+#' @title Evaluate the marginal likelihood
+#'
+#' @description
+#' This function evaluates the Laplace approximate marginal likelihood of a
+#' generalized additive latent and mixed model at a given set of parameters.
+#' The code uses elements generated by \code{lme4::glFormula}, and the
+#' documentation of \code{lme4} should be consulted for further details.
+#'
+#' @srrstats {G1.4a} Internal function documented.
+#'
+#' @param y Double precision vector of response values.
+#' @param trials Double precision vector with number of trials. When trials
+#' are not applicable, e.g., with Gaussian or Poisson responses, this should
+#' be a vector of ones.
+#' @param X Fixed effect model matrix.
+#' @param Zt Transpose of random effect model matrix.
+#' @param Lambdat Lower Cholesky factor of random effect covariance matrix.
+#' @param beta Double precision vector of fixed effects.
+#' @param theta Double precision vector with the unique elements of
+#'   \code{Lambdat}.
+#' @param theta_mapping Integer vector mapping elements of \code{theta} to the
+#'   positions in \code{Lambdat}.
+#' @param u_init Double precision vector with initial values of random
+#'   effects. These random effects should be standardized.
+#' @param lambda Double precision vector of factor loadings.
+#' @param lambda_mapping_X Integer vector mapping elements of
+#'   \code{lambda} to elements of \code{X}, in row-major order.
+#' @param lambda_mapping_Zt List of integer vectors mapping elements of
+#'   \code{lambda} to non-zero elements of \code{Zt} assuming compressed
+#'   sparse column format is used. If \code{lambda_mapping_Zt_covs} is of
+#'   length zero, then each list element in \code{lambda_mapping_Zt} should be
+#'   of length one, and it will then be multiplied by the corresponding element
+#'   of \code{Zt}.
+#' @param lambda_mapping_Zt_covs List of double precision vectors. Must either
+#'   be of length zero, or the same length as \code{lambda_mapping_Zt}.
+#'   Each list element contains potential covariates that the elements of
+#'   \code{lambda_mapping_Zt} should be multiplied with. If the list is of
+#'   length 0, all elements of \code{lambda_mapping_Zt} are implicitly
+#'   multiplied by 1.
+#' @param weights Double precision vector of weights, used in heteroscedastic
+#'   models.
+#' @param weights_mapping Integer vector mapping the elements of \code{weights}
+#'   to the rows of \code{X}.
+#' @param family Vector of strings defining the family or families. Each
+#'   vector element must currently be one of \code{"gaussian"},
+#'   \code{"binomial"}, or \code{"poisson"}.
+#' @param family_mapping Integer vector mapping elements of \code{family} to
+#'   the rows of \code{X}.
+#' @param k Double precision vector with pre-computed constant term in the
+#'   log-likelihood for each element in \code{family}.
+#' @param maxit_conditional_modes Integer specifying the maximum number of
+#'   iterations in the penalized iteratively reweighted least squares algorithm
+#'   used to find the conditional modes of the random effects.
+#' @param lossvalue_tol Double precision scalar specifying the absolute
+#'   convergence criterion for the penalized iteratively reweighted least
+#'   squares algorithm used to find the conditional modes of the random
+#'   effects.
+#' @param gradient Boolean specifying whether to compute the gradient of the
+#'   log-likelihood with respect to all elements of \code{theta}, \code{beta},
+#'   \code{lambda}, and \code{weights}, in that order. If
+#'   \code{gradient = TRUE}, and \code{hessian = FALSE}, forward mode
+#'   automatic differentiation with first-order dual numbers is used. If also
+#'   \code{hessian = TRUE}, then second-order dual numbers are used instead.
+#' @param hessian Boolean specifying whether to compute the Hessian matrix of
+#'   second derivatives of the log-likelihood with respect to all elements of
+#'   \code{theta}, \code{beta}, \code{lambda}, and \code{weights}, in that
+#'   order. If \code{hessian = TRUE}, forward mode automatic differentiation
+#'   with second-order dual numbers is used.
+#' @param reduced_hessian Boolean specifying whether the Hessian matrix of
+#'   second derivatives should be computed only with respect to \code{beta}
+#'   and \code{lambda}, in that order. This may be useful for getting a very
+#'   rough estimate of the inverse covariance matrix, when the full Hessian is
+#'   not positive definite.
+#'
+#' @return An \code{Rcpp::List}, which will be converted to a \code{list} in
+#'   \code{R}, with the following elements. The element \code{logLik} will
+#'   always be there, while the others will be there or not depending on the
+#'   arguments \code{gradient} and \code{hessian}.
+#'   * \code{logLik} Laplace approximate marginal log-likelihood at the
+#'     parameter values specified.
+#'   * \code{g} If \code{gradient = TRUE} or \code{hessian = TRUE}, the
+#'     gradient is provided in this element as a double precision vector.
+#'   * \code{H} If \code{hessian = TRUE}, the Hessian matrix is provided in
+#'     this element as a double precision matrix.
+#'   * \code{u} If \code{hessian = TRUE}, the conditional modes of the
+#'     standardized random effects are provided as a double precision vector
+#'     in this element.
+#'   * \code{V} If \code{hessian = TRUE}, the diagonal matrix \eqn{V} with
+#'     \eqn{b''(\nu_{i}) / \phi_{g(i)}} on the diagonal is included in this
+#'     element. See the paragraph below equation (13) in
+#'     \insertCite{sorensenLongitudinalModelingAgeDependent2023}{galamm} for
+#'     details.
+#'   * \code{phi} If \code{hessian = TRUE}, double precision vector containing
+#'     the dispersion parameter of the model, for each model family.
+#'
+#' @details
+#' For many models, not all parameters exist. For example, without
+#' heteroscedastic residuals, the weights don't exist, and other models don't
+#' have factor loadings. For these cases, the corresponding argument (to
+#' \code{weights} or \code{lambda}) should be a correctly typed vector of
+#' length zero.
+#' @noRd
 marginal_likelihood <- function(y, trials, X, Zt, Lambdat, beta, theta, theta_mapping, u_init, lambda, lambda_mapping_X, lambda_mapping_Zt, lambda_mapping_Zt_covs, weights, weights_mapping, family, family_mapping, k, maxit_conditional_modes, lossvalue_tol, gradient, hessian, reduced_hessian = FALSE) {
     .Call(`_galamm_marginal_likelihood`, y, trials, X, Zt, Lambdat, beta, theta, theta_mapping, u_init, lambda, lambda_mapping_X, lambda_mapping_Zt, lambda_mapping_Zt_covs, weights, weights_mapping, family, family_mapping, k, maxit_conditional_modes, lossvalue_tol, gradient, hessian, reduced_hessian)
 }
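
The mapping arguments documented above can be hard to picture. The following is a minimal sketch, not the package's own code, of how a `theta_mapping` vector could place the unique elements of `theta` into the non-zero entries of `Lambdat`. It assumes zero-based indices (as a C++ backend would typically expect) and uses the `Matrix` package for the sparse matrix; both are assumptions, as are the toy values. The last lines show the zero-length convention from the details paragraph.

```r
library(Matrix)

# Toy model: three groups sharing one random-intercept standard deviation,
# so Lambdat is a 3 x 3 diagonal matrix with three non-zero entries.
Lambdat <- sparseMatrix(i = 1:3, j = 1:3, x = rep(1, 3))

# One unique covariance parameter.
theta <- 0.7

# Assumed zero-based mapping: every non-zero entry of Lambdat takes theta[0].
theta_mapping <- c(0L, 0L, 0L)

# Fill the non-zero entries in compressed sparse column order.
# The "+ 1L" converts the assumed zero-based indices to R's one-based ones.
Lambdat@x <- theta[theta_mapping + 1L]

# A model without factor loadings or heteroscedastic residuals passes
# correctly typed vectors of length zero.
lambda <- numeric(0)
weights <- numeric(0)
```

Passing `numeric(0)` rather than `NULL` keeps the argument type fixed, which is what "correctly typed" refers to in the documentation.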

diff --git a/R/VarCorr.R b/R/VarCorr.R
@@ -15,7 +15,7 @@ NULL
 #' @name VarCorr
 #' @aliases VarCorr VarCorr.galamm
 #'
-#' @return An object of class \code{VarCorr.galamm}.
+#' @return An object of class \code{c("VarCorr.galamm", "VarCorr.merMod")}.
 #' @export
 #'
 #' @seealso [print.VarCorr.galamm()] for the print function.
@@ -33,6 +33,10 @@ NULL
 #' # Extract information on variance and covariance
 #' VarCorr(mod)
 #'
+#' # Convert to data frame
+#' # (this invokes lme4's function as.data.frame.VarCorr.merMod)
+#' as.data.frame(VarCorr(mod))
+#'
 VarCorr.galamm <- function(x, sigma = 1, ...) {
   useSc <- Reduce(function(`&&`, y) y$family == "gaussian",
     family(x),
@@ -46,19 +50,20 @@ VarCorr.galamm <- function(x, sigma = 1, ...) {
       names(x$model$lmod$reTrms$cnms)
     ),
     useSc = useSc,
-    class = "VarCorr.galamm"
+    class = c("VarCorr.galamm", "VarCorr.merMod")
   )
 }
 
 
 #' @title Print method for variance-covariance objects
 #'
 #' @srrstats {G1.4} Function documented with roxygen2.
-#' @srrstats {G2.3b} Argument "comp" is case sensitive, as is documented here.
 #' @srrstats {G2.1a} Expected data types provided for all inputs.
+#' @srrstats {G2.3a} match.arg() used on "comp" argument.
+#' @srrstats {G2.3b} Argument "comp" is case sensitive, as is documented here.
 #'
-#' @param x An object of class \code{VarCorr.galamm}, returned from
-#'   \code{\link{VarCorr.galamm}}.
+#' @param x An object of class \code{c("VarCorr.galamm", "VarCorr.merMod")},
+#'   returned from \code{\link{VarCorr.galamm}}.
 #' @param digits Optional argument specifying number of digits to use when
 #'   printing.
 #' @param comp Character vector of length 1 or 2 specifying which variance
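
Adding `"VarCorr.merMod"` as a second class is what lets `as.data.frame()` fall through to lme4's method. A self-contained sketch of that S3 mechanism, using hypothetical class names `child_vc` and `parent_vc` in place of the real ones:

```r
# Hypothetical parent-class method, standing in for lme4's
# as.data.frame.VarCorr.merMod.
as.data.frame.parent_vc <- function(x, ...) {
  data.frame(grp = names(x), vcov = unlist(x, use.names = FALSE))
}

# The object lists the child class first and the parent class second.
obj <- structure(
  list(id = 1.2, Residual = 0.4),
  class = c("child_vc", "parent_vc")
)

# No as.data.frame.child_vc exists, so dispatch moves on to the
# parent_vc method defined above.
df <- as.data.frame(obj)
```

Methods defined for the first class, such as a print method, still take precedence; only generics without a method for the first class reach the second.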

diff --git a/R/confint.R b/R/confint.R
@@ -1,6 +1,7 @@
 #' @title Confidence intervals for model parameters
 #'
 #' @srrstats {G1.4} Function documented with roxygen2.
+#' @srrstats {G2.3a} match.arg() used on "method" argument.
 #' @srrstats {G2.3b} Arguments parm and method are case sensitive, as stated in
 #'   their documentation.
 #' @srrstats {G2.1a} Expected data types provided for all inputs.
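
The two tags G2.3a and G2.3b fit together because `match.arg()` itself is case sensitive. A small sketch with a hypothetical function `pick_method()`; the choices shown are illustrative and not necessarily those accepted by the package:

```r
# Hypothetical function; the choices are for illustration only.
pick_method <- function(method = c("Wald", "profile")) {
  match.arg(method)
}

# With no argument, match.arg() returns the first choice.
default_choice <- pick_method()

# An exact match is returned unchanged.
exact_choice <- pick_method("profile")

# A lower-case spelling does not match any choice and raises an error.
wrong_case <- tryCatch(
  pick_method("wald"),
  error = function(e) "error"
)
```

Because the mismatch fails loudly instead of silently picking a default, documenting the case sensitivity is enough to satisfy the standard without extra normalisation code.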

diff --git a/R/data.R b/R/data.R
@@ -6,6 +6,8 @@
 #' \insertCite{skrondalGeneralizedLatentVariable2004;textual}{galamm}, where
 #' the dataset is used.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `epilep` A data frame with 236 rows and 7 columns:
 #' \describe{
 #'   \item{subj}{Subject ID.}
@@ -27,6 +29,8 @@
 #' Very basic mixed response dataset with one set of normally distributed
 #' responses and one set of binomially distributed responses.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `mresp` A data frame with 4000 rows and 5 columns:
 #' \describe{
 #'   \item{id}{Subject ID.}
@@ -46,6 +50,8 @@
 #' responses and one set of binomially distributed responses. The normally
 #' distributed responses follow two different residual standard deviations.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `mresp` A data frame with 4000 rows and 5 columns:
 #' \describe{
 #'   \item{id}{Subject ID.}
@@ -72,6 +78,8 @@
 #' dataset is used. See also
 #' \insertCite{rabe-heskethCorrectingCovariateMeasurement2003;textual}{galamm}.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `diet` A data frame with 236 rows and 7 columns:
 #' \describe{
 #'   \item{id}{Subject ID.}
@@ -99,6 +107,8 @@
 #' Simulated dataset with residual standard deviation that varies between
 #' items.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `hsced` A data frame with 1200 rows and 5 columns:
 #' \describe{
 #'   \item{id}{Subject ID.}
@@ -119,6 +129,8 @@
 #' \insertCite{woodGeneralizedAdditiveModels2017a}{galamm}, and depend on the
 #' explanatory variable x.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `cognition` A data frame with 14400 rows and 7 columns:
 #' \describe{
 #'   \item{id}{Subject ID.}
@@ -142,6 +154,8 @@
 #' Simulated dataset for use in examples and testing with a latent covariate
 #' interacting with an observed covariate.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `latent_covariates` A data frame with 600 rows and 5 columns:
 #' \describe{
 #'   \item{id}{Subject ID.}
@@ -166,6 +180,8 @@
 #' interacting with an observed covariate. In this data, each response has been
 #' measured six times for each subject.
 #'
+#' @srrstats {G5.1} Dataset used to test package is exported.
+#'
 #' @format ## `latent_covariates_long` A data frame with 800 rows and 5
 #' columns:
 #' \describe{