From 3b7e3ec16fb4aa9d126b3bec545103f1f1443f15 Mon Sep 17 00:00:00 2001 From: Anestis Touloumis Date: Thu, 8 Jun 2023 17:22:16 +0100 Subject: [PATCH] 230608: preparing for release --- DESCRIPTION | 2 +- R/SimCorMultRes-data.R | 23 +++++++++++-------- README.Rmd | 2 +- README.md | 45 ++++++++++++++++++++----------------- inst/CITATION | 28 +++++++++-------------- inst/NEWS.Rd | 3 ++- man/simulation.Rd | 23 +++++++++++-------- vignettes/SimCorMultRes.Rmd | 14 +++++++----- 8 files changed, 76 insertions(+), 64 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index cf56519..a877c76 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -2,7 +2,7 @@ Package: SimCorMultRes Type: Package Title: Simulates Correlated Multinomial Responses Description: Simulates correlated multinomial responses conditional on a marginal model specification. -Version: 1.8.2 +Version: 1.9.0 Depends: R(>= 2.15.0) Imports: evd, diff --git a/R/SimCorMultRes-data.R b/R/SimCorMultRes-data.R index b95561a..1b9f290 100644 --- a/R/SimCorMultRes-data.R +++ b/R/SimCorMultRes-data.R @@ -1,17 +1,22 @@ -#' Bivariate NORTA Generated Correlation +#' Simulated Correlation Parameters #' -#' Simulated dataset to understand +#' Simulated dataset to examine the approximation of the correlation matrix +#' of the latent variables generated by NORTA to the correlation matrix of +#' the normal distribution used in the intermediate step of NORTA. 
#' #' @format #' A data frame with 100 rows and 4 columns: #' \describe{ -#' \item{rho}{numeric indicating the value of the correlation parameter.} -#' \item{normal}{numeric indicating the simulated average of the correlation parameter with -#' normal margins.} -#' \item{logistic}{numeric indicating the simulated average of the correlation parameter with -#' logistic margins.} -#' \item{gumbel}{numeric indicating the simulated average of the correlation parameter with -#' gumbel margins.} +#' \item{rho}{numeric indicating the true value of the correlation parameter.} +#' \item{normal}{numeric indicating the (simulated) estimated correlation +#' parameter when the marginal distribution of each of the latent variables is +#' normal.} +#' \item{logistic}{numeric indicating the (simulated) estimated correlation +#' parameter when the marginal distribution of each of the latent variables is +#' logistic.} +#' \item{gumbel}{numeric indicating the (simulated) estimated correlation +#' parameter when the marginal distribution of each of the latent variables is +#' Gumbel.} #' } #' @examples #' plot(rho - normal ~ rho, data = simulation, type = "l", col = "blue", diff --git a/README.Rmd b/README.Rmd index 414b0ea..b3f7164 100644 --- a/README.Rmd +++ b/README.Rmd @@ -80,7 +80,7 @@ This package provides five core functions to simulate correlated binary (`rbin`) - `rmult.clm` to simulate correlated ordinal responses under a marginal cumulative link model, - `rmult.crm` to simulate correlated ordinal responses under a marginal continuation-ratio link model. -All five functions, assume that you provide either the correlation matrix of the multivariate normal distribution in NORTA (via `cor.matrix`) or the latent responses (via the `rlatent`). +All five functions, assume that you provide either the correlation matrix of the multivariate normal distribution in NORTA (via `cor.matrix`) or the values of the latent responses (via the `rlatent`). 
A simulation study (described in Section 3.5 of the vignette) suggests that the correlation matrix of the multivariate normal distribution in NORTA (via `cor.matrix`) could be treated as a good approximation of the true correlation matrix of the latent variables generated by the NORTA method regardless of their marginal distributions for all the thresholds implemented in `SimCorMultRes`. There are also two utility functions: diff --git a/README.md b/README.md index c052559..4f12c10 100644 --- a/README.md +++ b/README.md @@ -4,8 +4,8 @@ # SimCorMultRes: Simulates Correlated Multinomial Responses [![Github -version](https://img.shields.io/badge/GitHub%20-1.8.2-orange.svg)](%22commits/master%22) -[![R-CMD-check](https://github.com/AnestisTouloumis/SimCorMultRes/workflows/R-CMD-check/badge.svg)](https://github.com/AnestisTouloumis/SimCorMultRes/actions) +version](https://img.shields.io/badge/GitHub%20-1.8.4-orange.svg)](%22commits/master%22) +[![R-CMD-check](https://github.com/AnestisTouloumis/SimCorMultRes/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AnestisTouloumis/SimCorMultRes/actions/workflows/R-CMD-check.yaml) [![Project Status: Active The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active) @@ -30,7 +30,7 @@ install.packages("SimCorMultRes") The source code for the release version of `SimCorMultRes` is available on CRAN at: -- +- Or you can install the development version of `SimCorMultRes`: @@ -42,7 +42,7 @@ devtools::install_github("AnestisTouloumis/SimCorMultRes") The source code for the development version of `SimCorMultRes` is available on github at: -- +- To use `SimCorMultRes`, you should load the package as follows: @@ -58,27 +58,33 @@ and `rmult.crm`) responses, which are drawn as realizations of a latent regression model for continuous random vectors as proposed by Touloumis (2016): -- `rbin` to simulate correlated 
binary responses under a marginal - model with logit, probit, cloglog and cauchit link function, -- `rmult.bcl` to simulate correlated nominal multinomial responses - under a marginal baseline-category logit model, -- `rmult.acl` to simulate correlated ordinal responses under a - marginal adjacent-category logit model, -- `rmult.clm` to simulate correlated ordinal responses under a - marginal cumulative link model, -- `rmult.crm` to simulate correlated ordinal responses under a - marginal continuation-ratio link model. +- `rbin` to simulate correlated binary responses under a marginal model + with logit, probit, cloglog and cauchit link function, +- `rmult.bcl` to simulate correlated nominal multinomial responses under + a marginal baseline-category logit model, +- `rmult.acl` to simulate correlated ordinal responses under a marginal + adjacent-category logit model, +- `rmult.clm` to simulate correlated ordinal responses under a marginal + cumulative link model, +- `rmult.crm` to simulate correlated ordinal responses under a marginal + continuation-ratio link model. All five functions, assume that you provide either the correlation matrix of the multivariate normal distribution in NORTA (via -`cor.matrix`) or the latent responses (via the `rlatent`). +`cor.matrix`) or the values of the latent responses (via the `rlatent`). +A simulation study (described in Section 3.5 of the vignette) suggests +that the correlation matrix of the multivariate normal distribution in +NORTA (via `cor.matrix`) could be treated as a good approximation of the +true correlation matrix of the latent variables generated by the NORTA +method regardless of their marginal distributions for all the thresholds +implemented in `SimCorMultRes`. 
There are also two utility functions: -- `rnorta` for simulating continuous or discrete random vectors with - prescribed marginal distributions using the NORTA method, -- `rsmvnorm` for simulating continuous random vectors from a - multivariate normal distribution. +- `rnorta` for simulating continuous or discrete random vectors with + prescribed marginal distributions using the NORTA method, +- `rsmvnorm` for simulating continuous random vectors from a + multivariate normal distribution. ## Example @@ -125,7 +131,6 @@ browseVignettes("SimCorMultRes") ## How to cite - To cite R package SimCorMultRes in publications, please use: Touloumis, A. (2016). Simulating Correlated Binary and Multinomial diff --git a/inst/CITATION b/inst/CITATION index 00f47e8..3a9cfab 100644 --- a/inst/CITATION +++ b/inst/CITATION @@ -1,20 +1,14 @@ -citHeader("To cite R package SimCorMultRes in publications, please use:") - -citEntry(entry="Article", - title = "Simulating Correlated Binary and Multinomial Responses under +note <- sprintf("R package version %s", meta$Version) + +bibentry(bibtype = "Article", + title = "Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package", - author = person("Anestis","Touloumis"), - year = "2016", - journal="The R Journal", - volume="8", - number="2", + author = as.person("Anestis Touloumis"), + year = 2016, + journal= "The R Journal", + volume=8, + number=2, + note = note, pages={"79-91"}, - url = "https://journal.r-project.org/archive/2016/RJ-2016-034/index.html", - textVersion = paste("Touloumis, A. 
(2016).", - "Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package.", - "The R Journal 8:2, 79-91.") + url = "https://journal.r-project.org/archive/2016/RJ-2016-034/index.html" ) - - - - diff --git a/inst/NEWS.Rd b/inst/NEWS.Rd index 10a15a3..ec5e7f2 100644 --- a/inst/NEWS.Rd +++ b/inst/NEWS.Rd @@ -1,13 +1,14 @@ \name{NEWS} \title{NEWS file for the \pkg{SimCorMultRes} package} -\section{Changes in Version 1.8.3 (2023-02-23)}{ +\section{Changes in Version 1.9.0 (2023-06-06)}{ \subsection{MINOR CHANGES}{ \itemize{ \item{Added R journal paper as vignette.} \item{Improved README.} \item{Improved vignette.} \item{Reinstated code coverage using \pkg{covr}.} + \item{Updated CITATION style.} \item{Updated GitHub Actions.} } } diff --git a/man/simulation.Rd b/man/simulation.Rd index 6f81a95..2161268 100644 --- a/man/simulation.Rd +++ b/man/simulation.Rd @@ -3,24 +3,29 @@ \docType{data} \name{simulation} \alias{simulation} -\title{Bivariate NORTA Generated Correlation} +\title{Simulated Correlation Parameters} \format{ A data frame with 100 rows and 4 columns: \describe{ - \item{rho}{numeric indicating the value of the correlation parameter.} - \item{normal}{numeric indicating the simulated average of the correlation parameter with - normal margins.} - \item{logistic}{numeric indicating the simulated average of the correlation parameter with - logistic margins.} - \item{gumbel}{numeric indicating the simulated average of the correlation parameter with - gumbel margins.} + \item{rho}{numeric indicating the true value of the correlation parameter.} + \item{normal}{numeric indicating the (simulated) estimated correlation + parameter when the marginal distribution of each of the latent variables is + normal.} + \item{logistic}{numeric indicating the (simulated) estimated correlation + parameter when the marginal distribution of each of the latent variables is + logistic.} + \item{gumbel}{numeric indicating the 
(simulated) estimated correlation + parameter when the marginal distribution of each of the latent variables is + Gumbel.} } } \usage{ simulation } \description{ -Simulated dataset to understand +Simulated dataset to examine the approximation of the correlation matrix +of the latent variables generated by NORTA to the correlation matrix of +the normal distribution used in the intermediate step of NORTA. } \examples{ plot(rho - normal ~ rho, data = simulation, type = "l", col = "blue", diff --git a/vignettes/SimCorMultRes.Rmd b/vignettes/SimCorMultRes.Rmd index a12f240..fd7c336 100644 --- a/vignettes/SimCorMultRes.Rmd +++ b/vignettes/SimCorMultRes.Rmd @@ -471,21 +471,23 @@ apply(simulated_nominal_dataset$Ysim, 2, table) / sample_size ``` -## Notes on NORTA +## A note on NORTA implementation -In `SimCorMultRes`, the user specifies the correlation matrix of the multivariate normal distribution, denoted by $\mathbf R$, that is used in the intermediate step of the NORTA method. This is justified by the observation that when all the marginal distributions of the correlated latent variables are logistic, $\mathbf R$ is expected to approximate well their true but unknown correlation matrix [@Touloumis2016]. +In `SimCorMultRes`, the user specifies the correlation matrix of the multivariate normal distribution (denoted by $\mathbf R$) used in the intermediate step of the NORTA method and not the correlation matrix of the latent variables. The motivation is that when all the marginal distributions of the correlated latent variables are logistic, then the correlation matrix $\mathbf R$ and that of the latent variables will be close [@Touloumis2016]. This approximation is also used in `SimCorMultRes` regardless of the marginal distribution of the latent variables. -To evaluate the validity of approximation study, a simulation study was conducted. 
For a fixed sample size $N$ and a correlation parameter $\rho$, $N$ independent bivariate random vectors $\{\mathbf y_{i}: i = 1, \ldots, N \}$ from a bivariate normal distribution with mean vector the zero vector and covariance matrix the correlation matrix +To evaluate the validity of this approximation for the marginal distributions employed in `SimCorMultRes`, a simulation study was conducted. For a fixed sample size $N$ and a correlation parameter $\rho$, $N$ independent bivariate random vectors $\{\mathbf y_{i}: i = 1, \ldots, N \}$ from a bivariate normal distribution with mean vector the zero vector and covariance matrix the correlation matrix \[ \mathbf R = \begin{bmatrix} 1 & \rho\\ \rho & 1 \end{bmatrix} \] -were drawn. The sample correlation was used to estimate $\rho$. Next, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf z_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is a logistic distribution. Their correlation parameter, say $\rho_{z}$, was estimated using the corresponding sample correlation. Then, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf w_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is the Gumbel distribution. Their correlation parameter, say $\rho_{w}$, was estimated using their sample correlation. This procedure was replicated $10,000,000$ times. The three correlation parameters $\rho$, $\rho_z$ and $\rho_w$ were estimated using the corresponding Monte Carlo estimates $\widehat{\rho}$, $\widehat{\rho}_z$ and $\widehat{\rho}_w$, respectively. To reduce the sample variability, we set $N=10,000$. Finally, we considered $\rho= 0, 0.01,0.02,\ldots, 0.99$. +were drawn. The sample correlation was used to estimate $\rho$. Next, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf z_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is the logistic distribution. 
Their correlation parameter, say $\rho_{z}$, was estimated using the corresponding sample correlation. Then, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf w_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is the Gumbel distribution. Their correlation parameter, say $\rho_{w}$, was estimated using their sample correlation. -The dataframe `simulation` contains the true correlation parameter $\rho$ and the Monte Carlo estimates $\widehat{\rho}$, $\widehat{\rho}_z$ and $\widehat{\rho}_w$ of the simulation study. As expected, $\widehat{\rho} \approx \rho$ regardless of the strength of the correlation parameter. For the logistic case, the average difference between $\rho$ and $\widehat{\rho}_z$ is `r rho = simulation$rho; logistic = simulation$logistic; round(mean(rho - logistic), 4)`, taking the maximum value of `r round(max(rho - logistic), 4)` at $\rho = `r rho[which.max(rho - logistic)]`$. Thus, $\rho$ appears to approximate $\rho_{z}$ 2 decimal points. For the Gumbel case, the average difference between $\rho$ and $\widehat{\rho}_w$ is `r gumbel = simulation$gumbel; round(mean(rho - gumbel), 4)`, taking the maximum value of `r round(max(rho - gumbel), 4)` at $\rho = `r rho[which.max(rho - gumbel)]`$. Again, $\rho$ appears to approximate $\rho_{w}$ but there is some accuracy loss compared to $\rho_{z}$. +For a fixed value of $\rho$, this procedure was replicated $10,000,000$ times to reduce the simulation error and with $N=10,000$ to reduce the sampling error. The three correlation parameters $\rho$, $\rho_z$ and $\rho_w$ were estimated using their corresponding Monte Carlo counterparts, denoted by $\widehat{\rho}$, $\widehat{\rho}_z$ and $\widehat{\rho}_w$, respectively. We let $\rho \in \{ 0, 0.01,0.02,\ldots, 0.99\}$. 
+ +The dataframe `simulation` contains the true correlation parameter $\rho$ (`rho`) and the Monte Carlo estimates $\widehat{\rho}$ (`normal`), $\widehat{\rho}_z$ (`logistic`) and $\widehat{\rho}_w$ (`gumbel`) from the simulation study described above. As expected, $\widehat{\rho} \approx \rho$ regardless of the strength of the correlation parameter. For the case of logistic marginal distributions, the average difference between $\rho$ and $\widehat{\rho}_z$ is `r rho = simulation$rho; logistic = simulation$logistic; round(mean(rho - logistic), 4)`, taking the maximum value of `r round(max(rho - logistic), 4)` at $\rho = `r rho[which.max(rho - logistic)]`$. Therefore $\rho$ appears to approximate $\rho_{z}$ to 2 decimal places. For the case of Gumbel marginal distributions, the average difference between $\rho$ and $\widehat{\rho}_w$ is `r gumbel = simulation$gumbel; round(mean(rho - gumbel), 4)`, taking the maximum value of `r round(max(rho - gumbel), 4)` at $\rho = `r rho[which.max(rho - gumbel)]`$. Although $\rho$ appears to approximate $\rho_{w}$ well, there is some accuracy loss compared to $\rho_{z}$. The plot below shows the differences between the true correlation coefficient of the bivariate normal distribution and the simulated correlations for $\rho$, $\rho_z$ or $\rho_w$. ```{r echo = FALSE, fig.cap= "Difference between the correlation parameters of the bivariate normal distribution and of the latent variables for three different marginal distributions."} @@ -502,7 +504,7 @@ legend("topright", legend = c("Normal", "Logistic", "Gumbel"), title(main = paste("Difference between true and simulated correlation")) ``` -Overall, it appears that there is little accuracy loss by not specifying the correlation matrix of the correlated latent responses when their marginal distribution is either the logistic distribution or the Gumbel distribution. This covers all the thresholds implemented in `SimCorMultRes`. 
+Overall, there is little accuracy loss by specifying the correlation matrix of the multivariate normal distribution in the intermediate step of the NORTA method instead of the correlation matrix of the correlated latent responses, regardless of whether their marginal distributions are logistic or Gumbel. Users can treat the correlation matrix passed on to the core functions of `SimCorMultRes` as the correlation matrix of the latent variables. # How to Cite
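
Reviewer note (not part of the patch): the simulation study described in the vignette hunk above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used to build the `simulation` dataset. The helper names `norta_cor` and `qgumbel_std` are hypothetical, the standard Gumbel quantile function is written out by hand rather than taken from a package, and the sample size and grid of $\rho$ values are far smaller than in the study.

```r
# Minimal sketch of the NORTA check described in the vignette (base R only).
# Assumption: one draw of n pairs per rho, instead of the replicated design
# with N = 10,000 described in the vignette.
set.seed(1)

# Hypothetical helper: quantile function of the standard Gumbel distribution.
qgumbel_std <- function(p) -log(-log(p))

# Hypothetical helper: for a given rho, return the sample correlations of the
# bivariate normal vectors and of their logistic and Gumbel NORTA transforms.
norta_cor <- function(rho, n = 1e5) {
  # Step 1: bivariate normal vectors with zero means, unit variances and
  # correlation rho.
  y1 <- rnorm(n)
  y2 <- rho * y1 + sqrt(1 - rho^2) * rnorm(n)
  # Step 2: map each margin to Uniform(0, 1) with the normal cdf.
  u1 <- pnorm(y1)
  u2 <- pnorm(y2)
  # Step 3: apply the inverse cdf of the target marginal distribution.
  c(
    rho      = rho,
    normal   = cor(y1, y2),
    logistic = cor(qlogis(u1), qlogis(u2)),
    gumbel   = cor(qgumbel_std(u1), qgumbel_std(u2))
  )
}

# One row per value of rho, with columns named as in the `simulation` dataset.
result <- t(sapply(c(0, 0.25, 0.5, 0.75), norta_cor))
print(round(result, 3))
```

The columns mirror those of `simulation` (`rho`, `normal`, `logistic`, `gumbel`), so the differences `rho - logistic` and `rho - gumbel` can be inspected in the same way as in the vignette text, keeping in mind that a single draw per $\rho$ is much noisier than the Monte Carlo averages stored in the dataset.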