230608: preparing for release
AnestisTouloumis committed Jun 8, 2023
1 parent 78aae71 commit 3b7e3ec
Showing 8 changed files with 76 additions and 64 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -2,7 +2,7 @@ Package: SimCorMultRes
Type: Package
Title: Simulates Correlated Multinomial Responses
Description: Simulates correlated multinomial responses conditional on a marginal model specification.
Version: 1.8.2
Version: 1.9.0
Depends: R(>= 2.15.0)
Imports:
evd,
23 changes: 14 additions & 9 deletions R/SimCorMultRes-data.R
@@ -1,17 +1,22 @@
#' Bivariate NORTA Generated Correlation
#' Simulated Correlation Parameters
#'
#' Simulated dataset to understand
#' Simulated dataset to examine the approximation of the correlation matrix
#' of the latent variables generated by NORTA to the correlation matrix of
#' the normal distribution used in the intermediate step of NORTA.
#'
#' @format
#' A data frame with 100 rows and 4 columns:
#' \describe{
#' \item{rho}{numeric indicating the value of the correlation parameter.}
#' \item{normal}{numeric indicating the simulated average of the correlation parameter with
#' normal margins.}
#' \item{logistic}{numeric indicating the simulated average of the correlation parameter with
#' logistic margins.}
#' \item{gumbel}{numeric indicating the simulated average of the correlation parameter with
#' gumbel margins.}
#' \item{rho}{numeric indicating the true value of the correlation parameter.}
#' \item{normal}{numeric indicating the (simulated) estimated correlation
#' parameter when the marginal distribution of each of the latent variables is
#' normal.}
#' \item{logistic}{numeric indicating the (simulated) estimated correlation
#' parameter when the marginal distribution of each of the latent variables is
#' logistic.}
#' \item{gumbel}{numeric indicating the (simulated) estimated correlation
#' parameter when the marginal distribution of each of the latent variables is
#' Gumbel.}
#' }
#' @examples
#' plot(rho - normal ~ rho, data = simulation, type = "l", col = "blue",
2 changes: 1 addition & 1 deletion README.Rmd
@@ -80,7 +80,7 @@ This package provides five core functions to simulate correlated binary (`rbin`)
- `rmult.clm` to simulate correlated ordinal responses under a marginal cumulative link model,
- `rmult.crm` to simulate correlated ordinal responses under a marginal continuation-ratio link model.

All five functions, assume that you provide either the correlation matrix of the multivariate normal distribution in NORTA (via `cor.matrix`) or the latent responses (via the `rlatent`).
All five functions, assume that you provide either the correlation matrix of the multivariate normal distribution in NORTA (via `cor.matrix`) or the values of the latent responses (via the `rlatent`). A simulation study (described in Section 3.5 of the vignette) suggests that the correlation matrix of the multivariate normal distribution in NORTA (via `cor.matrix`) could be treated as a good approximation of the true correlation matrix of the latent variables generated by the NORTA method regardless of their marginal distributions for all the thresholds implemented in `SimCorMultRes`.

There are also two utility functions:

45 changes: 25 additions & 20 deletions README.md
@@ -4,8 +4,8 @@
# SimCorMultRes: Simulates Correlated Multinomial Responses

[![Github
version](https://img.shields.io/badge/GitHub%20-1.8.2-orange.svg)](%22commits/master%22)
[![R-CMD-check](https://github.com/AnestisTouloumis/SimCorMultRes/workflows/R-CMD-check/badge.svg)](https://github.com/AnestisTouloumis/SimCorMultRes/actions)
version](https://img.shields.io/badge/GitHub%20-1.8.4-orange.svg)](%22commits/master%22)
[![R-CMD-check](https://github.com/AnestisTouloumis/SimCorMultRes/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AnestisTouloumis/SimCorMultRes/actions/workflows/R-CMD-check.yaml)
[![Project Status: Active The project has reached a stable, usable state
and is being actively
developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
@@ -30,7 +30,7 @@ install.packages("SimCorMultRes")
The source code for the release version of `SimCorMultRes` is available
on CRAN at:

- <https://CRAN.R-project.org/package=SimCorMultRes>
- <https://CRAN.R-project.org/package=SimCorMultRes>

Or you can install the development version of `SimCorMultRes`:

@@ -42,7 +42,7 @@ devtools::install_github("AnestisTouloumis/SimCorMultRes")
The source code for the development version of `SimCorMultRes` is
available on github at:

- <https://github.com/AnestisTouloumis/SimCorMultRes>
- <https://github.com/AnestisTouloumis/SimCorMultRes>

To use `SimCorMultRes`, you should load the package as follows:

@@ -58,27 +58,33 @@ and `rmult.crm`) responses, which are drawn as realizations of a latent
regression model for continuous random vectors as proposed by Touloumis
(2016):

- `rbin` to simulate correlated binary responses under a marginal
model with logit, probit, cloglog and cauchit link function,
- `rmult.bcl` to simulate correlated nominal multinomial responses
under a marginal baseline-category logit model,
- `rmult.acl` to simulate correlated ordinal responses under a
marginal adjacent-category logit model,
- `rmult.clm` to simulate correlated ordinal responses under a
marginal cumulative link model,
- `rmult.crm` to simulate correlated ordinal responses under a
marginal continuation-ratio link model.
- `rbin` to simulate correlated binary responses under a marginal model
with logit, probit, cloglog and cauchit link function,
- `rmult.bcl` to simulate correlated nominal multinomial responses under
a marginal baseline-category logit model,
- `rmult.acl` to simulate correlated ordinal responses under a marginal
adjacent-category logit model,
- `rmult.clm` to simulate correlated ordinal responses under a marginal
cumulative link model,
- `rmult.crm` to simulate correlated ordinal responses under a marginal
continuation-ratio link model.

All five functions, assume that you provide either the correlation
matrix of the multivariate normal distribution in NORTA (via
`cor.matrix`) or the latent responses (via the `rlatent`).
`cor.matrix`) or the values of the latent responses (via the `rlatent`).
A simulation study (described in Section 3.5 of the vignette) suggests
that the correlation matrix of the multivariate normal distribution in
NORTA (via `cor.matrix`) could be treated as a good approximation of the
true correlation matrix of the latent variables generated by the NORTA
method regardless of their marginal distributions for all the thresholds
implemented in `SimCorMultRes`.

There are also two utility functions:

- `rnorta` for simulating continuous or discrete random vectors with
prescribed marginal distributions using the NORTA method,
- `rsmvnorm` for simulating continuous random vectors from a
multivariate normal distribution.
- `rnorta` for simulating continuous or discrete random vectors with
prescribed marginal distributions using the NORTA method,
- `rsmvnorm` for simulating continuous random vectors from a
multivariate normal distribution.

## Example

@@ -125,7 +131,6 @@ browseVignettes("SimCorMultRes")

## How to cite


To cite R package SimCorMultRes in publications, please use:

Touloumis, A. (2016). Simulating Correlated Binary and Multinomial
28 changes: 11 additions & 17 deletions inst/CITATION
@@ -1,20 +1,14 @@
citHeader("To cite R package SimCorMultRes in publications, please use:")
citEntry(entry="Article",
title = "Simulating Correlated Binary and Multinomial Responses under
note <- sprintf("R package version %s", meta$Version)

bibentry(bibtype = "Article",
title = "Simulating Correlated Binary and Multinomial Responses under
Marginal Model Specification: The SimCorMultRes Package",
author = person("Anestis","Touloumis"),
year = "2016",
journal="The R Journal",
volume="8",
number="2",
author = as.person("Anestis Touloumis"),
year = 2016,
journal= "The R Journal",
volume=8,
number=2,
note = note,
pages={"79-91"},
url = "https://journal.r-project.org/archive/2016/RJ-2016-034/index.html",
textVersion = paste("Touloumis, A. (2016).",
"Simulating Correlated Binary and Multinomial Responses under Marginal Model Specification: The SimCorMultRes Package.",
"The R Journal 8:2, 79-91.")
url = "https://journal.r-project.org/archive/2016/RJ-2016-034/index.html"
)




3 changes: 2 additions & 1 deletion inst/NEWS.Rd
@@ -1,13 +1,14 @@
\name{NEWS}
\title{NEWS file for the \pkg{SimCorMultRes} package}

\section{Changes in Version 1.8.3 (2023-02-23)}{
\section{Changes in Version 1.9.0 (2023-06-06)}{
\subsection{MINOR CHANGES}{
\itemize{
\item{Added R journal paper as vignette.}
\item{Improved README.}
\item{Improved vignette.}
\item{Reinstated code coverage using \pkg{covr}.}
\item{Updated CITATION style.}
\item{Updated GitHub Actions.}
}
}
23 changes: 14 additions & 9 deletions man/simulation.Rd

Some generated files are not rendered by default.

14 changes: 8 additions & 6 deletions vignettes/SimCorMultRes.Rmd
@@ -471,21 +471,23 @@ apply(simulated_nominal_dataset$Ysim, 2, table) / sample_size
```


## Notes on NORTA
## A note on NORTA implementation

In `SimCorMultRes`, the user specifies the correlation matrix of the multivariate normal distribution, denoted by $\mathbf R$, that is used in the intermediate step of the NORTA method. This is justified by the observation that when all the marginal distributions of the correlated latent variables are logistic, $\mathbf R$ is expected to approximate well their true but unknown correlation matrix [@Touloumis2016].
In `SimCorMultRes`, the user specifies the correlation matrix of the multivariate normal distribution (denoted by $\mathbf R$) used in the intermediate step of the NORTA method and not the correlation matrix of the latent variables. The motivation is that when all the marginal distributions of the correlated latent variables are logistic, then the correlation matrix $\mathbf R$ and that of the latent variables will be close [@Touloumis2016]. This approximation is also used in `SimCorMultRes` regardless of the marginal distribution of the latent variables.


To evaluate the validity of approximation study, a simulation study was conducted. For a fixed sample size $N$ and a correlation parameter $\rho$, $N$ independent bivariate random vectors $\{\mathbf y_{i}: i = 1, \ldots, N \}$ from a bivariate normal distribution with mean vector the zero vector and covariance matrix the correlation matrix
To evaluate the validity of this approximation for the marginal distributions employed in `SimCorMultRes`, a simulation study was conducted. For a fixed sample size $N$ and a correlation parameter $\rho$, $N$ independent bivariate random vectors $\{\mathbf y_{i}: i = 1, \ldots, N \}$ from a bivariate normal distribution with mean vector the zero vector and covariance matrix the correlation matrix
\[
\mathbf R = \begin{bmatrix}
1 & \rho\\
\rho & 1
\end{bmatrix}
\]
were drawn. The sample correlation was used to estimate $\rho$. Next, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf z_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is a logistic distribution. Their correlation parameter, say $\rho_{z}$, was estimated using the corresponding sample correlation. Then, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf w_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is the Gumbel distribution. Their correlation parameter, say $\rho_{w}$, was estimated using their sample correlation. This procedure was replicated $10,000,000$ times. The three correlation parameters $\rho$, $\rho_z$ and $\rho_w$ were estimated using the corresponding Monte Carlo estimates $\widehat{\rho}$, $\widehat{\rho}_z$ and $\widehat{\rho}_w$, respectively. To reduce the sample variability, we set $N=10,000$. Finally, we considered $\rho= 0, 0.01,0.02,\ldots, 0.99$.
were drawn. The sample correlation was used to estimate $\rho$. Next, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf z_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is the logistic distribution. Their correlation parameter, say $\rho_{z}$, was estimated using the corresponding sample correlation. Then, the NORTA method was applied to obtain bivariate random vectors $\{\mathbf w_{i}: i = 1, \ldots, N \}$ so that their marginal distribution is the Gumbel distribution. Their correlation parameter, say $\rho_{w}$, was estimated using their sample correlation.

The dataframe `simulation` contains the true correlation parameter $\rho$ and the Monte Carlo estimates $\widehat{\rho}$, $\widehat{\rho}_z$ and $\widehat{\rho}_w$ of the simulation study. As expected, $\widehat{\rho} \approx \rho$ regardless of the strength of the correlation parameter. For the logistic case, the average difference between $\rho$ and $\widehat{\rho}_z$ is `r rho = simulation$rho; logistic = simulation$logistic; round(mean(rho - logistic), 4)`, taking the maximum value of `r round(max(rho - logistic), 4)` at $\rho = `r rho[which.max(rho - logistic)]`$. Thus, $\rho$ appears to approximate $\rho_{z}$ 2 decimal points. For the Gumbel case, the average difference between $\rho$ and $\widehat{\rho}_w$ is `r gumbel = simulation$gumbel; round(mean(rho - gumbel), 4)`, taking the maximum value of `r round(max(rho - gumbel), 4)` at $\rho = `r rho[which.max(rho - gumbel)]`$. Again, $\rho$ appears to approximate $\rho_{w}$ but there is some accuracy loss compared to $\rho_{z}$.
For a fixed value of $\rho$, this procedure was replicated $10,000,000$ times to reduce the simulation error and with $N=10,000$ to reduce the sampling error. The three correlation parameters $\rho$, $\rho_z$ and $\rho_w$ were estimated using their corresponding Monte Carlo counterparts, denoted by $\widehat{\rho}$, $\widehat{\rho}_z$ and $\widehat{\rho}_w$, respectively. We let $\rho \in \{ 0, 0.01,0.02,\ldots, 0.99\}$.

The dataframe `simulation` contains the true correlation parameter $\rho$ (`rho`) and the Monte Carlo estimates $\widehat{\rho}$ (`normal`), $\widehat{\rho}_z$ (`logistic`) and $\widehat{\rho}_w$ (`gumbel`) from the simulation study described above. As expected, $\widehat{\rho} \approx \rho$ regardless of the strength of the correlation parameter. For the case of logistic marginal distributions, the average difference between $\rho$ and $\widehat{\rho}_z$ is `r rho = simulation$rho; logistic = simulation$logistic; round(mean(rho - logistic), 4)`, taking the maximum value of `r round(max(rho - logistic), 4)` at $\rho = `r rho[which.max(rho - logistic)]`$. Therefore $\rho$ appears to approximate $\rho_{z}$ to 2 decimal points. For the case of Gumbel marginal distributions, the average difference between $\rho$ and $\widehat{\rho}_w$ is `r gumbel = simulation$gumbel; round(mean(rho - gumbel), 4)`, taking the maximum value of `r round(max(rho - gumbel), 4)` at $\rho = `r rho[which.max(rho - gumbel)]`$. Although $\rho$ appears to approximate well $\rho_{w}$, there is some accuracy loss compared to $\rho_{z}$. The plot below shows the differences between the true correlation coefficient of the bivariate normal distribution and the simulated correlations for $\rho$, $\rho_z$ or $\rho_w$.


```{r echo = FALSE, fig.cap= "Difference between the correlation parameters of the bivariate normal distribution and of the latent variables for three different marginal distributions."}
@@ -502,7 +504,7 @@ legend("topright", legend = c("Normal", "Logistic", "Gumbel"),
title(main = paste("Difference between true and simulated correlation"))
```

Overall, it appears that there is little accuracy loss by not specifying the correlation matrix of the correlated latent responses when their marginal distribution is either the logistic distribution or the Gumbel distribution. This covers all the thresholds implemented in `SimCorMultRes`.
Overall, there is little accuracy loss by specifying the correlation matrix of the multivariate normal distribution in the intermediate step of the NORTA distribution instead of the correlation matrix of correlated latent responses regardless of whether their marginal distributions are either logistic or Gumbel distributions. Users can treat the correlation matrix passed on to the core functions of `SimCorMultRes` as the correlation matrix of the latent variables.
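
The simulation study described in this section can be sketched at a much smaller scale in base R. This is only an illustrative sketch, not the code used to build the `simulation` dataset: the helper names (`draw_bivariate_normal`, `qgumbel_std`, `norta_correlations`) are hypothetical, the Gumbel quantile function is written out by hand for location 0 and scale 1, and the number of replications and the sample size are far smaller than the $10,000,000$ replications and $N=10,000$ used in the vignette.

```r
# Small-scale sketch of the NORTA check (assumed settings, base R only).
set.seed(2023)

# Draw n bivariate normal vectors with unit variances and correlation rho.
draw_bivariate_normal <- function(n, rho) {
  correlation_matrix <- matrix(c(1, rho, rho, 1), nrow = 2)
  # chol() returns the upper triangular factor U with t(U) %*% U = R,
  # so the rows of Z %*% U have covariance matrix R.
  matrix(rnorm(2 * n), ncol = 2) %*% chol(correlation_matrix)
}

# Quantile function of the standard Gumbel distribution (hypothetical helper).
qgumbel_std <- function(p) -log(-log(p))

# NORTA step: map the normal margins to uniforms, then to the target margins,
# and return the sample correlation under each marginal distribution.
norta_correlations <- function(n, rho) {
  y <- draw_bivariate_normal(n, rho)
  u <- pnorm(y)
  z <- qlogis(u)       # logistic margins
  w <- qgumbel_std(u)  # Gumbel margins
  c(normal = cor(y)[1, 2], logistic = cor(z)[1, 2], gumbel = cor(w)[1, 2])
}

rho <- 0.5
replications <- 200
# Monte Carlo averages of the three sample correlations.
estimates <- rowMeans(replicate(replications, norta_correlations(1000, rho)))
differences <- rho - estimates
print(round(differences, 4))
```

With these reduced settings the differences are noisier than those stored in `simulation`, but they should show the same pattern: the difference for normal margins is close to zero, and the differences for logistic and Gumbel margins are small.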


# How to Cite
