This is the README for the R package MclustSepCov. In our paper, available here, we analyze multivariate longitudinal data using model-based clustering. We assume a Gaussian mixture model with a separable covariance structure in which one of the covariance factors has a temporal structure.
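In symbols, the model described above can be sketched as follows. The notation ($\pi_k$, $\mu_k$, $U_k$, $V_k$) is introduced here for illustration only and is an assumption; it is chosen to match the toy example in this README, where each component covariance is built as a Kronecker product of a $p \times p$ matrix and a $q \times q$ matrix. Please see the paper for the exact formulation.

```latex
\[
  Y_i \sim \sum_{k=1}^{K} \pi_k \, N_{pq}\left(\mu_k, \Sigma_k\right),
  \qquad
  \Sigma_k = U_k \otimes V_k,
  \qquad
  U_k \in \mathbb{R}^{p \times p}, \quad V_k \in \mathbb{R}^{q \times q}.
\]
```

Under this sketch, a separable $\Sigma_k$ is determined by the entries of $U_k$ and $V_k$ rather than by an unrestricted $pq \times pq$ matrix.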
To install the package, run the following code:
# install.packages("devtools")
devtools::install_github("inmybrain/MclustSepCov", subdir = "MclustSepCov") # don't forget to specify subdir!
If you come across a problem like this, please refer to this answer in that issue.
Alternatively, download the source file from here and install it from the command line:
R CMD INSTALL MclustSepCov_1.0.tar.gz
We give a toy example that applies the main function Mclust_SEP_cpp. First, generate 40 samples from a Gaussian mixture model with two components. One component has a zero mean vector, while the other has a mean vector parallel to $(1, \ldots, 1)^{\top}$ with norm 5. We assume a heterogeneous covariance structure between them.
# install.packages("mvtnorm")
library("MclustSepCov")
K <- 2
p <- 2
q <- 3
U <- lapply(1:K, function(noarg) getCovariance(p, 0.3, "AR"))
V <- lapply(1:K, function(noarg) getCovariance(q, 0.2, "CS"))
Sigma <- Map(kronecker, U, V) # separable covariance matrix
mu <- list(rep(0, p * q), 5 / sqrt(p*q) * rep(1, p * q)) # distinct mean vectors
Y <- vector(mode = "list", length = K)
set.seed(6) # to make results reproducible
for(i in 1:K){
Y[[i]] <- mvtnorm::rmvnorm(n = 20, mean = mu[[i]], sigma = Sigma[[i]])
}
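To make the separable covariance and the mean vector in the code above more concrete, here is a minimal base-R sketch. The helpers ar1_matrix and cs_matrix are hypothetical and not part of MclustSepCov; they assume that "AR" means entries of the form rho^|i - j| and that "CS" means a constant off-diagonal correlation, which may differ from the parameterization that getCovariance actually uses.

```r
# Hypothetical helpers (not part of MclustSepCov): hand-built correlation matrices.
# Assumed AR(1) form: entry (i, j) equals rho^|i - j|.
ar1_matrix <- function(d, rho) rho^abs(outer(seq_len(d), seq_len(d), "-"))

# Assumed compound-symmetry form: ones on the diagonal, rho elsewhere.
cs_matrix <- function(d, rho) {
  m <- matrix(rho, nrow = d, ncol = d)
  diag(m) <- 1
  m
}

p <- 2
q <- 3
U_demo <- ar1_matrix(p, 0.3)
V_demo <- cs_matrix(q, 0.2)

# Kronecker product of a p x p and a q x q matrix: a (p * q) x (p * q) matrix.
Sigma_demo <- kronecker(U_demo, V_demo)
dim(Sigma_demo)          # 6 6
isSymmetric(Sigma_demo)  # TRUE

# The second mean vector is parallel to (1, ..., 1) and has Euclidean norm 5.
mu2 <- 5 / sqrt(p * q) * rep(1, p * q)
sqrt(sum(mu2^2))
```

With kronecker(U, V), the block in position (a, b) of the result is U[a, b] * V, so the 6 columns are ordered as p = 2 consecutive blocks of q = 3 variables.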
Each element of the list Y holds 20 samples from one Gaussian component. The candidate models considered here are Gaussian mixtures with 1, 2, or 3 components and any of the available covariance models.
fit <- Mclust_SEP_cpp(Y = Reduce(rbind, Y), p = p, q = q, Ks = 1:3, type_cov = "all", save_fit = TRUE)
The returned object fit contains a table of BIC values for all fitted models.
fit$bic_table
The best model, with 2 components and covariance model 'EEE-ECS', is saved in best_model:
fit$best_model
If save_fit = TRUE (the default is FALSE), all fitted models can be accessed; for example,
## VVV-VUN
fit$res_Mclust_SEP$`VVV-VUN`$`1`
fit$res_Mclust_SEP$`VVV-VUN`$`2`
fit$res_Mclust_SEP$`VVV-VUN`$`3`
## EEE-EAR
fit$res_Mclust_SEP$`EEE-EAR`$`1`
fit$res_Mclust_SEP$`EEE-EAR`$`2`
fit$res_Mclust_SEP$`EEE-EAR`$`3`
- For the available covariance structures, see the help page:
?Mclust_SEP_cpp
- For the initial cluster membership, each sample is assigned to a cluster at random.
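As an illustration of what a random initial assignment can look like, here is a minimal base-R sketch. It is not the package's internal code: the variable names, the seed, and the choice of uniform sampling with replacement are all assumptions made for this example.

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible
n <- 40      # number of samples, as in the toy example
K <- 2       # number of clusters

# Assumed scheme: draw one label per sample, uniformly from 1, ..., K.
init_membership <- sample(seq_len(K), size = n, replace = TRUE)

# Cluster sizes under this random start.
table(init_membership)
```

Since the starting labels are random, fitted results may differ from run to run.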
To cite this package, please use the following BibTeX entry:
@article{Park:2019,
author = {Seongoh Park and Johan Lim and Hyejeong Choi and Minjung Kwak},
title = {Clustering of longitudinal interval-valued data via mixture distribution under covariance separability},
journal = {Journal of Applied Statistics},
volume = {0},
number = {0},
pages = {1-18},
year = {2019},
publisher = {Taylor \& Francis},
doi = {10.1080/02664763.2019.1692795},
URL = {https://doi.org/10.1080/02664763.2019.1692795},
eprint = {https://doi.org/10.1080/02664763.2019.1692795}
}
We are happy to troubleshoot any issue with the package:
- please contact the maintainer at seongohpark6@gmail.com, or
- please open an issue in the GitHub repository.