Skip to content

Model based Clustering via Mixture Distribution under Covariance Separability

License

Notifications You must be signed in to change notification settings

inmybrain/MclustSepCov

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MclustSepCov

This is a README file of the R package MclustSepCov. In our paper available here, we analyze the multivariate longitudinal data by applying model-based clustering. We assume the Gaussian mixture model with separable covariance structure where one of covariance factors is assumed to have a temporal structure.

Installation of the package

To install our package, you may simply execute the following codes:

# install.packages("devtools")
devtools::install_github("inmybrain/MclustSepCov", subdir = "MclustSepCov") # don't forget to specify subdir!

If you come across a problem like this, please refer to this answer in that issue.

Or you can install the source file using the command line after downloading it from here;

R CMD INSTALL MclustSepCov_1.0.tar.gz

A basic example of using the package

We give a toy example to apply the main function Mclust_SEP_cpp. First, generate 40 samples from the Gaussian mixture model with two components. One of the component has zero mean vector, while the other a vector parallel to $ (1, , 1)^{} $ with modulus 5. We assume the heterogeneous covariance structure between them.

# install.packages("mvtnorm")
library("MclustSepCov")
K <- 2
p <- 2
q <- 3
U <- lapply(1:K, function(noarg) getCovariance(p, 0.3, "AR"))
V <- lapply(1:K, function(noarg) getCovariance(q, 0.2, "CS"))
Sigma <- Map(kronecker, U, V) # separable covariance matrix
mu <- list(rep(0, p * q), 5 / sqrt(p*q) * rep(1, p * q)) # distinct mean vectors
Y <- vector(mode = "list", length = K)
set.seed(6) # to make results reproducible
for(i in 1:K){
 Y[[i]] <- mvtnorm::rmvnorm(n = 20, mean = mu[[i]], sigma = Sigma[[i]])
}

Each list has 20 samples from each Gaussian component. Candidate models we consider here are the Gaussian mixture distribution with the number of mixture components within {1, 2, 3} and with any of available covariance models.

fit <- Mclust_SEP_cpp(Y = Reduce(rbind, Y), p = p, q = q, Ks = 1:3, type_cov = "all", save_fit = TRUE)

It contains a table showing BIC values for all fitted models.

fit$bic_table

The best model with 2 components and covariance model 'EEE-ECS' is saved in best_model

fit$best_model

If save_fit=TRUE (default is FALSE), then one can access to all fitted models; for example,

## VVV-VUN
fit$res_Mclust_SEP$`VVV-VUN`$`1`
fit$res_Mclust_SEP$`VVV-VUN`$`2`
fit$res_Mclust_SEP$`VVV-VUN`$`3`

## EEE-EAR
fit$res_Mclust_SEP$`EEE-EAR`$`1`
fit$res_Mclust_SEP$`EEE-EAR`$`2`
fit$res_Mclust_SEP$`EEE-EAR`$`3`

Notes

  • For available covariance structures, see the help page;
?Mclust_SEP_cpp
  • As for initial assignment of cluster membership, each sample is assigned randomly to clusters.

Citation

To cite this package, please use this bibtex format:

@article{Park:2019,
author = {Seongoh Park and Johan Lim and Hyejeong Choi and Minjung Kwak},
title = {Clustering of longitudinal interval-valued data via mixture distribution under covariance separability},
journal = {Journal of Applied Statistics},
volume = {0},
number = {0},
pages = {1-18},
year  = {2019},
publisher = {Taylor & Francis},
doi = {10.1080/02664763.2019.1692795},
URL = { 
        https://doi.org/10.1080/02664763.2019.1692795
},
eprint = { 
        https://doi.org/10.1080/02664763.2019.1692795
}
}

Issues

We are happy to troubleshoot any issue with the package;

  • please contact to the maintainer by seongohpark6@gmail.com, or

  • please open an issue in the github repository.

About

Model based Clustering via Mixture Distribution under Covariance Separability

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published