---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# diceR <a href='https://alinetalhouk.github.io/diceR/'><img src='man/figures/logo.png' align="right" width="120" /></a>
<!-- badges: start -->
[![R-CMD-check](https://github.com/AlineTalhouk/diceR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AlineTalhouk/diceR/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/AlineTalhouk/diceR/branch/master/graph/badge.svg)](https://app.codecov.io/gh/AlineTalhouk/diceR?branch=master)
[![CRAN status](https://www.r-pkg.org/badges/version/diceR)](https://CRAN.R-project.org/package=diceR)
[![CRAN RStudio mirror downloads](https://cranlogs.r-pkg.org/badges/grand-total/diceR?color=orange)](https://r-pkg.org/pkg/diceR)
<!-- badges: end -->
## Overview
The goal of `diceR` is to provide a systematic framework for generating diverse cluster ensembles in R. Cluster analysis involves many nuances, so we provide a process and a suite of functions and tools for cluster discovery. They guide the user through generating a diverse set of clustering solutions from data, forming an ensemble, selecting algorithms, and arriving at a final consensus solution. We have also developed visual and analytical validation tools to help assess the final result. The wrapper function `dice()` lets the user obtain and assess results easily. The package is therefore accessible both to end users with limited statistical knowledge and to informaticians and statisticians, who have full access to the underlying functions and can easily extend them. More details can be found in our companion paper published in [BMC Bioinformatics](https://doi.org/10.1186/s12859-017-1996-y).
## Installation
You can install `diceR` from CRAN with:
```{r install_CRAN, message=FALSE, eval=FALSE}
install.packages("diceR")
```
Or get the latest development version from GitHub:
```{r install_github, message=FALSE, eval=FALSE}
# install.packages("devtools")
devtools::install_github("AlineTalhouk/diceR")
```
## Example
The following example shows how to use the main function of the package, `dice()`. The data matrix `hgsc` contains a subset of gene expression measurements from High Grade Serous Carcinoma ovarian cancer patients, taken from the publicly available datasets of The Cancer Genome Atlas. Samples are in rows and features are in columns. In the call below, we specify (a range of) `nk` clusters over `reps` subsamples of the data, each containing 80% of the samples. We also specify the clustering `algorithms` to be used and, in `cons.funs`, the ensemble functions used to aggregate them.
```{r example, results='hide'}
library(diceR)
data(hgsc)
obj <- dice(hgsc, nk = 4, reps = 5, algorithms = c("hc", "diana"),
            cons.funs = c("kmodes", "majority"))
```
The first few cluster assignments are shown below:
```{r assignments}
knitr::kable(head(obj$clusters))
```
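To make the resampling idea concrete, here is a minimal base-R sketch of clustering repeated 80% subsamples. It is an illustration under stated assumptions, not `diceR`'s internal code: the toy matrix `x`, the local variables `p_item` and `k`, and the choice of average-linkage hierarchical clustering are all hypothetical.

```{r resampling_sketch}
set.seed(1)

# Hypothetical toy data: 40 samples (rows) by 5 features (columns)
x <- matrix(rnorm(40 * 5), nrow = 40)

reps <- 5       # number of subsamples
p_item <- 0.8   # fraction of samples drawn in each subsample
k <- 4          # number of clusters
n <- nrow(x)

# One column per subsample; samples left out of a subsample stay NA
assignments <- matrix(NA_integer_, nrow = n, ncol = reps)

for (r in seq_len(reps)) {
  idx <- sample(n, size = floor(p_item * n))
  hc <- hclust(dist(x[idx, ]), method = "average")
  assignments[idx, r] <- cutree(hc, k = k)
}

# Number of samples that received a label in each subsample
colSums(!is.na(assignments))
```

Cluster labels from different subsamples are not comparable until they are aligned, which is one reason a consensus step is needed afterwards.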
You can also compare the base `algorithms` with the `cons.funs` using internal evaluation indices:
```{r compare}
knitr::kable(obj$indices$ii$`4`)
```
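As a rough illustration of what a consensus function such as majority voting does, the base-R sketch below combines several label vectors into one. It is a simplified stand-in, not the package's implementation: the toy `assignments` matrix and the helper `majority_vote()` are hypothetical, and the labels are assumed to be already aligned across columns.

```{r majority_vote_sketch}
# Hypothetical toy data: rows are samples, columns are clusterings.
# Labels are assumed to be aligned, so label 1 means the same cluster
# in every column.
assignments <- cbind(
  alg_a = c(1, 1, 2, 2, 3),
  alg_b = c(1, 2, 2, 2, 3),
  alg_c = c(1, 1, 2, 3, 3)
)

# Pick the most frequent label in each row; ties go to the smallest label
majority_vote <- function(x) {
  apply(x, 1, function(labels) {
    counts <- table(labels)
    as.integer(names(counts)[which.max(counts)])
  })
}

consensus <- majority_vote(assignments)
consensus
```

Each sample ends up with the label that most of the input clusterings agree on, which is why aligning labels beforehand matters.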
## Pipeline
This figure is a visual schematic of the pipeline that `dice()` implements.
![Ensemble Clustering pipeline.](man/figures/pipeline.png)
Please visit the [overview](https://alinetalhouk.github.io/diceR/articles/overview.html "diceR overview") page for more detail.