HiCDOC normalizes intrachromosomal Hi-C matrices, uses unsupervised learning to predict A/B compartments from multiple replicates, and detects significant compartment changes between experiment conditions.
It provides a collection of functions assembled into a pipeline:
- Filter:
  - Remove chromosomes that are too small to be useful.
  - Filter sparse replicates to remove uninformative replicates with few interactions.
  - Filter positions (bins) that have too few interactions.
- Normalize:
  - Normalize technical biases using cyclic loess normalization, so that matrices are comparable.
  - Normalize biological biases using Knight-Ruiz matrix balancing, so that all the bins are comparable.
  - Normalize the distance effect, which results from higher interaction proportions between closer regions, with an MD loess.
- Predict:
  - Predict compartments using constrained K-means.
  - Detect significant differences between experiment conditions.
- Visualize:
  - Plot the interaction matrices of each replicate.
  - Plot the overall distance effect on the proportion of interactions.
  - Plot the compartments in each chromosome, along with their concordance (confidence measure) in each replicate, and significant changes between experiment conditions.
  - Plot the overall distribution of concordance differences.
  - Plot the result of the PCA on the compartments' centroids.
  - Plot the boxplots of self-interaction ratios (differences between self-interactions and the medians of other interactions) of each compartment, which are used for the A/B classification.
To install, execute the following commands in your console:
Rscript -e 'install.packages("devtools")'
Rscript -e 'devtools::install_github("mzytnicki/HiCDOC")'
After installation, the package can be loaded in R >= 4.0:
library("HiCDOC")
To try out HiCDOC, load the simulated toy data set:
data(exampleHiCDOCDataSet)
hic.experiment <- exampleHiCDOCDataSet
Then run the default pipeline on the created object:
hic.experiment <- HiCDOC(hic.experiment)
And plot some results:
plotCompartmentChanges(hic.experiment, chromosome = 'Y')
HiCDOC can import Hi-C data sets in several formats:
- Tabular .tsv files.
- Cooler .cool or .mcool files.
- Juicer .hic files.
- HiC-Pro .matrix and .bed files.
A tabular file is a tab-separated multi-replicate sparse matrix with a header:
chromosome position 1 position 2 C1.R1 C1.R2 C2.R1 ...
3 1500000 7500000 145 184 72 ...
...
The interaction proportions between position 1 and position 2 of chromosome are reported in each condition.replicate column. There is no limit to the number of conditions and replicates.
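To make the layout concrete, here is a base R sketch that writes a two-row file in this format and reads it back, splitting each count column name into its condition and replicate. It only illustrates the file structure; it is not how HiCDOCDataSetFromTabular parses files, and the values are made up.

```r
# Write a tiny hypothetical file in the tabular layout described above
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "chromosome\tposition 1\tposition 2\tC1.R1\tC1.R2\tC2.R1",
  "3\t1500000\t7500000\t145\t184\t72",
  "3\t1500000\t8000000\t12\t9\t30"
), path)

# check.names = FALSE keeps headers such as "position 1" and "C1.R1" verbatim
interactions <- read.delim(path, check.names = FALSE)

# Every column after the first three holds one condition.replicate
countColumns <- names(interactions)[-(1:3)]
parts <- do.call(rbind, strsplit(countColumns, ".", fixed = TRUE))
design <- data.frame(
  column = countColumns,
  condition = parts[, 1],
  replicate = parts[, 2],
  stringsAsFactors = FALSE
)
print(design)
```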
To load Hi-C data in this format:
hic.experiment <- HiCDOCDataSetFromTabular('path/to/data.tsv')
To load .cool or .mcool files generated by Cooler:
# Path to each file
paths <- c(
  'path/to/condition-1.replicate-1.cool',
  'path/to/condition-1.replicate-2.cool',
  'path/to/condition-2.replicate-1.cool',
  'path/to/condition-2.replicate-2.cool',
  'path/to/condition-3.replicate-1.cool'
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Resolution to select in .mcool files
binSize <- 500000
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromCool(
  paths,
  replicates = replicates,
  conditions = conditions,
  binSize = binSize # Specified for .mcool files.
)
To load .hic files generated by Juicer:
# Path to each file
paths <- c(
  'path/to/condition-1.replicate-1.hic',
  'path/to/condition-1.replicate-2.hic',
  'path/to/condition-2.replicate-1.hic',
  'path/to/condition-2.replicate-2.hic',
  'path/to/condition-3.replicate-1.hic'
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Resolution to select
binSize <- 500000
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromHiC(
  paths,
  replicates = replicates,
  conditions = conditions,
  binSize = binSize
)
To load .matrix and .bed files generated by HiC-Pro:
# Path to each matrix file
matrixPaths <- c(
  'path/to/condition-1.replicate-1.matrix',
  'path/to/condition-1.replicate-2.matrix',
  'path/to/condition-2.replicate-1.matrix',
  'path/to/condition-2.replicate-2.matrix',
  'path/to/condition-3.replicate-1.matrix'
)
# Path to each bed file
bedPaths <- c(
  'path/to/condition-1.replicate-1.bed',
  'path/to/condition-1.replicate-2.bed',
  'path/to/condition-2.replicate-1.bed',
  'path/to/condition-2.replicate-2.bed',
  'path/to/condition-3.replicate-1.bed'
)
# Replicate and condition of each file. Can be names instead of numbers.
replicates <- c(1, 2, 1, 2, 1)
conditions <- c(1, 1, 2, 2, 3)
# Instantiation of data set
hic.experiment <- HiCDOCDataSetFromHiCPro(
  matrixPaths = matrixPaths,
  bedPaths = bedPaths,
  replicates = replicates,
  conditions = conditions
)
Once your data is loaded, you can run all the filtering, normalization, and prediction steps with:
hic.experiment <- HiCDOC(hic.experiment)
This one-liner runs all the steps detailed below.
Remove chromosomes shorter than 100 positions:
hic.experiment <- filterSmallChromosomes(hic.experiment, threshold = 100)
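The rule amounts to counting how many bins each chromosome spans. A minimal base R sketch, assuming a chromosome's length in positions is its size in base pairs divided by the bin size, rounded up; the chromosome sizes and bin size are example values, not HiCDOC internals.

```r
# Example chromosome sizes in base pairs and a 500 kb bin size
chromosomeSizes <- c("1" = 248956422, "21" = 46709983, "M" = 16569)
binSize <- 500000
threshold <- 100

# Assumption: a chromosome's length in positions is the number of bins it spans
positions <- ceiling(chromosomeSizes / binSize)
kept <- names(positions)[positions >= threshold]
print(positions)
print(kept)
```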
Remove sparse replicates that have fewer than 30% non-zero interactions:
hic.experiment <- filterSparseReplicates(hic.experiment, threshold = 0.3)
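Sparsity here is the share of non-zero cells in a replicate's matrix. A minimal base R sketch on three hypothetical 10 x 10 matrices, assuming the proportion is computed over the full square matrix (HiCDOC may count cells differently):

```r
# Hypothetical 10 x 10 contact matrices for three replicates of one chromosome
replicates <- list(
  C1.R1 = matrix(1, nrow = 10, ncol = 10),                          # every cell filled
  C1.R2 = outer(1:10, 1:10, function(i, j) 1 * (abs(i - j) <= 2)),  # cells at most 2 bins off the diagonal
  C2.R1 = diag(10)                                                  # self-interactions only
)
threshold <- 0.3

# Assumption: sparsity is measured over the full square matrix
filled <- sapply(replicates, function(m) mean(m != 0))
kept <- names(filled)[filled >= threshold]
print(filled)
print(kept)
```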
Remove weak positions that have fewer than 1 interaction on average:
hic.experiment <- filterWeakPositions(hic.experiment, threshold = 1)
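A weak position is a bin whose average interaction falls below the threshold. A minimal base R sketch on a hypothetical matrix, assuming the average is taken over the bin's whole row and that weak bins are dropped from both rows and columns in a single pass:

```r
# Hypothetical 6 x 6 symmetric matrix in which bins 3 and 6 barely interact
m <- matrix(2, nrow = 6, ncol = 6)
m[3, ] <- 0.5
m[, 3] <- 0.5
m[6, ] <- 0
m[, 6] <- 0
threshold <- 1

# Assumption: a bin's average is taken over its full row
averages <- rowMeans(m)
keep <- averages >= threshold
# Drop weak bins from both rows and columns; drop = FALSE keeps a matrix
filtered <- m[keep, keep, drop = FALSE]
print(round(averages, 2))
print(dim(filtered))
```

Using a logical index rather than negative indices avoids the classic R pitfall where `m[-which(...), ]` returns an empty matrix when no bin is weak.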
Normalize technical biases such as sequencing depth:
hic.experiment <- normalizeTechnicalBiases(hic.experiment)
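Cyclic loess cycles through every pair of replicates, fits a loess curve of the log-ratio M against the average log-intensity A, and splits the fitted bias between the two replicates. The sketch below is a from-scratch base R illustration on simulated log-counts with made-up offsets; HiCDOC itself relies on the multiHiCcompare method cited in the references, and the `iterations` and `span` values here are arbitrary.

```r
# Cyclic loess sketch: rows are matched interactions, columns are replicates (log scale)
cyclicLoess <- function(logCounts, iterations = 3, span = 0.7) {
  normalized <- logCounts
  n <- ncol(normalized)
  for (iteration in seq_len(iterations)) {
    for (i in seq_len(n - 1)) {
      for (j in (i + 1):n) {
        M <- normalized[, i] - normalized[, j]        # log-ratio between the pair
        A <- (normalized[, i] + normalized[, j]) / 2  # average log-intensity
        bias <- fitted(loess(M ~ A, span = span, degree = 1))
        # Split the fitted bias evenly between the two replicates
        normalized[, i] <- normalized[, i] - bias / 2
        normalized[, j] <- normalized[, j] + bias / 2
      }
    }
  }
  normalized
}

# Simulated log-counts: three replicates of the same signal with made-up offsets
set.seed(42)
signal <- rnorm(200, mean = 5, sd = 1)
logCounts <- cbind(
  R1 = signal + rnorm(200, sd = 0.1),
  R2 = signal + 1 + rnorm(200, sd = 0.1),
  R3 = signal - 0.5 + rnorm(200, sd = 0.1)
)
normalized <- cyclicLoess(logCounts)
print(round(colMeans(logCounts), 2))
print(round(colMeans(normalized), 2))
```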
Normalize biological biases, such as GC content or the number of restriction sites:
hic.experiment <- normalizeBiologicalBiases(hic.experiment)
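Matrix balancing rescales rows and columns so that every bin carries the same total interaction. Knight-Ruiz reaches this with a Newton-type iteration; for readability, the sketch below swaps in the simpler Sinkhorn-Knopp iteration (alternating row and column normalization), which, for a symmetric matrix with positive entries, converges to the same balanced matrix but more slowly. The input matrix is hypothetical, and the function assumes no all-zero rows (removing weak positions first helps avoid them).

```r
# Sinkhorn-Knopp balancing (swapped in for Knight-Ruiz): alternately scale rows
# and columns to sum to 1 until the row sums are also within tolerance of 1
sinkhornKnopp <- function(m, tolerance = 1e-8, maxIterations = 1000) {
  balanced <- m
  for (iteration in seq_len(maxIterations)) {
    balanced <- sweep(balanced, 1, rowSums(balanced), "/")  # rows sum to 1
    balanced <- sweep(balanced, 2, colSums(balanced), "/")  # columns sum to 1
    if (max(abs(rowSums(balanced) - 1)) < tolerance) break
  }
  balanced
}

# Hypothetical symmetric 3 x 3 contact matrix with uneven bin coverage
m <- matrix(c(10, 4, 1,
              4, 6, 2,
              1, 2, 8), nrow = 3, byrow = TRUE)
balanced <- sinkhornKnopp(m)
print(round(balanced, 4))
print(rowSums(balanced))
```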
Normalize the distance effect resulting from higher interaction proportions between closer regions:
hic.experiment <- normalizeDistanceEffect(hic.experiment, loessSampleSize = 20000)
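One common way to remove the distance effect is to estimate the expected interaction at each distance with a loess fit and divide each observation by it. The sketch below does this in base R on simulated counts. Fitting on a random subsample is meant to echo the `loessSampleSize` parameter, but how HiCDOC samples and exactly what its MD loess regresses are not reproduced here; `surface = "direct"` lets the fit extrapolate to distances missing from the subsample.

```r
set.seed(7)
# Hypothetical chromosome with 100 bins; keep the upper triangle (position 2 >= position 1)
bins <- 100
pairs <- expand.grid(position1 = seq_len(bins), position2 = seq_len(bins))
pairs <- pairs[pairs$position2 >= pairs$position1, ]
pairs$distance <- pairs$position2 - pairs$position1
# Simulated interactions that decay with distance, with multiplicative noise
pairs$interaction <- 1000 / (1 + pairs$distance) * exp(rnorm(nrow(pairs), sd = 0.2))

# Fit the decay curve on a random subsample of the pairs
sampleSize <- 2000
fitRows <- sample(nrow(pairs), min(sampleSize, nrow(pairs)))
decay <- loess(
  log(interaction) ~ distance,
  data = pairs[fitRows, ],
  span = 0.3,
  control = loess.control(surface = "direct")  # allow predictions outside the sampled range
)

# Expected interaction for every pair, then observed / expected
expected <- exp(predict(decay, newdata = pairs))
pairs$normalized <- pairs$interaction / expected
print(summary(pairs$normalized))
```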
Predict A and B compartments and detect significant differences:
hic.experiment <- detectCompartments(
  hic.experiment,
  kMeansDelta = 0.0001,
  kMeansIterations = 50,
  kMeansRestarts = 20
)
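Constrained K-means adds must-link constraints to ordinary K-means. The sketch below assumes the constraint links the copies of one bin across replicates, and assigns each linked group as a whole to the centroid with the smallest summed distance, a common simplification rather than the point-by-point COP-KMeans procedure of Wagstaff et al. The `iterations`, `delta` and `restarts` arguments mirror `kMeansIterations`, `kMeansDelta` and `kMeansRestarts` by analogy only, and the toy 2-D points stand in for whatever features HiCDOC actually clusters.

```r
# Constrained K-means sketch with must-link groups: all rows sharing a group id
# (e.g. one bin observed in several replicates) are assigned to the same cluster
constrainedKMeans <- function(x, groups, k = 2, iterations = 50,
                              delta = 1e-4, restarts = 20) {
  best <- NULL
  for (restart in seq_len(restarts)) {
    centroids <- x[sample(nrow(x), k), , drop = FALSE]
    for (iteration in seq_len(iterations)) {
      # Squared Euclidean distance from every row to every centroid (rows x k)
      distances <- vapply(seq_len(k), function(cl) {
        rowSums(sweep(x, 2, centroids[cl, ])^2)
      }, numeric(nrow(x)))
      # Each group takes the centroid with the smallest summed distance
      groupCost <- rowsum(distances, groups)
      groupCluster <- max.col(-groupCost, ties.method = "first")
      cluster <- groupCluster[match(groups, rownames(groupCost))]
      # Recompute centroids; an empty cluster keeps its previous centroid
      updated <- centroids
      for (cl in seq_len(k)) {
        members <- cluster == cl
        if (any(members)) updated[cl, ] <- colMeans(x[members, , drop = FALSE])
      }
      shift <- max(abs(updated - centroids))
      centroids <- updated
      if (shift < delta) break
    }
    # Keep the restart with the lowest within-cluster sum of squares
    withinSS <- sum((x - centroids[cluster, , drop = FALSE])^2)
    if (is.null(best) || withinSS < best$withinSS) {
      best <- list(cluster = cluster, centroids = centroids, withinSS = withinSS)
    }
  }
  best
}

set.seed(3)
bins <- 30
truth <- rep(1:2, each = bins / 2)
centers <- rbind(c(0, 0), c(5, 5))
# Two replicates of each bin: same underlying center, independent noise
replicate1 <- centers[truth, ] + matrix(rnorm(bins * 2, sd = 0.5), ncol = 2)
replicate2 <- centers[truth, ] + matrix(rnorm(bins * 2, sd = 0.5), ncol = 2)
x <- rbind(replicate1, replicate2)
groups <- rep(seq_len(bins), times = 2)  # row r and row r + 30 are the same bin
result <- constrainedKMeans(x, groups)
print(table(result$cluster[seq_len(bins)], truth))
```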
Plot the interaction matrix of each replicate:
plotInteractions(hic.experiment, chromosome = '3')
Plot the overall distance effect on the proportion of interactions:
plotDistanceEffect(hic.experiment)
List and plot compartments with their concordance (confidence measure) in each replicate, and significant changes between experiment conditions:
compartments(hic.experiment)
concordances(hic.experiment)
differences(hic.experiment)
plotCompartmentChanges(hic.experiment, chromosome = '3')
Plot the overall distribution of concordance differences:
plotConcordanceDifferences(hic.experiment)
Plot the result of the PCA on the compartments' centroids:
plotCentroids(hic.experiment, chromosome = '3')
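For intuition about this plot, the sketch below runs a base R PCA (`prcomp`) on hypothetical centroids, assuming one centroid per condition and compartment with one column per bin. If the compartments are well separated, the first component should split A from B and explain most of the variance. This is an illustration of PCA on centroids, not the code behind plotCentroids.

```r
# Hypothetical centroids: one row per (condition, compartment), one column per bin
set.seed(5)
profileA <- seq(1, 0, length.out = 10)
profileB <- rev(profileA)
centroids <- rbind(
  C1.A = profileA + rnorm(10, sd = 0.05),
  C2.A = profileA + rnorm(10, sd = 0.05),
  C1.B = profileB + rnorm(10, sd = 0.05),
  C2.B = profileB + rnorm(10, sd = 0.05)
)
# Principal component analysis of the centroids (centered, unscaled)
pca <- prcomp(centroids)
coordinates <- pca$x[, 1:2]
explained <- pca$sdev^2 / sum(pca$sdev^2)  # share of variance per component
print(round(coordinates, 2))
print(round(explained, 3))
```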
Plot the boxplots of self-interaction ratios (differences between self-interactions and the median of other interactions) of each compartment:
plotSelfInteractionRatios(hic.experiment, chromosome = '3')
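Following the definition above literally, a bin's ratio can be computed as its self-interaction minus the median of its interactions with the other bins, then grouped by cluster. A minimal base R sketch on a hypothetical matrix and cluster labels; reading "other interactions" as the rest of the bin's row is an assumption, and the sketch stops short of the A/B labeling rule.

```r
# Hypothetical 8 x 8 normalized matrix and the cluster of each bin
m <- matrix(1, nrow = 8, ncol = 8)
diag(m) <- c(6, 5, 7, 6, 2, 3, 2, 3)
cluster <- c(1, 1, 1, 1, 2, 2, 2, 2)

# Assumption: ratio = self-interaction minus median of the bin's other interactions
ratios <- vapply(seq_len(nrow(m)), function(i) {
  m[i, i] - median(m[i, -i])
}, numeric(1))

# Group the ratios by cluster; boxplot(byCluster) would draw the distributions
byCluster <- split(ratios, cluster)
print(sapply(byCluster, median))
```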
John C. Stansfield, Kellen G. Cresswell, Mikhail G. Dozmorov, multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments, Bioinformatics, 2019, https://doi.org/10.1093/bioinformatics/btz048
Philip A. Knight, Daniel Ruiz, A fast algorithm for matrix balancing, IMA Journal of Numerical Analysis, Volume 33, Issue 3, July 2013, Pages 1029–1047, https://doi.org/10.1093/imanum/drs019
Kiri Wagstaff, Claire Cardie, Seth Rogers, Stefan Schrödl, Constrained K-means Clustering with Background Knowledge, Proceedings of the 18th International Conference on Machine Learning, 2001, Pages 577–584, https://pdfs.semanticscholar.org/0bac/ca0993a3f51649a6bb8dbb093fc8d8481ad4.pdf