This repository provides the ASCAT R package (v3.1) that can be used to infer tumour purity, ploidy and allele-specific copy number profiles.
ASCAT is described in detail in: Allele-specific copy number analysis of tumors. Van Loo P et al. PNAS (2010).
This repository also contains the code underlying additional publication: Allele-specific multi-sample copy number segmentation. Ross EM, Haase K, Van Loo P & Markowetz F. Bioinformatics (2020).
Bioconductor package dependencies: GenomicRanges & IRanges (BiocManager::install(c('GenomicRanges','IRanges'))
with a recent R/BiocManager version).
Processing high-throughput sequencing data: alleleCounter (C version)
Installing ASCAT using R: devtools::install_github('VanLoo-lab/ascat/ASCAT')
- Default penalty for both ASPCF (
ascat.aspcf
) and ASmultiPCF (ascat.asmultipcf
) is now 70 (was 25). It is suitable for SNP arrays, as well as WES and WGS data. - LogR correction can now be used to correct for both GC content (standard requirement) and replication timing (optional). Also, the correction method has been updated (it now uses autosomes to compute correlations with covariates and applies a linear model with splines on all chromosomes). Please note that
ascat.correctLogR
should be used from now on (ascat.GCcorrect
is still there for backward compatibility but is just a wrapper toascat.correctLogR
). - Color scheme has been changed for CNA profiles so it is now colorblind-friendly:
- Rounded profiles: red is the major allele and blue is the minor allele.
- Unrounded profiles: purple is the total CN and green is the minor allele.
- Because ASCAT leverages genomic information from heterozygous SNPs, the nonPAR region in chromosome X for males is challenging as there are no such SNP, as opposed to PAR1 and PAR2 regions being present on chrX and chrY. We improved CNA calling in chrX by considering specificities between nonPAR and PAR1/PAR2. To this end,
ascat.loadData
has a new argument,genomeVersion
(either'hg19'
or'hg38'
), that enables locating the nonPAR region on chrX. If provided, such information will be considered in the different ASCAT functions. We recommend always providing this information so CNA calling on chrX for males will be more accurate. Since PAR1 and PAR2 are present in both chrX and chrY, a 1+1 status in males refers to 1 copy in X and 1 copy in Y, but 1+0 could either be 1 copy in X (and no copy in Y) or 1 copy of Y (and no copy in X). Also, please note that most platforms have a limited resolution for PAR1 and PAR2 so results should carefully be interpreted in respect to available resolution. - Ploidy value displayed in CNA profiles no longer comes from the grid search and is now the final tumour ploidy (matching with
ascat.output$ploidy
).
ascat.plotRawData
andascat.plotSegmentedData
have an extra argument,logr.y_values
, to change Y scale for the logR track. Default is:c(-2,2)
, whereas previous plots were:c(-1,1)
.- 'Aberrant cell fraction' now refers to 'purity'. For backward compatibility,
ascat.output$aberrantcellfraction
still exists but we encourage usingascat.output$purity
instead.
- New set of instructions, as part of the main
ascat.prepareHTS
function, to derive logR and BAF from high-throughput sequencing (HTS) data. Briefly, alleleCounter is used to get allele counts at specific loci on a pair of tumour/normal (either BAM or CRAM files). This information is then converted into logR and BAF values, based on a similar method than in the Battenberg package. Although this method allows running ASCAT on different HTS data:- WES: we recommend providing a BED file covering sequenced regions of the genome.
- WGS: we recommend running Battenberg for accurate clonal and subclonal allele-specific copy-number alteration calling. However, ASCAT can still be used to get a fast purity/ploidy fit (~30 minutes with 12 CPUs from BAMs to CNA profiles). To this end, we provide a set of files that can be used (see ReferenceFiles/WGS).
- Targeted sequencing: a bespoke function,
ascat.prepareTargetedSeq
has been implemented. Such a function must be run on a batch of normals (no tumours) and will identify high-quality SNPs to investigate. Then,ascat.prepareHTS
can be used on selected SNPs to process tumour/normal pairs. Because of sparse datapoints, we recommend usingpenalty=25
when runningascat.aspcf
. - For HTS data (WGS, WES and targeted sequencing), gamma must be set to 1 in
ascat.runASCAT
.
- A new function to collect metrics of interest has been added:
ascat.metrics
. - Boundaries can be defined for purity and ploidy (min & max) when running
ascat.runAscat
(arguments:min_purity
/max_purity
andmin_ploidy
/max_ploidy
). - New function,
ascat.plotAdjustedAscatProfile
, that plots an ASCAT profile with respect to chromosome length (instead of the number of heterozygous SNPs).
We provide some scripts and input data in the ExampleData folder.
- LogR correction files (
ascat.correctLogR
) for standard platforms (Affymetrix SNP 6.0, Affymetrix 250k STY, Illumina 660k and Illumina OmniExpress) can be found in the ReferenceFiles/SNParrays folder. For other platforms, please use our scripts (in LogRcorrection) to generate such correction files. - For WGS, we provide logR correction files as well as loci and allele files in ReferenceFiles/WGS.
- For WES and targeted sequencing, we recommend using the reference files (loci, allele and logR correction files) as part of the Battenberg package. Because they require a high-resolution input, our reference files for WGS are not suitable for WES and targeted sequencing. For WES, loci and allele files from the Battenberg package can be fed into
ascat.prepareHTS
. For targeted sequencing, allele files from the Battenberg package can be fed intoascat.prepareTargetedSeq
, which will generate cleaned loci and allele files that can be fed intoascat.prepareHTS
.
Custom10k, IlluminaASA, IlluminaGSAv3, Illumina109k, IlluminaCytoSNP, IlluminaCytoSNP850k, Illumina610k, Illumina660k, Illumina700k, Illumina1M, Illumina2.5M, IlluminaOmni5, Affy10k, Affy100k, Affy250k_sty, Affy250k_nsp, AffyOncoScan, AffyCytoScanHD, AffySNP6, HumanCNV370quad, HumanCore12, HumanCoreExome24, HumanOmniExpress12 and IlluminaOmniExpressExome.
Because arrays have a defined set of SNP probes, with a fairly constant rate of heterozygous probes across individuals, useful metrics in ascat.predictGermlineGenotypes
can be inferred from some cases (with no or very few CN changes). However, sequencing data is subjected to massive variations because of design, coverage and/or artefacts. Therefore, we are not able to provide pre-defined metrics for unmatched sequencing data.
For more information about ASCAT and other projects of our group, please visit our website.