Releases: exomiser/Exomiser
Discovering the ID
This point release is compatible with the 1902, 2003 and 2007 data releases. We recommend you check for the latest data update at https://data.monarchinitiative.org/exomiser/latest/ to keep Exomiser functioning optimally with the latest data.
New features:
- The JSON output now shows the id of the variantEvaluation taken from the VCF file.
New APIs:
- Added `VariantEvaluation.getId()` and `VariantEvaluation.Builder.id()` methods to store VCF id field contents.
Unifying the disease types
Up to eleven and one more - new pathogenicity scores and a variant whitelist
CLI changes
This release contains significant diagnostic performance improvements due to the inclusion of a high-quality ClinVar whitelist and 'second generation' pathogenicity scores.
- Added new `PathogenicitySource` sources - `M_CAP, MPC, MVP, PRIMATE_AI`. Be aware that these may not be free for commercial use. Check the licensing before use!
- Added new variant whitelist feature which enables flagging of variants on a whitelist and bypassing of `FrequencyFilter` and `VariantEffectFilter`. By default this will use ClinVar variants listed as `Pathogenic` or `Likely_pathogenic` and with a review status of `criteria provided, single submitter` or better. See https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/ for an explanation of the ClinVar review status.
n.b. This release is incompatible with data release 1811 and below.
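The whitelist behaviour described above can be pictured with a minimal, self-contained sketch. It is not Exomiser code: `WhitelistBypassSketch`, `SimpleVariant`, the allele key format and the single frequency cut-off are all hypothetical stand-ins, chosen only to show a whitelisted variant skipping the frequency and variant effect checks.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: these types are illustrative stand-ins, not Exomiser classes.
final class WhitelistBypassSketch {

    /** Minimal stand-in for a variant with the properties the two filters inspect. */
    static final class SimpleVariant {
        final String alleleKey;         // assumed key format: contig-position-ref-alt
        final double maxFrequency;      // highest population frequency seen (%)
        final boolean effectOfInterest; // would a variant effect filter keep it?

        SimpleVariant(String alleleKey, double maxFrequency, boolean effectOfInterest) {
            this.alleleKey = alleleKey;
            this.maxFrequency = maxFrequency;
            this.effectOfInterest = effectOfInterest;
        }
    }

    private final Set<String> whitelist;
    private final double maxFrequencyCutoff;

    private WhitelistBypassSketch(Set<String> whitelist, double maxFrequencyCutoff) {
        this.whitelist = whitelist;
        this.maxFrequencyCutoff = maxFrequencyCutoff;
    }

    static WhitelistBypassSketch of(double maxFrequencyCutoff, String... whitelistedKeys) {
        return new WhitelistBypassSketch(new HashSet<>(Arrays.asList(whitelistedKeys)), maxFrequencyCutoff);
    }

    boolean isWhiteListed(SimpleVariant variant) {
        return whitelist.contains(variant.alleleKey);
    }

    /** Whitelisted variants bypass both checks; all others must pass both. */
    boolean passesFilters(SimpleVariant variant) {
        if (isWhiteListed(variant)) {
            return true;
        }
        return variant.maxFrequency <= maxFrequencyCutoff && variant.effectOfInterest;
    }

    public static void main(String[] args) {
        WhitelistBypassSketch filters = WhitelistBypassSketch.of(1.0, "10-123256215-T-G");
        SimpleVariant common = new SimpleVariant("10-123256215-T-G", 5.0, false);
        System.out.println(filters.passesFilters(common));
    }
}
```

In this sketch the whitelisted variant is kept even though it is above the frequency cut-off and has an uninteresting effect, while any other variant still has to pass both checks.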
Core API
API breaking changes:
- Removed FREQUENCY_SOURCE_MAP from FrequencySource
- Changed `Frequency`, `RsId` and `PathogenicityScore` static `valueOf()` constructor to `of()`
- Removed deprecated `IntervalFilter.getGeneticInterval()`
- Changed visibility of `PhenodigmMatchRawScore` from public to package private and made immutable
- Changed visibility of `CrossSpeciesPhenotypeMatcher` from public to package private and added static `of()` constructor
- Replaced redundant `Default*DaoMvStoreProto` classes with new `AllelePropertiesDaoMvStore`
- Added `OntologyService` as constructor argument to `AnalysisFactory`, `AnalysisParser` and `AnalysisBuilder`
- Replaced `BasePathogenicityScore.compareTo()` method with default `PathogenicityScore.compareTo()`
- `GeneticInterval` no longer accepts `ReferenceDictionary` as a constructor argument
New APIs:
- Added CADD and REMM to data-genome `AlleleProperty`
- Moved `JannovarDataSourceLoader` from autoconfigure to core module
- Added `AllelePosition.isSymbolic()` method
- Added `Variant.isCodingVariant()` method
- Added `AnalysisBuilder.addIntervalFilter(Collection<ChromosomalRegion> chromosomalRegions)` method
- Added new non-public `FilterStats` class for more accurate filtering statistics
- Added new `AllelePropertiesDao` interface
- Added new `AllelePropertiesDaoMvStore` implementation
- Added new `AllelePropertiesDaoAdapter` to fix issue of Spring cache proxy not being able to intercept internal calls
- Added new `HpoIdChecker` class to return current HPO id/terms for an input id/term
- Added new `HumanPhenotypeOntologyDao.getIdToPhenotypeTerms()` method
- Added new `OntologyService.getCurrentHpoIds()` method
- Added new `SampleGenotype.isEmpty()` method
- Added new experimental `VcfCodecs` class for de/serialising VCF lines
- Added new `JannovarDataProtoSerialiser.loadProto()` method for loading intermediate `JannovarProto.JannovarData`
- Added new `VariantWhiteList` and `InMemoryVariantWhiteList` implementation
- Added new `VariantEvaluation.isWhiteListed()` method and relevant builder methods
- Added new `JannovarDataFactory` for a simple programmatic API to build `JannovarData` objects
- Added new `TranscriptSource` enum
- Added new `PathogenicityScore.of()` static factory constructor
- Added new `PathogenicityScore.getRawScore()` method
- Added default `PathogenicityScore.compareTo()` method
- Added new static `PathogenicityScore.compare()` method
- Added new `ScaledPathogenicityScore` class
- Added new `MpcScore` class
- Added new `Contig` class for converting contig names to integer-based id
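To illustrate what converting contig names to an integer-based id can look like, here is a small sketch. It is not the Exomiser `Contig` implementation: the class name `ContigIdSketch`, the numbering (1-22 for autosomes, 23 for X, 24 for Y, 25 for M/MT) and the use of 0 for unrecognised names are all assumptions made for this example.

```java
import java.util.Locale;

// Hypothetical sketch: the id scheme below is an assumption, not the Exomiser Contig class.
final class ContigIdSketch {

    private ContigIdSketch() {
    }

    /** Maps names such as "chr10", "10", "X" or "chrM" to an integer id, or 0 if unrecognised. */
    static int toId(String contigName) {
        if (contigName == null) {
            return 0;
        }
        String name = contigName.trim();
        // Strip an optional UCSC-style "chr" prefix.
        if (name.toLowerCase(Locale.ROOT).startsWith("chr")) {
            name = name.substring(3);
        }
        switch (name.toUpperCase(Locale.ROOT)) {
            case "X":
                return 23;
            case "Y":
                return 24;
            case "M":
            case "MT":
                return 25;
            default:
                try {
                    int id = Integer.parseInt(name);
                    return (id >= 1 && id <= 22) ? id : 0;
                } catch (NumberFormatException e) {
                    // Unplaced scaffolds, alt contigs and anything else are unrecognised here.
                    return 0;
                }
        }
    }

    public static void main(String[] args) {
        System.out.println(toId("chr10"));
        System.out.println(toId("MT"));
    }
}
```

An integer id of this kind can be compared and sorted more cheaply than a string name, which is one plausible motivation for such a class.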
Other changes:
- Updated Spring Boot to version 2.1.3
- Updated Jannovar to version 0.28
- Updated HTSJDK to version 2.18.2
- Refactored `FrequencyData` to use array-based backing for 5-10% memory usage improvement and lower GC especially when nearing max memory
- Refactored `AnalysisParser` to utilise `AnalysisBuilder` directly reducing code duplication
- Refactored `AnalysisRunner` classes to utilise new `FilterStats` class
- Refactored `QueryPhenotypeMatch` to store and return input queryPhenotypeMatches argument
- Refactored `VariantDataServiceImpl` to use new AllelePropertiesDao
- Refactored `VariantDataServiceImpl` for better readability and performance
- Added check for obsolete HPO id input in `AnalysisBuilder.hpoIds()`
- Re-enabled `PhenixPrioritiser` in `AnalysisParser`
- Refactored `VariantEvaluation.getSampleGenotypeString()` implementation to use `SampleGenotype` instead of `VariantContext`
- Refactored `VariantEffectCounter` internals with `VariantEvaluation` calls in place of `VariantContext`
- Enabled flagging of variants on a whitelist and bypassing of `FrequencyFilter` and `VariantEffectFilter`
- Changed `DefaultDiseaseDao` to only return diseases marked as having known disease-gene association or copy-number/structural causes
- Added range check to `BasePathogenicityScore` constructor
- Updated `CaddScore` and `SiftScore` to extend `ScaledPathogenicityScore`
- Updated `CaddDao` to use CADD phred scaled score directly
- Replaced production use of `ReferenceDictionary` from `HG19RefDictBuilder` with `Contig`
- Added new `PathogenicitySource` sources - `M_CAP, MPC, MVP, PRIMATE_AI`. Be aware that these may not be free for commercial use.
This one goes up to eleven... Samples, Pedigrees and no more SPARSE
CLI changes
- Removed `analysisMode: SPARSE` option - this will default to `PASS_ONLY`
- Removed `phenixPrioritiser: {}` option - we recommend using `hiPhivePrioritiser: {runParams: 'human'}` for human-only model comparisons
- Changed `outputPassVariantsOnly` to `outputContributingVariantsOnly` in `outputOptions`. Enabling this will only report the variants marked as `CONTRIBUTING_VARIANT`, i.e. those variants which contribute to the `EXOMISER_GENE_VARIANT_SCORE` and `EXOMISER_GENE_COMBINED_SCORE` scores. This will default to `false`.
  ```yaml
  outputOptions:
      outputContributingVariantsOnly: false
  ```
Core API
API breaking changes:
- Removed unused `VariantSerialiser`
- Moved `ChromosomalRegionIndex` from `analysis.util` package to `model`
- Changed `HiPhiveOptions.DEFAULT` to `HiPhiveOptions.defaults()` to match style with the rest of the framework
- Deleted redundant `MvStoreUtil.generateAlleleKey()` method in favour of `AlleleProtoAdaptor.toAlleleKey()`
- Split `VariantEffectPathogenicityScore.SPLICING_SCORE` into `SPLICE_DONOR_ACCEPTOR_SCORE` and `SPLICE_REGION_SCORE`
- Removed unused `VariantEvaluation.getNumberOfIndividuals()` and `VariantEvaluation.Builder.numIndividuals()`
- `InheritanceModeAnnotator` now requires an Exomiser `Pedigree` as input and no longer takes a Jannovar `de.charite.compbio.jannovar.pedigree.Pedigree`
- Changed `SampleIdentifier` default identifier from 'Sample' to 'sample' to fit existing internal implementation details
- Replaced `Analysis.AnalysisBuilder.pedPath(pedPath)` and `Analysis.getPedPath()` with `Analysis.AnalysisBuilder.pedigree(pedigree)` and `Analysis.getPedigree()`
- Replaced `AnalysisBuilder.pedPath(pedPath)` with `AnalysisBuilder.pedigree(pedigree)`
- Removed obsolete `PedigreeFactory` - this functionality has been split amongst the new Pedigree API classes
- Removed `AnalysisMode.SPARSE` - this was confusing and unused. Unless you need to debug a script, we advise using `AnalysisMode.PASS_ONLY`
- Replaced OutputSettings interface with the concrete implementation
- Replaced `OutputSettings.outputPassVariantsOnly()` with `OutputSettings.outputContributingVariantsOnly()`. This still has the default value of `false`
New APIs:
- Added new jannovar package and faster data serialisation format handled by the `JannovarDataProtoSerialiser` and `JannovarProtoConverter`.
- Added new native `Pedigree` class for representing pedigrees.
- Added new `PedFiles` class for reading PED files into a `Pedigree` object.
- Added new `PedigreeSampleValidator` to check the pedigree, proband and VCF samples are consistent with each other.
- Added `SampleIdentifier.defaultSample()` for use with unspecified single-sample VCF files.
- Added `InheritanceModeOptions.getMaxFreq()` method for retrieving the maximum frequency of all the defined inheritance modes.
- Added new no-args `AnalysisBuilder.addFrequencyFilter()` which uses maximum value from `InheritanceModeOptions`
- Added `Pedigree` support to `AnalysisBuilder`
- Added new `VariantEvaluation.getSampleGenotypes()` method to map sample names to genotype for that allele
- Added new utility constructors to `SampleGenotype` e.g. `SampleGenotype.het()`, `SampleGenotype.homRef()`
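The idea behind `InheritanceModeOptions.getMaxFreq()` and the no-args frequency filter can be sketched as taking the largest per-mode cut-off and using it as the single filter threshold, so that no variant is discarded that could still be compatible with the most permissive mode. The code below is an illustration under that assumption. `MaxFreqSketch` and its `Mode` enum are hypothetical, the values are taken from the example `inheritanceModes` configuration in these notes, and returning 0.0 for an empty map is a choice made for this sketch only.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical sketch: not the Exomiser InheritanceModeOptions class.
final class MaxFreqSketch {

    enum Mode { AUTOSOMAL_DOMINANT, AUTOSOMAL_RECESSIVE_HOM_ALT, AUTOSOMAL_RECESSIVE_COMP_HET, MITOCHONDRIAL }

    private MaxFreqSketch() {
    }

    /** Returns the largest maximum-frequency cut-off (%) over all defined modes, or 0.0 if none are defined. */
    static double maxFreq(Map<Mode, Double> maxFreqByMode) {
        double max = 0.0;
        for (double freq : maxFreqByMode.values()) {
            max = Math.max(max, freq);
        }
        return max;
    }

    /** Cut-offs (%) copied from the example inheritanceModes configuration. */
    static Map<Mode, Double> exampleOptions() {
        Map<Mode, Double> options = new EnumMap<>(Mode.class);
        options.put(Mode.AUTOSOMAL_DOMINANT, 0.1);
        options.put(Mode.AUTOSOMAL_RECESSIVE_HOM_ALT, 0.1);
        options.put(Mode.AUTOSOMAL_RECESSIVE_COMP_HET, 2.0);
        options.put(Mode.MITOCHONDRIAL, 0.2);
        return options;
    }

    public static void main(String[] args) {
        // A single frequency filter would use this value as its threshold.
        System.out.println(maxFreq(exampleOptions()));
    }
}
```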
Other changes:
- Added support for REMM and CADD in `AlleleProtoAdaptor`
- Added check to remove alleles not called as ALT in proband
- `SampleGenotypes` now calculated for all variants in the `VariantFactory`
- Added support for `frequencyFilter: {}` to `AnalysisParser`
- Updated HTML output to display current SO terms for variant types/consequence
- Various code clean-up changes
- Changed dependency management to use spring-boot-dependencies rather than deprecated Spring Platform
- Updated Spring Boot to version 2.0.4
JSON out, ClinVar data and multi-interval filters
CLI changes:
- Added support for filtering multiple intervals in the `intervalFilter`
  ```yaml
  # single interval
  intervalFilter: {interval: 'chr10:123256200-123256300'},
  # or for multiple intervals:
  intervalFilter: {intervals: ['chr10:123256200-123256300', 'chr10:123256290-123256350']},
  # or using a BED file - NOTE this should be 0-based, Exomiser otherwise uses 1-based coordinates in line with VCF
  intervalFilter: {bed: /full/path/to/bed_file.bed}
  ```
- Added support for ClinVar annotations - available in the 1805 variant data release. These will appear automatically and are reported for information only.
- Added `JSON` output format
  ```yaml
  outputFormats: [HTML, JSON, TSV_GENE, TSV_VARIANT, VCF]
  ```
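The note about BED coordinates in the `intervalFilter` options is easy to get wrong, so here is a small sketch of the conversion. It relies on the standard BED convention of 0-based, half-open intervals and turns one BED line into the 1-based, closed `contig:start-end` form used in the other `intervalFilter` options. `BedIntervalSketch` is a hypothetical helper for this example. How Exomiser's own BED reader handles this internally is not shown here.

```java
// Hypothetical sketch: converts a BED line to a 1-based interval string; not Exomiser code.
final class BedIntervalSketch {

    private BedIntervalSketch() {
    }

    /**
     * BED intervals are 0-based and half-open: [start, end).
     * The equivalent 1-based closed interval is [start + 1, end].
     */
    static String toOneBasedInterval(String bedLine) {
        String[] fields = bedLine.trim().split("\\s+");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 BED fields but got: " + bedLine);
        }
        long zeroBasedStart = Long.parseLong(fields[1]);
        long end = Long.parseLong(fields[2]);
        return fields[0] + ":" + (zeroBasedStart + 1) + "-" + end;
    }

    public static void main(String[] args) {
        System.out.println(toOneBasedInterval("chr10\t123256199\t123256300"));
    }
}
```

Only the start coordinate shifts by one. The end coordinate is unchanged because an exclusive 0-based end and an inclusive 1-based end refer to the same base.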
Core API changes:
- Added new simple `BedFiles` class for reading in `ChromosomalRegion` from an external file.
- Added support for filtering multiple intervals in the `IntervalFilter`
- Added support for parsing multiple intervals in the `AnalysisParser`
- Added new `OutputOption.JSON`
- Added new JsonResultsWriter - JSON results format should be considered as being in a 'beta' state and may or may not change slightly in the future.
- Added support for ClinVar annotations
- Added ClinVar annotations to `HTML` and `JSON` output options
- `TSV_GENE` and `TSV_VARIANT` output formats have been frozen as adding the new datasources will break the format. Use the JSON output for machines or HTML for humans.
- Updated Spring platform to Brussels-SR9. This will be the final Exomiser release on the Brussels release train.
10.0.1
Tiny maintenance release.
- Updated HTSJDK library to fix `TribbleException` being thrown when trying to parse bgzipped VCF files
Multiple inheritance modes, smaller, faster, leaner, better
CLI changes:
- Deprecated extended cli options as these were less capable than the analysis file. Options are now `--analysis` or `--analysis-batch` only. See the `.yml` files in the `examples` directory for recommended scripts.
- Exomiser can now analyse samples against multiple inheritance modes in one run using the new `inheritanceModes` field. This also allows variants to be considered under a model with a maximum frequency (%) cut-off. See example `.yml` files for more details.
  ```yaml
  inheritanceModes: {
      AUTOSOMAL_DOMINANT: 0.1,
      AUTOSOMAL_RECESSIVE_HOM_ALT: 0.1,
      AUTOSOMAL_RECESSIVE_COMP_HET: 2.0,
      X_DOMINANT: 0.1,
      X_RECESSIVE_HOM_ALT: 0.1,
      X_RECESSIVE_COMP_HET: 2.0,
      MITOCHONDRIAL: 0.2
  }
  ```
- The old `modeOfInheritance` option will still work, although it will only run with default frequency cut-offs and may be removed in a later release, so please update your analyses.
- The new `1802_phenotype` data release will not work on older exomiser versions as the PPI data is now shipped in a much more efficient storage format. This reduces the startup time to zero and reduces the memory footprint by approx 1 GB. We highly recommend you update older releases to the latest version in order to benefit from more recent phenotype data.
- Default variant scores for `FRAMESHIFT`, `NONSENSE`, `SPLICING`, `STOPLOSS` and `STARTLOSS` have been increased from 0.95 to the maximum score of 1.0 to reflect clinical interpretation of these variant consequences.
Core changes:
API breaking changes:
- Removed previously deprecated `Settings` and `SettingsParser` classes - these were only used by the cli, which was also removed.
- Removed unused `PrioritiserSettings` and `PrioritiserSettingsImpl` classes - these were only used by the `SettingsParser`
- Removed unused `PrioritiserFactory.makePrioritiser(PrioritiserSettings settings)` method - this was only used by the `SettingsParser`
- Removed unused `PrioritiserFactory.getHpoIdsForDiseaseId(String diseaseId)` method. This duplicated/called `PriorityService.getHpoIdsForDiseaseId(String diseaseId)`
- Renamed `VariantTypePathogenicityScore` to `VariantEffectPathogenicityScore`
- Method names of `Inheritable` have changed from `InheritanceModes` to `CompatibleInheritanceModes` to better describe their function.
- Replaced `SampleNameChecker` with new `SampleIdentifierUtil`
- Changed signature of `InheritanceModeAnalyser` to require an `InheritanceModeAnnotator`. This is now using Exomiser and Jannovar-native calls to analyse inheritance modes instead of the Jannovar mendel-bridge.
- Changed `GeneScorer.scoreGene()` signature from `Consumer<Gene>` to `Function<Gene, List<GeneScore>>` to allow scoring of multiple inheritance modes in one run.
- Changed `Analysis` and `AnalysisBuilder` method `modeOfInheritance` to `inheritanceModes(InheritanceModeOptions inheritanceModeOptions)`
- Removed unused methods on `AnalysisResults`
- Renamed `OMIMPriority` to `OmimPriority`
- Renamed `OMIMPriorityResult` to `OmimPriorityResult`
- Changed `OmimPriorityResult` constructor to require `Map<ModeOfInheritance, Double> scoresByMode`, `getScoresByMode()` and `getScoreForMode(modeOfInheritance)` methods
- Changed `DataMatrix` from a concrete class to an interface
- Changed `ResultsWriter` signatures to require a `ModeOfInheritance` to write results out for.
- Changed `ResultsWriterUtils` to require a specific `ModeOfInheritance`
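The move from `Consumer<Gene>` to `Function<Gene, List<GeneScore>>` in the list above can be pictured with a self-contained sketch. A consumer can only mutate the gene it is given, whereas a function can return one result per inheritance mode. The types `SimpleGene`, `SimpleGeneScore` and `Mode` below are hypothetical stand-ins, and the averaging formula is a placeholder rather than the Exomiser scoring method. In a real run the score would usually differ between modes because different variants are compatible with each.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: stand-in types only, not the Exomiser GeneScorer API.
final class GeneScorerSketch {

    enum Mode { AUTOSOMAL_DOMINANT, AUTOSOMAL_RECESSIVE }

    static final class SimpleGene {
        final String symbol;
        final double variantScore;
        final double phenotypeScore;

        SimpleGene(String symbol, double variantScore, double phenotypeScore) {
            this.symbol = symbol;
            this.variantScore = variantScore;
            this.phenotypeScore = phenotypeScore;
        }
    }

    static final class SimpleGeneScore {
        final String symbol;
        final Mode mode;
        final double combinedScore;

        SimpleGeneScore(String symbol, Mode mode, double combinedScore) {
            this.symbol = symbol;
            this.mode = mode;
            this.combinedScore = combinedScore;
        }
    }

    private GeneScorerSketch() {
    }

    /** Returns a scorer producing one score per requested mode, in the order given. */
    static Function<SimpleGene, List<SimpleGeneScore>> scorer(Mode... modes) {
        return gene -> {
            List<SimpleGeneScore> scores = new ArrayList<>();
            for (Mode mode : modes) {
                // Placeholder formula: a plain average, identical for every mode in this sketch.
                double combined = (gene.variantScore + gene.phenotypeScore) / 2.0;
                scores.add(new SimpleGeneScore(gene.symbol, mode, combined));
            }
            return scores;
        };
    }

    public static void main(String[] args) {
        List<SimpleGeneScore> scores = scorer(Mode.AUTOSOMAL_DOMINANT, Mode.AUTOSOMAL_RECESSIVE)
                .apply(new SimpleGene("FGFR2", 1.0, 0.5));
        for (SimpleGeneScore score : scores) {
            System.out.println(score.symbol + " " + score.mode + " " + score.combinedScore);
        }
    }
}
```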
New APIs:
- Added new `AlleleCall` class to represent allele calls for alleles from the VCF file
- Added new `GeneScore` class for holding results from the `GeneScorer`
- Added new `SampleIdentifier` class
- Added new `SampleGenotype` class to represent VCF GenotypeCalls for a sample on a particular allele.
- `GeneIdentifier` now implements `Comparable` and has a static `compare(geneIdentifier1, geneIdentifier2)` method
- `Gene` now contains `GeneScore` having been scored by a `GeneScorer`
- `VariantEvaluation` now has methods to determine its compatibility and whether or not it contributes to the overall score under a particular `ModeOfInheritance`
- Added new `SampleIdentifierUtil` to replace deleted `SampleNameChecker`
- Added new `InheritanceModeAnnotator` and `InheritanceModeOptions`
- Added new `VariantContextSampleGenotypeConverter` to create `SampleGenotype` from a `VariantContext`
- Added new `DataMatrixUtil`, `InMemoryDataMatrix`, `OffHeapDataMatrix`, `StubDataMatrix` implementations
- Added new methods on `DataMatrixIO` to facilitate loading new `DataMatrix` objects from disk.
- Added new `AnalysisResultsWriter` to handle writing out results instead of having to manually specify writers and inheritance modes
Other changes:
- Demoted most logging from `info` to `debug`
- Removed Spring control of Thymeleaf from `ThymeleafConfig` and `HtmlResultsWriter` so this no longer interferes with web templates
Mitochondrial inheritance support
Updated the Jannovar library to 0.24 which now enables filtering for mitochondrial inheritance modes. Thanks to the whole Jannovar team for making this happen.
Multiple assemblies and many datasources
CLI changes:
- Exomiser can now analyse hg19 or hg38 samples - see `application.properties` for setup details.
- Analysis file has new `genomeAssembly:` field - see example `.yml` files. Will default to hg19 if not specified.
- Genomic and phenotypic data are now separated to allow for more frequent and smaller updates - see README.md for details
- Variant alleles are now stored in a new highly-compressed data format enabling much smaller on-disk footprint with minimal loss of read performance.
- New variant frequency data-sets: TOPMed, UK10K, gnomAD - see example `.yml` files.
- New caching mechanism - see `application.properties` for setup details.
Core changes:
- Maven groupId changed from root `org.monarchinitiative` to more specific `org.monarchinitiative.exomiser`.
- New AlleleProto protobuf class used to store allele data in the new MVstore.
- Replaced DefaultPathogenicityDao and DefaultFrequencyDao implementations with MvStoreProto implementations.
- Classes in the genome package are no longer under direct Spring control as the `@Component` and `@Autowired` annotations have been removed to enable user-defined genome assemblies on a per-analysis basis.
- Genome package classes are now configured explicitly in the `exomiser-spring-boot-autoconfigure` module.
- New GenomeAssembly enum
- New GenomeAnalysisServiceProvider class
- New GenomeAnalysisService interface - a facade for providing simplified access to the genome module.
- New VcfFiles utility class for providing access to VCF files with the HTSJDK
- New VariantAnnotator interface
- New JannovarVariantAnnotator and JannovarAnnotationService classes
- VariantFactoryImpl now takes a VariantAnnotator as a constructor argument.
- VariantDataService getRegulatoryFeatures() and getTopologicalDomains() split out into new GenomeDataService
- Deprecated Settings class - this will be removed in the next major version.
- Updated classes in analysis package to enable analyses with user-defined genome assemblies.
Bugfix for intergenic variants in TAD null pointer
See #224 for details.
To update to this release, unzip the distribution and edit the `exomiser-cli-8.0.1/application.properties` to point to the exomiser-cli-8.0.0 data directory, e.g. change
```properties
#root path where data is to be downloaded and worked on
#it is assumed that all the files required by exomiser listed in this properties file
#will be found in the data directory unless specifically overridden here.
exomiser.data-directory=data
```
to
```properties
exomiser.data-directory=/opt/exomiser-cli-8.0.0/data
```