Conceptually, the utility takes a VCF as input and outputs a set of HL7 V2 OBX observations. These OBX observations can be then incorporated into an overarching HL7 V2 ORU message.
Where clinical annotations are supplied, the software will add them to variant observations. The following annotations are supported:
-
gene (code^symbol^codeSystem): Gene containing the annotated variant. Default value is 'HGNC:0000^NoGene^HGNC'.
-
transcriptRefSeq (STRING): Valid transcript reference sequence identifier (e.g. 'NM_001354609.2')
-
cHGVS (STRING): Valid c.HGVS expression, minus the reference sequence prefix (e.g. 'c.1799_1800delinsAA')
-
proteinRefSeq (STRING): Valid protein reference sequence identifier (e.g. 'NP_001341538.1')
-
pHGVS (STRING): Valid p.HGVS expression, minus the reference sequence prefix (e.g. 'p.V600E')
-
clinSig (STRING): An indication of the clinical significance of the variant. Suggested values are those used by ClinVar, listed here.
-
phenotype (code^symbol^codeSystem): The (coded) condition or phenotype associated with this variant.
Annotations are supplied in a tab-delimited file. Columns 1-4 are CHROM, POS, REF, ALT, and must match a row in the VCF. Columns 5-11 are gene, transcriptRefSeq, cHGVS, proteinRefSeq, pHGVS, clinSig, phenotype. All columns must be present. Columns 1-4 must be populated. Columns 5-11 can contain nulls.
Sample annotation file:
CHROM | POS | REF | ALT | gene | transcriptRefSeq | cHGVS | proteinRefSeq | pHGVS | clinSig | phenotype |
---|---|---|---|---|---|---|---|---|---|---|
chr5 | 112841059 | T | A | HGNC:583^APC^HGNC | NM_001127510.3 | c.5465T>A | p.Val1822Asp | Benign | 72900001^Familial multiple polyposis syndrome^SCT | |
chr11 | 47348490 | T | C | HGNC:7551^MYBPC3^HGNC | NM_000256.3 | c.706A>G | p.Ser236Gly | Likely benign | 35728003^Familial cardiomyopathy^SCT | |
chr11 | 64804546 | T | C | HGNC:7010^MEN1^HGNC | NM_130800.2 | c.1516A>G | p.Thr506Ala | Benign | 30664006^Multiple endocrine neoplasia, type 1^SCT | |
chr13 | 32355250 | T | C | HGNC:1101^BRCA2^HGNC | NM_000059.4 | c.7397T>C | p.Val2466Ala | Benign | 718220008^Hereditary breast and ovarian cancer syndrome^SCT | |
chr19 | 38499670 | C | T | HGNC:10483^RYR1^HGNC | NM_001042723.2 | c.7063C>T | p.Arg2355Trp | Pathogenic | 405501007^Malignant hyperthermia^SCT |
Variants in the VCF file are intersected against an optional file listing variants to convert and a file listing clinical annotations. The following table summarizes the scope of VCF records converted based on these regions.
Conversion Region | Studied Region | Annotations | Output |
---|---|---|---|
Not Supplied | Not Supplied | Not Supplied |
|
Not Supplied | Not Supplied | Supplied |
|
Not Supplied | Supplied | Not Supplied |
|
Not Supplied | Supplied | Supplied |
|
Supplied | Not Supplied | Not Supplied |
|
Supplied | Not Supplied | Supplied |
|
Supplied | Supplied | Not Supplied |
|
Supplied | Supplied | Supplied |
|
The following VCF rows are excluded from conversion:
-
VCF REF is not a simple character string
-
VCF ALT is not a simple character string, comma-separated character string, or '.'.
-
VCF FILTER does not equal 'PASS' or '.'.
-
VCF INFO.SVTYPE is present. (Structural variants are excluded).
-
VCF FORMAT.GT is null ('./.', '.|.', '.', etc).
- For each region studied, create a group of OBXs, grouped by SubID (1st group is 1a, second group is 1b, etc). (See HL7 Lab Results Interface Implementation Guide Section 5.4 for more details on nesting representation).
- Within a group, include these OBXs
- [1..1] 48013-7^Genomic reference sequence^LN (SubID=1a)
- [1..*] 51959-5^Range(s) of DNA sequence examined^LN (SubID=1a.a, 1a.b, 1a.c, …. 1a.aa..1a.az, 1a.ba..1a.bz, etc) (assumes 1-based)
- After groups, include this OBX:
- [1..1] 51968-6^Discrete variation analysis overall interpretation^LN (SubID=1). If any variants, then OBX-5 is 'Positive'. Otherwise, OBX-5 is 'Negative'.
Example:
first group (subID=1a)
OBX|1|ST|48013-7^Genomic reference sequence^LN|1a|NC_000005.9|
OBX|2|NR|51959-5^Range(s) of DNA sequence examined^LN|1a.a|112043201^112181936|
second group (subID=1b)
OBX|3|ST|48013-7^Genomic reference sequence^LN|1b|NC_000017.10|
OBX|4|NR|51959-5^Range(s) of DNA sequence examined^LN|1b.a|41196311^41277500|
OBX|5|NR|51959-5^Range(s) of DNA sequence examined^LN|1b.b|41550000^41600000|
third group (subID=1c)
OBX|6|ST|48013-7^Genomic reference sequence^LN|1c|NC_000019.9|
OBX|7|NR|51959-5^Range(s) of DNA sequence examined^LN|1c.a|38924339^39078204|
OBX|8|TX|51968-6^Discrete variation analysis overall interpretation^LN|1|Positive|
- For each variant, create a group of OBXs, grouped by SubID (1st group is 2a, second group is 2b, etc)
- Within a group, include these OBXs:
- [1..1] 83005-9^Variant Category^LN (SubID=2a): 'Simple'
- [1..1] 47998-0^Variant Display Name^LN (SubID=2a): populate with contextual SPDI, built from refSeq (48013-7), start (81254-5), ref allele (69547-8), alt allele (69551-0) as refSeq:start-1:ref:alt
- [1..1] 48018-6^Gene Studied^LN (SubID=2a): Populate with gene from annotation file. If no gene is present, set equal to ''HGNC:0000^NoGene^HGNC".
- [1..1] 48004-6^DNA Change c.HGVS^LN (SubID=2a):
- If both cHGVS and transcriptRefSeq: transcriptRefSeq+':'+cHGVS
- elseIf cHGVS but not transcriptRefSeq: cHGVS
- else: populate with 47998-0 value
- [0..1] 48005-3^Amino Acid Change p.HGVS^LN (SubID=2a): if pHGVS is supplied, populate with transcriptRefSeq+':'+pHGVS
- If both pHGVS and proteinRefSeq: proteinRefSeq+':'+pHGVS
- elseIf pHGVS but not proteinRefSeq: pHGVS
- else: omit OBX
- [1..1] 48013-7^Genomic reference sequence^LN (SubID=2a)
- [1..1] 69547-8^Genomic ref allele^LN (SubID=2a)
- [1..1] 81254-5^Genomic allele start-end^LN (SubID=2a)
- [1..1] 69551-0^Genomic alt allele^LN (SubID=2a)
- [1..1] 48002-0^Genomic Source Class [Type]^LN (SubID=2a):
- if sourceClass = 'germline': LA6683-2^Germline^LN
- elseIf sourceClass = 'somatic': LA6684-0^Somatic^LN
- [1..1] 53037-8^Genetic Sequence Variation Clinical Significance^LN (SubID=2a):
- If clinSig field is populated in annotation file: clinSig
- else: 'not specified'
- [1..1] 69548-6^Genetic Variant Assessment^LN (SubID=2a): 'Present'
- [0..1] 81259-4^Probable Associated Phenotype^LN (SubID=2a): include if phenotype supplied
- If phenotype field is populated in annotation file: phenotype
- else: omit OBX
- [0..1] 81258-6^Allelic frequency^LN (SubID=2a): Equals FORMAT.AD/FORMAT.DP
- If sourceClass = germline:
- [0..1] 53034-5^Allelic state^LN (SubID=2a): based on FORMAT:GT
- LA6703-8^Heteroplasmic^LN
- LA6704 6^Homoplasmic^LN
- LA6705-3^Homozygous^LN
- LA6706-1^Heterozygous^LN
- LA6707-9^Hemizygous^LN
Example:
chr5 112841059 . T A . . . GT 1/1
first group (subID=2a)
OBX|9|ST|83005-9^Variant Category^LN|2a|Simple|
OBX|10|ST|47998-0^Variant Display Name^LN|2a|NC_000005.10:112841058:T:A|
OBX|11|CWE|48018-6^Gene Studied^LN|2a|HGNC:583^APC^HGNC|
OBX|12|ST|48004-6^DNA Change c.HGVS^LN|2a|NM_001127510.3:c.5465T>A|
OBX|13|ST|48005-3^Amino Acid Change p.HGVS^LN|2a|p.Val1822Asp|
OBX|14|ST|48013-7^Genomic reference sequence^LN|2a|NC_000005.10|
OBX|15|ST|69547-8^Genomic ref allele^LN|2a|T|
OBX|16|NR|81254-5^Genomic allele start-end^LN|2a|112122384|
OBX|17|ST|69551-0^Genomic alt allele^LN|2a|G|
OBX|18|ST|48002-0^Genomic Source Class [Type]^LN|2a|LA6683-2^Germline^LN|
OBX|19|ST|53037-8^Genetic Sequence Variation Clinical Significance^LN|2a|Benign|
OBX|20|ST|69548-6^Genetic Variant Assessment^LN|2a|Present|
OBX|21|CWE|81259-4^Probable Associated Phenotype^LN|2a|72900001^Familial multiple polyposis syndrome^SCT|
OBX|22|CNE|53034-5^Allelic state^LN|2a|LA6705-3^Homozygous^LN|
chr19 38499670 . C T . . . GT 0/1
second group (subID=2b)
OBX|23|ST|83005-9^Variant Category^LN|2b|Simple|
OBX|24|ST|47998-0^Variant Display Name^LN|2b|NC_000019.10:38499669:C:T|
OBX|25|ST|48018-6^Gene Studied^LN|2b|HGNC:10483^RYR1^HGNC|
OBX|26|ST|48004-6^DNA Change c.HGVS^LN|2b|NM_001042723.2:c.7063C>T|
OBX|27|ST|48005-3^Amino Acid Change p.HGVS^LN|2b|p.Arg2355Trp|
OBX|28|ST|48013-7^Genomic reference sequence^LN|2b|NC_000019.10|
OBX|29|ST|69547-8^Genomic ref allele^LN|2b|C|
OBX|30|NR|81254-5^Genomic allele start-end^LN|2b|41254965|
OBX|31|ST|69551-0^Genomic alt allele^LN|2b|CT|
OBX|32|ST|48002-0^Genomic Source Class [Type]^LN|2b|LA6683-2^Germline^LN|
OBX|33|ST|53037-8^Genetic Sequence Variation Clinical Significance^LN|2b|Pathogenic|
OBX|34|ST|69548-6^Genetic Variant Assessment^LN|2b|Present|
OBX|35|CWE|81259-4^Probable Associated Phenotype^LN|2b|405501007^Malignant hyperthermia^SCT|
OBX|36|CNE|53034-5^Allelic state^LN|2b|LA6706-1^Heterozygous^LN|