Missing data in the annotation is marked by '!'
Multiple values for a single annotated position are separated by ';'
Multiple positions on a single annotation line (occurs with indels only) are separated by '|'
Annotated output data is ordered in the same way as the original file.
Reserved characters:
- "!" ";" "|" "/"
- "/" will be used in a future release to denote overlapping data from a single track
- For instance if 2 different dbSNP records overlap, which often occurs with indels, or when two refSeq transcripts overlap at the same position
- Currently such sites are compressed to ";", but this loses information when a 1:1 relationship does not exist between a track's fields
- For instance dbSNP.alleles are in the form Major;Minor1;Minor2 and dbSNP.name may or may not be a single value, regardless of # of minor alleles
- When multiple dbSNP rows overlap, we store each field at that position in a 1D array, which loses the relationship between dbSNP.alleles and dbSNP.name
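The delimiter rules above can be sketched as a small parser. This is a minimal illustration rather than part of Bystro: the function name `parse_field` is hypothetical, and it assumes a field is split first on '|' (positions) and then on ';' (values), with '!' standing for a missing value.

```python
def parse_field(raw):
    """Split one annotation field into positions, then values.

    Returns one list per position ('|'-separated); each inner list holds
    the ';'-separated values, with the missing-data marker '!' mapped to None.
    """
    positions = []
    for position in raw.split("|"):
        values = [None if value == "!" else value for value in position.split(";")]
        positions.append(values)
    return positions


# Two values at the first position, missing data at the second.
print(parse_field("rs123;rs456|!"))
```

Because overlapping records are currently also compressed to ';', a parser like this cannot tell "two values from one record" apart from "one value from each of two records".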
chrom - chromosome
pos - genomic position
type - the type of variant
- VCF format types: SNP, INS, DEL, MULTIALLELIC
- SNP format types: SNP, INS, DEL, MULTIALLELIC, DENOVO_*
discordant - does the input file's reference allele differ from Bystro's genome assembly? (1 if yes, 0 otherwise)
trTv - is the site a transition (1), transversion (2), or neither (0)?
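As an illustration of the trTv encoding, the sketch below classifies a single-base substitution. It is not Bystro's implementation: the function name `tr_tv` is hypothetical, and it assumes that anything other than a single-base A/C/G/T substitution (indels, for example) counts as "neither".

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def tr_tv(ref, alt):
    """Return 1 for a transition, 2 for a transversion, 0 for neither."""
    ref, alt = ref.upper(), alt.upper()
    # Assumption: only single-base substitutions between A/C/G/T are classified.
    if len(ref) != 1 or len(alt) != 1 or ref == alt:
        return 0
    if ref not in BASES or alt not in BASES:
        return 0
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    same_class = (ref in PURINES) == (alt in PURINES)
    return 1 if same_class else 2


print(tr_tv("A", "G"), tr_tv("A", "C"), tr_tv("A", "AT"))
```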
alt - the alternate/nonreference allele
- VCF multiallelics are split, one line each
heterozygotes - all samples that are heterozygotes for the alternate allele
homozygotes - all samples that are homozygotes for the alternate allele
missingGenos - all samples that have at least one '.' (VCF) or 'N' (SNP) genotype call.
- Note: No samples are dropped
Multiallelic variants are always decomposed into bi-allelic variants on separate lines, and given the type MULTIALLELIC
- Heterozygotes/Homozygotes are called based on the number of alleles for a given decomposed variant
- For instance, if the variant is pos:1 alt:A,C ref:T and Sample1 is 1/1 on line 1: pos:1 alt:A ref:T hets:Sample1 and on line 2: pos:1 alt:C ref:T hets:Sample1
ref - the reference allele
- e.g. Human (hg38, hg19), Mouse (mm10, mm9), Fly (dm6), C. elegans (ce11), etc.
refSeq (FAQ)
All overlapping RefSeq transcripts are annotated (no prioritization, all possible values are reported)
refSeq.siteType - the effect the alt allele has on this transcript.
- Possible types: intronic, exonic, UTR3, UTR5, spliceAcceptor, spliceDonor, ncRNA, intergenic
- This is the only field that will have a value when a site is intergenic
refSeq.exonicAlleleFunction - The coding effect of the variant
- Possible values: synonymous, nonSynonymous, indel-nonFrameshift, indel-frameshift, stopGain, stopLoss, startLoss
refSeq.refCodon - the codon based on in silico transcription of the reference assembly
refSeq.altCodon - the in silico transcribed codon after modification by the alt allele
refSeq.refAminoAcid - the amino acid based on in silico translation of the transcript
refSeq.altAminoAcid - the in silico translated amino acid after modification by the alt allele
refSeq.codonPosition - the site's position within the codon (1, 2, 3)
refSeq.codonNumber - the codon number within the transcript
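To show how the codon fields relate to one another, here is a minimal sketch for a single-base substitution. It is an illustration under stated assumptions, not Bystro's code: `annotate_codon` is a hypothetical helper, the coding sequence is assumed to be already in transcript orientation, `offset` is a 0-based index into that sequence, codon numbers are assumed to start at 1, and amino acids are written as one-letter codes from the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def annotate_codon(coding_seq, offset, alt):
    """Describe the codon hit by a single-base substitution.

    coding_seq: coding sequence in transcript orientation (assumption)
    offset:     0-based index of the substituted base in coding_seq
    alt:        the alternate base
    """
    codon_number = offset // 3 + 1      # 1-based codon index (assumption)
    codon_position = offset % 3 + 1     # 1, 2 or 3
    start = (codon_number - 1) * 3
    ref_codon = coding_seq[start:start + 3].upper()
    alt_codon = (
        ref_codon[:codon_position - 1] + alt.upper() + ref_codon[codon_position:]
    )
    return {
        "refCodon": ref_codon,
        "altCodon": alt_codon,
        "refAminoAcid": CODON_TABLE[ref_codon],
        "altAminoAcid": CODON_TABLE[alt_codon],
        "codonPosition": codon_position,
        "codonNumber": codon_number,
    }


# Second base of the second codon (GCC -> GAC) changes alanine to aspartate.
print(annotate_codon("ATGGCC", 4, "A"))
```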
refSeq.strand - the strand of the transcript, positive or negative (Watson/Crick)
refSeq.kgID - UCSC's Known Genes ID
refSeq.mRNA - mRNA ID, the transcript ID starting with NM_
refSeq.spID - UniProt protein accession number
refSeq.spDisplayID - UniProt display ID
refSeq.protAcc - NCBI protein accession number
refSeq.description - long form description of the RefSeq transcript
refSeq.rfamAcc - Rfam accession number
refSeq.name - RefSeq transcript ID
refSeq.name2 - RefSeq gene name
refSeq.nearest.name - the RefSeq transcript ID of the nearest transcript(s)
refSeq.nearest.name2 - the RefSeq gene name of the nearest transcript(s)
We report these separately because large alleles are less likely to be relevant to small SNPs and indels
Clinvar variants are reported based on position and do not necessarily correspond to the input file's alleles at the same position
refSeq.clinvar.alleleID - unique Clinvar identifier
refSeq.clinvar.phenotypeList - associated phenotypes
refSeq.clinvar.clinicalSignificance - designation of significance (e.g. benign, pathogenic, etc.) from clinical reports
refSeq.clinvar.type - the variant type (e.g. single nucleotide variant)
refSeq.clinvar.origin - origin tissue for the clinical sample in which the variant was identified (not always provided)
refSeq.clinvar.numberSubmitters - total number of submissions of the Clinvar variant
refSeq.clinvar.reviewStatus - level of interpretation of the variant provided
- Such as "reviewed by expert panel"
refSeq.clinvar.chromStart - chromosome start site for the clinvar record
refSeq.clinvar.chromEnd - chromosome end site for the clinvar record
phastCons - a conservation score that includes neighboring bases
phyloP - a conservation score that does not include neighboring bases
cadd - a score for the deleteriousness of a variant
dbSNP (FAQ)
dbSNP variants up to 32 bases in length are reported
dbSNP variants are reported based on position and do not necessarily correspond to the input file's alleles at the same position
dbSNP.name - SNP name, usually "rs" followed by a number
dbSNP.strand - strand orientation (+/-)
dbSNP.observed - observed SNP alleles at this position (+/- for indels)
dbSNP.class - variant type; includes single, insertion, and deletion
dbSNP.func - site type for the SNP name
dbSNP.alleles - SNP alleles in the dbSNP database
dbSNP.alleleNs - chromosome sample counts
dbSNP.alleleFreqs - major and minor allele frequencies
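When a single dbSNP record covers a site, the allele and frequency fields can be paired up. The sketch below is hypothetical (`pair_alleles` is not a Bystro function) and assumes that both fields are ';'-separated and list their entries in the same order, which the text above does not guarantee. When several dbSNP records overlap, the values are flattened and this pairing is not reliable, so the helper returns None whenever the counts disagree or data is missing.

```python
def pair_alleles(alleles, freqs):
    """Map each dbSNP allele to its frequency, or return None if unpaired.

    Assumes both fields are ';'-separated and ordered identically.
    """
    allele_list = alleles.split(";")
    freq_list = freqs.split(";")
    if "!" in allele_list or "!" in freq_list:
        return None  # missing data in either field
    if len(allele_list) != len(freq_list):
        return None  # no 1:1 relationship between the two fields
    return {allele: float(freq) for allele, freq in zip(allele_list, freq_list)}


print(pair_alleles("C;T", "0.9;0.1"))
```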
Clinvar (FAQ)
Clinvar variants up to 32 bases in length are reported
Clinvar variants are reported based on position and do not necessarily correspond to the input file's alleles at the same position
clinvar.alleleID - unique clinvar identifier for a particular variant
clinvar.phenotypeList - list of associated phenotypes for variants at this position, including indels up to 32bp in size
clinvar.clinicalSignificance - designation of significance for a variant (e.g. benign, pathogenic, etc.) from a clinical report
clinvar.Type - type of variant (e.g. single nucleotide variant)
clinvar.Origin - origin tissue for clinical sample (not always provided)
clinvar.numberSubmitters - total number of submissions in clinvar overlapping this position, including indels up to 32bp in size
clinvar.reviewStatus - level of interpretation of the variant provided
clinvar.referenceAllele - reference allele for this position in clinvar
clinvar.alternateAllele - alternate allele(s) for this position seen in clinvar