Bystro Annotation Field Description

Italicized fields are custom Bystro fields. All others are sourced as described.

General output information:

Missing data in the annotation is marked by '!'

Multiple values for a single annotated position are separated by ';'

Multiple positions on a single annotation line (occurs with indels only) are separated by '|'

Annotated output data is ordered in the same way as the original file.
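A minimal Python sketch of reading one output cell under these rules. It assumes the cell has already been read as a string from the output file. The helper name `parse_field` is hypothetical, and splitting on '|' before ';' is our reading of the rules above, not a documented Bystro parser.

```python
from typing import List, Optional

MISSING = "!"        # marks missing data
VALUE_SEP = ";"      # separates multiple values at one position
POSITION_SEP = "|"   # separates positions on one line (indels only)


def parse_field(raw: str) -> List[List[Optional[str]]]:
    """Split one annotation cell into positions, then values; '!' becomes None."""
    return [
        [None if value == MISSING else value for value in position.split(VALUE_SEP)]
        for position in raw.split(POSITION_SEP)
    ]


print(parse_field("rs123;rs456"))  # [['rs123', 'rs456']]
print(parse_field("exonic|!"))     # [['exonic'], [None]]
```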

Reserved characters:

"!" ";" "|" "/"
"/" will be used in a future release to denote overlapping data from a single track
- For instance, when two different dbSNP records overlap (which often occurs with indels), or when two refSeq transcripts overlap the same position
- Currently, values at such sites are joined with ";", but this loses information when a track's fields do not have a 1:1 relationship
  - For instance, dbSNP.alleles are in the form Major;Minor1;Minor2, while dbSNP.name may or may not be a single value, regardless of the number of minor alleles
  - When multiple dbSNP rows overlap, we store each field at that position in a 1D array, which loses the relationship between dbSNP.alleles and dbSNP.name
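To illustrate the information loss, here is a small Python sketch with two made-up overlapping dbSNP records (the names rsA and rsB are hypothetical). The "/"-grouped output is only our guess at how a future release might look; the exact format has not been specified.

```python
# Two hypothetical overlapping dbSNP records at one position
records = [
    {"name": "rsA", "alleles": ["C", "T", "G"]},  # major C, minor T and G
    {"name": "rsB", "alleles": ["C", "A"]},       # major C, minor A
]

# Current behaviour: every record's values are concatenated into one ';' list
flat = ";".join(a for rec in records for a in rec["alleles"])

# Hypothetical future behaviour: '/' keeps each record's values grouped
nested = "/".join(";".join(rec["alleles"]) for rec in records)

print(flat)    # C;T;G;C;A   (cannot tell where rsA ends and rsB begins)
print(nested)  # C;T;G/C;A
```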

Input fields

Sourced from the input file, or calculated based on input fields

chrom - chromosome

pos - genomic position

type - the type of variant

VCF format types: SNP, INS, DEL, MULTIALLELIC
SNP format types: SNP, INS, DEL, MULTIALLELIC, DENOVO_*

discordant - does the input file's reference allele differ from Bystro's genome assembly? (1 if yes, 0 otherwise)

trTv - is the site a transition (1), transversion (2), or neither (0)?
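A transition swaps a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T); anything else between two bases is a transversion. A minimal sketch of that rule follows. Returning 0 for indels and non-ACGT bases is our assumption about what "neither" covers, and `tr_tv` is a hypothetical name.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tr_tv(ref: str, alt: str) -> int:
    """Return 1 for a transition, 2 for a transversion, and 0 otherwise.

    "Otherwise" covers anything that is not a single-base A/C/G/T substitution,
    such as indels (an assumption about how non-SNP sites are coded).
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if len(ref) != 1 or len(alt) != 1 or ref == alt or ref not in valid or alt not in valid:
        return 0
    # A transition stays within purines (A<->G) or within pyrimidines (C<->T)
    return 1 if (ref in PURINES) == (alt in PURINES) else 2


print(tr_tv("A", "G"), tr_tv("C", "A"), tr_tv("A", "AT"))  # 1 2 0
```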

alt - the alternate/nonreference allele

VCF multiallelic sites are split into one line per alternate allele

heterozygotes - all samples that are heterozygotes for the alternate allele

homozygotes - all samples that are homozygotes for the alternate allele

missingGenos - all samples that have at least one '.' (VCF) or 'N' (SNP) genotype call.

Note: No samples are dropped

Multiallelic variants are always decomposed into bi-allelic variants on separate lines, and given the type MULTIALLELIC

Heterozygotes/Homozygotes are called based on the number of copies of the alternate allele in each decomposed variant
- For instance, if the variant is pos:1 alt:A,C ref:T and Sample1 is 1/2, then line 1 is pos:1 alt:A ref:T hets:Sample1 and line 2 is pos:1 alt:C ref:T hets:Sample1
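The decomposition above can be sketched in Python as follows. This is an illustration, not Bystro's code. The function name `decompose` is hypothetical, only VCF-style GT strings are handled, and we assume that a sample with any '.' allele is listed in missingGenos only and not also as a het/hom.

```python
import re
from typing import Dict, List


def decompose(pos: int, ref: str, alts: List[str], genotypes: Dict[str, str]) -> List[dict]:
    """Split one multiallelic VCF site into one record per alternate allele.

    genotypes maps sample name -> VCF GT string such as "0/1", "1|2" or "./.".
    """
    calls = {sample: re.split(r"[/|]", gt) for sample, gt in genotypes.items()}
    # Any '.' allele call puts the sample in missingGenos (assumed: and nowhere else)
    missing = sorted(s for s, c in calls.items() if "." in c)
    rows = []
    for allele_index, alt in enumerate(alts, start=1):
        hets, homs = [], []
        for sample, c in calls.items():
            if sample in missing:
                continue
            copies = c.count(str(allele_index))  # copies of this alt allele
            if copies == len(c):
                homs.append(sample)
            elif copies > 0:
                hets.append(sample)
        rows.append({"pos": pos, "ref": ref, "alt": alt,
                     "heterozygotes": sorted(hets), "homozygotes": sorted(homs),
                     "missingGenos": missing})
    return rows


for row in decompose(1, "T", ["A", "C"], {"Sample1": "1/2", "Sample2": "1/1", "Sample3": "./."}):
    print(row["alt"], row["heterozygotes"], row["homozygotes"], row["missingGenos"])
# A ['Sample1'] ['Sample2'] ['Sample3']
# C ['Sample1'] [] ['Sample3']
```

Sample1 (1/2) carries one copy of each alternate allele, so it is a heterozygote on both lines, matching the example above.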

Reference Assembly

Sourced from UCSC

ref - the reference allele

e.g. Human (hg38, hg19), Mouse (mm10, mm9), Fly (dm6), C. elegans (ce11), etc.

refSeq (FAQ)

Sourced from UCSC refGene (schema) and kgXref (schema)

All overlapping RefSeq transcripts are annotated (no prioritization, all possible values are reported)

refSeq.siteType - the effect the alt allele has on this transcript.

Possible types: intronic, exonic, UTR3, UTR5, spliceAcceptor, spliceDonor, ncRNA, intergenic
This is the only field that will have a value when a site is intergenic

refSeq.exonicAlleleFunction - The coding effect of the variant

Possible values: synonymous, nonSynonymous, indel-nonFrameshift, indel-frameshift, stopGain, stopLoss, startLoss

refSeq.refCodon - the codon based on in silico transcription of the reference assembly

refSeq.altCodon - the in silico transcribed codon after modification by the alt allele

refSeq.refAminoAcid - the amino acid based on in silico translation of the transcript

refSeq.altAminoAcid - the in silico translated amino acid after modification by the alt allele

refSeq.codonPosition - the site's position within the codon (1, 2, 3)

refSeq.codonNumber - the codon number within the transcript

refSeq.strand - the strand of the transcript, positive (+) or negative (-)

refSeq.kgID - UCSC's Known Genes ID

refSeq.mRNA - mRNA ID, the transcript ID starting with NM_

refSeq.spID - UniProt protein accession number

refSeq.spDisplayID - UniProt display ID

refSeq.protAcc - NCBI protein accession number

refSeq.description - long form description of the RefSeq transcript

refSeq.rfamAcc - Rfam accession number

refSeq.name - RefSeq transcript ID

refSeq.name2 - RefSeq gene name
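How the codon fields relate to each other can be sketched for a single-base substitution using the standard genetic code. This is a hypothetical helper (`annotate_snv`), not Bystro's implementation. It assumes the coding sequence is already in transcript orientation starting at the start codon, and the precedence among synonymous/stopGain/stopLoss/startLoss is our own choice. Indels (indel-frameshift, indel-nonFrameshift) are not handled.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order ("*" = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def annotate_snv(cds: str, cds_offset: int, alt: str, strand: str = "+") -> dict:
    """Codon-level fields for a single-base substitution in a coding sequence.

    cds: coding sequence in transcript (5'->3') orientation, starting at the start codon
    cds_offset: 0-based offset of the variant within cds
    alt: alternate base as written on the + strand of the genome
    """
    if strand == "-":
        alt = alt.translate(COMPLEMENT)  # express the alt base on the transcript strand
    codon_number = cds_offset // 3 + 1
    codon_position = cds_offset % 3 + 1
    start = (codon_number - 1) * 3
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:codon_position - 1] + alt + ref_codon[codon_position:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "stopGain"
    elif ref_aa == "*":
        effect = "stopLoss"
    elif codon_number == 1 and ref_aa == "M":
        effect = "startLoss"
    else:
        effect = "nonSynonymous"
    return {"codonNumber": codon_number, "codonPosition": codon_position,
            "refCodon": ref_codon, "altCodon": alt_codon,
            "refAminoAcid": ref_aa, "altAminoAcid": alt_aa,
            "exonicAlleleFunction": effect}


print(annotate_snv("ATGGCCTGGTAA", 8, "A")["exonicAlleleFunction"])  # stopGain
```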

refSeq.nearest

The nearest transcript(s), upstream or downstream for every position in the genome

refSeq.nearest.name - the RefSeq transcript ID(s) of the nearest transcript(s)

refSeq.nearest.name2 - the RefSeq gene name(s) of the nearest transcript(s)
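One way to picture "nearest, upstream or downstream" is the Python sketch below. It measures distance to the closer transcript boundary, treats overlap as distance 0, and reports every transcript tied for the minimum. That distance definition is an assumption, since Bystro may measure from a different reference point (for example the transcription start site). The names `nearest` and `distance` and the transcript IDs are hypothetical.

```python
from typing import List, Tuple

# (txStart, txEnd, name); coordinates treated as a closed interval for simplicity
Transcript = Tuple[int, int, str]


def distance(pos: int, start: int, end: int) -> int:
    """0 inside the transcript, otherwise the gap to the closer boundary."""
    if start <= pos <= end:
        return 0
    return start - pos if pos < start else pos - end


def nearest(pos: int, transcripts: List[Transcript]) -> List[str]:
    """Names of all transcripts tied for the smallest distance to pos."""
    dists = [(distance(pos, start, end), name) for start, end, name in transcripts]
    best = min(d for d, _ in dists)
    return [name for d, name in dists if d == best]


tx = [(100, 200, "NM_A"), (300, 400, "NM_B"), (150, 250, "NM_C")]
print(nearest(275, tx))  # ['NM_B', 'NM_C']  (both 25 bp away)
```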

refSeq.clinvar

Alleles found in Clinvar that are larger than 32bp and overlap a refSeq transcript

We report these separately because large alleles are less likely to be relevant to small SNPs and indels

Clinvar variants are reported based on position and do not necessarily correspond to the input file's alleles at the same position
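As we read the size rules on this page, a ClinVar record of up to 32 bp goes to the clinvar track, and a longer one goes to refSeq.clinvar only if it overlaps a refSeq transcript. A minimal sketch of that routing follows. It assumes UCSC-style 0-based, half-open coordinates for chromStart/chromEnd. The function `route_clinvar` is hypothetical.

```python
from typing import List, Optional, Tuple

SIZE_CUTOFF = 32  # bp; from the description on this page


def route_clinvar(chrom_start: int, chrom_end: int,
                  transcripts: List[Tuple[int, int]]) -> Optional[str]:
    """Return which track would report a ClinVar record, or None.

    Coordinates are assumed 0-based, half-open (UCSC style), so length = end - start.
    """
    length = chrom_end - chrom_start
    if length <= SIZE_CUTOFF:
        return "clinvar"
    overlaps = any(tx_start < chrom_end and chrom_start < tx_end
                   for tx_start, tx_end in transcripts)
    return "refSeq.clinvar" if overlaps else None


print(route_clinvar(1000, 1010, [(0, 5000)]))     # clinvar
print(route_clinvar(1000, 1100, [(0, 5000)]))     # refSeq.clinvar
print(route_clinvar(1000, 1100, [(2000, 3000)]))  # None
```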

refSeq.clinvar.alleleID - unique Clinvar identifier

refSeq.clinvar.phenotypeList - associated phenotypes

refSeq.clinvar.clinicalSignificance - designation of significance (e.g. benign, pathogenic) from clinical reports

refSeq.clinvar.type - the variant type (e.g. single nucleotide variant)

refSeq.clinvar.origin - origin tissue for the clinical sample in which the variant was identified (not always provided)

refSeq.clinvar.numberSubmitters - total number of submissions of the Clinvar variant

refSeq.clinvar.reviewStatus - the level of review supporting the variant's interpretation

For example, "reviewed by expert panel"

refSeq.clinvar.chromStart - chromosome start site for the clinvar record

refSeq.clinvar.chromEnd - chromosome end site for the clinvar record

Genome-wide variant scores

Predictions of conservation, evolution, and deleteriousness

phastCons - a conservation score that includes neighboring bases

phyloP - a conservation score that does not include neighboring bases

cadd - a score for the deleteriousness of a variant
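A common downstream step is filtering rows by one of these scores while skipping missing values. The Python sketch below assumes tab-delimited output whose header uses the field names above. It takes the largest value when a cell holds several (';' or '|'), which is our choice, not a Bystro convention. The cutoff of 20 is only an illustrative value and assumes the cadd column is PHRED-scaled. The sample rows are made up.

```python
import csv
import io
import re
from typing import Dict, Iterable, Iterator, Optional


def max_score(cell: str) -> Optional[float]:
    """Largest numeric value in a cell; '!' (missing) entries are ignored."""
    values = [float(v) for v in re.split(r"[;|]", cell) if v != "!"]
    return max(values) if values else None


def filter_by_score(rows: Iterable[Dict[str, str]], field: str = "cadd",
                    threshold: float = 20.0) -> Iterator[Dict[str, str]]:
    """Yield rows whose score in `field` is at least `threshold`."""
    for row in rows:
        score = max_score(row[field])
        if score is not None and score >= threshold:
            yield row


sample = "chrom\tpos\tcadd\nchr1\t100\t25.1\nchr1\t200\t!\nchr1\t300\t3.2;21.0\n"
rows = csv.DictReader(io.StringIO(sample), delimiter="\t")
print([row["pos"] for row in filter_by_score(rows)])  # ['100', '300']
```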

dbSNP (FAQ)

The largest database of genetic variation

dbSNP variants up to 32 bases in length are reported

dbSNP variants are reported based on position and do not necessarily correspond to the input file's alleles at the same position

dbSNP.name - SNP name, usually "rs" followed by a number

dbSNP.strand - strand orientation (+/-)

dbSNP.observed - observed SNP alleles at this position (+/- for indels)

dbSNP.class - variant type; includes single, insertion, and deletion

dbSNP.func - functional category (site type) of the SNP

dbSNP.alleles - SNP alleles in the dbSNP database

dbSNP.alleleNs - the number of sampled chromosomes on which each allele was observed

dbSNP.alleleFreqs - major and minor allele frequencies

Clinvar (FAQ)

Clinically-reported human variants (hg38 and hg19 only)

Clinvar variants up to 32 bases in length are reported

Clinvar variants are reported based on position and do not necessarily correspond to the input file's alleles at the same position

clinvar.alleleID - unique clinvar identifier for a particular variant

clinvar.phenotypeList - list of associated phenotypes for variants at this position, including indels up to 32bp in size

clinvar.clinicalSignificance - designation of significance for a variant (e.g. benign, pathogenic) from a clinical report

clinvar.Type - type of variant (e.g. single nucleotide variant)

clinvar.Origin - origin tissue for clinical sample (not always provided)

clinvar.numberSubmitters - total number of submissions in clinvar overlapping this position, including indels up to 32bp in size

clinvar.reviewStatus - the level of review supporting the variant's interpretation

clinvar.referenceAllele - reference allele for this position in clinvar

clinvar.alternateAllele - alternate allele(s) for this position seen in clinvar
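Because Clinvar records are matched by position, you may want to keep only rows where the input allele actually appears in Clinvar. The Python sketch below is a loose check, assuming output columns named ref, alt, clinvar.referenceAllele and clinvar.alternateAllele. It does not pair each reference allele with its own alternate allele, since that relationship is lost when overlapping records are joined with ";". It is mainly meaningful for SNVs, because indel representations may differ between the input and Clinvar. The helper names are hypothetical.

```python
import re
from typing import Dict, List


def split_values(cell: str) -> List[str]:
    """Split on ';' and '|' and drop missing ('!') entries."""
    return [v for v in re.split(r"[;|]", cell) if v != "!"]


def clinvar_allele_match(row: Dict[str, str]) -> bool:
    """True if the input ref/alt both appear among Clinvar's alleles at this position."""
    return (row["ref"] in split_values(row["clinvar.referenceAllele"])
            and row["alt"] in split_values(row["clinvar.alternateAllele"]))


row = {"ref": "T", "alt": "A",
       "clinvar.referenceAllele": "T;T", "clinvar.alternateAllele": "C;A"}
print(clinvar_allele_match(row))                 # True
print(clinvar_allele_match({**row, "alt": "G"}))  # False
```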
