1 Introducing PrimerMiner

Vasco Elbrecht edited this page Sep 28, 2016 · 10 revisions

What can PrimerMiner do for you?

PrimerMiner is an R package that batch downloads and processes DNA barcoding sequences from NCBI and BOLD for specified orders or families. It also has powerful visualisation and primer evaluation tools (a good alternative to ecoPCR) for testing newly developed primers and existing primers from the literature in silico.

The problem with common primer development strategies:

For primer development, there are usually two strategies used:

  • Only mitochondrial genomes are considered for primer development, as they are supposedly highly reliable. However, this often limits the sequences available for primer development to fewer than a few hundred, if mitochondrial genomes are available at all.
  • All available gene sequences are downloaded from NCBI, BOLD or a group-specific repository. While this gives more data than mitochondrial genomes alone, the data are often biased towards a few common, overrepresented taxa. Furthermore, relying on a single database does not use the full potential of the available repositories.

Additionally, when generating primers for metabarcoding surveys, often 10 or more Orders have to be considered, and the data have to be manually downloaded and processed for each group.

How PrimerMiner solves these problems

  • PrimerMiner can automatically batch download and process sequences from multiple sources, for as many groups as you want.
  • Target marker sequences can be automatically extracted from mitochondrial genomes.
  • All sequences of each group (typically at order level) are clustered into OTUs, meaning that:
      • Each taxon is represented by only one sequence, or a few in the case of cryptic species, avoiding biases generated by overrepresented taxa in the database.
      • PrimerMiner is taxonomy independent at Genus and species level! As sequences are clustered by similarity, it does not matter if they had the wrong taxonomy assigned, as long as their Family or Order was identified correctly.
  • PrimerMiner takes full advantage of available partial gene sequences and mitochondrial genomes from NCBI, BOLD and your own datasets.
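
To illustrate why clustering removes the bias from overrepresented taxa, here is a minimal Python sketch of greedy similarity clustering. It is not the Vsearch algorithm and not PrimerMiner code: the function names, the toy sequences, the 0.9 identity threshold and the restriction to equal-length, ungapped sequences are all assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length in this sketch")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Put each sequence into the first cluster whose centroid it matches,
    otherwise start a new cluster with the sequence as centroid."""
    clusters = []  # list of (centroid, members) tuples
    for s in seqs:
        for centroid, members in clusters:
            if identity(s, centroid) >= threshold:
                members.append(s)
                break
        else:
            clusters.append((s, [s]))
    return clusters


# Toy data (hypothetical): one taxon sequenced four times, another only once.
common = "ACGTACGTACGTACGTACGT"
rare = "ACGTTTGTACGAACGTACCT"
seqs = [common] * 4 + [rare]

clusters = greedy_cluster(seqs, threshold=0.9)
representatives = [centroid for centroid, _ in clusters]
print(len(seqs), "sequences ->", len(representatives), "representatives")
```

After clustering, the taxon that was sequenced four times and the taxon that was sequenced once each contribute a single representative, so neither dominates the alignment used for primer design.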

Why PrimerMiner is superior to ecoPCR for primer evaluation

The sequence alignments built from the OTUs generated with PrimerMiner are ideal for evaluating primers from the literature. Software such as ecoPCR is available for this, but it makes a fundamental mistake: the position and type of mismatches are not considered. Additionally, ecoPCR requires complicated input files including taxonomy, while PrimerMiner requires only a sequence alignment in FASTA format.

With evaluate_primer(), primers can be evaluated in silico while taking the following into account:

  • The position of the mismatch on the primer, using custom penalties (mismatches at the 3' end typically have stronger negative effects, and thus higher penalties).
  • The type of mismatch, which also matters and can be customised for each base combination.
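
The idea of weighting mismatches by position and type can be sketched in a few lines of Python. This is a conceptual illustration only, not the scoring used by evaluate_primer(): the penalty values, the linear decay from the 3' end, the transition/transversion split and all function names are assumptions, and only the bases A, C, G and T are handled (no degenerate bases or gaps).

```python
# Hypothetical penalty settings; these are NOT PrimerMiner's defaults.
TYPE_PENALTY = {"transition": 1.0, "transversion": 2.0}
PURINES = {"A", "G"}


def mismatch_type(p, t):
    """Transition if both bases are purines or both are pyrimidines,
    otherwise transversion. Assumes p != t and both are in ACGT."""
    return "transition" if (p in PURINES) == (t in PURINES) else "transversion"


def position_weight(dist_from_3prime, max_weight=5.0, decay=1.0, floor=1.0):
    """Highest weight at the 3' end, dropping by `decay` per base to `floor`."""
    return max(floor, max_weight - decay * dist_from_3prime)


def score_primer(primer, site):
    """Sum the penalties of all mismatches between a primer and its binding
    site. Both strings are written 5'->3' in the same orientation."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    total = 0.0
    for i, (p, t) in enumerate(zip(primer, site)):
        if p == t:
            continue
        dist = len(primer) - 1 - i  # 0 means the 3'-terminal base
        total += position_weight(dist) * TYPE_PENALTY[mismatch_type(p, t)]
    return total


primer = "ACGTACGTAC"
for site in ("ACGTACGTAC", "GCGTACGTAC", "ACGTACGTAT", "ACGTACGTAA"):
    print(site, score_primer(primer, site))
```

With these assumed settings, a single mismatch at the 5' end costs far less than the same kind of mismatch at the 3' end, and a transversion at the 3' end costs the most.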

evaluate_primer() gives you a table for each primer, showing the mismatches for each individual sequence in the alignment, including a total penalty score for each sequence. This way you could, for example, consider each sequence / primer combination with a penalty score above 50 as not amplifying. With the function combine_2_primers() you can combine the scores of a forward and a reverse primer to determine which sequences might not amplify, based on a defined threshold for both primers.

The PrimerMiner primer evaluation tools allow for full customisability and transparency, while focusing on what really matters in amplification: where and what kind of mismatches occur between primer and template!
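
As a rough illustration of applying a threshold to a primer pair, the Python sketch below flags a sequence as likely to fail when either primer exceeds the cut-off. This is one possible reading of "a defined threshold for both primers" and is not the documented behaviour of combine_2_primers(); the function name, the OTU identifiers and the penalty scores are made up for the example.

```python
def predict_amplification(fwd_scores, rev_scores, threshold=50):
    """Return {sequence_id: True/False} for sequences scored with both primers.

    Assumption: a sequence is predicted to amplify only if neither the
    forward nor the reverse penalty score is above the threshold.
    """
    shared_ids = sorted(fwd_scores.keys() & rev_scores.keys())
    return {
        seq_id: fwd_scores[seq_id] <= threshold and rev_scores[seq_id] <= threshold
        for seq_id in shared_ids
    }


# Hypothetical penalty scores per OTU for a forward and a reverse primer.
fwd = {"OTU_1": 0.0, "OTU_2": 12.5, "OTU_3": 75.0}
rev = {"OTU_1": 4.0, "OTU_2": 60.0, "OTU_3": 3.0}

calls = predict_amplification(fwd, rev)
for seq_id, ok in calls.items():
    print(seq_id, "amplifies" if ok else "likely fails")
```

Other ways of combining the two scores (for example summing them before applying the threshold) would be equally easy to sketch; the point is only that the per-sequence scores make the decision rule explicit and adjustable.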

How PrimerMiner works

  • Downloads sequences of a specific marker from NCBI and BOLD using batch_download().
  • Extracts marker sequences from partial and complete mitochondrial genomes.
  • Several Orders and / or target Families can be specified.
  • For each group, the downloaded data is clustered into OTUs using Vsearch, and the consensus sequences are saved.
  • OTUs have to be aligned or mapped to a reference sequence (e.g. a mitochondrial COI consensus), and all gaps in the alignment removed. We recommend using Geneious for this.
  • The aligned sequences can be visualised for several groups using plot_alignments().
  • Using an alignment of sequences, primers can be evaluated with evaluate_primer() based on mismatches (not yet implemented!).

System Requirements and limitations

  • PrimerMiner uses Vsearch for sequence clustering and thus only works on Unix-like systems. The package comes bundled with Vsearch and was tested on Mac OS X (it should also work on Linux-based systems, but this was not tested).
  • Does not work on Windows-based operating systems!
  • Requires a (fast) internet connection and an installation of R.
  • PrimerMiner might not download data for groups at a finer taxonomic level than Family (e.g. genera), as genus names can match several groups and PrimerMiner then does not know which one to download.