Quick start
Several types of seed sequence are possible:
- A single read from the dataset that originates from the organelle genome.
- An organelle sequence derived from the same or a related species.
- A complete organelle sequence of a more distant species (recommended when no closely related sequence is available).
The seed file should be in standard fasta format (first line: >Id_sequence).
Be cautious with seed sequences that are similar in both the mitochondrial and chloroplast genomes.
We observed good results with RUBP sequences as seeds for chloroplast assembly.
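The seed format described above can be checked before a run. The following is a minimal sketch in Python, not part of NOVOPlasty itself; the function name `read_seed_fasta` is hypothetical, and it assumes the seed is the first record in the file.

```python
def read_seed_fasta(text):
    """Return (identifier, sequence) for the first record of a FASTA text.

    Hypothetical helper for illustration; raises ValueError when the text
    does not start with a '>' header or when the record has no sequence.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("seed must start with a '>' header line")
    identifier = lines[0][1:]
    seq_lines = []
    for ln in lines[1:]:
        if ln.startswith(">"):
            break  # only the first record is read in this sketch
        seq_lines.append(ln)
    sequence = "".join(seq_lines).upper()
    if not sequence:
        raise ValueError("seed has a header but no sequence")
    return identifier, sequence
```

To check a seed file, pass its contents to the function, for example `read_seed_fasta(open("Seed.fasta").read())`, where `Seed.fasta` is a placeholder file name.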
You can download the example file (config.txt) and adjust the settings to your liking.
Every parameter of the configuration file is explained in the file.
No further installation is necessary:
perl NOVOPlasty3.0.pl -c config.txt
The input must be Illumina reads in fastq/fasta format, either uncompressed or compressed with gz/bz2.
There is also an Ion Torrent option, but it does not produce the best results.
Provide either two separate files (forward and reverse) or a single merged fastq/fasta file.
Multiple libraries as input are not yet supported.
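Before a run, it can help to confirm that the forward and reverse files contain the same number of reads. The following is a minimal sketch in Python, independent of NOVOPlasty; the helper names `open_reads` and `count_fastq_reads` are hypothetical. It assumes fastq records of exactly four lines and picks the decompressor from the file extension.

```python
import bz2
import gzip


def open_reads(path):
    """Open a read file as text, choosing the opener from the extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


def count_fastq_reads(path):
    """Count reads in a fastq file, assuming four lines per record."""
    with open_reads(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4: " + path)
    return n_lines // 4
```

Comparing `count_fastq_reads` on the forward and reverse files is a quick sanity check that the pairs are complete.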
DO NOT filter or quality trim the reads! Use the raw whole genome dataset (only adapters should be removed).
You can subsample to speed up the process and to reduce the memory requirements. This is also possible by using the max memory option in the config file. However, it is recommended to use as many reads as possible, especially when the organelle genome contains AT-rich stretches.
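When subsampling two separate files by hand, the forward and reverse reads must stay paired. The following is a minimal sketch in Python of one way to do this, not NOVOPlasty's own method; `fastq_records` and `subsample_pairs` are hypothetical names. It assumes four-line fastq records and the same read order in both files.

```python
import random
from itertools import islice, zip_longest


def fastq_records(lines):
    """Yield fastq records as 4-tuples of lines from any iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated fastq record")
        yield record


def subsample_pairs(fwd_lines, rev_lines, fraction, seed=0):
    """Yield (forward, reverse) record pairs, keeping each pair with
    probability `fraction`. The same decision applies to both mates,
    so the output files stay in sync."""
    rng = random.Random(seed)
    pairs = zip_longest(fastq_records(fwd_lines), fastq_records(rev_lines))
    for fwd, rev in pairs:
        if fwd is None or rev is None:
            raise ValueError("forward and reverse inputs differ in read count")
        if rng.random() < fraction:
            yield fwd, rev
```

Open file handles can be passed directly as `fwd_lines` and `rev_lines`; each yielded record can then be written to the matching output file with `writelines`. Random sampling keeps reads from across the whole dataset rather than only the start of the files.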
You can always try different K-mer values. In case of low coverage problems or seed errors, lowering the K-mer is recommended (set it between 21 and 39).