Quick Start
CheckM works on a directory of genome bins in FASTA format. By default, CheckM assumes genomes consist of contigs/scaffolds in nucleotide space and that the files to process end with the extension fna. You can specify a different extension with the -x flag. CheckM calls genes internally using prodigal, taking care to identify genes with recoded stop codons. You can call genes externally and provide CheckM with FASTA files containing genes in amino acid space. To specify this, use the --genes flag. Again, you may need to change the extension CheckM looks for (e.g., -x faa).
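Because CheckM selects input files by extension, it can help to confirm that the bin directory contains files with the expected extension before starting a run. The following is a minimal sketch for a POSIX shell. The helper name count_bins is hypothetical and not part of CheckM, and it assumes the extension follows a dot in the file name.

```shell
# count_bins DIR EXT: print the number of regular files in DIR ending in .EXT
# (hypothetical helper for illustration; not part of CheckM)
count_bins() {
    dir=$1
    ext=$2
    n=0
    for f in "$dir"/*."$ext"; do
        # When nothing matches, the glob is left unexpanded, so test each entry.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, count_bins /home/donovan/bins fa prints how many files would be picked up by -x fa under these assumptions.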
CheckM provides a series of commands that support a number of different analyses and workflows. If you are in a rush to get started, the standard workflow for CheckM is:
> checkm lineage_wf <bin folder> <output folder>
For a full list of options, run checkm lineage_wf -h. To speed up processing, use the -t flag to specify the desired number of threads. If you are on a machine with less than 40 GB of memory, use the --reduced_tree flag, which reduces the memory requirement to approximately 14 GB.
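The memory guidance above can be turned into a small pre-flight check. The sketch below assumes a Linux system where total memory is reported in kB on the MemTotal line of /proc/meminfo. The helper name tree_flag is hypothetical, and the 40 GB threshold from the text is interpreted here as 40 * 1024 * 1024 kB.

```shell
# tree_flag MEM_KB: print --reduced_tree when MEM_KB is below 40 GB,
# otherwise print nothing (hypothetical helper; not part of CheckM)
tree_flag() {
    mem_kb=$1
    threshold_kb=$((40 * 1024 * 1024))
    if [ "$mem_kb" -lt "$threshold_kb" ]; then
        echo "--reduced_tree"
    fi
}

# Example use (Linux only): read total memory, then build the command line.
# mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
# checkm lineage_wf $(tree_flag "$mem_kb") -t 8 <bin folder> <output folder>
```

Keeping the threshold test in a function that takes the memory size as an argument makes it easy to check the logic without depending on the machine it runs on.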
After performing a lineage or taxonomy (checkm taxonomy_wf) workflow, you can re-run the qa command to produce a number of different output tables and plots. The qa command requires a marker file to run. This file is produced during the workflow and is stored in the CheckM output directory specified (e.g., lineage.ms for lineage_wf or <taxon name>.ms for taxonomy_wf).
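As an illustration of re-running qa after a lineage workflow, the sketch below checks that the marker file exists before invoking the command. It assumes the qa command takes the marker file followed by the workflow output directory as positional arguments; confirm this with checkm qa -h. The wrapper name run_qa is hypothetical and not part of CheckM.

```shell
# run_qa OUTDIR: re-run qa on a finished lineage_wf output directory
# (hypothetical wrapper; assumes the argument order <marker file> <output folder>)
run_qa() {
    out_dir=$1
    marker_file="$out_dir/lineage.ms"
    if [ ! -f "$marker_file" ]; then
        echo "marker file not found: $marker_file" >&2
        return 1
    fi
    checkm qa "$marker_file" "$out_dir"
}
```

For a taxonomy workflow, the marker file name would instead be <taxon name>.ms, as noted above.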
Assume you have putative genomes in the directory /home/donovan/bins with fa as the file extension and want to store the CheckM results in /home/donovan/checkm. To process these genomes with 8 threads, simply run:
> checkm lineage_wf -t 8 -x fa /home/donovan/bins /home/donovan/checkm
Or, to process files of called genes in amino acid space which have the extension faa, use:
> checkm lineage_wf --genes -t 8 -x faa <bin folder> <output folder>