This section evaluates the quality of the bins produced by the binning approaches used in the previous section.
We will use CheckM to check the quality of the bins. Use the lineage workflow (lineage_wf) to place the genome bins into a reference genome tree, identify marker genes, and estimate completeness and contamination. Note that Anvi'o can also produce completeness and redundancy estimates (via anvi-summarize), but its reports are likely to differ from CheckM's because the two tools use different methods and different sets of marker genes.
Run the following command on the cluster, as CheckM has Linux dependencies. These example commands use bins from MetaBAT, but be sure to run these commands on the outputs of each of the binning programs you used in the previous section.
# -f: write results to a file instead of stdout (the default)
# -t: number of threads to use
# -x: file extension for the files which contain bins. in this example,
# the bins would be: bin1.fa, bin2.fa, etc.
checkm lineage_wf -f checkm_file.txt -t 4 -x fa metabat_bins checkm/
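Because the same command has to be repeated for each binner's output, a small loop can save typing. The following is a minimal sketch, assuming one directory of bins per binner. The directory name maxbin_bins and the per-binner output names (checkm_<dir>.txt, checkm_<dir>/) are hypothetical and not part of the original commands. By default it only prints the commands so they can be reviewed; set DRY_RUN=0 to actually run them.

```shell
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Build (and optionally run) the lineage_wf command for one directory of bins.
run_checkm_for() {
    bins_dir="${1%/}"          # drop a trailing slash, if any
    name="${bins_dir##*/}"     # last path component, used to name the outputs
    set -- checkm lineage_wf -f "checkm_${name}.txt" -t 4 -x fa "$bins_dir" "checkm_${name}/"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical directory names: replace with the outputs of the binners you ran.
for d in metabat_bins maxbin_bins; do
    run_checkm_for "$d"
done
```

Writing each binner's results to its own file and output directory keeps one run from overwriting another.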
Use CheckM to plot bin quality. Specifically, use bin_qa_plot for a visual representation of completeness, contamination, and strain heterogeneity. See the CheckM documentation for a description of the plots.
checkm bin_qa_plot -x fa checkm metabat_bins plots
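Alongside the plots, the completeness and contamination estimates can be screened numerically. The sketch below assumes the lineage_wf results were written as a tab-separated table (for example by adding --tab_table to the lineage_wf command) and that the header row contains columns named exactly "Completeness" and "Contamination", with the bin ID in the first column. Verify these assumptions against your own file. The function name filter_bins and the default cutoffs (at least 90% complete, at most 5% contaminated) are illustrative choices, not values from this tutorial.

```shell
# filter_bins FILE [MIN_COMPLETENESS] [MAX_CONTAMINATION]
# Prints the first column (bin ID) of every row that passes both cutoffs.
filter_bins() {
    awk -F '\t' -v min_comp="${2:-90}" -v max_cont="${3:-5}" '
        NR == 1 {
            # Locate the two columns by header name rather than by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "Completeness")  comp = i
                if ($i == "Contamination") cont = i
            }
            if (!comp || !cont) {
                print "expected Completeness and Contamination columns" > "/dev/stderr"
                exit 1
            }
            next
        }
        ($comp + 0) >= min_comp && ($cont + 0) <= max_cont { print $1 }
    ' "$1"
}
```

For example, filter_bins checkm_file.tsv 90 5 would list the bins meeting both thresholds, where checkm_file.tsv is a hypothetical name for the tab-separated results file.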
Use the CheckM utility command coverage to get coverage profiles for all sequences within the genome bins created. Coverage profiles are also required for a number of the plots produced by CheckM.
checkm coverage -t 5 -x fa metabat_bins coverage.tsv example_1.bam example_2.bam
Note that in the above example, you can supply a wildcard instead of listing each BAM file you want to pass to CheckM. For example: path/to/bams/*.bam.
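One caveat with the wildcard: if it matches nothing, the shell passes the literal pattern through to the command. The following sketch checks that the pattern matches at least one file before checkm coverage is run. The helper name list_bams is hypothetical, and path/to/bams is the placeholder path from the note above.

```shell
# Print each BAM file in a directory, one per line.
# Returns non-zero if the directory contains no *.bam files.
list_bams() {
    found=0
    for f in "$1"/*.bam; do
        # An unmatched glob stays as the literal pattern, so confirm the file exists.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
        found=1
    done
    [ "$found" -eq 1 ]
}

bam_dir="path/to/bams"   # placeholder: replace with your BAM directory
if list_bams "$bam_dir" > /dev/null; then
    echo "BAM files found; $bam_dir/*.bam can be passed to checkm coverage"
else
    echo "No BAM files found in $bam_dir" >&2
fi
```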
Use the profile utility to produce a table with bin size, mapped reads, % mapped reads, % binned populations, and % community. The output defaults to stdout, so include the option -f (and --tab_table) to write to a file. The percentages refer to reads mapped to an assembly.
checkm profile coverage.tsv
Proceed to section 8.