Validation with paired metagenomes

Gavin Douglas edited this page Jun 17, 2019 · 2 revisions

One direct way to test PICRUSt metagenome prediction accuracy in a new environment is to sequence both 16S rRNA genes and shotgun metagenomes from the same samples. The functional profiles from the two methods can then be compared directly to validate the predictions. This is especially useful if you have shotgun metagenomes for a small number of samples and 16S data for additional samples from the same environment: you can gauge prediction performance on the paired subset and corroborate the signals of specific predicted functions of interest.

There are two important considerations when taking this approach:

  • The metagenomes should be deeply sequenced: Because taxa saturate more quickly than genes, substantial metagenomic sequencing depth is needed for a fair comparison against PICRUSt predictions. The original PICRUSt paper found that a subsample of a deeply sequenced metagenome needed ~72,000 raw or ~15,000 annotated metagenomic sequences before it matched the full metagenome better than PICRUSt did (at least in soils). This threshold applies to typical metagenomics processing pipelines and may be changing as shallow metagenomics pipelines become available.

  • Predictions should be compared against a random expectation: Even predictions based on random bacterial genomes correlate highly with real metagenomes, simply because certain gene families are common or rare everywhere. This means that merely knowing a metagenome consists mostly of bacteria goes a long way towards correctly predicting the full metagenome. It is therefore necessary to evaluate prediction performance against a random expectation. You can do this by comparing the metagenomes with predictions based on a scrambled ASV table, or simply with the mean gene family abundances across the reference genomes.
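
The comparison against a random expectation can be sketched as follows. This is a minimal illustration, not part of PICRUSt: the function names (`average_ranks`, `spearman`, `compare_to_null`), the use of Spearman correlation as the accuracy metric, and the toy abundance values are all assumptions made for the example. It covers only the second null described above (the mean gene family abundances across the reference genomes); a scrambled-ASV-table null would require rerunning the prediction itself.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def compare_to_null(observed, predicted, null_profile):
    """Correlate each observed metagenome with its prediction and with the null.

    observed, predicted: dict mapping sample ID -> list of gene family
    abundances, with gene families in the same order in every list.
    null_profile: list of mean gene family abundances across the reference
    genomes, in that same order.
    """
    results = {}
    for sample, obs in observed.items():
        results[sample] = {
            "predicted": spearman(predicted[sample], obs),
            "null": spearman(null_profile, obs),
        }
    return results


# Hypothetical toy data: five gene families, two samples.
observed = {"s1": [10, 0, 5, 3, 8], "s2": [2, 7, 0, 9, 4]}
predicted = {"s1": [9, 1, 4, 2, 7], "s2": [3, 6, 1, 8, 5]}
null_profile = [6, 3, 2, 5, 5]

results = compare_to_null(observed, predicted, null_profile)
for sample, scores in results.items():
    print(sample, round(scores["predicted"], 3), round(scores["null"], 3))
```

A prediction is only informative to the extent that its correlation with the observed metagenome exceeds the null correlation for the same sample, so the difference between the two values is the quantity to report rather than the raw correlation alone.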