Commit

docs(Improve documentation):
Typo and extra information about the classify step.
pchaumeil committed Feb 27, 2023
1 parent 9a89e20 commit 4574ac7
Showing 2 changed files with 2 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -42,10 +42,11 @@ GTDB-Tk v2.2.0+ includes the following new features:
 - **This is now the default behavior for `classify` and `classify_wf`.**
 - In `classify`, user genomes are first compared against a Mash database comprised of all GTDB representative genomes and genome pairs of sufficient similarity processed by FastANI. User genomes classified to a GTDB representative based on FastANI results are not run through pplacer.
 - In the `classify_wf` workflow, genomes are classified using Mash and FastANI before executing the identify step. User genomes classified with FastANI are not run through the remainder of the pipeline (identify, align, classify).
+- `classify_wf` and `classify` have now **an extra mutually exclusive required argument**: You can either pick `--skip_ani_screen` (to skip the ani_screening step to classify genomes using mash and FastANI) or `--mash_db` path to save/read (if exists) the Mash reference sketch database.
 - To classify genomes without the additional `ani_screen` step, use the `--skip_ani_screen` flag.

 ## 📈 Performance
-Using ANI screen "can" reduce computation by >50%, although it depends on the set of input genomes. A set of input genomes consisting primarily of new species will not benefit from ANI screen as much as a set of genomes that are largely assigned to GTDB species clusters. In the latter case, the ANI screen will reduce the number of genomes that need to be classified by pplacer which reduces computation time subsantially (between 25% and 60% in our testing).
+Using ANI screen "can" reduce computation by >50%, although it depends on the set of input genomes. A set of input genomes consisting primarily of new species will not benefit from ANI screen as much as a set of genomes that are largely assigned to GTDB species clusters. In the latter case, the ANI screen will reduce the number of genomes that need to be classified by pplacer which reduces computation time substantially (between 25% and 60% in our testing).

 ## 📚 References

1 change: 0 additions & 1 deletion gtdbtk/main.py
@@ -22,7 +22,6 @@
 import subprocess
 import sys
 import tempfile
-import time
 from datetime import datetime, timedelta
 from typing import Dict, Tuple

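Note on the README change: the added bullet says `classify` and `classify_wf` now require exactly one of `--skip_ani_screen` or `--mash_db`. A minimal sketch of how such a "required, mutually exclusive" constraint is commonly expressed with Python's standard `argparse` follows. It is an illustration only and is not taken from GTDB-Tk's actual parser; the names `build_parser` and `parse_or_none`, the program name, and the help strings are hypothetical.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser sketch; NOT GTDB-Tk's real CLI definition."""
    parser = argparse.ArgumentParser(prog="classify_sketch")
    # required=True forces the caller to supply exactly one option of the group.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument(
        "--skip_ani_screen",
        action="store_true",
        help="skip the Mash/FastANI pre-screening step",
    )
    group.add_argument(
        "--mash_db",
        metavar="PATH",
        help="path used to save or read the Mash reference sketch database",
    )
    return parser


def parse_or_none(argv: List[str]) -> Optional[argparse.Namespace]:
    """Return the parsed namespace, or None if argparse rejects the arguments."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports usage errors by printing to stderr and exiting.
        return None
```

With this sketch, passing neither option or both options is rejected by `argparse`, while passing exactly one of them is accepted.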
