This is a Rust crate (i.e. library) for working with a local copy of the
NCBI Taxonomy database.
The database can be downloaded (either taxdump.zip
or taxdump.tar.gz
) from the
NCBI Taxonomy FTP site and reformatted into a SQLite database
using the taxonomy_util
utility's to_sqlite
subcommand.
Documentation is available at crates.io.
(new in 0.1.1)
A tool to filter a NCBI RefSeq FASTA file so that only the ancestors of a given taxon are retained.
$ taxonomy_filter_refseq --help
taxonomy_filter_refseq 1.0.0
Peter van Heusden <pvh@sanbi.axc.za>
Filter NCBI RefSeq FASTA files by taxonomic lineage
USAGE:
taxonomy_filter_refseq [FLAGS] [OPTIONS] <INPUT_FASTA> <ANCESTOR_NAME> [OUTPUT_FASTA]
FLAGS:
--no_curated Don't accept curated RNAs and proteins (NM_, NR_ and NP_ accessions)
--no_predicted Don't accept computationally predicted RNAs and proteins (XM_, XR_ and XP_ accessions)
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-d, --db <TAXDB_URL> URL for SQLite taxonomy database
ARGS:
<INPUT_FASTA> FASTA file with RefSeq sequences
<ANCESTOR_NAME> Name of ancestor to use as ancestor filter
<OUTPUT_FASTA> Output FASTA filename (or stdout if omitted)
(new in version 0.2.0)
$ taxonomy_filter_fastq --help
taxonomy_filter_fastq 1.0.0
Peter van Heusden <pvh@sanbi.axc.za>
Filter FASTQ files whose reads have been classified by Centrifuge or Kraken2, only retaining reads in taxa descending
from given ancestor
USAGE:
taxonomy_filter_fastq [FLAGS] [OPTIONS] <INPUT_FASTQ>... --ancestor_taxid <ANCESTOR_ID> --tax_report_filename <TAXONOMY_REPORT_FILENAME> <--centrifuge|--kraken2>
FLAGS:
-d, --output_dir Directory to deposited filtered output files in
-C, --centrifuge Filter using report from Centrifuge
-h, --help Prints help information
-K, --kraken2 Filter using report from Kraken2
-V, --version Prints version information
OPTIONS:
-A, --ancestor_taxid <ANCESTOR_ID> Name of ancestor to use as ancestor filter
-d, --db <TAXDB_URL> URL for SQLite taxonomy database
-F, --tax_report_filename <TAXONOMY_REPORT_FILENAME> Output from Kraken2 (default) or Centrifuge
ARGS:
<INPUT_FASTQ>... FASTA file with RefSeq sequences
(new in 1.0.0)
Utilities to convert NCBI taxonomy database files into SQLite database (the input format used in other tools).
taxonomy_util 1.0.0
Peter van Heusden <pvh@sanbi.axc.za>
Utilities for working with the NCBI taxonomy database
USAGE:
taxonomy_util [OPTIONS] [SUBCOMMAND]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
-d, --db <TAXDB_URL> URL for SQLite taxonomy database
SUBCOMMANDS:
common_ancestor_distance find the tree distance to te common ancestor between two taxa
get_id find taxonomy ID for name
get_lineage get lineage for name
get_name find name for taxonomy ID
help Prints this message or the help of the given subcommand(s)
to_sqlite save taxonomy database loaded from files to SQLite database file