Bioinformatics Tools (bit)

Overview
- Programs
- Workflows
Conda install
Citation info
Shameless plug

Overview

There are, of course, several great and widely used packages of bioinformatics helper programs out there, such as seqkit, seqtk, fastX-toolkit, and bbtools, all of which I use regularly and which have helped me get things done. But tasks always crop up for which we can't find a helper program or script that has already been written.

bit is a collection of one-liners, short scripts, programs and workflows that I have been adding to over several years. Anytime I need to write something to perform a task that has more than a one-off, ad hoc use, I consider adding it here.

bit runs in a Unix-like environment, and installing it with conda, as shown below, is recommended.

Programs

Some of the helper programs/scripts in bit include:

Program/script	Purpose
`bit-dl-ncbi-assemblies`	download NCBI assemblies in different formats by just providing accessions
`bit-get-accessions-from-GTDB`	search the (stellar) Genome Taxonomy Database by taxonomy and get their NCBI accessions
`bit-summarize-assembly`	quickly summarize nucleotide assemblies
`bit-ez-screen`	quickly search for nucleotide targets in nucleotide input fastas, filtered based on tunable target-coverage and percent ID thresholds, and summarized in a simple table
`bit-summarize-column`	quickly summarize a numeric column
`bit-mutate-seqs`	introduce point mutations (substitutions/indels) in nucleotide or amino acid fasta files
`bit-count-bases-per-seq`	count the number of bases per sequence in a fasta file
`bit-rename-fasta-headers`	rename sequences in a fasta
`bit-parse-fasta-by-headers`	split a fasta file based on headers
`bit-reorder-fasta`	re-order a fasta file
`bit-extract-seqs-by-coords`	pull out sequences from a fasta by their coordinates
`bit-genbank-to-cds-table`	pull out general CDS info into a tsv from a GenBank file
`bit-genbank-to-AA-seqs`, `bit-genbank-to-fasta`	pull amino-acid or nucleotide sequences out of a GenBank file
`bit-calc-variation-in-msa`	calculate variation in each column of a multiple-sequence alignment
`bit-filter-table`	filter a table based on wanted IDs
`bit-get-lineage-from-taxids`	get full lineage info from a list of taxon IDs (making use of the also stellar TaxonKit)
`bit-filter-KOFamScan-results`	filter KOFamScan results
`bit-get-go-term-info`	get information about a specific GO term
`bit-summarize-go-annotations`	summarize GO annotations
`bit-kraken2-to-taxon-summaries`, `bit-combine-kraken2-taxon-summaries`	summarize kraken2 outputs in a table with counts of full taxonomic lineages, and combine multiple samples
`bit-combine-bracken-and-add-lineage`	combine bracken outputs and add full taxonomic lineage info
`bit-gen-iToL-map`, `bit-gen-iToL-colorstrip`, `bit-gen-iToL-text-dataset`, `bit-gen-iToL-binary-dataset`	generate color/mapping/data files for use with trees being viewed on the Interactive Tree of Life site
`bit-figshare-upload`	upload a file to figshare

There are also other convenient things that are nice to have handy, like removing the soft line wraps that some fasta files have (bit-remove-wraps) and printing the column names of a TSV with their numbers (bit-colnames), to quickly see which columns we want to provide to things like cut or awk 🙂
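To make those two tasks concrete, here is a minimal sketch of roughly equivalent operations using plain awk. This is not how bit-remove-wraps or bit-colnames are implemented, and their actual output format may differ; the function names `unwrap_fasta` and `numbered_colnames` are hypothetical and exist only for this illustration.

```shell
# Hypothetical helper: join wrapped sequence lines so each record is
# one header line followed by one sequence line.
unwrap_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }
        { seq = seq $0 }
        END { if (seq != "") print seq }
    ' "$@"
}

# Hypothetical helper: print each column name of a TSV header,
# preceded by its 1-based column number.
numbered_colnames() {
    awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) printf "%d\t%s\n", i, $i; exit }' "$@"
}

# Demo on small inline inputs (both helpers read stdin when given no file).
printf '>seq1\nACGT\nTTGA\n>seq2\nGG\n' | unwrap_fasta
printf 'gene_id\tlength\tcoverage\ng1\t900\t12.5\n' | numbered_colnames
```

The numbered output from the second helper is what makes it quick to pick the right field number for something like `cut -f 3`.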

Each command has a help menu accessible by either entering the command alone or by providing -h as the only argument. Once installed, you can see all available commands by entering bit- and pressing tab twice.
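Tab completion only works interactively. As a rough non-interactive alternative, the sketch below scans the directories on `PATH` for executables whose names start with a given prefix. The function name `list_commands_with_prefix` is hypothetical and not part of bit; it prints nothing if no matching commands are installed.

```shell
# Hypothetical helper: list executables on PATH whose names start with
# the given prefix. Runs in a subshell so the IFS change does not leak.
list_commands_with_prefix() (
    prefix="$1"
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.
        for path in "$dir"/"$prefix"*; do
            if [ -f "$path" ] && [ -x "$path" ]; then
                basename "$path"
            fi
        done
    done | sort -u
)

# With the bit environment active, this should list the bit- commands.
list_commands_with_prefix bit-
```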

Workflows

The snakemake workflows packaged with bit are retrievable with bit-get-workflow and currently include:

Workflow	Purpose
sra-download	downloads SRA reads via prefetch and fasterq-dump, with a helper program for combining run accessions if needed (see here for usage details)
genome-summarize	generates genome assembly stats, quality estimates, and taxonomy info (see here for usage details and overview)
metagenomics	processes short-read metagenomics data via assembly through to merged taxonomy and KO coverage tables, and recovers and characterizes MAGs (see here for usage details and overview)

For greater detail and usage information, see the pages linked above for each workflow.

Note that workflows are versioned independently of the bit package. When you pull one with bit-get-workflow, the directory name includes the workflow version, which is also listed at the top of the Snakefile.

Conda install

If you are new to the wonderful world of conda and want to learn more, one place you can start learning about it is here 🙂

Because bit's dependencies have become more restrictive as it has grown, it's easiest to install it in its own environment as shown below:

conda create -n bit -c astrobiomike -c conda-forge -c bioconda -c defaults bit
conda activate bit

Each command has a help menu accessible by either entering the command alone or by providing -h as the only argument. Once installed, you can see all available commands by entering bit- and pressing tab twice.
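A quick way to confirm the environment is active is to check that a bit command resolves on `PATH` before calling it. This is a minimal sketch; the `require_command` function is hypothetical, while `bit-version` is the command named in this README.

```shell
# Hypothetical helper: return 0 if the named command is on PATH,
# otherwise print a hint to stderr and return 1.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    printf '%s not found on PATH; try: conda activate bit\n' "$1" >&2
    return 1
}

# Only call bit-version if it is actually available.
if require_command bit-version; then
    bit-version
fi
```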

Citation info

If you happen to find bit useful in your work, please be sure to cite it 🙂

Lee M. bit: a multipurpose collection of bioinformatics tools. F1000Research 2022, 11:122. https://doi.org/10.12688/f1000research.79530.1

You can get the version you are using by running bit-version.

If you are using a program in bit that also leverages another program, please be sure to cite that program too. For instance, bit-get-lineage-from-taxids uses TaxonKit, and bit-slim-down-go-terms uses goatools. When a bit program relies on other programs like those, this is indicated in its help menu.

Shameless plug

For phylogenomics, check out GToTree 🙂
