Table of Contents
- Bioinformatician
- Bioinformatics
- Bioinformatics miscellany
- Talks
- Online courses
- Workshop
- Comprehensive packages
- General file formats
- bam/sam/tabix/bgzf
- Fasta/q
- GFF/BED/VCF
- Other formats
- Database API
- Data structure
- Models
- Scripts
- Visualization
- Kmer
- Phylogenetic tree
- Taxonomy
- Assembly
- Alignment
- Multiple Alignment
- Mapping
- Bacterial comparative genomics
- Metagenomics
- 16S
- Classifier | removing human reads
- Virome
- ChIP-seq
- Platform
- PCR
- HPC
- Transcriptome
- Variant Calling
- What is a bioinformatician
- Benjamin Franklin Award for Open Access in the Life Sciences
- My Formula as a Bioinformatician
- So you want to be a computational biologist? ☆
- Bioinformatics is not something you are taught, it’s a way of life
- A guide for the lonely bioinformatician
- Bioinformatics Curriculum Guidelines: Toward a Definition of Core Competencies
- Top N Reasons To Do A Ph.D. or Post-Doc in Bioinformatics/Computational Biology
- 101 Questions: a new series of interviews with notable bioinformaticians
- Levels of Bioinformatics Research
- It’s time to reboot bioinformatics education
- An Explosion Of Bioinformatics Careers
- Is going back to the wet lab worth it?
- 5 things I wish I knew when I started getting into bioinformatics
Social media
- Staying Current in Bioinformatics & Genomics: 2017 Edition
- Interesting bioinformatics blogs (2017 edition)
Programming skills
- Linux Command Line for Bioinformatics
- An Introduction to Programming for Bioscientists: A Python-Based Primer
- You can code, too!
- The Phylogeny of Everything, the Origin of Eukaryotes, and the Rules of Taxonomy: Death to Archaea, Bacteria, and Eucarya! Long live Archaebacteria, Eubacteria, Eukaryota, and Prokaryota!
- Crossroads (iii) – a New Direction for Bioinformatics with Twelve Fundamental Problems
- Ten Simple Rules for Effective Computational Research
- Ten Simple Rules for Reproducible Computational Research
- Ten Simple Rules for the Care and Feeding of Scientific Data
- An Introduction To Applied Bioinformatics
- Freedom in bioinformatics
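- Making sense of next-generation sequencing data (part 1): Clean Data
- Notes on high-throughput sequencing data analysis of pathogenic microorganisms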
- What to do with lots of (sequencing) data
- The myths of bioinformatics software
- Good Habit for Bioinformatics Analyst or Scientist
- What Are The Most Common Stupid Mistakes In Bioinformatics?
- Myths about Bioinformatics
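- "Biology students who cannot program can still apply for graduate study in bioinformatics" by 牛登科 (biology students who cannot program can still learn bioinformatics techniques)
- "Can high-throughput sequencing replace PCR?" by 韩建
- "Bioinformatics data analysis and the emperor's new clothes"
- Will personalized medicine lead to more expensive drugs?
- How do high-throughput sequencing companies make money?
- A guide to not dropping out of biology: how to make a living with biology (for those who want to enter the field)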
- https://liulab-dfci.github.io/bioinfo-combio/
- Rosalind (Rosalind is a platform for learning bioinformatics through problem solving)
- Teaching Materials of Langmead-lab
- A Primer for Computational Biology
- Human Genome Variation Lab, teaching materials from our undergrad computational course on human genetic variation
- [python] Biopython
- [golang] Biogo
- [golang] bio - A simple but high-performance bioinformatics package in Go
I recommend optical duplicate removal for all HiSeq platforms, for any kind of project in which you expect high library complexity (such as WGS). By optical duplicate removal, I mean removing duplicates with very close coordinates on the flow cell.
- Duplicates on Illumina
- Remove duplicates from reads: best practices?
bbmap clumpify can remove PCR and optical duplicates.
- Deduplication Improves Cost-Efficiency and Yields of De Novo Assembly and Binning of Shotgun Metagenomes in Microbiome Research
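To make the notion of "very close coordinates on the flow cell" concrete, here is a minimal Python sketch. It is not how clumpify or any other tool listed here is implemented. It assumes Casava 1.8+ style read names (`instrument:run:flowcell:lane:tile:x:y`), treats only identical sequences as duplicate candidates, and uses a hypothetical distance threshold (`max_dist=40`) that is purely illustrative and not a platform recommendation.

```python
from collections import defaultdict
from math import hypot


def parse_illumina_name(name):
    """Return (flowcell, lane, tile, x, y) from a Casava 1.8+ style read name.

    Assumed layout: instrument:run:flowcell:lane:tile:x:y, optionally
    followed by a space and further fields (which are ignored).
    """
    fields = name.lstrip("@").split()[0].split(":")
    return fields[2], fields[3], fields[4], int(fields[5]), int(fields[6])


def find_optical_duplicates(reads, max_dist=40):
    """Return the names of reads flagged as optical duplicates.

    reads: iterable of (name, sequence) pairs.
    A read is flagged when an earlier kept read with the identical sequence
    lies on the same flowcell, lane and tile within max_dist (Euclidean).
    max_dist is an illustrative assumption, not a recommended setting.
    """
    kept = defaultdict(list)  # (flowcell, lane, tile, sequence) -> [(x, y), ...]
    duplicates = []
    for name, seq in reads:
        flowcell, lane, tile, x, y = parse_illumina_name(name)
        key = (flowcell, lane, tile, seq)
        if any(hypot(x - kx, y - ky) <= max_dist for kx, ky in kept[key]):
            duplicates.append(name)
        else:
            kept[key].append((x, y))
    return duplicates


if __name__ == "__main__":
    # Made-up read names for illustration only.
    reads = [
        ("@M1:1:FC1:1:1101:1000:2000", "ACGT"),
        ("@M1:1:FC1:1:1101:1010:2005", "ACGT"),  # same tile, about 11 units away
        ("@M1:1:FC1:1:1101:9000:9000", "ACGT"),  # same tile, far away
        ("@M1:1:FC1:1:1102:1000:2000", "ACGT"),  # different tile
    ]
    print(find_optical_duplicates(reads))
```

Only the second read is flagged. The third and fourth share its sequence but sit far away or on another tile, so they would count as PCR duplicates at most, which this sketch deliberately leaves alone.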
- zindex - Create an index on a compressed text file
- tabix - table file index
- wormtable - Write-once-read-many table for large datasets.
- [python] hts-python - pythonic wrapper for libhts
- [python] htseq - HTSeq is a Python library to facilitate processing and analysis of data from high-throughput sequencing (HTS) experiments. http://www-huber.embl.de/users/anders/HTSeq/
- [golang] biogo/hts
- bamtools - C++ API & command-line toolkit for working with BAM data
- samblaster - a tool to mark duplicates and extract discordant and split reads from sam files.
- [python] pysamstats - A fast Python and command-line utility for extracting simple statistics against genome positions based on sequence alignments from a SAM or BAM file.
- [python] pysam - a python module for reading and manipulating Samfiles. It's a lightweight wrapper of the samtools C-API. Pysam also includes an interface for tabix. Another sam parser: simplesam
- grabix - a wee tool for random access into BGZF files
- [golang] bix - tabix file access with golang using biogo machinery
- mergesam - Automate common sam & bam conversions
- SAMstat - Displaying sequence statistics for next generation sequencing
- seqtk - Toolkit for processing sequences in FASTA/Q formats
- seqkit - A cross-platform and efficient toolkit for FASTA/Q file manipulation http://bioinf.shenwei.me/seqkit
- [python] pyfaidx - efficient pythonic random access to fasta subsequences
- [golang] bio - A lightweight and high-performance bioinformatics package in Go
FASTA index
- [golang] faidx
- [golang] bio/seqio/fai
- bedtools2 - A powerful toolset for genome arithmetic.
- BEDOPS - the fast, highly scalable and easily-parallelizable genome analysis toolkit
- gffcompare - classify, merge, tracking and annotation of GFF files by comparing to a reference annotation GFF
- gffread - GFF/GTF utility providing format conversions, region filtering, FASTA sequence extraction and more
- [python] gffutils - GFF and GTF file manipulation and interconversion
- [python] pybedtools - Python wrapper for Aaron Quinlan's BEDTools
- [golang] irelate - Streaming relation (overlap, distance, KNN) of (any number of) sorted genomic interval sets. #golang
- [golang] vcfgo - a golang library to read, write and manipulate files in the variant call format.
- vcflib - a simple C++ library for parsing and manipulating VCF files, + many command-line utilities
- blast_table2xml - Convert blast m6 format to xml for blast2go
- seqmagick - file format conversion in Biopython in a convenient way
- pyensembl - Python interface to ensembl reference genome metadata (exons, transcripts, etc...)
- kvector - kvector is a small utility for converting motifs to kmer vectors to compare motifs of different lengths
- pomegranate - Graphical models for Python, implemented in Cython for speed.
- oneliners - Useful bash one-liners for bioinformatics.
- cgat - Computational Genomics Analysis Tools
- bcbb - Incubator for useful bioinformatics code, primarily in Python and R http://bcbio.wordpress.com
- jcvi - Python utility libraries on genome assembly, annotation and comparative genomics
- picobio - Miscellaneous Bioinformatics scripts etc mostly in Python
- pydna - Classes and code for representing double stranded DNA and functions for simulating homologous recombination and Gibson assembly.
- BioUtils - Python scripts for miscellaneous bioinformatics stuff.
- sesbio - Bioinformatics scripts for genome analysis
- ngsutils - Tools for next-generation sequencing analysis http://ngsutils.org
- ngsTools - Programs to analyse NGS data for population genetics purposes
- Circleator - Flexible circular visualization of genome-associated data with BioPerl and SVG.
- ComplexHeatmap - make complex heatmaps and self-defined annotation graphics
- dalliance - Interactive web-based genome browser. http://www.biodalliance.org/
- Question: Which program, tool, or strategy do you use to visualize genomic rearrangements?
- DNAplotlib - DNA plotting library for Python
- Circos: Perl package for circular plots, which are well suited for genomic rearrangements.
- J-Circos is a java application for doing interactive work with circos plots.
- ClicO FS: an interactive web-based service of Circos.
- rCircos: R package for circular plots. [last update: 2013]
- OmicCircos: R package for circular plots for omics data. [last update: 2015-04]
- Gviz - Plotting data and annotation information along genomic coordinates
- pyGenomeTracks - python module to plot beautiful and highly customizable genome browser tracks
- karyoploteR - An R/Bioconductor package to plot arbitrary data along the genome
- gggenes - Draw gene arrow maps in ggplot2
- DnaFeaturesViewer - Python library to plot DNA sequence features (e.g. from Genbank files)
- ggbio: R package for visualizing biological data. Has a circular view similar to the previous packages.
- D3 chord diagrams (javascript) can be used to visualize genomic rearrangements. See this plot of migration flows as a similar example.
- Genomatix Transcriptome Viewer: Gene Fusion analyses
- iFUSE: integrated fusion gene explorer
- FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery
- SOAPFuse includes the option to generate figures
- Gremlin
- Seqeyes: A flash tool for visualizing structural variations.
- svviz - a read visualizer to validate structural variants
- samplot - Plot structural variant signals from many BAMs and CRAMs
- Understanding UMAP
- khmer - In-memory nucleotide sequence k-mer counting, filtering, graph traversal and more http://khmer.readthedocs.org/
- Jellyfish
- [R] ggtree - a phylogenetic tree viewer for different types of tree annotations
- [python] ETE tools
- evolview
- NCBI_taxonomy_tree - NCBI taxonomy tree in-memory mapping
- taxiphy - Common repository for scripts to generate trees from taxonomies. Currently includes ITIS, NCBI, and GBIF.
- gtaxon - A fast cross-platform NCBI taxonomy data querying (gi2taxid, taxid2taxon, name2taxid, LCA) tool, with cmd client and REST API server for both local and remote server.
- [R] taxize - A taxonomic toolbelt for R http://ropensci.org/tutorials/taxize.html
- TaxonKit - Cross-platform and Efficient NCBI Taxonomy Toolkit http://bioinf.shenwei.me/taxonkit/
- Bandage - a Bioinformatics Application for Navigating De novo Assembly Graphs Easily
- nucleotid.es - an assembler catalogue
- hpg-aligner - HPG Aligner is an ultrafast and highly sensitive Next-Generation Sequencing (NGS) mapper which supports both DNA and RNA alignment
- AliView - Software for aligning, viewing and editing DNA/amino acid sequences; intuitive, fast and lightweight. Download and website: http://www.ormbunkar.se/aliview
- shotmap - A Shotgun Metagenome Annotation Pipeline
- metagenomeSeq - Statistical analysis for sparse high-throughput sequencing
- mmgenome - Tools for extracting individual genomes from metagenomes
- harvest - suite of core-genome alignment and visualization tools for quickly analyzing thousands of intraspecific microbial genomes.
- PhyloSift - Phylogenetic and taxonomic analysis for genomes and metagenomes
- MetaQuery: Annotation and quantitative analysis of genes in the human gut microbiome
- Microbial Ecology - a discussion and overview of amplicon sequencing and metagenomics
network
- NetCoMi - Network Comparison for Microbial Compositional Data
- taxonomer.iobio - Taxonomer is a kmer-based ultrafast metagenomics tool for assigning taxonomy to sequencing reads from both clinical and environmental samples.
- BMTagger - Best Match Tagger for removing human reads from metagenomics datasets (paper, sop)
- Centrifuge - Classifier for metagenomic sequences
- viral-ngs - Viral genomics analysis pipelines
- Rabix - Portable Bioinformatics Pipelines
- bioboxes - Standards for Interchangeable Bioinformatics Containers
- Anvi’o - an analysis and visualization platform for ‘omics data (introduction)
- find_differential_primers - Scripts to aid the design of differential primers for diagnostic PCR.
- Primer3-py - Primer3-py is a Python-abstracted API for the popular Primer3 library. The intention is to provide a simple and reliable interface for automated oligo analysis and design.
- hpcgo - Helping submit jobs to HPC cluster.
- easy_qsub - Easily submitting PBS jobs with script template. Multiple input files supported.
Ensembl ID -> symbol -> biotype
zcat Homo_sapiens.GRCh38.84.gtf.gz \
| awk '$3=="gene"' \
| perl -ne 'next unless /gene_id "(.+?)".+gene_name "(.+?)".+gene_biotype "(.+?)"/; print "$1\t$2\t$3\n";' \
> Homo_sapiens.GRCh38.84.gtf.gz.ensembl2symbol-biotype.tsv
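For environments without awk or perl, the same mapping can be sketched in Python using only the standard library. This is an illustrative equivalent of the one-liner above, not a tested replacement. The function names `gene_rows` and `write_table` are hypothetical, and the sketch assumes the attribute column lists `gene_id`, `gene_name` and `gene_biotype` in that order, as the Perl pattern does.

```python
import gzip
import re

# Same pattern as the Perl one-liner: gene_id, gene_name and gene_biotype,
# in that order, inside the GTF attribute column.
ATTR_RE = re.compile(
    r'gene_id "(.+?)".+gene_name "(.+?)".+gene_biotype "(.+?)"'
)


def gene_rows(lines):
    """Yield (gene_id, gene_name, gene_biotype) for each 'gene' feature line."""
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # keep only gene features, like awk '$3=="gene"'
        match = ATTR_RE.search(cols[8])
        if match:  # genes lacking any of the three attributes are skipped
            yield match.groups()


def write_table(gtf_gz_path, tsv_path):
    """Write the ID/symbol/biotype table for a gzipped GTF file as TSV."""
    with gzip.open(gtf_gz_path, "rt") as gtf, open(tsv_path, "w") as out:
        for row in gene_rows(gtf):
            out.write("\t".join(row) + "\n")
```

Calling `write_table("Homo_sapiens.GRCh38.84.gtf.gz", "ensembl2symbol-biotype.tsv")` would produce one tab-separated line per gene; the output file name here is only an example.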