`seqspec` is a machine-readable YAML file format to describe the content of molecules in genomic libraries, the structure of reads generated from them, and how those reads are stored in files. It was inspired by, and builds on, the Teichmann Lab Single Cell Genomics Library Structure by Xi Chen.

The structure of a genomic library depends on both the assay and the sequencer (and kit) used to generate the assay-specific construct and bind it to the sequencing adapters. Therefore, a `seqspec` is specific to both a genomics assay and a sequencer.
A list of `seqspec` examples for multiple assays and sequencers can be found on this website. Each `spec.yaml` describes the 5'->3' "Final library structure" for the assay and sequencer and can be extended to include sequencer-specific read annotations. Sequence specification files can be formatted with the `seqspec` command line tool.
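To make the idea of an ordered 5'->3' structure concrete, here is a minimal bash sketch (not part of the `seqspec` tool, and it does not read a `spec.yaml`) that treats one read as an ordered list of regions and derives each region's position with a running sum of lengths. The region names `barcode`, `umi`, and `polyT` and their lengths are hypothetical, chosen only for illustration; a real spec records regions and reads in its own YAML schema.

```bash
#!/usr/bin/env bash
# Illustrative sketch only: a hypothetical read layout, NOT taken from any real spec.yaml.
# Regions are listed 5'->3' as "name:length" pairs.
set -euo pipefail

regions=("barcode:16" "umi:12" "polyT:30")

start=0
for region in "${regions[@]}"; do
  name="${region%%:*}"   # text before the colon: region name
  len="${region##*:}"    # text after the colon: region length in bases
  end=$((start + len))
  # Print 0-based, half-open coordinates of the region within the read
  printf '%s\t%d\t%d\n' "$name" "$start" "$end"
  start=$end
done
```

Running it prints `barcode 0 16`, `umi 16 28`, and `polyT 28 58` (tab-separated): because regions are stored in order, each region's coordinates follow from the lengths of the regions 5' of it.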
The `seqspec` format and tool are described in this publication. If you use `seqspec`, please cite:

Ali Sina Booeshaghi, Xi Chen, Lior Pachter, A machine-readable specification for genomics assays, Bioinformatics, Volume 40, Issue 4, April 2024, btae168.
```bash
# release
pip install seqspec
# development
pip install git+https://github.com/pachterlab/seqspec.git
# verify install
seqspec --help
```
Documentation:
- Learn about `seqspec`: [docs/DOCUMENTATION.md](docs/DOCUMENTATION.md)
- Write a `seqspec` from scratch: [docs/TUTORIAL.md](docs/TUTORIAL.md)
- Write a `seqspec` from a template: [docs/TUTORIAL_FROM_TEMPLATE.md](docs/TUTORIAL_FROM_TEMPLATE.md)
- Contribute a `seqspec`: [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md)
- The `seqspec` specification: [docs/SPECIFICATION.md](docs/SPECIFICATION.md)
- YouTube video that introduces `seqspec`
- Paper that describes `seqspec`