This repo contains example data in various genomics file formats. It is intended to make software testing easier for developers of bioinformatics tools. It includes valid files, edge cases, and invalid files for each format.
Browse the data on 42basepairs: https://42basepairs.com/browse/r2/bio-data-zoo
Download this repo as a zip file: https://github.com/omgenomics/bio-data-zoo/archive/refs/heads/main.zip
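
For automated tests, the zip archive can also be fetched programmatically. The following is a minimal sketch using only the Python standard library; the function names `fetch_zip` and `list_files` are hypothetical helpers made up for this example, and the only value taken from this README is the download URL.

```python
import io
import urllib.request
import zipfile

# Download URL from this README; everything else in this sketch is illustrative.
REPO_ZIP_URL = "https://github.com/omgenomics/bio-data-zoo/archive/refs/heads/main.zip"


def fetch_zip(url: str = REPO_ZIP_URL, timeout: float = 60.0) -> bytes:
    """Download the archive and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_files(zip_bytes: bytes, suffix: str = "") -> list[str]:
    """Return the sorted names of regular files in the archive ending with `suffix`."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            info.filename
            for info in archive.infolist()
            if not info.is_dir() and info.filename.endswith(suffix)
        )
```

For example, `list_files(fetch_zip(), ".bam")` would return the paths of the BAM files in the archive. Keeping the archive in memory avoids writing temporary files, which is usually acceptable for small test data.
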
Format | Extensions |
---|---|
FASTA | .fa, .fa.gz |
FASTQ | .fastq, .fastq.gz |
BAM | .bam, .bam.bai, .bam.csi, .sam, .sam.gz, .sam.gz.csi, .sam.gz.tbi |
VCF | .vcf, .vcf.gz, .vcf.gz.csi, .vcf.gz.tbi, .bcf, .bcf.csi |
BED | .bed, .bed.gz, .bed.gz.csi, .bed.gz.tbi |
CRAM | TODO: .cram, .crai, different CRAM versions |
GFF | TODO: .gff3, .gtf, .gff, .gff.gz, .gff.gz.tbi |
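
A test harness may want to route each file to the right parser based on its extension. The sketch below copies the extension groups from the table above into a dictionary; the function name `format_group` is a hypothetical helper, and the CRAM and GFF rows are omitted because they are still marked TODO.

```python
from typing import Optional

# Extension groups copied from the table above.
# CRAM and GFF are marked TODO in the table, so they are left out here.
EXTENSIONS = {
    "FASTA": [".fa", ".fa.gz"],
    "FASTQ": [".fastq", ".fastq.gz"],
    "BAM": [".bam", ".bam.bai", ".bam.csi", ".sam", ".sam.gz", ".sam.gz.csi", ".sam.gz.tbi"],
    "VCF": [".vcf", ".vcf.gz", ".vcf.gz.csi", ".vcf.gz.tbi", ".bcf", ".bcf.csi"],
    "BED": [".bed", ".bed.gz", ".bed.gz.csi", ".bed.gz.tbi"],
}


def format_group(filename: str) -> Optional[str]:
    """Return the table row (format group) a filename belongs to, or None."""
    name = filename.lower()
    for fmt, extensions in EXTENSIONS.items():
        # str.endswith accepts a tuple of candidate suffixes.
        if name.endswith(tuple(extensions)):
            return fmt
    return None
```

With this mapping, `format_group("basic_R1.fastq")` gives `"FASTQ"`, and index files such as `basic.vcf.gz.tbi` resolve to the same group as the data file they accompany.
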
Path | Source | Preview file | Download file |
---|---|---|---|
basic_R1.fastq | s3://1000genomes | Preview on 42basepairs | Download |
basic.bam | s3://1000genomes | Preview on 42basepairs | Download |
basic_multisample.vcf | s3://human-pangenomics | Preview on 42basepairs | Download |
basic.vcf | s3://human-pangenomics | Preview on 42basepairs | Download |
basic.bed | s3://human-pangenomics | Preview on 42basepairs | Download |
See CONTRIBUTING docs.