This repository is for maintaining the GWAS-SSF (GWAS summary statistics format) specification.
Click here for the current specification. Examples are here.
Comments, suggestions and issues are welcome via the issue tracker.
- GWAS summary statistics format (GWAS-SSF) is composed of two files:
  - summary statistics data file (TSV) e.g. 0000123.tsv
  - accompanying metadata file (YAML) e.g. 0000123.tsv-meta.yaml
- The summary statistics data file is a flat file of tab-separated values (TSV), which may be compressed (see schematic), and reports data from a single genome-wide analysis.
- The first line of the summary statistics data file contains the table's column headers.
- The rows after the header store the variant association data.
- Where permitted, a missing value is indicated by 'NA'.
- There is no limit on the number of rows or columns in the table; however, a set of mandatory fields (defined in the spec, see Table 1) must be present, in a defined order.
- A file may contain additional columns beyond the set of mandatory fields.
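The header rules above (tab-separated, mandatory fields first and in order, extra columns allowed) can be checked from the shell before running heavier validation. The function below is a minimal sketch, not the project's validator: check_header is a hypothetical name, the list of mandatory columns is passed in by the caller rather than read from Table 1, and treating files ending in .gz as gzip-compressed is an assumption, since the README does not name a compression format.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that the header of a GWAS-SSF data file
# starts with a caller-supplied, ordered list of column names. The real
# mandatory fields and their order are defined in Table 1 of the spec.
check_header() {
  local file="$1"
  shift
  local header
  # Read only the first line. Assumption: compressed files are gzip (.gz).
  if [[ "$file" == *.gz ]]; then
    header=$(gzip -dc -- "$file" | head -n 1)
  else
    header=$(head -n 1 -- "$file")
  fi
  header=${header%$'\r'}   # tolerate a Windows line ending

  # Split the header on tabs into an array of column names.
  local -a cols
  IFS=$'\t' read -r -a cols <<< "$header"

  # Compare the leading columns with the expected names, position by position.
  local i=0 expected
  for expected in "$@"; do
    if [[ "${cols[i]:-}" != "$expected" ]]; then
      echo "column $((i + 1)): expected '$expected', found '${cols[i]:-<missing>}'" >&2
      return 1
    fi
    i=$((i + 1))
  done
  # Any columns after the expected ones are additional fields and are allowed.
  return 0
}

# Example (column names are illustrative; take the real list from Table 1):
#   check_header 0000123.tsv chromosome base_pair_location effect_allele other_allele
```

Because only the leading columns are compared, a file with additional columns after the mandatory set still passes.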
- The metadata file describes the summary statistics data file, e.g. its filename and md5sum (see the examples).
- The metadata file also describes the GWAS data, e.g. the sample, trait and genome assembly (see the spec).
- The metadata file can be validated against the YAML schema with yamale, e.g.:
yamale -s schema/metadata-yamale-schema.yaml examples/0000123.tsv-meta.yaml
- A pydantic schema is also available for Python projects here.
- Alternatively, you can use datamodel-code-generator to generate the pydantic model from the JSON schema.
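Since the metadata records the data file's md5sum, a quick integrity check before or alongside the yamale validation is to recompute the hash and compare. This is a minimal sketch: verify_md5 is a hypothetical helper, it assumes GNU coreutils md5sum (macOS ships md5 instead), and the metadata key name data_file_md5sum used in the example comment is an assumption, not taken from this README.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): confirm that a data file matches the md5sum
# recorded in its metadata. Assumes GNU coreutils `md5sum` is available.
verify_md5() {
  local file="$1" expected="$2" actual
  # Reading from stdin makes md5sum print "<hash>  -"; keep only the hash.
  actual=$(md5sum < "$file") || return 1
  actual=${actual%% *}
  [[ "$actual" == "$expected" ]]
}

# Example (assumes an unquoted value under a key named data_file_md5sum):
#   expected=$(awk '$1 == "data_file_md5sum:" {print $2}' 0000123.tsv-meta.yaml)
#   verify_md5 0000123.tsv "$expected" && echo "md5sum matches"
```

This only checks file integrity; structural validation of the metadata is still done with the yamale schema.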
To convert the .tex source to PDF, run:
pdflatex gwas-ssf_<version>.tex
This generates the PDF, gwas-ssf_<version>.pdf.