Commit

Merge pull request #29 from EBISPOT/yaml_update
Yaml update
ala-ebi authored Mar 13, 2024
2 parents fe6be8c + 24872c6 commit 585b5a2
Showing 2 changed files with 3 additions and 36 deletions.
@@ -4,6 +4,7 @@ author: Yue Ji and Laura Harris
date: February 22, 2024
description: Having clear and accessible metadata is essential for enhancing data interpretation and ensuring its reusability. In the case of Genome-Wide Association Studies (GWAS), having a standardized and easy-to-understand format for documenting study metadata is crucial. In the GWAS Catalog, metadata associated with full genome-wide summary statistics files is accessible via multiple routes - searchable in the main Catalog via the website and REST API,
slug: streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata
img: meta_yaml.png
---

Having clear and accessible metadata is essential for enhancing data interpretation and ensuring its reusability. In the case of Genome-Wide Association Studies (GWAS), having a standardized and easy-to-understand format for documenting study metadata is crucial. In the GWAS Catalog, metadata associated with full genome-wide summary statistics files is accessible via multiple routes - searchable in the main Catalog via the website and REST API, and additionally via a text file in YAML format, contained in the same directory as the data file.
@@ -28,42 +29,8 @@ These updates reflect our commitment to improving the user experience while ensu


Table 1. Metadata field definitions
| Field | Description | Data type and values | Mandatory | Example |
| -------------------------------- | ---------------------------------------------------------------------------------- | ------------------------------------------------------- | ----------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| \# Study meta-data |
| gwas_id | GWAS Catalog accession ID | Text string | Yes | GCST90244057 |
| author_notes | Additional information about this study from the author | Text string | No | File contains GWAS summary statistics from a meta-analysis of NMR metabolic traits in up to 33 cohorts. |
| gwas_catalog_api                 | GWAS Catalog REST API link                                                         | Text string                                             | Yes                                 | [https://www.ebi.ac.uk/gwas/rest/api/studies/GCST90244057](https://www.ebi.ac.uk/gwas/rest/api/studies/GCST90244057) |
| date_metadata_last_modified      | The latest date that the metadata YAML file was modified                           | Date                                                    | Yes                                 | 2023-11-28                                                                                                           |
| \# Trait Information |
| trait_description | Author reported trait description | Text string (multiple possible) | Yes | Body mass index |
| ontology_mapping | Short form ontology terms describing the trait | Text string (multiple possible) | No | EFO_0004918 |
| \# Genotyping Information |
| genome_assembly | Genome assembly for the summary statistics. | GRCh/NCBI/UCSC value | Yes | GRCh37 |
| coordinate_system                | Coordinate system used for the summary statistics                                  | Text string (1-based or 0-based)                        | No                                  | 1-based                                                                                                              |
| genotyping_technology | Method(s) used to genotype variants in the discovery stage. | Text string (multiple possible) | Yes | Genome-wide genotyping array |
| imputation_panel | Panel used for imputation | Text string | No | HRC + UK10K |
| imputation_software | Software used for imputation | Text string | No | SHAPEIT3 + IMPUTE4 |
| \# Sample Information |
| sample_ancestry_category | Broad ancestry category that best describes the sample. | Text string | Yes | European |
| sample_ancestry | The most detailed ancestry descriptor(s) for the sample. | Text string (multiple possible) | Yes | \- Finnish<br>- British |
| sample_size | Sample size | Integer | Yes | 27006 |
| ancestry_method | Method used to determine sample ancestry e.g. self-reported/genetically determined | Text string (multiple possible) | No | self-reported |
| case_control_study | Flag whether the study is a case-control study | Boolean | No (default is false) | true |
| case_count                       | Number of cases for case/control study                                             | Integer                                                 | No, unless case_control_study is true | 27006                                                                                                                |
| control_count                    | Number of controls for case/control study                                          | Integer                                                 | No, unless case_control_study is true | 27006                                                                                                                |
| sex | To indicate a sex-stratified analysis | M (for male), F (for female), combined or NR if unknown | No | combined |
| \# Summary Statistic information |
| data_file_name | The name of the summary statistics file | Text string | Yes | GCST90244057_buildGRCh37.tsv |
| file_type | The format of the summary statistics file | "GWAS-SSF v1.0", "pre-GWAS-SSF", "non-GWAS-SSF" | Yes | GWAS-SSF v1.0 |
| data_file_md5sum | The md5 checksum of the summary statistics file. | Text string | Yes | 0ec56396f89edcc21a3d5a25a6fa993d |
| analysis_software | Software and version used for the association analysis | Text string (multiple possible) | Yes if p-values of 0 given | REGENIE |
| adjusted_covariates | Any covariates the GWAS is adjusted for | Text string (multiple possible) | No | sex |
| minor_allele_freq_lower_limit | Lowest possible effect allele frequency | Numeric | No | 0.0003 |
| \# Harmonization status |
| is_harmonised | Description of harmonisation codes | Text string | Only given in harmonised datasets | false |
| is_sorted | Flag whether the file is sorted by genomic location | Boolean | Yes | false |
| harmonisation_reference | The genome reference file used for harmonising the summary statistics file | Text string | No | ftp://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/dna/ |
<article-image src="streamlining-GWAS-Summary-Statistics-Data-Interpretation-with-Improved-YAML-Metadata/meta_yaml.png" alt="Metadata field definitions" style='height: 100%; width: 100%'></article-image>
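
The field definitions in Table 1 can be read as a small set of checks on a parsed metadata file. The sketch below is only an illustration, not part of any GWAS Catalog tooling: `validate_metadata`, `MANDATORY_FIELDS` and `FILE_TYPES` are hypothetical names, the metadata is assumed to have already been loaded from YAML into a Python dictionary, and the sample values are taken from the Example column of the table rather than from a single real study.

```python
# Hypothetical helper, assuming the YAML file is already parsed into a dict.
# Field names and rules follow the "Mandatory" column of Table 1.

# Fields marked "Yes" in the Mandatory column.
MANDATORY_FIELDS = (
    "gwas_id", "gwas_catalog_api", "date_metadata_last_modified",
    "trait_description", "genome_assembly", "genotyping_technology",
    "sample_ancestry_category", "sample_ancestry", "sample_size",
    "data_file_name", "file_type", "data_file_md5sum", "is_sorted",
)

# Allowed values listed for file_type in Table 1.
FILE_TYPES = {"GWAS-SSF v1.0", "pre-GWAS-SSF", "non-GWAS-SSF"}


def validate_metadata(meta: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []

    # Unconditionally mandatory fields must be present and non-empty.
    for field in MANDATORY_FIELDS:
        if field not in meta or meta[field] in (None, "", []):
            problems.append(f"missing mandatory field: {field}")

    # file_type must be one of the listed values.
    if "file_type" in meta and meta["file_type"] not in FILE_TYPES:
        problems.append(f"unexpected file_type: {meta['file_type']!r}")

    # case_count and control_count are required only for case-control studies
    # (case_control_study defaults to false when absent).
    if meta.get("case_control_study", False):
        for field in ("case_count", "control_count"):
            if meta.get(field) is None:
                problems.append(f"{field} is required when case_control_study is true")

    return problems


# Illustrative record assembled from the Example column of Table 1.
example = {
    "gwas_id": "GCST90244057",
    "gwas_catalog_api": "https://www.ebi.ac.uk/gwas/rest/api/studies/GCST90244057",
    "date_metadata_last_modified": "2023-11-28",
    "trait_description": ["Body mass index"],
    "genome_assembly": "GRCh37",
    "genotyping_technology": ["Genome-wide genotyping array"],
    "sample_ancestry_category": "European",
    "sample_ancestry": ["Finnish", "British"],
    "sample_size": 27006,
    "data_file_name": "GCST90244057_buildGRCh37.tsv",
    "file_type": "GWAS-SSF v1.0",
    "data_file_md5sum": "0ec56396f89edcc21a3d5a25a6fa993d",
    "is_sorted": False,
}

print(validate_metadata(example))
```

The sketch deliberately leaves out the other conditional rules in the table (for instance, `analysis_software` being required when p-values of 0 are given), since checking those would need the summary statistics file itself.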

## Questions and feedback

Questions or comments about this change? Please contact us at gwas-info@ebi.ac.uk.
