Minor changes to documentation + suggestion of TOC #228

Merged
merged 2 commits into from
Feb 27, 2024
118 changes: 117 additions & 1 deletion CONTRIBUTING.md
@@ -100,7 +100,7 @@ When you're ready to contribute code to address an open issue, please follow the
Our continuous integration (CI) testing runs [a number of checks](https://github.com/comorment/containers/actions) for each pull request on [GitHub Actions](https://github.com/features/actions).
You can run most of these tests locally, which is something you should do *before* opening a PR to help speed up the review process and make it easier for us.

And finally, please update the [CHANGELOG](https://github.com/comorment/containers/blob/main/CHANGELOG.md) with notes on your contribution in the "Unreleased" section at the top.
And finally, please update the [CHANGELOG](CHANGELOG.md) with notes on your contribution in the "Unreleased" section at the top.

After all of the above checks have passed, you can now open [a new GitHub pull request](https://github.com/comorment/containers/pulls).
Make sure you have a clear description of the problem and the solution, and include a link to relevant issues.
@@ -109,3 +109,119 @@ When you're ready to contribute code to address an open issue, please follow the

</details>

## Information for developers

The list of tools included in the different Dockerfiles and installer bash scripts for each container
is provided [here](docker/README.md). Please keep this up to date when pushing new container builds.

### Sphinx

We use Sphinx to generate online documentation from the README.md files of this repository.
This relies on the [MyST](https://myst-parser.readthedocs.io) package to generate links in the documentation.
Here are a few rules that we follow across ``.md`` files to make it work well:

* use full path to the file in this repository

### Folder structure

These folders are relevant to the users:
* ``docs`` folder contains user documentation
* ``usecases`` folder contains extended examples / tutorials
* ``singularity`` folder contains pre-built containers
* ``reference`` folder contains reference data used in use cases
* ``scripts`` folder contains pipelines such as ``gwas.py`` and ``pgs-toolkit``, as well as other helper scripts.

These folders are relevant to developers:
* ``docker`` folder contains several ``Dockerfile`` files (container definitions)
  and relevant shell scripts (in ``docker/scripts/``) used within those Dockerfiles. Unit tests validating the functionality of the resulting containers are available in the ``tests`` folder.
* ``sphinx-docs`` provides scripts used to build sphinx documentation.

### Note about NREC machine

We use an NREC machine to develop and build containers.
The NREC machine has a small local disk (~20 GB) and a larger external volume attached (~400 GB).
If you use the NREC machine, it is important not to store large data or install large software in your home folder, which is located on the small disk;
use the ``/nrec/projects`` space instead:

```
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G 9.6G 9.7G 50% /
/dev/mapper/nrec_extvol-comorment 393G 346G 28G 93% /nrec/projects
/dev/mapper/nrec_extvol_2-comorment_2 935G 609G 279G 69% /nrec/space
```
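Before writing large files, it can help to check how much room is left on the target filesystem. The following is a minimal sketch using only the Python standard library; the helper names ``free_gib`` and ``has_room`` are illustrative and not part of this repository, and the mount points are taken from the listing above.

```python
import shutil


def free_gib(path: str) -> float:
    """Return the free space (in GiB) of the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def has_room(path: str, needed_gib: float) -> bool:
    """Return True if at least `needed_gib` GiB are free under `path`."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    # Mount points from the listing above; they may not exist on other machines.
    for mount in ("/", "/nrec/projects", "/nrec/space"):
        try:
            print(f"{mount}: {free_gib(mount):.1f} GiB free")
        except OSError:
            print(f"{mount}: not available on this machine")
```

For example, ``has_room("/nrec/projects", 10)`` could be checked before pulling a large container image.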

Both docker and singularity were configured to avoid placing cached files on the local file system.
For docker this involves changing ``/etc/docker/daemon.json`` file by adding this:

```
{
"data-root": "/nrec/projects/docker_root"
}
```

(as described at <https://tienbm90.medium.com/how-to-change-docker-root-data-directory-89a39be1a70b>; you may use the ``docker info`` command to check the data-root)
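As a complement to ``docker info``, the configured value can also be read directly from the JSON file. This is a small sketch under the assumption that the file has the layout shown above; ``read_data_root`` is a hypothetical helper, not an existing script in this repository.

```python
import json
from pathlib import Path
from typing import Optional


def read_data_root(config_path: str = "/etc/docker/daemon.json") -> Optional[str]:
    """Return the "data-root" entry of a Docker daemon config, or None.

    None means the file is missing or has no "data-root" key, in which case
    Docker on Linux falls back to its default location (/var/lib/docker).
    """
    path = Path(config_path)
    if not path.is_file():
        return None
    with path.open() as handle:
        config = json.load(handle)
    return config.get("data-root")


if __name__ == "__main__":
    print(read_data_root() or "data-root not set; Docker default is in use")
```

On the NREC machine this would be expected to return ``/nrec/projects/docker_root`` if the configuration above is in place.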

For singularity, the configuration is described at <https://sylabs.io/guides/3.6/user-guide/build_env.html>,
and it was done for the root user by adding the following line to ``/etc/environment``:

```
export SINGULARITY_CACHEDIR="/nrec/projects/singularity_cache"
```

Common software, such as git-lfs, is installed to /nrec/projects/bin.
Therefore it's reasonable for all users of the NREC comorment instance
to add this folder to the path by changing ``~/.bashrc`` and ``~/.bash_profile``.

```
export PATH="/nrec/projects/bin:$PATH"
```

A cloned version of comorment repositories is available here:

```
/nrec/projects/github/comorment/containers
/nrec/projects/github/comorment/reference
```

Feel free to change these folders and use git pull / git push. TBD: currently the folder is cloned as 'ofrei' user - I'm not sure if it will actually work to pull & push. But let's figure this out.

### Testing container builds

Some basic checks for the functionality of the different container builds are provided in ``<containers>/tests/``, implemented in Python.
The tests can be executed using the [Pytest](https://docs.pytest.org) testing framework.

To install Pytest in the current Python environment, issue:

```
pip install pytest # --user optional
```

Alternatively, create a new virtual environment using [conda](https://docs.conda.io/en/latest/index.html):

```
conda create -n pytest python=3 pytest -y # creates env "pytest"
conda activate pytest # activates env "pytest"
```

Then, all checks can be executed by issuing:

```
cd <containers>
py.test -v tests # with verbose output
```

Checks for individual containers (e.g., ``gwas.sif``) can be executed by issuing:

```
py.test -v tests/test_<container-prefix>.py
```

Note that the actual container files (``*.sif`` files) corresponding to the different test scripts must exist in ``<containers>/singularity/``,
not only git LFS pointer files.
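To tell whether a ``.sif`` file is a real container or only a pointer, one can inspect its first bytes: Git LFS pointer files are small text files that begin with a ``version https://git-lfs.github.com/spec/v1`` line. The sketch below relies on that convention; the function names are illustrative and are not part of the test suite in ``tests``.

```python
from pathlib import Path
from typing import List

# First line of every Git LFS pointer file (per the Git LFS pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like a Git LFS pointer rather than real data."""
    with path.open("rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_pointer_files(folder: Path, pattern: str = "*.sif") -> List[Path]:
    """List files in `folder` matching `pattern` that are still LFS pointers."""
    return sorted(
        p for p in folder.glob(pattern) if p.is_file() and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # "singularity" is the container folder named above, relative to the repo root.
    for pointer in find_pointer_files(Path("singularity")):
        print(f"{pointer} is only an LFS pointer; fetch it before running tests")
```

An empty result from ``find_pointer_files`` suggests that the container files in that folder have been downloaded in full.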

### Git clone ignoring LFS

See [stackoverflow.com/questions/42019529/how-to-clone-pull-a-git-repository-ignoring-lfs](https://stackoverflow.com/questions/42019529/how-to-clone-pull-a-git-repository-ignoring-lfs)
```
GIT_LFS_SKIP_SMUDGE=1 git clone git@github.com:comorment/containers.git
```
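The same idea can be scripted, for instance to clone without LFS content and then fetch only one container. This is a hedged sketch: the helper names are hypothetical, the path ``singularity/hello.sif`` is only an example, and the commands are built as lists so they can be inspected before anything is run.

```python
import os
import subprocess
from typing import Dict, List, Tuple

# Repository URL taken from the clone command above.
REPO_URL = "git@github.com:comorment/containers.git"


def clone_without_lfs(target: str, run: bool = False) -> Tuple[List[str], Dict[str, str]]:
    """Build (and optionally run) a clone that leaves LFS files as pointers."""
    # GIT_LFS_SKIP_SMUDGE=1 tells git-lfs not to download file contents on checkout.
    env = dict(os.environ, GIT_LFS_SKIP_SMUDGE="1")
    cmd = ["git", "clone", REPO_URL, target]
    if run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env


def lfs_pull_command(include: str) -> List[str]:
    """Build a `git lfs pull` command restricted to the given path pattern."""
    return ["git", "lfs", "pull", f"--include={include}"]


if __name__ == "__main__":
    cmd, _ = clone_without_lfs("containers")
    print(" ".join(cmd))
    # Example path only; run the pull from inside the cloned repository.
    print(" ".join(lfs_pull_command("singularity/hello.sif")))
```

Passing ``run=True`` would execute the clone; the default only returns the command and environment for inspection.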
9 changes: 7 additions & 2 deletions README.md
@@ -1,6 +1,9 @@
# CoMorMent Containers
# COSGAP: A COntainerized Statistical Genetics Analysis Pipelines

The goal of the [CoMorMent](https://www.comorment.uio.no) containers repository at <https://github.com/comorment/containers> is to distribute tools for GWAS and post-GWAS analysis in CoMorMent project ([comorment.eu](https://comorment.eu)).
The goal of this GitHub repository (<https://github.com/comorment/containers>) is to distribute software tools for statistical genetics analysis, along with their respective reference data and scripts ("analysis pipelines") to facilitate application of these tools. The scope of this project is currently limited to genome-wide association studies (GWAS) and post-GWAS statistical-genetics analyses, including polygenic scoring (PGS). This project builds on earlier work by the [Tryggve consortium](https://neic.no/tryggve/),
with the most recent major development done as part of the CoMorMent EU H2020 project ([comorment.eu](https://comorment.eu)). For more information see our [preprint](https://arxiv.org/abs/2212.14103) manuscript, [this presentation](https://www.youtube.com/watch?v=msegdR2vJZs) at the PGC WWL meeting (Feb 9, 2024), or our online documentation [here](https://comorment-containers.readthedocs.io/en/latest/).

For an overview of available software, see [here](docs/README.md).

Most of these tools are packaged into singularity containers (<https://sylabs.io/singularity/>) and shared in the [singularity](https://github.com/comorment/containers/tree/main/singularity) folder of this repository. You can download individual containers using github's ``Download`` button, or clone the entire repository from command line as described in the [Getting started](#getting-started) section below.

@@ -14,6 +17,8 @@ More extensive use cases of containers, focusing on real data analysis, are prov

The history of changes is available in the [CHANGELOG.md](./CHANGELOG.md) file.

If you would like to contribute to developing these containers, please see the [CONTRIBUTING](CONTRIBUTING.md) file.

Additional tools are available in separate repositories:

* <https://github.com/comorment/ldsc> - LD score regression
29 changes: 0 additions & 29 deletions docker/README.md
@@ -1,32 +1,3 @@
# Docker

This repository is used to develop and document [Singularity](https://sylabs.io) or [Apptainer](https://apptainer.org) containers with various software and analytical tools for GWAS and post-GWAS analysis via [Docker](https://www.docker.com).

## Getting started

For new users we recommend to go over introductory instructions in [docs/singularity/hello.md](./../docs/singularity/hello.md), which explain the basic usage of singularity containers, using a minimalistic example (singularity container with ``plink`` binary).

If you would like to contribute to developing these containers, please see the [CONTRIBUTING](./../CONTRIBUTING.md) file.

For a tutorial on GWAS with synthetic data, see [usecases/gwas_demo.md](./../usecases/gwas_demo.md).

### Prerequisites (to running tutorials)

NOTE: This is out of date. Confer [usecases/README.md](./../usecases/README.md).

* download container files shared on the [Google Drive](https://drive.google.com/drive/folders/1mfxZJ-7A-4lDlCkarUCxEf2hBIxQGO69?usp=sharing).
* download ``comorment_ref.tar.gz`` file from the above Google Drive folder, extract it with ``tar -xzvf comorment_ref.tar.gz`` command,
  and create an environment variable ``COMORMENT_REF`` pointing to the folder containing extracted ``comorment_ref.tar.gz`` data.
If you want to see the content of ``comorment_ref.tar.gz`` without downloading and extracting,
you may take a quick look [here](https://github.com/norment/comorment_data). This is a private repository, and you need to get access.
Please contact Oleksandr and Bayram by e-mail and send us your github user name. If you don't have it, create one [here](http://github.com/join).
* create an empty folder called ``data``, for storing the results and intermediate files produced by running containers.
  (most instructions mount this folder like this: ``-B data:/data``).

## Description of available containers

The detailed description of the available container [files](https://github.com/comorment/containers/tree/main/singularity) provided in this repository are found [here](./../docs/singularity/README.md).

## Software versions

Below is the list of tools included in the different Dockerfiles and installer bash scripts for each container.
25 changes: 22 additions & 3 deletions docs/README.md
@@ -2,8 +2,27 @@

## Singularity

Brief descriptions of the available software container builds are available at [singularity](./singularity/README.md).
The list of all tools is provided on [this](/docs/software_list.md) page.
This software is organized into the following containers:
* [hello.sif](/docs/singularity/hello.md) - a simple container for demo purposes, for experimenting with singularity features
* [gwas.sif](/docs/singularity/gwas.md) - multiple tools (released as binaries/executables) for imputation and GWAS analysis
* [python3.sif](/docs/singularity/python3.md) - python3 environment with pre-installed modules and tools
* [r.sif](/docs/singularity/r.md) - R 4.0.5 environment with rareGWAMA, GenomicSEM, TwoSampleMR and GSMR packages installed (plus some standard R packages)

## Specifications
All containers have a common set of linux tools like ``gzip``, ``tar``, ``parallel``, etc.
Please [open an issue](https://github.com/comorment/containers/issues/new) if you'd like to add more of such basic tools, or if you would like to update some software to a newer version.

## Data Format Specifications

To improve interoperability between different tools we developed the following data format specification:

* [Genotypes data](/docs/specifications/geno_specification.md)
* [Phenotypes data](/docs/specifications/pheno_specification.md)
* [GWAS Summary Statistics](/docs/specifications/sumstats_specification.md)

These format specifications are applicable to various scripts, released in this repository, including

* [gwas.py](/scripts/gwas/README.md) - pipeline for GWAS analysis
* [LDpred2](/scripts/pgs/LDpred2/README.md) - command-line wrapper around LDpred2
* [pgs_toolkit](/scripts/pgs/pgs_toolkit/README.md) - pipeline for PGS analysis

Specifications of the input data format for GWAS analysis, recommended in CoMorMent projects, are documented at [specifications](./specifications/README.md)
12 changes: 0 additions & 12 deletions docs/singularity/README.md

This file was deleted.

16 changes: 0 additions & 16 deletions docs/specifications/README.md

This file was deleted.

6 changes: 5 additions & 1 deletion docs/specifications/geno_specification.md
@@ -1,4 +1,4 @@
# Genotypes
# Genotype data spec

We expect imputed genotype data, which may be split into multiple *cohorts* at each site.
For example, MoBa imputed genotype data is currently split into three cohorts, one per genotype array: GSA, OMNI and HCE.
@@ -47,3 +47,7 @@ Currently we do not require ``IID`` values to be unique across cohorts.
As of now, we only support the analysis of autosomes (chr 1..22).
Support for other chromosomes will come later.
We expect the same set of individuals across all autosomes (chr 1..22).

## Change log

* ``v0.9`` - first version of this document
7 changes: 6 additions & 1 deletion docs/specifications/pheno_specification.md
@@ -1,4 +1,4 @@
# Phenotypes and covariates
# Phenotypes and covariates spec

For phenotypes and covariates, we expect the data to be organized in a single delimiter-separated file (hereinafter referred to as *phenotype file*),
with rows corresponding to individuals, and columns corresponding to relevant variables of interest or covariates.
@@ -75,3 +75,8 @@ PC2,CONTINUOUS,2nd principal component
PC3,CONTINUOUS,3rd principal component
...
```

## Change log

* ``v0.9`` - first version of this document
* ``v0.9.1`` - specify case/control coding and rename COLUMN->FIELD in the dictionary file
6 changes: 5 additions & 1 deletion docs/specifications/sumstats_specification.md
@@ -1,4 +1,4 @@
# Summary statistics
# Summary statistics spec

The results of GWAS are represented as summary statistics, with the following columns:

@@ -55,3 +55,7 @@ If you need these columns for ``regenie`` analysis consider also running ``plink
| Z | ? | Z | Z | Z | OK |
| FRQ | FRQ_A_NNN | FRQ | EAF | FRQ | keep "FRQ" which makes more sense for non-EUR populations |
| missing | ? | missing | EAF_1KG | missing | not needed |

## Change log

* ``v0.9`` - first version of this document
2 changes: 2 additions & 0 deletions reference/examples/gsmr/.gitattributes
@@ -0,0 +1,2 @@
reference/examples/gsmr/gsmr_example.recode.log filter=lfs diff=lfs merge=lfs -text
gsmr_example.recode.log filter=lfs diff=lfs merge=lfs -text
2 changes: 0 additions & 2 deletions sphinx-docs/source/docs/singularity/README.md

This file was deleted.

2 changes: 0 additions & 2 deletions sphinx-docs/source/docs/specifications/README.md

This file was deleted.

45 changes: 45 additions & 0 deletions sphinx-docs/source/index.rst
@@ -3,6 +3,51 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.

Here is an overview of how we may want the doc's TOC to be
(As a starting point we may have only root-level listed in the docs, and everything else expanding upon user clicking on that node)

Introduction
Getting started (hello.sif)
Full installation
documentation specific to each container (including containers shared in other repositories)
hello.sif
Software list
gwas.sif
Software list, including refs to documentation
python.sif
Software list
R.sif
Software list
For more documentation on LDpred2 see [scripts] folder.
ldsc.sif
* reference overview
* usage example
HDL.sif (external repo)
MAGMA.sif
MiXeR.sif
...
specification of data formats
geno
pheno
sumstats
reference data
For tool-specific reference, see [container documentation]
* opensnp dataset
* summary statistics (HEIGHT, L/R handedness, ...)
* ? 1kG files if needed
scripts (tools / toolkits / pipelines)
gwas
usage example (with data included in the repo, i.e. opensnp dataset)
pgs_toolkit
this supports usage from python
ldpred2
usecases / tutorials (UKB, MoBa, ADNI, ..)
can be README files, but also jupyter notebooks
API usage
pgs_toolkit
Contributing / dev instructions (wiki-like content)
Internal usage (p33/p697/Tryggve collaborators)

Welcome to the CoMorMent-container's documentation!
===================================================

2 changes: 2 additions & 0 deletions usecases/bolt_out/.gitattributes
@@ -0,0 +1,2 @@
example_3chr.log filter=lfs diff=lfs merge=lfs -text
myld.log filter=lfs diff=lfs merge=lfs -text
18 changes: 18 additions & 0 deletions usecases/gwas_demo/.gitattributes
@@ -0,0 +1,18 @@
run2_chr3.log filter=lfs diff=lfs merge=lfs -text
run2_PHENO.regenie.log filter=lfs diff=lfs merge=lfs -text
run1_CASE2.regenie.log filter=lfs diff=lfs merge=lfs -text
run1_CASE.regenie.log filter=lfs diff=lfs merge=lfs -text
run1_chr1.log filter=lfs diff=lfs merge=lfs -text
run1.log filter=lfs diff=lfs merge=lfs -text
run1.regenie.step1.log filter=lfs diff=lfs merge=lfs -text
run1_CASE2.plink2.log filter=lfs diff=lfs merge=lfs -text
run1_chr3.log filter=lfs diff=lfs merge=lfs -text
run2_chr1.log filter=lfs diff=lfs merge=lfs -text
run2_PHENO2.plink2.log filter=lfs diff=lfs merge=lfs -text
run2_PHENO2.regenie.log filter=lfs diff=lfs merge=lfs -text
run1_CASE.plink2.log filter=lfs diff=lfs merge=lfs -text
run1_chr2.log filter=lfs diff=lfs merge=lfs -text
run2_chr2.log filter=lfs diff=lfs merge=lfs -text
run2.regenie.step1.log filter=lfs diff=lfs merge=lfs -text
run2.log filter=lfs diff=lfs merge=lfs -text
run2_PHENO.plink2.log filter=lfs diff=lfs merge=lfs -text
1 change: 1 addition & 0 deletions usecases/saige_out/.gitattributes
@@ -0,0 +1 @@
out_vcf.log filter=lfs diff=lfs merge=lfs -text