Release 0.1.2. Minor updates / bug fixes + Adding containers.
shz9 committed Apr 24, 2024
1 parent e868877 commit 538fec6
Showing 11 changed files with 328 additions and 62 deletions.
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.2] - 2024-04-24

### Changed

- Fixed `manhattan` plot implementation to support various new features.
- Added a warning when accessing the `csr_matrix` property of `LDMatrix` before the matrix
has been loaded.

### Added

- `reset_mask` method for magenpy `LDMatrix`.
- `Dockerfile`s for both `cli` and `jupyter` modes.
- A helper script to convert LD matrices from old format to new format.

## [0.1.1] - 2024-04-12

### Changed
43 changes: 43 additions & 0 deletions containers/cli.Dockerfile
@@ -0,0 +1,43 @@
# Usage:
# ** Step 1 ** Build the docker image:
# docker build -f cli.Dockerfile -t magenpy-cli .
# ** Step 2 ** Run the docker container in interactive shell mode:
# docker run -it magenpy-cli /bin/bash
# ** Step 3 ** Test magenpy_ld:
# magenpy_ld -h

FROM python:3.11-slim-buster

LABEL authors="Shadi Zabad"
LABEL version="0.1"
LABEL description="Docker image containing all requirements to run the commandline scripts in the magenpy package"

# Install system dependencies
RUN apt-get update && apt-get install -y \
unzip \
wget \
pkg-config \
g++ gcc \
libopenblas-dev \
libomp-dev

# Download and setup plink2:
RUN mkdir -p /software && \
wget https://s3.amazonaws.com/plink2-assets/alpha5/plink2_linux_avx2_20240105.zip -O /software/plink2.zip && \
unzip /software/plink2.zip -d /software && \
rm /software/plink2.zip

# Download and setup plink1.9:
RUN mkdir -p /software && \
wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20231211.zip -O /software/plink.zip && \
unzip /software/plink.zip -d /software && \
rm /software/plink.zip

# Add plink1.9 and plink2 to PATH (ENV also applies to non-interactive shells):
ENV PATH="${PATH}:/software"

# Install magenpy package from PyPI
RUN pip install --upgrade pip magenpy

# Test the installation
RUN magenpy_ld -h
53 changes: 53 additions & 0 deletions containers/jupyter.Dockerfile
@@ -0,0 +1,53 @@
# Usage:
# ** Step 1 ** Build the docker image:
# docker build -f ../vemPRS/containers/jupyter.Dockerfile -t magenpy-jupyter .
# ** Step 2 ** Run the docker container (pass the appropriate port):
# docker run -p 8888:8888 magenpy-jupyter
# ** Step 3 ** Open the link in your browser:
# http://localhost:8888


FROM python:3.11-slim-buster

LABEL authors="Shadi Zabad"
LABEL version="0.1"
LABEL description="Docker image containing all requirements to run the magenpy package in a Jupyter Notebook"

# Install system dependencies
RUN apt-get update && apt-get install -y \
unzip \
wget \
pkg-config \
g++ gcc \
libopenblas-dev \
libomp-dev

# Download and setup plink2:
RUN mkdir -p /software && \
wget https://s3.amazonaws.com/plink2-assets/alpha5/plink2_linux_avx2_20240105.zip -O /software/plink2.zip && \
unzip /software/plink2.zip -d /software && \
rm /software/plink2.zip

# Download and setup plink1.9:
RUN mkdir -p /software && \
wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20231211.zip -O /software/plink.zip && \
unzip /software/plink.zip -d /software && \
rm /software/plink.zip

# Add plink1.9 and plink2 to PATH (ENV also applies to processes not started from a bash shell):
ENV PATH="${PATH}:/software"

# Install magenpy package from PyPI
RUN pip install --upgrade pip magenpy jupyterlab

# Expose the port Jupyter Lab will be served on
EXPOSE 8888

# Set the working directory
WORKDIR /magenpy_dir

# Copy the current directory contents into the container at /magenpy_dir
COPY . /magenpy_dir

# Run Jupyter Lab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--allow-root", "--NotebookApp.token=''"]
18 changes: 18 additions & 0 deletions docs/installation.md
@@ -58,3 +58,21 @@ source magenpy_env/bin/activate
python -m pip install --upgrade pip
python -m pip install "magenpy>=0.1"
```

### Using `Docker` containers

If you are using `Docker` containers, you can build a container with the `magenpy` package
and all its dependencies by downloading the relevant `Dockerfile` from the
[repository](https://github.com/shz9/magenpy/tree/master/containers) and building it
as follows:

```bash
# Build the docker image:
docker build -f cli.Dockerfile -t magenpy-cli .
# Run the container in interactive mode:
docker run -it magenpy-cli /bin/bash
# Test that the package installed successfully:
magenpy_ld -h
```

We plan to publish pre-built `Docker` images on `DockerHub` in the future.
69 changes: 69 additions & 0 deletions examples/convert_old_ld_matrices.py
@@ -0,0 +1,69 @@
"""
This is a utility script that converts the old-style published LD matrices (magenpy 0.0.X) to the new
format deployed since magenpy>=0.1. The old LD matrix format used ragged Zarr arrays, while the new format
uses flattened Zarr arrays that are more efficient and easier to work with. The script takes the path to the
old LD matrices and converts them to the new format with the desired precision (e.g. float32).
The user may also specify the compressor name and compression level for the new LD matrices.
The script will validate the conversion by checking the integrity of the new LD matrices.
Usage:
python convert_old_ld_matrices.py --old-matrix-path /path/to/old/ld_matrices/chr_* \
--new-path /path/to/new/ld_matrices/ \
--dtype float32
"""

import magenpy as mgp
from magenpy.utils.system_utils import makedir
import zarr
import os.path as osp
import glob
import argparse


parser = argparse.ArgumentParser(description="""
Convert old-style LD matrices (magenpy 0.0.X) to the new format (magenpy >=0.1).
""")

parser.add_argument('--old-matrix-path', dest='old_path', type=str, required=True,
                    help='The path to the old LD matrix. Can be a wild card of the form "path/to/chr_*"')
parser.add_argument('--new-path', dest='new_path', type=str, required=True,
                    help='The path where to store the new LD matrix.')
parser.add_argument('--dtype', dest='dtype', type=str, default='int16',
                    choices={'int8', 'int16', 'float32', 'float64'},
                    help='The desired data type for the entries of the new LD matrix.')
parser.add_argument('--compressor', dest='compressor', type=str, default='zstd',
                    help='The compressor name for the new LD matrix.')
parser.add_argument('--compression-level', dest='compression_level', type=int, default=9,
                    help='The compression level for the new LD matrix.')

args = parser.parse_args()

for f in glob.glob(args.old_path):

    try:
        z_arr = zarr.open(f, 'r')
        chrom = z_arr.attrs['Chromosome']
    except Exception as e:
        print(f"Error: {e}")
        continue

    print(f"> Converting LD matrix for chromosome: {chrom}")

    new_path_suffix = f'chr_{chrom}'
    if new_path_suffix not in args.new_path:
        new_path = osp.join(args.new_path, new_path_suffix)
    else:
        new_path = args.new_path

    makedir(new_path)

    ld_mat = mgp.LDMatrix.from_ragged_zarr_matrix(f,
                                                  new_path,
                                                  overwrite=True,
                                                  dtype=args.dtype,
                                                  compressor_name=args.compressor,
                                                  compression_level=args.compression_level)
    print("Valid conversion:", ld_mat.validate_ld_matrix())
35 changes: 27 additions & 8 deletions magenpy/LDMatrix.py
@@ -2,6 +2,7 @@
 import os.path as osp
 import numpy as np
 import pandas as pd
+import warnings
 from scipy.sparse import csr_matrix, identity, triu, diags
 from .utils.model_utils import quantize, dequantize

@@ -712,27 +713,32 @@ def window_size(self):
     @property
     def n_neighbors(self):
         """
+        The number of variants in the LD window for each SNP.
         !!! seealso "See Also"
             * [window_size][magenpy.LDMatrix.LDMatrix.window_size]
         !!! note
             This includes the variant itself if the matrix is in memory and is symmetric.
         :return: The number of variants in the LD window for each SNP.
         """
         return self.window_size()

     @property
     def csr_matrix(self):
         """
-        :return: The in-memory CSR matrix object.
         ..note ::
             If the LD matrix is not in-memory, then it'll be loaded using default settings.
             This means that the matrix will be loaded as upper-triangular matrix with
             default data type. To customize the loading, call the `.load(...)` method before
             accessing the CSR matrix in this way.
+        :return: The in-memory CSR matrix object.
         """
         if self._mat is None:
+            warnings.warn("> Warning: Loading LD matrix with default settings. "
+                          "To customize, call the `.load(...)` method before invoking `.csr_matrix`.",
+                          stacklevel=2)
             self.load()
         return self._mat
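
The warning added to `csr_matrix` passes `stacklevel=2`, which makes Python report the warning at the line that accessed the property rather than inside the property body. A minimal, self-contained sketch of that lazy-load-with-warning pattern is below. `LazyMatrix`, its `load` method, and the placeholder data are hypothetical stand-ins for illustration and are not magenpy's `LDMatrix`.

```python
import warnings


class LazyMatrix:
    """Hypothetical stand-in for a class that loads its data on first access."""

    def __init__(self):
        self._mat = None

    def load(self, dtype="float32"):
        # Placeholder for an expensive load; the real loader is not shown here.
        self._mat = {"dtype": dtype, "data": [1.0, 0.5, 0.25]}

    @property
    def matrix(self):
        if self._mat is None:
            # stacklevel=2 attributes the warning to the code that accessed
            # `.matrix`, which is where an explicit `.load(...)` could be added.
            warnings.warn(
                "Loading matrix with default settings. "
                "Call `.load(...)` first to customize.",
                stacklevel=2,
            )
            self.load()
        return self._mat


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # Implicit load: accessing the property triggers the warning.
    implicit = LazyMatrix().matrix

    # Explicit load first: the later property access stays silent.
    explicit = LazyMatrix()
    explicit.load(dtype="float64")
    _ = explicit.matrix

n_warnings = len(caught)
```

In this sketch only the implicit access is recorded in `caught`; calling `load` beforehand keeps the chosen settings and avoids the warning.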

@@ -833,7 +839,20 @@ def set_mask(self, mask):
         if self.in_memory:
             self.load(force_reload=True,
                       return_symmetric=self.is_symmetric,
-                      fill_diag=self.is_symmetric)
+                      fill_diag=self.is_symmetric,
+                      dtype=self.dtype)
+
+    def reset_mask(self):
+        """
+        Reset the mask to its default value (None).
+        """
+        self._mask = None
+
+        if self.in_memory:
+            self.load(force_reload=True,
+                      return_symmetric=self.is_symmetric,
+                      fill_diag=self.is_symmetric,
+                      dtype=self.dtype)

     def to_snp_table(self, col_subset=None):
         """
@@ -1409,11 +1428,11 @@ def validate_ld_matrix(self):
         return True

     def __getstate__(self):
-        return self.store.path, self.in_memory, self.is_symmetric, self._mask
+        return self.store.path, self.in_memory, self.is_symmetric, self._mask, self.dtype

     def __setstate__(self, state):

-        path, in_mem, is_symmetric, mask = state
+        path, in_mem, is_symmetric, mask, dtype = state

         self._zg = zarr.open_group(path, mode='r')
         self.in_memory = in_mem
@@ -1426,7 +1445,7 @@ def __setstate__(self, state):
             self.set_mask(mask)

         if in_mem:
-            self.load(return_symmetric=is_symmetric, fill_diag=is_symmetric)
+            self.load(return_symmetric=is_symmetric, fill_diag=is_symmetric, dtype=dtype)

     def __len__(self):
         return self.n_snps
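
Extending the state tuple from four to five fields means a four-field state saved by an earlier release would fail to unpack in the new `__setstate__`. The commit does not say whether such saved states exist in practice, so the following is only a sketch of one way to accept both layouts. `Handle`, its attributes, and the `float32` fallback are assumptions made for illustration; `copy.copy` is used because it goes through the same `__getstate__`/`__setstate__` hooks as `pickle`.

```python
import copy


class Handle:
    """Hypothetical object whose saved state is a plain tuple."""

    DEFAULT_DTYPE = "float32"  # assumed fallback for states saved without a dtype

    def __init__(self, path, in_memory=False, is_symmetric=False,
                 mask=None, dtype=DEFAULT_DTYPE):
        self.path = path
        self.in_memory = in_memory
        self.is_symmetric = is_symmetric
        self.mask = mask
        self.dtype = dtype

    def __getstate__(self):
        # Current layout: five fields, dtype last.
        return self.path, self.in_memory, self.is_symmetric, self.mask, self.dtype

    def __setstate__(self, state):
        if len(state) == 4:
            # Older four-field layout: append the assumed default dtype.
            state = (*state, self.DEFAULT_DTYPE)
        self.path, self.in_memory, self.is_symmetric, self.mask, self.dtype = state


# Round trip with the current layout preserves the dtype.
restored = copy.copy(Handle("chr_22", in_memory=True, dtype="int8"))

# Simulate restoring from an older four-field state.
legacy = Handle.__new__(Handle)
legacy.__setstate__(("chr_22", True, False, None))
```

Checking the tuple length keeps the change local to `__setstate__`; a versioned dictionary state would be an alternative if more fields are expected later.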
4 changes: 2 additions & 2 deletions magenpy/SumstatsTable.py
@@ -310,7 +310,7 @@ def p_value(self):
         return self.pval

     @property
-    def log10_p_value(self):
+    def negative_log10_p_value(self):
         """
         :return: The negative log10 of the p-value (-log10(p_value)) of association
         test of each variant on the phenotype.
@@ -623,7 +623,7 @@ def to_table(self, col_subset=None):
             elif col == 'PVAL':
                 table['PVAL'] = self.p_value
             elif col == 'LOG10_PVAL':
-                table['LOG10_PVAL'] = self.log10_p_value
+                table['NLOG10_PVAL'] = self.negative_log10_p_value
             elif col == 'CHISQ':
                 table['CHISQ'] = self.get_chisq_statistic()
             elif col == 'MAF_VAR':
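
The rename makes the sign convention explicit: the property returns -log10(p), so smaller p-values map to larger positive values. A small standard-library illustration follows; the helper name `negative_log10` is hypothetical and not part of magenpy.

```python
import math


def negative_log10(p_values):
    """Return -log10(p) for each p-value in (0, 1]."""
    return [-math.log10(p) for p in p_values]


# p = 1 maps to zero; a genome-wide significance threshold of 5e-8 maps to about 7.3.
scores = negative_log10([1.0, 0.05, 5e-8])
```

Note that in the hunk above the branch is still keyed on `'LOG10_PVAL'` while the column is written as `'NLOG10_PVAL'`, so callers that select columns by name may need to account for both spellings.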
2 changes: 1 addition & 1 deletion magenpy/__init__.py
@@ -16,7 +16,7 @@

 from .utils.data_utils import *

-__version__ = '0.1.1'
+__version__ = '0.1.2'
 __release_date__ = 'April 2024'
