Commit
Merge pull request #53 from KevinMenden/docker-fix
Release 0.9.6
KevinMenden authored Dec 16, 2020
2 parents efab7ca + f5757cc commit bef6c44
Showing 13 changed files with 86 additions and 78 deletions.
4 changes: 2 additions & 2 deletions .vscode/settings.json
@@ -1,6 +1,6 @@
{
"python.pythonPath": "/home/kevin/anaconda3/envs/scaden/bin/python",
"python.linting.pylintEnabled": false,
"python.linting.pylintEnabled": true,
"python.linting.enabled": true,
"python.linting.flake8Enabled": true
"python.linting.flake8Enabled": false
}
9 changes: 5 additions & 4 deletions Dockerfile
@@ -1,5 +1,6 @@
FROM continuumio/miniconda3
FROM ubuntu

COPY environment.yml /
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH /opt/conda/envs/scaden/bin:$PATH
RUN apt-get update && apt-get upgrade -y
RUN apt-get install python3 -y
RUN apt-get install python3-pip -y
RUN pip3 install scaden
10 changes: 8 additions & 2 deletions README.md
@@ -1,7 +1,13 @@
![Scaden](docs/img/scaden_logo.png)

![MIT](https://anaconda.org/bioconda/scaden/badges/license.svg)
![Install with Bioconda](https://anaconda.org/bioconda/scaden/badges/installer/conda.svg)

![Scaden version](https://img.shields.io/badge/scaden-v0.9.5-cyan)
![MIT](https://img.shields.io/badge/License-MIT-black)
![Install with pip](https://img.shields.io/badge/Install%20with-pip-blue)
![Install with Bioconda](https://img.shields.io/badge/Install%20with-conda-green)
![Docker build](https://img.shields.io/docker/cloud/build/kevinmenden/scaden)
![Downloads](https://static.pepy.tech/personalized-badge/scaden?period=total&units=international_system&left_color=blue&right_color=green&left_text=Downloads)

## Single-cell assisted deconvolutional network

Scaden is a deep-learning based algorithm for cell type deconvolution of bulk RNA-seq samples. It was developed
31 changes: 31 additions & 0 deletions docs/changelog.md
@@ -0,0 +1,31 @@
# Changelog

### Version 0.9.6
+ fixed Dockerfile (switched to pip installation)
+ added better error messages to `simulate` command
+ cleaned up dependencies

### Version 0.9.5
+ added `scaden simulate` command to perform bulk simulation and training file creation
+ added `--seed` parameter to allow reproducible Scaden runs

### Version 0.9.4
+ fixed dependencies (added python>=3.6 requirement)

### Version 0.9.3
+ upgrade to Tensorflow 2
+ cleaned up dependencies

### Version 0.9.2
+ RAM usage improvement

### Version 0.9.1
+ Added automatic removal of duplicate genes in Mixture file
+ Changed name of final prediction file
+ Added Scaden logo to main script

### Version 0.9.0
This is the initial release version of Scaden. While this version contains full functionality for pre-processing, training and prediction, it does not
contain thorough error messages, plotting functionality and a solid helper function for generation training data. These are all features
planned for the release of v.1.0.0.
The core functionality of Scaden is, however, implemented and fully operational. Please check the [Usage](usage) section to learn how to use Scaden.
32 changes: 0 additions & 32 deletions docs/index.md
@@ -8,35 +8,3 @@ at the [DZNE Tübingen](https://www.dzne.de/en/about-us/sites/tuebingen/) and th

A pre-print describing the method is available on Biorxiv:
[Deep-learning-based cell composition analysis from tissue expression profiles](https://www.biorxiv.org/content/10.1101/659227v1)





## Changelog

### Version 0.9.5
+ added `scaden simulate` command to perform bulk simulation and training file creation
+ added `--seed` parameter to allow reproducible Scaden runs

### Version 0.9.4
+ fixed dependencies (added python>=3.6 requirement)

### Version 0.9.3
+ upgrade to Tensorflow 2
+ cleaned up dependencies

### Version 0.9.2
+ RAM usage improvement

### Version 0.9.1
+ Added automatic removal of duplicate genes in Mixture file
+ Changed name of final prediction file
+ Added Scaden logo to main script


### Version 0.9.0
This is the initial release version of Scaden. While this version contains full functionality for pre-processing, training and prediction, it does not
contain thorough error messages, plotting functionality and a solid helper function for generation training data. These are all features
planned for the release of v.1.0.0.
The core functionality of Scaden is, however, implemented and fully operational. Please check the [Usage](usage) section to learn how to use Scaden.
14 changes: 6 additions & 8 deletions docs/installation.md
@@ -3,18 +3,16 @@ Scaden be easily installed on a Linux system, and should also work on Mac.
There are currently two options for installing Scaden, either using [Bioconda](https://bioconda.github.io/) or via [pip](https://pypi.org/).


## Bioconda
Installation via Bioconda is the preferred route of installation, and we highly recommend using conda. To install Scaden, use:

`conda install -c bioconda scaden`
## pip
To install Scaden via pip, simply run the following command:

It is always recommended to create a separate conda environment for installation.
`pip install scaden`


## pip
If you don't want to use conda, you can also install Scaden using pip:
## Bioconda
You can also install Scaden via Bioconda, using:

`pip install scaden`
`conda install -c bioconda scaden`


## Docker
13 changes: 9 additions & 4 deletions docs/usage.md
@@ -96,10 +96,9 @@ Once you have done this, you can use Scaden's command `scaden simulate` to gener
The first step is to process your scRNA-seq dataset(s) you want to use for training. I used Scanpy for this, and would therefore
recommend to do the same, but you can of course use other software for this purpose. I've uploaded the scripts I used to preprocess
the data used for the Scaden paper [here](https://doi.org/10.6084/m9.figshare.8234030.v1). Mainly you have to normalize your count data
and create a file containing the cell type labels. The file for the cell type labels should be of size (n x 2), where n is the number of cells
you have in your data. The two columns correspond to a label for your cells, and a 'Celltype' column. In fact, the only necessary column is the 'Celltype'
column, which Scaden uses to extract the information. The count data should be of size (n x g), where g is the number of genes and n is the number of samples.
The order must be the same as for the cell type labels.
and create a file containing the cell type labels.
The file for the cell type labels should be of size (n x 1), where n is the number of cells
you have in your data. The single column in this file should be labeled 'Celltype'. You can have extra columns if you like, as long as you have a 'Celltype' column which specifies the cell type label in the correct order. The count data should be of size (n x g), where g is the number of genes and n is the number of cells (samples). The order of the cells must be the same as for the cell type labels.
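
As a rough illustration of the layout described above, the following sketch builds a tiny count table and a matching cell type table with pandas, writes both as tab-separated text, and reads them back the way the loader in this commit appears to (`pd.read_table`, with `index_col=0` for the counts). The gene names, cell names and cell type labels are made up for the example; only the 'Celltype' column name comes from the text.

```python
import io

import pandas as pd

# Hypothetical toy data: 3 cells (rows) x 2 genes (columns).
counts = pd.DataFrame(
    {"GeneA": [1.0, 0.0, 3.0], "GeneB": [2.0, 5.0, 0.0]},
    index=["cell1", "cell2", "cell3"],
)
# One label per cell, in the same row order as the count matrix.
celltypes = pd.DataFrame({"Celltype": ["Tcell", "Bcell", "Tcell"]})


def to_tsv(df, index):
    """Serialize a DataFrame as tab-separated text (in memory)."""
    buf = io.StringIO()
    df.to_csv(buf, sep="\t", index=index)
    return buf.getvalue()


# The counts keep their index column; the labels file has no index.
counts_tsv = to_tsv(counts, index=True)
celltypes_tsv = to_tsv(celltypes, index=False)

# Read the tables back and run the two sanity checks the text implies.
x = pd.read_table(io.StringIO(counts_tsv), index_col=0)
y = pd.read_table(io.StringIO(celltypes_tsv))

assert "Celltype" in y.columns   # required column is present
assert x.shape[0] == y.shape[0]  # one label per cell
```

In a real project the two strings would be written to files such as `dataset_counts.txt` and `dataset_celltypes.txt`; the in-memory buffers are used here only to keep the sketch free of side effects.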

#### Bulk simulation
Once the data is processed, you can use the command `scaden simulate` to generate your artificial bulk samples for training.
@@ -116,6 +115,12 @@ As example, you can generate 1000 artificial bulk samples from 100 cells per sam
scaden simulate --cells 100 --n_samples 1000 --data <data_directory> --pattern <your_pattern>
```

An example for a pattern would be `*_counts.txt`. This pattern would find the following dataset:
* `dataset_counts.txt`
* `dataset_celltypes.txt`

Make sure to include an `*` in your pattern!
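
To make the role of the pattern more concrete, here is a minimal sketch of how a glob-style pattern could select count files and how dataset names could be derived from them. It uses `fnmatch` from the standard library as a stand-in, since this page does not show how Scaden lists files; `find_datasets` is a hypothetical helper. The `split("_")[0]` and `replace("*", "")` steps mirror lines visible in the `bulk_simulation.py` part of this commit.

```python
from fnmatch import fnmatch


def find_datasets(filenames, pattern):
    """Return dataset names for all files matching a glob-style pattern."""
    matches = [f for f in filenames if fnmatch(f, pattern)]
    # The dataset name is taken to be everything before the first underscore.
    return [f.split("_")[0] for f in matches]


pattern = "*_counts.txt"
files = ["dataset_counts.txt", "dataset_celltypes.txt", "notes.md"]

# Only the counts file matches the pattern.
names = find_datasets(files, pattern)

# Removing the wildcard leaves the suffix that is appended to the dataset name.
suffix = pattern.replace("*", "")
counts_file = names[0] + suffix
celltypes_file = names[0] + "_celltypes.txt"
```

If names really are derived with `split("_")[0]`, a file such as `my_data_counts.txt` would yield the dataset name `my`, so it seems safest to avoid underscores in the dataset prefix.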

This command will create the artificial samples in the current working directory. You can also specify an output directory using the `--out` parameter. Scaden will also directly create a .h5ad file in this directory, which is the file you will need for training. By default, this file will be called `data.h5ad`, however you can change the prefix using the `--prefix` flag.


10 changes: 0 additions & 10 deletions environment.yml

This file was deleted.

1 change: 1 addition & 0 deletions mkdocs.yml
@@ -4,4 +4,5 @@ nav:
- Installation: installation.md
- Usage: usage.md
- Datasets: datasets.md
- Changelog: changelog.md
theme: readthedocs
2 changes: 1 addition & 1 deletion scaden/model/scaden.py
@@ -295,7 +295,7 @@ def train(self, input_path, train_datasets):
pd.DataFrame(self.sig_genes).to_csv(self.model_dir + "/genes.txt", sep="\t")


def predict(self, input_path, out_name="cdn_predictions.txt"):
def predict(self, input_path, out_name="scaden_predictions.txt"):
"""
Perform prediction with a pre-trained model
:param out_dir: path to store results in
25 changes: 18 additions & 7 deletions scaden/preprocessing/bulk_simulation.py
@@ -160,6 +160,22 @@ def filter_matrix_signature(mat, genes):
mat = mat[genes]
return mat

def load_celltypes(path, name):
    """ Load the cell type information """
    try:
        y = pd.read_table(path)
        # Check that the required 'Celltype' column is present
        if 'Celltype' not in y.columns:
            logger.error(f"No 'Celltype' column found in {name}_celltypes.txt! Please make sure to include this column.")
            sys.exit()
    except FileNotFoundError as e:
        logger.error(f"No celltypes file found for {name}. It should be called {name}_celltypes.txt.")
        sys.exit(e)

    return y




def load_dataset(name, dir, pattern):
"""
@@ -172,12 +188,7 @@ def load_dataset(name, dir, pattern):
    pattern = pattern.replace("*", "")
    print("Loading " + name + " dataset ...")

    try:
        y = pd.read_table(dir + name + "_celltypes.txt")
    except FileNotFoundError as e:
        logger.error(f"No celltypes file found for {name}. It should be called {name}_celltypes.txt.")
        sys.exit()

    y = load_celltypes(dir + name + "_celltypes.txt", name)
    x = pd.read_table(dir + name + pattern, index_col=0)

    return (x, y)
@@ -285,7 +296,7 @@ def simulate_bulk(
datasets = [x.split("_")[0] for x in files]

if len(datasets) == 0:
logging.error("No datasetes fround! Have you specified the pattern correctly?")
logging.error("No datasets found! Have you specified the pattern correctly?")
sys.exit()

print("Datasets: " + str(datasets))
2 changes: 1 addition & 1 deletion scaden/preprocessing/create_h5ad_file.py
@@ -27,7 +27,7 @@ def parse_data(x_path, y_path):
x = pd.read_table(x_path, sep="\t")
y = pd.read_table(y_path, sep="\t")
except FileNotFoundError as e:
logging.error(f"Could not find simulated data files: {e}")
logging.error(f" Could not find simulated data files: {e}")
sys.exit()
labels = list(y.columns)

11 changes: 4 additions & 7 deletions setup.py
@@ -2,7 +2,7 @@

from setuptools import setup, find_packages

version = '0.9.5'
version = '0.9.6'


with open("README.md", "r", encoding="UTF-8") as fh:
@@ -30,13 +30,10 @@
'pandas',
'numpy',
'scikit-learn',
'scipy',
'tensorflow>=2.0',
'anndata',
'tqdm',
'click'
],
extras_require = {
'scanpy': ["scanpy", "matplotlib", "seaborn"]
}
'click',
'h5py~=2.10.0'
]
)
