pytorch version
Koncopd committed Dec 14, 2020
1 parent 51f9ef4 commit dc8801e
Showing 89 changed files with 7,122 additions and 6,240 deletions.
32 changes: 5 additions & 27 deletions .gitignore
@@ -1,28 +1,6 @@

#developing directory
/models/
/checkpoint
/dist/
/test/data/
/examples/
/results/
/data/

#OS or Editor files and folders
.DS_Store
Thumbs.db
.ipynb_checkpoints/
.directory
/.idea/

# Python / Byte-compiled / optimized / DLL
datasets/
.idea/
__pycache__/
*.py[cod]
*.so
.cache
*.h5ad
#others
*.pdf
*.zip
#testing modules
.pytest_cache/
docs/_build
own_tests/
*.egg-info
4 changes: 2 additions & 2 deletions .readthedocs.yml
@@ -3,7 +3,7 @@
version: 2

python:
version: 3.8
version: 3.6
install:
- requirements: docs/requirements.txt
- method: setuptools
@@ -17,4 +17,4 @@ sphinx:

formats:
- epub
- pdf
- pdf
4 changes: 3 additions & 1 deletion .travis.yml
@@ -2,11 +2,13 @@ language: python
dist: xenial
cache: pip
python:
- "3.6"
- "3.7"
- "3.8"

install:
- pip install -r requirements.txt
- python setup.py install

script:
- PYTHONPATH=. pytest
- PYTHONPATH=. pytest
54 changes: 46 additions & 8 deletions README.rst
@@ -1,19 +1,57 @@
|PyPI| |travis| |Docs| |PyPIDownloads|
|PyPI| |PyPIDownloads| |Docs| |travis|

scArches - single-cell architecture surgery
scArches (PyTorch) - single-cell architecture surgery
=========================================================================
.. raw:: html

<img src="https://user-images.githubusercontent.com/33202701/89729020-15f7c200-da32-11ea-989b-1b9a3283f642.png" width="900px" align="center">

scArches is a package to integrate newly produced single-cell datasets into integrated reference atlases. Our method can facilitate large collaborative projects with decentralise training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_, and hosts efficient implementations of all conditional generative models for single-cell data.
This is a Pytorch version of scArches which can be found `here <https://github.com/theislab/scArches/>`_. scArches is a package to integrate newly produced single-cell datasets into integrated reference atlases. Our method can facilitate large collaborative projects with decentralise training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_, and hosts efficient implementations of all conditional generative models for single-cell data.



What can you do with scArches?
-------------------------------
- Integrate many single-cell datasets and share the trained model and the data (if possible).
- Download a pre-trained model for your atlas of interest, update it with new datasets and share with your collaborators.
- Construct a customized reference by downloading a reference atlas, add a few pre-trained adaptors (datasets) and project your own data in to this customized reference atlas.
- Project and integrate query datasets on the top of a reference and use latent representation for downstream tasks, e.g.: diff testing, clustering.
- Construct single or multi-modal (CITE-seq) reference atlases and share the trained model and the data (if possible).
- Download a pre-trained model for your atlas of interest, update it wih new datasets and share with your collaborators.
- Project and integrate query datasets on the top of a reference and use latent representation for downstream tasks, e.g.:diff testing, clustering, classification


What are different models?
-------------------------------
scArches is itself an algorithm for projecting query datasets on top of reference datasets, and it is applicable
to different models. Here we provide a short explanation and hints on when to use which model. Our models are divided into
three categories:


Unsupervised
This class of algorithms needs no `cell type` labels, meaning that you can create a reference and project a query without having access to cell type labels.
We implemented two algorithms:

- **scVI** (`Lopez et al., 2018 <https://www.nature.com/articles/s41592-018-0229-2>`_): Requires access to raw count values for data integration and assumes
a count distribution on the data (NB, ZINB, Poisson).

- **trVAE** (`Lotfollahi et al., 2019 <https://arxiv.org/abs/1910.01791>`_): Supports both normalized, log-transformed data and count data as input, and applies an additional MMD loss to achieve better merging in the latent space.

Supervised and Semi-supervised
This class of algorithms assumes the user has access to `cell type` labels when creating the reference data and usually performs better integration
compared to unsupervised methods. However, the query data can still be unlabeled. In addition to integration, you can classify your query cells using
these methods.

- **scANVI** (`Xu et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): Needs cell type labels for the reference data. Your query data can be either unlabeled or labeled. For unlabeled query data, you can also use this method to classify your query cells using the reference labels.

Multi-modal
These algorithms can be used to construct multi-modal reference atlases and map query data from either modality on top of the reference.

- **totalVI** (`Gayoso et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): This model can be used to build multi-modal CITE-seq reference atlases.
Query datasets can be either scRNA-seq or CITE-seq. In addition to integrating the query with the reference, one can use this model to impute proteins
in the query datasets.

Usage and installation
-------------------------------
@@ -22,7 +60,7 @@ See `here <https://scarches.readthedocs.io/>`_ for documentation and tutorials.
Support and contribute
-------------------------------
If you have a question or new architecture or a model that could be integrated into our pipeline, you can
post an `issue <https://github.com/theislab/scarches/issues/new>`__. Our package supports tf/keras now but pytorch version will be added very soon.
post an `issue <https://github.com/theislab/scarches/issues/new>`__ or reach us by `email <mailto:cottoneyejoe.server@gmail.com,mo.lotfollahi@gmail.com,mohsen.naghipourfar@gmail.com>`_.


Reference
1 change: 1 addition & 0 deletions __init__.py
@@ -0,0 +1 @@
from . import scarches
64 changes: 44 additions & 20 deletions docs/about.rst
@@ -1,43 +1,66 @@
|PyPI| |travis| |Docs|

scArches - single-cell architecture surgery
scArches (PyTorch) - single-cell architecture surgery
=========================================================================
.. raw:: html

<img src="https://user-images.githubusercontent.com/33202701/89729020-15f7c200-da32-11ea-989b-1b9a3283f642.png" width="700px" align="center">

scArches is a package to integrate newly produced single-cell datasets into integrated references atlases. Our method can facilitate large collaborative projects with decentralised training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_. and hosts efficient implementations of all conditional generative models for single-cell data.



scArches is a package to integrate newly produced single-cell datasets into integrated references atlases. Our method can facilitate large collaborative projects with decentralise training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_. and hosts efficient implementations of all conditional generative models for single-cell data.

What can you do with scArches?
--------------------------------
- Integrate many single-cell datasets and share the trained model and the data (if possible).
-------------------------------
- Construct single or multi-modal (CITE-seq) reference atlases and share the trained model and the data (if possible).
- Download a pre-trained model for your atlas of interest, update it wih new datasets and share with your collaborators.
- Construct a customized reference by downloading a reference atlas, add a few pre-trained adaptors (datasets) and project your own data in to this customized reference atlas.
- Project and integrate query datasets on the top of a reference and use latent representation for downstream tasks, e.g.:diff testing, clustering.
- Project and integrate query datasets on the top of a reference and use latent representation for downstream tasks, e.g.:diff testing, clustering, classification

Where to start?
--------------------------------

What are different models?
-------------------------------
scArches is itself an algorithm for projecting query datasets on top of reference datasets, and it is applicable
to different models. Here we provide a short explanation and hints on when to use which model. Our models are divided into
three categories:

To get a sense of how the model works please go through `this <https://scarches.readthedocs.io/en/latest/pancreas_pipeline.html>`_ example.
For examples on how to use or construct and share pre-trained models check examples.

What is an adaptor?
--------------------------------
.. raw:: html
Unsupervised
This class of algorithms needs no `cell type` labels, meaning that you can create a reference and project a query without having access to cell type labels.
We implemented two algorithms:

- **scVI** (`Lopez et al., 2018 <https://www.nature.com/articles/s41592-018-0229-2>`_): Requires access to raw count values for data integration and assumes
a count distribution on the data (NB, ZINB, Poisson).

<img src="https://user-images.githubusercontent.com/33202701/89730296-bdc6bd00-da3d-11ea-9012-410e22fa200a.png" width="200px" align="right">
- **trVAE** (`Lotfollahi et al., 2019 <https://arxiv.org/abs/1910.01791>`_): Supports both normalized, log-transformed data and count data as input, and applies an additional MMD loss to achieve better merging in the latent space.

In scArche, each query datasets is added to the reference model by training a set of weights called `adaptor`.
Each `adaptor` is a sharable object. This will enable users to download a reference model, customise
that reference model with a set of `adaptors` (datasets) and finally add user data as a new
`adaptor` and also share this adaptor for others.
Supervised and Semi-supervised
This class of algorithms assumes the user has access to `cell type` labels when creating the reference data and usually performs better integration
compared to unsupervised methods. However, the query data can still be unlabeled. In addition to integration, you can classify your query cells using
these methods.

- **scANVI** (`Xu et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): Needs cell type labels for the reference data. Your query data can be either unlabeled or labeled. For unlabeled query data, you can also use this method to classify your query cells using the reference labels.

Multi-modal
These algorithms can be used to construct multi-modal reference atlases and map query data from either modality on top of the reference.

- **totalVI** (`Gayoso et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): This model can be used to build multi-modal CITE-seq reference atlases.
Query datasets can be either scRNA-seq or CITE-seq. In addition to integrating the query with the reference, one can use this model to impute proteins
in the query datasets.


Where to start?
---------------
To get a sense of how the model works please go through `this <https://scarches.readthedocs.io/en/latest/pancreas_pipeline.html>`__ tutorial.
To find out how to construct and share or use pre-trained models, see the example sections. Check `this <https://scarches.readthedocs.io/en/latest/zenodo_intestine.html>`__ example to learn how to start from raw data and pre-process it for the model.

Reference
-------------------------------
If scArches is useful in your research, please consider citing the `preprint <https://www.biorxiv.org/content/10.1101/2020.07.16.205997v1/>`_.


.. |PyPI| image:: https://img.shields.io/pypi/v/scarches.svg
@@ -51,3 +74,4 @@ that reference model with a set of `adaptors` (datasets) and finally add user da

.. |travis| image:: https://travis-ci.com/theislab/scarches.svg?branch=master
:target: https://travis-ci.com/theislab/scarches

2 changes: 1 addition & 1 deletion docs/api/data.rst → docs/api/dataset.rst
@@ -1,7 +1,7 @@
Data Processing
===============

.. automodule:: scarches.data
.. automodule:: scarches.dataset
:members:
:undoc-members:
:show-inheritance:
7 changes: 0 additions & 7 deletions docs/api/datasets.rst

This file was deleted.

10 changes: 4 additions & 6 deletions docs/api/index.rst
@@ -9,17 +9,15 @@ The API reference contains detailed descriptions of the different end-user class
This API reference only contains end-user documentation.
If you are looking to hack away at scArches' internals, you will find more detailed comments in the source code.

Import scArches as::
Import scarches as::

import scarches as sca

After reading the data (``sca.data.read``), you can normalize your data with our ``sca.data.normalize_hvg`` function.
Then, you can instantiate one of the implemented models from ``sca.models`` module (currently we support ``scArches``,
``scArches``, ``scArchesNB``, and ``scArchesZINB``) and train it on your dataset. Finally, after training a model on your task, You can
share your trained model via ``sca.zenodo`` functions. Multiple examples are provided in `here`.
After reading the data (``sca.data.read``), you can instantiate one of the implemented models from the ``sca.models`` module (currently we support ``trVAE``,
``scVI``, ``scANVI``, and ``TotalVI``) and train it on your dataset.

.. toctree::
:glob:
:maxdepth: 2

*
*
33 changes: 30 additions & 3 deletions docs/api/models.rst
@@ -1,12 +1,39 @@
Models
======

* `scArches`_
* `trVAE`_
* `scVI`_
* `scANVI`_
* `TotalVI`_

scArches
trVAE
-----

.. autoclass:: scarches.models.TRVAE
:members:
:undoc-members:
:show-inheritance:

scVI
----

.. autoclass:: scarches.models.SCVI
:members:
:undoc-members:
:show-inheritance:

scANVI
--------

.. autoclass:: scarches.models.SCANVI
:members:
:undoc-members:
:show-inheritance:

TotalVI
--------

.. autoclass:: scarches.models.scArches
.. autoclass:: scarches.models.TOTALVI
:members:
:undoc-members:
:show-inheritance:
7 changes: 0 additions & 7 deletions docs/api/utils.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/api/zenodo.rst
@@ -23,4 +23,4 @@ File Helpers
.. automodule:: scarches.zenodo.file
:members:
:undoc-members:
:show-inheritance:
:show-inheritance:
16 changes: 7 additions & 9 deletions docs/conf.py
@@ -34,11 +34,11 @@

notebooks_url = 'https://github.com/theislab/scarches/raw/master/notebooks/'
notebooks = [
'zenodo_pancreas_from_pretrained.ipynb',
'zenodo_pancreas_from_scratch.ipynb',
'pancreas_pipeline.ipynb',
'zenodo_intestine.ipynb'

'scanvi_surgery_pipeline.ipynb',
'scvi_surgery_pipeline.ipynb',
'totalvi_surgery_pipeline.ipynb',
'trvae_surgery_pipeline.ipynb',
'trVAE_zenodo_pipeline.ipynb'
]

for nb in notebooks:
@@ -50,11 +50,9 @@
# -- Project information -----------------------------------------------------

project = 'scArches'
copyright = f'{datetime.now():%Y}, Mohsen Naghipourfar, Mohammad Lotfollahi'
author = 'Mohsen Naghipourfar, Mohammad Lotfollahi'
author = 'Marco Wagenstetter, Mohammad Lotfollahi, Mohsen Naghipourfar, Sergei Rybakov'
copyright = f'{datetime.now():%Y}, ' + author

# version = scarches.__version__
# release = version
pygments_style = 'sphinx'
todo_include_todos = True
html_theme_options = dict(navigation_depth=3, titles_only=False)