pytorch version

theislab · Dec 14, 2020 · dc8801e
1 parent 51f9ef4
commit dc8801e
Showing 89 changed files with 7,122 additions and 6,240 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,28 +1,6 @@
-
-#developing directory
-/models/
-/checkpoint
-/dist/
-/test/data/
-/examples/
-/results/
-/data/
-
-#OS or Editor files and folders
-.DS_Store
-Thumbs.db
-.ipynb_checkpoints/
-.directory
-/.idea/
-
-# Python / Byte-compiled / optimized / DLL
+datasets/
+.idea/
 __pycache__/
-*.py[cod]
-*.so
-.cache
-*.h5ad
-#others
-*.pdf
-*.zip
-#testing modules
-.pytest_cache/
+docs/_build
+own_tests/
+*.egg-info
diff --git a/.readthedocs.yml b/.readthedocs.yml
@@ -3,7 +3,7 @@
 version: 2
 
 python:
-   version: 3.8
+   version: 3.6
    install:
       - requirements: docs/requirements.txt
       - method: setuptools
@@ -17,4 +17,4 @@ sphinx:
 
 formats:
     - epub
-    - pdf
+    - pdf
diff --git a/.travis.yml b/.travis.yml
@@ -2,11 +2,13 @@ language: python
 dist: xenial
 cache: pip
 python:
+  - "3.6"
+  - "3.7"
   - "3.8"
 
 install:
   - pip install -r requirements.txt
   - python setup.py install
 
 script:
-  - PYTHONPATH=. pytest
+  - PYTHONPATH=. pytest
diff --git a/README.rst b/README.rst
@@ -1,19 +1,57 @@
-|PyPI| |travis| |Docs| |PyPIDownloads|
+|PyPI| |PyPIDownloads| |Docs| |travis|
 
-scArches - single-cell architecture surgery
+scArches (PyTorch) - single-cell architecture surgery
 =========================================================================
 .. raw:: html
 
  <img src="https://user-images.githubusercontent.com/33202701/89729020-15f7c200-da32-11ea-989b-1b9a3283f642.png" width="900px" align="center">
 
-scArches is a package to integrate newly produced single-cell datasets into integrated reference atlases. Our method can facilitate large collaborative projects with decentralise training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_, and hosts efficient implementations of all conditional generative models for single-cell data.
+This is a PyTorch version of scArches, which can be found `here <https://github.com/theislab/scArches/>`_. scArches is a package to integrate newly produced single-cell datasets into integrated reference atlases. Our method can facilitate large collaborative projects with decentralised training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_, and hosts efficient implementations of all conditional generative models for single-cell data.
+
+
 
 What can you do with scArches?
 -------------------------------
-- Integrate many single-cell datasets and share the trained model and the data (if possible).
-- Download a pre-trained model for your atlas of interest, update it with new datasets and share with your collaborators.
-- Construct a customized reference by downloading a reference atlas, add a few  pre-trained adaptors (datasets) and project your own data in to this customized reference atlas.
-- Project and integrate query datasets on the top of a reference and use latent representation for downstream tasks, e.g.: diff testing, clustering.
+- Construct single- or multi-modal (CITE-seq) reference atlases and share the trained model and the data (if possible).
+- Download a pre-trained model for your atlas of interest, update it with new datasets and share with your collaborators.
+- Project and integrate query datasets on top of a reference and use the latent representation for downstream tasks, e.g., differential testing, clustering, and classification.
+
+
+What are different models?
+--------------------------
+scArches is itself an algorithm for projecting query datasets on top of reference datasets and is applicable
+to different models. Here we provide a short explanation and hints on when to use which model. Our models are divided into
+three categories:
+
+
+What are different models?
+--------------------------
+scArches is itself an algorithm for projecting query datasets on top of reference datasets and is applicable
+to different models. Here we provide a short explanation and hints on when to use which model. Our models are divided into
+three categories:
+
+Unsupervised
+ This class of algorithms needs no `cell type` labels, meaning that you can create a reference and project a query without access to cell type labels.
+ We implemented two algorithms:
+
+ - **scVI** (`Lopez et al., 2018 <https://www.nature.com/articles/s41592-018-0229-2>`_): Requires raw count values for data integration and assumes a
+   count distribution for the data (NB, ZINB, Poisson).
+
+ - **trVAE** (`Lotfollahi et al., 2019 <https://arxiv.org/abs/1910.01791>`_): Supports either normalized, log-transformed data or count data as input and applies an additional MMD loss for better merging in the latent space.
+
+Supervised and Semi-supervised
+ This class of algorithms assumes the user has access to `cell type` labels when creating the reference data and usually performs better integration
+ than unsupervised methods. However, the query data can still be unlabeled. In addition to integration, you can classify your query cells using
+ these methods.
+
+ - **scANVI** (`Xu et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): Needs cell type labels for the reference data. Your query data can be either unlabeled or labeled. With unlabeled query data, you can also use this method to classify your query cells using the reference labels.
+
+Multi-modal
+ These algorithms can be used to construct multi-modal reference atlases and map query data from either modality on top of the reference.
+
+ - **totalVI** (`Gayoso et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): This model can be used to build multi-modal CITE-seq reference atlases.
+   Query datasets can be from either scRNA-seq or CITE-seq. In addition to integrating the query with the reference, one can use this model to impute proteins
+   in the query datasets.
 
 Usage and installation
 -------------------------------
@@ -22,7 +60,7 @@ See `here <https://scarches.readthedocs.io/>`_ for documentation and tutorials.
 Support and contribute
 -------------------------------
 If you have a question or new architecture or a model that could be integrated into our pipeline, you can
-post an `issue <https://github.com/theislab/scarches/issues/new>`__. Our package supports tf/keras now but pytorch version will be added very soon.
+post an `issue <https://github.com/theislab/scarches/issues/new>`__ or reach us by `email <mailto:cottoneyejoe.server@gmail.com,mo.lotfollahi@gmail.com,mohsen.naghipourfar@gmail.com>`_.
 
 
 Reference

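As a reading aid for the README hunk above, the model guidance it adds can be condensed into a small lookup. This is only a sketch of that guidance: ``suggest_models`` and its parameter names are hypothetical and are not part of the scarches API.

```python
def suggest_models(reference_has_labels, multimodal, raw_counts):
    """Return candidate model names following the README's three categories.

    Hypothetical helper for illustration only; it encodes the README text,
    not any behaviour of the scarches package itself.
    """
    if multimodal:
        # Multi-modal: CITE-seq reference atlases.
        return ["totalVI"]
    if reference_has_labels:
        # Supervised / semi-supervised: the reference carries cell type labels.
        return ["scANVI"]
    if raw_counts:
        # Unsupervised: scVI requires raw counts; trVAE also accepts count data.
        return ["scVI", "trVAE"]
    # Unsupervised with normalized, log-transformed input only.
    return ["trVAE"]


if __name__ == "__main__":
    print(suggest_models(reference_has_labels=False, multimodal=False, raw_counts=True))
```

The multi-modal check comes first because the README treats it as its own category, independent of whether labels are available.
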
diff --git a/__init__.py b/__init__.py
@@ -0,0 +1 @@
+from . import scarches
diff --git a/docs/about.rst b/docs/about.rst
@@ -1,43 +1,66 @@
 |PyPI| |travis| |Docs|
 
-scArches - single-cell architecture surgery
+scArches (PyTorch) - single-cell architecture surgery
 =========================================================================
 .. raw:: html
 
  <img src="https://user-images.githubusercontent.com/33202701/89729020-15f7c200-da32-11ea-989b-1b9a3283f642.png" width="700px" align="center">
 
+scArches is a package to integrate newly produced single-cell datasets into integrated reference atlases. Our method can facilitate large collaborative projects with decentralised training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_ and hosts efficient implementations of all conditional generative models for single-cell data.
 
 
-
-scArches is a package to integrate newly produced single-cell datasets into integrated references atlases. Our method can facilitate large collaborative projects with decentralise training and integration of multiple datasets by different groups. scArches is compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/>`_. and hosts efficient implementations of all conditional generative models for single-cell data. 
-
 What can you do with scArches?
---------------------------------
-- Integrate many single-cell datasets and share the trained model and the data (if possible).
+-------------------------------
+- Construct single- or multi-modal (CITE-seq) reference atlases and share the trained model and the data (if possible).
 - Download a pre-trained model for your atlas of interest, update it wih new datasets and share with your collaborators.
-- Construct a customized reference by downloading a reference atlas, add a few  pre-trained adaptors (datasets) and project your own data in to this customized reference atlas.
-- Project and integrate query datasets on the top of a reference and use latent representation for downstream tasks, e.g.:diff testing, clustering.
+- Project and integrate query datasets on top of a reference and use the latent representation for downstream tasks, e.g., differential testing, clustering, and classification.
 
-Where to start?
---------------------------------
 
+What are different models?
+--------------------------
+scArches is itself an algorithm for projecting query datasets on top of reference datasets and is applicable
+to different models. Here we provide a short explanation and hints on when to use which model. Our models are divided into
+three categories:
 
-To get a sense of how the model works please go through `this <https://scarches.readthedocs.io/en/latest/pancreas_pipeline.html>`_ example.
-For examples on how to use or construct and share pre-trained models check examples.
 
-What is an adaptor?
---------------------------------
-.. raw:: html
+What are different models?
+--------------------------
+scArches is itself an algorithm for projecting query datasets on top of reference datasets and is applicable
+to different models. Here we provide a short explanation and hints on when to use which model. Our models are divided into
+three categories:
+
+Unsupervised
+ This class of algorithms needs no `cell type` labels, meaning that you can create a reference and project a query without access to cell type labels.
+ We implemented two algorithms:
+
+ - **scVI** (`Lopez et al., 2018 <https://www.nature.com/articles/s41592-018-0229-2>`_): Requires raw count values for data integration and assumes a
+   count distribution for the data (NB, ZINB, Poisson).
 
-    <img src="https://user-images.githubusercontent.com/33202701/89730296-bdc6bd00-da3d-11ea-9012-410e22fa200a.png" width="200px" align="right">
+ - **trVAE** (`Lotfollahi et al., 2019 <https://arxiv.org/abs/1910.01791>`_): Supports either normalized, log-transformed data or count data as input and applies an additional MMD loss for better merging in the latent space.
 
-In scArche, each query datasets is added to the reference model by training a set of weights called `adaptor`.
-Each `adaptor` is a sharable object. This will enable users to download a reference model, customise
-that reference model with a set of `adaptors` (datasets) and finally add user data as a new
-`adaptor` and also share this adaptor for others.
+Supervised and Semi-supervised
+ This class of algorithms assumes the user has access to `cell type` labels when creating the reference data and usually performs better integration
+ than unsupervised methods. However, the query data can still be unlabeled. In addition to integration, you can classify your query cells using
+ these methods.
 
+ - **scANVI** (`Xu et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): Needs cell type labels for the reference data. Your query data can be either unlabeled or labeled. With unlabeled query data, you can also use this method to classify your query cells using the reference labels.
 
+Multi-modal
+ These algorithms can be used to construct multi-modal reference atlases and map query data from either modality on top of the reference.
 
+ - **totalVI** (`Gayoso et al., 2019 <https://www.biorxiv.org/content/10.1101/532895v1>`_): This model can be used to build multi-modal CITE-seq reference atlases.
+   Query datasets can be from either scRNA-seq or CITE-seq. In addition to integrating the query with the reference, one can use this model to impute proteins
+   in the query datasets.
+
+
+Where to start?
+---------------
+To get a sense of how the model works please go through `this <https://scarches.readthedocs.io/en/latest/pancreas_pipeline.html>`__ tutorial.
+To find out how to construct, share, or use pre-trained models, see the example sections. Check `this <https://scarches.readthedocs.io/en/latest/zenodo_intestine.html>`__ example to learn how to start with raw data and pre-process it for the model.
+
+Reference
+-------------------------------
+If scArches is useful in your research, please consider citing the `preprint <https://www.biorxiv.org/content/10.1101/2020.07.16.205997v1/>`_.
 
 
 .. |PyPI| image:: https://img.shields.io/pypi/v/scarches.svg
@@ -51,3 +74,4 @@ that reference model with a set of `adaptors` (datasets) and finally add user da
 
 .. |travis| image:: https://travis-ci.com/theislab/scarches.svg?branch=master
     :target: https://travis-ci.com/theislab/scarches
+
diff --git a/docs/api/data.rst → docs/api/dataset.rst b/docs/api/data.rst → docs/api/dataset.rst
@@ -1,7 +1,7 @@
 Data Processing
 ===============
 
-.. automodule:: scarches.data
+.. automodule:: scarches.dataset
     :members:
     :undoc-members:
     :show-inheritance:
diff --git a/docs/api/datasets.rst b/docs/api/datasets.rst
diff --git a/docs/api/index.rst b/docs/api/index.rst
@@ -9,17 +9,15 @@ The API reference contains detailed descriptions of the different end-user class
     This API reference only contains end-user documentation.
     If you are looking to hack away at scArches' internals, you will find more detailed comments in the source code.
 
-Import scArches as::
+Import scarches as::
 
     import scarches as sca
 
-After reading the data (``sca.data.read``), you can normalize your data with our ``sca.data.normalize_hvg`` function.
-Then, you can instantiate one of the implemented models from ``sca.models`` module (currently we support ``scArches``,
-``scArches``, ``scArchesNB``, and ``scArchesZINB``) and train it on your dataset. Finally, after training a model on your task, You can
-share your trained model via ``sca.zenodo`` functions. Multiple examples are provided in `here`.
+After reading the data (``sca.data.read``), you can instantiate one of the implemented models from the ``sca.models`` module (currently we support ``trVAE``,
+``scVI``, ``scANVI``, and ``TotalVI``) and train it on your dataset.
 
 .. toctree::
     :glob:
     :maxdepth: 2
 
-    *
+    *
diff --git a/docs/api/models.rst b/docs/api/models.rst
@@ -1,12 +1,39 @@
 Models
 ======
 
-* `scArches`_
+* `trVAE`_
+* `scVI`_
+* `scANVI`_
+* `TotalVI`_
 
-scArches
+trVAE
+-----
+
+.. autoclass:: scarches.models.TRVAE
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+scVI
+----
+
+.. autoclass:: scarches.models.SCVI
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+scANVI
+--------
+
+.. autoclass:: scarches.models.SCANVI
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+TotalVI
 --------
 
-.. autoclass:: scarches.models.scArches
+.. autoclass:: scarches.models.TOTALVI
     :members:
     :undoc-members:
     :show-inheritance:
diff --git a/docs/api/utils.rst b/docs/api/utils.rst
diff --git a/docs/api/zenodo.rst b/docs/api/zenodo.rst
@@ -23,4 +23,4 @@ File Helpers
 .. automodule:: scarches.zenodo.file
     :members:
     :undoc-members:
-    :show-inheritance:
+    :show-inheritance:
diff --git a/docs/conf.py b/docs/conf.py
@@ -34,11 +34,11 @@
 
 notebooks_url = 'https://github.com/theislab/scarches/raw/master/notebooks/'
 notebooks = [
-    'zenodo_pancreas_from_pretrained.ipynb',
-    'zenodo_pancreas_from_scratch.ipynb',
-    'pancreas_pipeline.ipynb',
-    'zenodo_intestine.ipynb'
-
+    'scanvi_surgery_pipeline.ipynb',
+    'scvi_surgery_pipeline.ipynb',
+    'totalvi_surgery_pipeline.ipynb',
+    'trvae_surgery_pipeline.ipynb',
+    'trVAE_zenodo_pipeline.ipynb'
 ]
 
 for nb in notebooks:
@@ -50,11 +50,9 @@
 # -- Project information -----------------------------------------------------
 
 project = 'scArches'
-copyright = f'{datetime.now():%Y}, Mohsen Naghipourfar, Mohammad Lotfollahi'
-author = 'Mohsen Naghipourfar, Mohammad Lotfollahi'
+author = 'Marco Wagenstetter, Mohammad Lotfollahi, Mohsen Naghipourfar, Sergei Rybakov'
+copyright = f'{datetime.now():%Y}, ' + author
 
-# version = scarches.__version__
-# release = version
 pygments_style = 'sphinx'
 todo_include_todos = True
 html_theme_options = dict(navigation_depth=3, titles_only=False)