Merge branch 'master' into issue-1293

gregcaporaso · Jun 7, 2016 · bfcbde2
2 parents 30d73c7 + f627315
Showing 202 changed files with 5,067 additions and 4,848 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -5,7 +5,6 @@ language: python
 env:
   - PYTHON_VERSION=3.5 MAKE_DOC=TRUE
   - PYTHON_VERSION=3.4 USE_CYTHON=TRUE
-  - PYTHON_VERSION=2.7
 before_install:
   - "export DISPLAY=:99.0"
   - "sh -e /etc/init.d/xvfb start"

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,18 +2,51 @@
 
 ## Version 0.4.2-dev (changes since 0.4.2 go here)
 
+**IMPORTANT**: scikit-bio is no longer compatible with Python 2. scikit-bio is compatible with Python 3.4 and later.
+
 ### Features
+* Added support for reference tags to `skbio.io.format.stockholm` reader and writer. ([#1348](https://github.com/biocore/scikit-bio/issues/1348))
+* Expanded error message in `skbio.io.format.stockholm` reader when `constructor` is not passed, in order to provide better explanation to user. ([#1327](https://github.com/biocore/scikit-bio/issues/1327))
 * Added `skbio.sequence.distance.kmer_distance` for computing the kmer distance between two sequences. ([#913](https://github.com/biocore/scikit-bio/issues/913))
+* Added `skbio.sequence.Sequence.replace` for assigning a character to positions in a `Sequence`. ([#1222](https://github.com/biocore/scikit-bio/issues/1222))
+* Added support for `pandas.RangeIndex`, lowering the memory footprint of default integer index objects. `Sequence.positional_metadata` and `TabularMSA.positional_metadata` now use `pd.RangeIndex` as the positional metadata index. `TabularMSA` now uses `pd.RangeIndex` as the default index. Usage of `pd.RangeIndex` over the previous `pd.Int64Index` [should be transparent](http://pandas.pydata.org/pandas-docs/version/0.18.0/whatsnew.html#range-index), so these changes should be non-breaking to users. scikit-bio now depends on pandas >= 0.18.0 ([#1308](https://github.com/biocore/scikit-bio/issues/1308))
+* Added `reset_index=False` parameter to `TabularMSA.append` and `TabularMSA.extend` for resetting the MSA's index to the default index after appending/extending.
+* Added support for partial pairwise calculations via `skbio.diversity.partial_beta_diversity`. ([#1221](https://github.com/biocore/scikit-bio/issues/1221), [#1337](https://github.com/biocore/scikit-bio/pull/1337)). This function is immediately deprecated as its return type will change in the future and should be used with caution in its present form (see the function's documentation for details).
+* `TemporaryFile` and `NamedTemporaryFile` are now supported IO sources for `skbio.io` and related functionality.  ([#1291](https://github.com/biocore/scikit-bio/issues/1291))
+* Added `tree_node_class=TreeNode` parameter to `skbio.tree.majority_rule` to support returning consensus trees of type `TreeNode` (the default) or a type that has the same interface as `TreeNode` (e.g. `TreeNode` subclasses) ([#1193](https://github.com/biocore/scikit-bio/pull/1193))
+* `TreeNode.from_linkage_matrix` and `TreeNode.from_taxonomy` now support constructing `TreeNode` subclasses. `TreeNode.bifurcate` now supports `TreeNode` subclasses ([#1193](https://github.com/biocore/scikit-bio/pull/1193))
 
 ### Backward-incompatible changes [stable]
 
 ### Backward-incompatible changes [experimental]
+* `TabularMSA.append` and `TabularMSA.extend` now require one of `minter`, `index`, or `reset_index` to be provided when incorporating new sequences into an MSA. Previous behavior was to auto-increment the index labels if `minter` and `index` weren't provided and the MSA had a default integer index, otherwise error. Use `reset_index=True` to obtain the previous behavior in a more explicit way.
 
 ### Bug fixes
+* Fixed row and column names of `biplot_scores` in the `OrdinationResults` object from `skbio.stats.ordination`. This fix affects the `cca` and `rda` methods. ([#1322](https://github.com/biocore/scikit-bio/issues/1322))
+* Fixed bug when using `skbio.io.format.stockholm` reader on file with multi-line tree with no id. Previously this raised an `AttributeError`; now this type of tree is handled correctly. ([#1334](https://github.com/biocore/scikit-bio/issues/1334))
+* Fixed bug when reading Stockholm files with GF or GS features split over multiple lines. Previously, the feature text was simply concatenated because it was assumed to have trailing whitespace. There are examples of Stockholm files with and without trailing whitespace for multi-line features, so the `skbio.io.format.stockholm` reader now adds a single space when concatenating feature text without trailing whitespace to avoid joining words together. Multi-line trees stored as GF metadata are concatenated as they appear in the file; a space is not added when concatenating. ([#1328](https://github.com/biocore/scikit-bio/issues/1328))
 * Fixed bug when using `Sequence.iter_kmers` on empty `Sequence` object. Previously this raised a `ValueError`, now it returns
 an empty generator.
+* Fixed minor bug where adding sequences to an empty `TabularMSA` with MSA-wide `positional_metadata` would result in a `TabularMSA` object in an inconsistent state. This could happen using `TabularMSA.append` or `TabularMSA.extend`. This bug only affects a `TabularMSA` object *without* sequences that has MSA-wide `positional_metadata` (for example, `TabularMSA([], positional_metadata={'column': []})`).
+* `TreeNode.distance` now handles the situation in which `self` or `other` are ancestors. Previously, a node further up the tree was used, resulting in inflated distances. ([#807](https://github.com/biocore/scikit-bio/issues/807))
+* `TreeNode.prune` can now handle a root with a single descendant. Previously, the possibility of the root having a single descendant was ignored. ([#1247](https://github.com/biocore/scikit-bio/issues/1247))
+* Providing the `format` keyword to `skbio.io.read` when creating a generator with an empty file will now return an empty generator instead of raising `StopIteration`. ([#1313](https://github.com/biocore/scikit-bio/issues/1313))
+
+### Deprecated functionality [stable]
+* Deprecated use of the term "non-degenerate", in favor of "definite". `GrammaredSequence.nondegenerate_chars`, `GrammaredSequence.nondegenerates`, and `GrammaredSequence.has_nondegenerates` have been renamed to `GrammaredSequence.definite_chars`, `GrammaredSequence.definites`, and `GrammaredSequence.has_definites`, respectively. The old names will be removed in scikit-bio 0.5.2. Relevant affected public classes include `GrammaredSequence`, `DNA`, `RNA`, and `Protein`.
+* Deprecated `Sequence.has_metadata` and `TabularMSA.has_metadata` methods, which will be removed in scikit-bio 0.5.2. Use `bool(obj.metadata)` to determine if the metadata dict is empty.
+* Deprecated `Sequence.has_positional_metadata` and `TabularMSA.has_positional_metadata` methods, which will be removed in scikit-bio 0.5.2. Use `len(obj.positional_metadata.columns)` to determine if positional metadata columns are present, or `obj.positional_metadata.empty` to determine if the positional metadata DataFrame is empty (empty index OR empty columns).
+
+### Deprecated functionality [experimental]
+* Deprecated function `skbio.util.create_dir`. This function will be removed in scikit-bio 0.5.1. Please use the Python standard library
+functionality described [here](https://docs.python.org/2/library/os.html#os.makedirs). ([#833](https://github.com/biocore/scikit-bio/issues/833))
+* Deprecated function `skbio.util.remove_files`. This function will be removed in scikit-bio 0.5.1. Please use the Python standard library
+functionality described [here](https://docs.python.org/2/library/os.html#os.remove). ([#833](https://github.com/biocore/scikit-bio/issues/833))
+* Deprecated function `skbio.util.is_casava_v180_or_later`. This function will be removed in 0.5.1. Functionality moved to FASTQ sniffer.
+([#833](https://github.com/biocore/scikit-bio/issues/833))
 
 ### Miscellaneous
+* When installing scikit-bio via `pip`, numpy must now be installed first ([#1296](https://github.com/biocore/scikit-bio/issues/1296))
 
 ## Version 0.4.2 (2016-02-17)
 

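The experimental deprecations above point users of `skbio.util.create_dir` and `skbio.util.remove_files` at `os.makedirs` and `os.remove`. The following is a minimal sketch of what that migration might look like using only the standard library. The helper names `ensure_dir` and `remove_paths` and the `ignore_missing` parameter are hypothetical, introduced here for illustration; they are not part of scikit-bio and do not claim to reproduce the deprecated functions' exact signatures or error handling.

```python
import os
import tempfile


def ensure_dir(path):
    """Create ``path`` and any missing parents; do nothing if it exists.

    Hypothetical helper, not a scikit-bio API.
    """
    # exist_ok=True (available since Python 3.2) suppresses the error
    # that os.makedirs would otherwise raise for an existing directory.
    os.makedirs(path, exist_ok=True)


def remove_paths(paths, ignore_missing=True):
    """Remove each file in ``paths`` with os.remove.

    Hypothetical helper, not a scikit-bio API. Missing files are skipped
    unless ``ignore_missing`` is False.
    """
    for path in paths:
        try:
            os.remove(path)
        except FileNotFoundError:
            if not ignore_missing:
                raise


# Usage: work inside a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a", "b")
    ensure_dir(target)
    ensure_dir(target)  # second call is a no-op
    fp = os.path.join(target, "example.txt")
    with open(fp, "w") as fh:
        fh.write("demo")
    remove_paths([fp, fp])  # second entry is already gone and is skipped
```

Because scikit-bio now requires Python 3.4 or later, `exist_ok` and `FileNotFoundError` are both available without compatibility shims.
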
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -55,8 +55,6 @@ Submitting code to scikit-bio
 
 scikit-bio is hosted on [GitHub](http://www.github.com), and we use GitHub's [Pull Request](https://help.github.com/articles/using-pull-requests) mechanism for reviewing and accepting submissions. You should work through the following steps to submit code to scikit-bio.
 
-**Note:** We recommend developing scikit-bio in a Python 3 environment because doctests must be written (and pass) in Python 3. See [Setting up a development environment](#setting-up-a-development-environment).
-
 1. Begin by [creating an issue](https://github.com/biocore/scikit-bio/issues) describing your proposed change (see [Types of contributions](#types-of-contributions) for details).
 
 2. [Fork](https://help.github.com/articles/fork-a-repo) the scikit-bio repository on the GitHub website.
@@ -109,11 +107,13 @@ scikit-bio is hosted on [GitHub](http://www.github.com), and we use GitHub's [Pu
 Setting up a development environment
 ------------------------------------
 
+**Note:** scikit-bio must be developed in a Python 3.4 or later environment.
+
 The recommended way to set up a development environment for contributing to scikit-bio is using [Anaconda](https://store.continuum.io/cshop/anaconda/) by Continuum Analytics, with its associated command line utility `conda`. The primary benefit of `conda` over `pip` is that on some operating systems (e.g. Linux), `pip` installs packages from source. This can take a very long time to install NumPy, SciPy, matplotlib, etc. `conda` installs these packages using pre-built binaries, so the installation is much faster. Another benefit of `conda` is that it provides both package and environment management, which removes the necessity of using `virtualenv` separately. Not all packages are available using `conda`, therefore our strategy is to install as many packages as possible using `conda`, then install any remaining packages using `pip`.
 
 1. Install Anaconda
 
- See [Continuum's site](https://store.continuum.io/cshop/anaconda/) for instructions. [Miniconda](http://conda.pydata.org/docs/install/quick.html) provides a great fast way to get conda up and running.
+ See [Continuum's site](https://store.continuum.io/cshop/anaconda/) for instructions. [Miniconda](http://conda.pydata.org/docs/install/quick.html) provides a fast way to get conda up and running.
 
 2. Create a new conda environment
  ```
@@ -169,14 +169,13 @@ Testing guidelines
 
 All code that is added to scikit-bio must be unit tested, and the unit test code must be submitted in the same pull request as the library code that you are submitting. We will only merge code that is unit tested and that passes the [continuous integration build](https://github.com/biocore/scikit-bio/blob/master/.travis.yml). This build includes, but is not limited to, the following checks:
 
-- Full unit test suite executes without errors in Python 2 and 3.
-- Doctests execute correctly in Python 3.
+- Full unit test suite and doctests execute without errors in supported versions of Python 3.
 - C code can be correctly compiled.
 - Cython code is correctly generated.
 - All tests import functionality from the appropriate minimally deep API.
 - Documentation can be built.
 - Current code coverage is maintained or improved.
-- Code passes ``pep8``/``flake8`` checks.
+- Code passes ``flake8`` checks.
 
 Running ``make test`` locally during development will include a subset of the full checks performed by Travis-CI.
 

diff --git a/README.rst b/README.rst
@@ -3,32 +3,41 @@
    :target: http://scikit-bio.org
    :alt: scikit-bio logo
 
-|Build Status| |Coverage Status| |Gitter Badge|
+|Build Status| |Coverage Status| |ASV Benchmarks| |Gitter Badge| |Depsy Badge| |Anaconda Cloud Build| |Anaconda Cloud| |License| |Downloads| |Install|
 
-scikit-bio is an open-source, BSD-licensed Python package providing data structures, algorithms and educational resources for bioinformatics.
+scikit-bio is an open-source, BSD-licensed Python 3 package providing data structures, algorithms and educational resources for bioinformatics.
 
 To view scikit-bio's documentation, visit `scikit-bio.org
 <http://scikit-bio.org>`__.
 
+**Note:** scikit-bio is no longer compatible with Python 2. scikit-bio is compatible with Python 3.4 and later.
+
 scikit-bio is currently in beta. We are very actively developing it, and **backward-incompatible interface changes can and will arise**. To avoid these types of changes being a surprise to our users, our public APIs are decorated to make it clear to users when an API can be relied upon (stable) and when it may be subject to change (experimental). See the `API stability docs <https://github.com/biocore/scikit-bio/blob/master/doc/source/user/api_stability.rst>`_ for more details, including what we mean by *stable* and *experimental* in this context.
 
 Installing
 ----------
 
-To install the latest release of scikit-bio::
+The recommended way to install scikit-bio is via the ``conda`` package manager available in `Anaconda <http://continuum.io/downloads>`_ or `miniconda <http://conda.pydata.org/miniconda.html>`_.
 
-    pip install scikit-bio
-
-Equivalently, you can use the ``conda`` package manager available in `Anaconda <http://continuum.io/downloads>`_ or `miniconda <http://conda.pydata.org/miniconda.html>`_ to install scikit-bio and its dependencies without having to compile them::
+To install the latest release of scikit-bio::
 
     conda install -c https://conda.anaconda.org/biocore scikit-bio
 
-Finally, most of scikit-bio's dependencies (in particular, the ones that are trickier to build) are also available, albeit only for Python 2, in `Canopy Express <https://www.enthought.com/canopy-express/>`_.
+Alternatively, you can install scikit-bio using ``pip``::
+
+    pip install numpy
+    pip install scikit-bio
 
 You can verify your installation by running the scikit-bio unit tests::
 
     python -m skbio.test
 
+For users of Debian, ``skbio`` is in the Debian software distribution and may
+be installed using::
+
+    sudo apt-get install python3-skbio python-skbio-doc
+
+
 Getting help
 ------------
 
@@ -102,6 +111,20 @@ scikit-bio's logo was created by `Alina Prassas <http://cargocollective.com/alin
    :target: https://travis-ci.org/biocore/scikit-bio
 .. |Coverage Status| image:: https://coveralls.io/repos/biocore/scikit-bio/badge.png
    :target: https://coveralls.io/r/biocore/scikit-bio
+.. |ASV Benchmarks| image:: http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat
+   :target: https://s3-us-west-2.amazonaws.com/scikit-bio.org/benchmarks/master/index.html
 .. |Gitter Badge| image:: https://badges.gitter.im/Join%20Chat.svg
    :alt: Join the chat at https://gitter.im/biocore/scikit-bio
    :target: https://gitter.im/biocore/scikit-bio?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge
+.. |Depsy Badge| image:: http://depsy.org/api/package/pypi/scikit-bio/badge.svg
+   :target: http://depsy.org/package/python/scikit-bio
+.. |Anaconda Cloud Build| image:: https://anaconda.org/biocore/scikit-bio/badges/build.svg
+   :target: https://anaconda.org/biocore/scikit-bio/builds
+.. |Anaconda Cloud| image:: https://anaconda.org/biocore/scikit-bio/badges/version.svg
+   :target: https://anaconda.org/biocore/scikit-bio
+.. |License| image:: https://anaconda.org/biocore/scikit-bio/badges/license.svg
+   :target: https://anaconda.org/biocore/scikit-bio
+.. |Downloads| image:: https://anaconda.org/biocore/scikit-bio/badges/downloads.svg
+   :target: https://anaconda.org/biocore/scikit-bio
+.. |Install| image:: https://anaconda.org/biocore/scikit-bio/badges/installer/conda.svg
+   :target: https://conda.anaconda.org/biocore