Commit

merge upstream/master
simonjayhawkins committed Feb 10, 2019
1 parent b12f658 commit 0e581ad
Showing 313 changed files with 8,137 additions and 7,393 deletions.
10 changes: 5 additions & 5 deletions .github/CONTRIBUTING.md
@@ -8,16 +8,16 @@ Our main contributing guide can be found [in this repo](https://github.com/panda

If you are looking to contribute to the *pandas* codebase, the best place to start is the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues). This is also a great place for filing bug reports and making suggestions for ways in which we can improve the code and documentation.

If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#where-to-start)" section.
If you have additional questions, feel free to ask them on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas). Further information can also be found in the "[Where to start?](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#where-to-start)" section.

## Filing Issues

If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#bug-reports-and-enhancement-requests)" section.
If you notice a bug in the code or documentation, or have suggestions for how we can improve either, feel free to create an issue on the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) using [GitHub's "issue" form](https://github.com/pandas-dev/pandas/issues/new). The form contains some questions that will help us best address your issue. For more information regarding how to file issues against *pandas*, please refer to the "[Bug reports and enhancement requests](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#bug-reports-and-enhancement-requests)" section.

## Contributing to the Codebase

The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#working-with-the-code)" section.
The code is hosted on [GitHub](https://www.github.com/pandas-dev/pandas), so you will need to use [Git](http://git-scm.com/) to clone the project and make changes to the codebase. Once you have obtained a copy of the code, you should create a development environment that is separate from your existing Python environment so that you can make and test changes without compromising your own work environment. For more information, please refer to the "[Working with the code](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#working-with-the-code)" section.
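For readers who want to try this locally, a minimal sketch of that setup follows; the fork URL, environment file name, and environment name are assumptions for illustration, not values taken from this commit.

```bash
# clone your fork and track the upstream repository
git clone https://github.com/<your-username>/pandas.git
cd pandas
git remote add upstream https://github.com/pandas-dev/pandas.git

# create and activate an isolated environment (assumes conda and an
# environment.yml file at the repository root)
conda env create -f environment.yml
conda activate pandas-dev

# editable install; this also builds the C extensions
python -m pip install -e .
```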

Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#code-standards)" section.
Before submitting your changes for review, make sure to check that your changes do not break any tests. You can find more information about our test suites in the "[Test-driven development/code writing](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#test-driven-development-code-writing)" section. We also have guidelines regarding coding style that will be enforced during testing, which can be found in the "[Code standards](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#code-standards)" section.
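As a rough local approximation of what CI runs, something like the following could be used; the test path is just an example, and the `lint`/`patterns` modes are the ones visible in `ci/code_checks.sh` later in this commit.

```bash
# run a focused subset of the test suite
pytest -q pandas/tests/frame

# run the style and pattern checks enforced during testing
./ci/code_checks.sh lint
./ci/code_checks.sh patterns
```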

Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
Once your changes are ready to be submitted, make sure to push your changes to GitHub before creating a pull request. Details about how to do that can be found in the "[Contributing your changes to pandas](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst#contributing-your-changes-to-pandas)" section. We will review your changes, and you will most likely be asked to make additional changes before it is finally ready to merge. However, once it's ready, we will merge it, and you will have successfully contributed to the codebase!
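A minimal sketch of that final step (the branch name is a placeholder):

```bash
git push origin my-bugfix-branch   # placeholder branch name
# then open a pull request against pandas-dev/pandas on GitHub
```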
4 changes: 2 additions & 2 deletions .gitignore
@@ -101,14 +101,14 @@ asv_bench/pandas/
# Documentation generated files #
#################################
doc/source/generated
doc/source/api/generated
doc/source/user_guide/styled.xlsx
doc/source/reference/api
doc/source/_static
doc/source/vbench
doc/source/vbench.rst
doc/source/index.rst
doc/build/html/index.html
# Windows specific leftover:
doc/tmp.sv
doc/source/styled.xlsx
env/
doc/source/savefig/
1 change: 0 additions & 1 deletion Makefile
@@ -23,4 +23,3 @@ doc:
cd doc; \
python make.py clean; \
python make.py html
python make.py spellcheck
1 change: 1 addition & 0 deletions asv_bench/benchmarks/__init__.py
@@ -0,0 +1 @@
"""Pandas benchmarks."""
3 changes: 1 addition & 2 deletions asv_bench/benchmarks/algorithms.py
@@ -5,7 +5,6 @@
import pandas as pd
from pandas.util import testing as tm


for imp in ['pandas.util', 'pandas.tools.hashing']:
try:
hashing = import_module(imp)
@@ -142,4 +141,4 @@ def time_quantile(self, quantile, interpolation, dtype):
self.idx.quantile(quantile, interpolation=interpolation)


from .pandas_vb_common import setup # noqa: F401
from .pandas_vb_common import setup # noqa: F401 isort:skip
19 changes: 13 additions & 6 deletions asv_bench/benchmarks/categoricals.py
@@ -223,12 +223,19 @@ class CategoricalSlicing(object):

def setup(self, index):
N = 10**6
values = list('a' * N + 'b' * N + 'c' * N)
indices = {
'monotonic_incr': pd.Categorical(values),
'monotonic_decr': pd.Categorical(reversed(values)),
'non_monotonic': pd.Categorical(list('abc' * N))}
self.data = indices[index]
categories = ['a', 'b', 'c']
values = [0] * N + [1] * N + [2] * N
if index == 'monotonic_incr':
self.data = pd.Categorical.from_codes(values,
categories=categories)
elif index == 'monotonic_decr':
self.data = pd.Categorical.from_codes(list(reversed(values)),
categories=categories)
elif index == 'non_monotonic':
self.data = pd.Categorical.from_codes([0, 1, 2] * N,
categories=categories)
else:
raise ValueError('Invalid index param: {}'.format(index))

self.scalar = 10000
self.list = list(range(10000))
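To exercise a change like this locally, the updated benchmark class can be run on its own with asv. The flags mirror the CI invocation added to `azure-pipelines.yml` in this commit; the `--bench` filter value is an assumption about the benchmark's full name.

```bash
cd asv_bench
asv machine --yes
asv run --quick --show-stderr --python=same \
    --bench categoricals.CategoricalSlicing
```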
2 changes: 1 addition & 1 deletion asv_bench/benchmarks/ctors.py
@@ -72,7 +72,7 @@ class SeriesDtypesConstructors(object):

def setup(self):
N = 10**4
self.arr = np.random.randn(N, N)
self.arr = np.random.randn(N)
self.arr_str = np.array(['foo', 'bar', 'baz'], dtype=object)
self.s = Series([Timestamp('20110101'), Timestamp('20120101'),
Timestamp('20130101')] * N * 10)
3 changes: 2 additions & 1 deletion asv_bench/benchmarks/index_object.py
@@ -138,7 +138,8 @@ def setup(self, dtype):
self.sorted = self.idx.sort_values()
half = N // 2
self.non_unique = self.idx[:half].append(self.idx[:half])
self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half])
self.non_unique_sorted = (self.sorted[:half].append(self.sorted[:half])
.sort_values())
self.key = self.sorted[N // 4]

def time_boolean_array(self, dtype):
4 changes: 2 additions & 2 deletions asv_bench/benchmarks/strings.py
@@ -102,10 +102,10 @@ def setup(self, repeats):
N = 10**5
self.s = Series(tm.makeStringIndex(N))
repeat = {'int': 1, 'array': np.random.randint(1, 3, N)}
self.repeat = repeat[repeats]
self.values = repeat[repeats]

def time_repeat(self, repeats):
self.s.str.repeat(self.repeat)
self.s.str.repeat(self.values)


class Cat(object):
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -104,7 +104,7 @@ jobs:
if git diff upstream/master --name-only | grep -q "^asv_bench/"; then
cd asv_bench
asv machine --yes
ASV_OUTPUT="$(asv dev)"
ASV_OUTPUT="$(asv run --quick --show-stderr --python=same --launch-method=spawn)"
if [[ $(echo "$ASV_OUTPUT" | grep "failed") ]]; then
echo "##vso[task.logissue type=error]Benchmarks run with errors"
echo "$ASV_OUTPUT"
13 changes: 7 additions & 6 deletions ci/code_checks.sh
@@ -93,7 +93,7 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
# this particular codebase (e.g. src/headers, src/klib, src/msgpack). However,
# we can lint all header files since they aren't "generated" like C files are.
MSG='Linting .c and .h' ; echo $MSG
cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime
cpplint --quiet --extensions=c,h --headers=h --recursive --filter=-readability/casting,-runtime/int,-build/include_subdir pandas/_libs/src/*.h pandas/_libs/src/parser pandas/_libs/ujson pandas/_libs/tslibs/src/datetime pandas/io/msgpack pandas/_libs/*.cpp pandas/util
RET=$(($RET + $?)) ; echo $MSG "DONE"

echo "isort --version-number"
@@ -174,9 +174,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
MSG='Check that no file in the repo contains tailing whitespaces' ; echo $MSG
set -o pipefail
if [[ "$AZURE" == "true" ]]; then
! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Tailing whitespaces found: " $3}'
# we exclude all c/cpp files as the c/cpp files of pandas code base are tested when Linting .c and .h files
! grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print "##vso[task.logissue type=error;sourcepath=" $1 ";linenumber=" $2 ";] Tailing whitespaces found: " $3}'
else
! grep -n --exclude="*.svg" -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Tailing whitespaces found: " $3}'
! grep -n '--exclude=*.'{svg,c,cpp,html} -RI "\s$" * | awk -F ":" '{print $1 ":" $2 ":Tailing whitespaces found: " $3}'
fi
RET=$(($RET + $?)) ; echo $MSG "DONE"
fi
@@ -206,7 +207,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then

MSG='Doctests frame.py' ; echo $MSG
pytest -q --doctest-modules pandas/core/frame.py \
-k"-axes -combine -itertuples -join -pivot_table -query -reindex -reindex_axis -round"
-k" -itertuples -join -reindex -reindex_axis -round"
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Doctests series.py' ; echo $MSG
@@ -240,8 +241,8 @@ fi
### DOCSTRINGS ###
if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then

MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, EX04)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04
MSG='Validate docstrings (GL06, GL07, GL09, SS04, PR03, PR05, PR10, EX04, RT04, SS05, SA05)' ; echo $MSG
$BASE_DIR/scripts/validate_docstrings.py --format=azure --errors=GL06,GL07,GL09,SS04,PR03,PR05,EX04,RT04,SS05,SA05
RET=$(($RET + $?)) ; echo $MSG "DONE"

fi
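While iterating locally, the same script can be pointed at a single docstring; the function name below is only an example, and the single-docstring mode is an assumption about `validate_docstrings.py` rather than something shown in this diff.

```bash
scripts/validate_docstrings.py pandas.DataFrame.head
```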
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet.pdf
Binary file not shown.
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet.pptx
Binary file not shown.
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet_JA.pdf
Binary file not shown.
Binary file modified doc/cheatsheet/Pandas_Cheat_Sheet_JA.pptx
Binary file not shown.
86 changes: 80 additions & 6 deletions doc/make.py
@@ -15,15 +15,18 @@
import sys
import os
import shutil
import csv
import subprocess
import argparse
import webbrowser
import docutils
import docutils.parsers.rst


DOC_PATH = os.path.dirname(os.path.abspath(__file__))
SOURCE_PATH = os.path.join(DOC_PATH, 'source')
BUILD_PATH = os.path.join(DOC_PATH, 'build')
BUILD_DIRS = ['doctrees', 'html', 'latex', 'plots', '_static', '_templates']
REDIRECTS_FILE = os.path.join(DOC_PATH, 'redirects.csv')


class DocBuilder:
@@ -50,7 +53,7 @@ def __init__(self, num_jobs=0, include_api=True, single_doc=None,
if single_doc and single_doc.endswith('.rst'):
self.single_doc_html = os.path.splitext(single_doc)[0] + '.html'
elif single_doc:
self.single_doc_html = 'api/generated/pandas.{}.html'.format(
self.single_doc_html = 'reference/api/pandas.{}.html'.format(
single_doc)

def _process_single_doc(self, single_doc):
@@ -60,7 +63,7 @@ def _process_single_doc(self, single_doc):
For example, categorial.rst or pandas.DataFrame.head. For the latter,
return the corresponding file path
(e.g. generated/pandas.DataFrame.head.rst).
(e.g. reference/api/pandas.DataFrame.head.rst).
"""
base_name, extension = os.path.splitext(single_doc)
if extension in ('.rst', '.ipynb'):
@@ -118,8 +121,6 @@ def _sphinx_build(self, kind):
raise ValueError('kind must be html or latex, '
'not {}'.format(kind))

self.clean()

cmd = ['sphinx-build', '-b', kind]
if self.num_jobs:
cmd += ['-j', str(self.num_jobs)]
@@ -139,6 +140,77 @@ def _open_browser(self, single_doc_html):
single_doc_html)
webbrowser.open(url, new=2)

def _get_page_title(self, page):
"""
Open the rst file `page` and extract its title.
"""
fname = os.path.join(SOURCE_PATH, '{}.rst'.format(page))
option_parser = docutils.frontend.OptionParser(
components=(docutils.parsers.rst.Parser,))
doc = docutils.utils.new_document(
'<doc>',
option_parser.get_default_values())
with open(fname) as f:
data = f.read()

parser = docutils.parsers.rst.Parser()
# do not generate any warning when parsing the rst
with open(os.devnull, 'a') as f:
doc.reporter.stream = f
parser.parse(data, doc)

section = next(node for node in doc.children
if isinstance(node, docutils.nodes.section))
title = next(node for node in section.children
if isinstance(node, docutils.nodes.title))

return title.astext()

def _add_redirects(self):
"""
Create in the build directory an html file with a redirect,
for every row in REDIRECTS_FILE.
"""
html = '''
<html>
<head>
<meta http-equiv="refresh" content="0;URL={url}"/>
</head>
<body>
<p>
The page has been moved to <a href="{url}">{title}</a>
</p>
</body>
<html>
'''
with open(REDIRECTS_FILE) as mapping_fd:
reader = csv.reader(mapping_fd)
for row in reader:
if not row or row[0].strip().startswith('#'):
continue

path = os.path.join(BUILD_PATH,
'html',
*row[0].split('/')) + '.html'

try:
title = self._get_page_title(row[1])
except Exception:
# the file can be an ipynb and not an rst, or docutils
# may not be able to read the rst because it has some
# sphinx specific stuff
title = 'this page'

if os.path.exists(path):
raise RuntimeError((
'Redirection would overwrite an existing file: '
'{}').format(path))

with open(path, 'w') as moved_page_fd:
moved_page_fd.write(
html.format(url='{}.html'.format(row[1]),
title=title))

def html(self):
"""
Build HTML documentation.
@@ -150,6 +222,8 @@ def html(self):

if self.single_doc_html is not None:
self._open_browser(self.single_doc_html)
else:
self._add_redirects()
return ret_code

def latex(self, force=False):
@@ -184,7 +258,7 @@ def clean():
Clean documentation generated files.
"""
shutil.rmtree(BUILD_PATH, ignore_errors=True)
shutil.rmtree(os.path.join(SOURCE_PATH, 'api', 'generated'),
shutil.rmtree(os.path.join(SOURCE_PATH, 'reference', 'api'),
ignore_errors=True)

def zip_html(self):
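After a change like this, the documentation can be rebuilt locally so the redirect pages are generated; this simply follows the recipe from the `Makefile` shown earlier in this commit.

```bash
cd doc
python make.py clean
python make.py html
```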