Export ensemble (#22)
* first commit export-ensemble

* added function to export single base learner as Python code

* move export_as_file method up to baselearnerorigin

* write specific meta_feature_generator and cv_source as well

* add stacker.py containing class for XcessivStackedEnsemble. used solely for exporting an ensemble as a Python package.

* add fit and _process_using_meta_feature_generator methods to stacker

* fix print statements

* add temporary builder for ensemble.py

* add more components for building ensemble.py

* add more components to builder source

* finish building builder.py

* add explanatory comments

* add explanatory comments

* add exporting your stacked ensemble to docs

* add exporting your stacked ensemble to docs

* add "Exporting your stacked ensemble" to docs

* update pypi to 0.3.5 and fix python3 string stuff
reiinakano authored Jun 4, 2017
1 parent ac3fec6 commit d9e6c5f
Showing 10 changed files with 430 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/thirdparty.rst
@@ -157,7 +157,7 @@ Once they're in Xcessiv, TPOT pipelines are just regular base learners you can t

Create and finalize a preset Logistic Regression base learner. We'll use this to stack the base learners together.

Let's begin by stacking together the two highest performers. the ExtraTreesClassifier and the KNeighborsClassifier without the original features. Right off the bat, cross-validating on the secondary meta-features yields an accuracy of 0.9975.
Let's begin by stacking together the two highest performers, the ExtraTreesClassifier and the KNeighborsClassifier without the original features. Right off the bat, cross-validating on the secondary meta-features yields an accuracy of 0.9975.

Going further, let's see if adding the less effective (on its own) Linear SVM will prove useful to our small ensemble. Running it, we get an even better 0.9992 accuracy.

43 changes: 43 additions & 0 deletions docs/walkthrough.rst
@@ -427,3 +427,46 @@ Here's a complete list of what happens when Xcessiv creates a new ensemble. Note
And that's it! Try experimenting with more base learners, appending the original features to the meta-features, and even changing the type of your secondary learner. Push that accuracy up as high as you possibly can!

Normally, it would take a lot of extraneous code just to set things up and keep track of everything you try, but Xcessiv takes care of all the dirty work so you can focus solely on the important thing: constructing your ultimate ensemble.

Exporting your stacked ensemble
-------------------------------

Let's say that after trying out different stacked ensemble combinations, you think you've found the one. It wouldn't be very useful if you didn't have a way to use it on other data to generate predictions. Xcessiv offers a way to convert any stacked ensemble into an importable Python package. Click on the export icon of your chosen ensemble, and enter a unique package name to save your package as.

Give your package a unique name that conforms to Python package naming conventions. For example, we obviously wouldn't want to name our package "numpy" or "my.package". In this walkthrough, we might save our package as "DigitsDataEnsemble1".
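A quick, hypothetical way to sanity-check a candidate name before exporting — ``is_reasonable_package_name`` is not part of Xcessiv, just a sketch of the two rules above (must be a valid Python identifier, and must not shadow an already-importable package):

```python
import importlib.util

def is_reasonable_package_name(name):
    """Hypothetical helper: check a candidate export package name."""
    if not name.isidentifier():
        return False  # e.g. "my.package" contains a dot
    if importlib.util.find_spec(name) is not None:
        return False  # already importable, e.g. "numpy" on most systems
    return True

print(is_reasonable_package_name('my.package'))  # False
print(is_reasonable_package_name('DigitsDataEnsemble1'))
```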

On successful export, Xcessiv will automatically save your package inside your project folder.

Your ensemble can then be imported from the ``DigitsDataEnsemble1`` package like this::

    # Make sure DigitsDataEnsemble1 is importable
    from DigitsDataEnsemble1 import xcessiv_ensemble

``xcessiv_ensemble`` will then contain a stacked ensemble instance with the methods ``get_params``, ``set_params``, ``fit``, and the ensemble's secondary learner's meta-feature generator method. For example, if your secondary learner's meta-feature generator method is ``predict``, you'll be able to call :func:`xcessiv_ensemble.predict` after fitting.

Here's an example of how you'd normally use an imported ensemble::

    from DigitsDataEnsemble1 import xcessiv_ensemble

    # Fit all base learners and secondary learner on training data
    xcessiv_ensemble.fit(X_train, y_train)

    # Generate some predictions on test/unseen data
    predictions = xcessiv_ensemble.predict(X_test)

The most common use cases for ``xcessiv_ensemble`` will involve using a method other than the configured meta-feature generator. Take the case of using :class:`sklearn.linear_model.LogisticRegression` as our secondary learner. :class:`sklearn.linear_model.LogisticRegression` has both :func:`predict` and :func:`predict_proba` methods, but if our meta-feature generator is set to :func:`predict`, Xcessiv doesn't know that :func:`predict_proba` exists, and only :func:`xcessiv_ensemble.predict` will be a valid method. For these cases, ``xcessiv_ensemble`` exposes a method :func:`_process_using_meta_feature_generator` that you can use in the following way::

    from DigitsDataEnsemble1 import xcessiv_ensemble

    # Fit all base learners and secondary learner on training data
    xcessiv_ensemble.fit(X_train, y_train)

    # Generate some prediction probabilities on test/unseen data
    probas = xcessiv_ensemble._process_using_meta_feature_generator(X_test, 'predict_proba')

You'll notice that ``xcessiv_ensemble`` follows the **scikit-learn** interface for estimators. That means you'll be able to use it as its own standalone base learner. If you're crazy enough, you can even try *stacking together already stacked ensembles*. For now, the recommended way of quickly adding your stacked ensemble as a separate base learner is to write something like this in your base learner setup::

    # Make sure DigitsDataEnsemble1 is importable
    from DigitsDataEnsemble1 import xcessiv_ensemble

    base_learner = xcessiv_ensemble
2 changes: 1 addition & 1 deletion setup.py
@@ -33,7 +33,7 @@ def run_tests(self):

setup(
    name='xcessiv',
    version='0.3.4',
    version='0.3.5',
    url='https://github.com/reiinakano/xcessiv',
    license='Apache License 2.0',
    author='Reiichiro Nakano',
2 changes: 1 addition & 1 deletion xcessiv/__init__.py
@@ -2,7 +2,7 @@
from flask import Flask


__version__ = '0.3.4'
__version__ = '0.3.5'


app = Flask(__name__, static_url_path='/static', static_folder='ui/build/static')
122 changes: 122 additions & 0 deletions xcessiv/models.py
@@ -194,6 +194,28 @@ def cleanup(self, path):
        for learner in self.base_learners:
            learner.cleanup(path)

    def export_as_file(self, filepath, hyperparameters):
        """Generates a Python file with the importable base learner set to ``hyperparameters``

        This function generates a Python file in the specified file path that contains
        the base learner as an importable variable stored in ``base_learner``. The base
        learner will be set to the appropriate hyperparameters through ``set_params``.

        Args:
            filepath (str, unicode): File path to save file in

            hyperparameters (dict): Dictionary to use for ``set_params``
        """
        if not filepath.endswith('.py'):
            filepath += '.py'

        file_contents = ''
        file_contents += self.source
        file_contents += '\n\nbase_learner.set_params(**{})\n'.format(hyperparameters)
        file_contents += '\nmeta_feature_generator = "{}"\n'.format(self.meta_feature_generator)
        with open(filepath, 'wb') as f:
            f.write(file_contents.encode('utf8'))
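To make the output concrete, here is a hypothetical file as ``export_as_file`` would write it — the estimator and hyperparameters are illustrative stand-ins for whatever the origin's ``source`` and ``hyperparameters`` actually contain:

```python
# Hypothetical contents of an exported module such as baselearners/baselearner0.py.
# First comes the base learner origin's source code, verbatim...
from sklearn.ensemble import RandomForestClassifier

base_learner = RandomForestClassifier()

# ...then the two lines appended by export_as_file:
base_learner.set_params(**{'n_estimators': 100, 'random_state': 8})

meta_feature_generator = "predict_proba"
```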


class AutomatedRun(Base):
"""This table contains initialized/completed automated hyperparameter searches"""
@@ -315,6 +337,18 @@ def cleanup(self, path):
"""
self.delete_meta_features(path)

def export_as_file(self, filepath):
"""Generates a Python file with the importable base learner
This function generates a Python file in the specified file path that contains
the base learner as an importable variable stored in ``base_learner``. The base
learner will be set to the appropriate hyperparameters through ``set_params``.
Args:
filepath (str, unicode): File path to save file in
"""
self.base_learner_origin.export_as_file(filepath, self.hyperparameters)


class StackedEnsemble(Base):
"""This table contains StackedEnsembles created in the xcessiv notebook"""
@@ -356,6 +390,94 @@ def return_secondary_learner(self):
        estimator = estimator.set_params(**self.secondary_learner_hyperparameters)
        return estimator

    def export_as_package(self, package_path, cv_source):
        """Exports the ensemble as a Python package and saves it to ``package_path``.

        Args:
            package_path (str, unicode): Absolute/local path of place to save package in

            cv_source (str, unicode): String containing actual code for base learner
                cross-validation used to generate secondary meta-features.

        Raises:
            exceptions.UserError: If ``package_path`` already exists.
        """
        if os.path.exists(package_path):
            raise exceptions.UserError('{} already exists'.format(package_path))

        package_name = os.path.basename(os.path.normpath(package_path))

        os.makedirs(package_path)

        # Write __init__.py
        with open(os.path.join(package_path, '__init__.py'), 'wb') as f:
            f.write('from {}.builder import xcessiv_ensemble'.format(package_name).encode('utf8'))

        # Create package baselearners with each base learner having its own module
        os.makedirs(os.path.join(package_path, 'baselearners'))
        open(os.path.join(package_path, 'baselearners', '__init__.py'), 'a').close()
        for idx, base_learner in enumerate(self.base_learners):
            base_learner.export_as_file(os.path.join(package_path,
                                                     'baselearners',
                                                     'baselearner' + str(idx)))

        # Create metalearner.py containing secondary learner
        self.base_learner_origin.export_as_file(
            os.path.join(package_path, 'metalearner'),
            self.secondary_learner_hyperparameters
        )

        # Create cv.py containing CV method for getting meta-features
        with open(os.path.join(package_path, 'cv.py'), 'wb') as f:
            f.write(cv_source.encode('utf8'))

        # Create stacker.py containing class for Xcessiv ensemble
        ensemble_source = ''
        stacker_file_loc = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'stacker.py')
        with open(stacker_file_loc) as f:
            ensemble_source += f.read()

        ensemble_source += '\n\n' \
                           '    def {}(self, X):\n' \
                           '        return self._process_using_' \
                           'meta_feature_generator(X, "{}")\n\n'\
            .format(self.base_learner_origin.meta_feature_generator,
                    self.base_learner_origin.meta_feature_generator)

        with open(os.path.join(package_path, 'stacker.py'), 'wb') as f:
            f.write(ensemble_source.encode('utf8'))

        # Create builder.py containing file where `xcessiv_ensemble` is instantiated for import
        builder_source = ''

        for idx, base_learner in enumerate(self.base_learners):
            builder_source += 'from {}.baselearners import baselearner{}\n'.format(package_name, idx)

        builder_source += 'from {}.cv import return_splits_iterable\n'.format(package_name)

        builder_source += 'from {} import metalearner\n'.format(package_name)

        builder_source += 'from {}.stacker import XcessivStackedEnsemble\n'.format(package_name)

        builder_source += '\nbase_learners = [\n'
        for idx, base_learner in enumerate(self.base_learners):
            builder_source += '    baselearner{}.base_learner,\n'.format(idx)
        builder_source += ']\n'

        builder_source += '\nmeta_feature_generators = [\n'
        for idx, base_learner in enumerate(self.base_learners):
            builder_source += '    baselearner{}.meta_feature_generator,\n'.format(idx)
        builder_source += ']\n'

        builder_source += '\nxcessiv_ensemble = XcessivStackedEnsemble(base_learners=base_learners,' \
                          ' meta_feature_generators=meta_feature_generators,' \
                          ' secondary_learner=metalearner.base_learner,' \
                          ' cv_function=return_splits_iterable,' \
                          ' append_original={})\n'.format(self.append_original)

        with open(os.path.join(package_path, 'builder.py'), 'wb') as f:
            f.write(builder_source.encode('utf8'))
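Taken together, for a package exported as ``DigitsDataEnsemble1`` with two base learners, ``export_as_package`` would lay out a directory roughly like this (the package name is illustrative):

```
DigitsDataEnsemble1/
├── __init__.py          # exposes xcessiv_ensemble from builder
├── baselearners/
│   ├── __init__.py
│   ├── baselearner0.py
│   └── baselearner1.py
├── metalearner.py       # secondary learner set to its hyperparameters
├── cv.py                # return_splits_iterable used for meta-feature CV
├── stacker.py           # XcessivStackedEnsemble class
└── builder.py           # instantiates xcessiv_ensemble for import
```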

    @property
    def serialize(self):
        return dict(
112 changes: 112 additions & 0 deletions xcessiv/stacker.py
@@ -0,0 +1,112 @@
from __future__ import absolute_import, print_function, division, unicode_literals
from sklearn.pipeline import _BasePipeline
import numpy as np


class XcessivStackedEnsemble(_BasePipeline):
    """Contains the class for the Xcessiv stacked ensemble"""
    def __init__(self, base_learners, meta_feature_generators,
                 secondary_learner, cv_function, append_original):
        super(XcessivStackedEnsemble, self).__init__()

        self.base_learners = base_learners
        self.meta_feature_generators = meta_feature_generators
        self.secondary_learner = secondary_learner
        self.cv_function = cv_function
        self.append_original = append_original
        self._named_learners = [('bl{}'.format(idx), base_learner) for idx, base_learner
                                in enumerate(base_learners)]
        self._named_learners.append(('secondary-learner', secondary_learner))

    def get_params(self, deep=True):
        """Get parameters for this estimator.

        Args:
            deep (boolean, optional): If True, will return the parameters for this estimator and
                contained subobjects that are estimators.

        Returns:
            params (mapping of string to any): Parameter names mapped to their values.
        """
        return self._get_params('_named_learners', deep=deep)

    def set_params(self, **params):
        """Set the parameters of this estimator."""
        self._set_params('_named_learners', **params)
        return self

    def fit(self, X, y):
        print('Fitting {} base learners'.format(len(self.base_learners)))

        all_learner_meta_features = []
        for idx, base_learner in enumerate(self.base_learners):

            single_learner_meta_features = []
            test_indices = []
            for num, (train_idx, test_idx) in enumerate(self.cv_function(X, y)):
                print('Fold {} of base learner {}'.format(num+1, idx+1))

                base_learner.fit(X[train_idx], y[train_idx])

                preds = getattr(base_learner, self.meta_feature_generators[idx])(X[test_idx])

                if len(preds.shape) == 1:
                    preds = preds.reshape(-1, 1)

                single_learner_meta_features.append(
                    preds
                )

                test_indices.append(test_idx)

            single_learner_meta_features = np.concatenate(single_learner_meta_features)
            all_learner_meta_features.append(single_learner_meta_features)

        all_learner_meta_features = np.concatenate(all_learner_meta_features, axis=1)
        test_indices = np.concatenate(test_indices)  # reorganized order due to CV

        print('Fitting meta-learner')

        if self.append_original:
            all_learner_meta_features = np.concatenate(
                (all_learner_meta_features, X[test_indices]),
                axis=1
            )

        self.secondary_learner.fit(all_learner_meta_features, y[test_indices])

        return self

    def _process_using_meta_feature_generator(self, X, meta_feature_generator):
        """Process using secondary learner meta-feature generator

        Since the secondary learner meta-feature generator can be anything e.g. predict,
        predict_proba, this internal method gives the ability to use any string. Just make
        sure the secondary learner has the method.

        Args:
            X (array-like): Features array

            meta_feature_generator (str, unicode): Method for use by secondary learner
        """

        all_learner_meta_features = []
        for idx, base_learner in enumerate(self.base_learners):
            single_learner_meta_features = getattr(base_learner,
                                                   self.meta_feature_generators[idx])(X)

            if len(single_learner_meta_features.shape) == 1:
                single_learner_meta_features = single_learner_meta_features.reshape(-1, 1)
            all_learner_meta_features.append(single_learner_meta_features)

        all_learner_meta_features = np.concatenate(all_learner_meta_features, axis=1)
        if self.append_original:
            all_learner_meta_features = np.concatenate(
                (all_learner_meta_features, X),
                axis=1
            )

        out = getattr(self.secondary_learner, meta_feature_generator)(all_learner_meta_features)

        return out
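The fit logic above — out-of-fold meta-features from each base learner, then a secondary learner trained on them — can be sketched with plain scikit-learn pieces. The dataset, base learners, and CV settings below are illustrative stand-ins, not what any particular exported ensemble uses:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two illustrative base learners and their meta-feature generator methods
base_learners = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
meta_feature_generators = ['predict_proba', 'predict_proba']

def return_splits_iterable(X, y):
    # Stand-in for the cv.py source an exported package would contain
    return KFold(n_splits=5, shuffle=True, random_state=8).split(X, y)

all_learner_meta_features = []
for idx, base_learner in enumerate(base_learners):
    single_learner_meta_features = []
    test_indices = []
    for train_idx, test_idx in return_splits_iterable(X, y):
        # Fit on the training fold, generate meta-features on the held-out fold
        base_learner.fit(X[train_idx], y[train_idx])
        preds = getattr(base_learner, meta_feature_generators[idx])(X[test_idx])
        if preds.ndim == 1:
            preds = preds.reshape(-1, 1)
        single_learner_meta_features.append(preds)
        test_indices.append(test_idx)
    all_learner_meta_features.append(np.concatenate(single_learner_meta_features))

meta_features = np.concatenate(all_learner_meta_features, axis=1)
test_indices = np.concatenate(test_indices)  # row order follows the CV folds

# Secondary learner trained on the out-of-fold meta-features
secondary_learner = LogisticRegression(max_iter=1000)
secondary_learner.fit(meta_features, y[test_indices])

print(meta_features.shape)  # (150, 6): 3 classes x 2 base learners
```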
53 changes: 52 additions & 1 deletion xcessiv/ui/src/Ensemble/EnsembleMoreDetailsModal.js
@@ -1,7 +1,8 @@
import React, {Component} from 'react';
import './Ensemble.css';
import 'react-select/dist/react-select.css';
import { Modal, Panel, Button, Alert } from 'react-bootstrap';
import { Modal, Panel, Button, Alert, Form,
FormGroup, ControlLabel, FormControl } from 'react-bootstrap';


function DisplayError(props) {
@@ -104,4 +105,54 @@ export class DeleteModal extends Component {
}
}

export class ExportModal extends Component {
  constructor(props) {
    super(props);
    this.state = {
      name: ''
    };
  }

  handleYesAndClose() {
    this.props.handleYes(this.state.name);
    this.props.onRequestClose();
  }

  render() {

    return (
      <Modal
        show={this.props.isOpen}
        onHide={this.props.onRequestClose}
      >
        <Modal.Header closeButton>
          <Modal.Title>Export ensemble as Python package</Modal.Title>
        </Modal.Header>
        <Modal.Body>
          <Form onSubmit={(e) => {
            e.preventDefault();
            this.handleYesAndClose();
          }}>
            <FormGroup
              controlId='name'
            >
              <ControlLabel>Name to use as package name</ControlLabel>
              <FormControl
                value={this.state.name}
                onChange={(evt) => this.setState({name: evt.target.value})}
              />
            </FormGroup>
          </Form>
        </Modal.Body>
        <Modal.Footer>
          <Button bsStyle='primary' onClick={() => this.handleYesAndClose()}>
            Save
          </Button>
          <Button onClick={this.props.onRequestClose}>Cancel</Button>
        </Modal.Footer>
      </Modal>
    )
  }
}

export default DetailsModal;
