Export ensemble (#22)
* first commit export-ensemble

* added function to export single base learner as Python code

* move export_as_file method up to baselearnerorigin

* write specific meta_feature_generator and cv_source as well

* add stacker.py containing class for XcessivStackedEnsemble. used solely for exporting an ensemble as a Python package.

* add fit and _process_using_meta_feature_generator methods to stacker

* fix print statements

* add temporary builder for ensemble.py

* add more components for building ensemble.py

* add more components to builder source

* finish building builder.py

* add explanatory comments

* add explanatory comments

* add exporting your stacked ensemble to docs

* add exporting your stacked ensemble to docs

* add "Exporting your stacked ensemble" to docs

* update pypi to 0.3.5 and fix python3 string stuff
reiinakano authored Jun 4, 2017
1 parent ac3fec6 commit d9e6c5f
Showing 10 changed files with 430 additions and 6 deletions.
2 changes: 1 addition & 1 deletion docs/thirdparty.rst
@@ -157,7 +157,7 @@ Once they're in Xcessiv, TPOT pipelines are just regular base learners you can t

Create and finalize a preset Logistic Regression base learner. We'll use this to stack the base learners together.

Let's begin by stacking together the two highest performers. the ExtraTreesClassifier and the KNeighborsClassifier without the original features. Right off the bat, cross-validating on the secondary meta-features yields an accuracy of 0.9975.
Let's begin by stacking together the two highest performers, the ExtraTreesClassifier and the KNeighborsClassifier without the original features. Right off the bat, cross-validating on the secondary meta-features yields an accuracy of 0.9975.

Going further, let's see if adding the less effective (on its own) Linear SVM will prove useful to our small ensemble. Running it, we get an even better 0.9992 accuracy.

43 changes: 43 additions & 0 deletions docs/walkthrough.rst
@@ -427,3 +427,46 @@ Here's a complete list of what happens when Xcessiv creates a new ensemble. Note
And that's it! Try experimenting with more base learners, appending the original features to the meta-features, and even changing the type of your secondary learner. Push that accuracy up as high as you possibly can!

Normally, it would take a lot of extraneous code just to set things up and keep track of everything you try, but Xcessiv takes care of all the dirty work so you can focus solely on the important thing: constructing your ultimate ensemble.

Exporting your stacked ensemble
-------------------------------

Let's say that after trying out different stacked ensemble combinations, you think you've found the one. It wouldn't be very useful if you didn't have a way to use it on other data to generate predictions. Xcessiv offers a way to convert any stacked ensemble into an importable Python package. Click on the export icon of your chosen ensemble, and enter a unique package name to save your package as.

Give your package a unique name that conforms to Python package naming conventions. For example, we obviously wouldn't want to name our package "numpy" or "my.package". In this walkthrough, we might save our package as "DigitsDataEnsemble1".
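A quick, hypothetical way to sanity-check a candidate name before exporting — ``is_reasonable_package_name`` is not part of Xcessiv, just a sketch of the two rules above (must be a valid Python identifier, and must not shadow an already-importable package):

```python
import importlib.util

def is_reasonable_package_name(name):
    """Hypothetical helper: check a candidate export package name."""
    if not name.isidentifier():
        return False  # e.g. "my.package" contains a dot
    if importlib.util.find_spec(name) is not None:
        return False  # already importable, e.g. "numpy" on most systems
    return True

print(is_reasonable_package_name('my.package'))  # False
print(is_reasonable_package_name('DigitsDataEnsemble1'))
```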

On successful export, Xcessiv will automatically save your package inside your project folder.

Your ensemble can then be imported from the ``DigitsDataEnsemble1`` package like this::

    # Make sure DigitsDataEnsemble1 is importable
    from DigitsDataEnsemble1 import xcessiv_ensemble

``xcessiv_ensemble`` will then contain a stacked ensemble instance with the methods ``get_params``, ``set_params``, ``fit``, and the ensemble's secondary learner's meta-feature generator method. For example, if your secondary learner's meta-feature generator method is ``predict``, you'll be able to call :func:`xcessiv_ensemble.predict` after fitting.

Here's an example of how you'd normally use an imported ensemble::

    from DigitsDataEnsemble1 import xcessiv_ensemble

    # Fit all base learners and secondary learner on training data
    xcessiv_ensemble.fit(X_train, y_train)

    # Generate some predictions on test/unseen data
    predictions = xcessiv_ensemble.predict(X_test)

The most common use cases for ``xcessiv_ensemble`` will involve using a method other than the configured meta-feature generator. Take the case of using :class:`sklearn.linear_model.LogisticRegression` as our secondary learner. :class:`sklearn.linear_model.LogisticRegression` has both :func:`predict` and :func:`predict_proba` methods, but if our meta-feature generator is set to :func:`predict`, Xcessiv doesn't know that :func:`predict_proba` exists, and only :func:`xcessiv_ensemble.predict` will be a valid method. For these cases, ``xcessiv_ensemble`` exposes a method :func:`_process_using_meta_feature_generator` that you can use in the following way::

    from DigitsDataEnsemble1 import xcessiv_ensemble

    # Fit all base learners and secondary learner on training data
    xcessiv_ensemble.fit(X_train, y_train)

    # Generate some prediction probabilities on test/unseen data
    probas = xcessiv_ensemble._process_using_meta_feature_generator(X_test, 'predict_proba')

You'll notice that ``xcessiv_ensemble`` follows the **scikit-learn** interface for estimators. That means you'll be able to use it as its own standalone base learner. If you're crazy enough, you can even try *stacking together already stacked ensembles*. For now, the recommended way of quickly adding your stacked ensemble as a separate base learner is to write something like this in your base learner setup::

    # Make sure DigitsDataEnsemble1 is importable
    from DigitsDataEnsemble1 import xcessiv_ensemble

    base_learner = xcessiv_ensemble
2 changes: 1 addition & 1 deletion setup.py
@@ -33,7 +33,7 @@ def run_tests(self):

setup(
    name='xcessiv',
    version='0.3.4',
    version='0.3.5',
    url='https://github.com/reiinakano/xcessiv',
    license='Apache License 2.0',
    author='Reiichiro Nakano',
2 changes: 1 addition & 1 deletion xcessiv/__init__.py
@@ -2,7 +2,7 @@
from flask import Flask


__version__ = '0.3.4'
__version__ = '0.3.5'


app = Flask(__name__, static_url_path='/static', static_folder='ui/build/static')
122 changes: 122 additions & 0 deletions xcessiv/models.py
@@ -194,6 +194,28 @@ def cleanup(self, path):
        for learner in self.base_learners:
            learner.cleanup(path)

    def export_as_file(self, filepath, hyperparameters):
        """Generates a Python file with the importable base learner set to ``hyperparameters``

        This function generates a Python file in the specified file path that contains
        the base learner as an importable variable stored in ``base_learner``. The base
        learner will be set to the appropriate hyperparameters through ``set_params``.

        Args:
            filepath (str, unicode): File path to save file in

            hyperparameters (dict): Dictionary to use for ``set_params``
        """
        if not filepath.endswith('.py'):
            filepath += '.py'

        file_contents = ''
        file_contents += self.source
        file_contents += '\n\nbase_learner.set_params(**{})\n'.format(hyperparameters)
        file_contents += '\nmeta_feature_generator = "{}"\n'.format(self.meta_feature_generator)
        with open(filepath, 'wb') as f:
            f.write(file_contents.encode('utf8'))
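To make the output concrete, here is a hypothetical file as ``export_as_file`` would write it — the estimator and hyperparameters are illustrative stand-ins for whatever the origin's ``source`` and ``hyperparameters`` actually contain:

```python
# Hypothetical contents of an exported module such as baselearners/baselearner0.py.
# First comes the base learner origin's source code, verbatim...
from sklearn.ensemble import RandomForestClassifier

base_learner = RandomForestClassifier()

# ...then the two lines appended by export_as_file:
base_learner.set_params(**{'n_estimators': 100, 'random_state': 8})

meta_feature_generator = "predict_proba"
```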


class AutomatedRun(Base):
"""This table contains initialized/completed automated hyperparameter searches"""
@@ -315,6 +337,18 @@ def cleanup(self, path):
"""
self.delete_meta_features(path)

def export_as_file(self, filepath):
"""Generates a Python file with the importable base learner
This function generates a Python file in the specified file path that contains
the base learner as an importable variable stored in ``base_learner``. The base
learner will be set to the appropriate hyperparameters through ``set_params``.
Args:
filepath (str, unicode): File path to save file in
"""
self.base_learner_origin.export_as_file(filepath, self.hyperparameters)


class StackedEnsemble(Base):
"""This table contains StackedEnsembles created in the xcessiv notebook"""
@@ -356,6 +390,94 @@ def return_secondary_learner(self):
        estimator = estimator.set_params(**self.secondary_learner_hyperparameters)
        return estimator

    def export_as_package(self, package_path, cv_source):
        """Exports the ensemble as a Python package and saves it to ``package_path``.

        Args:
            package_path (str, unicode): Absolute/local path of place to save package in

            cv_source (str, unicode): String containing actual code for base learner
                cross-validation used to generate secondary meta-features.

        Raises:
            exceptions.UserError: If ``package_path`` already exists.
        """
        if os.path.exists(package_path):
            raise exceptions.UserError('{} already exists'.format(package_path))

        package_name = os.path.basename(os.path.normpath(package_path))

        os.makedirs(package_path)

        # Write __init__.py
        with open(os.path.join(package_path, '__init__.py'), 'wb') as f:
            f.write('from {}.builder import xcessiv_ensemble'.format(package_name).encode('utf8'))

        # Create package baselearners with each base learner having its own module
        os.makedirs(os.path.join(package_path, 'baselearners'))
        open(os.path.join(package_path, 'baselearners', '__init__.py'), 'a').close()
        for idx, base_learner in enumerate(self.base_learners):
            base_learner.export_as_file(os.path.join(package_path,
                                                     'baselearners',
                                                     'baselearner' + str(idx)))

        # Create metalearner.py containing secondary learner
        self.base_learner_origin.export_as_file(
            os.path.join(package_path, 'metalearner'),
            self.secondary_learner_hyperparameters
        )

        # Create cv.py containing CV method for getting meta-features
        with open(os.path.join(package_path, 'cv.py'), 'wb') as f:
            f.write(cv_source.encode('utf8'))

        # Create stacker.py containing class for Xcessiv ensemble
        ensemble_source = ''
        stacker_file_loc = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'stacker.py')
        with open(stacker_file_loc) as f:
            ensemble_source += f.read()

        ensemble_source += '\n\n' \
                           '    def {}(self, X):\n' \
                           '        return self._process_using_' \
                           'meta_feature_generator(X, "{}")\n\n'\
            .format(self.base_learner_origin.meta_feature_generator,
                    self.base_learner_origin.meta_feature_generator)

        with open(os.path.join(package_path, 'stacker.py'), 'wb') as f:
            f.write(ensemble_source.encode('utf8'))

        # Create builder.py containing file where `xcessiv_ensemble` is instantiated for import
        builder_source = ''

        for idx, base_learner in enumerate(self.base_learners):
            builder_source += 'from {}.baselearners import baselearner{}\n'.format(package_name, idx)

        builder_source += 'from {}.cv import return_splits_iterable\n'.format(package_name)

        builder_source += 'from {} import metalearner\n'.format(package_name)

        builder_source += 'from {}.stacker import XcessivStackedEnsemble\n'.format(package_name)

        builder_source += '\nbase_learners = [\n'
        for idx, base_learner in enumerate(self.base_learners):
            builder_source += '    baselearner{}.base_learner,\n'.format(idx)
        builder_source += ']\n'

        builder_source += '\nmeta_feature_generators = [\n'
        for idx, base_learner in enumerate(self.base_learners):
            builder_source += '    baselearner{}.meta_feature_generator,\n'.format(idx)
        builder_source += ']\n'

        builder_source += '\nxcessiv_ensemble = XcessivStackedEnsemble(base_learners=base_learners,' \
                          ' meta_feature_generators=meta_feature_generators,' \
                          ' secondary_learner=metalearner.base_learner,' \
                          ' cv_function=return_splits_iterable,' \
                          ' append_original={})\n'.format(self.append_original)

        with open(os.path.join(package_path, 'builder.py'), 'wb') as f:
            f.write(builder_source.encode('utf8'))
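Taken together, for a package exported as ``DigitsDataEnsemble1`` with two base learners, ``export_as_package`` would lay out a directory roughly like this (the package name is illustrative):

```
DigitsDataEnsemble1/
├── __init__.py          # exposes xcessiv_ensemble from builder
├── baselearners/
│   ├── __init__.py
│   ├── baselearner0.py
│   └── baselearner1.py
├── metalearner.py       # secondary learner set to its hyperparameters
├── cv.py                # return_splits_iterable used for meta-feature CV
├── stacker.py           # XcessivStackedEnsemble class
└── builder.py           # instantiates xcessiv_ensemble for import
```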

    @property
    def serialize(self):
        return dict(
112 changes: 112 additions & 0 deletions xcessiv/stacker.py
@@ -0,0 +1,112 @@
from __future__ import absolute_import, print_function, division, unicode_literals
from sklearn.pipeline import _BasePipeline
import numpy as np


class XcessivStackedEnsemble(_BasePipeline):
    """Contains the class for the Xcessiv stacked ensemble"""
    def __init__(self, base_learners, meta_feature_generators,
                 secondary_learner, cv_function, append_original):
        super(XcessivStackedEnsemble, self).__init__()

        self.base_learners = base_learners
        self.meta_feature_generators = meta_feature_generators
        self.secondary_learner = secondary_learner
        self.cv_function = cv_function
        self.append_original = append_original
        self._named_learners = [('bl{}'.format(idx), base_learner) for idx, base_learner
                                in enumerate(base_learners)]
        self._named_learners.append(('secondary-learner', secondary_learner))

    def get_params(self, deep=True):
        """Get parameters for this estimator.

        Args:
            deep (boolean, optional): If True, will return the parameters for this estimator and
                contained subobjects that are estimators.

        Returns:
            params (mapping of string to any): Parameter names mapped to their values.
        """
        return self._get_params('_named_learners', deep=deep)

    def set_params(self, **params):
        """Set the parameters of this estimator."""
        self._set_params('_named_learners', **params)
        return self

    def fit(self, X, y):
        print('Fitting {} base learners'.format(len(self.base_learners)))

        all_learner_meta_features = []
        for idx, base_learner in enumerate(self.base_learners):

            single_learner_meta_features = []
            test_indices = []
            for num, (train_idx, test_idx) in enumerate(self.cv_function(X, y)):
                print('Fold {} of base learner {}'.format(num+1, idx+1))

                base_learner.fit(X[train_idx], y[train_idx])

                preds = getattr(base_learner, self.meta_feature_generators[idx])(X[test_idx])

                if len(preds.shape) == 1:
                    preds = preds.reshape(-1, 1)

                single_learner_meta_features.append(
                    preds
                )

                test_indices.append(test_idx)

            single_learner_meta_features = np.concatenate(single_learner_meta_features)
            all_learner_meta_features.append(single_learner_meta_features)

        all_learner_meta_features = np.concatenate(all_learner_meta_features, axis=1)
        test_indices = np.concatenate(test_indices)  # reorganized order due to CV

        print('Fitting meta-learner')

        if self.append_original:
            all_learner_meta_features = np.concatenate(
                (all_learner_meta_features, X[test_indices]),
                axis=1
            )

        self.secondary_learner.fit(all_learner_meta_features, y[test_indices])

        return self

    def _process_using_meta_feature_generator(self, X, meta_feature_generator):
        """Process using secondary learner meta-feature generator

        Since the secondary learner meta-feature generator can be anything e.g. predict,
        predict_proba, this internal method gives the ability to use any string. Just make
        sure the secondary learner has the method.

        Args:
            X (array-like): Features array

            meta_feature_generator (str, unicode): Method for use by secondary learner
        """

        all_learner_meta_features = []
        for idx, base_learner in enumerate(self.base_learners):
            single_learner_meta_features = getattr(base_learner,
                                                   self.meta_feature_generators[idx])(X)

            if len(single_learner_meta_features.shape) == 1:
                single_learner_meta_features = single_learner_meta_features.reshape(-1, 1)
            all_learner_meta_features.append(single_learner_meta_features)

        all_learner_meta_features = np.concatenate(all_learner_meta_features, axis=1)
        if self.append_original:
            all_learner_meta_features = np.concatenate(
                (all_learner_meta_features, X),
                axis=1
            )

        out = getattr(self.secondary_learner, meta_feature_generator)(all_learner_meta_features)

        return out
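The fit logic above — out-of-fold meta-features from each base learner, then a secondary learner trained on them — can be sketched with plain scikit-learn pieces. The dataset, base learners, and CV settings below are illustrative stand-ins, not what any particular exported ensemble uses:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two illustrative base learners and their meta-feature generator methods
base_learners = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()]
meta_feature_generators = ['predict_proba', 'predict_proba']

def return_splits_iterable(X, y):
    # Stand-in for the cv.py source an exported package would contain
    return KFold(n_splits=5, shuffle=True, random_state=8).split(X, y)

all_learner_meta_features = []
for idx, base_learner in enumerate(base_learners):
    single_learner_meta_features = []
    test_indices = []
    for train_idx, test_idx in return_splits_iterable(X, y):
        # Fit on the training fold, generate meta-features on the held-out fold
        base_learner.fit(X[train_idx], y[train_idx])
        preds = getattr(base_learner, meta_feature_generators[idx])(X[test_idx])
        if preds.ndim == 1:
            preds = preds.reshape(-1, 1)
        single_learner_meta_features.append(preds)
        test_indices.append(test_idx)
    all_learner_meta_features.append(np.concatenate(single_learner_meta_features))

meta_features = np.concatenate(all_learner_meta_features, axis=1)
test_indices = np.concatenate(test_indices)  # row order follows the CV folds

# Secondary learner trained on the out-of-fold meta-features
secondary_learner = LogisticRegression(max_iter=1000)
secondary_learner.fit(meta_features, y[test_indices])

print(meta_features.shape)  # (150, 6): 3 classes x 2 base learners
```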
53 changes: 52 additions & 1 deletion xcessiv/ui/src/Ensemble/EnsembleMoreDetailsModal.js
@@ -1,7 +1,8 @@
import React, {Component} from 'react';
import './Ensemble.css';
import 'react-select/dist/react-select.css';
import { Modal, Panel, Button, Alert } from 'react-bootstrap';
import { Modal, Panel, Button, Alert, Form,
FormGroup, ControlLabel, FormControl } from 'react-bootstrap';


function DisplayError(props) {
@@ -104,4 +105,54 @@ export class DeleteModal extends Component {
}
}

export class ExportModal extends Component {
  constructor(props) {
    super(props);
    this.state = {
      name: ''
    };
  }

  handleYesAndClose() {
    this.props.handleYes(this.state.name);
    this.props.onRequestClose();
  }

  render() {

    return (
      <Modal
        show={this.props.isOpen}
        onHide={this.props.onRequestClose}
      >
        <Modal.Header closeButton>
          <Modal.Title>Export ensemble as Python package</Modal.Title>
        </Modal.Header>
        <Modal.Body>
          <Form onSubmit={(e) => {
            e.preventDefault();
            this.handleYesAndClose();
          }}>
            <FormGroup
              controlId='name'
            >
              <ControlLabel>Name to use as package name</ControlLabel>
              <FormControl
                value={this.state.name}
                onChange={(evt) => this.setState({name: evt.target.value})}
              />
            </FormGroup>
          </Form>
        </Modal.Body>
        <Modal.Footer>
          <Button bsStyle='primary' onClick={() => this.handleYesAndClose()}>
            Save
          </Button>
          <Button onClick={this.props.onRequestClose}>Cancel</Button>
        </Modal.Footer>
      </Modal>
    )
  }
}

export default DetailsModal;
