Merge pull request #93 from firefly-cpp/69-support-for-regression-tasks-and-feature-selection

69 support for regression tasks and feature selection
firefly-cpp authored Apr 22, 2024
2 parents df84e61 + 9d62b5f commit 23dda00
Showing 27 changed files with 1,347 additions and 26 deletions.
44 changes: 43 additions & 1 deletion README.md
@@ -42,6 +42,8 @@

NiaAML is a framework for Automated Machine Learning based on nature-inspired algorithms for optimization. The framework is written entirely in Python. The name NiaAML comes from the Automated Machine Learning method of the same name [[1]](#1). Its goal is to efficiently compose the best possible classification pipeline for the given task from the components supplied as input. The components are divided into three groups: feature selection algorithms, feature transformation algorithms and classifiers. The framework uses nature-inspired optimization algorithms to choose the best set of components for the classification pipeline and to optimize their hyperparameters. The optimization process relies on the <a href="https://github.com/NiaOrg/NiaPy">NiaPy framework</a>, a popular Python collection of nature-inspired algorithms. The NiaAML framework is easy to use, customize and extend to suit your needs.

> 🆕📈 NiaAML now also supports regression tasks. The package still refers to regressors as "classifiers" to avoid introducing a breaking change to the API.

The NiaAML framework allows you not only to run full pipeline optimization, but also to use the implemented components, such as classifiers and feature selection algorithms, on their own. **It supports numerical and categorical features as well as missing values in datasets.**

* **Free software:** MIT license,
@@ -133,7 +135,7 @@ self._params = dict(
)
```

An individual in the second type of optimization is represented as a real-valued vector whose size equals the total number of keys in all three dictionaries (classifier's _params, feature transformation algorithm's _params and feature selection algorithm's _params), and the value of each dimension is in the range [0.0, 1.0]. The second type of optimization maps real values from the individual's vector to the parameter definitions in the dictionaries. Each parameter's value can be defined as a range or an array of values. In the first case, a value from the vector is mapped from one interval to another; in the second case, a value from the vector falls into one of the bins, each of which represents an index into the array of possible parameter values.
An individual in the second type of optimization is represented as a real-valued vector whose size equals the total number of keys in all three dictionaries (classifier's `_params`, feature transformation algorithm's `_params` and feature selection algorithm's `_params`), and the value of each dimension is in the range [0.0, 1.0]. The second type of optimization maps real values from the individual's vector to the parameter definitions in the dictionaries. Each parameter's value can be defined as a range or an array of values. In the first case, a value from the vector is mapped from one interval to another; in the second case, a value from the vector falls into one of the bins, each of which represents an index into the array of possible parameter values.

Let's say we have a classifier with 3 parameters, a feature selection algorithm with 2 parameters and feature transformation algorithm with 4 parameters. The size of an individual in the second type of optimization is 9. The size of an individual in the first type of optimization is always 3 (1 classifier, 1 feature selection algorithm and 1 feature transformation algorithm).
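
The mapping described above can be sketched in plain Python. This is only an illustration of the idea, not NiaAML's actual implementation: the helper names `map_to_range` and `map_to_choice`, the parameter bounds and the sample vector are all assumptions made for this example.

```python
def map_to_range(value, low, high):
    """Linearly map a value from [0.0, 1.0] to the interval [low, high]."""
    return low + value * (high - low)


def map_to_choice(value, choices):
    """Split [0.0, 1.0] into len(choices) equal bins and return the choice
    whose bin contains the value (1.0 is clamped into the last bin)."""
    index = min(int(value * len(choices)), len(choices) - 1)
    return choices[index]


# Hypothetical slice of an individual that covers two parameters.
individual = [0.25, 0.9]

# Range-type parameter: 0.25 is mapped from [0.0, 1.0] to [10, 110].
n_estimators = round(map_to_range(individual[0], 10, 110))

# Array-type parameter: 0.9 falls into the second of two bins.
splitter = map_to_choice(individual[1], ["best", "random"])

# Size of an individual in the second type of optimization for the example
# above: 3 classifier + 2 feature selection + 4 feature transformation parameters.
individual_size = 3 + 2 + 4
```

Clamping the index keeps a value of exactly 1.0 inside the last bin rather than one position past the end of the array.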

@@ -197,6 +199,46 @@ pipeline1.export_text('pipeline.txt')

This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework.

### 📈 Example of a Regression Task

The API for regression tasks is the same as for classification. You only need to choose components that support regression.

Currently, the following components support regression tasks:

➡️ **Feature Transform Algorithms**:

+ "Normalizer"
+ "StandardScaler"
+ "MaxAbsScaler"
+ "QuantileTransformer"
+ "RobustScaler"

🔎 **Feature Selection Algorithms**:

+ "SelectKBest"
+ "SelectPercentile"
+ "SelectUnivariateRegression"

🔮 **Models (Classifiers)**:

+ "LinearRegression"
+ "RidgeRegression"
+ "LassoRegression"
+ "DecisionTreeRegression"
+ "GaussianProcessRegression"

```python
pipeline_optimizer = PipelineOptimizer(
    data=data_reader,
    feature_selection_algorithms=["SelectKBest", "SelectPercentile", "SelectUnivariateRegression"],
    feature_transform_algorithms=["Normalizer", "StandardScaler"],
    classifiers=["LinearRegression", "RidgeRegression", "LassoRegression", "DecisionTreeRegression", "GaussianProcessRegression"],
)

# run the modified version of optimization
pipeline1 = pipeline_optimizer.run("MSE", 10, 10, 20, 20, "ParticleSwarmAlgorithm")
```
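
The first argument to `run` names the fitness function used to score candidate pipelines. Assuming that "MSE" stands for the usual mean squared error, the metric itself can be sketched as follows. The function name `mean_squared_error` and the sample values are illustrative and are not taken from NiaAML.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up targets and predictions, for illustration only.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

# Squared errors are 1, 0 and 4, so the result is their average.
mse = mean_squared_error(y_true, y_pred)
```

Unlike accuracy, this is an error measure, so a lower value indicates a better regression pipeline.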

### Example of a Pipeline Component's Implementation

The NiaAML framework is easily expandable, as you can implement components by overriding the base classes' methods. To implement a classifier you should inherit from the [Classifier](niaaml/classifiers/classifier.py) class, and you can do the same with [FeatureSelectionAlgorithm](niaaml/preprocessing/feature_selection/feature_selection_algorithm.py) and [FeatureTransformAlgorithm](niaaml/preprocessing/feature_transform/feature_transform_algorithm.py) classes. All of the mentioned classes inherit from the [PipelineComponent](niaaml/pipeline_component.py) class.
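
As a rough sketch of that pattern, the snippet below defines a trivial regression component that always predicts the mean of the training targets. To stay runnable without NiaAML installed, it inherits from a stand-in `Classifier` class defined in the snippet itself and works on plain lists rather than pandas objects. The stand-in class and the `MeanRegression` name are assumptions made for this example; the method set mirrors the regression components added in this commit, and a real component would inherit from `niaaml.classifiers.Classifier` instead.

```python
class Classifier:
    """Stand-in for niaaml.classifiers.Classifier (assumption, illustration only)."""

    Name = None

    def __init__(self, **kwargs):
        # Dictionary of tunable parameter definitions; empty for the stand-in.
        self._params = dict()


class MeanRegression(Classifier):
    """Hypothetical regressor that predicts the mean of the training targets."""

    Name = "Mean Regression"

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self._mean = None

    def set_parameters(self, **kwargs):
        # This toy model has no hyperparameters to set.
        pass

    def fit(self, x, y, **kwargs):
        # Store the average target value; the features are ignored.
        values = list(y)
        self._mean = sum(values) / len(values)

    def predict(self, x, **kwargs):
        # Return one prediction per input row (x is a list of rows here).
        return [self._mean for _ in x]

    def to_string(self):
        return f"{self.Name} (mean={self._mean})"


model = MeanRegression()
model.fit([[1.0], [2.0], [3.0]], [10.0, 20.0, 30.0])
predictions = model.predict([[4.0], [5.0]])
```

The component keeps the "classifier" interface even though it performs regression, which matches the naming decision described at the top of this README.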
41 changes: 41 additions & 0 deletions docs/getting_started.rst
@@ -64,6 +64,47 @@ This is a very simple example with dummy data. It is only intended to give you a

Find more examples `here <https://github.com/lukapecnik/NiaAML/tree/master/examples>`_

Regression example
------------------

The API for regression tasks is the same as for classification. You only need to choose components that support regression.

Currently, the following components support regression tasks:

➡️ **Feature Transform Algorithms**:

* "Normalizer"
* "StandardScaler"
* "MaxAbsScaler"
* "QuantileTransformer"
* "RobustScaler"

🔎 **Feature Selection Algorithms**:

* "SelectKBest"
* "SelectPercentile"
* "SelectUnivariateRegression"

🔮 **Models (Classifiers)**:

* "LinearRegression"
* "RidgeRegression"
* "LassoRegression"
* "DecisionTreeRegression"
* "GaussianProcessRegression"

.. code:: python

    pipeline_optimizer = PipelineOptimizer(
        data=data_reader,
        feature_selection_algorithms=["SelectKBest", "SelectPercentile", "SelectUnivariateRegression"],
        feature_transform_algorithms=["Normalizer", "StandardScaler"],
        classifiers=["LinearRegression", "RidgeRegression", "LassoRegression", "DecisionTreeRegression", "GaussianProcessRegression"],
    )

    # run the modified version of optimization
    pipeline1 = pipeline_optimizer.run("MSE", 10, 10, 20, 20, "ParticleSwarmAlgorithm")

Components
----------

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -7,7 +7,7 @@ NiaAML is an automated machine learning Python framework based on nature-inspire

* **Free software:** MIT license
* **Github repository:** https://github.com/lukapecnik/NiaAML
* **Python versions:** 3.6.x, 3.7.x, 3.8.x
* **Python versions:** 3.11.x, 3.12.x

The main documentation is organized into a couple of sections:

10 changes: 10 additions & 0 deletions niaaml/classifiers/__init__.py
@@ -6,12 +6,17 @@
from niaaml.classifiers.extremely_randomized_trees import ExtremelyRandomizedTrees
from niaaml.classifiers.bagging import Bagging
from niaaml.classifiers.decision_tree import DecisionTree
from niaaml.classifiers.regression_decision_tree import DecisionTreeRegression
from niaaml.classifiers.k_neighbors import KNeighbors
from niaaml.classifiers.gaussian_process import GaussianProcess
from niaaml.classifiers.regression_gaussian_process import GaussianProcessRegression
from niaaml.classifiers.gaussian_naive_bayes import GaussianNB
from niaaml.classifiers.quadratic_driscriminant_analysis import (
    QuadraticDiscriminantAnalysis,
)
from niaaml.classifiers.regression_linear_model import LinearRegression
from niaaml.classifiers.regression_ridge import RidgeRegression
from niaaml.classifiers.regression_lasso import LassoRegression
from niaaml.classifiers.utility import ClassifierFactory

__all__ = [
@@ -23,9 +28,14 @@
    "Bagging",
    "ExtremelyRandomizedTrees",
    "DecisionTree",
    "DecisionTreeRegression",
    "KNeighbors",
    "GaussianProcess",
    "GaussianProcessRegression",
    "GaussianNB",
    "QuadraticDiscriminantAnalysis",
    "ClassifierFactory",
    "LinearRegression",
    "RidgeRegression",
    "LassoRegression",
]
91 changes: 91 additions & 0 deletions niaaml/classifiers/regression_decision_tree.py
@@ -0,0 +1,91 @@
from niaaml.classifiers.classifier import Classifier
from niaaml.utilities import ParameterDefinition
from sklearn.tree import DecisionTreeRegressor as DTR

import warnings
from sklearn.exceptions import (
    ConvergenceWarning,
    DataConversionWarning,
    DataDimensionalityWarning,
    EfficiencyWarning,
    FitFailedWarning,
    UndefinedMetricWarning,
)

__all__ = ["DecisionTreeRegression"]


class DecisionTreeRegression(Classifier):
    r"""Implementation of decision tree regression.

    Date:
        2024

    Author:
        Laurenz Farthofer

    License:
        MIT

    Documentation:
        https://scikit-learn.org/stable/modules/tree.html#regression

    See Also:
        * :class:`niaaml.classifiers.Classifier`
    """

    Name = "Decision Tree Regression"

    def __init__(self, **kwargs):
        r"""Initialize DecisionTreeRegression instance."""
        # Silence scikit-learn warnings raised while candidate pipelines are evaluated.
        warnings.filterwarnings(action="ignore", category=ConvergenceWarning)
        warnings.filterwarnings(action="ignore", category=DataConversionWarning)
        warnings.filterwarnings(action="ignore", category=DataDimensionalityWarning)
        warnings.filterwarnings(action="ignore", category=EfficiencyWarning)
        warnings.filterwarnings(action="ignore", category=FitFailedWarning)
        warnings.filterwarnings(action="ignore", category=UndefinedMetricWarning)

        # Hyperparameters exposed to the optimizer, each given as an array of choices.
        self._params = dict(
            criterion=ParameterDefinition(["squared_error", "friedman_mse", "absolute_error", "poisson"]),
            splitter=ParameterDefinition(["best", "random"]),
        )
        self.__decision_tree_regression = DTR()

    def set_parameters(self, **kwargs):
        r"""Set the parameters/arguments of the algorithm."""
        self.__decision_tree_regression.set_params(**kwargs)

    def fit(self, x, y, **kwargs):
        r"""Fit DecisionTreeRegression.

        Arguments:
            x (pandas.core.frame.DataFrame): n samples to fit on.
            y (pandas.core.series.Series): n target values of the samples in the x array.

        Returns:
            None
        """
        self.__decision_tree_regression.fit(x, y)

    def predict(self, x, **kwargs):
        r"""Predict the target value for each sample (row) in x.

        Arguments:
            x (pandas.core.frame.DataFrame): n samples to predict.

        Returns:
            pandas.core.series.Series: n predicted values.
        """
        return self.__decision_tree_regression.predict(x)

    def to_string(self):
        r"""User friendly representation of the object.

        Returns:
            str: User friendly representation of the object.
        """
        return Classifier.to_string(self).format(
            name=self.Name,
            args=self._parameters_to_string(
                self.__decision_tree_regression.get_params()
            ),
        )
88 changes: 88 additions & 0 deletions niaaml/classifiers/regression_gaussian_process.py
@@ -0,0 +1,88 @@
from niaaml.classifiers.classifier import Classifier
from niaaml.utilities import MinMax
from niaaml.utilities import ParameterDefinition
from sklearn.gaussian_process import GaussianProcessRegressor as GPR
import numpy as np

import warnings
from sklearn.exceptions import (
    ConvergenceWarning,
    DataConversionWarning,
    DataDimensionalityWarning,
    EfficiencyWarning,
    FitFailedWarning,
    UndefinedMetricWarning,
)

__all__ = ["GaussianProcessRegression"]


class GaussianProcessRegression(Classifier):
    r"""Implementation of Gaussian process regression.

    Date:
        2024

    Author:
        Laurenz Farthofer

    License:
        MIT

    Documentation:
        https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html#sklearn.gaussian_process.GaussianProcessRegressor

    See Also:
        * :class:`niaaml.classifiers.Classifier`
    """

    Name = "Gaussian Process Regression"

    def __init__(self, **kwargs):
        r"""Initialize GaussianProcessRegression instance."""
        # Silence scikit-learn warnings raised while candidate pipelines are evaluated.
        warnings.filterwarnings(action="ignore", category=ConvergenceWarning)
        warnings.filterwarnings(action="ignore", category=DataConversionWarning)
        warnings.filterwarnings(action="ignore", category=DataDimensionalityWarning)
        warnings.filterwarnings(action="ignore", category=EfficiencyWarning)
        warnings.filterwarnings(action="ignore", category=FitFailedWarning)
        warnings.filterwarnings(action="ignore", category=UndefinedMetricWarning)

        # No hyperparameters are exposed to the optimizer for this component.
        self._params = dict()
        self.__gaussian_process = GPR()

    def set_parameters(self, **kwargs):
        r"""Set the parameters/arguments of the algorithm."""
        self.__gaussian_process.set_params(**kwargs)

    def fit(self, x, y, **kwargs):
        r"""Fit GaussianProcessRegression.

        Arguments:
            x (pandas.core.frame.DataFrame): n samples to fit on.
            y (pandas.core.series.Series): n target values of the samples in the x array.

        Returns:
            None
        """
        self.__gaussian_process.fit(x, y)

    def predict(self, x, **kwargs):
        r"""Predict the target value for each sample (row) in x.

        Arguments:
            x (pandas.core.frame.DataFrame): n samples to predict.

        Returns:
            pandas.core.series.Series: n predicted values.
        """
        return self.__gaussian_process.predict(x)

    def to_string(self):
        r"""User friendly representation of the object.

        Returns:
            str: User friendly representation of the object.
        """
        return Classifier.to_string(self).format(
            name=self.Name,
            args=self._parameters_to_string(self.__gaussian_process.get_params()),
        )