Merge pull request #93 from firefly-cpp/69-support-for-regression-tas…

…ks-and-feature-selection 69 support for regression tasks and feature selection
firefly-cpp · Apr 22, 2024 · 23dda00
2 parents df84e61 + 9d62b5f
commit 23dda00
Showing 27 changed files with 1,347 additions and 26 deletions.
diff --git a/README.md b/README.md
@@ -42,6 +42,8 @@
 
 NiaAML is a framework for Automated Machine Learning based on nature-inspired algorithms for optimization. The framework is written fully in Python. The name NiaAML comes from the Automated Machine Learning method of the same name [[1]](#1). Its goal is to compose the best possible classification pipeline for the given task efficiently using components on the input. The components are divided into three groups: feature selection algorithms, feature transformation algorithms and classifiers. The framework uses nature-inspired algorithms for optimization to choose the best set of components for the classification pipeline, and optimize their hyperparameters. We use the <a href="https://github.com/NiaOrg/NiaPy">NiaPy framework</a> for the optimization process, which is a popular Python collection of nature-inspired algorithms. The NiaAML framework is easy to use and customize or expand to suit your needs.
 
+> 🆕📈 NiaAML now also supports regression tasks. The package still refers to regressors as "classifiers" to avoid introducing a breaking change to the API.
+
 The NiaAML framework allows you not only to run full pipeline optimization, but also to separate implemented components such as classifiers, feature selection algorithms, etc. **It supports numerical and categorical features as well as missing values in datasets.**
 
 * **Free software:** MIT license,
@@ -133,7 +135,7 @@ self._params = dict(
 )
 ```
 
-An individual in the first type of optimization is represented as a real-valued vector that has a size equal to the sum of the number of keys in all three dictionaries (classifier's _params, Feature Transformation algorithm's _params and feature selection algorithm's _params) and the value of each dimension is in the range [0.0, 1.0]. The second type of optimization maps real values from the individual's vector to those parameter definitions in the dictionaries. Each parameter's value can be defined as a range or array of values. In the first case, a value from a vector is mapped from one iterval to another, and in the second case, a value from the vector falls into one of the bins that represent an index of the array that holds possible parameters` values.
+An individual in the second type of optimization is represented as a real-valued vector that has a size equal to the sum of the number of keys in all three dictionaries (classifier's `_params`, feature transformation algorithm's `_params` and feature selection algorithm's `_params`), and the value of each dimension is in the range [0.0, 1.0]. This type of optimization maps real values from the individual's vector to the parameter definitions in the dictionaries. Each parameter's value can be defined as a range or an array of values. In the first case, a value from the vector is mapped from one interval to another, and in the second case, a value from the vector falls into one of the bins that represent an index of the array that holds the parameter's possible values.
 
 Let's say we have a classifier with 3 parameters, a feature selection algorithm with 2 parameters and feature transformation algorithm with 4 parameters. The size of an individual in the second type of optimization is 9. The size of an individual in the first type of optimization is always 3 (1 classifier, 1 feature selection algorithm and 1 feature transformation algorithm).
 
@@ -197,6 +199,46 @@ pipeline1.export_text('pipeline.txt')
 
 This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework.
 
+### 📈 Example of a Regression Task
+
+The API for solving regression tasks is the same as for classification. You only need to choose components that support regression.
+
+Currently, the following components support regression tasks:
+
+➡️ **Feature Transform Algorithms**:
+
++ "Normalizer"
++ "StandardScaler"
++ "MaxAbsScaler"
++ "QuantileTransformer"
++ "RobustScaler"
+
+🔎 **Feature Selection Algorithms**:
+
++ "SelectKBest"
++ "SelectPercentile"
++ "SelectUnivariateRegression"
+
+🔮 **Models (Classifiers)**:
+
++ "LinearRegression"
++ "RidgeRegression"
++ "LassoRegression"
++ "DecisionTreeRegression"
++ "GaussianProcessRegression"
+
+```python
+pipeline_optimizer = PipelineOptimizer(
+    data=data_reader,
+    feature_selection_algorithms=["SelectKBest", "SelectPercentile", "SelectUnivariateRegression"],
+    feature_transform_algorithms=["Normalizer", "StandardScaler"],
+    classifiers=["LinearRegression", "RidgeRegression", "LassoRegression", "DecisionTreeRegression", "GaussianProcessRegression"],
+)
+
+# run the modified version of optimization
+pipeline1 = pipeline_optimizer.run("MSE", 10, 10, 20, 20, "ParticleSwarmAlgorithm")
+```
+
 ### Example of a Pipeline Component's Implementation
 
 The NiaAML framework is easily expandable, as you can implement components by overriding the base classes' methods. To implement a classifier you should inherit from the [Classifier](niaaml/classifiers/classifier.py) class, and you can do the same with [FeatureSelectionAlgorithm](niaaml/preprocessing/feature_selection/feature_selection_algorithm.py) and [FeatureTransformAlgorithm](niaaml/preprocessing/feature_transform/feature_transform_algorithm.py) classes. All of the mentioned classes inherit from the [PipelineComponent](niaaml/pipeline_component.py) class.
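
The mapping described in the README paragraph on hyperparameter optimization (a value in [0.0, 1.0] is either rescaled onto another interval or dropped into a bin that indexes an array of candidate values) can be sketched as follows. This is only an illustration, not NiaAML's actual code: the helper names `map_to_interval` and `map_to_choice` are hypothetical, and the even-width binning rule is an assumption.

```python
def map_to_interval(value, low, high):
    """Linearly rescale value from [0.0, 1.0] onto [low, high]."""
    return low + value * (high - low)


def map_to_choice(value, choices):
    """Return the candidate whose bin contains value.

    Assumption: [0.0, 1.0] is split into len(choices) bins of equal width;
    value == 1.0 is clamped into the last bin.
    """
    index = min(int(value * len(choices)), len(choices) - 1)
    return choices[index]


# Candidate values taken from the DecisionTreeRegression `criterion` parameter.
criteria = ["squared_error", "friedman_mse", "absolute_error", "poisson"]

print(map_to_interval(0.25, 10.0, 50.0))  # range-style parameter
print(map_to_choice(0.6, criteria))       # array-style parameter
```

With four candidates, each bin is 0.25 wide, so 0.6 falls into the third bin.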

diff --git a/docs/getting_started.rst b/docs/getting_started.rst
@@ -64,6 +64,47 @@ This is a very simple example with dummy data. It is only intended to give you a
 
 Find more examples `here <https://github.com/lukapecnik/NiaAML/tree/master/examples>`_
 
+Regression example
+------------------
+
+The API for solving regression tasks is the same as for classification. You only need to choose components that support regression.
+
+Currently, the following components support regression tasks:
+
+➡️ **Feature Transform Algorithms**:
+
+* "Normalizer"
+* "StandardScaler"
+* "MaxAbsScaler"
+* "QuantileTransformer"
+* "RobustScaler"
+
+🔎 **Feature Selection Algorithms**:
+
+* "SelectKBest"
+* "SelectPercentile"
+* "SelectUnivariateRegression"
+
+🔮 **Models (Classifiers)**:
+
+* "LinearRegression"
+* "RidgeRegression"
+* "LassoRegression"
+* "DecisionTreeRegression"
+* "GaussianProcessRegression"
+
+.. code:: python
+
+    pipeline_optimizer = PipelineOptimizer(
+        data=data_reader,
+        feature_selection_algorithms=["SelectKBest", "SelectPercentile", "SelectUnivariateRegression"],
+        feature_transform_algorithms=["Normalizer", "StandardScaler"],
+        classifiers=["LinearRegression", "RidgeRegression", "LassoRegression", "DecisionTreeRegression", "GaussianProcessRegression"],
+    )
+
+    # run the modified version of optimization
+    pipeline1 = pipeline_optimizer.run("MSE", 10, 10, 20, 20, "ParticleSwarmAlgorithm")
+
 Components
 ----------
 

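Both regression examples pass `"MSE"` as the fitness function name. Assuming this stands for mean squared error, the quantity can be sketched in plain Python as below. The function name `mean_squared_error` is used here for illustration only; how NiaAML computes or signs the fitness value internally is not shown in this diff.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Errors are 1.0, 0.0 and -2.0, so the squared errors are 1.0, 0.0 and 4.0.
print(mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0]))
```

A lower value means the predicted targets are closer to the true targets.
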
diff --git a/docs/index.rst b/docs/index.rst
@@ -7,7 +7,7 @@ NiaAML is an automated machine learning Python framework based on nature-inspire
 
 * **Free software:** MIT license
 * **Github repository:** https://github.com/lukapecnik/NiaAML
-* **Python versions:** 3.6.x, 3.7.x, 3.8.x
+* **Python versions:** 3.11.x, 3.12.x
 
 The main documentation is organized into a couple of sections:
 

diff --git a/niaaml/classifiers/__init__.py b/niaaml/classifiers/__init__.py
@@ -6,12 +6,17 @@
 from niaaml.classifiers.extremely_randomized_trees import ExtremelyRandomizedTrees
 from niaaml.classifiers.bagging import Bagging
 from niaaml.classifiers.decision_tree import DecisionTree
+from niaaml.classifiers.regression_decision_tree import DecisionTreeRegression
 from niaaml.classifiers.k_neighbors import KNeighbors
 from niaaml.classifiers.gaussian_process import GaussianProcess
+from niaaml.classifiers.regression_gaussian_process import GaussianProcessRegression
 from niaaml.classifiers.gaussian_naive_bayes import GaussianNB
 from niaaml.classifiers.quadratic_driscriminant_analysis import (
     QuadraticDiscriminantAnalysis,
 )
+from niaaml.classifiers.regression_linear_model import LinearRegression
+from niaaml.classifiers.regression_ridge import RidgeRegression
+from niaaml.classifiers.regression_lasso import LassoRegression
 from niaaml.classifiers.utility import ClassifierFactory
 
 __all__ = [
@@ -23,9 +28,14 @@
     "Bagging",
     "ExtremelyRandomizedTrees",
     "DecisionTree",
+    "DecisionTreeRegression",
     "KNeighbors",
     "GaussianProcess",
+    "GaussianProcessRegression",
     "GaussianNB",
     "QuadraticDiscriminantAnalysis",
     "ClassifierFactory",
+    "LinearRegression",
+    "RidgeRegression",
+    "LassoRegression",
 ]
diff --git a/niaaml/classifiers/regression_decision_tree.py b/niaaml/classifiers/regression_decision_tree.py
@@ -0,0 +1,91 @@
+from niaaml.classifiers.classifier import Classifier
+from niaaml.utilities import ParameterDefinition
+from sklearn.tree import DecisionTreeRegressor as DTR
+
+import warnings
+from sklearn.exceptions import (
+    ConvergenceWarning,
+    DataConversionWarning,
+    DataDimensionalityWarning,
+    EfficiencyWarning,
+    FitFailedWarning,
+    UndefinedMetricWarning,
+)
+
+__all__ = ["DecisionTreeRegression"]
+
+
+class DecisionTreeRegression(Classifier):
+    r"""Implementation of decision tree regression.
+
+    Date:
+        2024
+
+    Author:
+        Laurenz Farthofer
+
+    License:
+        MIT
+
+    Documentation:
+        https://scikit-learn.org/stable/modules/tree.html#regression
+
+    See Also:
+        * :class:`niaaml.classifiers.Classifier`
+    """
+    Name = "Decision Tree Regression"
+
+    def __init__(self, **kwargs):
+        r"""Initialize DecisionTreeRegression instance."""
+        warnings.filterwarnings(action="ignore", category=ConvergenceWarning)
+        warnings.filterwarnings(action="ignore", category=DataConversionWarning)
+        warnings.filterwarnings(action="ignore", category=DataDimensionalityWarning)
+        warnings.filterwarnings(action="ignore", category=EfficiencyWarning)
+        warnings.filterwarnings(action="ignore", category=FitFailedWarning)
+        warnings.filterwarnings(action="ignore", category=UndefinedMetricWarning)
+
+        self._params = dict(
+            criterion=ParameterDefinition(["squared_error", "friedman_mse", "absolute_error", "poisson"]),
+            splitter=ParameterDefinition(["best", "random"]),
+        )
+        self.__decision_tree_regression = DTR()
+
+    def set_parameters(self, **kwargs):
+        r"""Set the parameters/arguments of the algorithm."""
+        self.__decision_tree_regression.set_params(**kwargs)
+
+    def fit(self, x, y, **kwargs):
+        r"""Fit DecisionTreeRegression.
+
+        Arguments:
+            x (pandas.core.frame.DataFrame): n samples to fit the regressor on.
+            y (pandas.core.series.Series): n target values of the samples in the x array.
+
+        Returns:
+            None
+        """
+        self.__decision_tree_regression.fit(x, y)
+
+    def predict(self, x, **kwargs):
+        r"""Predict the target value for each sample (row) in x.
+
+        Arguments:
+            x (pandas.core.frame.DataFrame): n samples to predict target values for.
+
+        Returns:
+            pandas.core.series.Series: n predicted target values.
+        """
+        return self.__decision_tree_regression.predict(x)
+
+    def to_string(self):
+        r"""User friendly representation of the object.
+
+        Returns:
+            str: User friendly representation of the object.
+        """
+        return Classifier.to_string(self).format(
+            name=self.Name,
+            args=self._parameters_to_string(
+                self.__decision_tree_regression.get_params()
+            ),
+        )
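
The constructor above calls `warnings.filterwarnings` six times, which changes the process-wide warning filters each time an instance is created. As a possible alternative, the standard library's `warnings.catch_warnings` context manager can limit the suppression to a single call. The sketch below is only an illustration: `NoisyModel` and `fit_quietly` are hypothetical names, and `UserWarning` stands in for the scikit-learn warning categories so the example runs without extra dependencies.

```python
import warnings


class NoisyModel:
    """Hypothetical stand-in for an estimator that warns while fitting."""

    def fit(self, x, y):
        warnings.warn("did not converge", UserWarning)
        return self


def fit_quietly(model, x, y, categories=(UserWarning,)):
    """Fit model, ignoring the given warning categories only during this call."""
    with warnings.catch_warnings():
        # Filters added here are discarded when the with-block exits.
        for category in categories:
            warnings.filterwarnings(action="ignore", category=category)
        return model.fit(x, y)


fit_quietly(NoisyModel(), [[0.0]], [0.0])
```

Outside `fit_quietly` the previous filters are restored, so the same warning raised elsewhere is still reported.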
diff --git a/niaaml/classifiers/regression_gaussian_process.py b/niaaml/classifiers/regression_gaussian_process.py
@@ -0,0 +1,88 @@
+from niaaml.classifiers.classifier import Classifier
+from niaaml.utilities import MinMax
+from niaaml.utilities import ParameterDefinition
+from sklearn.gaussian_process import GaussianProcessRegressor as GPR
+import numpy as np
+
+import warnings
+from sklearn.exceptions import (
+    ConvergenceWarning,
+    DataConversionWarning,
+    DataDimensionalityWarning,
+    EfficiencyWarning,
+    FitFailedWarning,
+    UndefinedMetricWarning,
+)
+
+__all__ = ["GaussianProcessRegression"]
+
+
+class GaussianProcessRegression(Classifier):
+    r"""Implementation of Gaussian process regression.
+
+    Date:
+        2024
+
+    Author:
+        Laurenz Farthofer
+
+    License:
+        MIT
+
+    Documentation:
+        https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html#sklearn.gaussian_process.GaussianProcessRegressor
+
+    See Also:
+        * :class:`niaaml.classifiers.Classifier`
+    """
+    Name = "Gaussian Process Regression"
+
+    def __init__(self, **kwargs):
+        r"""Initialize GaussianProcessRegression instance."""
+        warnings.filterwarnings(action="ignore", category=ConvergenceWarning)
+        warnings.filterwarnings(action="ignore", category=DataConversionWarning)
+        warnings.filterwarnings(action="ignore", category=DataDimensionalityWarning)
+        warnings.filterwarnings(action="ignore", category=EfficiencyWarning)
+        warnings.filterwarnings(action="ignore", category=FitFailedWarning)
+        warnings.filterwarnings(action="ignore", category=UndefinedMetricWarning)
+
+        self._params = dict()
+        self.__gaussian_process = GPR()
+
+    def set_parameters(self, **kwargs):
+        r"""Set the parameters/arguments of the algorithm."""
+        self.__gaussian_process.set_params(**kwargs)
+
+    def fit(self, x, y, **kwargs):
+        r"""Fit GaussianProcessRegression.
+
+        Arguments:
+            x (pandas.core.frame.DataFrame): n samples to fit the regressor on.
+            y (pandas.core.series.Series): n target values of the samples in the x array.
+
+        Returns:
+            None
+        """
+        self.__gaussian_process.fit(x, y)
+
+    def predict(self, x, **kwargs):
+        r"""Predict the target value for each sample (row) in x.
+
+        Arguments:
+            x (pandas.core.frame.DataFrame): n samples to predict target values for.
+
+        Returns:
+            pandas.core.series.Series: n predicted target values.
+        """
+        return self.__gaussian_process.predict(x)
+
+    def to_string(self):
+        r"""User friendly representation of the object.
+
+        Returns:
+            str: User friendly representation of the object.
+        """
+        return Classifier.to_string(self).format(
+            name=self.Name,
+            args=self._parameters_to_string(self.__gaussian_process.get_params()),
+        )