Overview

Problem-oriented AutoML in Clustering (PoAC) is a framework for automating clustering tasks within the AutoML landscape. PoAC uses meta-learning and surrogate modeling to optimize clustering pipelines, and it allows customization of both the meta-features and the Clustering Validation Indices (CVIs).

Features

Problem Space Generation: Synthesize labeled clustering datasets through combinatorial analysis of dataset archetype parameters.
Clustering Simulations: Create partitionings with multiple noise levels, then calculate CVIs and similarity metrics to simulate clustering performance.
Feature Space Construction: Extract meta-features from the problem space datasets and combine them with the CVIs and similarity metrics to build a comprehensive meta-database.
Surrogate Modeling: Train a regression model as a surrogate to predict the quality of clustering pipelines, enabling task-agnostic optimization across various clustering scenarios.
Clustering Pipeline Synthesis: Integrate the trained surrogate model with popular AutoML frameworks like TPOT to enhance clustering evaluations.
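
The simulation step can be pictured with a minimal sketch built on scikit-learn alone; it does not use PoAC's own API. It assumes that "noise" means reassigning a random fraction of points to random clusters, which is only one plausible reading, and the helper name `noisy_partition` is hypothetical. The silhouette and Davies-Bouldin scores stand in for the CVIs, and the Adjusted Rand Index (ARI) for the similarity metric.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)


def noisy_partition(labels, noise, rng):
    """Reassign a `noise` fraction of points to random clusters (hypothetical helper)."""
    noisy = labels.copy()
    n_flip = int(round(noise * len(labels)))
    idx = rng.choice(len(labels), size=n_flip, replace=False)
    noisy[idx] = rng.integers(0, labels.max() + 1, size=n_flip)
    return noisy


rng = np.random.default_rng(0)
# A synthetic labeled dataset, standing in for one entry of the problem space
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

rows = []
for noise in (0.0, 0.2, 0.5):
    y_noisy = noisy_partition(y_true, noise, rng)
    rows.append({
        "noise": noise,
        "sil": silhouette_score(X, y_noisy),      # internal CVI
        "dbs": davies_bouldin_score(X, y_noisy),  # internal CVI
        "ari": adjusted_rand_score(y_true, y_noisy),  # similarity to ground truth
    })

for row in rows:
    print(row)
```

Each row pairs CVIs that need no ground truth with an ARI that does, which is the kind of record a meta-database can be built from.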

Installation

To get started with PoAC, follow these steps:

Clone the repository:

git clone git@github.com:Mcamilo/poac.git
cd poac

It’s recommended to use a virtual environment to manage dependencies.

Create a virtual environment:

python3 -m venv poac-env
source poac-env/bin/activate  # On Windows, use `poac-env\Scripts\activate`

Install the required packages:
```
pip install -r requirements.txt
```

Usage

The PoAC framework is divided into two main stages: training the surrogate model and synthesizing the pipeline. The framework is designed to guide users through these stages sequentially, but individual modules can also be run on their own. PoAC also ships with a pre-trained default surrogate model, so users can start synthesizing and optimizing clustering pipelines without training a new model.

1. Surrogate Model

import poac
import joblib

surrogate = poac.Surrogate()

# Start by defining the problem space, where you synthesize clustering datasets:
surrogate.populate_problem_space(sample_size=5, keep=False)
# Simulate clustering partitionings with varying levels of noise:
surrogate.simulate_solutions()
# Extract meta-features and combine with CVIs and similarity metrics
surrogate.extract_metafeatures()
# Train the surrogate model
surrogate_model = surrogate.build_model()

# Optionally, save the surrogate model
joblib.dump(surrogate_model, 'optimization/tpot/models/random_forest_model.joblib')
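
To illustrate what a surrogate of this kind does, the sketch below trains a scikit-learn `RandomForestRegressor` to predict ARI from two CVIs and round-trips it through `joblib`. It is a stand-in under stated assumptions, not PoAC's implementation: the saved file name above suggests a random forest, but the real feature set and model configuration may differ. The training data is synthetic and the file name `surrogate_sketch.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import (
    adjusted_rand_score,
    davies_bouldin_score,
    silhouette_score,
)

rng = np.random.default_rng(0)
features, targets = [], []

# Build a tiny meta-database: CVIs as inputs, ARI against ground truth as target
for seed in range(10):
    X, y_true = make_blobs(n_samples=150, centers=3, random_state=seed)
    for noise in (0.0, 0.1, 0.3, 0.6):
        y = y_true.copy()
        flip = rng.random(len(y)) < noise
        y[flip] = rng.integers(0, 3, size=flip.sum())
        features.append([silhouette_score(X, y), davies_bouldin_score(X, y)])
        targets.append(adjusted_rand_score(y_true, y))

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(np.array(features), np.array(targets))

# Save and reload, mirroring the optional joblib.dump step
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "surrogate_sketch.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model estimates ARI without access to ground-truth labels
predicted_ari = restored.predict(np.array(features[:4]))
print(predicted_ari)
```

Once trained, such a model needs only quantities computable from unlabeled data, which is what lets it score candidate pipelines on new datasets.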

2. Pipeline Synthesis

import poac
from sklearn.datasets import load_breast_cancer

# Example of using PoAC with TPOT
data = load_breast_cancer().data
optimizer = poac.Optimizer(data)

sv6light_meta_features = [
    'attr_ent.sd', 'sparsity.sd', 'cov.mean', 'var.mean', 'eigenvalues.mean',
    'sparsity.mean', 'wg_dist.sd', 'iq_range.mean', 'sil', 'dbs',
]
code, pipeline, labels = optimizer.synthesize(
    generations=3, population_size=5, meta_features=sv6light_meta_features
)
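
When a dataset has known classes, as the breast cancer data does, the returned `labels` can be scored with the Adjusted Rand Index, the metric reported in Results. The sketch below is an assumption-laden illustration: it uses scikit-learn's `KMeans` as a stand-in for the labels returned by `optimizer.synthesize`, so it runs without PoAC installed.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

dataset = load_breast_cancer()
X = StandardScaler().fit_transform(dataset.data)

# Stand-in for the `labels` returned by optimizer.synthesize(...)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Compare the partition with the known classes
ari = adjusted_rand_score(dataset.target, labels)
print(f"ARI against the known classes: {ari:.3f}")

# ARI ignores how clusters are numbered, so swapping the ids 0 and 1
# leaves the score unchanged
flipped = 1 - labels
same_score = abs(adjusted_rand_score(dataset.target, flipped) - ari) < 1e-12
print(f"Invariant to relabeling: {same_score}")
```

Because ARI is invariant to cluster numbering, the labels need no alignment with the class ids before scoring.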

Results

In our experiments, integrating the PoAC surrogate model into TPOT achieved a mean Adjusted Rand Index (ARI) of 70% across 100 synthetic datasets. The model's flexibility and robustness make it suitable for a wide range of clustering tasks and AutoML applications.

Contributing

We welcome contributions to PoAC! Please fork the repository, create a new branch, and submit a pull request. For major changes, please open an issue to discuss your proposed changes.

License

This project is licensed under the MIT License - see the LICENSE file for details.
