Clustering Pipeline Project

This project provides a flexible command-line tool for running a variety of clustering algorithms on tabular data. It supports preprocessing, optional cleaning, and evaluation with clear, user-friendly feedback.

Features

- Multiple clustering algorithms (KMeans, DBSCAN, HDBSCAN, Agglomerative, Spectral, GMM, Autoencoder, etc.)
- Optional data cleaning (skipped by default; enable with --clean)
- Easy parameterization via the command line
- Cluster assignments written to a CSV file
- Helpful error messages that point to the relevant documentation

Quick Start

Install dependencies
```
pip install -r requirements.txt
```
Run clustering
```
python main.py <csv_file> [algorithm] [param1=value1 ...] [--clean]
```
- <csv_file>: Path to your data CSV (required)
- [algorithm]: Clustering algorithm key (optional; defaults to agglomerative_average)
- [param1=value1 ...]: Algorithm parameters as key=value pairs (optional)
- [--clean]: Enable cleaning of unrealistic values (cleaning is skipped by default)

Examples
- KMeans: python main.py data.csv kmeans n_clusters=4
- DBSCAN: python main.py data.csv dbscan eps=0.5 min_samples=5
- Default (agglomerative_average, no cleaning): python main.py data.csv
- With cleaning: python main.py data.csv --clean

See GUIDE.md for full documentation, the algorithm list, and troubleshooting.
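
To make the argument layout above concrete, here is a minimal sketch of how a command line of this shape could be split into its parts. It is an illustration only, not the code in main.py: the function name parse_cli and the rule that the first token without "=" is the algorithm key are assumptions, as is the conversion of values with ast.literal_eval.

```python
import ast

# Default algorithm key, as stated in the usage notes above.
DEFAULT_ALGORITHM = "agglomerative_average"


def parse_cli(argv):
    """Split argv (without the script name) into (csv_file, algorithm, params, clean).

    Hypothetical helper: main.py may parse its arguments differently.
    """
    clean = "--clean" in argv
    positional = [arg for arg in argv if arg != "--clean"]
    if not positional:
        raise SystemExit(
            "usage: python main.py <csv_file> [algorithm] [param1=value1 ...] [--clean]"
        )

    csv_file, rest = positional[0], positional[1:]

    # Assumption: a first token without "=" is the algorithm key.
    algorithm = DEFAULT_ALGORITHM
    if rest and "=" not in rest[0]:
        algorithm, rest = rest[0], rest[1:]

    params = {}
    for token in rest:
        key, sep, raw = token.partition("=")
        if not sep or not key:
            raise SystemExit(f"expected key=value, got {token!r}")
        try:
            # Turn "4" into 4 and "0.5" into 0.5.
            params[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a plain string.
            params[key] = raw
    return csv_file, algorithm, params, clean
```

With this sketch, parse_cli(["data.csv", "dbscan", "eps=0.5", "min_samples=5"]) yields the file name, the key dbscan, a parameter dictionary with a float and an integer, and clean set to False.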

Project Structure

- main.py: Main entry point for clustering and evaluation
- preprocessing.py: Data cleaning and preprocessing utilities
- clustering_algorithms/: Implementations of each clustering method
- requirements.txt: Python dependencies
- .gitignore: Ignores CSV files, cache, and system files
- GUIDE.md: Complete usage guide and examples

Tips

- Cluster assignments are saved to <algorithm>_clusters.csv, where <algorithm> is the algorithm key used for the run.
- If a run fails, the error message points to the relevant part of GUIDE.md.
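
A quick way to check a run is to count how many rows landed in each cluster. The sketch below assumes the output CSV has a header row and a label column named cluster; that column name is a guess, so check the header of your own file and pass the real name. The helper cluster_sizes is hypothetical and not part of this project.

```python
import csv
from collections import Counter


def cluster_sizes(path, label_column="cluster"):
    """Count rows per cluster label in an output CSV.

    The default column name "cluster" is an assumption, not something
    the project documents; override it to match the actual header.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if label_column not in (reader.fieldnames or []):
            raise KeyError(
                f"column {label_column!r} not found; available: {reader.fieldnames}"
            )
        # Labels are kept as strings exactly as they appear in the file.
        return Counter(row[label_column] for row in reader)


if __name__ == "__main__":
    # Example: inspect the file written by a kmeans run.
    for label, size in sorted(cluster_sizes("kmeans_clusters.csv").items()):
        print(f"cluster {label}: {size} rows")
```

If the tool writes scikit-learn labels unchanged, DBSCAN noise points would show up here under the label -1.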

Happy clustering!

pamudu123/clustering-lab