This project implements an automated machine learning pipeline for classifying the Iris dataset using a Decision Tree classifier. The pipeline includes dimensionality reduction using Principal Component Analysis (PCA), standard scaling of features, and training the classifier. The project serves as a demonstration of how to create an end-to-end machine learning workflow using scikit-learn pipelines.
- Python 3.x
- scikit-learn
- numpy
You can install the required packages using pip:
pip install scikit-learn numpy
- Clone the repository:
git clone https://github.com/abhipatel35/Automated-Machine-Learning-Pipeline-for-Iris-Dataset-Classification.git
- Navigate to the project directory:
cd Automated-Machine-Learning-Pipeline-for-Iris-Dataset-Classification
- Run the script:
python main.py
- Data Loading: The Iris dataset is loaded using scikit-learn's datasets module.
- Data Splitting: The dataset is split into training and testing sets.
- Pipeline Creation: A scikit-learn pipeline is created, which includes:
  - Dimensionality reduction using PCA.
  - Standard scaling of features.
  - Training a Decision Tree classifier.
- Model Training: The pipeline is fitted to the training data.
- Model Evaluation: The accuracy score of the model on the test set is computed.
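
The steps above can be sketched as follows. This is a minimal illustration, not necessarily the code in `main.py`. The function names `build_pipeline` and `run`, the step names, and the values `n_components=2`, `test_size=0.2`, and `random_state=42` are assumptions chosen for the example. The step order (PCA, then scaling, then the classifier) follows the list above.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_pipeline(n_components=2, random_state=42):
    """Chain PCA, standard scaling, and a Decision Tree (hypothetical helper)."""
    return Pipeline([
        ("pca", PCA(n_components=n_components)),      # assumed: keep 2 components
        ("scaler", StandardScaler()),                 # scale the projected features
        ("tree", DecisionTreeClassifier(random_state=random_state)),
    ])


def run(test_size=0.2, random_state=42):
    """Load Iris, split, fit the pipeline, and return it with its test accuracy."""
    # Data loading
    X, y = load_iris(return_X_y=True)

    # Data splitting (split ratio and stratification are assumptions)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    # Pipeline creation and model training
    pipeline = build_pipeline(random_state=random_state)
    pipeline.fit(X_train, y_train)

    # Model evaluation on the held-out test set
    accuracy = accuracy_score(y_test, pipeline.predict(X_test))
    return pipeline, accuracy


if __name__ == "__main__":
    _, accuracy = run()
    print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the three stages in a single `Pipeline` means the PCA projection and the scaler are fitted on the training data only and then reused unchanged on the test data, which avoids leaking test-set statistics into training.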