Iris Dataset Analysis
This project demonstrates data analysis and visualization techniques using the famous Iris dataset. The Iris dataset contains measurements of four features (sepal length, sepal width, petal length, and petal width) for three species of Iris flowers: setosa, versicolor, and virginica.
Table of Contents Overview Installation Dataset Usage Analysis Visualization Machine Learning Contributing License Overview The Iris dataset is a classic dataset in machine learning and statistics, often used for testing classification algorithms. In this project, we'll perform the following steps:
Load and explore the dataset. Visualize the data using various plots. Apply machine learning algorithms to classify Iris species. Installation To run the analysis, you need to have Python installed along with the following libraries:
NumPy Pandas Scikit-Learn Seaborn Matplotlib You can install these libraries using pip:
bash Copy code pip install numpy pandas scikit-learn seaborn matplotlib Dataset The Iris dataset consists of 150 samples from three species of Iris (setosa, versicolor, and virginica). There are four features for each sample:
Sepal length in cm Sepal width in cm Petal length in cm Petal width in cm The dataset can be loaded using Scikit-Learn or downloaded directly from the UCI Machine Learning Repository.
Usage Clone the repository:
bash Copy code git clone https://github.com/NirmalSagarTigga/Iris-Data-Set-Analysis-Visualization-and-Prediction.git cd iris-dataset-analysis Run the Jupyter Notebook:
Launch Jupyter Notebook by running the command below and open iris_analysis.ipynb.
bash Copy code jupyter notebook Explore and visualize the data:
Follow the steps in the Jupyter Notebook to explore and visualize the Iris dataset.
Analysis
- Data Loading We use Pandas to load the dataset:
python Copy code import pandas as pd from sklearn.datasets import load_iris
iris = load_iris() df = pd.DataFrame(data=iris.data, columns=iris.feature_names) df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names) 2. Data Exploration Explore the dataset using Pandas:
python Copy code print(df.head()) print(df.describe()) print(df['species'].value_counts()) Visualization Visualize the Iris dataset using Seaborn and Matplotlib:
- Pairplot python Copy code import seaborn as sns
sns.pairplot(df, hue='species', markers=["o", "s", "D"]) 2. Boxplot python Copy code sns.boxplot(x='species', y='sepal length (cm)', data=df) 3. Heatmap python Copy code import matplotlib.pyplot as plt import numpy as np
corr_matrix = df.iloc[:, :-1].corr() sns.heatmap(corr_matrix, annot=True, cmap='coolwarm') plt.show() Machine Learning Apply machine learning models to classify the Iris species:
- Train-Test Split python Copy code from sklearn.model_selection import train_test_split
X = df.iloc[:, :-1] y = df['species'] X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) 2. Model Training and Evaluation Use a Decision Tree Classifier:
python Copy code from sklearn.tree import DecisionTreeClassifier from sklearn.metrics import accuracy_score, classification_report
model = DecisionTreeClassifier() model.fit(X_train, y_train) y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}') print(classification_report(y_test, y_pred)) Contributing Feel free to submit issues, fork the repository, and send pull requests. For major changes, please open an issue first to discuss what you would like to change.
License This project is licensed under the MIT License. See the LICENSE file for details.
This README provides a comprehensive guide to getting started with the Iris dataset using popular Python libraries for data analysis, visualization, and machine learning.