Dimensionality-Reduction-in-Oncology

Overview

This project applies dimensionality reduction techniques to a cancer-patient dataset to simplify data analysis and uncover hidden patterns. By reducing the number of features, we aim to make the data easier to visualize and to improve the performance of machine learning models.

Dataset

The dataset used in this project contains information on cancer patients, including demographic details, exposure to risk factors, and clinical symptoms. The dataset consists of the following features:

Patient Id
Age
Gender
Air Pollution
Alcohol use
Dust Allergy
Occupational Hazards
Genetic Risk
Chronic Lung Disease
Balanced Diet
Obesity
Smoking
Passive Smoker
Chest Pain
Coughing of Blood
Fatigue
Weight Loss
Shortness of Breath
Wheezing
Swallowing Difficulty
Clubbing of Finger Nails
Frequent Cold
Dry Cough
Snoring
Level

Project Workflow

Data Preprocessing

Loading the Dataset: The dataset is loaded from an Excel file stored in Google Drive.
Handling Missing Values: Rows with missing values (encoded as 0 or 'Nan') are removed to ensure data quality.
Label Encoding: Categorical variables are encoded using label encoding to convert them into numerical values.
Data Scaling: The features are standardized with StandardScaler so that each one has zero mean and unit variance. This keeps features with larger numeric ranges from dominating the principal components.
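
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the small DataFrame is made-up data standing in for the Excel file, only a few of the listed columns are used, and treating both 0 and 'Nan' as missing follows the description above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows standing in for the Excel file (the project loads it from Google Drive).
df = pd.DataFrame({
    "Age": [33, 17, 0, 46, 52],
    "Smoking": [3, 2, 7, "Nan", 8],
    "Fatigue": [3, 1, 8, 5, 9],
    "Level": ["Low", "Medium", "High", "High", "High"],
})

# Mark 0 and the string 'Nan' as missing, then drop every incomplete row.
clean = df.mask(df.isin([0, "Nan"])).dropna().copy()

# Label-encode the categorical severity column (classes are sorted alphabetically).
clean["Level"] = LabelEncoder().fit_transform(clean["Level"])

# Standardize the numeric features to zero mean and unit variance.
features = clean.drop(columns="Level").astype(float)
X_scaled = StandardScaler().fit_transform(features)

print(clean)
print(X_scaled.round(2))
```

Note that dropping every row containing a 0 is only safe when 0 is never a legitimate value for any feature; that is an assumption carried over from the description, not something verified here.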

Dimensionality Reduction

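Principal Component Analysis (PCA): PCA is applied to the standardized data to reduce it to 2 principal components, the two orthogonal directions that capture the most variance. This retains as much of the original variation as two dimensions allow while making the data simple enough to plot.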
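
A minimal sketch of this step with scikit-learn is shown below. The random matrix is a hypothetical stand-in for the standardized patient features, and the variable names are illustrative rather than taken from the notebook.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the numeric patient features: 100 rows, 6 columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
# Make two columns strongly correlated so PCA has structure to find.
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)

# Standardize first, then project onto the two leading principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # one 2-D point per patient row
print(pca.explained_variance_ratio_)  # share of variance kept by each component
```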

Results

Explained Variance: The two principal components explain a significant portion of the variance in the data.
Visualization: The reduced data is visualized using a scatter plot, with different colors representing different levels of cancer severity.
Component Analysis: The contribution of each original feature to the principal components is analyzed and presented.
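
The three results above could be produced along the following lines. This is a sketch under stated assumptions: the data is synthetic, the three feature names are borrowed from the list above only as labels, and the encoded levels 0 to 2 are hypothetical. It uses plain matplotlib, which may differ from the plotting code in the notebook.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three features and an encoded severity level.
rng = np.random.default_rng(42)
names = ["Smoking", "Fatigue", "Obesity"]
level = rng.integers(0, 3, size=150)             # hypothetical encoded levels 0, 1, 2
X = rng.normal(size=(150, 3)) + level[:, None]   # features shift with the level

pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

# Explained variance: fraction of total variance captured by each component.
explained = pca.explained_variance_ratio_
print(explained)

# Component analysis: weight of each original feature in each component.
weights = pd.DataFrame(pca.components_, columns=names, index=["PC1", "PC2"])
print(weights)

# Visualization: scatter plot of the two components, colored by level.
fig, ax = plt.subplots()
points = ax.scatter(scores[:, 0], scores[:, 1], c=level, cmap="viridis")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
fig.colorbar(points, ax=ax, label="Level (encoded)")
```

Each row of the weights table is a unit-length vector, so features with larger absolute weights contribute more to that component. Call plt.show() or fig.savefig(...) to display or save the figure.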

Correlation Analysis

A heatmap of the correlation matrix is generated to visualize the relationships between different features in the dataset.
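
One way to build such a heatmap is sketched below, using pandas for the correlation matrix and plain matplotlib for the plot. The data is synthetic, the column names are borrowed from the feature list only as labels, and the notebook itself may use a different plotting library.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in data with one deliberately correlated pair of columns.
rng = np.random.default_rng(1)
df = pd.DataFrame(
    rng.normal(size=(200, 4)),
    columns=["Smoking", "Obesity", "Fatigue", "Wheezing"],
)
df["Fatigue"] += 0.8 * df["Smoking"]

# Pairwise Pearson correlations (the pandas default).
corr = df.corr()

# Draw the matrix as a heatmap on a fixed -1 to 1 color scale.
fig, ax = plt.subplots()
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
```

Fixing the color scale to the full range from -1 to 1 keeps the colors comparable across datasets. Strongly correlated feature pairs are also the ones PCA tends to merge into a single component.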

Conclusion

This project demonstrates the effectiveness of PCA in reducing the dimensionality of a complex dataset, making it easier to visualize and analyze. The correlation heatmap provides additional insights into the relationships between different features, highlighting potential areas for further research and analysis.

Contact

If you have any questions or suggestions, please feel free to reach out to me at nvarjunmani07@gmail.com.
