CNN for NMR Spectra Classification Project

Project Description

This project implements a Convolutional Neural Network (CNN) for classifying ¹H-NMR spectra, with a focus on metabolite spectra. The CNN analyzes images of spectra and assigns each one to a class. The primary motivation is to automate spectra classification in order to reduce human error and subjective interpretation in the analysis of metabolite images.

Main Features

Data Loading: Efficient loading of spectral image data.
Preprocessing: Image preprocessing techniques to enhance features and normalize data.
CNN Model: Implementation of a custom CNN architecture for spectra classification.
Training Pipeline: A robust training process with validation and performance metrics.
Prediction: Ability to classify new spectral data using the trained model.
Software (in development)

Technologies and Frameworks

Python: Primary programming language (version 3.12.4)
TensorFlow/Keras: For building and training the CNN model
NumPy: For numerical computations and array operations
Matplotlib: For data visualization
YAML: Used for configuration management
Tkinter: For developing the software's graphical interface

Why YAML? Keeping settings in YAML files saves time and simplifies modifications: you can change configuration values without altering or rerunning your entire codebase.

Workflow Overview (related to create_your_model)

Step 1: Set Up Data Folder

Create a root folder to store all your data. Example: your\path\here\data
For each metabolite, create a folder named after the metabolite with two subfolders inside:

valid: To store valid data files.
invalid: To store invalid data files.

Example Structure:

your\path\here\data\Glutamine\valid
your\path\here\data\Glutamine\invalid

Step 2: Set Up Models Folder

Create a root folder to store all your models.
Example: your\path\here\models
For each metabolite, create a folder named after the metabolite to store model files.

Example Structure: your\path\here\models\Glutamine
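
The data and models layouts from Steps 1 and 2 can be created in one pass. The following is a minimal sketch, not part of the repository: the helper name create_project_folders and its arguments are assumptions made for illustration.

```python
from pathlib import Path


def create_project_folders(root, metabolites):
    """Create data/<metabolite>/{valid,invalid} and models/<metabolite> under root."""
    root = Path(root)
    created = []
    for name in metabolites:
        for folder in (
            root / "data" / name / "valid",
            root / "data" / name / "invalid",
            root / "models" / name,
        ):
            # exist_ok=True makes the call safe to repeat on an existing tree
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling create_project_folders("your/path/here", ["Glutamine"]) would produce the three Glutamine folders shown in the examples above.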

Important Note About Notebook & YAML file Placement

The create_your_model.ipynb notebook and the config_[metabolite_name].yml file must be placed in the specific metabolite folder within the models directory (e.g., your\path\here\models\Glutamine\create_your_model.ipynb). This placement is crucial because the get_metabolite_name() function relies on the folder structure to detect which metabolite you are working with. Each metabolite folder requires its own YAML configuration file, named to match the metabolite directory.

Example Correct Placement:

Test_the_project/models/
└── Glutamine/
    ├── create_your_model.ipynb
    └── config_[metabolite_name].yml

This ensures that the notebook can automatically identify the metabolite name from its location in the folder structure, making the workflow more automated and less prone to errors.
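
One way such a lookup could work is to read the name of the folder that contains the notebook. This is a minimal sketch and not the repository's actual get_metabolite_name(): the optional notebook_dir parameter and the config_path_for helper are hypothetical, and it assumes Jupyter runs the notebook with its own folder as the working directory.

```python
from pathlib import Path


def get_metabolite_name(notebook_dir=None):
    """Return the metabolite name, taken from the folder holding the notebook."""
    # Assumption: the working directory is the notebook's own folder.
    folder = Path(notebook_dir) if notebook_dir is not None else Path.cwd()
    return folder.name


def config_path_for(notebook_dir=None):
    """Build the expected config_<metabolite>.yml path (hypothetical helper)."""
    folder = Path(notebook_dir) if notebook_dir is not None else Path.cwd()
    return folder / f"config_{get_metabolite_name(folder)}.yml"
```

With the notebook in models/Glutamine, the first function would return "Glutamine". If the notebook were moved elsewhere, the detected name would silently change, which is why the placement matters.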

Version Selection:

In the notebook, you select which version of the configuration to use by setting the version number before the configuration is loaded.

The load_config() function will:

Automatically look for the YAML file in the same directory
Load the specified version's configuration
Raise an error if the specified version doesn't exist in the config file

Make sure your YAML file exists and contains the version number you specify in the notebook, otherwise you'll receive a Version not found error.
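
The behaviour described above could look roughly like the following. This is a sketch under stated assumptions, not the repository's load_config(): the YAML layout (one top-level key per version), the select_version helper, the config_dir parameter, and the KeyError type are all hypothetical. Parsing relies on PyYAML's yaml.safe_load.

```python
from pathlib import Path


def select_version(all_versions, version):
    """Return the settings stored under `version`, or raise if it is missing."""
    if version not in all_versions:
        available = ", ".join(str(key) for key in all_versions)
        raise KeyError(f"Version not found: {version!r} (available: {available})")
    return all_versions[version]


def load_config(version, config_dir=None):
    """Load one version's settings from the config_*.yml beside the notebook."""
    import yaml  # PyYAML; imported here so select_version works without it

    folder = Path(config_dir) if config_dir is not None else Path.cwd()
    # Assumed naming: config_<folder name>.yml, e.g. config_Glutamine.yml
    config_file = folder / f"config_{folder.name}.yml"
    with config_file.open(encoding="utf-8") as handle:
        all_versions = yaml.safe_load(handle)
    return select_version(all_versions, version)
```

Keeping the version lookup in its own function means the error path can be checked with a plain dictionary, without touching the file system.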

Dataset

Due to confidentiality constraints, I created a custom dataset of ECG images that mimics the structural characteristics of the original NMR spectra.

Dataset Characteristics

Image Type: ECG spectral images
Purpose: Simulate NMR spectra classification workflow

Dataset Selection

The original Heartbeat Dataset contains approximately 100,000 ECG recordings. For this project, I randomly selected:

1,000 Normal ECG images
1,000 Abnormal ECG images

This balanced subset provides a representative sample for developing and testing the classification model while maintaining computational efficiency.
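
The source does not say how the random selection was scripted. As an illustration only, a balanced draw of this kind can be sketched with the standard library; the function name balanced_sample, its parameters, and the fixed seed are assumptions.

```python
import random


def balanced_sample(normal_items, abnormal_items, per_class=1000, seed=0):
    """Draw the same number of items from each class, without replacement."""
    rng = random.Random(seed)  # fixed seed so the same subset can be redrawn
    # random.sample raises ValueError if a class has fewer than per_class items
    return {
        "normal": rng.sample(list(normal_items), per_class),
        "abnormal": rng.sample(list(abnormal_items), per_class),
    }
```

Passing two lists of record identifiers returns a dictionary with 1,000 entries per class by default.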

The dataset images were created from the following sources:

Original Dataset: Heartbeat Dataset
Data Conversion Script: dataset_csv_to_png.py

View Dataset Folder

Testing the Project

Setup and Installation

Clone this repository to your local machine:

git clone https://github.com/Martinfacot/CNN_Spectra_Classification.git

Navigate to the project directory:

cd CNN_Spectra_Classification

Create and activate a virtual environment (optional but recommended):

python3.12 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

Install the required Python packages:

pip3 install -r requirements.txt

Unzip the image folders and ensure the following directory structure:

your/path/here/Test_the_project/data/ECG/normal/
your/path/here/Test_the_project/data/ECG/abnormal/

Open and run the Jupyter Notebook located at:

your/path/here/Test_the_project/models/ECG/create_your_model_ECG.ipynb

Troubleshooting

Ensure all dependencies are correctly installed
Verify the exact Python version (3.12)
Check that image folders are correctly unzipped and placed in the specified directories
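
These checks can be run in one go before opening the notebook. The helper below is a sketch, not part of the repository: check_setup and its parameters are hypothetical, and the default module list is an assumption based on the technologies listed earlier (yaml is the import name of PyYAML).

```python
import importlib.util
import sys
from pathlib import Path


def check_setup(project_root, required_python=(3, 12),
                modules=("tensorflow", "numpy", "matplotlib", "yaml")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] != tuple(required_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in required_python)
        problems.append(f"Python {wanted} expected, found {found}")
    for module in modules:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None:
            problems.append(f"Module not importable: {module}")
    data_dir = Path(project_root) / "Test_the_project" / "data" / "ECG"
    for label in ("normal", "abnormal"):
        folder = data_dir / label
        if not folder.is_dir():
            problems.append(f"Missing folder: {folder}")
        elif not any(folder.iterdir()):
            problems.append(f"Empty folder: {folder}")
    return problems
```

Printing the result of check_setup("your/path/here") lists anything that still needs fixing.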

Results

If you want to explore the model performance without retraining, you can examine the detailed results in the following Excel files:

Alanine Model Results

Example of a simple metabolite classification
View Alanine Results

ECG Model Results

Results from the test project using ECG data
View ECG Results

These files contain comprehensive metrics and performance evaluations for their respective models.

How the Software Works (in development)

The software implements a comprehensive workflow for metabolite spectra classification:

Data Preparation

Users select a patient folder containing metabolite spectra images
The application automatically scans specific subdirectories for metabolite images
Supports multiple metabolites including Alanine, 3-HB, Acetone, Glutamine, and others
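
The scanning step can be pictured as follows. This is a sketch only and does not reproduce spectra_analysis_app.py: it assumes one subfolder per metabolite directly inside the patient folder, and the function name, the IMAGE_SUFFIXES set, and the accepted file formats are hypothetical.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed image formats


def find_metabolite_images(patient_dir, metabolites):
    """Map each metabolite name to the sorted image files in its subfolder."""
    patient_dir = Path(patient_dir)
    found = {}
    for name in metabolites:
        subdir = patient_dir / name
        if not subdir.is_dir():
            continue  # this patient has no folder for that metabolite
        images = sorted(
            path for path in subdir.iterdir()
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
        )
        if images:
            found[name] = images
    return found
```

Metabolites without a folder, or with a folder that holds no images, are left out of the result, so the caller only sees what can actually be classified.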

Image Processing and Classification

Image Preprocessing
- Crops and resizes images to standardized dimensions
- Normalizes image data for consistent analysis
- Prepares images for neural network input
Machine Learning Classification
- Utilizes pre-trained Convolutional Neural Network (CNN) models for each metabolite
- Generates probability scores for image classification
- Provides confidence levels: High, Medium, and Low confidence classifications
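
A probability score can be turned into the three confidence levels with a pair of thresholds. The source does not give the cut-offs the software uses, so the values 0.9 and 0.7 below, like the function name confidence_level, are assumptions chosen for illustration.

```python
def confidence_level(probability, high=0.9, medium=0.7):
    """Label a binary-classifier probability as High, Medium, or Low confidence."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Certainty is the probability of whichever class the model favours,
    # so 0.05 and 0.95 are equally confident predictions of opposite classes.
    certainty = max(probability, 1.0 - probability)
    if certainty >= high:
        return "High"
    if certainty >= medium:
        return "Medium"
    return "Low"
```

Scores near 0.5 fall into the Low band, which is where manual validation is most useful.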

User Interface Features

Interactive image display with probability visualization
Manual classification validation
Results export to Excel for further analysis
Overview page for comprehensive results review
