🎧 Representation Chizzler™

A powerful two-stage audio processing tool that combines Voice Activity Detection (VAD) and Speech Enhancement to clean and denoise audio files.

🌟 Features

Two-Stage Processing Pipeline:
- Stage 1: Uses Silero VAD to detect and extract speech segments
- Stage 2: Applies MP-SENet deep learning model to remove noise
Memory-Efficient Processing:
- Processes audio in chunks to prevent memory issues
- Automatically converts audio to the required format (16kHz mono WAV)
User-Friendly Interface:
- Beautiful Gradio web interface
- Real-time progress reporting
- Compare original, VAD-processed, and denoised versions
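
The chunked processing mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in `run.py`: the helper names `iter_chunks` and `process_in_chunks` and the 30-second chunk length are assumptions made for the example; only the 16 kHz sample rate comes from this README.

```python
from typing import Callable, Iterator, List, Sequence

SAMPLE_RATE = 16_000  # the README states audio is converted to 16 kHz mono


def iter_chunks(samples: Sequence[float], chunk_seconds: float = 30.0,
                sample_rate: int = SAMPLE_RATE) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length slices; the last one may be shorter."""
    chunk_len = int(chunk_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk length must be at least one sample")
    for start in range(0, len(samples), chunk_len):
        yield samples[start:start + chunk_len]


def process_in_chunks(samples: Sequence[float],
                      process: Callable[[Sequence[float]], Sequence[float]],
                      chunk_seconds: float = 30.0,
                      sample_rate: int = SAMPLE_RATE) -> List[float]:
    """Apply `process` to one chunk at a time and concatenate the results."""
    out: List[float] = []
    for chunk in iter_chunks(samples, chunk_seconds, sample_rate):
        out.extend(process(chunk))  # only one chunk is handed to `process` at a time
    return out
```

Here `process` stands in for whatever per-chunk step is applied (for example, the denoising model), so peak memory depends on the chunk length rather than on the length of the whole file.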

🚀 Installation

Clone this repository:

```
git clone https://github.com/Reza2kn/RepresentationChizzler.git
cd RepresentationChizzler
```

Create and activate a virtual environment:

```
python -m venv .venv
source .venv/bin/activate  # On Windows, use: .venv\Scripts\activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Set up environment variables:
- Create a .env file in the project root
- Add your Hugging Face token:
```
HF_TOKEN=your_huggingface_token_here
```
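
For reference, a token stored this way could be read with the standard library alone, as in the sketch below. How `run.py` actually loads the token is not shown in this README, so `load_env_file` and `get_hf_token` are hypothetical helpers; only the `HF_TOKEN` variable name and the `.env` file come from the step above.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_hf_token(path: str = ".env") -> str:
    """Prefer a variable already exported in the shell, then fall back to .env."""
    token = os.environ.get("HF_TOKEN") or load_env_file(path).get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; add it to .env or export it")
    return token
```

Keep the `.env` file out of version control so the token is not committed by accident.
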
Download MP-SENet:
- Clone the MP-SENet repository:
```
git clone https://github.com/yxlu-0102/MP-SENet.git
```
- Download the model checkpoint and config files:
  - Place g_best_dns in MP-SENet/best_ckpt/
  - Place config.json in MP-SENet/best_ckpt/
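
Before launching the app, it can help to confirm that both files ended up in the expected folder. The check below is a small sketch: `missing_checkpoint_files` is a hypothetical helper, not part of this project, and the directory and file names are taken from the step above.

```python
from pathlib import Path

# File names as listed in the installation step above.
REQUIRED_FILES = ("g_best_dns", "config.json")


def missing_checkpoint_files(ckpt_dir: str = "MP-SENet/best_ckpt") -> list:
    """Return the required file names that are not present in ckpt_dir."""
    base = Path(ckpt_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoint_files()
    if missing:
        print("Missing from MP-SENet/best_ckpt/:", ", ".join(missing))
    else:
        print("All checkpoint files found.")
```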

🎮 Usage

Run the app:
```
python run.py
```
Open your web browser and navigate to the URL printed in the terminal
Upload an audio file and adjust the parameters:
- VAD Threshold: Controls voice detection sensitivity (0.1-0.9)
- Max Silence Gap: Controls merging of close speech segments (1-10s)
Compare the results:
- Original Audio
- VAD Processed (Speech Only)
- Final Denoised

🛠️ Parameters

VAD Threshold (0.1-0.9):
- Higher values = stricter voice detection
- Lower values = more lenient detection
- Default: 0.5
Max Silence Gap (1-10s):
- Maximum silence duration to consider segments as continuous
- Higher values = fewer segments but may include more silence
- Default: 4.0s
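
To make the two parameters concrete, the sketch below shows one way they could act on VAD output. It is an illustration under stated assumptions, not the logic in `run.py` or in Silero VAD: `frames_to_segments` and `merge_segments` are hypothetical helpers, per-frame speech probabilities are assumed as input, and a frame is treated as speech when its probability is at or above the threshold.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(probs: Sequence[float],
                       threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Group consecutive frames with prob >= threshold into (start, end) index pairs."""
    segments, start = [], None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i                      # a speech run begins
        elif p < threshold and start is not None:
            segments.append((start, i))    # the run ended at the previous frame
            start = None
    if start is not None:
        segments.append((start, len(probs)))
    return segments


def merge_segments(segments: Sequence[Tuple[float, float]],
                   max_silence_gap: float = 4.0) -> List[Tuple[float, float]]:
    """Merge (start, end) segments in seconds when the silence between them is <= the gap."""
    merged: List[Tuple[float, float]] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_silence_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With the defaults, `merge_segments([(0.0, 1.0), (3.0, 4.0), (10.0, 12.0)])` joins the first two segments (2 s of silence) and keeps the third separate (6 s of silence). Raising the threshold in `frames_to_segments` drops low-confidence frames, which is the "stricter detection" behavior described above.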

🙏 Credits

This project combines two powerful models:

- Silero VAD for Voice Activity Detection
- MP-SENet for Speech Enhancement

📝 License

This project is licensed under the terms specified in the MP-SENet repository.
