A simple web application that converts YouTube videos and audio files into synthesized speech using AI models.
- Process YouTube videos by extracting and converting captions to speech
- Convert uploaded audio files through transcription and voice synthesis
- User-friendly web interface built with Gradio
- Multiple voice options for synthesis
- Automatic caption extraction from YouTube videos
project/
├── src/
│ ├── __init__.py # Makes src a package
│ ├── utils/
│ │ ├── __init__.py # Makes utils a package
│ │ ├── audio.py # Audio processing functions
│ │ └── youtube.py # YouTube caption extraction
│ ├── app.py # Gradio interface
│ └── config.py # Configuration settings
├── requirements.txt # Dependencies
├── .env.example # Example environment variables
└── README.md # Documentation
- Clone the repository:
git clone <repository-url>
cd <repository-name>
- Install system dependencies (Debian/Ubuntu Linux):
sudo apt-get install espeak-ng
- Set up environment variables:
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
- Install Python dependencies:
pip install -r requirements.txt
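Before starting the app, it can help to confirm that the API key from the environment step is actually set. The following is a minimal sketch, not the project's own loader: `parse_env` and `groq_key_configured` are hypothetical helper names, and it assumes `.env` holds plain `KEY=VALUE` lines.

```python
import os


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def groq_key_configured(env_text, environ=None):
    """Return True if GROQ_API_KEY is non-empty in the environment or in the .env text."""
    environ = os.environ if environ is None else environ
    key = environ.get("GROQ_API_KEY") or parse_env(env_text).get("GROQ_API_KEY", "")
    return bool(key)


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains .env.
    try:
        with open(".env", encoding="utf-8") as handle:
            contents = handle.read()
    except FileNotFoundError:
        contents = ""
    print("GROQ_API_KEY configured:", groq_key_configured(contents))
```

Passing `environ` explicitly keeps the check easy to exercise without touching the real process environment.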
- Start the application:
python -m src.app
- Open your web browser and navigate to the provided URL (usually http://127.0.0.1:7860)
- Use the app by either:
  - Entering a YouTube URL
  - Uploading an audio file
- Click "Process" and wait for the generated audio
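When a YouTube URL is entered, a first step is usually to check that it points at a video at all. The sketch below shows one way to pull the video ID out of common URL shapes using only the standard library. It is an illustration under assumptions: `extract_video_id` is a hypothetical helper, not a function confirmed to exist in `src/utils/youtube.py`, and the set of URL patterns it accepts is a guess at what users typically paste.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a YouTube URL, or None if none is found."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "m.youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        # Shorts and embeds: /shorts/<id> or /embed/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
            return parts[1]

    return None
```

Returning `None` rather than raising lets the interface show a friendly "invalid URL" message before any network request is made.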
- Gradio: Web interface framework
- Kokoro: Text-to-speech synthesis
- Groq: Audio transcription using the Whisper model
- PyTubeFix: YouTube video processing
- soundfile: Audio file handling
- pydub: Audio processing
- Python 3.8+
- Groq API key
- espeak-ng (for Linux systems)
- Internet connection for YouTube processing
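The local requirements above can be checked with a short preflight script. This is a sketch under assumptions: `missing_requirements` is a hypothetical helper, and it assumes `espeak-ng` is expected on `PATH`, which applies to Linux systems as noted above. It does not check the Groq API key or network access.

```python
import shutil
import sys


def missing_requirements(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    # The project lists Python 3.8+ as a requirement.
    if tuple(version_info[:2]) < (3, 8):
        problems.append("Python 3.8+ is required")
    # shutil.which returns None when the executable is not on PATH.
    if which("espeak-ng") is None:
        problems.append("espeak-ng was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("Missing:", problem)
```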
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.