VideoToText downloads a video from YouTube or takes a local video file, extracts the audio track, and transcribes it to text with OpenAI's Whisper model. The project runs in Docker, so setup and deployment need only Docker and Docker Compose.
- Download video from YouTube or use a local video file
- Extract audio from the video
- Transcribe audio to text using OpenAI's Whisper model
- Save transcriptions in SRT and JSON formats
- yt-dlp: A command-line program to download videos from YouTube and other video sites.
- ffmpeg: A complete, cross-platform solution to record, convert and stream audio and video.
- whisper: OpenAI's Whisper model for state-of-the-art speech recognition.
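To make the SRT and JSON output step concrete, here is a minimal sketch of how transcription segments could be turned into both formats. It assumes segments shaped like the ones Whisper's `transcribe()` returns (dictionaries with `start`, `end`, and `text` keys). The helper names `format_timestamp`, `segments_to_srt`, and `segments_to_json` are hypothetical and are not taken from this project's `main.py` or `utils.py`.

```python
import json


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        text = seg["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


def segments_to_json(segments) -> str:
    """Serialize segments as indented JSON, keeping non-ASCII text readable."""
    return json.dumps(segments, ensure_ascii=False, indent=2)
```

For example, `segments_to_srt([{"start": 0.0, "end": 2.5, "text": " Hello "}])` yields a single cue numbered 1 that spans `00:00:00,000 --> 00:00:02,500`.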
.
├── Dockerfile
├── docker-compose.yml
├── main.py
├── utils.py
├── README.md
├── /downloads # Folder where the processed results will be saved
└── /files # Folder where the videos to be processed should be placed
git clone https://github.com/vshloda/VideoToText.git
cd VideoToText
docker compose build
For processing YouTube videos:
docker compose run --rm app python main.py --url "https://www.youtube.com/watch?v=example"
For processing local video files:
docker compose run --rm app python main.py --file "files/video.mp4"
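The two commands above differ only in the `--url` and `--file` options. The project's actual argument handling is not shown here, so the following is only a sketch of one way such a command line could be parsed with `argparse`, treating the two options as mutually exclusive. The function names `build_parser` and `describe_source` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts exactly one of --url or --file."""
    parser = argparse.ArgumentParser(
        description="Transcribe a YouTube video or a local video file."
    )
    # Exactly one input source must be given (assumption, not confirmed by the project).
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--url", help="URL of the YouTube video to download")
    source.add_argument("--file", help="Path to a local video file, e.g. files/video.mp4")
    return parser


def describe_source(argv):
    """Return a (kind, value) pair describing the selected input source."""
    args = build_parser().parse_args(argv)
    if args.url:
        return ("youtube", args.url)
    return ("local", args.file)
```

With this layout, passing both options or neither makes `argparse` print a usage message and exit, which keeps the rest of the script free of input-validation branches.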
To change the Whisper model used for transcription, edit the Dockerfile and change the default value of the WHISPER_MODEL build argument, then rebuild the image:
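# Define build-model arguments
# Change 'small' to 'medium', 'large', etc.
# (Dockerfile comments must start their own line; a trailing '#' is read as part of the instruction.)
ARG WHISPER_MODEL=small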
Available models include:
- tiny
- base
- small
- medium
- large
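As an illustration of how a model name from the list above might be validated and loaded, here is a small sketch. It assumes the chosen name reaches the Python process through an environment variable called `WHISPER_MODEL`; how the project actually passes the build argument to the code is not documented here, so treat that as an assumption. `resolve_model_name` and `transcribe` are hypothetical helpers, while `whisper.load_model` and `model.transcribe` are the calls provided by the `openai-whisper` package.

```python
import os

# Model names listed in this README.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env=None, default="small"):
    """Read the model name from WHISPER_MODEL (assumed variable) and validate it."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in KNOWN_MODELS:
        raise ValueError(
            f"Unknown Whisper model {name!r}; expected one of {', '.join(KNOWN_MODELS)}"
        )
    return name


def transcribe(audio_path, model_name):
    """Load the given Whisper model and transcribe one audio file."""
    # Imported lazily so that name validation works without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    # The result is a dict that includes "text" and a list of "segments".
    return model.transcribe(audio_path)
```

Validating the name before loading gives a clear error message early, rather than a failure partway through a long download or transcription run.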
This project is licensed under the MIT License.