A Python script that automates video subtitle creation & translation, supporting both local files and online video URLs.
- Download videos from various platforms easily by specifying the URL (YouTube, Twitter, Facebook, Instagram, etc.)
- Transcribe audio using OpenAI's hosted Whisper model (default) or a local Whisper model
- Translate the generated transcriptions to a target language using OpenAI's GPT models
- Add translated subtitles to videos
- Support for multiple languages and resolutions (including YouTube Shorts)
- Support for long videos, which are split into chunks automatically before transcription and translation
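To illustrate the chunking idea, here is a minimal sketch of how a long recording could be divided into fixed-length ranges. The function name `chunk_ranges` and the 10-minute chunk length are assumptions made for this example; the script's actual splitting logic and chunk size may differ.

```python
def chunk_ranges(total_ms: int, chunk_ms: int) -> list[tuple[int, int]]:
    """Return (start, end) millisecond ranges that together cover the whole duration."""
    if total_ms < 0 or chunk_ms <= 0:
        raise ValueError("total_ms must be >= 0 and chunk_ms must be > 0")
    # The last range is clipped so it never runs past the end of the audio.
    return [
        (start, min(start + chunk_ms, total_ms))
        for start in range(0, total_ms, chunk_ms)
    ]


# A 25-minute file cut into 10-minute pieces (chunk length chosen for illustration).
print(chunk_ranges(25 * 60_000, 10 * 60_000))
# [(0, 600000), (600000, 1200000), (1200000, 1500000)]
```

When chunks are transcribed separately, the timestamps of each chunk generally need to be shifted by that chunk's start offset so the merged subtitles stay in sync with the full video.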
Here are some examples of the script in action:
- English to Korean Translation
- English to Spanish Translation
- German to English Translation
- Spanish to English Translation
Note: Results may vary with accents or background music, especially when using the local model. Video and caption synchronization might be affected when the audio isn't clear. The above clips were processed using OpenAI for both transcription & translation (default script behavior).
Run the script using the following command:
python translate.py video_input target_language [options]
- video_input: URL or path to the input video file
- target_language: Target language for translation (e.g., Spanish, English, French)
- --output_dir: Directory to save output files (default: "output")
- --models_path: Path to store Whisper models (default: "Models")
- --openai_api_key: OpenAI API key (if not set as an environment variable)
- --font: Font to use for subtitles (default: "NanumGothic")
- --use_local_whisper: Use a local Whisper model for transcription instead of OpenAI's hosted Whisper model
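For readers who want to see how an interface like this can be declared, the following is a sketch using `argparse`. The argument names and defaults are taken from the list above; the function name `build_parser` and the help strings are assumptions, and the real `translate.py` may define its parser differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="translate.py")
    parser.add_argument("video_input", help="URL or path to the input video file")
    parser.add_argument("target_language", help="Target language, e.g. Spanish")
    parser.add_argument("--output_dir", default="output", help="Directory for output files")
    parser.add_argument("--models_path", default="Models", help="Where Whisper models are stored")
    parser.add_argument("--openai_api_key", default=None, help="OpenAI API key")
    parser.add_argument("--font", default="NanumGothic", help="Subtitle font")
    # A flag with no value: present means True, absent means False.
    parser.add_argument("--use_local_whisper", action="store_true")
    return parser


args = build_parser().parse_args(["input_video.mp4", "German", "--font", "Arial"])
print(args.output_dir, args.font, args.use_local_whisper)
# output Arial False
```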
- Translate YouTube video subtitles to Spanish (using the default OpenAI-hosted Whisper model):
python translate.py https://www.youtube.com/watch?v=VIDEO_ID Spanish
- Translate local video file subtitles to French (using the default OpenAI-hosted Whisper model):
python translate.py /path/to/your/video.mp4 French
- Use a specific output directory and font:
python translate.py input_video.mp4 German --output_dir my_output --font Arial
- Use a local model for transcription:
python translate.py input_video.mp4 Korean --use_local_whisper
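Since the first argument can be either a URL or a local path, the script has to tell the two apart somehow. One possible approach is sketched below; the helper name `is_url` is hypothetical and this is not necessarily how `translate.py` makes the decision.

```python
from urllib.parse import urlparse


def is_url(video_input: str) -> bool:
    """Treat the input as a URL only if it has an http(s) scheme and a host."""
    parsed = urlparse(video_input)
    # Checking the scheme explicitly avoids mistaking Windows drive letters
    # such as "C:" for a URL scheme.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_url("https://www.youtube.com/watch?v=VIDEO_ID"))  # True
print(is_url("/path/to/your/video.mp4"))                   # False
print(is_url("C:\\videos\\clip.mp4"))                      # False
```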
On Windows:
- Download FFmpeg from ffmpeg.org
- Extract the ZIP file
- Add the bin folder path to the system PATH
- Verify installation:
ffmpeg -version
On macOS (Homebrew):
brew install ffmpeg
On Linux (Debian/Ubuntu):
sudo apt update
sudo apt install ffmpeg
Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # Linux/macOS
venv\Scripts\activate # Windows
Install required packages:
pip install openai==1.12.0
pip install faster-whisper==0.10.0
pip install yt-dlp==2024.3.10
pip install ffmpeg-python==0.2.0
pip install pydub==0.25.1
- Create account at platform.openai.com
- Generate API key in account settings
- Set environment variable:
# Linux/macOS
export OPENAI_API_KEY='your-key-here'
# Windows (PowerShell)
$env:OPENAI_API_KEY='your-key-here'
- Clone repository:
git clone https://github.com/tikene/video-caption-and-translate.git
cd video-caption-and-translate
- Verify installation:
python translate.py --help
- Ensure FFmpeg is in system PATH
- Restart terminal/IDE after PATH changes
- Check with ffmpeg -version
- Verify API key is set correctly
- Check account has sufficient credits
- Ensure stable internet connection
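The first two checks above can also be done from Python before a run. This is a minimal sketch, not part of the script itself; the function name `preflight` is hypothetical, and it only looks at the `OPENAI_API_KEY` environment variable, so it will report a missing key even if you intend to pass `--openai_api_key` on the command line.

```python
import os
import shutil


def preflight(env=None, which=shutil.which) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which returns None when the executable is not found on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # An unset or empty variable both count as missing.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


for problem in preflight():
    print("Problem:", problem)
```

It cannot tell whether the key is valid or whether the account has credits; those still need to be checked on the OpenAI platform.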
The script generates the following files in the output directory:
- Downloaded video (if URL was provided)
- Translated SRT subtitle file
- Video with embedded translated subtitles
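For reference, an SRT file is a sequence of numbered entries, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the subtitle text and a blank line. The sketch below shows how one such entry can be formatted; the helper names `srt_timestamp` and `srt_entry` are made up for this example and are not taken from the script.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_entry(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT entry: index line, time range line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


print(srt_entry(1, 0.0, 2.5, "Hola, mundo"))
# 1
# 00:00:00,000 --> 00:00:02,500
# Hola, mundo
```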
- The script uses the GPT-4o-mini model for translation by default, which costs around $0.03 for a two-minute video. For higher translation quality you can use gpt-4, but costs will go up substantially
- Costs scale with video length: the longer the video, the higher the cost
This project is licensed under the MIT License - see the LICENSE file for details.