A Python script that downloads and transcribes videos from YouTube channels. For each video, the script first tries YouTube's built-in transcript API and falls back to OpenAI's Whisper if no transcript is available.
- Fetches all playlists from a YouTube channel matching specified keywords
- Downloads and processes videos in batches
- Uses YouTube's transcript API when available
- Falls back to OpenAI's Whisper for videos without transcripts
- Stores transcripts in SQLite database
- Shows progress with tqdm progress bars
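
The transcript-then-Whisper fallback and the SQLite storage listed above could look roughly like the sketch below. It is not the script's actual code: `transcribe_with_fallback`, `save_transcript`, and the `transcripts` table layout are hypothetical names chosen for illustration, and the two transcription backends are passed in as plain callables so the sketch does not assume any particular library signature.

```python
import sqlite3
from typing import Callable, Optional, Tuple


def transcribe_with_fallback(
    video_id: str,
    fetch_transcript: Callable[[str], Optional[str]],
    run_whisper: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (text, source), preferring an existing YouTube transcript."""
    try:
        text = fetch_transcript(video_id)
    except Exception:
        # Assumption: any lookup failure means "no transcript available".
        text = None
    if text:
        return text, "youtube"
    # No usable transcript, so fall back to the (slower) Whisper backend.
    return run_whisper(video_id), "whisper"


def save_transcript(
    conn: sqlite3.Connection, video_id: str, text: str, source: str
) -> None:
    """Insert or replace one transcript row (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        "video_id TEXT PRIMARY KEY, source TEXT NOT NULL, text TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO transcripts (video_id, source, text) "
        "VALUES (?, ?, ?)",
        (video_id, source, text),
    )
    conn.commit()
```

Recording the source next to the text makes it possible to tell later which videos were transcribed by Whisper, for example to re-run them with a larger model.
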
- Python 3.7+
- ffmpeg (required for Whisper)
- Chrome/Chromium browser (for Selenium)
- YouTube Data API key
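
Some of the requirements above can be checked before running the script. The following is a minimal sketch, not part of the repository: `missing_tools` and `python_is_supported` are hypothetical helpers. It only checks the Python version and whether `ffmpeg` is on `PATH`; the browser and the API key are not checked.

```python
import shutil
import sys
from typing import List, Sequence, Tuple


def missing_tools(tools: Sequence[str] = ("ffmpeg",)) -> List[str]:
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def python_is_supported(minimum: Tuple[int, int] = (3, 7)) -> bool:
    """True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.7+ is required")
    for name in missing_tools():
        print("Missing required tool:", name)
```
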
- Clone this repository
- Install required packages:
  pip install -r requirements.txt
- Install ffmpeg (if not already installed):
  - Ubuntu:
    sudo apt install ffmpeg
  - macOS:
    brew install ffmpeg
  - Windows: Download from ffmpeg website
- Copy config.example.json to config.json and update with your settings:
  - Get a YouTube API key from Google Cloud Console
  - Set your target channel URL
  - Define keywords to match playlists
- Configure your settings in config.json
- Run the script:
  python main.py
Edit config.json with your settings:
- youtube_api_key: Your YouTube Data API key
- channel_url: URL of the YouTube channel to process
- playlist_keywords: List of keywords to match playlists
- whisper_model: Whisper model to use (tiny, base, small, medium, large)
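
To illustrate how these settings fit together, here is a hedged sketch of loading and validating config.json. The key names come from the list above; everything else is an assumption made for illustration: the helper names, the example values, the rule that all four keys are required, and the case-insensitive substring match for playlist keywords. The script's real behavior may differ.

```python
import json
from typing import Any, Dict, Iterable

REQUIRED_KEYS = ("youtube_api_key", "channel_url", "playlist_keywords", "whisper_model")
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")

# Placeholder values only; replace them with your own settings.
EXAMPLE_CONFIG = {
    "youtube_api_key": "YOUR_API_KEY",
    "channel_url": "https://www.youtube.com/@example",
    "playlist_keywords": ["lecture", "tutorial"],
    "whisper_model": "base",
}


def validate_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if a key is missing or a value looks wrong."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing config keys: " + ", ".join(missing))
    if not isinstance(config["playlist_keywords"], list):
        raise ValueError("playlist_keywords must be a list of strings")
    if config["whisper_model"] not in WHISPER_MODELS:
        raise ValueError("unknown whisper_model: " + repr(config["whisper_model"]))
    return config


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Read the JSON file at `path` and validate its contents."""
    with open(path, encoding="utf-8") as handle:
        return validate_config(json.load(handle))


def playlist_matches(title: str, keywords: Iterable[str]) -> bool:
    """Assumed matching rule: case-insensitive substring search."""
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)
```

Validating up front means a typo in whisper_model is reported immediately rather than after videos have already been downloaded.
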
MIT License