Python script to take a YouTube channel and ingest its transcripts. Uses OpenAI's Whisper to generate transcripts if they are not readily available. Enjoy!

AnthonyMadia/yt_transcripts

YouTube Channel Transcriber

A Python script that collects transcripts for the videos on a YouTube channel. For each video, the script first tries to fetch a transcript through YouTube's transcript API, and falls back to downloading the video and transcribing it with OpenAI's Whisper if no transcript is available.

Features

  • Fetches all playlists from a YouTube channel matching specified keywords
  • Downloads and processes videos in batches
  • Uses YouTube's transcript API when available
  • Falls back to OpenAI's Whisper for videos without transcripts
  • Stores transcripts in a SQLite database
  • Shows progress with tqdm progress bars
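
The fallback and storage steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in main.py: the function names, the table layout, and the two callables (fetch_existing for the YouTube transcript lookup, transcribe_audio for the Whisper step) are all assumptions made for the example.

```python
import sqlite3


def get_transcript(video_id, fetch_existing, transcribe_audio):
    """Return (text, source), preferring an existing YouTube transcript.

    fetch_existing and transcribe_audio are hypothetical callables that
    stand in for the transcript API lookup and the Whisper step.
    """
    try:
        text = fetch_existing(video_id)
    except Exception:
        # Treat any lookup failure as "no transcript available".
        text = None
    if text:
        return text, "youtube"
    return transcribe_audio(video_id), "whisper"


def save_transcript(conn, video_id, text, source):
    """Store one transcript; the schema here is an assumed example."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        "video_id TEXT PRIMARY KEY, source TEXT NOT NULL, text TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps re-runs from failing on duplicate video IDs.
    conn.execute(
        "INSERT OR REPLACE INTO transcripts (video_id, source, text) "
        "VALUES (?, ?, ?)",
        (video_id, source, text),
    )
    conn.commit()
```

Passing the two steps in as callables keeps the sketch independent of any particular transcript or Whisper library, so the control flow can be tried out with simple stand-in functions.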

Prerequisites

  • Python 3.7+
  • ffmpeg (required for Whisper)
  • Chrome/Chromium browser (for Selenium)
  • YouTube Data API key

Installation

  1. Clone this repository
  2. Install required packages:
pip install -r requirements.txt
  3. Install ffmpeg (if not already installed):

    • Ubuntu: sudo apt install ffmpeg
    • macOS: brew install ffmpeg
    • Windows: Download from ffmpeg website
  4. Copy config.example.json to config.json and update with your settings:

    • Get a YouTube API key from Google Cloud Console
    • Set your target channel URL
    • Define keywords to match playlists

Usage

  1. Configure your settings in config.json
  2. Run the script:
python main.py

Configuration

Edit config.json with your settings:

  • youtube_api_key: Your YouTube Data API key
  • channel_url: URL of the YouTube channel to process
  • playlist_keywords: List of keywords to match playlists
  • whisper_model: Whisper model to use (tiny, base, small, medium, large)
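
As an illustration of how these settings fit together, the sketch below loads config.json, checks the four keys listed above, and filters playlist titles by keyword. The helper names (load_config, validate_config, matches_keywords) are hypothetical, and case-insensitive substring matching is an assumption; the script's actual matching rule may differ.

```python
import json

REQUIRED_KEYS = ("youtube_api_key", "channel_url", "playlist_keywords", "whisper_model")
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def validate_config(config):
    """Raise ValueError if a documented setting is missing or malformed."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("config.json is missing keys: " + ", ".join(missing))
    if config["whisper_model"] not in WHISPER_MODELS:
        raise ValueError("whisper_model must be one of: " + ", ".join(WHISPER_MODELS))
    if not isinstance(config["playlist_keywords"], list):
        raise ValueError("playlist_keywords must be a list of strings")
    return config


def load_config(path="config.json"):
    """Read and validate the JSON configuration file."""
    with open(path, encoding="utf-8") as fh:
        return validate_config(json.load(fh))


def matches_keywords(title, keywords):
    """Assumed rule: a playlist matches if any keyword appears in its title."""
    lowered = title.lower()
    return any(keyword.lower() in lowered for keyword in keywords)
```

A config.json that passes this check would contain, for example, a youtube_api_key string, a channel_url string, a playlist_keywords list such as ["lecture", "tutorial"], and a whisper_model of "base".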

License

MIT License
