Scraper and Visualiser for YouTube Livestream Chat

dannylty/holoscrape

Automated Extensible YouTube Livestream Chat Scraper and Visualiser.

Tests | Project Status: Active (the project has reached a stable, usable state and is being actively developed) | Python 3.9

Main Features

  • Automatically detect live streams. The specified indexers are polled periodically; each indexer is responsible for producing the video IDs of live YouTube streams to be scraped.
  • Dispatch to tmux panes in real time. Current streams are displayed in an interactive pane that grows or shrinks as livestreams start and end.
  • Customisable processors. Write to a database, write to files, or create your own processor.
  • Customisable indexers. Don't follow Hololive? Write your own indexers to produce the video IDs of your favourite streamers.
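The indexer and processor roles described above can be sketched as follows. This is a minimal illustration, not holoscrape's actual API: the `Indexer` and `Processor` protocols, `StaticIndexer`, `PrintProcessor` and `poll_once` are all hypothetical names. It shows how one polling round could be diffed against the streams already being scraped, which is what lets the set of panes grow and shrink.

```python
from typing import Iterable, Protocol


class Indexer(Protocol):
    """Hypothetical indexer interface: returns video IDs that are live right now."""

    def live_video_ids(self) -> Iterable[str]: ...


class Processor(Protocol):
    """Hypothetical processor interface: consumes one chat message at a time."""

    def process(self, video_id: str, message: dict) -> None: ...


class StaticIndexer:
    """Toy indexer returning a fixed list, standing in for a real API call."""

    def __init__(self, video_ids: Iterable[str]) -> None:
        self.video_ids = list(video_ids)

    def live_video_ids(self) -> Iterable[str]:
        return self.video_ids


class PrintProcessor:
    """Toy processor that prints each message instead of storing it."""

    def process(self, video_id: str, message: dict) -> None:
        print(f"[{video_id}] {message.get('author')}: {message.get('text')}")


def poll_once(indexers: Iterable[Indexer], active: set[str]) -> tuple[set[str], set[str]]:
    """Poll every indexer once and diff against the streams already being scraped.

    Returns (started, ended): IDs that need a new scraper, and IDs whose scraper can stop.
    """
    live: set[str] = set()
    for indexer in indexers:
        live.update(indexer.live_video_ids())
    return live - active, active - live


if __name__ == "__main__":
    active = {"abc123", "old456"}
    started, ended = poll_once([StaticIndexer(["abc123", "new789"])], active)
    print(started, ended)  # {'new789'} {'old456'}
    PrintProcessor().process("new789", {"author": "viewer", "text": "hello"})
```

Merging the results of several indexers into one set means a stream reported by two indexers is still scraped only once.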

How To

Installing Requirements

sudo apt install tmux
pip3 install -r requirements.txt
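A quick pre-flight check can confirm the two prerequisites above before running. This is a hypothetical helper, not part of the repository; the 3.9 threshold comes from the project's Python 3.9 badge.

```python
from __future__ import annotations  # keeps the annotations valid on older Pythons

import shutil
import sys


def check_prerequisites() -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    # The project advertises Python 3.9; older interpreters are flagged.
    if sys.version_info < (3, 9):
        problems.append(f"Python 3.9+ expected, found {sys.version.split()[0]}")
    # tmux must be on PATH for panes to be created.
    if shutil.which("tmux") is None:
        problems.append("tmux not found on PATH (install it with: sudo apt install tmux)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```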

Configuring

The built-in Holodex indexers require an API key, supplied via the HOLODEX_API_KEY environment variable.
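As an illustration of how an indexer might use that key, the sketch below reads HOLODEX_API_KEY and builds (but does not send) a request for live streams. The endpoint `https://holodex.net/api/v2/live`, the `org` query parameter and the `X-APIKEY` header are assumptions based on Holodex's public API, not taken from holoscrape's code; `build_holodex_live_request` is a hypothetical name.

```python
import os
import urllib.parse
import urllib.request


def build_holodex_live_request(org: str = "Hololive") -> urllib.request.Request:
    """Build (but do not send) a Holodex 'live' request authenticated with HOLODEX_API_KEY."""
    api_key = os.environ.get("HOLODEX_API_KEY")
    if not api_key:
        raise RuntimeError("HOLODEX_API_KEY is not set")
    query = urllib.parse.urlencode({"org": org})
    # Assumed endpoint and header name; check the Holodex API docs before relying on them.
    return urllib.request.Request(
        f"https://holodex.net/api/v2/live?{query}",
        headers={"X-APIKEY": api_key},
    )
```

After `export HOLODEX_API_KEY=...` in the shell, passing the returned request to `urllib.request.urlopen` would perform the call.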

By default, the configuration is read from config.json.

The keys write_to_db, write_to_local and log_path are mandatory. If write_to_db or write_to_local is false, the corresponding settings (the db_* keys or local_path, respectively) can be omitted. Replace the <host> and <port> placeholders with your database's values.

{
    "write_to_db": true,
    "db_host": "<host>",
    "db_port": <port>,
    "db_user": "username",
    "db_password": "password",
    "db_database": "database_name",
    "db_table": "example_tab",
    "db_stream_table": "stream_tab",
    "db_nshards": 30,

    "write_to_local": true,
    "local_path": "/path/to/data/",

    "log_path": "/path/to/logs/"
}

init.sql contains an example database schema that works with the example config above. If you are unfamiliar with setting up a database, set write_to_db to false.
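The configuration rules above can be expressed as a small loader. This is a sketch under assumptions: the key lists are copied from the example config, and whether holoscrape itself requires every db_* key (or validates at all) is not stated here. `validate_config` and `load_config` are hypothetical names.

```python
import json
from pathlib import Path

MANDATORY_KEYS = ("write_to_db", "write_to_local", "log_path")
DB_KEYS = ("db_host", "db_port", "db_user", "db_password",
           "db_database", "db_table", "db_stream_table", "db_nshards")
LOCAL_KEYS = ("local_path",)


def validate_config(config: dict) -> dict:
    """Require the mandatory keys, plus each subconfig only when its flag is true."""
    required = list(MANDATORY_KEYS)
    if config.get("write_to_db"):
        required += DB_KEYS
    if config.get("write_to_local"):
        required += LOCAL_KEYS
    missing = [key for key in required if key not in config]
    if missing:
        raise KeyError(f"missing config keys: {', '.join(missing)}")
    return config


def load_config(path: str = "config.json") -> dict:
    """Read a JSON config file (config.json by default) and validate it."""
    with Path(path).open(encoding="utf-8") as f:
        return validate_config(json.load(f))
```

With write_to_db set to false, a config containing only write_to_db, write_to_local (false) and log_path passes validation.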

Running (in tmux)

python3 main.py
Press Ctrl-b s (tmux's default prefix, then s) to switch to the scraping window.
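For readers curious how per-stream panes might be managed, the sketch below builds the tmux commands for opening one pane per live stream. It is a hypothetical illustration, not how main.py necessarily does it: `pane_commands`, the session name and the per-stream command `python3 scrape.py` are assumptions. The commands are only built here; running each one with `subprocess.run(cmd, check=True)` would execute it.

```python
import shlex


def pane_commands(target: str, video_ids: list[str],
                  scraper_cmd: str = "python3 scrape.py") -> list[list[str]]:
    """Build tmux argv lists that open one pane per live stream.

    `target` is a tmux target such as a session name; `scraper_cmd` is a
    hypothetical per-stream command that receives the video ID.
    """
    commands: list[list[str]] = []
    for video_id in video_ids:
        shell_cmd = f"{scraper_cmd} {shlex.quote(video_id)}"
        # -d creates the pane without switching focus to it.
        commands.append(["tmux", "split-window", "-d", "-t", target, shell_cmd])
        # Re-tile after every split so panes share the window evenly.
        commands.append(["tmux", "select-layout", "-t", target, "tiled"])
    return commands
```

Shrinking works the other way round: when a stream ends, `tmux kill-pane -t <pane-id>` followed by another `select-layout ... tiled` would remove its pane and rebalance the rest.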