sd816224/youtube-crypto-index

  • The index counts how many crypto-related videos are published on YouTube within a unit period.
  • It can be used to gauge sentiment in the retail crypto market. The index does not reflect videos that were deleted before the database's initial load.
  • The dashboard currently displays the latest videos and the real-time index alongside the bitcoin price (dashboard link).
  • The data is sourced from the YouTube Data API v3. New-video notifications are received through Google's PubSubHubbub hub.
  • The current database is hosted on AWS RDS and includes 500+ watched channels and details of 337k+ videos. The data could potentially be used for NLP/ML to get better sentiment results in the future.
  • The project can be adapted to other YouTube topics, since the watched channels are selected by keyword search.
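
The index definition above amounts to counting publish timestamps per time bucket. Below is a minimal sketch, assuming an hourly unit period and timezone-aware UTC timestamps (such as parsed publishedAt values); `video_count_index` is a hypothetical helper, not code from this repo.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def video_count_index(
    published_at: Iterable[datetime], unit: timedelta = timedelta(hours=1)
) -> Dict[datetime, int]:
    """Hypothetical helper: number of videos published per unit period.

    Each timestamp must be timezone-aware. It is floored to the start of
    its bucket, with buckets aligned to the Unix epoch.
    """
    counts: Counter = Counter()
    for ts in published_at:
        # (ts - EPOCH) // unit is an int; multiplying back gives the bucket offset.
        bucket_start = EPOCH + ((ts - EPOCH) // unit) * unit
        counts[bucket_start] += 1
    # Sort by bucket start so the series can be plotted directly.
    return dict(sorted(counts.items()))
```

For example, uploads at 10:05, 10:55 and 11:00 UTC give a count of 2 for the 10:00 bucket and 1 for the 11:00 bucket. Buckets with no uploads are simply absent, so a plotting layer would need to fill them with zero.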

structure:

[figure: project structure diagram]

database design:

[figure: database design diagram]

the project consists of the following parts:

  • 1: db_manager: fetches the already-existing video data from the API and stores it in the database.
  • 2: sub_manager: manages channel subscriptions:
    • subscription request
    • subscription health check
  • 3: dashboard: an integrated server providing:
    • a webhook server that receives and responds to the new-video notification feed
    • a Dash server for data visualization.
  • 4: deployment: a CI/CD pipeline that deploys parts 1-3 above to AWS.

guidance for collaborators:

1. db_manager

  • clone the repo

  • create the environment parameter file src/.env, configured following src/.env.example:

    • create your own Google API key for google_api_key (enable the YouTube Data API v3 and create a key)
    • once your postgres database is ready, configure RDS_DB_NAME, RDS_USERNAME, RDS_PASSWORD, RDS_HOSTNAME and RDS_PORT; otherwise comment them out
  • set up docker for dev & testing:

    • make sure the Docker application is running. For the dev stage with the local-dev-db container, spin up the container with the CLI: docker-compose -f ./src/docker-compose-dev.yaml up -d
    • make sure the dev db uses port 5432 and the testing db uses port 5433.
    • if the docker command is not found, try refreshing the Docker desktop application.
    • to check the background container: docker ps
    • to stop the container when finished: $ docker-compose -f ./src/docker-compose-dev.yaml down
  • configure in db_manager.py:

    • reset_db_only: only turn it on when you need to reset the database, then run src/db_manager.py
    • work_on_remote_db: only turn it on after configuring the real postgres database you want to work on; otherwise it defaults to the local-dev-db container.
    • db_init: only turn it on for the first run. It rebuilds the tables and fetches all data from the YouTube API.
    • channel_pages_to_search: number of channel search pages to fetch. Number of channels = pages * maxResults_channels (currently set to 5 by default).
    • q: keyword used to search for channels
    • maxResults_channels: max results per search page of channels (1-50)
    • maxResults_videos: max results per search page of videos (1-50)
  • run the following to set up the database and populate the channels and videos data into the relevant tables. Caution: it consumes your API quota and takes time, depending on your configuration.

    • python src/db_manager.py
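
The channel search driven by q, channel_pages_to_search and maxResults_channels can be sketched against the public YouTube Data API v3 search.list endpoint, following nextPageToken from page to page. This is a minimal sketch, not the repo's code: the function names are hypothetical, and the HTTP call is injectable so the paging logic can be tried without spending quota. Each page is one search.list request, which Google's quota documentation prices at 100 units.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def build_channel_search_url(
    api_key: str, q: str, max_results: int, page_token: Optional[str] = None
) -> str:
    """Build a search.list URL restricted to channels (hypothetical helper)."""
    if not 1 <= max_results <= 50:
        raise ValueError("maxResults must be between 1 and 50")
    params = {"part": "snippet", "type": "channel", "q": q,
              "maxResults": max_results, "key": api_key}
    if page_token:
        params["pageToken"] = page_token
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Real HTTP GET; every call spends API quota."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_channel_ids(
    api_key: str, q: str, pages: int, max_results: int,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[str]:
    """Yield up to pages * max_results channel IDs, following nextPageToken."""
    token = None
    for _ in range(pages):
        data = fetch(build_channel_search_url(api_key, q, max_results, token))
        for item in data.get("items", []):
            yield item["id"]["channelId"]
        token = data.get("nextPageToken")
        if not token:  # no more result pages
            break
```

Passing a stub as `fetch` makes it easy to check the paging behaviour offline before pointing it at the real API with your google_api_key.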

2. sub_manager

  • dev config:
    • callback_url: take the forwarding URL from the ngrok terminal and append /feed
    • work_on_remote_db: only turn it on after configuring the real postgres database you want to work on; otherwise it defaults to the local-dev-db container.
  • it is designed to run hourly to check for channels whose subscriptions are expiring.
    • for dev & testing, to run it hourly locally: uncomment lines 243-248
    • in prod it runs once per invocation, so an event trigger on AWS has to be used when deploying
  • run with: python src/sub_manager.py
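
The subscription request and hourly health check could look roughly like the sketch below. It assumes Google's public hub at https://pubsubhubbub.appspot.com/subscribe and the channel feed topic URL from YouTube's push-notification guide; `subscription_form`, `send_subscription`, `channels_to_renew` and the 2-hour renewal margin are hypothetical choices, not taken from src/sub_manager.py. With an hourly run, a margin larger than one hour keeps a subscription from lapsing between checks.

```python
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone
from typing import Dict, List

HUB_URL = "https://pubsubhubbub.appspot.com/subscribe"
TOPIC_TEMPLATE = "https://www.youtube.com/xml/feeds/videos.xml?channel_id={}"


def subscription_form(channel_id: str, callback_url: str,
                      mode: str = "subscribe") -> Dict[str, str]:
    """Form fields for a (un)subscribe request for one channel's upload feed."""
    return {
        "hub.callback": callback_url,  # e.g. the ngrok URL + /feed
        "hub.topic": TOPIC_TEMPLATE.format(channel_id),
        "hub.verify": "async",
        "hub.mode": mode,
    }


def send_subscription(form: Dict[str, str]) -> int:
    """POST the form to the hub; 202 means accepted, pending verification."""
    data = urllib.parse.urlencode(form).encode()
    with urllib.request.urlopen(HUB_URL, data=data) as resp:
        return resp.status


def channels_to_renew(expiries: Dict[str, datetime], now: datetime,
                      margin: timedelta = timedelta(hours=2)) -> List[str]:
    """Channel IDs that are already expired or expire within `margin`."""
    return sorted(cid for cid, exp in expiries.items() if exp - now <= margin)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    demo = {"UC_example": now + timedelta(minutes=30)}  # hypothetical channel ID
    print(channels_to_renew(demo, now))  # ['UC_example']
```

Renewing is the same request as subscribing, so the health check can simply resend `subscription_form(...)` for every ID returned by `channels_to_renew`.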

3. dashboard

  • configure the webhook for local dev
    • set up ngrok with your credentials and run it for port 8050.
    • configure in webhook_server.py:
      • work_on_remote_db: only turn it on after configuring the real postgres database you want to work on; otherwise it defaults to the local-dev-db container.
    • run the server with: python src/dashboard.py
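
When the hub verifies a (un)subscription, it sends a GET to the callback URL with hub.mode, hub.topic and hub.challenge query parameters, and the subscriber must echo hub.challenge back with a 2xx status. A framework-free sketch of that check follows; `handle_verification` is a hypothetical helper that the /feed route of the webhook server would call, and rejecting unknown topics with 404 is an assumed policy.

```python
from typing import Iterable, Tuple
from urllib.parse import parse_qs


def handle_verification(query_string: str,
                        expected_topics: Iterable[str]) -> Tuple[int, str]:
    """Return (HTTP status, body) for a hub verification GET on /feed."""
    # parse_qs percent-decodes values, so hub.topic comes back as a plain URL.
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    mode = params.get("hub.mode")
    topic = params.get("hub.topic")
    challenge = params.get("hub.challenge")
    if mode in ("subscribe", "unsubscribe") and challenge and topic in set(expected_topics):
        return 200, challenge  # echoing the challenge confirms the request
    return 404, ""  # unknown topic or malformed request: refuse
```

In a Flask-based server the route would pass the raw query string (for example `request.query_string.decode()`) and return the body with the status.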

4. deployment

after checking that cd.yml runs successfully, check with docker ps that the containers are running in the background.

reasons for the design

  • there are a few ways to approach fetching video info:
    • youtube api:

    • push notification from google pubsubhubbub (https://developers.google.com/youtube/v3/guides/push_notifications)

      • it has no quota limit.
      • it pushes a notification for 3 actions:
        • publishing a new video
        • amending a title
        • amending a description
      • the problem is that the notification does not say which action triggered it.
      • since db_manager has already populated the database, we can easily tell whether it is the action we are monitoring: a video ID that is not yet in the database is a new upload.
      • this part is painful because the documentation is very lacking.
      • subscriptions expire after 5 days by default. They can be renewed at any time to extend the expiry date.
    • web scrape

      • not pursued due to lack of knowledge, but open to other solutions.
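
The database lookup described above under push notification can be sketched with the standard library's XML parser. It assumes the Atom payload format from YouTube's push-notification guide (entries carrying yt:videoId and yt:channelId); `classify_notification` is hypothetical, and treating every known video ID as a title or description amendment is the heuristic stated above, not something the notification itself reports.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "yt": "http://www.youtube.com/xml/schemas/2015",
}


def classify_notification(xml_body: str,
                          known_video_ids: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (video_id, channel_id, kind) per entry; kind is 'new' or 'update'."""
    root = ET.fromstring(xml_body)
    results = []
    for entry in root.findall("atom:entry", NS):
        video_id = entry.findtext("yt:videoId", default="", namespaces=NS)
        channel_id = entry.findtext("yt:channelId", default="", namespaces=NS)
        # Heuristic: an ID already stored by db_manager means an amendment.
        kind = "update" if video_id in known_video_ids else "new"
        results.append((video_id, channel_id, kind))
    return results
```

Only entries classified as "new" would then be inserted and counted towards the index; "update" entries can be ignored or used to refresh the stored title and description.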

backlog:

  • Flask==2.1.3
  • Werkzeug==2.2.2
  • these pinned versions cannot pass the security check, so they are commented out for now.
  • if their versions are upgraded, the docker container won't run due to: TypeError: LocalProxy.__init__() got an unexpected keyword argument 'unbound_message'
