SceneSifter is a prototype for interactive video information extraction and retrieval, designed to help content creators organize and search video clips. Leveraging speech-to-text, image-to-text, text-to-vector, and vector similarity search technologies, it processes user-supplied videos and, given a fuzzy search query, returns clickable, relevant timestamps from those videos.
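At its core, the retrieval step ranks stored segment embeddings by similarity to the embedded query. A minimal sketch in plain JavaScript (the three-dimensional embeddings and the `rank` helper are illustrative stand-ins; in the actual project, similarity search is delegated to pgvector):

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank candidate segments by similarity to the query.
function rank(queryEmbedding, segments) {
  return segments
    .map((s) => ({ ...s, score: cosineSimilarity(queryEmbedding, s.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy data: each segment carries the timestamp to jump to on click.
const segments = [
  { timestamp: 12, embedding: [0.9, 0.1, 0.0] },
  { timestamp: 47, embedding: [0.1, 0.9, 0.2] },
];
console.log(rank([1, 0, 0], segments)[0].timestamp); // → 12
```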
    ├── backend
    │   ├── model                        # Vosk model
    │   ├── src
    │   │   ├── application.properties.ini  # Configuration properties
    │   │   ├── chatgpt.mjs              # ChatGPT service
    │   │   ├── index.mjs
    │   │   ├── itt.mjs                  # Image-to-text conversion
    │   │   ├── postgres_service.mjs     # Postgres service
    │   │   ├── process_video.mjs        # Audio extraction, frame extraction, etc.
    │   │   ├── stt.mjs                  # Speech-to-text conversion
    │   ├── uploaded_videos              # User-uploaded videos
    │   ├── package.json
    ├── example_videos                   # Videos used for testing
    ├── frontend                         # React app
    ├── images                           # Images for README.md
    ├── sql
    │   ├── video_listing.sql            # Database creation
    │   ├── video_embedding.csv          # Some dummy data
    └── README.md
FFmpeg is required for audio and frame extraction. Download and install it from https://ffmpeg.org/download.html.
We used vosk-model-en-us-0.22-lgraph. To use a different Vosk model, download one from https://alphacephei.com/vosk/models and unpack it into backend/model/ to overwrite the current one.
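Vosk can return word-level timing alongside the transcript (a `result` array of `{word, start, end}` entries). A small sketch of how such output could be mapped to clickable timestamps — the JSON shape below follows Vosk's word-timing format, but the `toClickableTimestamps` helper is hypothetical, not the project's actual code:

```javascript
// Example shape of a Vosk recognizer result with word timings enabled.
const voskResult = {
  text: 'hello my name is matthew',
  result: [
    { word: 'hello', start: 0.3, end: 0.7 },
    { word: 'my', start: 0.8, end: 0.95 },
    { word: 'name', start: 0.95, end: 1.2 },
    { word: 'is', start: 1.2, end: 1.35 },
    { word: 'matthew', start: 1.4, end: 1.9 },
  ],
};

// Hypothetical helper: index each word by its start time, labeled mm:ss.
function toClickableTimestamps(result) {
  return result.result.map(({ word, start }) => ({
    word,
    seconds: start,
    label: `${Math.floor(start / 60)}:${String(Math.floor(start % 60)).padStart(2, '0')}`,
  }));
}

console.log(toClickableTimestamps(voskResult)[0]);
// → { word: 'hello', seconds: 0.3, label: '0:00' }
```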
- Launch a Postgres instance using the Docker image with pgvector:

      mkdir ~/postgres-volume/
      docker run --name postgres \
        -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \
        -p 5432:5432 \
        -v ~/postgres-volume/:/var/lib/postgresql/data \
        -d ankane/pgvector:latest
- Copy the video listing schema to the container:

      docker cp sql/video_listing.sql postgres:/home
- Create the video listing table from the schema:

      docker exec -it postgres psql -U postgres -c '\i /home/video_listing.sql'
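Once the pgvector-enabled database is up, embeddings are typically stored and queried as vector literals of the form `[0.1,0.2,...]`. A small sketch of building such a literal in plain JavaScript — the table and column names in the query string are illustrative assumptions, not taken from video_listing.sql:

```javascript
// Format a numeric array as a pgvector literal, e.g. "[0.1,0.2,0.3]".
function toPgvectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

const literal = toPgvectorLiteral([0.1, 0.2, 0.3]);
console.log(literal); // → [0.1,0.2,0.3]

// Illustrative query text: `<->` is pgvector's L2 distance operator.
// With the `pg` client, the literal would be passed as a string
// parameter and cast to `vector` server-side.
const queryText =
  'SELECT video_id, ts FROM video_embedding ORDER BY embedding <-> $1::vector LIMIT 5';
```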
For security reasons, we did not include our OpenAI key in the config file. To enable the ChatGPT service, you need to add your own OpenAI key: in backend/src/application.properties.ini, set

    CHATGPT_API_KEY=<your own key>
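A minimal sketch of how such a KEY=VALUE properties file could be read from Node — the parsing logic here is an assumption for illustration; the real backend may load the file differently:

```javascript
// Parse a simple KEY=VALUE properties file, skipping blanks and comments.
function parseProperties(text) {
  const props = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#') || trimmed.startsWith(';')) continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;
    props[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return props;
}

// Example contents (placeholder values, not real keys).
const text = 'CHATGPT_API_KEY=sk-example\nHUGGINGFACE_TOKEN=hf-example\n';
console.log(parseProperties(text).CHATGPT_API_KEY); // → sk-example
```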
The same applies to Hugging Face. To enable image-to-text conversion via the Inference API, in backend/src/application.properties.ini, set

    HUGGINGFACE_TOKEN=<your own key>
To run the backend code:

    cd backend
    npm i
    npm start
To run the frontend code:

    cd frontend
    npm i
    npm start
| File | Length | Description |
| --- | --- | --- |
| 2010_elevator_Pitch_Winner.mp4 | 1m15s | An elevator pitch |
| children_new_billionaires.mp4 | 2m27s | A short documentary featuring some self-introductions |
| Garr_Reynolds_Introduction.mp4 | 1m03s | A self-introduction |
| Matthew_introduction.mp4 | 1m07s | A self-introduction |
| ted_talk.mp4 | 1m20s | A TED Talk clip |
| ted_youth_unemployment.mp4 | 1m12s | A TED Talk clip |
| tokyo1.mp4 | 2m01s | A best-of-Tokyo-2020 compilation |
| tokyo2.mp4 | 2m01s | A best-of-Tokyo-2020 compilation |
| tokyo3.mp4 | 2m01s | A best-of-Tokyo-2020 compilation |
- Header background image: Photo by HD Wallpapers on StockSnap
- TV icon