Summary about Video-to-Text datasets. This repository is part of the review paper *Bridging Vision and Language from the Video-to-Text Perspective: A Comprehensive Review*
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*
Audio Visual Scene-Aware Dialog (AVSD) Challenge at the 10th Dialog System Technology Challenge (DSTC)
Source code of the paper titled *Attentive Visual Semantic Specialized Network for Video Captioning*
A video description technique for generating natural-language descriptions of unconstrained videos.
A Video.js 7 middleware that uses browser speech synthesis to speak descriptions contained in a description text track
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian (Bahasa Indonesia).
A simple attention-based deep learning model that answers questions about a given video, returning the most relevant video intervals as answers.
FrVD: French Video Description dataset
This project processes videos by extracting frames, generating detailed visual descriptions for each frame using the BLIP model, and then summarizing these descriptions with the BART model.
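The frame-extraction → BLIP captioning → BART summarization pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, sampling interval, and model checkpoints (`Salesforce/blip-image-captioning-base`, `facebook/bart-large-cnn`) are assumptions.

```python
def sample_frame_indices(total_frames: int, every_n: int) -> list[int]:
    """Indices of the frames to caption: one frame every `every_n` frames."""
    return list(range(0, total_frames, max(1, every_n)))


def describe_video(path: str, every_n: int = 30) -> str:
    """Caption sampled frames with BLIP, then condense the captions with BART."""
    import cv2                         # pip install opencv-python
    from PIL import Image              # pip install pillow
    from transformers import pipeline  # pip install transformers

    # Checkpoint choices are assumptions; the repository may use different ones.
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    captions = []
    for idx in sample_frame_indices(total, every_n):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes frames as BGR; BLIP expects RGB.
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        captions.append(captioner(Image.fromarray(rgb))[0]["generated_text"])
    cap.release()

    # Join the per-frame captions and summarize them into one description.
    return summarizer(" ".join(captions), max_length=60, min_length=15)[0]["summary_text"]
```

Sampling one frame every 30 keeps the caption list short enough for BART's input window on typical clips; denser sampling trades speed for coverage.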
A tool for visualizing FrVD metadata synchronized with the corresponding videos.