⬇️ Get the dataset from:
Please also check the SHA256 sum of the files to ensure the data intergrity:
# shasum -a 256 ReVOS/*
1844db76c3075deb95034935a8208ba459272784242ec39de9287c66a4c22f07 ReVOS/JPEGImages.zip
e9b4310661495ddd47dedcac1e289621cf8f5dff89c0bb8b65f7699f748d5cff ReVOS/mask_dict.json
05d9be5d7bbed03b216a457274022ad1c9e683b4ffdc32b19c60c5e797fd8f01 ReVOS/mask_dict_foreground.json
3917dfba8725d080c8658dc43ad4a84cf41ce51438acfaee70bef5aa0310417a ReVOS/meta_expressions_train_.json
e171028757d8ec75bea33a661c38c288ecdafdb6c7bc64bb9e14edb97e2e5ade ReVOS/meta_expressions_train_sub_.json
127b975e78effef6033846e68989dcf79c692d644dd07c835b723cea700d4c2d ReVOS/meta_expressions_valid_.json
ReVOS
├── JPEGImages # JPEGImages contains all images of the train set and valid set
│ ├── <video1 >
│ ├── <video2 >
│ └── <video...>
├── mask_dict.json # mask dict (train set & valid set)
├── mask_dict_foreground.json # foreground mask dict (train set & valid set)
├── meta_expressions_train_.json # meta expressions (train set)
└── meta_expressions_valid_.json # meta expressions (valid set)
ReVOS largely borrows videos from MOSE, MeViS, LV-VIS, OVIS, TAO and UVO. The copyright for these videos is held by the organizers of MOSE, MeViS, LV-VIS, OVIS, TAO and UVO.
Please consider to cite VISA if it helps your research.
@article{yan2024visa,
title={VISA: Reasoning Video Object Segmentation via Large Language Models},
author={Yan, Cilin and Wang, Haochen and Yan, Shilin and Jiang, Xiaolong and Hu, Yao and Kang, Guoliang and Xie, Weidi and Gavves, Efstratios},
journal={arXiv preprint arXiv:2407.11325},
year={2024}
}
ReVOS is licensed under a CC BY-NC-SA 4.0 License. The data of ReVOS is released for non-commercial research purpose only.