Jian Hu, Zixu Cheng, Chenyang Si, Wei Li, Shaogang Gong
✨ Highlights:
(i) We are the first to approach long video understanding by optimising the input video information so as to fully utilise the model's ability to comprehend long videos.
(ii) We propose a training-free mosaicing binary coding scheme, combined with pseudo temporal grounding, for long video understanding (see the sketch below).
(iii) We apply CoS to three different baselines to demonstrate its effectiveness and adaptability.
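To make the mosaicing binary coding idea concrete, here is a minimal, illustrative sketch: sampled frames are tiled into mosaic images, and the MLLM is asked to binary-code each mosaic as relevant (1) or irrelevant (0) to the question. The `mllm` callable, the 2x2 grid size, and the prompt wording are assumptions for illustration, not the exact implementation in this repository.

```python
# A minimal sketch of the mosaicing binary coding step, assuming PIL frames;
# `mllm` is a hypothetical callable wrapping the baseline model.
from PIL import Image

def mosaic(frames, grid=2, tile=224):
    """Compose up to grid*grid frames into one mosaic image."""
    canvas = Image.new("RGB", (grid * tile, grid * tile))
    for i, frame in enumerate(frames[: grid * grid]):
        x, y = (i % grid) * tile, (i // grid) * tile
        canvas.paste(frame.resize((tile, tile)), (x, y))
    return canvas

def binary_code_shots(mllm, frames, question, grid=2):
    """Binary-code each mosaic of shots as task-relevant (1) or not (0)."""
    codes = []
    for start in range(0, len(frames), grid * grid):
        image = mosaic(frames[start : start + grid * grid], grid)
        # Prompt wording is a placeholder, not the repo's exact prompt.
        answer = mllm(image, f"Is this mosaic relevant to: {question}? Answer 1 or 0.")
        codes.append(1 if "1" in answer else 0)
    return codes
```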
```bash
# Create and activate the environment
conda create -n CoS python=3.10 -y && conda activate CoS
pip install torch==2.1.2 torchvision --index-url https://download.pytorch.org/whl/cu118
pip install packaging && pip install ninja && pip install flash-attn --no-build-isolation --no-cache-dir
pip install -r requirements.txt

# Install the LongVA baseline
cd LongVA/
python -m pip install -e "longva/.[train]"
pip install transformers==4.46.3
pip install -q bitsandbytes==0.42.0 accelerate==0.26.0

# Install lmms-eval for evaluation
cd lmms-eval
pip install -e .
```
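Optionally, you can sanity-check that the pinned packages resolved as expected; the version numbers in the comments simply mirror the pip commands above.

```python
# Optional sanity check for the environment created above.
import torch, transformers, bitsandbytes, accelerate

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)  # expected 4.46.3
print("bitsandbytes:", bitsandbytes.__version__)  # expected 0.42.0
print("accelerate:", accelerate.__version__)      # expected 0.26.0
```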
For Video-MME, LongVideoBench, and MLVU evaluation, please use lmms-eval.
After installing lmms-eval and CoS, you can use the following script to evaluate. Note that the current baseline is LongVA; you can extend CoS to any baseline by modifying the code in the lmms-eval folder (a hedged sketch follows the script below).
```bash
accelerate launch --num_processes 8 --main_process_port 12345 -m lmms_eval \
    --model longva_cos \
    --model_args pretrained=lmms-lab/LongVA-7B,conv_template=qwen_1_5,model_name=llava_qwen,max_frames_num=128,video_decode_backend=decord \
    --tasks videomme \
    --batch_size 1 \
    --log_samples \
    --log_samples_suffix videoxl \
    --output_path ./logs/
```
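As a hedged sketch of what extending CoS to a new baseline involves: lmms-eval typically registers models in `lmms_eval/models/__init__.py` through an `AVAILABLE_MODELS` dict that maps a `--model` name to its model class. The entry and class names below for a new baseline are hypothetical placeholders, not this repo's exact API.

```python
# Hypothetical sketch of lmms_eval/models/__init__.py; the class-name strings
# are placeholders for the actual classes defined in the models folder.
AVAILABLE_MODELS = {
    "longva_cos": "LongVA_CoS",         # the CoS-wrapped LongVA used above
    "your_model_cos": "YourModel_CoS",  # hypothetical: a new baseline + CoS
}
```

With such an entry (and a corresponding model class that wraps your baseline with the CoS shot-selection step), the same `accelerate launch` command can be reused with `--model your_model_cos`.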
If you find this repository useful, please consider giving it a star ⭐ and a citation:
```bibtex
@article{hu2025cos,
  title={CoS: Chain-of-Shot Prompting for Long Video Understanding},
  author={Hu, Jian and Cheng, Zixu and Si, Chenyang and Li, Wei and Gong, Shaogang},
  journal={arXiv preprint arXiv:2502.06428},
  year={2025}
}
```
- LongVA: the codebase we built upon.
- LMMs-Eval: the codebase we build our CoS evaluation upon.
- Special thanks to Shu Yan for his generous and selfless help.
This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of those original licenses. The content of this project itself is licensed under the Apache License 2.0.