
MiniGPT-4 for video

Currently, this is a simple extension of MiniGPT-4 without extra training. We try to explore its ability for video understanding with simple prompt design.

🔥 Updates

  • 2023/04/19: Simple extension release. We simply encode 4 frames and use the following time-sensitive prompt:
        "First, <Img><ImageHere></Img>. Then, <Img><ImageHere></Img>. "
        "After that, <Img><ImageHere></Img>. Finally, <Img><ImageHere></Img>. "
    However, without video-text instruction finetuning, it is difficult for the model to answer time-related questions.
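As an illustration, here is a minimal sketch of how four evenly spaced frames could be sampled and paired with that prompt. The sample_frames helper is a hypothetical stand-in for this write-up, not the repo's actual code; the real frame handling lives in demo_video.py.

        import cv2

        # Hypothetical sketch: sample 4 evenly spaced frames from a video.
        def sample_frames(video_path, num_frames=4):
            cap = cv2.VideoCapture(video_path)
            total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            indices = [int(i * (total - 1) / (num_frames - 1)) for i in range(num_frames)]
            frames = []
            for idx in indices:
                cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
                ok, frame = cap.read()
                if ok:
                    frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            cap.release()
            return frames

        # One <ImageHere> placeholder per frame; at inference time each
        # placeholder is replaced by the visual tokens of one frame.
        PROMPT = (
            "First, <Img><ImageHere></Img>. Then, <Img><ImageHere></Img>. "
            "After that, <Img><ImageHere></Img>. Finally, <Img><ImageHere></Img>. "
        )

        frames = sample_frames("example.mp4")
        assert PROMPT.count("<ImageHere>") == len(frames)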

💬 Example

(demo screenshots)

🏃 Usage

Please follow the instructions in MiniGPT-4 to prepare the environment.

  • Prepare the environment:
        conda env create -f environment.yml
        conda activate minigpt4
  • Download the BLIP-2 weights:
    • ViT: wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth
    • Q-Former: wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/blip2_pretrained_flant5xxl.pth
    • Set vit_model_path and q_former_model_path in minigpt4.yaml to the downloaded files; a sketch of the edit follows.
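    The config edit might look like this. The key names come from the step above, but the exact nesting inside minigpt4.yaml is an assumption, so check your copy of the file:

        model:
          vit_model_path: "/path/to/eva_vit_g.pth"
          q_former_model_path: "/path/to/blip2_pretrained_flant5xxl.pth"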
  • Download the Vicuna model:
    • LLaMA: download it from the original repo or from Hugging Face.
    • If you download LLaMA from the original repo, convert it to the Hugging Face format and then apply the Vicuna delta:
        # convert_llama_weights_to_hf is copied from transformers
        python src/transformers/models/llama/convert_llama_weights_to_hf.py \
            --input_dir /path/to/downloaded/llama/weights \
            --model_size 7B --output_dir /output/path
        # apply the delta weights (fastchat v0.1.10); the base, target,
        # and delta must all refer to the same model size
        python3 -m fastchat.model.apply_delta \
            --base /path/to/llama-13b \
            --target /output/path/to/vicuna-13b \
            --delta lmsys/vicuna-13b-delta-v0
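    As an optional sanity check (an extra suggestion, not a step from the original instructions), the merged folder should load as a standard causal LM with transformers:

        from transformers import AutoModelForCausalLM, AutoTokenizer

        # load the merged Vicuna weights produced by apply_delta above
        tokenizer = AutoTokenizer.from_pretrained("/output/path/to/vicuna-13b")
        model = AutoModelForCausalLM.from_pretrained("/output/path/to/vicuna-13b")
        print(model.config.model_type)  # expected: "llama"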
  • Download the MiniGPT-4 model: get the pretrained checkpoint from the MiniGPT-4 repo and set its path (the ckpt field) in eval_configs/minigpt4_eval.yaml.
  • Run the demo:
        python demo_video.py --cfg-path eval_configs/minigpt4_eval.yaml

Acknowledgement

This project is mainly based on MiniGPT-4, which is supported by Lavis, Vicuna, and BLIP-2. Thanks to these amazing projects!