Our code implements a three-stage distillation training process. You can switch between stages via the following command-line parameters:
- `--distill_stage1_config`: Configuration file for Stage 1.
- `--distill_stage2_config`: Configuration file for Stage 2.
- `--distill_stage3_config`: Configuration file for Stage 3.
- `--train_stage1`: Whether to conduct training for Stage 1 (default: `false`).
- `--train_stage2`: Whether to conduct training for Stage 2 (default: `false`).
- `--train_stage3`: Whether to conduct training for Stage 3 (default: `false`).
- `--load_distill_checkpoint`: Path to load the distillation checkpoint from.
- `--checkpoint_dir`: Path to store the distillation checkpoints.
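For example, to run Stage 2 distillation while resuming from a Stage 1 checkpoint, the flags combine as in the sketch below. The `train.py` entry point, config path, and checkpoint paths are illustrative assumptions, not taken from this repository; use the entry point actually invoked by the provided scripts:

```bash
# Hypothetical invocation; the real entry point is whatever the
# scripts/distill_stage*.sh scripts call, and all paths are placeholders.
# The stage switches may also be plain store_true flags -- check the scripts.
python train.py \
    --distill_stage2_config configs/experiment/distill_stage2_mmMamba.yaml \
    --train_stage2 true \
    --load_distill_checkpoint /path/to/stage1_checkpoint \
    --checkpoint_dir ./checkpoints/stage2
```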
Our code currently supports stage-by-stage training only; the entire training process cannot be completed in a single run. You specify which stage to execute via the command-line arguments above. Checkpoints are saved periodically during each stage. Below are the script commands for the three training stages:
```bash
bash scripts/distill_stage1.sh
bash scripts/distill_stage2.sh
bash scripts/distill_stage3.sh
```
- The `--load_distill_checkpoint` argument in the scripts is optional and is used to resume from an existing checkpoint. Replace it with the path to your checkpoint.
- In Stage 1 and Stage 2, all layers of the model are replaced with Mamba2 and aligned through distillation.
- In Stage 3, you can choose which layers to retain as the original multi-head attention layers by customizing the `distill_stage3/softmax_attention` list in `configs/experiment/distill_stage3_mmMamba.yaml`, which trains the mmMamba-Hybrid model (see the sketch after this list).
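As an illustration, the relevant part of the Stage 3 config might look like the sketch below. Only the `distill_stage3/softmax_attention` key comes from this guide; the surrounding structure and the layer indices are assumptions, so check them against the actual `configs/experiment/distill_stage3_mmMamba.yaml`:

```yaml
# Hypothetical sketch of configs/experiment/distill_stage3_mmMamba.yaml.
# Only the softmax_attention key is documented above; the layer indices
# here are illustrative.
distill_stage3:
  # Indices of layers to keep as the original multi-head attention;
  # all remaining layers become distilled Mamba2 layers.
  softmax_attention: [0, 8, 16, 24]
```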
We use VLMEvalKit and InternVL Evalkit to evaluate our model. Among the metrics, GQA and POPE are evaluated using InternVL Evalkit, while the other metrics are assessed with VLMEvalKit. Below is the detailed evaluation process:
This section provides a step-by-step guide to evaluating the mmMamba model using VLMEvalKit.
1. **Download the mmMamba Model Weights**
   - Download the mmMamba model weights from Hugging Face.

2. **Clone the VLMEvalKit Repository**
   - Clone the VLMEvalKit repository from its GitHub repository.

3. **Integrate the mmMamba Model into VLMEvalKit**
   - Copy `mmMamba.py` from `./eval/mmMamba.py` to `VLMEvalKit/vlmeval/vlm/`.
   - Add the `mmMambaChat` class to `VLMEvalKit/vlmeval/vlm/__init__.py` with:
     ```python
     from .mmMamba import mmMambaChat
     ```

4. **Add the mmMamba Model to the Configuration**
   - Register the mmMamba model in `VLMEvalKit/vlmeval/config.py` with (see the registration sketch after this list):
     ```python
     'mmMamba': partial(mmMambaChat, model_path='/path/to/mmMamba', version='V2.0')
     ```

5. **Run the Evaluation**
   - Run the evaluation with the following command:
     ```bash
     torchrun \
         --nproc-per-node=${GPUS} \
         --master_port=${MASTER_PORT} \
         run.py --data <DatasetName> --model mmMamba
     ```
   - Replace `<DatasetName>` with the name of the evaluation dataset, exactly as it is defined in VLMEvalKit.
   - Ensure that the necessary dependencies are installed and that your environment is correctly set up for running the evaluation.
   - Adjust the paths and parameters as needed to fit your specific setup and requirements.
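For reference, here is a minimal sketch of the edits from steps 3 and 4 in context. The `supported_VLM` dictionary name is an assumption based on recent VLMEvalKit versions; verify it against your checkout:

```python
# In VLMEvalKit/vlmeval/vlm/__init__.py
from .mmMamba import mmMambaChat

# In VLMEvalKit/vlmeval/config.py. The enclosing dict is assumed to be
# named supported_VLM, as in recent VLMEvalKit versions -- check your checkout.
from functools import partial

supported_VLM = {
    # ... existing model entries ...
    'mmMamba': partial(mmMambaChat, model_path='/path/to/mmMamba', version='V2.0'),
}
```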
By following these steps, you should be able to successfully evaluate the mmMamba model using the provided tools and configurations.
This section provides a step-by-step guide to evaluating the mmMamba model using InternVL2 for the GQA and POPE metrics.
1. **Download the mmMamba Model Weights**
   - Download the mmMamba model weights from Hugging Face.

2. **Clone the InternVL Repository**
   - Clone the InternVL repository and navigate to the `internvl_chat` directory.

3. **Modify the InternVL Model Initialization**
   - Change lines 46-48 in `internvl_chat/internvl/model/__init__.py` to:
     ```python
     model = AutoModel.from_pretrained(
         args.checkpoint,
         torch_dtype=torch.bfloat16,
         low_cpu_mem_usage=False,
         trust_remote_code=True).eval()
     ```
4. **Follow the Evaluation Process**
   - Follow the evaluation process outlined in the InternVL2 series evaluation documentation (see the command sketch after this list).
   - Ensure that the necessary dependencies are installed and that your environment is correctly set up for running the evaluation.
   - Adjust the paths and parameters as needed to fit your specific setup and requirements.
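As a concrete illustration, InternVL-style evaluation runs are typically launched as sketched below. The `evaluate.sh` script name and the dataset identifiers are assumptions based on the InternVL documentation, not taken from this guide, so verify them against your checkout:

```bash
# Sketch of an InternVL-style evaluation run. The script name and the
# dataset identifiers for GQA and POPE are assumptions -- confirm them
# against the InternVL2 series evaluation documentation.
cd internvl_chat
GPUS=8 sh evaluate.sh /path/to/mmMamba vqa-gqa-testdev   # GQA
GPUS=8 sh evaluate.sh /path/to/mmMamba pope              # POPE
```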
By following these steps, you should be able to successfully evaluate the mmMamba model using InternVL2 for the specified metrics.