August 5, 2024: We release the log and checkpoint for VSSD trained with MESA.
July 29, 2024: When MESA is introduced during training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
July 25, 2024: We release the code, logs, and checkpoints for VSSD.
Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherent causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which has a non-causal formulation of SSD. This repository contains the code for training and evaluating VSSD variants on the ImageNet-1K dataset for image classification, the COCO dataset for object detection, and the ADE20K dataset for semantic segmentation. For more information, please refer to our paper.
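As a rough illustration of the idea (a minimal sketch under our own assumptions, not the exact formulation from the paper), the quadratic "dual" form of SSD applies a lower-triangular decay mask to token interactions, which enforces causality; a non-causal variant instead lets every token write into a shared hidden state with an order-independent scalar weight. The function names and the weighting scheme below are illustrative only, with tensor names (x, a, B, C, w) following common SSD notation:

```python
import torch

def ssd_causal(x, a, B, C):
    # x: (L, d) token features, a: (L,) per-token decay in (0, 1],
    # B, C: (L, n) input/output projections, following SSD notation.
    cum = torch.cumsum(torch.log(a), dim=0)
    # mask[i, j] = a[j+1] * ... * a[i] for i >= j, 0 otherwise (causal decay)
    mask = torch.tril(torch.exp(cum[:, None] - cum[None, :]))
    attn = (C @ B.T) * mask          # (L, L) causal token interactions
    return attn @ x                  # (L, d) outputs

def ssd_noncausal(x, w, B, C):
    # Illustrative non-causal variant: every token contributes to one shared
    # hidden state with an order-independent scalar weight w[i]; there is no
    # triangular mask, so information flows in both directions.
    H = (w[:, None] * B).T @ x       # (n, d) shared hidden state
    return C @ H                     # (L, d) outputs
```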
Classification on ImageNet-1K:

name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
---|---|---|---|---|---|---|---|
VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |
Models enhanced with MESA:
name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
---|---|---|---|---|---|---|---|
VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |
Object detection and instance segmentation on COCO:

Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
---|---|---|---|---|---|---|---|
VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |
Semantic segmentation on ADE20K:

Backbone | Input | #params | FLOPs | Segmentor | mIoU(SS) | mIoU(MS) | logs | ckpts |
---|---|---|---|---|---|---|---|---|
VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |
Step 1: Clone the VSSD repository:
git clone https://github.com/YuHengsss/VSSD.git
cd VSSD
Step 2: Environment Setup:
Create and activate a new conda environment
conda create -n VSSD
conda activate VSSD
Install Dependencies
pip install -r requirements.txt
Dependencies for Detection and Segmentation (optional)
pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
Classification
To train VSSD models for classification on ImageNet, use the following commands for different configurations:
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp
If you only want to evaluate performance (together with params and FLOPs):
python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
Detection and Segmentation
To evaluate with mmdetection or mmsegmentation:
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1
Use --tta to get the mIoU(MS) in segmentation.
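For example, assuming your mmsegmentation tools/dist_test.sh forwards extra arguments to the test script (as recent mmsegmentation releases do), multi-scale testing can be run as:
bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1 --tta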
To train with mmdetection or mmsegmentation:
bash ./tools/dist_train.sh </path/to/config> 8
If VSSD is helpful for your research, please cite the following paper:
@article{shi2024vssd,
title={VSSD: Vision Mamba with Non-Causal State Space Duality},
author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
journal={arXiv preprint arXiv:2407.18559},
year={2024}
}
This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab; thanks for their excellent work.