VSSD

VSSD: Vision Mamba with Non-Causal State Space Duality

Paper: arXiv:2407.18559

Updates

  • August 5th, 2024: We release the log and ckpt for VSSD trained with MESA.
  • July 29th, 2024: When introducing MESA during training, as in MLLA, VSSD-B achieves 85.4% top-1 accuracy on ImageNet-1K!
  • July 25th, 2024: We release the code, logs and ckpts for VSSD.

Introduction

Recently, State Space Duality (SSD), an improved variant of SSMs, was introduced in Mamba2 to enhance model performance and efficiency. However, the inherently causal nature of SSD/SSMs restricts their application to non-causal vision tasks. To address this limitation, we introduce the Visual State Space Duality (VSSD) model, which uses a non-causal formulation of SSD. This repository contains the code for training and evaluating VSSD variants on ImageNet-1K for image classification, COCO for object detection, and ADE20K for semantic segmentation. For more information, please refer to our paper.
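For orientation, below is a minimal, self-contained sketch (not the code in this repository) of the quadratic "matrix form" of SSD, which makes it easy to see where the causal constraint enters: the token-mixing matrix is masked to be lower-triangular. VSSD derives a proper non-causal counterpart in the paper; the `causal=False` branch here merely drops the mask to mark where causality lives and should not be read as the VSSD formulation. All names and shapes are illustrative.

```python
import torch

def ssd_matrix_form(x, a, B, C, causal=True):
    """Quadratic "matrix form" of SSD, for illustration only.

    x: (L, d) token features, a: (L,) per-token decays in (0, 1),
    B, C: (L, n) input/output projections of the state space.
    """
    s = torch.cumsum(torch.log(a), dim=0)        # cumulative log-decay
    decay = torch.exp(s[:, None] - s[None, :])   # decay accumulated from token j to token i
    M = (C @ B.T) * decay                        # (L, L) token-mixing matrix
    if causal:
        # Mamba2 / SSD: each token only mixes with itself and earlier tokens.
        M = torch.tril(M)
    # Without the mask, mixing is no longer causal; VSSD's actual non-causal
    # formulation is derived in the paper and differs from this naive un-masking.
    return M @ x

# Toy usage
L, d, n = 8, 16, 4
x, B, C = torch.randn(L, d), torch.randn(L, n), torch.randn(L, n)
a = torch.rand(L) * 0.5 + 0.25                   # decays in (0.25, 0.75)
y_causal = ssd_matrix_form(x, a, B, C, causal=True)
y_non_causal = ssd_matrix_form(x, a, B, C, causal=False)
```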

Main Results

Classification on ImageNet-1K

| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VSSD-Micro | ImageNet-1K | 224x224 | 82.5 | 14M | 2.3G | log | ckpt |
| VSSD-Tiny | ImageNet-1K | 224x224 | 83.6 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.1 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 84.7 | 89M | 16.1G | log | ckpt |

Models enhanced with MESA:

| name | pretrain | resolution | acc@1 | #params | FLOPs | logs | ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VSSD-Tiny | ImageNet-1K | 224x224 | 84.1 | 24M | 4.5G | log | ckpt |
| VSSD-Small | ImageNet-1K | 224x224 | 84.5 | 40M | 7.4G | log | ckpt |
| VSSD-Base | ImageNet-1K | 224x224 | 85.4 | 89M | 16.1G | log | ckpt |

Object Detection on COCO

| Backbone | #params | FLOPs | Detector | box mAP | mask mAP | logs | ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VSSD-Micro | 33M | 220G | MaskRCNN@1x | 45.4 | 41.3 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@1x | 46.9 | 42.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@1x | 48.4 | 43.5 | log | ckpt |
| VSSD-Micro | 33M | 220G | MaskRCNN@3x | 47.7 | 42.8 | log | ckpt |
| VSSD-Tiny | 44M | 265G | MaskRCNN@3x | 48.8 | 43.6 | log | ckpt |
| VSSD-Small | 59M | 325G | MaskRCNN@3x | 50.0 | 44.6 | - | ckpt |

Semantic Segmentation on ADE20K

| Backbone | Input | #params | FLOPs | Segmentor | mIoU (SS) | mIoU (MS) | logs | ckpts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| VSSD-Micro | 512x512 | 42M | 893G | UperNet@160k | 45.6 | 46.0 | log | ckpt |
| VSSD-Tiny | 512x512 | 53M | 941G | UperNet@160k | 47.9 | 48.7 | log | ckpt |

Getting Started

Installation

Step 1: Clone the VSSD repository:

git clone https://github.com/YuHengsss/VSSD.git
cd VSSD

Step 2: Environment Setup:

Create and activate a new conda environment

conda create -n VSSD
conda activate VSSD

Install Dependencies

pip install -r requirements.txt
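A quick sanity check (a sketch here, assuming requirements.txt pulls in PyTorch) confirms that PyTorch is installed and can see your GPUs before you launch distributed training:

```python
# Environment sanity check; assumes requirements.txt installed PyTorch.
import torch

print(torch.__version__)           # installed PyTorch version
print(torch.cuda.is_available())   # True if a CUDA device is visible
print(torch.cuda.device_count())   # number of GPUs usable for --nproc_per_node
```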

Dependencies for Detection and Segmentation (optional)

pip install mmengine==0.10.1 mmcv==2.1.0 opencv-python-headless ftfy regex
pip install mmdet==3.3.0 mmsegmentation==1.2.2 mmpretrain==1.2.0
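The mmdet/mmsegmentation stack is sensitive to the exact mmcv/mmengine versions, so a short import check (a sketch, not part of this repository) can confirm that the pinned versions above resolved correctly; note that mmsegmentation is imported as mmseg:

```python
# Confirm the OpenMMLab stack installed with the expected versions.
import mmengine
import mmcv
import mmdet
import mmseg  # the 'mmsegmentation' package imports as 'mmseg'

for pkg in (mmengine, mmcv, mmdet, mmseg):
    print(pkg.__name__, pkg.__version__)
```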

Quick Start

Classification

To train VSSD models for classification on ImageNet, use the following commands for different configurations:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=8 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp

If you only want to evaluate a trained model and report its performance, parameter count, and FLOPs:

python -m torch.distributed.launch --nnodes=1 --node_rank=0 --nproc_per_node=1 --master_addr="127.0.0.1" --master_port=29501 main.py --cfg </path/to/config> --batch-size 128 --data-path </path/of/dataset> --output /tmp --resume </path/of/checkpoint> --eval
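If you just want to peek inside a downloaded checkpoint before passing it to --resume, a generic PyTorch snippet like the one below works; the filename is hypothetical and the exact keys depend on how the training script saved its state:

```python
# Inspect a released checkpoint (filename is hypothetical).
import torch

ckpt = torch.load("vssd_tiny_ckpt.pth", map_location="cpu")
print(list(ckpt.keys()))                  # e.g. 'model', 'epoch', ... (repo-dependent)
state_dict = ckpt.get("model", ckpt)      # fall back to a bare state_dict
n_params = sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
print(f"{n_params / 1e6:.1f}M parameters in the saved weights")
```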

Detection and Segmentation

To evaluate with mmdetection or mmsegmentation:

bash ./tools/dist_test.sh </path/to/config> </path/to/checkpoint> 1

Use --tta to obtain the multi-scale mIoU (MS) for segmentation.

To train with mmdetection or mmsegmentation:

bash ./tools/dist_train.sh </path/to/config> 8

Citation

If VSSD is helpful for your research, please cite the following paper:

@article{shi2024vssd,
  title={VSSD: Vision Mamba with Non-Causal State Space Duality},
  author={Yuheng Shi and Minjing Dong and Mingjia Li and Chang Xu},
  journal={arXiv preprint arXiv:2407.18559},
  year={2024}
}

Acknowledgment

This project is based on VMamba (paper, code), Mamba2 (paper, code), Swin-Transformer (paper, code), and OpenMMLab. Thanks for their excellent works.
