The official implementation of "Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures".
- 2024/04/14: We support RWKV6 in the classification task, with higher performance!
- 2024/03/04: We release the code and models of Vision-RWKV.
- High-Resolution Efficiency: Processes high-resolution images smoothly with a global receptive field.
- Scalability: Pre-trained on large-scale datasets and scales up stably.
- Superior Performance: Achieves better performance than ViTs in classification tasks. In dense prediction tasks, it surpasses window-based ViTs and is comparable to global-attention ViTs, with lower FLOPs and higher speed.
- Efficient Alternative: Can serve as an alternative backbone to ViT in comprehensive vision tasks.
- Support RWKV6 as VRWKV6
- Release VRWKV-L
- Release VRWKV-T/S/B
Model | Size | Pretrain | Download |
---|---|---|---|
VRWKV-L | 192 | ImageNet-22K | ckpt |
Model | Size | #Param | #FLOPs | Top-1 Acc | Checkpoint | Config |
---|---|---|---|---|---|---|
VRWKV-T | 224 | 6.2M | 1.2G | 75.1 | ckpt | cfg |
VRWKV-S | 224 | 23.8M | 4.6G | 80.1 | ckpt | cfg |
VRWKV-B | 224 | 93.7M | 18.2G | 82.0 | ckpt | cfg |
VRWKV-L | 384 | 334.9M | 189.5G | 86.0 | ckpt | cfg |
VRWKV6-T | 224 | 7.6M | 1.6G | 76.6 | ckpt | cfg |
VRWKV6-S | 224 | 27.7M | 5.6G | 81.1 | ckpt | cfg |
VRWKV6-B | 224 | 104.9M | 20.9G | 82.6 | ckpt | cfg |
- VRWKV-L is pretrained on ImageNet-22K and then finetuned on ImageNet-1K.
- We train VRWKV-L with the InternImage codebase for higher training speed.
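
To make the VRWKV vs. VRWKV6 trade-off in the classification table easier to read, here is a minimal sketch that computes the accuracy gain and the extra cost at each model size. The numbers are copied from the table above; the `RESULTS` dictionary and the `compare` helper are illustrative names and are not part of this repository.

```python
# Rows copied from the classification table above:
# model -> (params in millions, FLOPs in G, top-1 accuracy in %)
RESULTS = {
    "VRWKV-T": (6.2, 1.2, 75.1),
    "VRWKV-S": (23.8, 4.6, 80.1),
    "VRWKV-B": (93.7, 18.2, 82.0),
    "VRWKV6-T": (7.6, 1.6, 76.6),
    "VRWKV6-S": (27.7, 5.6, 81.1),
    "VRWKV6-B": (104.9, 20.9, 82.6),
}


def compare(size):
    """Return (top-1 gain, extra params in M, extra GFLOPs) of VRWKV6 over VRWKV."""
    p0, f0, a0 = RESULTS[f"VRWKV-{size}"]
    p1, f1, a1 = RESULTS[f"VRWKV6-{size}"]
    # Round to one decimal to match the precision reported in the table.
    return round(a1 - a0, 1), round(p1 - p0, 1), round(f1 - f0, 1)


for size in ("T", "S", "B"):
    gain, extra_params, extra_flops = compare(size)
    print(f"{size}: +{gain} top-1, +{extra_params}M params, +{extra_flops}G FLOPs")
```

For the tiny models this works out to +1.5 top-1 accuracy for 1.4M more parameters and 0.4G more FLOPs. VRWKV-L is left out because the table lists no VRWKV6 counterpart for it.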
Model | #Param | #FLOPs | box AP | mask AP | Checkpoint | Config |
---|---|---|---|---|---|---|
VRWKV-T | 8.4M | 67.9G | 41.7 | 38.0 | ckpt | cfg |
VRWKV-S | 29.3M | 189.9G | 44.8 | 40.2 | ckpt | cfg |
VRWKV-B | 106.6M | 599.0G | 46.8 | 41.7 | ckpt | cfg |
VRWKV-L | 351.9M | 1730.6G | 50.6 | 44.9 | ckpt | cfg |
- We report the #Param and #FLOPs of the backbone in this table.
Model | #Param | #FLOPs | mIoU | Checkpoint | Config |
---|---|---|---|---|---|
VRWKV-T | 8.4M | 16.6G | 43.3 | ckpt | cfg |
VRWKV-S | 29.3M | 46.3G | 47.2 | ckpt | cfg |
VRWKV-B | 106.6M | 146.0G | 49.2 | ckpt | cfg |
VRWKV-L | 351.9M | 421.9G | 53.5 | ckpt | cfg |
- We report the #Param and #FLOPs of the backbone in this table.
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{duan2024vrwkv,
title={Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures},
author={Duan, Yuchen and Wang, Weiyun and Chen, Zhe and Zhu, Xizhou and Lu, Lewei and Lu, Tong and Qiao, Yu and Li, Hongsheng and Dai, Jifeng and Wang, Wenhai},
journal={arXiv preprint arXiv:2403.02308},
year={2024}
}
This repository is released under the Apache 2.0 license as found in the LICENSE file.
Vision-RWKV is built with reference to the code of the following projects: RWKV, MMPretrain, MMDetection, MMSegmentation, ViT-Adapter, InternImage. Thanks for their awesome work!