[Fix] Fix UniFormer README and metafile #2450

Merged 4 commits on May 15, 2023
11 changes: 2 additions & 9 deletions configs/recognition/mvit/README.md
@@ -1,21 +1,14 @@
# MViT V2

- > [MViTv2: Improved Multiscale Vision Transformers for Classification and Detection](http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf)
+ [MViTv2: Improved Multiscale Vision Transformers for Classification and Detection](http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf)

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

- In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video
- classification, as well as object detection. We present an improved version of MViT that incorporates
- decomposed relative positional embeddings and residual pooling connections. We instantiate this architecture
- in five sizes and evaluate it for ImageNet classification, COCO detection and Kinetics video recognition where
- it outperforms prior work. We further compare MViTv2s' pooling attention to window attention mechanisms where
- it outperforms the latter in accuracy/compute. Without bells-and-whistles, MViTv2 has state-of-the-art
- performance in 3 domains: 88.8% accuracy on ImageNet classification, 58.7 boxAP on COCO object detection as
- well as 86.1% on Kinetics-400 video classification.
+ In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection. We present an improved version of MViT that incorporates decomposed relative positional embeddings and residual pooling connections. We instantiate this architecture in five sizes and evaluate it for ImageNet classification, COCO detection and Kinetics video recognition where it outperforms prior work. We further compare MViTv2s' pooling attention to window attention mechanisms where it outperforms the latter in accuracy/compute. Without bells-and-whistles, MViTv2 has state-of-the-art performance in 3 domains: 88.8% accuracy on ImageNet classification, 58.7 boxAP on COCO object detection as well as 86.1% on Kinetics-400 video classification.

<!-- [IMAGE] -->

2 changes: 1 addition & 1 deletion configs/recognition/mvit/metafile.yml
@@ -1,6 +1,6 @@
  Collections:
  - Name: MViT
- README: configs/recognition/MViT/README.md
+ README: configs/recognition/mvit/README.md
Paper:
URL: http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf
Title: "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection"
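The `README` path fix above matters on case-sensitive filesystems, where `configs/recognition/MViT/README.md` and `configs/recognition/mvit/README.md` are different paths. A minimal sketch of a check for this kind of mismatch follows. The helper name `exists_case_sensitive` is hypothetical and is not part of this repository or its tooling.

```python
import os


def exists_case_sensitive(root: str, rel_path: str) -> bool:
    """Return True only if every component of rel_path matches an on-disk name exactly.

    Hypothetical helper: it compares each path component against the directory
    listing, so a wrong-case component is rejected even on filesystems that
    would resolve the path case-insensitively.
    """
    current = root
    for part in rel_path.split("/"):
        try:
            entries = os.listdir(current)
        except (FileNotFoundError, NotADirectoryError):
            return False
        if part not in entries:
            return False
        current = os.path.join(current, part)
    return True
```

Called with the repository root and the `README` value from a metafile, it returns `False` for the old `MViT` spelling whenever the directory on disk is named `mvit`.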
10 changes: 5 additions & 5 deletions configs/recognition/uniformerv2/README.md

Large diffs are not rendered by default.

52 changes: 26 additions & 26 deletions configs/recognition/uniformerv2/metafile.yml
@@ -8,7 +8,7 @@ Collections:
  Models:
  - Name: uniformerv2-base-p16-res224_clip_8xb32-u8_kinetics400-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-base-p16-res224_clip_8xb32-u8_kinetics400-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-B/16
Batch Size: 32
@@ -30,7 +30,7 @@ Models:

  - Name: uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics400-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics400-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-B/16
Batch Size: 32
@@ -52,7 +52,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics400-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics400-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -73,7 +73,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics400-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics400-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -94,7 +94,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics400-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics400-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -115,7 +115,7 @@ Models:

  - Name: uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics400-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics400-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14@336
Pretrained: Kinetics-710
@@ -136,7 +136,7 @@ Models:

  - Name: uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics600-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics600-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-B/16
Pretrained: Kinetics-710
@@ -158,7 +158,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics600-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics600-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -178,7 +178,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics600-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics600-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -198,7 +198,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics600-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics600-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -218,7 +218,7 @@ Models:

  - Name: uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics600-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics600-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14@336
Pretrained: Kinetics-710
@@ -236,8 +236,8 @@ Models:
  Top 5 Accuracy: 98.5
  Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv2/kinetics600/uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics600-rgb_20221219-f984f5d2.pth

- - Name: uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics700-rgb
- Config: configs/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics700-rgb.py
+ - Name: uniformerv2-base-p16-res224_clip-pre_8xb32-u8_kinetics700-rgb
+ Config: configs/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-pre_8xb32-u8_kinetics700-rgb.py
In Collection: UniFormer
Metadata:
Architecture: UniFormerV2-B/16
@@ -253,7 +253,7 @@ Models:
  - Dataset: Kinetics-700
  Task: Action Recognition
  Metrics:
- Top 1 Accuracy: 76.3
+ Top 1 Accuracy: 75.9
Top 5 Accuracy: 92.9
Training Log: https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv2/uniformerv2-base-p16-res224_clip_8xb32-u8_kinetics700-rgb/uniformerv2-base-p16-res224_clip_8xb32-u8_kinetics700-rgb.log
Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv2/uniformerv2-base-p16-res224_clip_8xb32-u8_kinetics700-rgb/uniformerv2-base-p16-res224_clip_8xb32-u8_kinetics700-rgb_20230313-f02e48ad.pth
@@ -275,14 +275,14 @@ Models:
  - Dataset: Kinetics-700
  Task: Action Recognition
  Metrics:
- Top 1 Accuracy: 75.9
+ Top 1 Accuracy: 76.3
  Top 5 Accuracy: 92.9
  Training Log: https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics700-rgb/uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics700-rgb.log
  Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics700-rgb/uniformerv2-base-p16-res224_clip-kinetics710-pre_8xb32-u8_kinetics700-rgb_20230313-69070837.pth

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics700-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics700-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -302,7 +302,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics700-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u16_kinetics700-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -322,7 +322,7 @@ Models:

  - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics700-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u32_kinetics700-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710
@@ -342,7 +342,7 @@ Models:

  - Name: uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics700-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res336_clip-kinetics710-pre_u32_kinetics700-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14@336
Pretrained: Kinetics-710
@@ -362,7 +362,7 @@ Models:

  - Name: uniformerv2-base-p16-res224_clip-pre_u8_kinetics710-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-pre_u8_kinetics710-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-B/16
Pretrained: CLIP-400M
@@ -374,8 +374,8 @@ Models:
  Code: https://github.com/OpenGVLab/UniFormerV2
  Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/uniformerv2/kinetics710/uniformerv2-base-p16-res224_clip-pre_u8_kinetics710-rgb_20221219-77d34f81.pth

- - Name: uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics700-rgb
- Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-kinetics710-pre_u8_kinetics700-rgb.py
+ - Name: uniformerv2-large-p14-res224_clip-pre_u8_kinetics710-rgb
+ Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res224_clip-pre_u8_kinetics710-rgb.py
In Collection: UniFormer
Metadata:
Architecture: UniFormerV2-L/14
@@ -390,7 +390,7 @@ Models:

  - Name: uniformerv2-large-p14-res336_clip-pre_u8_kinetics710-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p14-res336_clip-pre_u8_kinetics710-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14@336
Pretrained: Kinetics-710
@@ -404,7 +404,7 @@ Models:

  - Name: uniformerv2-base-p16-res224_clip-kinetics710-kinetics-k400-pre_16xb32-u8_mitv1-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-base-p16-res224_clip-kinetics710-kinetics-k400-pre_16xb32-u8_mitv1-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-B/16
Pretrained: Kinetics-710 + Kinetics-400
@@ -426,7 +426,7 @@ Models:

  - Name: uniformerv2-large-p16-res224_clip-kinetics710-kinetics-k400-pre_u8_mitv1-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p16-res224_clip-kinetics710-kinetics-k400-pre_u8_mitv1-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14
Pretrained: Kinetics-710 + Kinetics-400
@@ -446,7 +446,7 @@ Models:

  - Name: uniformerv2-large-p16-res336_clip-kinetics710-kinetics-k400-pre_u8_mitv1-rgb
  Config: configs/recognition/uniformerv2/uniformerv2-large-p16-res336_clip-kinetics710-kinetics-k400-pre_u8_mitv1-rgb.py
- In Collection: UniFormer
+ In Collection: UniFormerV2
Metadata:
Architecture: UniFormerV2-L/14@336
Pretrained: Kinetics-710 + Kinetics-400
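
Most hunks in this file make the same change: `In Collection` moves from `UniFormer` to `UniFormerV2`. A check along the following lines could catch entries that point at a collection the metafile does not declare. This is a sketch, not the project's own validation: `find_orphan_models` is a hypothetical helper, it takes an already-parsed metafile as a plain dict, and the sample data assumes the collection is declared as `UniFormerV2`, which this diff does not show.

```python
from typing import Dict, List


def find_orphan_models(metafile: Dict) -> List[str]:
    """Return names of models whose 'In Collection' is not a declared collection."""
    declared = {c["Name"] for c in metafile.get("Collections", [])}
    return [
        m["Name"]
        for m in metafile.get("Models", [])
        if m.get("In Collection") not in declared
    ]


# Hypothetical, trimmed-down metafile; the collection name is an assumption.
sample = {
    "Collections": [{"Name": "UniFormerV2"}],
    "Models": [
        {"Name": "model-a", "In Collection": "UniFormerV2"},
        {"Name": "model-b", "In Collection": "UniFormer"},
    ],
}

print(find_orphan_models(sample))  # ['model-b']
```

Any entry still reading `In Collection: UniFormer` after the rename would show up in the returned list.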
5 changes: 5 additions & 0 deletions model-index.yml
@@ -15,6 +15,11 @@ Import:
  - configs/recognition/trn/metafile.yml
  - configs/recognition/swin/metafile.yml
  - configs/recognition/c2d/metafile.yml
+ - configs/recognition/omnisource/metafile.yml
+ - configs/recognition/mvit/metafile.yml
+ - configs/recognition/uniformer/metafile.yml
+ - configs/recognition/uniformerv2/metafile.yml
+ - configs/recognition/videomae/metafile.yml
  - configs/detection/slowfast/metafile.yml
  - configs/detection/slowonly/metafile.yml
  - configs/detection/acrn/metafile.yml
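
This hunk adds five metafiles that were missing from the `Import` list. Below is a sketch of how such omissions could be detected, assuming metafiles live at `configs/<task>/<model>/metafile.yml` as the paths above suggest. `find_unlisted_metafiles` is a hypothetical helper, not part of the repository, and it takes the `Import` entries as an already-parsed list of strings.

```python
from pathlib import Path
from typing import Iterable, List


def find_unlisted_metafiles(repo_root: str, imported: Iterable[str]) -> List[str]:
    """Return metafile paths found under configs/ that are absent from `imported`."""
    root = Path(repo_root)
    # Collect every configs/<task>/<model>/metafile.yml as a POSIX-style relative path.
    on_disk = {
        p.relative_to(root).as_posix()
        for p in (root / "configs").glob("*/*/metafile.yml")
    }
    return sorted(on_disk - set(imported))
```

Run against a checkout from before this change, a check like this would be expected to report the five paths added here, provided the directory layout assumption holds.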