[Feature] Support STDC Network (new) #995

Merged
merged 39 commits on Dec 10, 2021
Changes from 37 commits
Commits
39 commits
d7c2168
refactor stdc code
Sep 7, 2021
021eb65
update key
Sep 7, 2021
499a07f
fix backbone inference
Sep 9, 2021
b189253
remove comments
Sep 9, 2021
0bb88c1
Adding STDC for training
MengzhangLI Oct 25, 2021
b762c9b
Merge branch 'stdc' of github.com:xiexinch/mmsegmentation into stdc
MengzhangLI Oct 25, 2021
a863d46
fixing errors
MengzhangLI Nov 2, 2021
cbccd55
fixing version conflict
MengzhangLI Nov 2, 2021
40bc509
Merge branch 'master' into stdc_new
MengzhangLI Nov 2, 2021
1d4f4dd
fux typo
MengzhangLI Nov 3, 2021
7e3ec3b
use STDCHead
MengzhangLI Nov 24, 2021
0b446ad
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
MengzhangLI Nov 26, 2021
5328292
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
MengzhangLI Nov 26, 2021
3674381
upload models&logs
MengzhangLI Nov 26, 2021
bc88737
adding model converters script and fix unittest
MengzhangLI Nov 27, 2021
70dacc5
fix error
MengzhangLI Nov 27, 2021
3273969
fix error
MengzhangLI Nov 27, 2021
f0c8d99
fix error
MengzhangLI Nov 27, 2021
e646dd0
delete redundant keys in config
MengzhangLI Nov 27, 2021
923e0c8
fix errors in configs and unittest
MengzhangLI Nov 28, 2021
97f3f7a
fix errors in configs and unittest
MengzhangLI Nov 28, 2021
5f90850
fix errors in configs and unittest
MengzhangLI Nov 28, 2021
f4769f3
change Memory name
MengzhangLI Nov 29, 2021
2c9c7c3
refactor stdc2mmseg
MengzhangLI Nov 29, 2021
65c7670
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
MengzhangLI Nov 30, 2021
ebd268e
Merge branch 'master' of https://github.com/open-mmlab/mmsegmentation…
MengzhangLI Nov 30, 2021
562ccd1
change name to STDC
MengzhangLI Nov 30, 2021
658b617
refactor stdc
MengzhangLI Dec 1, 2021
8856d41
refactor stdc
MengzhangLI Dec 2, 2021
2c3d4e9
stdc refactor
MengzhangLI Dec 7, 2021
befd400
stdc refactor
MengzhangLI Dec 7, 2021
4dbbdfa
stdc refactor
MengzhangLI Dec 7, 2021
86a80db
stdc refactor
MengzhangLI Dec 7, 2021
b896054
stdc refactor
MengzhangLI Dec 8, 2021
3e703ea
stdc refactor
MengzhangLI Dec 9, 2021
f63ad0c
stdc refactor
MengzhangLI Dec 9, 2021
80bb1c4
stdc refactor
MengzhangLI Dec 10, 2021
211a735
refactor stdc
MengzhangLI Dec 10, 2021
bc2e5e0
stdc refactor
MengzhangLI Dec 10, 2021
1 change: 1 addition & 0 deletions README.md
@@ -98,6 +98,7 @@ Supported methods:
- [x] [PointRend (CVPR'2020)](configs/point_rend)
- [x] [CGNet (TIP'2020)](configs/cgnet)
- [x] [BiSeNetV2 (IJCV'2021)](configs/bisenetv2)
- [x] [STDC (CVPR'2021)](configs/stdc)
- [x] [SETR (CVPR'2021)](configs/setr)
- [x] [DPT (ArXiv'2021)](configs/dpt)
- [x] [SegFormer (NeurIPS'2021)](configs/segformer)
1 change: 1 addition & 0 deletions README_zh-CN.md
@@ -97,6 +97,7 @@ MMSegmentation 是一个基于 PyTorch 的语义分割开源工具箱。它是 O
- [x] [PointRend (CVPR'2020)](configs/point_rend)
- [x] [CGNet (TIP'2020)](configs/cgnet)
- [x] [BiSeNetV2 (IJCV'2021)](configs/bisenetv2)
- [x] [STDC (CVPR'2021)](configs/stdc)
- [x] [SETR (CVPR'2021)](configs/setr)
- [x] [DPT (ArXiv'2021)](configs/dpt)
- [x] [SegFormer (NeurIPS'2021)](configs/segformer)
83 changes: 83 additions & 0 deletions configs/_base_/models/stdc.py
@@ -0,0 +1,83 @@
norm_cfg = dict(type='BN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained=None,
    backbone=dict(
        type='STDCContextPathNet',
        backbone_cfg=dict(
            type='STDCNet',
            stdc_type='STDCNet1',
            in_channels=3,
            channels=(32, 64, 256, 512, 1024),
            bottleneck_type='cat',
            num_convs=4,
            norm_cfg=norm_cfg,
            act_cfg=dict(type='ReLU'),
            with_final_conv=False),
        last_in_channels=(1024, 512),
        out_channels=128,
        ffm_cfg=dict(in_channels=384, out_channels=256, scale_factor=4)),
    decode_head=dict(
        type='FCNHead',
        in_channels=256,
        channels=256,
        num_convs=1,
        num_classes=19,
        in_index=3,
        concat_input=False,
        dropout_ratio=0.1,
        norm_cfg=norm_cfg,
        align_corners=True,
        sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
    auxiliary_head=[
        dict(
            type='FCNHead',
            in_channels=128,
            channels=64,
            num_convs=1,
            num_classes=19,
            in_index=2,
            norm_cfg=norm_cfg,
            concat_input=False,
            align_corners=False,
            sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
        dict(
            type='FCNHead',
            in_channels=128,
            channels=64,
            num_convs=1,
            num_classes=19,
            in_index=1,
            norm_cfg=norm_cfg,
            concat_input=False,
            align_corners=False,
            sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=10000),
            loss_decode=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)),
        dict(
            type='STDCHead',
            in_channels=256,
            channels=64,
            num_convs=1,
            num_classes=2,
            boundary_threshold=0.1,
            in_index=0,
            norm_cfg=norm_cfg,
            concat_input=False,
            align_corners=False,
            loss_decode=[
                dict(
                    type='CrossEntropyLoss',
                    loss_name='loss_ce',
                    use_sigmoid=True,
                    loss_weight=1.0),
                dict(type='DiceLoss', loss_name='loss_dice', loss_weight=1.0)
            ]),
    ],
    # model training and testing settings
    train_cfg=dict(),
    test_cfg=dict(mode='whole'))
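
As a quick sanity check, a base model config like this can be loaded and built with MMSegmentation's standard builders. The following is a minimal sketch assuming the v0.x `mmcv`/`mmseg` API available at the time of this PR:

```python
# Minimal sketch: load the base model config and build the STDC segmentor.
# Assumes mmcv and mmsegmentation (v0.x API) are installed.
from mmcv import Config
from mmseg.models import build_segmentor

cfg = Config.fromfile('configs/_base_/models/stdc.py')
model = build_segmentor(
    cfg.model,
    train_cfg=cfg.get('train_cfg'),
    test_cfg=cfg.get('test_cfg'))
model.init_weights()

n_params = sum(p.numel() for p in model.parameters())
print(f'{type(model).__name__} built with {n_params / 1e6:.1f}M parameters')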
71 changes: 71 additions & 0 deletions configs/stdc/README.md
@@ -0,0 +1,71 @@
# Rethinking BiSeNet For Real-time Semantic Segmentation

## Introduction

<!-- [ALGORITHM] -->

<a href="https://github.com/MichaelFan01/STDC-Seg">Official Repo</a>

<a href="https://github.com/open-mmlab/mmsegmentation/blob/v0.20.0/mmseg/models/backbones/stdc.py#L394">Code Snippet</a>

## Abstract

BiSeNet has been proved to be a popular two-stream network for real-time segmentation. However, its principle of adding an extra path to encode spatial information is time-consuming, and the backbones borrowed from pretrained tasks, e.g., image classification, may be inefficient for image segmentation due to the deficiency of task-specific design. To handle these problems, we propose a novel and efficient structure named Short-Term Dense Concatenate network (STDC network) by removing structure redundancy. Specifically, we gradually reduce the dimension of feature maps and use the aggregation of them for image representation, which forms the basic module of STDC network. In the decoder, we propose a Detail Aggregation module by integrating the learning of spatial information into low-level layers in single-stream manner. Finally, the low-level features and deep features are fused to predict the final segmentation results. Extensive experiments on Cityscapes and CamVid dataset demonstrate the effectiveness of our method by achieving promising trade-off between segmentation accuracy and inference speed. On Cityscapes, we achieve 71.9% mIoU on the test set with a speed of 250.4 FPS on NVIDIA GTX 1080Ti, which is 45.2% faster than the latest methods, and achieve 76.8% mIoU with 97.0 FPS while inferring on higher resolution images.

<!-- [IMAGE] -->
<div align=center>
<img src="https://user-images.githubusercontent.com/24582831/143640374-d0709587-edb2-4821-bb60-340035f6ad8f.png" width="60%"/>
</div>

<details>
<summary align="right"><a href="https://arxiv.org/abs/2104.13188">STDC (CVPR'2021)</a></summary>

```latex
@inproceedings{fan2021rethinking,
title={Rethinking BiSeNet For Real-time Semantic Segmentation},
author={Fan, Mingyuan and Lai, Shenqi and Huang, Junshi and Wei, Xiaoming and Chai, Zhenhua and Luo, Junfeng and Wei, Xiaolin},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={9716--9725},
year={2021}
}
```

</details>

## Usage

To use the original repository's [ImageNet Pretrained STDCNet Weights](https://drive.google.com/drive/folders/1wROFwRt8qWHD4jSo8Zu1gp1d6oYJ3ns1), it is necessary to convert the keys first.

We provide a script [`stdc2mmseg.py`](../../tools/model_converters/stdc2mmseg.py) in the tools directory to convert the keys of models from [the official repo](https://github.com/MichaelFan01/STDC-Seg) to MMSegmentation style.

```shell
python tools/model_converters/stdc2mmseg.py ${PRETRAIN_PATH} ${STORE_PATH} ${STDC_TYPE}
```

For example:

```shell
python tools/model_converters/stdc2mmseg.py ./STDCNet813M_73.91.tar ./pretrained/stdc1.pth STDC1

python tools/model_converters/stdc2mmseg.py ./STDCNet1446_76.47.tar ./pretrained/stdc2.pth STDC2
```

This script converts the model from `PRETRAIN_PATH` and stores the converted model in `STORE_PATH`.
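
The actual mapping lives in `stdc2mmseg.py`; purely as an illustration of the idea (the function name and key prefix below are hypothetical, not the script's real rules), such a converter loads the official checkpoint, renames its keys to match the MMSegmentation module hierarchy, and re-saves the state dict:

```python
# Toy illustration only: the real conversion is implemented in
# tools/model_converters/stdc2mmseg.py and its key mapping differs.
import argparse

import torch


def convert_stdc(src_path, dst_path):
    ckpt = torch.load(src_path, map_location='cpu')
    # Some official checkpoints wrap the weights in a 'state_dict' entry.
    state_dict = ckpt.get('state_dict', ckpt)
    new_state_dict = {}
    for key, value in state_dict.items():
        # Hypothetical renaming: prefix keys so they match the MMSegmentation
        # module hierarchy (the actual rules are defined in the script).
        new_state_dict['backbone.' + key] = value
    torch.save({'state_dict': new_state_dict}, dst_path)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Toy STDC key converter')
    parser.add_argument('src')
    parser.add_argument('dst')
    args = parser.parse_args()
    convert_stdc(args.src, args.dst)
```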

## Results and models

### Cityscapes

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | config | download |
| --------- | --------- | --------- | ------: | -------- | -------------- | ----: | ------------- | --------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| STDC1 (No Pretrain) | STDC1 | 512x1024 | 80000 | 7.15 | 23.06 | 71.52 | 73.35 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/stdc/stdc1_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/v0.5/stdc/stdc1_512x1024_80k_cityscapes/stdc1_512x1024_80k_cityscapes_20211125_211245-2c8ba4c5.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc1_512x1024_80k_cityscapes/stdc1_512x1024_80k_cityscapes_20211125_211245.log.json) |
| STDC1| STDC1 | 512x1024 | 80000 | - | - | 75.10 | 77.72 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/stdc/stdc1_in1k-pre_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc1_in1k-pre_512x1024_80k_cityscapes/stdc1_in1k-pre_512x1024_80k_cityscapes_20211125_213942-880bb7d0.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc1_in1k-pre_512x1024_80k_cityscapes/stdc1_in1k-pre_512x1024_80k_cityscapes_20211125_213942.log.json) |
| STDC2 (No Pretrain) | STDC2 | 512x1024 | 80000 | 8.27 | 23.71 | 73.20 | 75.55 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/stdc/stdc2_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc2_512x1024_80k_cityscapes/stdc2_512x1024_80k_cityscapes_20211125_222450-82333ae0.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc2_512x1024_80k_cityscapes/stdc2_512x1024_80k_cityscapes_20211125_222450.log.json) |
| STDC2 | STDC2 | 512x1024 | 80000 | - | - | 77.17 | 79.01 | [config](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/stdc/stdc2_in1k-pre_512x1024_80k_cityscapes.py) | [model](https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc2_in1k-pre_512x1024_80k_cityscapes/stdc2_in1k-pre_512x1024_80k_cityscapes_20211125_220437-d2c469f8.pth) &#124; [log](https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc2_in1k-pre_512x1024_80k_cityscapes/stdc2_in1k-pre_512x1024_80k_cityscapes_20211125_220437.log.json) |

Note:

- For STDC on the Cityscapes dataset, the default setting is 4 GPUs with 12 samples per GPU during training.
- `No Pretrain` means the model is trained from scratch.
- The FPS is for reference only; our environment also differs from the paper's setting, which uses TensorRT and input sizes of `512x1024` and `768x1536`, i.e., 50% and 75% of our input size, respectively.
- The parameter `fusion_kernel` in `STDCHead` is not learnable. In the official repo, `find_unused_parameters=True` is set [here](https://github.com/MichaelFan01/STDC-Seg/blob/59ff37fbd693b99972c76fcefe97caa14aeb619f/train.py#L220). You may verify this by printing the model parameters of the original repo yourself; a sketch of a similar check on the MMSegmentation model follows this list.
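
A hedged sketch of that check on the MMSegmentation side, assuming `fusion_kernel` is registered as a parameter with `requires_grad=False` (rather than a buffer), as the note above suggests:

```python
# Sketch: print parameters of the MMSegmentation STDC model that are frozen
# (requires_grad=False), e.g. a fixed fusion kernel, assuming it is registered
# as a non-trainable parameter rather than a buffer.
from mmcv import Config
from mmseg.models import build_segmentor

cfg = Config.fromfile('configs/stdc/stdc1_512x1024_80k_cityscapes.py')
model = build_segmentor(cfg.model)

for name, param in model.named_parameters():
    if not param.requires_grad:
        print(name, tuple(param.shape))
```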
87 changes: 87 additions & 0 deletions configs/stdc/stdc.yml
@@ -0,0 +1,87 @@
Collections:
- Name: stdc
  Metadata:
    Training Data:
    - Cityscapes
  Paper:
    URL: https://arxiv.org/abs/2104.13188
    Title: Rethinking BiSeNet For Real-time Semantic Segmentation
  README: configs/stdc/README.md
  Code:
    URL: https://github.com/open-mmlab/mmsegmentation/blob/v0.20.0/mmseg/models/backbones/stdc.py#L394
    Version: v0.20.0
  Converted From:
    Code: https://github.com/MichaelFan01/STDC-Seg
Models:
- Name: stdc1_512x1024_80k_cityscapes
  In Collection: stdc
  Metadata:
    backbone: STDC1
    crop size: (512,1024)
    lr schd: 80000
    inference time (ms/im):
    - value: 43.37
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,1024)
    Training Memory (GB): 7.15
  Results:
  - Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 71.52
      mIoU(ms+flip): 73.35
  Config: configs/stdc/stdc1_512x1024_80k_cityscapes.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/v0.5/stdc/stdc1_512x1024_80k_cityscapes/stdc1_512x1024_80k_cityscapes_20211125_211245-2c8ba4c5.pth
- Name: stdc1_in1k-pre_512x1024_80k_cityscapes
  In Collection: stdc
  Metadata:
    backbone: STDC1
    crop size: (512,1024)
    lr schd: 80000
  Results:
  - Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 75.1
      mIoU(ms+flip): 77.72
  Config: configs/stdc/stdc1_in1k-pre_512x1024_80k_cityscapes.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc1_in1k-pre_512x1024_80k_cityscapes/stdc1_in1k-pre_512x1024_80k_cityscapes_20211125_213942-880bb7d0.pth
- Name: stdc2_512x1024_80k_cityscapes
  In Collection: stdc
  Metadata:
    backbone: STDC2
    crop size: (512,1024)
    lr schd: 80000
    inference time (ms/im):
    - value: 42.18
      hardware: V100
      backend: PyTorch
      batch size: 1
      mode: FP32
      resolution: (512,1024)
    Training Memory (GB): 8.27
  Results:
  - Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 73.2
      mIoU(ms+flip): 75.55
  Config: configs/stdc/stdc2_512x1024_80k_cityscapes.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc2_512x1024_80k_cityscapes/stdc2_512x1024_80k_cityscapes_20211125_222450-82333ae0.pth
- Name: stdc2_in1k-pre_512x1024_80k_cityscapes
  In Collection: stdc
  Metadata:
    backbone: STDC2
    crop size: (512,1024)
    lr schd: 80000
  Results:
  - Task: Semantic Segmentation
    Dataset: Cityscapes
    Metrics:
      mIoU: 77.17
      mIoU(ms+flip): 79.01
  Config: configs/stdc/stdc2_in1k-pre_512x1024_80k_cityscapes.py
  Weights: https://download.openmmlab.com/mmsegmentation/v0.5/stdc/stdc2_in1k-pre_512x1024_80k_cityscapes/stdc2_in1k-pre_512x1024_80k_cityscapes_20211125_220437-d2c469f8.pth
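
The metafile above is plain YAML, so it can also be consumed programmatically; a small sketch, assuming PyYAML is installed, that lists each model's name, reported mIoU and checkpoint URL:

```python
# Sketch: read the metafile and print each model's name, mIoU and weights URL.
import yaml  # PyYAML, assumed to be available

with open('configs/stdc/stdc.yml') as f:
    meta = yaml.safe_load(f)

for entry in meta['Models']:
    miou = entry['Results'][0]['Metrics']['mIoU']
    print(f"{entry['Name']}: mIoU={miou} -> {entry['Weights']}")
```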
9 changes: 9 additions & 0 deletions configs/stdc/stdc1_512x1024_80k_cityscapes.py
@@ -0,0 +1,9 @@
_base_ = [
    '../_base_/models/stdc.py', '../_base_/datasets/cityscapes.py',
    '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
]
lr_config = dict(warmup='linear', warmup_iters=1000)
data = dict(
    samples_per_gpu=12,
    workers_per_gpu=4,
)
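
Because the `_base_` files are merged at load time, the effective training settings can be inspected programmatically; a minimal sketch assuming the v0.x `mmcv.Config` API:

```python
# Sketch: resolve the _base_ chain and inspect the merged training settings.
from mmcv import Config

cfg = Config.fromfile('configs/stdc/stdc1_512x1024_80k_cityscapes.py')
print(cfg.data.samples_per_gpu)  # 12, as overridden above
print(cfg.lr_config.warmup)      # 'linear', merged into the 80k schedule
print(cfg.model.backbone.type)   # 'STDCContextPathNet', from the base model file
```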
6 changes: 6 additions & 0 deletions configs/stdc/stdc1_in1k-pre_512x1024_80k_cityscapes.py
@@ -0,0 +1,6 @@
_base_ = './stdc1_512x1024_80k_cityscapes.py'
model = dict(
    backbone=dict(
        backbone_cfg=dict(
            init_cfg=dict(
                type='Pretrained', checkpoint='./pretrained/stdc1.pth'))))
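
For inference with a trained checkpoint (for example, one linked in the table in `configs/stdc/README.md`), MMSegmentation's high-level API can be used directly; a hedged sketch in which both local paths are placeholders:

```python
# Sketch: single-image inference with a trained STDC1 checkpoint.
# Both local paths below are placeholders, not files shipped with this PR.
from mmseg.apis import inference_segmentor, init_segmentor

config_file = 'configs/stdc/stdc1_in1k-pre_512x1024_80k_cityscapes.py'
checkpoint_file = 'checkpoints/stdc1_in1k-pre_512x1024_80k_cityscapes.pth'

model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
result = inference_segmentor(model, 'demo/demo.png')  # list with one HxW array of class ids
```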
2 changes: 2 additions & 0 deletions configs/stdc/stdc2_512x1024_80k_cityscapes.py
@@ -0,0 +1,2 @@
_base_ = './stdc1_512x1024_80k_cityscapes.py'
model = dict(backbone=dict(backbone_cfg=dict(stdc_type='STDCNet2')))
6 changes: 6 additions & 0 deletions configs/stdc/stdc2_in1k-pre_512x1024_80k_cityscapes.py
@@ -0,0 +1,6 @@
_base_ = './stdc2_512x1024_80k_cityscapes.py'
model = dict(
    backbone=dict(
        backbone_cfg=dict(
            init_cfg=dict(
                type='Pretrained', checkpoint='./pretrained/stdc2.pth'))))
4 changes: 3 additions & 1 deletion mmseg/models/backbones/__init__.py
@@ -12,6 +12,7 @@
from .resnest import ResNeSt
from .resnet import ResNet, ResNetV1c, ResNetV1d
from .resnext import ResNeXt
from .stdc import STDCContextPathNet, STDCNet
from .swin import SwinTransformer
from .timm_backbone import TIMMBackbone
from .twins import PCPVT, SVT
@@ -22,5 +23,6 @@
'ResNet', 'ResNetV1c', 'ResNetV1d', 'ResNeXt', 'HRNet', 'FastSCNN',
'ResNeSt', 'MobileNetV2', 'UNet', 'CGNet', 'MobileNetV3',
'VisionTransformer', 'SwinTransformer', 'MixVisionTransformer',
'BiSeNetV1', 'BiSeNetV2', 'ICNet', 'TIMMBackbone', 'ERFNet', 'PCPVT', 'SVT'
'BiSeNetV1', 'BiSeNetV2', 'ICNet', 'TIMMBackbone', 'ERFNet', 'PCPVT',
'SVT', 'STDCNet', 'STDCContextPathNet'
]
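
With `STDCNet` and `STDCContextPathNet` now exported, the backbone can also be built on its own; a short sketch (output shapes are not asserted, since they depend on the config) that instantiates the context path from the base config and runs a dummy forward pass:

```python
# Sketch: build the newly exported STDC context-path backbone on its own and
# run a dummy forward pass to look at the features it produces.
import torch
from mmcv import Config
from mmseg.models import build_backbone

cfg = Config.fromfile('configs/_base_/models/stdc.py')
backbone = build_backbone(cfg.model.backbone)  # STDCContextPathNet wrapping STDCNet
backbone.init_weights()
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 512, 1024))
print([tuple(f.shape) for f in feats])  # multi-scale features consumed by the heads
```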