DeepLab v1, v2

Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs: https://arxiv.org/pdf/1606.00915.pdf

Problems

overly reduced feature resolution
The repeated combination of max-pooling and downsampling (for example, convolution with a stride of 2 or more) reduces spatial resolution. This impedes the prediction of small objects.
This cause the probl
multi-scale objects in an image
Capturing both local and global context is difficult.
reduced localization accuracy at object boundaries due to model invariance

Points

Atrous Convolution
Atrous convolution enlarges the field of view of filters (the receptive field) without losing feature resolution and without increasing the number of parameters or the amount of computation. This addresses the reduced feature resolution problem listed above.
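A minimal 1-D sketch of the idea, in plain Python. The function names and the toy signal are made up for illustration and are not from the paper. The same three weights are applied at rate 1 and rate 2, so the field of view grows while the parameter count and the output length stay the same.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) convolution with zero padding.

    The output has the same length as the input. `rate` is the spacing
    between kernel taps; rate=1 is an ordinary convolution.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate  # taps are `rate` samples apart
            if 0 <= idx < len(signal):     # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(k, rate):
    """Field of view of a k-tap filter at the given rate: k + (k - 1)(rate - 1)."""
    return k + (k - 1) * (rate - 1)


signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]  # unit impulse (illustrative input)
kernel = [1.0, 1.0, 1.0]              # the same 3 weights for every rate

dense = atrous_conv1d(signal, kernel, rate=1)   # responds at 3 adjacent positions
atrous = atrous_conv1d(signal, kernel, rate=2)  # responds at positions 2 apart
```

With rate 2 the 3-tap filter covers 5 input positions (`effective_kernel_size(3, 2)`), and the output still has one value per input sample.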
Atrous Spatial Pyramid Pooling (ASPP)
ASPP extracts multi-scale features using a technique inspired by the R-CNN spatial pyramid pooling method. It applies multiple parallel atrous convolutional layers with different sampling rates to the feature map. The features extracted at each sampling rate are further processed by two 1 by 1 convolutional layers in each branch, then fused to generate the final result. This addresses the problem of multi-scale objects in an image.
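A rough 1-D, single-channel sketch of the parallel-branch idea. Assumptions: the rates (6, 12, 18, 24) and fusion by element-wise sum are how I understand the larger ASPP variant in the paper; the per-branch 1 by 1 convolutions are omitted because they reduce to a scalar multiply with one channel; all names are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous convolution with zero padding (output length == input length)."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def aspp_1d(signal, kernels, rates):
    """Apply one atrous filter per rate in parallel, then fuse by element-wise sum."""
    if len(kernels) != len(rates):
        raise ValueError("need exactly one kernel per rate")
    branches = [atrous_conv1d(signal, k, r) for k, r in zip(kernels, rates)]
    fused = [sum(values) for values in zip(*branches)]
    return fused, branches


rates = (6, 12, 18, 24)                   # assumed sampling rates
kernels = [[1.0, 1.0, 1.0]] * len(rates)  # toy weights; learned per branch in practice
signal = [0.0] * 49
signal[24] = 1.0                          # unit impulse in the middle

fused, branches = aspp_1d(signal, kernels, rates)
```

Each branch sees the same input at a different scale: the impulse reaches positions 6, 12, 18 and 24 samples away, one distance per branch, and all of that ends up in the single fused map.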
Upsampling by bilinear interpolation
The paper uses bilinear interpolation to upsample the score map by a factor of 8 to the original image resolution. Unlike deconvolution, this requires no extra learned parameters, which leads to faster training. The paper says bilinear interpolation is sufficient in this setting because the class score maps are quite smooth.
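A small sketch of parameter-free bilinear upsampling of one score map. The half-pixel coordinate mapping and border clamping are my assumptions (the notes above do not say which convention the paper uses), and the function name is made up.

```python
def bilinear_upsample(score_map, factor):
    """Upsample a 2-D list of scores by an integer `factor`.

    Each output pixel centre is mapped back to input coordinates
    (half-pixel convention, an assumption) and clamped at the borders.
    There are no learned parameters.
    """
    h, w = len(score_map), len(score_map[0])
    out = []
    for y in range(h * factor):
        sy = min(max((y + 0.5) / factor - 0.5, 0.0), h - 1.0)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        wy = sy - y0
        row = []
        for x in range(w * factor):
            sx = min(max((x + 0.5) / factor - 0.5, 0.0), w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            wx = sx - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = (1 - wx) * score_map[y0][x0] + wx * score_map[y0][x1]
            bottom = (1 - wx) * score_map[y1][x0] + wx * score_map[y1][x1]
            row.append((1 - wy) * top + wy * bottom)
        out.append(row)
    return out


coarse = [[0.0, 1.0],
          [2.0, 3.0]]  # toy 2 x 2 score map for one class
full = bilinear_upsample(coarse, factor=8)  # 16 x 16 output
```

Because the result is a weighted average of neighbouring scores, it stays within the range of the coarse map, which fits the observation that smooth score maps lose little from this kind of upsampling.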
DenseCRF
Traditionally, conditional random fields (CRFs) have been employed to smooth noisy segmentation maps. The paper calls these short-range CRFs. But the score maps produced by deep models are already quite smooth, so the goal here is to recover detailed local structure such as thin structures rather than to smooth further. DenseCRF (the fully connected CRF) addresses the reduced localization accuracy at object boundaries. But I think DenseCRF has a heavy computational cost.
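For reference, the fully connected CRF energy as I understand it from the paper. Here x is the label assignment, P(x_i) is the label probability from the network at pixel i, p_i is the pixel position, I_i is the RGB color, and mu(x_i, x_j) is 1 when the two labels differ and 0 otherwise. Please check the symbols against the paper before relying on them.

```latex
E(\mathbf{x}) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j),
\qquad \theta_i(x_i) = -\log P(x_i)

\theta_{ij}(x_i, x_j) = \mu(x_i, x_j) \left[
    w_1 \exp\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\alpha^2}
                   -\frac{\lVert I_i - I_j \rVert^2}{2\sigma_\beta^2} \right)
  + w_2 \exp\left( -\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\gamma^2} \right)
\right]
```

The pairwise sum runs over every pair of pixels, which is what "fully connected" means and is where the cost concern comes from. The first kernel pulls nearby pixels with similar color toward the same label, and the second depends on position only.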
Poly learning policy
See the paper for details.
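A sketch of the poly learning rate policy: the base rate is scaled by (1 - iter / max_iter) ^ power, with power = 0.9 as reported in the paper. The function name and the example base rate and iteration count are illustrative only.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Poly policy: base_lr * (1 - iteration / max_iter) ** power."""
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must be in [0, max_iter]")
    return base_lr * (1.0 - iteration / max_iter) ** power


base_lr = 0.001   # illustrative value
max_iter = 20000  # illustrative value
schedule = [poly_lr(base_lr, it, max_iter) for it in range(0, max_iter + 1, 5000)]
```

The rate starts at `base_lr`, decays smoothly, and reaches exactly zero at `max_iter`.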

DeepLab v3

Rethinking Atrous Convolution for Semantic Image Segmentation: https://arxiv.org/pdf/1706.05587.pdf

This paper considers four categories of approaches for handling multi-scale objects

Image Pyramid
The same model is applied to the input image at several scales.
Encoder-Decoder structure
Extra modules are cascaded on top of the original network to capture long-range information
DenseCRF is employed to encode pixel-level pairwise similarities, while several extra convolutional layers are applied in cascade to gradually capture long-range context.
Spatial Pyramid Pooling (SPP)
SPP probes an incoming feature map with filters or pooling operations at multiple rates, capturing objects at multiple scales.

Problems

overly reduced feature resolution
The repeated combination of max-pooling and downsampling (for example, convolution with a stride of 2 or more) reduces spatial resolution.
This cause the probl
multi-scale objects in an image
Capturing both local and global context is difficult.

Points

Improved ASPP
Adds image-level global context, computed by applying global average pooling to the last feature map.
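A toy sketch of the image-level feature. Assumptions: the feature map is a list of channels, each a 2-D list; the 1 by 1 convolution that I believe follows the pooling in the paper is omitted; the pooled value is simply broadcast back to the spatial size and concatenated along the channel axis. The function name is hypothetical.

```python
def add_image_level_feature(feature_map):
    """Append one globally averaged channel per input channel.

    feature_map: list of channels, each an H x W list of lists.
    Returns the original channels followed by the pooled channels,
    each filled with the global average of its source channel.
    """
    pooled = []
    for channel in feature_map:
        h, w = len(channel), len(channel[0])
        mean = sum(sum(row) for row in channel) / (h * w)  # global average pooling
        pooled.append([[mean] * w for _ in range(h)])      # broadcast back to H x W
    return feature_map + pooled                            # channel-wise concatenation


features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
with_context = add_image_level_feature(features)
```

Every spatial position in the appended channels carries the same summary of the whole map, which is how global context becomes visible to the layers that follow.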
Cascaded Atrous Block
Reference: https://arxiv.org/pdf/1702.08502.pdf
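A sketch of how cascaded blocks can trade striding for atrous rates once a target output stride is reached. This mirrors a common implementation pattern rather than anything stated in these notes: it assumes each block applies its stride at its end, so the larger rate takes effect from the following block. The function name and the stride list are hypothetical.

```python
def atrous_rates_for_output_stride(block_strides, output_stride):
    """Return one (stride, rate) pair per block.

    Blocks keep their stride until the accumulated stride reaches
    `output_stride`. After that, each stride is replaced by 1 and the
    atrous rate for later blocks is multiplied by the dropped stride.
    """
    current_stride = 1
    rate = 1
    plan = []
    for stride in block_strides:
        if current_stride >= output_stride:
            plan.append((1, rate))  # keep resolution, dilate instead
            rate *= stride          # later blocks need a larger rate
        else:
            plan.append((stride, 1))
            current_stride *= stride
    return plan


# Illustrative network: a stem with total stride 4, then five stride-2 blocks.
plan = atrous_rates_for_output_stride([4, 2, 2, 2, 2, 2], output_stride=16)
```

With a target output stride of 16, the last three blocks run at stride 1 with rates 1, 2 and 4, so the feature map stops shrinking while the field of view keeps growing.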

ICNet

Symmetric Convolution
SqueezeNet based bottleneck module
Early downsampling
Poly learning rate
Atrous Convolution
Upsampling with max-pooling indices
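A 1-D sketch of upsampling with max-pooling indices: the encoder remembers where each maximum came from, and the decoder puts the pooled value back at that position and fills the rest with zeros. The function names and the non-overlapping window are assumptions for illustration.

```python
def max_pool_with_indices(values, size=2):
    """Non-overlapping 1-D max pooling that also returns the argmax positions."""
    pooled, indices = [], []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        offset = max(range(size), key=lambda i: window[i])  # first max on ties
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool(pooled, indices, length):
    """Place each pooled value back at its recorded index; zeros elsewhere."""
    out = [0.0] * length
    for value, idx in zip(pooled, indices):
        out[idx] = value
    return out


values = [1.0, 3.0, 2.0, 0.5, 4.0, 4.5]
pooled, indices = max_pool_with_indices(values)
restored = max_unpool(pooled, indices, len(values))
```

The restored signal is sparse but keeps each maximum at its original location, and no parameters are learned for the upsampling itself.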

