Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs: https://arxiv.org/pdf/1606.00915.pdf
- overly reduced feature resolution
The repeated combination of max-pooling and downsampling, such as convolution with a stride of 2 or more, reduces spatial resolution. This impedes the prediction of small objects.
- multi-scale objects in an image
Capturing both local and global context is difficult.
- reduced accuracy in localizing object boundaries due to model invariance
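To make the resolution loss in the first item concrete, here is a small arithmetic sketch. The function name `feature_map_size` and the choice of five stride-2 stages are assumptions for illustration (not taken from the paper); the factor of 8 matches the upsampling factor mentioned later in these notes.

```python
def feature_map_size(input_size, strides):
    """Side length left after applying each stride in turn.

    Uses ceiling division, i.e. assumes 'same'-style padding at every stage.
    """
    size = input_size
    for s in strides:
        size = -(-size // s)  # ceiling division without importing math
    return size


# Five stride-2 stages (assumed, typical of classification backbones)
# shrink a 512-pixel side by a factor of 32.
coarse = feature_map_size(512, [2, 2, 2, 2, 2])

# Keeping only three stride-2 stages leaves a factor of 8.
dense = feature_map_size(512, [2, 2, 2])

print(coarse, dense)  # 16 64
```

On the 16x16 map, an object narrower than 32 pixels covers less than one cell, which is why small objects become hard to predict.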
- Atrous Convolution
Atrous convolution enlarges the field of view (receptive field) of filters without losing feature resolution and without increasing the number of parameters or the amount of computation. This addresses the problem of reduced feature resolution.
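A minimal 1-D sketch of the idea, in plain Python. The function names and the toy signal are assumptions for illustration; real implementations work on 2-D multi-channel tensors. The same three weights are reused at every rate, only the spacing between the taps changes, and zero padding keeps the output the same length as the input.

```python
def field_of_view(kernel_size, rate):
    """Effective extent of a kernel whose taps are spaced `rate` samples apart."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) convolution with zero padding, odd kernel sizes only.

    Written as cross-correlation, as most deep learning libraries do.
    """
    pad = (field_of_view(len(kernel), rate) - 1) // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [
        sum(w * padded[i + k * rate] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]


signal = [0, 1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]  # three parameters, regardless of the rate

print(field_of_view(3, 1), atrous_conv1d(signal, kernel, rate=1))
# 3 [1, 3, 6, 9, 12, 15, 18, 13]
print(field_of_view(3, 2), atrous_conv1d(signal, kernel, rate=2))
# 5 [2, 4, 6, 9, 12, 15, 10, 12]
```

With rate 2 the filter covers 5 samples instead of 3, while the output length, the number of weights, and the number of multiplications per output stay the same.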
- Atrous Spatial Pyramid Pooling (ASPP)
ASPP extracts multi-scale features using a technique inspired by the R-CNN spatial pyramid pooling method. It applies multiple parallel atrous convolutional layers with different sampling rates to the feature map. The features extracted at each sampling rate are further processed by two 1x1 convolutional layers in each branch, then fused to generate the final result. This addresses the problem of multi-scale objects in an image.
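The parallel-branch structure can be sketched in 1-D as follows. This is a simplified illustration under stated assumptions: the rates (1, 2, 3) are toy values, the per-branch 1x1 convolutions are omitted, and fusion is done by element-wise summation. The helper names are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """Zero-padded 1-D atrous convolution (odd kernel sizes only)."""
    pad = (len(kernel) - 1) * rate // 2
    padded = [0] * pad + list(signal) + [0] * pad
    return [
        sum(w * padded[i + k * rate] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def aspp1d(signal, branches):
    """Run every (kernel, rate) branch on the same input and sum the outputs."""
    outputs = [atrous_conv1d(signal, kernel, rate) for kernel, rate in branches]
    return [sum(values) for values in zip(*outputs)]


# A single impulse shows which positions each branch looks at.
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
branches = [([1, 1, 1], 1), ([1, 1, 1], 2), ([1, 1, 1], 3)]  # toy rates (assumed)

print(aspp1d(impulse, branches))  # [0, 1, 1, 1, 3, 1, 1, 1, 0]
```

Each branch responds at a different distance from the impulse, so the fused output mixes context from several scales while all branches read the same feature map.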
- Upsampling by bilinear interpolation
The paper uses bilinear interpolation to upsample the score map by a factor of 8 to reach the original image resolution. Unlike deconvolution, this requires no extra learned parameters, which leads to faster training. The paper states that bilinear interpolation is sufficient in this setting because the class score maps are quite smooth.
- DenseCRF
Traditionally, conditional random fields (CRFs) have been employed to smooth noisy segmentation maps. The paper calls these short-range CRFs. However, the score maps of deep models are already quite smooth, so the goal is to recover thin structures rather than to smooth further. DenseCRF addresses the problem of reduced localization accuracy at object boundaries. However, I think DenseCRF has a heavy computational cost.
- Poly learning policy
Please read papers.
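For quick reference, the poly policy multiplies the base learning rate by (1 - iter / max_iter) ^ power. Below is a minimal sketch. The function name is hypothetical, power = 0.9 is the value I recall from the DeepLab paper, and the base rate and iteration count are illustrative assumptions.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Poly learning rate: decays from base_lr at iteration 0 to 0 at max_iter."""
    return base_lr * (1 - iteration / max_iter) ** power


base_lr, max_iter = 0.001, 20000  # illustrative values (assumed)
for it in (0, 5000, 10000, 15000, 20000):
    print(it, poly_lr(base_lr, it, max_iter))
```

The rate falls smoothly to exactly zero at the last iteration, with no step boundaries to tune.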
Rethinking Atrous Convolution for Semantic Image Segmentation: https://arxiv.org/pdf/1706.05587.pdf
- Image Pyramid
Feeds several rescaled versions of the input image to the network.
- Encoder-Decoder structure
- Extra modules are cascaded on top of the original network for capturing long range information
DenseCRF is employed to encode pixel-level pairwise similarities, while other approaches cascade several extra convolutional layers to gradually capture long-range context.
- Spatial Pyramid Pooling (SPP)
SPP probes an incoming feature map with filters or pooling operations at multiple rates, thereby capturing objects at multiple scales.
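The pooling variant of this idea can be sketched as average pooling over grids of several sizes. The function names and the pyramid levels (1, 2, 4) are assumptions for illustration, and the map is a single-channel list of rows. The 1-bin level is plain global average pooling.

```python
def average_pool_bins(feature_map, bins):
    """Average-pool a 2-D map (list of rows) into a bins x bins grid.

    Assumes height and width are both at least `bins`.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for by in range(bins):
        y0, y1 = by * h // bins, (by + 1) * h // bins
        row = []
        for bx in range(bins):
            x0, x1 = bx * w // bins, (bx + 1) * w // bins
            cells = [feature_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def spatial_pyramid(feature_map, levels=(1, 2, 4)):
    """Pool the same map at every level; coarse levels summarize global context."""
    return {bins: average_pool_bins(feature_map, bins) for bins in levels}


fmap = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
pyramid = spatial_pyramid(fmap)
print(pyramid[1])  # [[7.5]]
print(pyramid[2])  # [[2.5, 4.5], [10.5, 12.5]]
```

Coarse levels describe the whole image, fine levels keep local detail, and a model can combine them to handle objects at different scales.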
- overly reduced feature resolution
The repeated combination of max-pooling and downsampling, such as convolution with a stride of 2 or more, reduces spatial resolution.
- multi-scale objects in an image
Capturing both local and global context is difficult.
- Improved ASPP
Adds global context, computed by applying global average pooling to the last feature map.
- Cascaded Atrous Block
Reference: https://arxiv.org/pdf/1702.08502.pdf
- Symmetric Convolution
- SqueezeNet based bottleneck module
- Early downsampling
- Poly learning rate
- Atrous Convolution
- Upsampling with max-pooling indices
- Symmetric Encoder-Decoder Model
- Pyramid Pooling Modules
- Auxiliary Loss