DeepLab v1, v2

Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs: https://arxiv.org/pdf/1606.00915.pdf

Problems

  • overly reduced feature resolution
    The repeated combination of max-pooling and downsampling (for example, convolution with a stride of 2 or more) reduces spatial resolution. This impedes the prediction of small objects.
    This cause the probl
  • multi-scale objects in an image
    Capturing both local and global context is difficult.
  • reduced localization accuracy at object boundaries due to model invariance
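
The resolution loss in the first problem can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name `feature_map_size`, the 512-pixel input, and the count of five stride-2 stages are assumptions chosen for the example, not values taken from these notes.

```python
def feature_map_size(input_size: int, num_stride2_stages: int) -> int:
    """Spatial size left after a number of stride-2 stages.

    Assumes each stage halves the size with ceiling rounding
    (an illustrative padding convention, not a specific network's).
    """
    size = input_size
    for _ in range(num_stride2_stages):
        size = (size + 1) // 2  # halve, rounding up
    return size


if __name__ == "__main__":
    # Five stride-2 stages: overall downsampling factor of 32.
    print(feature_map_size(512, 5))
    # Three stride-2 stages: overall downsampling factor of 8.
    print(feature_map_size(512, 3))
```

With a factor of 32, an object 20 pixels wide covers less than one cell of the final feature map, which is why small objects are hard to predict. Keeping the factor at 8 leaves a 64 by 64 map for the same input.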

Points

  • Atrous Convolution
    Atrous convolution enlarges the field of view of filters (the receptive field) without reducing feature resolution and without increasing the number of parameters or the amount of computation. This addresses the problem of reduced feature resolution.
  • Atrous Spatial Pyramid Pooling (ASPP)
    ASPP extracts multi-scale features using a technique inspired by the R-CNN spatial pyramid pooling method. It applies multiple parallel atrous convolutional layers with different sampling rates to the feature map. The features extracted at each sampling rate are further processed by two 1 by 1 convolutional layers in each branch, and then fused to generate the final result. This addresses the problem of multi-scale objects in an image.
  • Upsampling by bilinear interpolation
    This paper employs bilinear interpolation to upsample the score map by a factor of 8 to reach the original image resolution. Unlike deconvolution, it requires no extra learned parameters, which leads to faster model training. The paper states that bilinear interpolation is sufficient in this setting because the class score maps are quite smooth.
  • DenseCRF
    Traditionally, conditional random fields (CRFs) have been employed to smooth noisy segmentation maps. The paper calls these short-range CRFs. However, the score maps produced by deep models are already quite smooth, so the goal here is instead to recover thin structures. DenseCRF addresses the reduced localization accuracy at object boundaries. But I think DenseCRF has a heavy computational cost.
  • Poly learning policy
    Please read the paper.
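
As a rough illustration of the poly learning policy, the sketch below multiplies a base learning rate by `(1 - iteration / max_iter) ** power`. The function name `poly_learning_rate`, the base rate, and the iteration counts are hypothetical values chosen for the example. The default `power = 0.9` follows the value commonly reported for DeepLab, but check the paper for the exact training settings.

```python
def poly_learning_rate(base_lr: float, iteration: int, max_iter: int,
                       power: float = 0.9) -> float:
    """Polynomial ("poly") decay: base_lr * (1 - iteration / max_iter) ** power."""
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return base_lr * (1.0 - iteration / max_iter) ** power


if __name__ == "__main__":
    # Illustrative schedule: the rate starts at base_lr and decays to 0.
    for it in (0, 5000, 10000, 15000, 20000):
        print(it, poly_learning_rate(base_lr=0.001, iteration=it, max_iter=20000))
```

The rate falls smoothly from `base_lr` at the first iteration to zero at `max_iter`, in contrast to a step schedule that drops at fixed milestones.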

DeepLab v3

Rethinking Atrous Convolution for Semantic Image Segmentation: https://arxiv.org/pdf/1706.05587.pdf

This paper describes four categories of approaches for handling multi-scale objects:

  • Image Pyramid
    The input image is fed to the network at several scales.
  • Encoder-Decoder structure
  • Extra modules are cascaded on top of the original network for capturing long-range information
    DenseCRF is employed to encode pixel-level pairwise similarities, while several extra convolutional layers are cascaded to gradually capture long-range context.
  • Spatial Pyramid Pooling (SPP)
    SPP probes an incoming feature map with filters or pooling operations at multiple rates, and thereby captures objects at multiple scales.
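
Probing a feature map with filters at multiple rates can be sketched in one dimension with NumPy. This is a minimal illustration, not the DeepLab implementation: the names `atrous_conv1d` and `multi_rate_features`, the rates `(1, 2, 4)`, and the zero padding are assumptions made for the example. A filter with `k` taps at rate `r` covers `(k - 1) * r + 1` input positions, so the field of view grows with the rate while the number of weights stays at `k`.

```python
import numpy as np


def atrous_conv1d(signal: np.ndarray, kernel: np.ndarray, rate: int) -> np.ndarray:
    """1-D atrous (dilated) convolution, zero-padded so the output
    has the same length as the input (no loss of resolution)."""
    k = len(kernel)
    span = (k - 1) * rate            # distance between first and last tap
    pad_left = span // 2
    pad_right = span - pad_left
    padded = np.pad(signal, (pad_left, pad_right))  # zero padding
    out = np.zeros(len(signal), dtype=float)
    for i in range(len(signal)):
        taps = padded[i : i + span + 1 : rate]      # k samples, `rate` apart
        out[i] = np.dot(taps, kernel)
    return out


def multi_rate_features(signal: np.ndarray, kernel: np.ndarray,
                        rates=(1, 2, 4)) -> np.ndarray:
    """Apply the same filter at several rates in parallel and stack the
    results, one row per rate (a toy stand-in for parallel branches)."""
    return np.stack([atrous_conv1d(signal, kernel, r) for r in rates])


if __name__ == "__main__":
    x = np.arange(16, dtype=float)
    w = np.array([1.0, 0.0, -1.0])   # same 3 weights at every rate
    features = multi_rate_features(x, w)
    print(features.shape)            # one row per rate, full input length
```

Each row keeps the full input length, so no resolution is lost, and the three branches reuse a filter of the same size while seeing progressively wider context.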

Problems

  • overly reduced feature resolution
    The repeated combination of max-pooling and downsampling (for example, convolution with a stride of 2 or more) reduces spatial resolution.
    This cause the probl
  • multi-scale objects in an image
    Capturing both local and global context is difficult.

Points

ICNet

  • Symmetric Convolution
  • SqueezeNet based bottleneck module
  • Early downsampling
  • Poly learning rate
  • Atrous Convolution
  • Upsampling with max pooling's indexes

SegNet

  • Symmetric Encoder-Decoder Model

PSPNet

Points

  • Pyramid Pooling Modules
  • Auxiliary Loss

UNet