Skip to content

Latest commit

 

History

History
29 lines (29 loc) · 2.04 KB

1704.05548 - Annotating Object Instances with a Polygon-RNN.md

File metadata and controls

29 lines (29 loc) · 2.04 KB

♥ DL , CV , RNN, Annotation

  • One of CVPR2017 best paper, an approach for semi-automatic annotation of object instances
  • Cast annotation as polygon prediction task $\rightarrow$ Polygon RNN
  • Why semi-automic
    • Human interfere can make it more accurate
    • We need to assign ground-truth bounding box first before predicting the position of polygon using RNN
  • Previous attempts
    • Learning segmentation models from weak annotation such as image tags or bounding boxes $\rightarrow$ in-competitive performance
    • Producing(noisy) labels inside bounding boxes with a GrabCut type of approach $\rightarrow$ cannot be used as official ground-truth for a benchmark due to its inherent imprecisions.
  • Polygon-RNN: predicts a vertex at every time step
    • Inputs:
      • $x_t$ : a tensor at time step t, concatenate multiple features processed by CNN
      • $y_{t-1}, y_{t-2}$ : previous 2 predicted vertices (one-hot encoding)
      • $y_1$ : the one-hot encoding of the first predicted vertex
      • $x_t$ and $y_{t-1}, y_{t-2}$ can be used to predict the direction of next edge
      • $y_1$ can be used to predict the stop position of this edge
    • CNN : a modified vgg16
      • remove fully-connected layers as well as last max-pooling layer(pool5)
      • add additional convolutional layers with skip-connections(see ResNet), this allows CNN to extract following features
        • low-level information about the edges and corners $rightarrow$ follow the object’s boundaries
        • semantic information about the object $\rightarrow$ see the object
        • concated output features (28x28x512) $\rightarrow$ another 3x3 conv + ReLU $\rightarrow$ 28x28x128
    • RNN :
      • Convoluntional LSTM
        • preserve the spatial information received from the CNN
        • reduce the parameters
      • model the polygon with a two-layer ConvLSTM with kernel size of 3x3 and 16 channels, which outputs a vertex at each time step
      • formulate the vertex prediction as a classification task.