1704.05548 - Annotating Object Instances with a Polygon-RNN
♥ DL , CV , RNN, Annotation
- One of CVPR2017 best paper, an approach for semi-automatic annotation of object instances
- Cast annotation as polygon prediction task
$\rightarrow$ Polygon RNN - Why semi-automic
- Human interfere can make it more accurate
- We need to assign ground-truth bounding box first before predicting the position of polygon using RNN
- Previous attempts
- Learning segmentation models from weak annotation such as image tags or bounding boxes
$\rightarrow$ in-competitive performance - Producing(noisy) labels inside bounding boxes with a GrabCut type of approach
$\rightarrow$ cannot be used as official ground-truth for a benchmark due to its inherent imprecisions.
- Learning segmentation models from weak annotation such as image tags or bounding boxes
- Polygon-RNN: predicts a vertex at every time step
- Inputs:
-
$x_t$ : a tensor at time step t, concatenate multiple features processed by CNN -
$y_{t-1}, y_{t-2}$ : previous 2 predicted vertices (one-hot encoding) -
$y_1$ : the one-hot encoding of the first predicted vertex -
$x_t$ and$y_{t-1}, y_{t-2}$ can be used to predict the direction of next edge -
$y_1$ can be used to predict the stop position of this edge
-
- CNN : a modified vgg16
- remove fully-connected layers as well as last max-pooling layer(pool5)
- add additional convolutional layers with skip-connections(see ResNet), this allows CNN to extract following features
- low-level information about the edges and corners
$rightarrow$ follow the object’s boundaries - semantic information about the object
$\rightarrow$ see the object - concated output features (28x28x512)
$\rightarrow$ another 3x3 conv + ReLU$\rightarrow$ 28x28x128
- low-level information about the edges and corners
- RNN :
- Convoluntional LSTM
- preserve the spatial information received from the CNN
- reduce the parameters
- model the polygon with a two-layer ConvLSTM with kernel size of 3x3 and 16 channels, which outputs a vertex at each time step
- formulate the vertex prediction as a classification task.
- Convoluntional LSTM
- Inputs: