1708.01241 - DSOD: Learning Deeply Supervised Object Detectors from Scratch

  • Other ref:

  • Introduction:

    • DSOD: learn object detectors from scratch
    • Combination of SSD and DenseNet
  • Previous work focuses on ImageNet pretraining followed by fine-tuning; cons:

    • Limited structure design space
    • Learning bias
    • Domain mismatch
  • Our method:

    • Use a backbone sub-network to extract features: a modified DenseNet with a stem block, four dense blocks, two transition layers, and two transition-without-pooling layers (a rough sketch follows this list)
    • Use a front-end sub-network to predict over multi-scale response maps: fuses the multi-scale prediction responses with an elaborated dense structure
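A minimal PyTorch sketch of the backbone pieces named above (stem block, dense layers, transitions with and without pooling). Layer counts, growth rate, and channel widths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class StemBlock(nn.Module):
    """Stack of small 3x3 convs plus pooling in place of a single large-kernel stem."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 3, padding=1), nn.BatchNorm2d(2 * ch), nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )

    def forward(self, x):
        return self.stem(x)

class DenseLayer(nn.Module):
    """BN-ReLU-Conv whose output is concatenated with its input (DenseNet-style)."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth, 3, padding=1),
        )

    def forward(self, x):
        return torch.cat([x, self.body(x)], dim=1)

def dense_block(in_ch, growth, n_layers):
    """Chain of dense layers; returns the block and its output channel count."""
    layers, ch = [], in_ch
    for _ in range(n_layers):
        layers.append(DenseLayer(ch, growth))
        ch += growth
    return nn.Sequential(*layers), ch

def transition(in_ch, out_ch, pool=True):
    """1x1 projection; the transition-w/o-pooling variant keeps spatial resolution."""
    mods = [nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True), nn.Conv2d(in_ch, out_ch, 1)]
    if pool:
        mods.append(nn.AvgPool2d(2, 2))
    return nn.Sequential(*mods)
```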
  • Design principle and result:

    • Proposal free: SSD
      • Faster R-CNN and R-FCN cannot converge when trained from scratch; SSD converges, but performs worse than its pretrained counterpart
    • Deep supervision: DenseNet-style dense connections alleviate gradient vanishing
      • Makes the pretrain-free model more accurate than the pretrained and fine-tuned SSD
    • Stem block: inspired by Inception-v3/v4
      • Better performance
    • Dense prediction structure: SSD-style multi-scale prediction with dense connections (see the sketch after this list)
      • A little slower than the plain structure
      • Higher accuracy: increases by 0.4%
      • Fewer parameters: 3.4M fewer
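A hedged sketch of the dense prediction idea: at each new prediction scale, part of the channels are carried over from the previous scale via pooling and a 1x1 projection, and only the remaining channels are newly learned, which is where the parameter savings come from. The module name and the half/half channel split are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DenseDownsample(nn.Module):
    """Builds the next prediction scale from the previous one: reused half + learned half."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        half = out_ch // 2
        # Reused half: cheap pooled 1x1 projection of the previous scale's features.
        self.reuse = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(in_ch, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
        )
        # Learned half: 1x1 bottleneck followed by a stride-2 3x3 conv.
        self.learned = nn.Sequential(
            nn.Conv2d(in_ch, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, stride=2, padding=1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
        )

    def forward(self, prev_scale):
        # Both branches halve the spatial size, so the concatenation lines up.
        return torch.cat([self.reuse(prev_scale), self.learned(prev_scale)], dim=1)
```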
  • A strange observation: with pretraining and fine-tuning, DSOD's performance is actually lower (-0.4%) than when trained from scratch