MUM : Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection (CVPR2022)

This is the PyTorch implementation of our paper:
MUM : Mix Image Tiles and UnMix Feature Tiles for Semi-Supervised Object Detection
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[arXiv]

Installation & Setup

We follow the installation process of the official Unbiased Teacher repo (https://github.com/facebookresearch/unbiased-teacher).

Download the code

  • For your convenience, we provide the code and model weights as a zip file.

Prerequisites

  • Linux or macOS with Python ≥ 3.6
  • PyTorch ≥ 1.5 and torchvision that matches the PyTorch installation.

Build Detectron2 from Source

  • We found that the latest (v0.6) release of Detectron2 causes errors with our code.
  • Therefore, please install the matching v0.5 release of Detectron2 as follows:
# get the Detectron2 v0.5 package
wget https://github.com/facebookresearch/detectron2/archive/refs/tags/v0.5.zip

# unzip
unzip v0.5.zip

# install
python -m pip install -e detectron2-0.5
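
Since the code is reported to fail with Detectron2 v0.6, it can help to confirm which release is actually installed before training. The following is a minimal sketch, not part of this repository: `matches_release` and `installed_detectron2_version` are hypothetical helper names, and the script assumes Python 3.8 or newer for `importlib.metadata` (the prerequisites above only require Python ≥ 3.6).

```python
from importlib import metadata  # standard library from Python 3.8 onward

REQUIRED_RELEASE = "0.5"  # the Detectron2 release this README asks for


def matches_release(installed, required=REQUIRED_RELEASE):
    """Return True if `installed` is `required` or a patch/local build of it."""
    # Drop a local build suffix such as "+cu111" before comparing.
    installed_parts = installed.split("+")[0].split(".")
    required_parts = required.split(".")
    return installed_parts[: len(required_parts)] == required_parts


def installed_detectron2_version():
    """Return the installed detectron2 version string, or None if absent."""
    try:
        return metadata.version("detectron2")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_detectron2_version()
    if version is None:
        print("detectron2 is not installed")
    elif matches_release(version):
        print(f"detectron2 {version} matches the required v{REQUIRED_RELEASE}")
    else:
        print(f"detectron2 {version} found, but v{REQUIRED_RELEASE} is expected")
```

Checking the package metadata avoids importing Detectron2 itself, so the script also runs in an environment where the build is broken.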

Install other requirements

pip install -r requirements.txt

Dataset download

  1. Download the COCO and VOC datasets

  2. Organize the datasets as follows:

mix-unmix/
└── datasets/
    ├── coco/
    │   ├── train2017/
    │   ├── val2017/
    │   └── annotations/
    │       ├── instances_train2017.json
    │       └── instances_val2017.json
    ├── VOC2007
    │   ├── Annotations
    │   ├── ImageSets
    │   └── JPEGImages
    └── VOC2012
        ├── Annotations
        ├── ImageSets
        └── JPEGImages
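
A misplaced folder usually only surfaces once training starts, so a quick layout check can save time. The sketch below is an illustration under assumptions rather than repository code: `EXPECTED_PATHS` and `missing_dataset_paths` are hypothetical names, and the script assumes it is run from the `mix-unmix/` directory so that `datasets/` resolves as in the tree above.

```python
from pathlib import Path

# Paths relative to the datasets/ folder, copied from the tree above.
EXPECTED_PATHS = [
    "coco/train2017",
    "coco/val2017",
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "VOC2007/Annotations",
    "VOC2007/ImageSets",
    "VOC2007/JPEGImages",
    "VOC2012/Annotations",
    "VOC2012/ImageSets",
    "VOC2012/JPEGImages",
]


def missing_dataset_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_dataset_paths("datasets")
    if missing:
        for rel in missing:
            print("missing:", rel)
    else:
        print("dataset layout looks complete")
```

The check only looks at names, not at the contents of the annotation files or image folders.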

Evaluation

  • Performance table and Model Weights (weight files are already included in zip file)
Backbone   Protocols                     AP50    AP50:95   Model Weights
R50-FPN    COCO-Standard 1%              40.06   21.89     link
R50-FPN    COCO-Additional               63.30   42.11     link
R50-FPN    VOC07 (VOC12)                 78.94   50.22     link
R50-FPN    VOC07 (VOC12 / COCO20cls)     80.45   52.31     link
Swin       COCO-Standard 0.5%            34.25   16.52     link
  • Run Evaluation w/ R50 in COCO
python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/coco.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth
  • Run Evaluation w/ R50 in VOC
python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/voc.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth

Train

We use 4 GPUs (A6000 or V100 32GB) to achieve the paper results.

  • Train the MUM under 1% COCO-supervision (ResNet-50)
python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/coco.yaml
  • Train the MUM under VOC07 as labeled set and VOC12 as unlabeled set
python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/voc.yaml

Swin

  • Download the ImageNet-pretrained Swin-T weights from link
  • Move the pretrained weights to the weights folder
mv swin_tiny_patch4_window7_224.pth weights/
  • Run Evaluation w/ Swin in COCO
python train_net.py \
      --eval-only \
      --num-gpus 1 \
      --config configs/mum_configs/coco_swin.yaml \
      MODEL.WEIGHTS weights/<your weight>.pth
      
  • Train under 0.5% COCO-supervision
python train_net.py \
      --num-gpus 4 \
      --config configs/mum_configs/coco_swin.yaml

Mix/UnMix code block

Mixing code block

  • Generate mix mask
mask = torch.argsort(torch.rand(bs // ng, ng, nt, nt), dim=1).cuda()
img_mask = mask.view(bs // ng, ng, 1, nt, nt)
img_mask = img_mask.repeat_interleave(3, dim=2)
img_mask = img_mask.repeat_interleave(h // nt, dim=3)
img_mask = img_mask.repeat_interleave(w // nt, dim=4)
  • Mixing image tiles
img_tiled = images.tensor.view(bs // ng, ng, c, h, w)
img_tiled = torch.gather(img_tiled, dim=1, index=img_mask)
img_tiled = img_tiled.view(bs, c, h, w)
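
The `view` and `gather` calls above only line up when the batch splits evenly into groups and the image splits evenly into tiles. Below is a minimal pre-flight check, written under our reading of the snippet (not stated in this README) that `bs` is the batch size, `ng` the number of images per mixing group, and `nt` the number of tiles per side. `check_mix_shapes` is a hypothetical helper, not part of the repository.

```python
def check_mix_shapes(bs, ng, h, w, nt):
    """Return a list of reasons the mixing reshapes would fail (empty if none)."""
    problems = []
    if bs % ng != 0:
        # view(bs // ng, ng, c, h, w) needs bs to be a multiple of ng.
        problems.append(f"batch size {bs} is not divisible by group size {ng}")
    if h % nt != 0:
        # repeat_interleave(h // nt) must expand nt tiles back to exactly h rows.
        problems.append(f"height {h} is not divisible by tile count {nt}")
    if w % nt != 0:
        # repeat_interleave(w // nt) must expand nt tiles back to exactly w columns.
        problems.append(f"width {w} is not divisible by tile count {nt}")
    return problems


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the repository configs.
    for issue in check_mix_shapes(bs=8, ng=4, h=512, w=510, nt=4):
        print("shape problem:", issue)
```

The same check applies to the unmixing step, with `h` and `w` taken from the feature map instead of the input image.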

Unmixing code block

  • Generate inverse mask to unmix
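# argsort of a permutation is its inverse, so inv_mask undoes the mixing order
inv_mask = torch.argsort(mask, dim=1).cuda()
feat_mask = inv_mask.view(bs // ng, ng, 1, nt, nt)
# here c, h, w are the channel count and spatial size of the feature map
feat_mask = feat_mask.repeat_interleave(c, dim=2)
feat_mask = feat_mask.repeat_interleave(h // nt, dim=3)
feat_mask = feat_mask.repeat_interleave(w // nt, dim=4)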
  • Unmixing feature tiles
feat_tiled = feat.view(bs // ng, ng, c, h, w)
feat_tiled = torch.gather(feat_tiled, dim=1, index=feat_mask)
feat_tiled = feat_tiled.view(bs, c, h, w)
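
To see why `torch.argsort(mask, dim=1)` undoes the mixing, the round trip can be traced on a toy example. The sketch below deliberately swaps tensors for plain Python lists so it runs without PyTorch or a GPU. It models a single group of `ng` images, each reduced to an `nt` × `nt` grid of tile labels, and indexes them the way `torch.gather` does along the group dimension. All function names are hypothetical and not part of the repository.

```python
import random


def argsort(values):
    """Indices that would sort `values` (list analogue of torch.argsort)."""
    return sorted(range(len(values)), key=values.__getitem__)


def make_mask(ng, nt, rng):
    """mask[g][i][j] = index of the source image supplying tile (i, j) of output g."""
    mask = [[[0] * nt for _ in range(nt)] for _ in range(ng)]
    for i in range(nt):
        for j in range(nt):
            # An independent random permutation of the group per tile position.
            perm = argsort([rng.random() for _ in range(ng)])
            for g in range(ng):
                mask[g][i][j] = perm[g]
    return mask


def invert_mask(mask):
    """Argsort along the group axis, giving the inverse permutation per tile."""
    ng, nt = len(mask), len(mask[0])
    inv = [[[0] * nt for _ in range(nt)] for _ in range(ng)]
    for i in range(nt):
        for j in range(nt):
            inv_perm = argsort([mask[g][i][j] for g in range(ng)])
            for g in range(ng):
                inv[g][i][j] = inv_perm[g]
    return inv


def gather_tiles(tiles, index):
    """out[g][i][j] = tiles[index[g][i][j]][i][j], as gather does along dim=1."""
    ng, nt = len(index), len(index[0])
    return [[[tiles[index[g][i][j]][i][j] for j in range(nt)]
             for i in range(nt)] for g in range(ng)]


ng, nt = 4, 2
rng = random.Random(0)
images = [[[f"img{g}-tile({i},{j})" for j in range(nt)]
           for i in range(nt)] for g in range(ng)]

mask = make_mask(ng, nt, rng)
mixed = gather_tiles(images, mask)                 # mix image tiles
restored = gather_tiles(mixed, invert_mask(mask))  # unmix with the inverse mask
print("round trip restores the originals:", restored == images)
```

Every tile keeps its spatial position and only changes which image in the group it sits in, which is why the same mask, once inverted, can be applied to feature maps at a different resolution.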

Acknowledgements

We use the official Unbiased Teacher code as our baseline, and the timm repository for the Swin Transformer implementation.