This repository contains the code for our paper "Open Vocabulary Semantic Segmentation with Decoupled One-Pass Network" (DeOP). It is based on the official SimBaseline repository. See also the Project Page.
Results on COCO-Stuff and Pascal VOC in the open-vocabulary setting.
Results on Pascal VOC, Pascal Context and ADE20K in the cross-dataset setting.
Environment
conda create --name deop python=3.7
conda activate deop
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 -c pytorch
pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html
pip install mmcv==1.3.14
# Furthermore, install the modified CLIP package.
cd third_party/CLIP
python -m pip install -Ue .
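After installation, it may help to confirm that the pinned versions were actually picked up. The following is a minimal sketch, not part of this repository: `check_versions` and `EXPECTED_VERSIONS` are hypothetical names, and the check assumes each package exposes a `__version__` attribute.

```python
import importlib

# Versions pinned in the install commands above; adjust if you deviate from them.
EXPECTED_VERSIONS = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "mmcv": "1.3.14",
}


def check_versions(expected):
    """Return {package: problem} for packages that are missing or mismatched."""
    problems = {}
    for name, wanted in expected.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems[name] = "not installed"
            continue
        found = getattr(module, "__version__", "unknown")
        # Builds may carry a local suffix such as "1.8.0+cu102"; compare the public part only.
        if found.split("+")[0] != wanted:
            problems[name] = "found {}, expected {}".format(found, wanted)
    return problems


if __name__ == "__main__":
    issues = check_versions(EXPECTED_VERSIONS)
    for name, problem in sorted(issues.items()):
        print("{}: {}".format(name, problem))
    if not issues:
        print("All pinned packages match.")
```

Run it inside the activated `deop` environment; any line it prints names a package worth reinstalling.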
Data Preparation
In our experiments, four datasets are used. For Cityscapes and ADE20k, follow the tutorial in MaskFormer.
- For COCO Stuff 164k:
- Download the data from the official dataset website and extract it as shown below.
datasets/
  coco/
    # http://images.cocodataset.org/zips/train2017.zip
    train2017/
    # http://images.cocodataset.org/zips/val2017.zip
    val2017/
    # http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    annotations/
    # http://calvin.inf.ed.ac.uk/wp-content/uploads/data/cocostuffdataset/stuffthingmaps_trainval2017.zip
    stuffthingmaps/
- Convert the data to Detectron2 format and split it into Seen (Base) and Unseen (Novel) subsets.
python datasets/prepare_coco_stuff_164k_sem_seg.py datasets/coco
python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/train2017_base datasets/coco/stuffthingmaps_detectron2/train2017_base_label_count.pkl
python tools/mask_cls_collect.py datasets/coco/stuffthingmaps_detectron2/val2017 datasets/coco/stuffthingmaps_detectron2/val2017_label_count.pkl
- For Pascal VOC 11k:
- Download the data from the official dataset website and extract it as shown below.
datasets/
  VOC2012/
    # http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
    JPEGImages/
    val.txt
    # http://home.bharathh.info/pubs/codes/SBD/download.html
    SegmentationClassAug/
    # https://gist.githubusercontent.com/sun11/2dbda6b31acc7c6292d14a872d0c90b7/raw/5f5a5270089239ef2f6b65b1cc55208355b5acca/trainaug.txt
    train.txt
- Convert the data to Detectron2 format and split it into Seen (Base) and Unseen (Novel) subsets.
python datasets/prepare_voc_sem_seg.py datasets/VOC2012
python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/train datasets/VOC2012/annotations_detectron2/train_base_label_count.json
python tools/mask_cls_collect.py datasets/VOC2012/annotations_detectron2/val datasets/VOC2012/annotations_detectron2/val_label_count.json
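Before running the conversion scripts, a quick check that the extracted files match the directory listings above can save a failed run. This is a small sketch under the assumption that everything lives under a local `datasets/` folder; `EXPECTED_LAYOUT` and `missing_entries` are hypothetical names and not part of this repository.

```python
from pathlib import Path

# Expected entries, taken from the directory listings above (relative to each dataset root).
EXPECTED_LAYOUT = {
    "coco": ["train2017", "val2017", "annotations", "stuffthingmaps"],
    "VOC2012": ["JPEGImages", "SegmentationClassAug", "train.txt", "val.txt"],
}


def missing_entries(datasets_root, layout=EXPECTED_LAYOUT):
    """Return the expected paths that do not exist under datasets_root."""
    root = Path(datasets_root)
    missing = []
    for dataset, entries in layout.items():
        for entry in entries:
            path = root / dataset / entry
            if not path.exists():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Prints one line per missing file or folder; prints nothing if the layout is complete.
    for path in missing_entries("datasets"):
        print("missing:", path)
```

The check only looks at names, not contents, so it will not catch a partially extracted archive.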
The demo is almost training-free: only the learnable text prompt is trained. You can download the text prompt weights, which were trained on the COCO-Stuff-156 dataset.
python3 demo.py --input dataset/test/000000000285.jpg --output ./output
Alternatively, add --class-names to set the classes yourself.
python3 demo.py --input dataset/test/000000000285.jpg --output ./output --class-names bear other
We release the DeOP weights (see model weight). The corresponding numbers are reported in Results.
sh deop_verifycoco.sh
# 1. Train a learnable text prompt model.
python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml --num-gpus 8
# 2. Train a MaskFormer model.
python train_net.py --config-file configs/coco-stuff-164k-156/zero_shot_maskformer_R101c_bs32_60k.yaml --num-gpus 8 MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT ${TRAINED_PROMPTS}
# 3. Train DeOP.
sh deop_train.sh
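If you prefer to drive the first two steps from a script, the commands above can be assembled programmatically. The sketch below only builds the argument lists; `build_train_command` is a hypothetical helper, and the checkpoint path is a placeholder that you must replace with your own output from step 1.

```python
import shlex


def build_train_command(config_file, num_gpus=8, overrides=None):
    """Build the argv list for one train_net.py run.

    overrides maps config keys to values and is appended as KEY VALUE pairs,
    mirroring how PROMPT_CHECKPOINT is passed in step 2 above.
    """
    cmd = ["python", "train_net.py", "--config-file", config_file, "--num-gpus", str(num_gpus)]
    for key, value in (overrides or {}).items():
        cmd += [key, str(value)]
    return cmd


if __name__ == "__main__":
    cfg_dir = "configs/coco-stuff-164k-156/"
    step1 = build_train_command(cfg_dir + "zero_shot_proposal_classification_learn_prompt_bs32_10k.yaml")
    step2 = build_train_command(
        cfg_dir + "zero_shot_maskformer_R101c_bs32_60k.yaml",
        # Placeholder path (assumption): point this at the prompt checkpoint produced by step 1.
        overrides={"MODEL.CLIP_ADAPTER.PROMPT_CHECKPOINT": "path/to/trained_prompts.pth"},
    )
    for cmd in (step1, step2):
        # Print a shell-safe version; pass the list to subprocess.run(cmd, check=True) to execute.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the commands as lists avoids shell quoting issues when a checkpoint path contains spaces.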
If you find this project useful for your research, please consider citing the following BibTeX entry.
@inproceedings{Han2023ZeroShotSS,
title={Open Vocabulary Semantic Segmentation with Decoupled One-Pass Network},
author={Cong Han and Yujie Zhong and Dengjie Li and Kai Han and Lin Ma},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year={2023},
}