Code for the SemiCD-VL (formerly DiffMatch) paper: SemiCD-VL: Visual-Language Model Guidance Makes Better Semi-supervised Change Detector.
Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of images at the pixel level is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparison by human experts. Considering the excellent zero-shot and open-vocabulary performance of visual-language models (VLMs) with prompt-based reasoning, it is promising to exploit VLMs to improve CD under limited labeled data. In this paper, we propose a VLM guidance-based semi-supervised CD method, namely SemiCD-VL. The insight of SemiCD-VL is to synthesize free change labels with VLMs to provide additional supervision signals for unlabeled data. However, almost all current VLMs are designed for single-temporal images and cannot be directly applied to bi- or multi-temporal images. Motivated by this, we first propose a VLM-based mixed Change Event Generation (CEG) strategy to yield pseudo labels for unlabeled CD data. Since the additional supervision signals provided by these VLM-driven pseudo labels may conflict with the original pseudo labels from the consistency regularization paradigm (e.g., FixMatch), we propose a dual projection head to disentangle the different signal sources. Further, we explicitly decouple the semantic representations of the bi-temporal images through two auxiliary segmentation decoders, which are also guided by the VLM. Finally, to help the model capture change representations more adequately, we introduce contrastive consistency regularization by constructing a feature-level contrastive loss in the auxiliary branches. Extensive experiments show the advantage of SemiCD-VL. For instance, SemiCD-VL improves the FixMatch baseline by +5.3
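As a rough illustration of the contrastive consistency idea above, the sketch below shows one plausible form of a feature-level contrastive loss between the bi-temporal auxiliary-branch features: unchanged pixels are pulled together (high cosine similarity) and changed pixels are pushed apart. The function name, the hinge/margin formulation, and the NumPy implementation are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def contrastive_consistency_loss(feat_a, feat_b, change_mask, margin=0.5):
    """Illustrative feature-level contrastive loss for bi-temporal features.

    feat_a, feat_b: (C, H, W) feature maps from the two auxiliary branches.
    change_mask: (H, W) binary mask, 1 = changed pixel.
    Unchanged pixels are encouraged to have cosine similarity 1; changed
    pixels are penalized when their similarity exceeds `margin`.
    """
    # L2-normalize along the channel dimension.
    a = feat_a / (np.linalg.norm(feat_a, axis=0, keepdims=True) + 1e-8)
    b = feat_b / (np.linalg.norm(feat_b, axis=0, keepdims=True) + 1e-8)
    cos = (a * b).sum(axis=0)                           # (H, W) cosine similarity
    pos = (1.0 - cos) * (1 - change_mask)               # unchanged: pull together
    neg = np.maximum(cos - margin, 0.0) * change_mask   # changed: hinge push-apart
    return (pos + neg).mean()
```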
We evaluate SemiCD-VL on two change detection datasets (LEVIR-CD and WHU-CD), where it achieves major gains over previous semi-supervised methods.
If you find SemiCD-VL useful in your research, please consider citing:
```bibtex
@article{li2024semicd_vl,
  title={SemiCD-VL: Visual-Language Model Guidance Makes Better Semi-supervised Change Detector},
  author={Li, Kaiyu and Cao, Xiangyong and Deng, Yupeng and Liu, Junmin and Meng, Deyu and Wang, Zhi},
  journal={arXiv preprint arXiv:2405.04788},
  year={2024}
}
```
Create a conda environment:
```shell
conda create -n semicd_vl python=3.7.13
conda activate semicd_vl
```
Install the required pip packages:
```shell
pip install -r requirements.txt
```
Download the pre-trained backbones (ResNet-50 | ResNet-101 | Xception-65) and place them as follows:

```
├── ./pretrained
│   ├── resnet50.pth
│   ├── resnet101.pth
│   └── xception.pth
```
- WHU-CD: imageA, imageB, and label
- LEVIR-CD: imageA, imageB, and label
Please modify the dataset paths in the configuration files.
```
├── [Your WHU-CD/LEVIR-CD Path]
│   ├── A
│   ├── B
│   └── label
```
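Before training, it can be useful to verify that the dataset follows the layout above. The helper below (a hypothetical convenience function, not part of the repo) checks that the `A`, `B`, and `label` sub-directories exist and contain matching file names, so every pre-event image has a post-event counterpart and a change label:

```python
from pathlib import Path

def check_cd_dataset(root):
    """Sanity-check a WHU-CD / LEVIR-CD style directory (A, B, label).

    Raises if a sub-directory is missing or if the file names differ
    between the three folders; otherwise returns the number of pairs.
    """
    root = Path(root)
    names = {}
    for sub in ("A", "B", "label"):
        d = root / sub
        if not d.is_dir():
            raise FileNotFoundError(f"missing sub-directory: {d}")
        names[sub] = {p.name for p in d.iterdir() if p.is_file()}
    if not (names["A"] == names["B"] == names["label"]):
        raise ValueError("file names differ between A, B and label")
    return len(names["A"])
```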
We provide the generated pseudo labels in the `gen_cd_label` and `gen_seg_label` directories (Download), so you can skip this step. If you want to reproduce our results step by step, you can refer to the following:
APE is a vision-language model that can perform open-vocabulary detection and segmentation. We directly use the released APE-D checkpoint to infer the roughly defined categories `house, building, road, grass, tree, water` using the following command:
```shell
# As an example, generate pre-event pseudo labels for the WHU-CD dataset.
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/WHU-CD-256/A/*.png --output APE_output/whu-cd_pseudo-label_ape_prob/A/ --confidence-threshold 0.2 --text-prompt 'house,building,road,grass,tree,water' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True
```
Before executing the above command, please make sure that you have successfully built the APE environment. Please refer here to build APE's inference environment; we highly recommend using Docker.
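Conceptually, the per-pixel class scores produced in this step are reduced to a single category label map per image before CEG. The sketch below illustrates that thresholded-argmax step; the function name, the `(K, H, W)` score layout, and the `ignore_index` convention are assumptions for illustration, since APE's on-disk output format may differ:

```python
import numpy as np

def probs_to_label_map(probs, threshold=0.2, ignore_index=255):
    """Collapse per-pixel class scores into a category label map.

    probs: (K, H, W) scores for the K text-prompt categories (here K=6:
    house, building, road, grass, tree, water). Pixels whose best score
    falls below `threshold` are marked with `ignore_index`.
    """
    labels = probs.argmax(axis=0).astype(np.int64)      # best category per pixel
    labels[probs.max(axis=0) < threshold] = ignore_index  # drop low-confidence pixels
    return labels
```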
After inference with APE, use the following commands to execute the Change Event Generation (CEG) strategy:
```shell
# Execute the instance-level CEG strategy
python scripts/gen_cd_map_json.py
# Execute the mixed CEG strategy
python scripts/gen_cd_map.py
```
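The core idea behind CEG can be sketched as comparing the two VLM-derived label maps pixel by pixel: a pixel whose category differs between the pre- and post-event maps is marked as changed. This is a simplified pixel-wise illustration only; the released scripts additionally apply instance-level and mixed rules, and the function below is not part of the repo:

```python
import numpy as np

def change_event_mask(label_a, label_b, ignore_index=255):
    """Pixel-wise change event generation from two VLM label maps.

    label_a, label_b: (H, W) integer category maps for the pre- and
    post-event images. A pixel is changed when its category differs;
    pixels unreliable (ignore_index) in either map stay ignored.
    """
    change = (label_a != label_b).astype(np.uint8)
    invalid = (label_a == ignore_index) | (label_b == ignore_index)
    change[invalid] = ignore_index  # propagate the ignore label
    return change
```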
To launch a training job, please run:
```shell
python experiments.py --exp EXP_ID --run RUN_ID
# e.g. EXP_ID=47; RUN_ID=0 for SemiCD-VL on LEVIR-CD with 5% labels
```
It will automatically generate the relevant config files in `configs/generated/` and start the corresponding training job.
For more information on the available experiments and runs, please refer to `def generate_experiment_cfgs(exp_id)` in `experiments.py`.
The training logs, tensorboard files, checkpoints, and debug images are stored in `exp/`.
The following list provides the most relevant files of SemiCD-VL (DiffMatch)'s implementation:
- experiments.py: Definitions of the experiment configs used in the paper.
- diffmatch_fixmatch.py: Main training logic for DiffMatch.
- model/vlm.py: Vision-language model class.
- model/builder.py: Logic for building a model from a config including a forward wrapper for feature perturbations.
- third_party/unimatch/dataset/semicd.py: Data loader for semi-supervised training.
- configs/_base_/models: Model config files.
SemiCD-VL is based on SemiVL, UniMatch, APE, and MMSegmentation. We thank their authors for making the source code publicly available.