This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (accepted at ICCV 2023).
Here we provide sample code for using CaCao to boost SGG datasets in both the standard and open-world settings.
To download the enhanced dataset for VG training, use this Google Drive link.
python adaptive_cluster.py # obtain initialized clusters for CaCao
python fine_grained_mapping.py # establish the mapping from open-world boosted data to target predicates for enhancement
python cross_modal_tuning.py # obtain cross-modal prompt tuning models for better predicate boosting
python fine_grained_predicate_boosting.py # enhance the existing SGG dataset with our CaCao model in <pre_trained_visually_prompted_model>
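The four steps above can also be chained from a small driver script. The following is a minimal sketch, not part of the released code: it assumes the scripts are run from the repository root, take no required arguments, and are meant to run in the order listed. The names `CACAO_STEPS`, `build_commands`, and `run_pipeline` are hypothetical helpers introduced only for this illustration.

```python
import subprocess
import sys

# Script names are taken from the steps listed above; running them
# strictly in this order is an assumption of this sketch.
CACAO_STEPS = [
    "adaptive_cluster.py",
    "fine_grained_mapping.py",
    "cross_modal_tuning.py",
    "fine_grained_predicate_boosting.py",
]


def build_commands(python=sys.executable):
    """Return one argv list per step, in the order listed above."""
    return [[python, script] for script in CACAO_STEPS]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` would execute the steps for real. Stopping at the first failure is a deliberate choice here, since each later step presumably depends on the outputs of the earlier ones.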
The SGG code is built on Scene-Graph-Benchmark.pytorch, FGPL, and SSRCNN (One-Stage). We thank the authors for their great work!
If you find this work useful for your research, please cite our paper and star this repository:
@inproceedings{yu2023visually,
title={Visually-prompted language model for fine-grained scene graph generation in an open world},
author={Yu, Qifan and Li, Juncheng and Wu, Yu and Tang, Siliang and Ji, Wei and Zhuang, Yueting},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={21560--21571},
year={2023}
}