
# Awesome-Open-Vocabulary-Object-Detection

# Contact
```
scottn@foxmail.com
```

# <span id='Papers'>Papers</span>
## 2023
+ Lorenzo Bianchi, Fabio Carrara, Nicola Messina, Claudio Gennaro, Fabrizio Falchi. **The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding.** arXiv 2023. [[paper]](https://arxiv.org/abs/2311.17518)
+ **MIC**: Zhao Wang, Aoxue Li, Fengwei Zhou, Zhenguo Li, Qi Dou. **Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization.** BMVC 2023. [[paper]](https://proceedings.bmvc2023.org/93/)
+ **CoDet**: Chuofan Ma, Yi Jiang, Xin Wen, Zehuan Yuan, Xiaojuan Qi. **CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection.** NeurIPS 2023. [[paper]](https://arxiv.org/abs/2310.16667) [[code]](https://github.com/CVMI-Lab/CoDet)
+ **DE-ViT**: Xinyu Zhang, Yuting Wang, Abdeslam Boularias. **Detect Every Thing with Few Examples.** GCPR 2023. [[paper]](https://arxiv.org/abs/2309.12969) [[code]](https://github.com/mlzxy/devit)
+ **DITO**: Dahun Kim, Anelia Angelova, Weicheng Kuo. **Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection.** arXiv 2023. [[paper]](https://paperswithcode.com/paper/detection-oriented-image-text-pretraining-for) [[code]](https://github.com/google-research/google-research/tree/master/fvlm/dito)
+ **CFM-ViT**: Dahun Kim, Anelia Angelova, Weicheng Kuo. **Contrastive Feature Masking Open-Vocabulary Vision Transformer.** ICCV 2023. [[paper]](https://paperswithcode.com/paper/contrastive-feature-masking-open-vocabulary)
+ **EdaDet**: Cheng Shi, Sibei Yang. **EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment.** ICCV 2023. [[paper]](https://arxiv.org/abs/2309.01151) 
+ Jianzong Wu, Xiangtai Li, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy. **Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation.** ICCV 2023. [[paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_Betrayed_by_Captions_Joint_Caption_Grounding_and_Generation_for_Open_ICCV_2023_paper.pdf) [[code]](https://github.com/jianzongwu/betrayed-by-captions)
+ Jincheng Li, Chunyu Xie, Xiaoyu Wu, Bin Wang, Dawei Leng. **What Makes Good Open-Vocabulary Detector: A Disassembling Perspective.** KDD workshop 2023. [[paper]](https://arxiv.org/abs/2309.00227)
+ **MMC-Det**: Yifan Xu, Mengdan Zhang, Xiaoshan Yang, Changsheng Xu. **Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection.** arXiv 2023. [[paper]](https://arxiv.org/abs/2308.15846)
+ **OVDEval**: Yiyang Yao, Peng Liu, Tiancheng Zhao, Qianqian Zhang, Jiajia Liao, Chunxin Fang, Kyusong Lee, Qing Wang. **How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection.** arXiv 2023. [[paper]](https://arxiv.org/abs/2308.13177) [[code]](https://github.com/om-ai-lab/OVDEval)
+ **SAS-Det**: Shiyu Zhao, Samuel Schulter, Long Zhao, Zhixing Zhang, Vijay Kumar B. G, Yumin Suh, Manmohan Chandraker, Dimitris N. Metaxas. **Improving Pseudo Labels for Open-Vocabulary Object Detection.** arXiv 2023. [[paper]](https://arxiv.org/abs/2308.06412)
+ Chaoyang Zhu, Long Chen. **A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future.** arXiv 2023. [[paper]](https://arxiv.org/abs/2307.09220)
+ **UOVN**: Hengcan Shi, Munawar Hayat, Jianfei Cai. **Unified Open-Vocabulary Dense Visual Prediction.** arXiv 2023. [[paper]](https://arxiv.org/abs/2307.08238)
+ **SGDN**: Hengcan Shi, Munawar Hayat, Jianfei Cai. **Open-Vocabulary Object Detection via Scene Graph Discovery.** arXiv 2023. [[paper]](https://arxiv.org/abs/2307.03339)
+ **OWL-ST**: Matthias Minderer, Alexey Gritsenko, Neil Houlsby. **Scaling Open-Vocabulary Object Detection.** arXiv 2023. [[paper]](https://arxiv.org/abs/2306.09683)
+ Prannay Kaul, Weidi Xie, Andrew Zisserman. **Multi-Modal Classifiers for Open-Vocabulary Object Detection.** ICML 2023. [[paper]](https://openreview.net/pdf?id=Nuymym2DZF) [[code]](https://github.com/prannaykaul/mm-ovod)
+ **OpenSeeD**: Hao Zhang, Feng Li, Xueyan Zou, Shilong Liu, Chunyuan Li, Jianfeng Gao, Jianwei Yang, Lei Zhang. **A Simple Framework for Open-Vocabulary Segmentation and Detection.** arXiv 2023. [[paper]](https://arxiv.org/abs/2303.08131) [[code]](https://github.com/IDEA-Research/OpenSeeD)
+ Relja Arandjelović, Alex Andonian, Arthur Mensch, Olivier J. Hénaff, Jean-Baptiste Alayrac, Andrew Zisserman. **Three Ways to Improve Feature Alignment for Open Vocabulary Detection.** arXiv 2023. [[paper]](https://arxiv.org/abs/2303.13518)
+ **Prompt-OVD**: Hwanjun Song, Jihwan Bang. **Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection.** arXiv 2023. [[paper]](https://arxiv.org/abs/2303.14386)
+ **PCL**: Han-Cheol Cho, Won Young Jhoo, Wooyoung Kang, Byungseok Roh. **Open-Vocabulary Object Detection using Pseudo Caption Labels.** arXiv 2023. [[paper]](https://arxiv.org/abs/2303.13040)
+ **CORA**: Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li. **CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching.** CVPR 2023. [[paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wu_CORA_Adapting_CLIP_for_Open-Vocabulary_Detection_With_Region_Prompting_and_CVPR_2023_paper.pdf) [[code]](https://github.com/tgxs002/CORA)
+ Luting Wang, Yi Liu, Penghui Du, Zihan Ding, Yue Liao, Qiaosong Qi, Biaolong Chen, Si Liu. **Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection.** CVPR 2023. [[paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Object-Aware_Distillation_Pyramid_for_Open-Vocabulary_Object_Detection_CVPR_2023_paper.pdf) [[code]](https://github.com/LutingWang/OADP)
+ **BARON**: Size Wu, Wenwei Zhang, Sheng Jin, Wentao Liu, Chen Change Loy. **Aligning Bag of Regions for Open-Vocabulary Object Detection.** CVPR 2023. [[paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wu_Aligning_Bag_of_Regions_for_Open-Vocabulary_Object_Detection_CVPR_2023_paper.pdf) [[code]](https://github.com/wusize/ovdet)
+ **RO-ViT**: Dahun Kim, Anelia Angelova, Weicheng Kuo. **Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers.** CVPR 2023. [[paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Kim_Region-Aware_Pretraining_for_Open-Vocabulary_Object_Detection_With_Vision_Transformers_CVPR_2023_paper.pdf) [[code]](https://github.com/google-research/google-research/tree/master/fvlm/rovit)
+ **DetCLIPv2**: Lewei Yao, Jianhua Han, Xiaodan Liang, Dan Xu, Wei Zhang, Zhenguo Li, Hang Xu. **DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment.** CVPR 2023. [[paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Yao_DetCLIPv2_Scalable_Open-Vocabulary_Object_Detection_Pre-Training_via_Word-Region_Alignment_CVPR_2023_paper.pdf)
+ **CondHead**: Tao Wang. **Learning to Detect and Segment for Open Vocabulary Object Detection.** CVPR 2023. [[paper]](https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Learning_To_Detect_and_Segment_for_Open_Vocabulary_Object_Detection_CVPR_2023_paper.pdf)
+ **F-VLM**: Weicheng Kuo, Yin Cui, Xiuye Gu, AJ Piergiovanni, Anelia Angelova. **F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models.** ICLR 2023. [[paper]](https://openreview.net/forum?id=MIMwy4kh9lf) [[code]](https://sites.google.com/view/f-vlm/home)
+ **VLDet**: Chuang Lin, Peize Sun, Yi Jiang, Ping Luo, Lizhen Qu, Gholamreza Haffari, Zehuan Yuan, Jianfei Cai. **Learning Object-Language Alignments for Open-Vocabulary Object Detection.** ICLR 2023. [[paper]](https://openreview.net/pdf?id=mjHlitXvReu) [[code]](https://github.com/clin1223/VLDet)
## 2022
+ **VTP-OVD**: Yanxin Long, Jianhua Han, Runhui Huang, Hang Xu, Yi Zhu, Chunjing Xu, Xiaodan Liang. **P<sup>3</sup>OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection.** arXiv 2022. [[paper]](https://arxiv.org/abs/2211.00849)
+ **MEDet**: Peixian Chen, Kekai Sheng, Mengdan Zhang, Yunhang Shen, Ke Li, Chunhua Shen. **Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization.** arXiv 2022. [[paper]](https://arxiv.org/abs/2206.11134) [[code]](https://github.com/PeixianChen/MEDet)
+ **LocOV**: Maria A. Bravo, Sudhanshu Mittal, Thomas Brox. **Localized Vision-Language Matching for Open-vocabulary Object Detection.** DAGM German Conference on Pattern Recognition (GCPR) 2022. [[paper]](https://arxiv.org/abs/2205.06160) [[code]](https://github.com/lmb-freiburg/locov)
+ **Object-Centric-OVD**: Hanoona Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman Khan, Fahad Shahbaz Khan. **Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection.** NeurIPS 2022. [[paper]](https://openreview.net/forum?id=aKXBrj0DHm) [[code]](https://github.com/hanoonaR/object-centric-ovd)
+ **VL-PLM**: Shiyu Zhao, Zhixing Zhang, Samuel Schulter, Long Zhao, Vijay Kumar B.G, Anastasis Stathopoulos, Manmohan Chandraker, Dimitris Metaxas. **Exploiting Unlabeled Data with Vision and Language Models for Object Detection.** ECCV 2022. [[paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136690156.pdf) [[code]](https://github.com/xiaofeng94/VL-PLM)
+ **PromptDet**: Chengjian Feng, Yujie Zhong, Zequn Jie, Xiangxiang Chu, Haibing Ren, Xiaolin Wei, Weidi Xie, Lin Ma. **PromptDet: Towards Open-vocabulary Detection using Uncurated Images.** ECCV 2022. [[paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136690691.pdf) [[code]](https://github.com/fcjian/PromptDet)
+ **OpenSeg**: Golnaz Ghiasi, Xiuye Gu, Yin Cui, Tsung-Yi Lin. **Scaling Open-Vocabulary Image Segmentation with Image-Level Labels.** ECCV 2022. [[paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136960532.pdf) [[code]](https://github.com/tensorflow/tpu/tree/641c1ac6e26ed788327b973582cbfa297d7d31e7/models/official/detection/projects/openseg)
+ **OV-DETR**: Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy. **Open-Vocabulary DETR with Conditional Matching.** ECCV 2022. [[paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136690107.pdf) [[code]](https://github.com/yuhangzang/OV-DETR)
+ **PB-OVD**: Mingfei Gao, Chen Xing, Juan Carlos Niebles, Junnan Li, Ran Xu, Wenhao Liu, Caiming Xiong. **Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.** ECCV 2022. [[paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136700263.pdf) [[code]](https://github.com/salesforce/PB-OVD)
+ **OWL-ViT**: Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, Neil Houlsby. **Simple Open-Vocabulary Object Detection with Vision Transformers.** ECCV 2022. [[paper]](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136700714.pdf) [[code]](https://github.com/google-research/scenic/tree/main/scenic/projects/owl_vit)
+ **RegionCLIP**: Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao. **RegionCLIP: Region-Based Language-Image Pretraining.** CVPR 2022. [[paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Zhong_RegionCLIP_Region-Based_Language-Image_Pretraining_CVPR_2022_paper.html) [[code]](https://github.com/microsoft/RegionCLIP)
+ **XPM**: Dat Huynh, Jason Kuen, Zhe Lin, Jiuxiang Gu, Ehsan Elhamifar. **Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling.** CVPR 2022. [[paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Huynh_Open-Vocabulary_Instance_Segmentation_via_Robust_Cross-Modal_Pseudo-Labeling_CVPR_2022_paper.html) [[code]](https://github.com/hbdat/cvpr22_cross_modal_pseudo_labeling)
+ **HierKD**: Zongyang Ma, Guan Luo, Jin Gao, Liang Li, Yuxin Chen, Shaoru Wang, Congxuan Zhang, Weiming Hu. **Open-Vocabulary One-Stage Detection With Hierarchical Visual-Language Knowledge Distillation.** CVPR 2022. [[paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Ma_Open-Vocabulary_One-Stage_Detection_With_Hierarchical_Visual-Language_Knowledge_Distillation_CVPR_2022_paper.pdf) [[code]](https://github.com/mengqiDyangge/HierKD)
+ **DetPro**: Yu Du, Fangyun Wei, Zihe Zhang, Miaojing Shi, Yue Gao, Guoqi Li. **Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model.** CVPR 2022. [[paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Du_Learning_To_Prompt_for_Open-Vocabulary_Object_Detection_With_Vision-Language_Model_CVPR_2022_paper.pdf) [[code]](https://github.com/dyabel/detpro)
+ **ViLD**: Xiuye Gu, Tsung-Yi Lin, Weicheng Kuo, Yin Cui. **Open-vocabulary Object Detection via Vision and Language Knowledge Distillation.** ICLR 2022. [[paper]](https://openreview.net/forum?id=lL3lnMbR4WU) [[code]](https://github.com/tensorflow/tpu/tree/master/models/official/detection/projects/vild)

## 2021
+ **OVR-CNN**: Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, Shih-Fu Chang. **Open-Vocabulary Object Detection Using Captions.** CVPR 2021. [[paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Zareian_Open-Vocabulary_Object_Detection_Using_Captions_CVPR_2021_paper.pdf) [[code]](https://github.com/alirezazareian/ovr-cnn)