Awesome-Open-Vocabulary-Semantic-Segmentation

If you find this project helpful, please consider giving it a star ⭐.

- Open-Vocabulary Semantic Segmentation (mainly updated by @tbh3223)
- Zero-Shot Semantic Segmentation (mainly updated by @tbh3223)
- Referring Image Segmentation (mainly updated by @ghost-000)
  - Fully-Supervised Methods
  - Weakly-Supervised Methods
- Open-Vocabulary Object Detection (mainly updated by @tbh3223)
- Universal Semantic Segmentation (mainly updated by @tbh3223)
- Other Related Work
- Related Survey

Open-Vocabulary Semantic Segmentation

Fully-Supervised Open-Vocabulary Semantic Segmentation

The model is trained on fully-supervised semantic segmentation datasets with pixel-level annotations (e.g., the COCO-Stuff dataset).

[LSeg] | ICLR'22 | Language-driven Semantic Segmentation | [pdf] | [code]
[OpenSeg] | ECCV'22 | Scaling Open-vocabulary Image Segmentation with Image-level Labels | [pdf] | [code]
[Xu et al.] | ECCV'22 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | [pdf] | [code]
[SegCLIP] | ICML'23 | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[MaskCLIP] | ICML'23 | Open-Vocabulary Universal Image Segmentation with MaskCLIP | [pdf] | [code]
[OVSeg] | CVPR'23 | Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP | [pdf] | [code]
[X-Decoder] | CVPR'23 | Generalized Decoding for Pixel, Image, and Language | [pdf] | [code]
[SAN] | CVPR'23(Highlight) | Side Adapter Network for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[SAN] | TPAMI'23 | SAN: Side Adapter Network for Open-vocabulary Semantic Segmentation | [pdf] | [code]
[ODISE] | CVPR'23 | Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models | [pdf] | [code]
[FreeSeg] | CVPR'23 | FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation | [pdf] | [code]
[OpenSeeD] | ICCV'23 | A Simple Framework for Open-Vocabulary Segmentation and Detection | [pdf] | [code]
[GKC] | ICCV'23 | Global Knowledge Calibration for Fast Open-Vocabulary Segmentation | [pdf]
[OPSNet] | ICCV'23 | Open-vocabulary Panoptic Segmentation with Embedding Modulation | [pdf] | [code]
[MasQCLIP] | ICCV'23 | MasQCLIP for Open-Vocabulary Universal Image Segmentation | [pdf]
[DeOP] | ICCV'23 | Open Vocabulary Semantic Segmentation with Decoupled One-Pass Network | [pdf] | [code]
[Li et al.] | ICCV'23 | Open-vocabulary Object Segmentation with Diffusion Models | [pdf] | [code]
[HIPIE] | NeurIPS'23 | Hierarchical Open-vocabulary Universal Image Segmentation | [pdf] | [code]
[FC-CLIP] | NeurIPS'23 | Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP | [pdf] | [code]
[MAFT] | NeurIPS'23 | Learning Mask-aware CLIP Representations for Zero-Shot Segmentation | [pdf] | [code]
[ADA] | NeurIPS'23 | Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation | [pdf]
[Dao et al.] | TMM | Class Enhancement Losses with Pseudo Labels for Open-Vocabulary Semantic Segmentation | [pdf]
[SELF-SEG] | Arxiv'23.12 | Self-Guided Open-Vocabulary Semantic Segmentation | [pdf]
[OpenSD] | Arxiv'23.12 | OpenSD: Unified Open-Vocabulary Segmentation and Detection | [pdf] | [code]
[RENOVATE] | Arxiv'24.03 | Renovating Names in Open-Vocabulary Segmentation Benchmarks | [pdf]
[DreamLIP] | ECCV'24 | DreamLIP: Language-Image Pre-training with Long Captions | [pdf] | [code]
[CAT-Seg] | CVPR'24 | CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[SED] | CVPR'24 | SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[SCAN] | CVPR'24 | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | [pdf] | [code]
[OpenTrans] | CVPR'24 | Transferable and Principled Efficiency for Open-Vocabulary Segmentation | [pdf] | [code]
[H-CLIP] | Arxiv'24.05 | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | [pdf]
[OpenDAS] | Arxiv'24.05 | OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation | [pdf]
[USE] | CVPR'24 | USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation | [pdf]
[EBSeg] | CVPR'24 | Open-Vocabulary Semantic Segmentation with Image Embedding Balancing | [pdf] | [code]
[MAFT+] | ECCV'24 | Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation | [pdf] | [code]
[R-Adapter] | ECCV'24 | Efficient and Versatile Robust Fine-Tuning of Zero-shot Models | [pdf] | [code]
[MROVSeg] | Arxiv'24.08 | MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Semantic Segmentation | [pdf]
[FrozenSeg] | Arxiv'24.09 | FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation | [pdf] | [code]

Weakly-Supervised Open-Vocabulary Semantic Segmentation

Also known as text-supervised or language-supervised: the model is trained on weakly-supervised datasets that provide only image-level annotations or captions (e.g., the CC12M dataset).
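Many methods in this category (GroupViT, for example) learn from image-caption pairs with an image-text contrastive objective. The sketch below is a generic CLIP-style symmetric InfoNCE loss written in NumPy, not the loss of any specific paper listed here; the function name, the toy embeddings, and the temperature of 0.07 are assumptions for illustration.

```python
import numpy as np


def symmetric_info_nce(img: np.ndarray, txt: np.ndarray, temperature: float = 0.07) -> float:
    """Generic CLIP-style contrastive loss over a batch of N matched pairs.

    img, txt: (N, D) embeddings; row i of img is paired with row i of txt.
    temperature: assumed default, not taken from any paper in this list.
    """
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(len(logits))        # positives lie on the diagonal

    def cross_entropy(z: np.ndarray) -> float:
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return float(-log_probs[idx, idx].mean())

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


if __name__ == "__main__":
    img = np.eye(4)  # 4 toy image embeddings
    aligned = symmetric_info_nce(img, img)
    shuffled = symmetric_info_nce(img, img[[1, 0, 3, 2]])
    print(aligned < shuffled)  # True
```

Correctly paired captions give a loss near zero, while mismatched pairs are penalised heavily; that pressure is the only supervision signal in this setting.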

[GroupViT] | CVPR'22 | GroupViT: Semantic Segmentation Emerges from Text Supervision | [pdf] | [code]
[ViL-Seg] | ECCV'22 | Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding | [pdf]
[MaskCLIP+] | ECCV'22(Oral) | Extract Free Dense Labels from CLIP | [pdf] | [code]
[ViewCo] | ICLR'23 | ViewCo: Discovering Text-supervised Segmentation Masks via Multi-view Semantic Consistency | [pdf]
[SegCLIP] | ICML'23 | SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[CLIP-S4] | CVPR'23 | CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation | [pdf]
[PACL] | CVPR'23 | Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning | [pdf]
[OVSegmentor] | CVPR'23 | Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision | [pdf] | [code]
[SimSeg] | CVPR'23 | A Simple Framework for Text-Supervised Semantic Segmentation | [pdf] | [code]
[TCL] | CVPR'23 | Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs | [pdf] | [code]
[SimCon] | Arxiv'23.02 | SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation | [pdf]
[Zhang et al.] | Arxiv'23.04 | Associating Spatially-Consistent Grouping with Text-supervised Semantic Segmentation | [pdf]
[ZeroSeg] | ICCV'23 | Exploring Open-Vocabulary Semantic Segmentation from CLIP Vision Encoder Distillation Only | [pdf]
[CLIPpy] | ICCV'23 | Perceptual Grouping in Contrastive Vision-Language Models | [pdf]
[MixReorg] | ICCV'23 | MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation | [pdf]
[CoCu] | NeurIPS'23 | Bridging Semantic Gaps for Language-Supervised Semantic Segmentation | [pdf] | [code]
[PGSeg] | NeurIPS'23 | Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[SAM-CLIP] | Arxiv'23.10 | SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding | [pdf]
[CLIP-DINOiser] | Arxiv'23.12 | CLIP-DINOiser: Teaching CLIP a few DINO tricks | [pdf] | [code]
[TagAlign] | Arxiv'23.12 | TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification | [pdf] | [code]
[S-Seg] | Arxiv'24.01 | Exploring Simple Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[CLIPSelf] | ICLR'24(Spotlight) | CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction | [pdf] | [code]
[Uni-OVSeg] | Arxiv'24.02 | Open-Vocabulary Segmentation with Unpaired Mask-Text Supervision | [pdf] | [code]
[MGCA] | Arxiv'24.03 | Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision | [pdf]
[TTD] | Arxiv'24.04 | TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias | [pdf] | [code]
[CoDe] | CVPR'24 | Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation | [pdf]
[LLM-Supervision] | Arxiv'24.03 | Training-Free Semantic Segmentation via LLM-Supervision | [pdf]
[ProxyCLIP] | ECCV'24 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | [pdf] | [code]

Training-Free Open-Vocabulary Semantic Segmentation

These methods adapt off-the-shelf large models (e.g., CLIP, diffusion models) without any additional training phase. Note that the large models themselves have already been pre-trained on their own datasets (e.g., image-caption datasets).
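Many CLIP-based methods in this category ultimately label each pixel by comparing dense image features with text embeddings of the class names. The sketch below shows only that final matching step, assuming the patch and text embeddings have already been extracted; it uses toy arrays instead of a real CLIP model, the function name is hypothetical, and it omits the architectural modifications (e.g., to self-attention) that individual papers propose.

```python
import numpy as np


def dense_cosine_labels(patch_feats: np.ndarray, text_feats: np.ndarray) -> np.ndarray:
    """Label each patch with the class whose text embedding is most similar.

    patch_feats: (H, W, D) dense features from a frozen image encoder (assumed given).
    text_feats:  (C, D) text embeddings, one per class-name prompt (assumed given).
    Returns an (H, W) array of class indices.
    """
    # L2-normalise both sides so that a dot product is a cosine similarity.
    p = patch_feats / np.linalg.norm(patch_feats, axis=-1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    scores = p @ t.T  # (H, W, D) @ (D, C) -> (H, W, C)
    # No training step: the prediction is the arg-max over classes.
    return scores.argmax(axis=-1)


if __name__ == "__main__":
    # Toy stand-ins for real embeddings: 3 classes in a 4-dim space.
    text = np.eye(3, 4)
    gt = np.array([[0, 1], [2, 0]])
    patches = 3.0 * text[gt]  # each patch points along one class direction
    print(dense_cosine_labels(patches, text).tolist())  # [[0, 1], [2, 0]]
```

Because the vocabulary is just the rows of `text_feats`, new classes can be added at inference time by encoding new prompts, which is what makes the approach open-vocabulary.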

[MaskCLIP] | ECCV'22(Oral) | Extract Free Dense Labels from CLIP | [pdf] | [code]
[ReCo] | NeurIPS'22 | ReCo: Retrieve and Co-segment for Zero-shot Transfer | [pdf] | [code]
[CLIP Surgery] | Arxiv'23.04 | CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks | [pdf] | [code]
[OVDiff] | Arxiv'23.06 | Diffusion Models for Zero-Shot Open-Vocabulary Segmentation | [pdf]
[DiffSegmenter] | Arxiv'23.09 | Diffusion Model is Secretly a Training-free Open Vocabulary Semantic Segmenter | [pdf] | [code]
[IPSeg] | IJCV'24 | Towards Training-free Open-world Segmentation via Image Prompting Foundation Models | [pdf]
[SCLIP] | Arxiv'23.12 | SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference | [pdf]
[GEM] | CVPR'24 | Grounding Everything: Emerging Localization Properties in Vision-Language Transformers | [pdf] | [code]
[CLIP-DIY] | WACV'24 | CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free | [pdf]
[FOSSIL] | WACV'24 | FOSSIL: Free Open-Vocabulary Semantic Segmentation through Synthetic References Retrieval | [pdf]
[TagCLIP] | AAAI'24 | TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training | [pdf] | [code]
[EmerDiff] | ICLR'24 | EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models | [pdf] | [code]
[FreeSeg-Diff] | Arxiv'24.03 | FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models | [pdf] | [code]
[MaskDiffusion] | Arxiv'24.03 | MaskDiffusion: Exploiting Pre-trained Diffusion Models for Semantic Segmentation | [pdf] | [code]
[TAG] | Arxiv'24.03 | TAG: Guidance-free Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[Sun et al.] | Arxiv'24.04 | Training-Free Semantic Segmentation via LLM-Supervision | [pdf]
[NACLIP] | Arxiv'24.04 | Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[PnP-OVSS] | CVPR'24 | Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models | [pdf] | [code]
[CaR] | CVPR'24 | CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor | [pdf] | [code]
[Wang et al.] | CVPR'24 | Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation | [pdf] | [code]
[FreeDA] | CVPR'24 | Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation | [pdf] | [code]
[Yang et al.] | Arxiv'24.05 | Tuning-free Universally-Supervised Semantic Segmentation | [pdf]
[CLIPTrase] | ECCV'24 | Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation | [pdf] | [code]
[ClearCLIP] | ECCV'24 | ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference | [pdf] | [code]
[ProxyCLIP] | ECCV'24 | ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation | [pdf] | [code]
[LaVG] | ECCV'24 | In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation | [pdf] | [code]

Others

[EntitySeg] | Arxiv'23.11 | Rethinking Evaluation Metrics of Open-Vocabulary Segmentation | [pdf] | [code]

Zero-Shot Semantic Segmentation

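Unlike open-vocabulary segmentation, which is evaluated across datasets, zero-shot methods split the classes of each dataset into seen and unseen classes: the model is trained with annotations of the seen classes only and evaluated on the unseen (and usually also the seen) classes of the same dataset.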
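A minimal sketch of this protocol, assuming label maps are integer arrays of class IDs and that unseen-class pixels are hidden from training by mapping them to an ignore value of 255 (a common convention; the exact handling varies between papers). It also computes hIoU, the harmonic mean of seen and unseen mIoU, a summary metric commonly reported in this setting. The helper names and class IDs are hypothetical.

```python
import numpy as np

IGNORE_INDEX = 255  # assumed "ignore" label value (a common convention)


def mask_unseen(label_map: np.ndarray, unseen: set[int]) -> np.ndarray:
    """Return a training copy of label_map in which unseen-class pixels are ignored."""
    out = label_map.copy()
    out[np.isin(label_map, sorted(unseen))] = IGNORE_INDEX
    return out


def harmonic_iou(seen_miou: float, unseen_miou: float) -> float:
    """hIoU: harmonic mean of the mIoU on seen and on unseen classes."""
    total = seen_miou + unseen_miou
    return 0.0 if total == 0 else 2 * seen_miou * unseen_miou / total


if __name__ == "__main__":
    labels = np.array([[0, 1], [2, 3]])  # hypothetical class IDs
    print(mask_unseen(labels, {2, 3}).tolist())  # [[0, 1], [255, 255]]
    print(round(harmonic_iou(80.0, 40.0), 2))    # 53.33
```

The harmonic mean penalises a model that does well on seen classes but poorly on unseen ones more than an arithmetic mean would.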

[ZegFormer] | CVPR'22 | ZegFormer: Decoupling Zero-Shot Semantic Segmentation | [pdf] | [code]
[Xu et al.] | ECCV'22 | A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-language Model | [pdf] | [code]
[ZegCLIP] | CVPR'23 | ZegCLIP: Towards Adapting CLIP for Zero-shot Semantic Segmentation | [pdf] | [code]
[PADing] | CVPR'23 | Primitive Generation and Semantic-related Alignment for Universal Zero-Shot Segmentation | [pdf] | [code]
[DeOP] | ICCV'23 | Open Vocabulary Semantic Segmentation with Decoupled One-Pass Network | [pdf] | [code]
[SPT] | AAAI'24 | Spectral Prompt Tuning: Unveiling Unseen Classes for Zero-Shot Semantic Segmentation | [pdf] | [code]
[Chen et al.] | Arxiv'24.02 | Generalizable Semantic Vision Query Generation for Zero-shot Panoptic and Semantic Segmentation | [pdf]
[LDVC] | Arxiv'24.03 | Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation | [pdf]
[OTSeg] | Arxiv'24.03 | OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation | [pdf]
[Cascade-CLIP] | ICML'24 | Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation | [pdf] | [code]
[SimZSS] | Arxiv'24.07 | A Simple Framework for Open-Vocabulary Zero-Shot Segmentation | [pdf]
[CaR] | CVPR'24 | CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor | [pdf] | [code]

Referring Image Segmentation

Fully-Supervised Referring Image Segmentation

[CARIS] | ACM MM'23 | CARIS: Context-Aware Referring Image Segmentation | [pdf] | [code]
[BKINet] | TMM'23 | Bilateral Knowledge Interaction Network for Referring Image Segmentation | [pdf] | [code]
[Group-RES] | ICCV'23 | Advancing Referring Expression Segmentation Beyond Single Image | [pdf] | [code]
[RIS-DMMI] | ICCV'23 | Beyond One-to-One: Rethinking the Referring Image Segmentation | [pdf] | [code]
[ETRIS] | ICCV'23 | Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation | [pdf] | [code]
[SEEM] | Arxiv'23.04 | Segment Everything Everywhere All at Once | [pdf] | [code]

Weakly-Supervised Referring Image Segmentation

[Strudel et al.] | Arxiv'22.05 | Weakly-supervised segmentation of referring expressions | [pdf]
[Kim et al.] | ICCV'23 | Shatter and Gather: Learning Referring Image Segmentation with Text Supervision | [pdf] | [code]
[TRIS] | ICCV'23 | Referring Image Segmentation Using Text Supervision | [pdf] | [code]
[Jungbeom Lee et al.] | ICCV'23 | Weakly Supervised Referring Image Segmentation with Intra-Chunk and Inter-Chunk Consistency | [pdf]
[PPT] | CVPR'24 | Curriculum Point Prompting for Weakly-Supervised Referring Segmentation | [pdf]

Open-Vocabulary Object Detection

[RO-ViT] | CVPR'23(Highlight) | Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers | [pdf] | [code]
[CAT] | CVPR'23 | CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection | [pdf] | [code]
[DetCLIPv2] | CVPR'23 | DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment | [pdf]
[CondHead] | CVPR'23 | Learning to Detect and Segment for Open Vocabulary Object Detection | [pdf]
[CORA] | CVPR'23 | CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching | [pdf] | [code]
[ovdet] | CVPR'23 | Aligning Bag of Regions for Open-Vocabulary Object Detection | [pdf] | [code]
[OADP] | CVPR'23 | Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection | [pdf] | [code]
[F-VLM] | ICLR'23 | F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models | [pdf] | [code]
[mm-ovod] | ICML'23 | Multi-Modal Classifiers for Open-Vocabulary Object Detection | [pdf] | [code]
[SGDN] | Arxiv'23.07 | Open-Vocabulary Object Detection via Scene Graph Discovery | [pdf]
[MMC-Det] | Arxiv'23.08 | Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection | [pdf]
[SAS-Det] | CVPR'24 | Taming Self-Training for Open-Vocabulary Object Detection | [pdf] | [code]
[DITO] | Arxiv'23.09 | Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection | [pdf] | [code]
[EdaDet] | ICCV'23 | EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment | [pdf] | [code]
[LP-OVOD] | WACV'24 | LP-OVOD: Open-Vocabulary Object Detection by Linear Probing | [pdf] | [code]
[DST-Det] | Arxiv'23.10 | DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection | [pdf] | [code]
[CoDet] | NeurIPS'23 | CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection | [pdf] | [code]
[PLAC] | Arxiv'23.12 | Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection | [pdf]
[Sambor] | Arxiv'23.12 | Boosting Segment Anything Model Towards Open-Vocabulary Learning | [pdf] | [code]
[DVDet] | ICLR'24 | LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors | [pdf]
[DetCLIPv3] | CVPR'24 | DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | [pdf]
[AggDet] | Arxiv'24.04 | Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation | [pdf]
[RALF] | CVPR'24 | Retrieval-Augmented Open-Vocabulary Object Detection | [pdf] | [code]
[Chhipa et al.] | Arxiv'24.06 | Investigating Robustness of Open-Vocabulary Foundation Object Detectors under Distribution Shifts | [pdf]
[SHiNe] | CVPR'24(Highlight) | SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | [pdf] | [code]
[RTGen] | Arxiv'24.06 | RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | [pdf] | [code]
[LBP] | CVPR'24 | Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection | [pdf]
[YOLO-World] | CVPR'24 | Real-Time Open-Vocabulary Object Detection | [pdf] | [code]
[OV-DINO] | Arxiv'24.07 | Unified Open-Vocabulary Detection with Language-Aware Selective Fusion | [pdf] | [code]
[OVLW-DETR] | Arxiv'24.07 | OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer | [pdf] | [code]
[LaMI-DETR] | ECCV'24 | LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction | [pdf] | [code]
[MarvelOVD] | ECCV'24 | MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection | [pdf] | [code]

Universal Semantic Segmentation

[Semantic-SAM] | ECCV'24 | Semantic-SAM: Segment and Recognize Anything at Any Granularity | [pdf] | [code]
[Open-Vocabulary SAM] | ECCV'24 | Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively | [pdf] | [code]
[OMG-Seg] | CVPR'24 | OMG-Seg: Is One Model Good Enough For All Segmentation? | [pdf] | [code]

Other Related Work

[DENOISER] | Arxiv'24.04 | DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | [pdf]
[O2V-mapping] | Arxiv'24.04 | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation | [pdf]
[CMD-SE] | CVPR'24 | Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection | [pdf]
[FG-CLIP] | CBMI'24 | Is CLIP the main roadblock for fine-grained open-world perception? | [pdf] | [code]
[NegPrompt] | CVPR'24 | Learning Transferable Negative Prompts for Out-of-Distribution Detection | [pdf] | [code]
[OVFoodSeg] | CVPR'24 | OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation | [pdf]
[Fed-MP] | NAACL'24 | Open-Vocabulary Federated Learning with Multimodal Prototyping | [pdf]
[PSALM] | ECCV'24 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | [pdf] | [code]
[OVAM] | Arxiv'24.03 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | [pdf]
[CLIP-VIS] | Arxiv'24.06 | CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation | [pdf]
[RoboHop] | ICRA'24 | RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation | [pdf]
[Rein] | CVPR'24 | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | [pdf] | [code]
[OVMR] | CVPR'24 | OVMR: Open-Vocabulary Recognition with Multi-Modal References | [pdf] | [code]
[PartCLIPSeg] | Arxiv'24.06 | Understanding Multi-Granularity for Open-Vocabulary Part Segmentation | [pdf] | [code]
[GBC] | Arxiv'24.07 | Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions | [pdf]
[TCC] | Arxiv'24.07 | A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation | [pdf]
[OPS] | ECCV'24 | Open Panoramic Segmentation | [pdf] | [code]
[Yu et al.] | Arxiv'24.07 | PanopticRecon: Leverage Open-vocabulary Instance Segmentation for Zero-shot Panoptic Reconstruction | [pdf]
[Oryon] | CVPR'24(Highlight) | Oryon: Open-Vocabulary Object 6D Pose Estimation | [pdf] | [code]
[GLIS] | ECCV'24 | Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection | [pdf] | [code]
[OVExp] | Arxiv'24.07 | OVExp: Open Vocabulary Exploration for Object-Oriented Navigation | [pdf] | [code]
[OV-MLVC] | Arxiv'24.07 | Open Vocabulary Multi-Label Video Classification | [pdf]
[DART] | Arxiv'24.07 | An automated end-to-end object detection pipeline with data Diversification, open-vocabulary bounding box Annotation, pseudo-label Review, and model Training | [pdf] | [code]
[NOVIC] | Arxiv'24.07 | Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion | [pdf]
[CerberusDet] | Arxiv'24.07 | CerberusDet: Unified Multi-Task Object Detection | [pdf]
[GGSD] | Arxiv'24.07 | Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation | [pdf] | [code]
[Diff2Scene] | ECCV'24 | Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models | [pdf]
[SegPoint] | ECCV'24 | SegPoint: Segment Any Point Cloud via Large Language Model | [pdf] | [code]
[LangOcc] | Arxiv'24.07 | LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering | [pdf]
[OVR] | Arxiv'24.07 | A Dataset for Open Vocabulary Temporal Repetition Counting in Videos | [pdf] | [code]
[SAM-CP] | Arxiv'24.07 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | [pdf] | [code]
[OV-AVSS] | ACM MM'24(Oral) | Open-Vocabulary Audio-Visual Semantic Segmentation | [pdf] | [code]
[Open3DRF] | Arxiv'24.08 | Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space | [pdf] | [code]
[OVA-DETR] | Arxiv'24.08 | OVA-DETR: Open Vocabulary Aerial Object Detection Using Image-Text Alignment and Fusion | [pdf] | [code]
[OVAL] | Arxiv'24.08 | Open-vocabulary Temporal Action Localization using VLMs | [pdf] | [code]
[EMPOWER] | IROS'24 | EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution | [pdf] | [code]

Related Survey

Towards Open Vocabulary Learning: A Survey | [pdf]
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | [pdf]

Feedback

If you have any suggestions or find missing papers, please don't hesitate to contact me via tbh3223@mail.ustc.edu.cn or lydyc@mail.ustc.edu.cn.

zwyang6/Awesome-Open-Vocabulary-Semantic-Segmentation
