Trident

Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation

News

Nov. 15th, 2024: We released the paper and code for Trident.

Introduction

While CLIP has advanced open-vocabulary prediction, its performance on semantic segmentation remains suboptimal. This shortfall stems primarily from its noisy semantic features and constrained resolution. Previous adaptations have addressed the noisy semantic features, but the issue of limited resolution remains unexplored. To alleviate this issue, we introduce Trident, a training-free framework that first splices features extracted by CLIP and DINO from sub-images, then leverages SAM's encoder to create a correlation matrix for global aggregation. This repository contains the code for Trident on eight popular benchmarks. For more information, please refer to our paper.
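The two stages described above (splicing sub-image features, then aggregating them globally with a correlation matrix) can be illustrated with a toy NumPy sketch. This is not Trident's implementation: `toy_encoder` is a hypothetical stand-in for the frozen CLIP, DINO and SAM encoders, and `splice_features`, `aggregate`, the window size, the stride and the temperature `tau` are all assumptions made for illustration.

```python
import numpy as np

PATCH = 16  # assumed patch size of the stand-in encoder


def toy_encoder(crop):
    """Stand-in for a frozen backbone: mean-pool each PATCH x PATCH cell."""
    h, w, c = crop.shape
    gh, gw = h // PATCH, w // PATCH
    cells = crop[:gh * PATCH, :gw * PATCH].reshape(gh, PATCH, gw, PATCH, c)
    return cells.mean(axis=(1, 3))  # shape (gh, gw, c)


def splice_features(image, window=64, stride=32, encoder=toy_encoder):
    """Encode overlapping sub-images and average them into one feature map.

    Assumes window and stride are multiples of PATCH and that the windows
    cover the whole image.
    """
    H, W, _ = image.shape
    gh, gw = H // PATCH, W // PATCH
    feats = None
    counts = np.zeros((gh, gw, 1))
    for top in range(0, H - window + 1, stride):
        for left in range(0, W - window + 1, stride):
            f = encoder(image[top:top + window, left:left + window])
            if feats is None:
                feats = np.zeros((gh, gw, f.shape[-1]))
            r, c = top // PATCH, left // PATCH
            feats[r:r + f.shape[0], c:c + f.shape[1]] += f
            counts[r:r + f.shape[0], c:c + f.shape[1]] += 1
    # Average the contributions where windows overlap.
    return feats / np.maximum(counts, 1)


def aggregate(feats, guide, tau=0.1):
    """Mix tokens globally with a softmax over cosine similarity of `guide`."""
    gh, gw, d = feats.shape
    x = feats.reshape(-1, d)
    g = guide.reshape(gh * gw, -1)
    g = g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-8)
    corr = g @ g.T / tau                      # (N, N) correlation matrix
    corr -= corr.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(corr)
    attn /= attn.sum(axis=1, keepdims=True)   # each row sums to 1
    return (attn @ x).reshape(gh, gw, d)


rng = np.random.default_rng(0)
image = rng.random((128, 128, 3))
spliced = splice_features(image)
refined = aggregate(spliced, guide=toy_encoder(image))
print(spliced.shape, refined.shape)
```

In this sketch the sub-image features are placed back at their grid positions and averaged where windows overlap, so the spliced map has the resolution of the full image rather than of a single window. The aggregation step then lets every token draw on all other tokens, weighted by the correlation matrix.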

Main Results

Getting Started

Installation

Step 1: Clone the Trident repository:

git clone https://github.com/YuHengsss/Trident.git
cd Trident

Step 2: Environment Setup:

Create and activate a new conda environment

conda create -n Trident
conda activate Trident

Install Dependencies

pip install -r requirements.txt

Quick Start

Datasets Preparation

Please follow the MMSeg data preparation document to download and pre-process the datasets, including PASCAL VOC, PASCAL Context, Cityscapes, ADE20k, COCO Object and COCO-Stuff164k. We provide some dataset processing scripts in process_dataset.sh.

Evaluation

Before evaluating the model, download the SAM checkpoints from the link provided in SAM's repo. Then update the settings, such as data_root and sam_ckpt, in configs/base_config.py and the corresponding dataset configuration files. You can then evaluate a specific dataset with:

python eval.py --config ./configs/cfg_DATASET.py --workdir YOUR_WORK_DIR --sam_refine

or evaluate all datasets:

python eval_all.py

Results are listed in YOUR_WORK_DIR/results.txt.
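Because evaluation depends on data_root and sam_ckpt pointing at real locations, a small pre-flight check can catch a wrong path before a long run starts. The sketch below is not part of the repository: `missing_paths` is a hypothetical helper, and the paths are placeholders to be replaced with the values set in your own config files.

```python
from pathlib import Path


def missing_paths(settings):
    """Return the names of settings whose path does not exist on disk."""
    return [name for name, value in settings.items()
            if not Path(value).exists()]


# Placeholder values (assumptions); use the ones from your config files.
settings = {
    "data_root": "/path/to/datasets",
    "sam_ckpt": "/path/to/sam_checkpoint.pth",
}

for name in missing_paths(settings):
    print(f"{name} does not point to an existing path: {settings[name]}")
```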

Demo

After setting img_path, name_list and sam_checkpoint in trident_demo.py, you can run the demo directly with:

python trident_demo.py

Citation

If Trident is helpful for your research, please cite the following paper:

@article{shi2024vssd,
    title={Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation},
    author={Yuheng Shi and Minjing Dong and Chang Xu},
    journal={arXiv preprint arXiv:2411.09219},
    year={2024},
}

Acknowledgment

This project is based on SAM, ProxyCLIP, SCLIP and OpenCLIP. Thanks to the authors for their excellent work.
