Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement

Paper: 📖 Seg-Zero
HuggingFace Daily: 🤗 Seg-Zero
Data: 🤗 RefCOCOg-2K
Model: 🤗 Seg-Zero-7B

Overview of Seg-Zero:

Seg-Zero demonstrates the following features:

  1. Seg-Zero exhibits emergent test-time reasoning ability. It generates a reasoning chain before producing the final segmentation mask.
  2. Seg-Zero is trained exclusively using reinforcement learning, without any explicit supervised reasoning data.
  3. Compared to supervised fine-tuning, our Seg-Zero achieves superior performance on both in-domain and out-of-domain data.

Highlighted Code Features:

  1. This codebase is built on EasyR1 and veRL, which support splitting the model during sampling and are more GPU-memory friendly.
  2. It supports both the Qwen2-VL and Qwen2.5-VL series of models.
  3. It implements rewards commonly used in object detection and object segmentation, including the IoU reward and the L1 reward (a rough sketch follows this list).
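
The exact reward functions live in the training code; the following is only a minimal sketch of what a box-level IoU reward and a thresholded L1 reward can look like (the threshold value below is an assumption, not the repo's setting).

def iou_reward(pred_box, gt_box):
    # Boxes are (x1, y1, x2, y2); the reward is the intersection-over-union in [0, 1].
    ix1, iy1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    ix2, iy2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_pred = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_pred + area_gt - inter
    return inter / union if union > 0 else 0.0

def l1_reward(pred_box, gt_box, threshold=10.0):
    # Reward 1.0 when the mean absolute coordinate error is below a pixel threshold.
    err = sum(abs(p - g) for p, g in zip(pred_box, gt_box)) / 4.0
    return 1.0 if err < threshold else 0.0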

News

[March 11th, 2025] 🔥 Paper is coming!
[March 8th, 2025] 🔥 Seg-Zero is coming! We have released the code and training data.

Contents

  Model
  Examples
  Installation
  Inference
  Training
  The GRPO Algorithm
  Citation
  Acknowledgement

Model

Seg-Zero employs a decoupled architecture consisting of a reasoning model and a segmentation model. We manually design a sophisticated reward mechanism that integrates both format and accuracy rewards.
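
To make the decoupled flow concrete, here is a minimal sketch: the reasoning-model step below is a hypothetical placeholder that returns dummy prompts (the real prompting and parsing logic is in inference_scripts/infer.py), while the segmentation step uses the public sam2 package API with an example checkpoint name.

import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

def reasoning_model_predict(image, question):
    # Hypothetical placeholder for the reasoning model, which produces a reasoning
    # chain plus localization prompts (a box and points) for the referred object.
    # Dummy values are returned here for illustration only.
    bbox = [100, 100, 400, 400]   # (x1, y1, x2, y2)
    points = [[250, 250]]         # pixel coordinates inside the object
    return bbox, points

image = np.array(Image.open("your_image.jpg").convert("RGB"))
bbox, points = reasoning_model_predict(image, "the unusual object in the image.")

# The segmentation model (SAM2) turns the localization prompts into a mask.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    box=np.array(bbox),
    point_coords=np.array(points, dtype=np.float32),
    point_labels=np.ones(len(points)),
)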

Examples

Installation

git clone https://github.com/dvlab-research/Seg-Zero.git
cd Seg-Zero
conda create -n seg_zero python=3.11
conda activate seg_zero
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1
pip install -e .
pip install sam2
pip install matplotlib

Inference

python inference_scripts/infer.py

The default question is

"the unusal object in the image."

You will see the thinking process in the command line, for example:

"The image shows a bicycle with wheels that have been replaced with large, round objects resembling watermelon slices. The unusual aspect of the image is the substitution of the bicycle wheels with these watermelon-like objects, which is not a typical feature of a bicycle. The rest of the bicycle appears to be a standard design, but the wheels are the focal point of the image."

The resulting mask will be saved in the inference_scripts folder.

You can also provide your own image_path and text by running:

python inference_scripts/infer.py --image_path "your_image_path" --text "your question text"
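
If you want to inspect the saved result programmatically, something like the following works; the filename below is a placeholder, so check inference_scripts for the actual output name written by infer.py.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Placeholder path: adjust to whatever file infer.py actually writes.
mask = mpimg.imread("inference_scripts/mask.png")

plt.imshow(mask, cmap="gray")
plt.axis("off")
plt.show()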

Training

1. GRPO Training

bash training_scripts/run_qwen2_5_3b_refCOCOg.sh

You can try increasing the following hyper-parameters if you have large GPU memory:

worker.actor.micro_batch_size_per_device_for_update=4 or 8 or 16 \
worker.actor.micro_batch_size_per_device_for_experience=4 or 8 or 16 \

If your GPU has less memory, you can change the following configuration. The values depend on your GPU memory:

worker.rollout.tensor_parallel_size=[your number between 1-8]
worker.rollout.gpu_memory_utilization=[your number between 0-1]
worker.rollout.n=[your number between 4-32]

2. Merge Checkpoint in Hugging Face Format

python3 training_scripts/model_merger.py --local_dir [path_to_your_actor_checkpoint]

Tip

If you encounter issues with connecting to Hugging Face, consider using export HF_ENDPOINT=https://hf-mirror.com.
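
After merging, the checkpoint should load like a standard Qwen2.5-VL model through transformers. A minimal sketch, assuming a Qwen2.5-VL base model and a placeholder path:

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

ckpt = "path_to_your_merged_checkpoint"  # placeholder path
processor = AutoProcessor.from_pretrained(ckpt)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    ckpt, torch_dtype="auto", device_map="auto"
)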

The GRPO Algorithm

Seg-Zero generates several samples, computes their rewards, and then optimizes toward the samples that achieve higher rewards.
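
As a simplified sketch (not the exact veRL implementation): for each prompt the policy samples a group of responses, each response receives a scalar reward, and the group-normalized reward is used as the advantage in the policy-gradient update.

import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    # rewards: one scalar per sampled response for the same prompt.
    # Each sample's advantage is its reward normalized within the group,
    # so responses that beat the group average get reinforced.
    r = np.asarray(rewards, dtype=np.float32)
    return (r - r.mean()) / (r.std() + eps)

# Example: 8 rollouts for one prompt (worker.rollout.n controls the group size).
print(grpo_advantages([0.9, 0.1, 0.5, 0.0, 1.0, 0.3, 0.7, 0.2]))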

Tip

To learn more about the GRPO algorithm, you can refer to Hugging Face's blog.

Citation

@article{liu2025segzero,
  title        = {Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement},
  author       = {Liu, Yuqi and Peng, Bohao and Zhong, Zhisheng and Yue, Zihao and Lu, Fanbin and Yu, Bei and Jia, Jiaya},
  journal      = {arXiv preprint arXiv:2503.06520},
  year         = {2025}
}

Acknowledgement

We would like to thank the following repos for their great work: EasyR1, veRL, and SAM2.

