On-Efficient-Variants-of-Segment-Anything-Model

The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications. However, its impressive performance comes with significant computational and resource demands, making it challenging to deploy in resource-limited environments such as mobile devices. To address this, a variety of SAM variants have been proposed to enhance efficiency without sacrificing accuracy. This survey provides the first comprehensive review of these efficient SAM variants. We begin by exploring the motivations driving this research. We then present core techniques used in SAM and model acceleration. This is followed by an in-depth analysis of various acceleration strategies, categorized by approach. Finally, we offer a unified and extensive evaluation of these methods, assessing their efficiency and accuracy on representative benchmarks, and providing a clear comparison of their overall performance.

Efficient SAM Variants

Accelerating SegAny

Segment Anything (SegAny), i.e. the promptable segmentation task, is the foundational task of SAM. Its goal is to return a valid mask for any given prompt (e.g. a point, a box, a mask, or text).

Variants below focus on accelerating SegAny:

Model	Paper	Code	Key Features
FastSAM	arXiv	Github	Reformulate SAM’s pipeline with YOLOv8-Seg for SegEvery, followed by prompt-guided selection for SegAny.
SqueezeSAM	arXiv		Substitute SAM’s architecture with UNet-based encoder-decoder.
EfficientSAM	CVPR2024	Github	Leverage SAMI-pretrained ViT-T/ViT-S as a lightweight image encoder.
RAP-SAM	arXiv	Github	Build on a lite backbone and a unified dynamic convolution decoder, with adapters for multi-purpose segmentation.
SAM 2	arXiv	Github	Apply Hiera as backbone and introduce memory mechanism for video tasks.
MobileSAM	arXiv	Github	Leverage encoder-only distillation from SAM’s ViT to MobileSAM’s TinyViT.
ESAM	ResearchGate		Replace the image encoder with EfficientFormerV2 and conduct holistic distillation from an expert model.
NanoSAM		Github	Distill from MobileSAM with ResNet18 as backbone and optimize with TensorRT.
RepViT-SAM	arXiv	Github	Substitute the image encoder with pure CNN-based RepViT and leverage MobileSAM’s distillation pipeline.
EdgeSAM	arXiv	Github	Substitute SAM’s image encoder with RepViT and adopt a novel prompt-in-the-loop distillation.
EfficientViT-SAM	CVPR2024	Github	Adopt the EfficientViT with ReLU linear attention as backbone and distill it from ViT-H.
FastSAM3D	MICCAI2024	Github	Replace the image encoder with a ViT-Tiny variant and incorporate Dilated Attention and FlashAttention for efficiency.
SAM-Lightening	arXiv		A 2D version of FastSAM3D.
RWKV-SAM	arXiv		Adopt the linear attention model RWKV to build an efficient image encoder.
TinySAM	arXiv	Github	Leverage full-stage distillation with TinyViT as backbone, adopt 8-bit quantization on the encoder to obtain Q-TinySAM, and propose a hierarchical sampling strategy to accelerate the SegEvery task.
PTQ4SAM	CVPR2024	Github	Eliminate the detrimental bimodal distribution and apply adaptive quantization to different distributions.
PQ-SAM	ECCV2024		Transform the activation distribution into a quantization-friendly distribution by truncation, grouping, and learnable transformation.
SlimSAM	NeurIPS2024	Github	Divide the image encoder into two substructures and conduct structured pruning in an alternating manner.
SuperSAM	arXiv		Apply one-shot Neural Architecture Search with pruning-based methods to build a supernetwork of SAM.
SAMfast	PyTorch Blog	Github	A rewritten version of SAM with pure, native PyTorch optimizations.
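
Several entries above (e.g. TinySAM, PTQ4SAM, PQ-SAM) rely on low-bit quantization. As a rough illustration of the basic idea only, here is a minimal sketch of symmetric per-tensor 8-bit quantization in plain Python. It is a generic textbook scheme, not the method of any listed paper, and the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to signed 8-bit integers."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map 8-bit integers back to approximate float values."""
    return [q * scale for q in quantized]


# Illustrative values only (not taken from any model).
weights = [0.50, -1.27, 0.03, 1.00]
q_weights, scale = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale)
```

The rounding error per value is bounded by half the scale, which is why outliers hurt: a single large activation inflates the scale for the whole tensor. This is the kind of distribution problem that quantization-oriented variants try to address with more careful schemes.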

Accelerating SegEvery

Segment Everything (SegEvery), i.e. the all-masks generation task, is an extension of the SegAny task that aims to segment all objects in an image.
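
For context, the original SAM handles SegEvery by prompting the model with a regular grid of points (32×32 by default in its automatic mask generator) and then filtering the resulting masks, so the cost grows with the number of prompts. The sketch below shows only the grid construction, in plain Python; `build_point_grid` here is a stand-alone illustration written for this README, not an import from the SAM codebase, and `to_pixels` is a hypothetical helper.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), normalized to [0, 1]


def build_point_grid(points_per_side: int) -> List[Point]:
    """Return points_per_side**2 point prompts, evenly spaced over the unit square."""
    if points_per_side < 1:
        raise ValueError("points_per_side must be at least 1")
    step = 1.0 / points_per_side
    # Place each point at the center of its grid cell.
    coords = [(i + 0.5) * step for i in range(points_per_side)]
    return [(x, y) for y in coords for x in coords]


def to_pixels(grid: List[Point], width: int, height: int) -> List[Point]:
    """Scale normalized points to pixel coordinates for a given image size."""
    return [(x * width, y * height) for x, y in grid]


grid = build_point_grid(32)           # 1024 prompts for one image
pixel_grid = to_pixels(grid, 1024, 1024)
```

Each of these prompts requires a decoder pass, which is why SegEvery-oriented variants focus on generating fewer, better-placed prompts or on bypassing prompting entirely.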

Variants below focus on accelerating SegEvery:

Model	Paper	Code	Key Features
FastSAM	arXiv	Github	Directly leverage YOLOv8-Seg to segment everything in high efficiency.
MobileSAMV2		Github	Object-aware prompt sampling based on the external YOLOv8 detector.
TinySAM	arXiv	Github	Hierarchical sampling strategy for efficient prompt selection.
Lite-SAM	ECCV2024		LiteViT as lightweight backbone and AutoPPN for efficient prompt generation.
AoP-SAM	OpenReview		Generate prompts iteratively by coarse prediction and fine-grained filtering.

Note: Variants like FastSAM and TinySAM propose efficient strategies for both tasks, so they appear in both lists.
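
To make the link between the two tasks concrete, the following minimal sketch shows how a set of SegEvery masks could serve SegAny through prompt-guided selection, the general idea behind FastSAM's second stage. It is a simplification under stated assumptions: masks are represented as sets of pixels, the "smallest containing mask" tie-break is chosen here for illustration, and the names `select_by_point` and `select_by_box` are hypothetical rather than FastSAM's actual API.

```python
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (x, y)
Mask = Set[Pixel]        # set of foreground pixels


def select_by_point(masks: List[Mask], point: Pixel) -> Optional[Mask]:
    """Return the smallest candidate mask containing the point, or None."""
    hits = [m for m in masks if point in m]
    # Assumption: prefer the most specific (smallest) mask among the hits.
    return min(hits, key=len) if hits else None


def select_by_box(masks: List[Mask], box: Tuple[int, int, int, int]) -> Optional[Mask]:
    """Return the candidate mask with the highest IoU against an inclusive box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    box_pixels = {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}

    def iou(mask: Mask) -> float:
        union = len(mask | box_pixels)
        return len(mask & box_pixels) / union if union else 0.0

    return max(masks, key=iou) if masks else None


# Toy candidates standing in for the output of an all-masks stage.
big = {(x, y) for x in range(4) for y in range(4)}
small = {(1, 1), (1, 2), (2, 1), (2, 2)}
other = {(5, 5), (5, 6)}
candidates = [big, small, other]

picked_point = select_by_point(candidates, (1, 1))
picked_box = select_by_box(candidates, (0, 0, 3, 3))
```

Because selection is a cheap lookup over precomputed masks, the per-prompt cost is small once the all-masks stage has run.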

Citation

  @article{sun2024efficientvariantssegmentmodel,
        title={On Efficient Variants of Segment Anything Model: A Survey}, 
        author={Xiaorui Sun and Jun Liu and Heng Tao Shen and Xiaofeng Zhu and Ping Hu},
        journal={arXiv preprint arXiv:2410.04960},
        year={2024}
  }
