bhllx/On-Efficient-Variants-of-Segment-Anything-Model

On-Efficient-Variants-of-Segment-Anything-Model

[Paper]

The Segment Anything Model (SAM) is a foundational model for image segmentation tasks, known for its strong generalization across diverse applications. However, its impressive performance comes with significant computational and resource demands, making it challenging to deploy in resource-limited environments such as mobile devices. To address this, a variety of SAM variants have been proposed to enhance efficiency without sacrificing accuracy. This survey provides the first comprehensive review of these efficient SAM variants. We begin by exploring the motivations driving this research. We then present core techniques used in SAM and model acceleration. This is followed by an in-depth analysis of various acceleration strategies, categorized by approach. Finally, we offer a unified and extensive evaluation of these methods, assessing their efficiency and accuracy on representative benchmarks, and providing a clear comparison of their overall performance.

Taxonomy

Efficient SAM Variants

Accelerating SegAny

Segment Anything (SegAny), i.e., the promptable segmentation task, is the foundational task of SAM: given any prompt (e.g., a point, a box, a mask, or text), the goal is to return a valid mask.
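The SegAny contract ("any prompt in, one valid mask out") can be illustrated with a toy sketch. Everything here is hypothetical: the `Prompt` container, the `toy_segment` function, and the use of a precomputed integer label map as a stand-in for a real model are illustrative assumptions, not SAM's API or decoding procedure. Text prompts are omitted because they would require a text encoder.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

import numpy as np


@dataclass
class Prompt:
    """Hypothetical container for SegAny prompts (text prompts omitted)."""
    points: Optional[Sequence[Tuple[int, int]]] = None  # (x, y) pixel coordinates
    point_labels: Optional[Sequence[int]] = None        # 1 = foreground, 0 = background
    box: Optional[Tuple[int, int, int, int]] = None     # (x0, y0, x1, y1), inclusive
    mask: Optional[np.ndarray] = None                   # boolean H x W prior mask


def _iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def toy_segment(label_map: np.ndarray, prompt: Prompt) -> np.ndarray:
    """Map any combination of prompts to exactly one boolean mask.

    Toy stand-in for a promptable model: each integer id in `label_map` is one
    candidate mask, and every prompt votes for the candidates it agrees with.
    """
    if prompt.points is None and prompt.box is None and prompt.mask is None:
        raise ValueError("SegAny needs at least one prompt")
    candidates = {int(r): label_map == r for r in np.unique(label_map)}
    scores = dict.fromkeys(candidates, 0.0)
    if prompt.points is not None:
        labels = prompt.point_labels if prompt.point_labels is not None else [1] * len(prompt.points)
        for (x, y), lab in zip(prompt.points, labels):
            # Foreground clicks add a vote for the clicked region; background clicks subtract one.
            scores[int(label_map[y, x])] += 1.0 if lab == 1 else -1.0
    if prompt.box is not None:
        x0, y0, x1, y1 = prompt.box
        box_mask = np.zeros(label_map.shape, dtype=bool)
        box_mask[y0:y1 + 1, x0:x1 + 1] = True
        for r, m in candidates.items():
            scores[r] += _iou(m, box_mask)        # reward candidates that fill the box
    if prompt.mask is not None:
        for r, m in candidates.items():
            scores[r] += _iou(m, prompt.mask.astype(bool))
    best = max(scores, key=scores.get)
    return candidates[best]
```

For example, with `label_map = np.zeros((4, 6), dtype=int)` and `label_map[1:3, 3:6] = 1`, a single click `Prompt(points=[(4, 1)])` and a box `Prompt(box=(3, 1, 5, 2))` both resolve to the 6-pixel region with id 1. Whatever the prompt type, the output is always a single mask.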

Variants below focus on accelerating SegAny:

| Model | Paper | Code | Key Features |
| --- | --- | --- | --- |
| FastSAM | arXiv | GitHub | Reformulate SAM's pipeline with YOLOv8-Seg for SegEvery, followed by prompt-guided selection for SegAny. |
| SqueezeSAM | arXiv | | Replace SAM's architecture with a UNet-based encoder-decoder. |
| EfficientSAM | CVPR2024 | GitHub | Leverage SAMI-pretrained ViT-T/ViT-S as a lightweight image encoder. |
| RAP-SAM | arXiv | GitHub | Build on a lightweight backbone and a unified dynamic-convolution decoder, with adapters for multi-purpose segmentation. |
| SAM 2 | arXiv | GitHub | Adopt Hiera as the backbone and introduce a memory mechanism for video tasks. |
| MobileSAM | arXiv | GitHub | Perform encoder-only distillation from SAM's ViT into MobileSAM's TinyViT. |
| ESAM | ResearchGate | | Replace the image encoder with EfficientFormerV2 and conduct holistic distillation from an expert model. |
| NanoSAM | | GitHub | Distill from MobileSAM into a ResNet18 backbone and optimize with TensorRT. |
| RepViT-SAM | arXiv | GitHub | Replace the image encoder with the pure-CNN RepViT and reuse MobileSAM's distillation pipeline. |
| EdgeSAM | arXiv | GitHub | Replace SAM's image encoder with RepViT and adopt a novel prompt-in-the-loop distillation. |
| EfficientViT-SAM | CVPR2024 | GitHub | Adopt EfficientViT with ReLU linear attention as the backbone and distill it from ViT-H. |
| FastSAM3D | MICCAI2024 | GitHub | Replace the image encoder with a ViT-Tiny variant and incorporate dilated attention and FlashAttention for efficiency. |
| SAM-Lightening | arXiv | | A 2D version of FastSAM3D. |
| RWKV-SAM | arXiv | | Adopt the linear-attention model RWKV to build an efficient image encoder. |
| TinySAM | arXiv | GitHub | Leverage full-stage distillation with TinyViT as the backbone, apply 8-bit quantization to the encoder to obtain Q-TinySAM, and propose a hierarchical sampling strategy to accelerate SegEvery. |
| PTQ4SAM | CVPR2024 | GitHub | Eliminate the detrimental bimodal distribution and apply adaptive quantization to different distributions. |
| PQ-SAM | ECCV2024 | | Transform activations into a quantization-friendly distribution via truncation, grouping, and a learnable transformation. |
| SlimSAM | NeurIPS2024 | GitHub | Divide the image encoder into two substructures and apply structured pruning to them in an alternating manner. |
| SuperSAM | arXiv | | Apply one-shot Neural Architecture Search with pruning-based methods to build a SAM supernetwork. |
| SAMfast | PyTorch Blog | GitHub | A rewritten version of SAM using pure, native PyTorch optimizations. |
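Many entries above (MobileSAM, RepViT-SAM, EfficientViT-SAM, NanoSAM and others) rely on distilling SAM's heavy image encoder into a lighter one. The toy NumPy sketch below shows the core idea of encoder-only distillation: a frozen teacher produces embeddings once, and a student is trained to match them. The mean-squared-error loss, the random "images", the two-layer ReLU teacher, and the linear student are all illustrative assumptions, not any listed method's actual architecture or training recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: inputs are random feature vectors standing in for images,
# the frozen "teacher" encoder is a fixed two-layer ReLU network, and the
# lightweight "student" encoder is a single linear map with the same output size.
N, D_IN, D_HID, D_EMB = 256, 32, 128, 16
X = rng.normal(size=(N, D_IN))
W1_T = rng.normal(size=(D_IN, D_HID)) / np.sqrt(D_IN)
W2_T = rng.normal(size=(D_HID, D_EMB)) / np.sqrt(D_HID)


def teacher(x: np.ndarray) -> np.ndarray:
    """Frozen heavy encoder: its weights are never updated."""
    return np.maximum(x @ W1_T, 0.0) @ W2_T


def distill(x: np.ndarray, steps: int = 300, lr: float = 1.0):
    """Fit the student to the teacher's embeddings with a mean-squared-error loss.

    Only encoder outputs are matched; no masks, prompts, or decoder are involved.
    """
    target = teacher(x)                      # teacher embeddings, computed once
    w_student = np.zeros((x.shape[1], target.shape[1]))
    losses = []
    for _ in range(steps):
        err = x @ w_student - target
        losses.append(float(np.mean(err ** 2)))
        grad = 2.0 * x.T @ err / err.size    # gradient of the mean squared error
        w_student -= lr * grad               # plain gradient descent step
    return w_student, losses


w_student, losses = distill(X)
print(f"MSE to teacher: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the teacher's embeddings are computed once and no decoder is needed, this style of distillation is cheap to run. A linear student cannot reproduce the ReLU teacher exactly, so the loss decreases but does not reach zero, which mirrors the accuracy gap a lighter encoder typically has to close.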

Accelerating SegEvery

Segment Everything (SegEvery), i.e., the all-masks generation task, extends the SegAny task and aims to segment every object in an image.
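One common way to build SegEvery on top of SegAny is to issue a dense grid of point prompts and then discard duplicate masks. Because every grid point costs one decoder call, the cost grows with the square of the grid side. The toy sketch below assumes a regular grid, uses a label-map lookup as a hypothetical stand-in for the promptable decoder, and uses greedy mask-IoU deduplication as a simplified substitute for score-ordered NMS. All function names are illustrative.

```python
import numpy as np


def grid_points(h: int, w: int, n_per_side: int):
    """Centers of an n x n regular grid over an h x w image, as (x, y) ints."""
    xs = (np.arange(n_per_side) + 0.5) * w / n_per_side
    ys = (np.arange(n_per_side) + 0.5) * h / n_per_side
    return [(int(x), int(y)) for y in ys for x in xs]


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def segment_everything(label_map: np.ndarray, n_per_side: int = 4, iou_thresh: float = 0.9):
    """Toy SegEvery: one point prompt per grid cell, then drop duplicate masks.

    The label-map lookup stands in for a promptable decoder call.
    """
    h, w = label_map.shape
    masks, calls = [], 0
    for x, y in grid_points(h, w, n_per_side):
        calls += 1                                   # one decoder call per prompt
        candidate = label_map == label_map[y, x]
        # Greedy deduplication: keep the mask only if it overlaps no kept mask too much.
        if all(mask_iou(candidate, m) < iou_thresh for m in masks):
            masks.append(candidate)
    return masks, calls
```

On an 8 x 8 map with three regions, a 4 x 4 grid spends 16 decoder calls to recover just 3 distinct masks. The SegEvery variants listed below attack exactly this redundancy by generating fewer, better-placed prompts.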

Variants below focus on accelerating SegEvery:

| Model | Paper | Code | Key Features |
| --- | --- | --- | --- |
| FastSAM | arXiv | GitHub | Directly leverage YOLOv8-Seg to segment everything efficiently. |
| MobileSAMV2 | | GitHub | Object-aware prompt sampling based on an external YOLOv8 detector. |
| TinySAM | arXiv | GitHub | Hierarchical sampling strategy for efficient prompt selection. |
| Lite-SAM | ECCV2024 | | LiteViT as a lightweight backbone and AutoPPN for efficient prompt generation. |
| AoP-SAM | OpenReview | | Generate prompts iteratively through coarse prediction and fine-grained filtering. |
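Object-aware prompt sampling (as in MobileSAMV2 above) replaces the dense point grid with a small set of box prompts derived from detector outputs. The sketch below shows this under stated assumptions: the detector boxes and scores are hypothetical hard-coded values, standard greedy box NMS removes overlapping detections, and the function name `object_aware_box_prompts` and its `max_prompts` parameter are illustrative, not MobileSAMV2's code.

```python
import numpy as np


def box_iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """IoU between one box and an (M, 4) array of boxes, all as (x0, y0, x1, y1)."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / np.maximum(area + areas - inter, 1e-9)  # guard against zero-area boxes


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy non-maximum suppression; returns kept indices in descending score order."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        best = int(order[0])
        keep.append(best)
        rest = order[1:]
        # Drop every remaining box that overlaps the current best box too strongly.
        order = rest[box_iou(boxes[best], boxes[rest]) < iou_thresh]
    return keep


def object_aware_box_prompts(boxes, scores, iou_thresh=0.5, max_prompts=None):
    """Turn detector boxes into a small set of box prompts for the mask decoder."""
    keep = nms(boxes, scores, iou_thresh)
    if max_prompts is not None:
        keep = keep[:max_prompts]
    return boxes[keep]


# Hypothetical detector output: two near-duplicate boxes and one separate object.
boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
prompts = object_aware_box_prompts(boxes, scores)
```

Here only 2 box prompts reach the decoder, compared with, for example, 1,024 point prompts from a 32 x 32 grid. The trade-off is that objects the detector misses receive no prompt at all.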

Note: Variants such as FastSAM and TinySAM propose efficient strategies for both tasks, so they appear in both lists.

Citation

  @article{sun2024efficientvariantssegmentmodel,
        title={On Efficient Variants of Segment Anything Model: A Survey},
        author={Xiaorui Sun and Jun Liu and Heng Tao Shen and Xiaofeng Zhu and Ping Hu},
        journal={arXiv preprint arXiv:2410.04960},
        year={2024}
  }