TaylorSeer: From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers

🔥 News

  • 2025/03/10 🚀🚀 Our latest work "From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers" is released! Code is available at TaylorSeer! TaylorSeer supports lossless compression at a rate of 4.99x on FLUX.1-dev (with a latency speedup of 3.53x) and high-quality acceleration at a compression rate of 5.00x on HunyuanVideo (with a latency speedup of 4.65x)! We hope TaylorSeer can move the paradigm of feature caching methods from reusing to forecasting. For more details, please refer to our latest research paper.
  • 2025/02/19 🚀🚀 ToCa solution for FLUX has been officially released after adjustments, now achieving up to 3.14× lossless acceleration (in FLOPs)!
  • 2025/01/22 💥💥 ToCa is honored to be accepted by ICLR 2025!
  • 2024/12/29 🚀🚀 We release our work DuCa about accelerating diffusion transformers for FREE, which achieves nearly lossless acceleration of 2.50× on OpenSora! 🎉 DuCa also overcomes the limitation of ToCa by fully supporting FlashAttention, enabling broader compatibility and efficiency improvements.
  • 2024/12/24 🤗🤗 We release an open-source repo "Awesome-Token-Reduction-for-Model-Compression", which collects recent awesome token reduction papers! Feel free to contribute your suggestions!
  • 2024/12/10 💥💥 Our team's recent work, SiTo (https://github.com/EvelynZhang-epiclab/SiTo), has been accepted to AAAI 2025. It accelerates diffusion models through adaptive Token Pruning.
  • 2024/07/15 🤗🤗 We release an open-source repo "Awesome-Generation-Acceleration", which collects recent awesome generation acceleration papers! Feel free to contribute your suggestions!
Abstract

Diffusion Transformers (DiT) have revolutionized high-fidelity image and video synthesis, yet their computational demands remain prohibitive for real-time applications. To address this, feature caching has been proposed to accelerate diffusion models by caching features at previous timesteps and reusing them at subsequent timesteps. However, at timesteps separated by significant intervals, feature similarity in diffusion models decreases substantially, leading to a pronounced increase in the errors introduced by feature caching and significantly harming generation quality. To solve this problem, we propose TaylorSeer, which first shows that features of diffusion models at future timesteps can be predicted from their values at previous timesteps. Based on the fact that features change slowly and continuously across timesteps, TaylorSeer employs a differential method to approximate the higher-order derivatives of features and predicts features at future timesteps with a Taylor series expansion. Extensive experiments demonstrate its significant effectiveness in both image and video synthesis, especially at high acceleration ratios. For instance, it achieves an almost lossless acceleration of 4.99 $\times$ on FLUX and 5.00 $\times$ on HunyuanVideo without additional training. On DiT, it achieves $3.41$ lower FID compared with the previous SOTA at $4.53$ $\times$ acceleration.
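
To make the forecasting idea above concrete, here is a minimal, self-contained sketch of Taylor-series feature forecasting. It is illustrative only, with hypothetical class and method names rather than the repository's API: features are fully computed every few timesteps, higher-order derivatives are approximated with backward finite differences over the cached values, and features at the skipped timesteps are predicted with a truncated Taylor expansion.

```python
import math
from collections import deque

class TaylorForecaster:
    """Illustrative sketch (hypothetical names): forecast diffusion features
    at skipped timesteps from a short history of fully computed ones."""

    def __init__(self, order=2, interval=4):
        self.order = order        # highest derivative kept in the Taylor expansion
        self.interval = interval  # timestep gap between full feature computations
        self.history = deque(maxlen=order + 1)  # most recent fully computed features

    def update(self, feature):
        # Store a fully computed feature (e.g. a transformer block output tensor).
        self.history.append(feature)

    def _backward_differences(self):
        # diffs[i] approximates the i-th derivative scaled by interval**i.
        diffs = [self.history[-1]]
        level = list(self.history)
        for _ in range(len(self.history) - 1):
            level = [b - a for a, b in zip(level[:-1], level[1:])]
            diffs.append(level[-1])
        return diffs

    def predict(self, steps_ahead):
        # Truncated Taylor expansion around the last fully computed timestep.
        x = steps_ahead / self.interval
        return sum(d * x**i / math.factorial(i)
                   for i, d in enumerate(self._backward_differences()))

# Toy usage with scalar "features"; in practice these would be feature tensors.
forecaster = TaylorForecaster(order=2, interval=2)
for value in (1.00, 1.21, 1.44):
    forecaster.update(value)
print(forecaster.predict(1), forecaster.predict(2))
```

Skipped timesteps reuse the predicted features instead of recomputing the expensive transformer blocks, which is where the reported compression and latency gains come from.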

🛠 Installation

git clone https://github.com/Shenyi-Z/TaylorSeer.git

TaylorSeer-FLUX

TaylorSeer achieved a lossless computational compression of 4.99 $\times$ and a latency speedup of 3.53 $\times$ on FLUX.1-dev, with overall image quality measured by ImageReward. To run TaylorSeer-FLUX, see TaylorSeer-FLUX.
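
For reference, the image-quality numbers above come from ImageReward. A minimal evaluation sketch, assuming the third-party `image-reward` pip package and placeholder prompts/paths (not files shipped with this repository), might look like:

```python
# Hedged sketch: assumes `pip install image-reward`; the prompt and image paths
# below are placeholders, not artifacts provided by the TaylorSeer repository.
import ImageReward as RM

model = RM.load("ImageReward-v1.0")  # downloads the reward model on first use
prompt = "a photo of an astronaut riding a horse on Mars"
scores = model.score(prompt, ["baseline.png", "taylorseer.png"])  # higher is better
print(scores)
```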

TaylorSeer-HunyuanVideo

TaylorSeer achieved a computational compression of 5.00 $\times$ and a remarkable latency speedup of 4.65 $\times$ on HunyuanVideo, as comprehensively measured by VBench. Compared to previous methods, it demonstrated significant improvements in both acceleration efficiency and quality. To run TaylorSeer-HunyuanVideo, see TaylorSeer-HunyuanVideo.

TaylorSeer-DiT

TaylorSeer achieved a lossless computational compression of 2.77 $\times$ on the base model DiT, as comprehensively evaluated by metrics such as FID. Its performance across various acceleration ratios significantly surpassed previous methods. For instance, in an extreme scenario with a 4.53 $\times$ compression ratio, TaylorSeer's FID only increased by 0.33 over the non-accelerated baseline of 2.32, reaching 2.65, while ToCa and DuCa exhibited FID scores above 6.0 under the same conditions. To run TaylorSeer-DiT, see TaylorSeer-DiT.
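
The FID numbers above compare generated samples against the evaluation set's reference distribution. As a generic illustration only (using the third-party `clean-fid` package and placeholder directories, not necessarily the evaluation script used in this repository):

```python
# Hedged sketch: assumes `pip install clean-fid`; directory names are placeholders.
from cleanfid import fid

score = fid.compute_fid("reference_images/", "generated_images/")
print(f"FID: {score:.2f}")
```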

👍 Acknowledgements

  • Thanks to DiT for their great work and codebase upon which we build TaylorSeer-DiT.
  • Thanks to FLUX for their great work and codebase upon which we build TaylorSeer-FLUX.
  • Thanks to HunyuanVideo for their great work and codebase upon which we build TaylorSeer-HunyuanVideo.
  • Thanks to ImageReward for Text-to-Image quality evaluation.
  • Thanks to VBench for Text-to-Video quality evaluation.

📌 Citation

@article{TaylorSeer2025,
  title={From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers},
  author={Liu, Jiacheng and Zou, Chang and Lyu, Yuanhuiyi and Chen, Junjie and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2503.06923},
  year={2025}
}

📧 Contact

If you have any questions, please email shenyizou@outlook.com.
