Harold H. Chen1,2*, Xianfeng Wu3*, Wen-Jie Shu2, Rongjin Guo4, Disen Lan5, Harry Yang2, Ying-Cong Chen1,2†
*Equal Contribution; †Corresponding Author
1HKUST(GZ), 2HKUST, 3PolyU, 4CityUHK, 5FDU
Test-time scaling (TTS) has demonstrated remarkable success in enhancing large language models, yet its application to next-token prediction (NTP) autoregressive (AR) image generation remains largely uncharted. Existing TTS approaches for visual AR (VAR), which rely on frequent partial decoding and external reward models, are ill-suited for NTP-based image generation due to the inherent incompleteness of intermediate decoding results. To bridge this gap, we introduce ScalingAR, the first TTS framework specifically designed for NTP-based AR image generation that eliminates the need for early decoding or auxiliary rewards. ScalingAR leverages token entropy as a novel signal in visual token generation and operates at two complementary scaling levels: (i) Profile Level, which streams a calibrated confidence state by fusing intrinsic and conditional signals; and (ii) Policy Level, which utilizes this state to adaptively terminate low-confidence trajectories and dynamically schedule guidance for phase-appropriate conditioning strength. Experiments on both general and compositional benchmarks show that ScalingAR (1) improves base models by 12.5% on GenEval and 15.2% on TIIF-Bench, (2) efficiently reduces visual token consumption by 62.0% while outperforming baselines, and (3) successfully enhances robustness, mitigating performance drops by 26.0% in challenging scenarios.
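The entropy signal at the heart of the Profile Level can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's actual method: the function names, the fixed threshold, and the sliding-window termination rule are ours, and the real calibrated confidence state fuses intrinsic and conditional signals rather than using raw entropy alone.

```python
import math

def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits.

    High entropy = the model is uncertain about the next visual token;
    low entropy = a confident prediction.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_terminate(entropies, threshold=2.5, window=4):
    """Illustrative Policy-Level rule (our assumption, not the paper's):
    stop a trajectory when the mean entropy over the last `window`
    tokens exceeds `threshold`, i.e. confidence has stayed low."""
    if len(entropies) < window:
        return False
    recent = entropies[-window:]
    return sum(recent) / window > threshold
```

In this sketch, a trajectory whose recent tokens are all high-entropy would be pruned early, which is the intuition behind terminating low-confidence trajectories without decoding intermediate images.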
Please download the models and put them in the folder `./pretrained_models`.
| Method | params | tokens | weight |
|---|---|---|---|
| vq_ds16_t2i | 72M | 16x16 | vq_ds16_t2i.pt |
```shell
huggingface-cli download --resume-download google/flan-t5-xl --local-dir google/flan-t5-xl
```
| Method | params | tokens | weight |
|---|---|---|---|
| LlamaGen-XL | 775M | 32x32 | t2i_XL_stage2_512.pt |
```shell
huggingface-cli download --resume-download CSshihao/AR-GRPO_T2I_XL_256 --local-dir CSshihao/AR-GRPO_T2I_XL_256
```
- Clone this repository and navigate to the source folder

```shell
cd ScalingAR
```
- Build Environment

```shell
echo "Creating conda environment"
conda create -n ScalingAR python=3.10
conda activate ScalingAR

echo "Installing dependencies"
pip install -r requirements.txt
```
```shell
PYTHONPATH=. python llamagen/sample_entropy.py --vq-ckpt ${VQ_CKPT} --gpt-ckpt ${LlamaGen_CKPT} --gpt-model GPT-XL --t5-path ${T5_PATH} --image-size 512
```
```shell
PYTHONPATH=. python AR_GRPO/sample_entropy.py --ckpt-path ${AR_GRPO_CKPT} --t5-path ${T5_PATH} --delay_load_text_encoder True --image-size 256
```
Please consider citing our paper if you find our code useful:
```bibtex
@article{chen2025go,
  title={Go with Your Gut: Scaling Confidence for Autoregressive Image Generation},
  author={Chen, Harold Haodong and Wu, Xianfeng and Shu, Wen-Jie and Guo, Rongjin and Lan, Disen and Yang, Harry and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2509.26376},
  year={2025}
}
```
ScalingAR is developed on the codebases of LlamaGen and AR-GRPO; we thank the developers of both.