Feat: Pre-quantized LLM model support #3740

keehyuna · 2025-08-01T00:02:11Z

Description

Add support for pre-quantized Hugging Face (HF) models and a post-training quantization (PTQ) option in run_llm.py

Fixes # (issue)

My code follows the style guidelines of this project (You can use the linters)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas and hacks
I have made corresponding changes to the documentation
I have added tests to verify my fix or my feature
New and existing unit tests pass locally with my changes
I have added the relevant labels to my PR so that relevant reviewers are notified

fp8 pre-quantized model support

208df56

meta-cla bot added the cla signed label Aug 1, 2025

chore: add nvfp4 quantization

0b0c3fc

keehyuna self-assigned this Aug 6, 2025

chore: clean up

df34e8b

keehyuna changed the title ~~fp8 pre-quantized model support~~ Pre-quantized model support Aug 7, 2025

keehyuna changed the title ~~Pre-quantized model support~~ Feat: Pre-quantized LLM model support Aug 7, 2025

keehyuna marked this pull request as ready for review August 7, 2025 12:39

keehyuna requested review from narendasan and peri044 and removed request for narendasan August 8, 2025 06:44