Quantize TinyLlama-1.1B-Chat from PyTorch to CoreML (float16, int8, int4) for efficient on-device inference on iOS 18+.
Updated Jul 18, 2025 · Python
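As a rough illustration of what the int8 and int4 options in the description trade off, the sketch below applies symmetric linear quantization to a handful of weights and measures the round-trip error. It is not taken from the repository and does not use the Core ML toolchain: the function names, the sample weights, and the single per-tensor scale are all assumptions made for the example. float16 is a plain precision cast rather than integer quantization, so it is not shown.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed integers in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integers and the shared scale."""
    return [v * scale for v in q]


def max_error(weights, bits):
    """Largest absolute difference between original and round-tripped weights."""
    q, scale = quantize_symmetric(weights, bits)
    restored = dequantize(q, scale)
    return max(abs(a - b) for a, b in zip(weights, restored))


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.9, -0.33]

if __name__ == "__main__":
    for bits in (8, 4):
        print(f"int{bits}: max round-trip error = {max_error(weights, bits):.4f}")
```

With fewer bits the step size (`scale`) grows, so the worst-case rounding error, which is at most half a step, grows with it. That is the accuracy cost paid for the smaller on-device footprint.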