Feature Request: Add TPU/Hardware Accelerator Support (e.g., Google Coral, Hailo) to llama.cpp #11603
Feature Description
I propose adding hardware acceleration support for AI-focused chips such as TPUs (e.g., Google Coral) and Hailo accelerators to llama.cpp. This would let users leverage dedicated AI hardware for faster LLM inference (e.g., LLaMA) on edge devices such as the Raspberry Pi and other low-power setups.
Motivation
Possible Implementation
1. Google Coral (Edge TPU) Integration
- Use `libedgetpu` (GitHub), Google's open-source library for interacting with Edge TPUs.
- Compile models for the Edge TPU with the `edgetpu_compiler` tool.
- Offload inference through the `libedgetpu` APIs (see the sketch below).
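As a rough illustration of what the Coral path could look like, here is a minimal, self-contained sketch using TensorFlow Lite's C++ API with the libedgetpu delegate. It assumes a model already compiled by `edgetpu_compiler` (the file name `model_edgetpu.tflite` is a placeholder); how llama.cpp/ggml graphs would be converted into such a model is the open question and is not shown.

```cpp
// Sketch: run a pre-compiled Edge TPU model via the libedgetpu delegate.
// The TFLite and edgetpu_c.h calls are real; the model file is hypothetical.
#include <cstdio>
#include <memory>

#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "edgetpu_c.h"  // C API shipped with libedgetpu

int main() {
    // Load a model already compiled with edgetpu_compiler (placeholder name).
    auto model = tflite::FlatBufferModel::BuildFromFile("model_edgetpu.tflite");
    if (!model) { fprintf(stderr, "failed to load model\n"); return 1; }

    // Enumerate attached Edge TPUs and create a delegate for the first one.
    size_t num_devices = 0;
    struct edgetpu_device* devices = edgetpu_list_devices(&num_devices);
    if (num_devices == 0) { fprintf(stderr, "no Edge TPU found\n"); return 1; }
    TfLiteDelegate* delegate =
        edgetpu_create_delegate(devices[0].type, devices[0].path, nullptr, 0);

    // Build an interpreter and hand the graph to the Edge TPU delegate;
    // ops the delegate cannot handle fall back to the CPU.
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);
    interpreter->ModifyGraphWithDelegate(delegate);
    interpreter->AllocateTensors();

    // ... fill input tensors here, then run:
    interpreter->Invoke();

    edgetpu_free_delegate(delegate);
    edgetpu_free_devices(devices);
    return 0;
}
```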
2. Hailo Integration
- Use `hailort` (GitHub), Hailo's runtime library for deploying models on Hailo accelerators.
- Integrate with `hailort` and manage inference pipelines for low-latency execution (a sketch follows).
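For the Hailo side, here is a minimal device bring-up sketch against HailoRT's C++ API (as documented for hailort 4.x, which may differ by version). It assumes a network pre-compiled to a HEF by Hailo's Dataflow Compiler; `model.hef` is a placeholder name, and compiling transformer blocks to a HEF is the hard, unshown part.

```cpp
// Sketch: load a pre-compiled HEF on a Hailo-8/8L through HailoRT.
#include <cstdio>
#include "hailo/hailort.hpp"

int main() {
    // A virtual device abstracts one or more physical Hailo accelerators.
    auto vdevice = hailort::VDevice::create();
    if (!vdevice) {
        fprintf(stderr, "no Hailo device, status=%d\n", vdevice.status());
        return 1;
    }

    // HEF = Hailo Executable Format, produced offline by the Dataflow Compiler.
    auto hef = hailort::Hef::create("model.hef");  // placeholder file name
    if (!hef) { fprintf(stderr, "failed to load HEF\n"); return 1; }

    // Configure the device with the network(s) contained in the HEF.
    auto network_groups = vdevice.value()->configure(hef.value());
    if (!network_groups || network_groups.value().empty()) { return 1; }
    auto &net_group = network_groups.value().at(0);

    // From here a backend would build input/output virtual streams
    // (hailort::InferVStreams) and push activations through the chip.
    (void)net_group;
    return 0;
}
```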
3. Unified Hardware Abstraction
- Add CLI flags (e.g., `--tpu`, `--hailo`) to let users select the accelerator at runtime (a dispatch sketch follows).
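The abstraction layer itself could start as a simple probe-and-fallback dispatcher behind the proposed flags. The sketch below is purely illustrative: `accel_backend` and the concrete backend types are invented names that do not exist in llama.cpp today, and a real integration would more likely hook into ggml's existing backend machinery.

```cpp
// Sketch: parse --tpu / --hailo and dispatch to a backend behind a
// common interface, falling back to CPU when the device is absent.
#include <cstdio>
#include <cstring>
#include <memory>

struct accel_backend {                   // hypothetical common interface
    virtual ~accel_backend() = default;
    virtual const char *name() const = 0;
    virtual bool supported() const = 0;  // probe for the device at runtime
};

struct cpu_backend : accel_backend {
    const char *name() const override { return "cpu"; }
    bool supported() const override { return true; }
};
struct coral_backend : accel_backend {
    const char *name() const override { return "coral"; }
    bool supported() const override { return false; /* probe via libedgetpu */ }
};
struct hailo_backend : accel_backend {
    const char *name() const override { return "hailo"; }
    bool supported() const override { return false; /* probe via hailort */ }
};

static std::unique_ptr<accel_backend> select_backend(int argc, char **argv) {
    std::unique_ptr<accel_backend> be = std::make_unique<cpu_backend>();
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--tpu")   == 0) be = std::make_unique<coral_backend>();
        if (strcmp(argv[i], "--hailo") == 0) be = std::make_unique<hailo_backend>();
    }
    if (!be->supported()) {
        fprintf(stderr, "%s backend unavailable, falling back to CPU\n", be->name());
        be = std::make_unique<cpu_backend>();
    }
    return be;
}

int main(int argc, char **argv) {
    auto be = select_backend(argc, argv);
    printf("using backend: %s\n", be->name());
    return 0;
}
```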
4. Cross-Platform Support
Use Case Examples
Testing Availability
I will soon acquire the Raspberry Pi AI Kit with Hailo-8L and can act as a tester for the Hailo integration. I should be able to start testing within a few weeks. My setup will include a Raspberry Pi 5 with 8 GB (or even 16 GB) RAM, and I plan to test models like LLaMA and DeepSeek for tasks such as text generation and chatbot applications.