diff --git a/content/learning-paths/servers-and-cloud-computing/llama-cpu/_demo.md b/content/learning-paths/servers-and-cloud-computing/llama-cpu/_demo.md
index 7f23df506..b5c230689 100644
--- a/content/learning-paths/servers-and-cloud-computing/llama-cpu/_demo.md
+++ b/content/learning-paths/servers-and-cloud-computing/llama-cpu/_demo.md
@@ -3,7 +3,7 @@ title: Run a llama.cpp chatbot powered by Arm Kleidi technology
 overview: |
   This Arm Kleidi learning path shows how to use a single AWS Graviton instance -- powered by an Arm Neoverse CPU -- to build a simple “Token as a Service” server, used below to provide a chat-bot to serve a small number of concurrent users.
-  This architecture would be suitable for businesses looking to deploy the latest Generative AI technologies using their existing CPU compute capacity and deployment pipelines. The demo uses the open source llama.cpp framework, which Arm has enhanced by contributing and merged the latest Arm Kleidi Technologies. Further optimizations are achieved by using the smaller 8 billion parameter Llama 3.1 model, which has been quantized to optimize memory usage.
+  This architecture would be suitable for businesses looking to deploy the latest Generative AI technologies using their existing CPU compute capacity and deployment pipelines. The demo uses the open source llama.cpp framework, which Arm has enhanced by contributing the latest Arm Kleidi Technologies. Further optimizations are achieved by using the smaller 8 billion parameter Llama 3.1 model, which has been quantized to optimize memory usage.
 
   Chat with the Llama-3.1-8B LLM below to see the performance for yourself, then follow the learning path to build your own Generative AI service on Arm Neoverse.