llama-cpp
Here are 62 public repositories matching this topic...
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
Updated May 21, 2025 - Python
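The quoted ~59% reduction can be sanity-checked with back-of-the-envelope arithmetic, assuming ggml-style quantized blocks (32 elements per block with a 2-byte fp16 scale, the standard q8_0/q4_0 layout; this is an illustration, not KVSplit's actual code):

```python
# Bits per element for fp16 vs. differentiated KV-cache quantization.
# Assumes ggml-style blocks: 32 elements per block, 2-byte fp16 scale per block.
BLOCK = 32

fp16_bits = 16                             # baseline precision for both K and V
q8_0_bits = (BLOCK * 1 + 2) * 8 / BLOCK    # 8-bit keys + scale  -> 8.5 bits/elem
q4_0_bits = (BLOCK // 2 + 2) * 8 / BLOCK   # 4-bit values + scale -> 4.5 bits/elem

baseline = 2 * fp16_bits                   # K + V at fp16: 32 bits per K/V pair
split = q8_0_bits + q4_0_bits              # 13 bits per K/V pair

savings = 1 - split / baseline
print(f"{savings:.1%}")  # -> 59.4%, consistent with the quoted ~59% reduction
```

The asymmetry (more bits for keys than values) reflects the common observation that attention quality is more sensitive to key precision than to value precision.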
This repo showcases how to run a model locally and offline, free of OpenAI dependencies.
Updated Jul 12, 2024 - Python
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
Updated Jun 10, 2023 - Python
BabyAGI-🦙: Enhanced for Llama models (running 100% locally) with persistent memory, smart internet search based on BabyCatAGI, and document embedding in LangChain based on privateGPT.
Updated Jun 4, 2023 - Python
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.
Updated May 17, 2024 - Python
📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy
Updated Mar 24, 2025 - Python
◉ Universal Intelligence: AI made simple.
Updated Jun 11, 2025 - Python
Local character AI chatbot with a Chroma vector store for memory, plus scripts to process documents for Chroma.
Updated Oct 7, 2024 - Python
✨ Your Custom Offline Role-Play with LLM and Stable Diffusion on Mac and Linux (for now) 🧙‍♂️
Updated Nov 21, 2023 - Python
DocMind AI is a powerful, open-source Streamlit application leveraging LangChain and local Large Language Models (LLMs) via Ollama for advanced document analysis. Analyze, summarize, and extract insights from a wide array of file formats—securely and privately, all offline.
Updated Jul 29, 2025 - Python
Experimental interface environment for open-source LLMs, designed to democratize the use of AI. Powered by llama-cpp, llama-cpp-python, and Gradio.
Updated Jul 19, 2025 - Python
A quantitative trading system platform based on FinGPT, demonstrating new applications of large pre-trained language models in quantitative finance.
Updated May 15, 2025 - Python
Implements the inference process of DeepSeek-R1-Distill-Qwen-1.5B using numpy, making it easy to learn LLM (Large Language Model) inference and to port it to other programming languages for acceleration.
Updated Jul 26, 2025 - Python
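A numpy port like the one above boils down to a handful of primitives. Two of the core ones, sketched from the standard definitions used by Qwen-family models (an illustration under those assumptions, not the repo's actual code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm, the normalization used by Qwen-family models:
    # scale each vector by the reciprocal of its root-mean-square.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def softmax(x):
    # Numerically stable softmax over the last axis (attention weights).
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

h = rms_norm(np.random.randn(4, 64), np.ones(64))      # normalized hidden states
p = softmax(np.random.randn(4, 4) / np.sqrt(64))       # scaled attention scores
```

Because everything is plain array arithmetic, translating such primitives to C, Rust, or another language for speed is mostly mechanical.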
A simple AI chat using FastAPI, LangChain, and llama.cpp.
Updated Sep 19, 2023 - Python
Auto-complete anything using a GGUF model.
Updated Dec 4, 2023 - Python
Email Auto-ReplAI is a Python tool that uses AI to automate drafting responses to unread Gmail messages, streamlining email management tasks.
Updated Aug 1, 2023 - Python