llama-cpp
Here are 62 public repositories matching this topic...
Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit keys & 4-bit values, reducing memory by 59% with <1% quality loss. Includes benchmarking, visualization, and one-command setup. Optimized for M1/M2/M3 Macs with Metal support.
Updated May 21, 2025 - Python
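The quoted ~59% reduction can be sanity-checked with back-of-the-envelope arithmetic, assuming ggml-style quantized blocks (32 elements per block with a 2-byte fp16 scale, the standard q8_0/q4_0 layout; this is an illustration, not KVSplit's actual code):

```python
# Bits per element for fp16 vs. differentiated KV-cache quantization.
# Assumes ggml-style blocks: 32 elements per block, 2-byte fp16 scale per block.
BLOCK = 32

fp16_bits = 16                             # baseline precision for both K and V
q8_0_bits = (BLOCK * 1 + 2) * 8 / BLOCK    # 8-bit keys + scale  -> 8.5 bits/elem
q4_0_bits = (BLOCK // 2 + 2) * 8 / BLOCK   # 4-bit values + scale -> 4.5 bits/elem

baseline = 2 * fp16_bits                   # K + V at fp16: 32 bits per K/V pair
split = q8_0_bits + q4_0_bits              # 13 bits per K/V pair

savings = 1 - split / baseline
print(f"{savings:.1%}")  # -> 59.4%, consistent with the quoted ~59% reduction
```

The asymmetry (more bits for keys than values) reflects the common observation that attention quality is more sensitive to key precision than to value precision.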
This repo showcases how to run a model locally and offline, free of OpenAI dependencies.
Updated Jul 12, 2024 - Python
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
Updated Jun 10, 2023 - Python
BabyAGI-🦙: Enhanced for Llama models (running 100% locally) with persistent memory, smart internet search based on BabyCatAGI, and document embedding in LangChain based on privateGPT.
Updated Jun 4, 2023 - Python
llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deployment, such as a UI, a RESTful API, auto-scaling, computing-resource management, monitoring, and more.
Updated May 17, 2024 - Python
📚 Local PDF-Integrated Chat Bot: Secure Conversations and Document Assistance with LLM-Powered Privacy
Updated Mar 24, 2025 - Python
◉ Universal Intelligence: AI made simple.
Updated Jun 11, 2025 - Python
Local character AI chatbot with a Chroma vector store for memory, plus scripts to process documents for Chroma.
Updated Oct 7, 2024 - Python
✨ Your Custom Offline Role-Play with LLM and Stable Diffusion on Mac and Linux (for now) 🧙‍♂️
Updated Nov 21, 2023 - Python
DocMind AI is a powerful, open-source Streamlit application leveraging LangChain and local Large Language Models (LLMs) via Ollama for advanced document analysis. Analyze, summarize, and extract insights from a wide array of file formats—securely and privately, all offline.
Updated Jul 29, 2025 - Python
Experimental interface environment for open-source LLMs, designed to democratize the use of AI. Powered by llama-cpp, llama-cpp-python, and Gradio.
Updated Jul 19, 2025 - Python
A quantitative trading system platform based on FinGPT, demonstrating new applications of large pre-trained language models in quantitative finance.
Updated May 15, 2025 - Python
Implements the inference process of DeepSeek-R1-Distill-Qwen-1.5B using numpy, making it easy to learn LLM (Large Language Model) inference and to port it to other programming languages for acceleration.
Updated Jul 26, 2025 - Python
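A numpy port like the one above boils down to a handful of primitives. Two of the core ones, sketched from the standard definitions used by Qwen-family models (an illustration under those assumptions, not the repo's actual code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm, the normalization used by Qwen-family models:
    # scale each vector by the reciprocal of its root-mean-square.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def softmax(x):
    # Numerically stable softmax over the last axis (attention weights).
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

h = rms_norm(np.random.randn(4, 64), np.ones(64))      # normalized hidden states
p = softmax(np.random.randn(4, 4) / np.sqrt(64))       # scaled attention scores
```

Because everything is plain array arithmetic, translating such primitives to C, Rust, or another language for speed is mostly mechanical.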
A simple AI chat using FastAPI, LangChain, and llama.cpp.
Updated Sep 19, 2023 - Python
Auto-complete anything using a GGUF model.
Updated Dec 4, 2023 - Python
Email Auto-ReplAI is a Python tool that uses AI to automate drafting responses to unread Gmail messages, streamlining email management tasks.
Updated Aug 1, 2023 - Python