<p align="center"> <img alt="GitHub" src="https://img.shields.io/github/license/edgenai/edgen"> <!-- TODO: uncomment to show discord --> <!-- <img alt="Discord" src="https://img.shields.io/discord/1163068604074426408?logo=discord&label=Discord&link=https%3A%2F%2Fdiscord.gg%2FMMUcgBtV"> --> </p> <h3 align="center"> A Local GenAI API Server: A drop-in replacement for OpenAI's API for Local GenAI </h3> <p align="center"> | <!-- TODO: add proper links --> <a href="https://docs.edgen.co"><b>Documentation</b></a> | <a href="https://blog.edgen.co"><b>Blog</b></a> | <a href="https://discord.gg/QUXbwqdMRs"><b>Discord</b></a> | <a href="https://github.com/orgs/edgenai/projects/1/views/1"><b>Roadmap</b></a> | </p> <div align="center"> <img src="https://edgen.co/images/demo.gif" alt="EdgenChat, a local chat app powered by ⚡Edgen"> <p align="center"> <a href="https://chat.edgen.co">EdgenChat</a>, a local chat app powered by ⚡Edgen </p> </div> - [x] **OpenAI Compliant API**: ⚡Edgen implements an [OpenAI compatible API](https://docs.edgen.co/api-reference), making it a drop-in replacement. - [x] **Multi-Endpoint Support**: ⚡Edgen exposes multiple AI endpoints such as chat completions (LLMs) and speech-to-text (Whisper) for audio transcriptions. - [x] **Model Agnostic**: LLMs (Llama2, Mistral, Mixtral...), Speech-to-text (whisper) and [many others](https://docs.edgen.co/documentation/models). - [x] **Optimized Inference**: You don't need to take a PhD in AI optimization. ⚡Edgen abstracts the complexity of optimizing inference for different hardware, platforms and models. - [x] **Modular**: ⚡Edgen is **model** and **runtime** agnostic. New models can be added easily and ⚡Edgen can select the best runtime for the user's hardware: you don't need to keep up about the latest models and ML runtimes - **⚡Edgen will do that for you**. - [x] **Model Caching**: ⚡Edgen caches foundational models locally, so 1 model can power hundreds of different apps - users don't need to download the same model multiple times. - [x] **Native**: ⚡Edgen is built in 🦀Rust and is natively compiled to all popular platforms: **Windows, MacOS and Linux**. No docker required. - [ ] **Graphical Interface**: A graphical user interface to help users efficiently manage their models, endpoints and permissions. ⚡Edgen lets you use GenAI in your app, completely **locally** on your user's devices, for **free** and with **data-privacy**. It's a drop-in replacement for OpenAI (it uses the a compatible API), supports various functions like text generation, speech-to-text and works on Windows, Linux, and MacOS. ### Features - [x] Session Caching: ⚡Edgen maintains top performance with big contexts (big chat histories), by caching sessions. Sessions are auto-detected in function of the chat history. - [x] [GPU support](https://github.com/edgenai/edgen#gpu-support): CUDA, Vulkan. Metal ### Endpoints - [x] \[Chat\] [Completions](https://docs.edgen.co/api-reference/chat) - [x] \[Audio\] [Transcriptions](https://docs.edgen.co/api-reference/audio) - [x] \[Embeddings\] [Embeddings](https://platform.openai.com/docs/api-reference/embeddings) - [ ] \[Image\] Generation - [ ] \[Chat\] Multimodal chat completions - [ ] \[Audio\] Speech ### Supported Models Check in the [documentation](https://docs.edgen.co/documentation/models) ### Supported platforms - [x] Windows - [x] Linux - [x] MacOS ## 🔥 Hot Topics ## Why local GenAI? - **Data Private**: On-device inference means **users' data** never leave their devices. - **Scalable**: More and more users? No need to increment cloud computing infrastructure. Just let your users use their own hardware. - **Reliable**: No internet, no downtime, no rate limits, no API keys. - **Free**: It runs locally on hardware the user already owns. ## Quickstart 1. [Download](https://edgen.co/download) and start ⚡Edgen 2. Chat with ⚡[EdgenChat](https://chat.edgen.co) Ready to start your own GenAI application? [Checkout our guides](https://docs.edgen.co/guides)! ⚡Edgen usage: ``` Usage: edgen [<command>] [<args>] Toplevel CLI commands and options. Subcommands are optional. If no command is provided "serve" will be invoked with default options. Options: --help display usage information Commands: serve Starts the edgen server. This is the default command when no command is provided. config Configuration-related subcommands. version Prints the edgen version to stdout. oasgen Generates the Edgen OpenAPI specification. ``` `edgen serve` usage: ``` Usage: edgen serve [-b <uri...>] [-g] Starts the edgen server. This is the default command when no command is provided. Options: -b, --uri if present, one or more URIs/hosts to bind the server to. `unix://` (on Linux), `http://`, and `ws://` are supported. For use in scripts, it is recommended to explicitly add this option to make your scripts future-proof. -g, --nogui if present, edgen will not start the GUI; the default behavior is to start the GUI. --help display usage information ``` ## GPU Support ⚡Edgen also supports compilation and execution on a GPU, when building from source, through Vulkan, CUDA and Metal. The following cargo features enable the GPU: - `llama_vulkan` - execute LLM models using Vulkan. Requires a Vulkan SDK to be installed. - `llama_cuda` - execute LLM models using CUDA. Requires a CUDA Toolkit to be installed. - `llama_metal` - execute LLM models using Metal. - `whisper_cuda` - execute Whisper models using CUDA. Requires a CUDA Toolkit to be installed. Note that, at the moment, `llama_vulkan`, `llama_cuda` and `llama_metal` cannot be enabled at the same time. Example usage (building from source, [you need to first install the prerequisites](https://docs.edgen.co/documentation/getting-started)): ``` cargo run --features llama_vulkan --release -- serve ``` ## Architecture Overview <div align="center"> <img src="docs/assets/edgen_architecture_overview.svg" alt="⚡Edgen architecture overview" width="400"> <p align="center">⚡Edgen architecture overview</p> </div> ## Contribute If you don't know where to start, check [Edgen's roadmap](https://github.com/orgs/edgenai/projects/1/views/1)! Before you start working on something, see if there's an existing issue/pull-request. Pop into Discord to check with the team or see if someone's already tackling it. ## Communication Channels - [Edgen Discord server](https://discord.gg/QUXbwqdMRs): Real time discussions with the ⚡Edgen team and other users. - [GitHub issues](https://github.com/edgenai/edgen/issues): Feature requests, bugs. - [GitHub discussions](https://github.com/edgenai/edgen/discussions/): Q&A. - [Blog](https://blog.edgen.co): Big announcements. ## Special Thanks - [`llama.cpp`](https://github.com/ggerganov/llama.cpp/tree/master), [`whisper.cpp`](https://github.com/ggerganov/whisper.cpp), and [`ggml`](https://github.com/ggerganov/ggml) for being an excellent getting-on point for this space.