---
title: Self-hosting a RAG pipeline
description: Learn how to self-host the nodes needed to run a RAG pipeline
icon: passport
---
This software is still in development; expect bugs and incomplete features.

Experimental: specifications are subject to change.
This guide shows you how to set up a RAG pipeline on your own hardware. You will need a pool that is already configured and running; see the Running a Pool guide for more information.
The RAG pipeline requires the following components:
- Document retrieval node: fetches documents from the web and converts them to Markdown.
- Embedding node: converts documents and queries into searchable vectors.
- Search node: builds indexes from the vectors and performs similarity searches.
- Extism Runtime: a general-purpose node that runs Extism plugins. We will use it to run the RAG Coordinator plugin.
- RAG Coordinator plugin: coordinates the other nodes and provides a single request endpoint for the pipeline.

You will also need:
- A pool running on the same network. See the Running a Pool guide for more information.
- A host with Docker installed and configured (this can be the same host as the pool).
- Basic knowledge of how to use a terminal and Docker.
Start the document retrieval node:

```bash
docker run -d \
  --name=retrieval \
  -e POOL_ADDRESS="127.0.0.1" \
  -e POOL_PORT="5021" \
  -e POOL_SSL="false" \
  ghcr.io/openagentsinc/openagents-document-retrieval
```
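Check the container logs with `docker logs --follow retrieval` to see whether it started correctly.

Note that inside a container, `127.0.0.1` refers to the container itself, not to the Docker host. If your pool runs on the host or on another machine, set `POOL_ADDRESS` to an address the container can reach, or on Linux run the containers with `--network host`. The same applies to every node below.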
Next, start the embedding node. You can use any local model that is compatible with the Sentence Transformers library, for example:
```bash
docker run -d \
  --name=embeddings \
  -e POOL_ADDRESS="127.0.0.1" \
  -e POOL_PORT="5021" \
  -e POOL_SSL="false" \
  -e EMBEDDINGS_TRANSFORMERS_DEVICE="-1" \
  -e EMBEDDINGS_MODEL="intfloat/multilingual-e5-base" \
  -e EMBEDDINGS_MAX_TEXT_LENGTH="512" \
  ghcr.io/openagentsinc/openagents-embeddings
```
Check the container logs with `docker logs --follow embeddings` to see whether it started correctly.
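The example model, `intfloat/multilingual-e5-base`, is documented on its model card as expecting every input to start with `query: ` or `passage: `. This mirrors the `query` and `passage` markers used in the Job Request later in this guide. The sketch below only illustrates that input convention. Whether the embeddings node adds the prefixes itself is an assumption you should check. `e5_format` is a hypothetical helper.

```python
# Hypothetical helper illustrating the E5 input convention.
# Assumption: you format the text yourself; the embeddings node may already do this.

E5_ROLES = ("query", "passage")


def e5_format(text: str, role: str) -> str:
    """Prefix `text` with "query: " or "passage: " as E5 models expect."""
    if role not in E5_ROLES:
        raise ValueError(f"role must be one of {E5_ROLES}, got {role!r}")
    return f"{role}: {text.strip()}"


if __name__ == "__main__":
    print(e5_format("Who invented bitcoin?", "query"))
    print(e5_format("The quick brown fox jumps over the lazy dog.", "passage"))
```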
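Alternatively, you can use an OpenAI embedding model instead of a local one. Run only one of the two embedding containers, because both use the name `embeddings`. An OpenAI model also requires an OpenAI API key; check the openagents-embeddings image documentation for how to provide it.

```bash
docker run -d \
  --name=embeddings \
  -e POOL_ADDRESS="127.0.0.1" \
  -e POOL_PORT="5021" \
  -e POOL_SSL="false" \
  -e EMBEDDINGS_TRANSFORMERS_DEVICE="-1" \
  -e EMBEDDINGS_MODEL="openai:text-embedding-3-small" \
  -e EMBEDDINGS_MAX_TEXT_LENGTH="1024" \
  ghcr.io/openagentsinc/openagents-embeddings
```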
Check the container logs with `docker logs --follow embeddings` to see whether it started correctly.
Start the search node:

```bash
docker run -d \
  --name=search \
  -e POOL_ADDRESS="127.0.0.1" \
  -e POOL_PORT="5021" \
  -e POOL_SSL="false" \
  ghcr.io/openagentsinc/openagents-search
```
Check the container logs with `docker logs --follow search` to see whether it started correctly.
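This guide does not describe the search node's internals. The operation it performs can still be sketched as a brute-force cosine-similarity search: score every stored passage vector against the query vector and keep the `k` best. This is a minimal stand-in that assumes plain Python lists as vectors. A real index would avoid scanning every vector. Mapping `k` to the `k` parameter of the Job Request is also an assumption.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], passages: Sequence[Sequence[float]], k: int) -> list[tuple[int, float]]:
    """Return (passage index, score) pairs for the k passages most similar to `query`."""
    scored = [(i, cosine(query, p)) for i, p in enumerate(passages)]
    scored.sort(key=lambda item: item[1], reverse=True)  # best match first
    return scored[:k]


if __name__ == "__main__":
    passage_vectors = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
    print(top_k([0.0, 1.0], passage_vectors, k=2))
```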
Start the Extism Runtime node:

```bash
docker run -d \
  --name=extism-runtime \
  -e POOL_ADDRESS="127.0.0.1" \
  -e POOL_PORT="5021" \
  -e POOL_SSL="false" \
  ghcr.io/openagentsinc/openagents-extism-runtime
```
Check the container logs with `docker logs --follow extism-runtime` to see whether it started correctly.
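All four nodes take the same three pool variables. They differ only in container name, image and node-specific settings. If you prefer to script the setup, the hypothetical helper below builds the same `docker run` commands from one shared pool configuration. `POOL_ENV`, `NODES` and `docker_run_args` are illustrative names, not part of the OpenAgents images.

```python
import shlex

# Shared pool settings, as used by every node in this guide.
POOL_ENV = {"POOL_ADDRESS": "127.0.0.1", "POOL_PORT": "5021", "POOL_SSL": "false"}

# Container name -> (image, node-specific environment variables).
NODES = {
    "retrieval": ("ghcr.io/openagentsinc/openagents-document-retrieval", {}),
    "embeddings": (
        "ghcr.io/openagentsinc/openagents-embeddings",
        {
            "EMBEDDINGS_TRANSFORMERS_DEVICE": "-1",
            "EMBEDDINGS_MODEL": "intfloat/multilingual-e5-base",
            "EMBEDDINGS_MAX_TEXT_LENGTH": "512",
        },
    ),
    "search": ("ghcr.io/openagentsinc/openagents-search", {}),
    "extism-runtime": ("ghcr.io/openagentsinc/openagents-extism-runtime", {}),
}


def docker_run_args(name: str, image: str, extra_env: dict[str, str]) -> list[str]:
    """Build the `docker run -d` argument list for one node."""
    args = ["docker", "run", "-d", f"--name={name}"]
    for key, value in {**POOL_ENV, **extra_env}.items():
        args += ["-e", f"{key}={value}"]
    return args + [image]


if __name__ == "__main__":
    for name, (image, env) in NODES.items():
        # Only prints the commands; pass the list to subprocess.run(...) to execute one.
        print(shlex.join(docker_run_args(name, image, env)))
```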
The RAG Coordinator plugin does not need its own container. You only need a public direct link to a build of the plugin, which you can get from the Releases page of the openagents-rag-coordinator-plugin repository. For example:

https://github.com/OpenAgentsInc/openagents-rag-coordinator-plugin/releases/download/v0.2/rag.wasm
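Before you use a release link, you can optionally confirm that it points to an actual WebAssembly module. Binary `.wasm` files begin with the 4-byte magic number `\0asm`. Below is a small sketch that uses only the Python standard library. `looks_like_wasm` and `fetch_head` are hypothetical helper names.

```python
import urllib.request

PLUGIN_URL = (
    "https://github.com/OpenAgentsInc/openagents-rag-coordinator-plugin"
    "/releases/download/v0.2/rag.wasm"
)
WASM_MAGIC = b"\x00asm"  # every binary WebAssembly module starts with these 4 bytes


def looks_like_wasm(data: bytes) -> bool:
    """True if `data` starts with the WebAssembly binary magic number."""
    return data[:4] == WASM_MAGIC


def fetch_head(url: str, n: int = 8) -> bytes:
    """Download the first `n` bytes of `url` (urlopen follows GitHub's redirect)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read(n)
```

For example, `looks_like_wasm(fetch_head(PLUGIN_URL))` should return `True` for a valid release asset. An HTML error page or a wrong link will return `False`.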
If you've been following along, you should now have the following containers running, all connected to the pool:
- retrieval
- embeddings
- search
- extism-runtime

If that's the case, the RAG pipeline is ready to use.
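To check this list from a script, you can compare the output of `docker ps --format '{{.Names}}'` with the expected container names. This only confirms that the containers are running, not that they connected to the pool; use the logs for that. The helper names below are hypothetical.

```python
import subprocess

EXPECTED = {"retrieval", "embeddings", "search", "extism-runtime"}


def missing_containers(docker_ps_output: str, expected: set[str] = EXPECTED) -> set[str]:
    """Names in `expected` that do not appear in `docker ps --format '{{.Names}}'` output."""
    running = {line.strip() for line in docker_ps_output.splitlines() if line.strip()}
    return expected - running


def docker_ps_names() -> str:
    """Run `docker ps` and return the running container names, one per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the Docker host, `missing_containers(docker_ps_names())` returns an empty set when all four containers are up.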
You can now craft a Nostr NIP-90 Job Request to run the RAG pipeline. Kind 5003 is a custom kind we use for generic OpenAgents jobs; this might change in the future. Replace `$currentTimeInSeconds` with the current Unix time in seconds, and `$currentTimeInSecondsPLUS10mins` with that value plus 600:

```json
{
  "kind": 5003,
  "created_at": $currentTimeInSeconds,
  "tags": [
    ["param", "run-on", "openagents/extism-runtime"],
    ["expiration", "$currentTimeInSecondsPLUS10mins"],
    ["param", "main", "https://github.com/OpenAgentsInc/openagents-rag-coordinator-plugin/releases/download/v0.2/rag.wasm"],
    ["param", "k", "1"],
    ["param", "max-tokens", "256"],
    ["param", "quantize", "false"],
    ["i", "https://bitcoin.org/bitcoin.pdf", "url", "", "passage"],
    ["i", "The quick brown fox jumps over the lazy dog.", "text", "", "passage"],
    ["i", "Who invented bitcoin?", "text", "", "query"]
  ],
  "content": ""
}
```
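Instead of filling in the timestamps by hand, you can generate the event programmatically. The sketch below rebuilds the example above with real timestamps. It also computes the NIP-01 event id: the SHA-256 of the compact JSON array `[0, pubkey, created_at, kind, tags, content]`. The function names are hypothetical. Signing (a BIP-340 Schnorr signature over the id) is left to a Nostr client library.

```python
import hashlib
import json
import time
from typing import Optional

# Plugin build used in the example Job Request above.
PLUGIN_URL = (
    "https://github.com/OpenAgentsInc/openagents-rag-coordinator-plugin"
    "/releases/download/v0.2/rag.wasm"
)


def build_rag_job(passages: list[tuple[str, str]], query: str, k: int = 1,
                  max_tokens: int = 256, now: Optional[int] = None) -> dict:
    """Return an unsigned kind-5003 Job Request shaped like the example above.

    `passages` holds (value, input_type) pairs, where input_type is "url" or "text".
    """
    created_at = int(time.time()) if now is None else now
    tags = [
        ["param", "run-on", "openagents/extism-runtime"],
        ["expiration", str(created_at + 10 * 60)],  # request expires after 10 minutes
        ["param", "main", PLUGIN_URL],
        ["param", "k", str(k)],
        ["param", "max-tokens", str(max_tokens)],
        ["param", "quantize", "false"],
    ]
    # NIP-90 input tags: ["i", <data>, <input-type>, <relay>, <marker>]
    tags += [["i", value, input_type, "", "passage"] for value, input_type in passages]
    tags.append(["i", query, "text", "", "query"])
    return {"kind": 5003, "created_at": created_at, "tags": tags, "content": ""}


def event_id(event: dict, pubkey_hex: str) -> str:
    """NIP-01 id: SHA-256 of the compact JSON [0, pubkey, created_at, kind, tags, content].

    json.dumps escaping matches NIP-01 for ordinary text; unusual control
    characters in tags or content may need extra care.
    """
    payload = [0, pubkey_hex, event["created_at"], event["kind"], event["tags"], event["content"]]
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    job = build_rag_job(
        [("https://bitcoin.org/bitcoin.pdf", "url"),
         ("The quick brown fox jumps over the lazy dog.", "text")],
        "Who invented bitcoin?",
    )
    print(json.dumps(job, indent=2))
```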
This event asks an extism-runtime node (the `run-on` parameter) to run the RAG Coordinator plugin referenced by the `main` parameter. The inputs (`i` tags) contain one or more documents, given as URLs or plain text and marked with `passage`, and a query marked with `query`.
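Each `i` tag follows the NIP-90 layout `["i", <data>, <input-type>, <relay>, <marker>]`. This guide does not document how the RAG Coordinator plugin reads these tags. The hypothetical `split_inputs` below only shows how the markers separate the documents from the query.

```python
def split_inputs(tags: list[list[str]]) -> tuple[list[tuple[str, str]], list[str]]:
    """Split NIP-90 `i` tags by marker (5th element) into passages and queries.

    Passages are returned as (data, input_type) pairs; queries as their text.
    Tags that are not `i` tags, or have no marker, are ignored.
    """
    passages: list[tuple[str, str]] = []
    queries: list[str] = []
    for tag in tags:
        if len(tag) < 5 or tag[0] != "i":
            continue
        data, input_type, marker = tag[1], tag[2], tag[4]
        if marker == "passage":
            passages.append((data, input_type))
        elif marker == "query":
            queries.append(data)
    return passages, queries


if __name__ == "__main__":
    example_tags = [
        ["param", "k", "1"],
        ["i", "https://bitcoin.org/bitcoin.pdf", "url", "", "passage"],
        ["i", "Who invented bitcoin?", "text", "", "query"],
    ]
    print(split_inputs(example_tags))
```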
After this event is broadcast to the relay, you will receive a Job feedback event (kind 7000) with `status` set to `success`. You can then fetch the job result: an event of kind 6003 (5003 + 1000) whose `e` tag matches the Job Request event id.
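The lookup described above can be sketched over a list of events you have already fetched from the relay. Each event is a plain dict with `kind`, `tags` and `content`. Talking to the relay is out of scope here, and `find_result` is a hypothetical helper.

```python
from typing import Optional


def find_result(events: list[dict], request_id: str, request_kind: int = 5003) -> Optional[str]:
    """Return the result content for `request_id`, or None if it is not available yet.

    Requires a kind-7000 feedback event with status "success" and a result event
    of kind request_kind + 1000, both referencing the request through an `e` tag.
    """

    def refers_to_request(event: dict) -> bool:
        return any(tag[:2] == ["e", request_id] for tag in event.get("tags", []))

    def status(event: dict) -> Optional[str]:
        return next((t[1] for t in event["tags"] if len(t) > 1 and t[0] == "status"), None)

    succeeded = any(
        ev["kind"] == 7000 and refers_to_request(ev) and status(ev) == "success"
        for ev in events
    )
    if not succeeded:
        return None
    for ev in events:
        if ev["kind"] == request_kind + 1000 and refers_to_request(ev):
            return ev["content"]
    return None
```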
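By default, a Job Request is public: any pool listening on the relay can read its inputs and pick up the job. To avoid this, you can run your own private relay, or target a specific pool by putting your pool's Nostr public key in the `p` tag of the Job Request.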
You can also encrypt the request for the same public key using NIP-04, as described in the NIP-90 specification.
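Adding the `p` tag is a small change to the event. It has to happen before the event id is computed and the event is signed, because tags are part of the id. Below is a hypothetical helper. It assumes the pool's public key is already in 64-character hex form; an `npub` key must first be decoded from bech32. NIP-04 encryption is not covered.

```python
import re

HEX_PUBKEY = re.compile(r"[0-9a-f]{64}")  # 32-byte public key as lowercase hex


def target_pool(event: dict, pool_pubkey_hex: str) -> dict:
    """Return a copy of `event` with a `p` tag addressing the given pool."""
    if not HEX_PUBKEY.fullmatch(pool_pubkey_hex):
        raise ValueError("expected a 64-character lowercase hex public key")
    p_tag = ["p", pool_pubkey_hex]
    tags = list(event["tags"])  # copy, so the original event is left untouched
    if p_tag not in tags:
        tags.append(p_tag)
    return {**event, "tags": tags}
```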
See the [NIP-90 specification](https://github.com/nostr-protocol/nips/blob/master/90.md) for more information on the protocol flow.

Work in progress: the CLI is not yet available.