Fixes #4091: Update RAG docs with vector db examples
vga91 committed Jun 26, 2024
1 parent 207f780 commit eb2075b
Showing 12 changed files with 302 additions and 44 deletions.
@@ -1,4 +1,3 @@

== ChromaDB

Here is a list of all available ChromaDB procedures,
@@ -221,6 +220,16 @@ For example, by executing a `CALL apoc.vectordb.chroma.query(...) YIELD metadata
so that we do not return the other values that we do not need.
====

The nodes returned by a vector database procedure can be passed to `apoc.ml.rag` as follows:

[source,cypher]
----
CALL apoc.vectordb.chroma.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
----

.Delete vectors (it leverages https://docs.trychroma.com/usage-guide#deleting-data-from-a-collection[this API])
[source,cypher]
----
@@ -1,4 +1,3 @@

== Milvus

Here is a list of all available Milvus procedures:
@@ -223,6 +222,15 @@ For example, by executing a `CALL apoc.vectordb.milvus.query(...) YIELD metadata
so that we do not return the other values that we do not need.
====

The nodes returned by a vector database procedure can be passed to `apoc.ml.rag` as follows:

[source,cypher]
----
CALL apoc.vectordb.milvus.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
----


.Delete vectors (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Vector%20(v2)/Delete.md[this API])
@@ -1,4 +1,3 @@

== Pinecone

Here is a list of all available Pinecone procedures:
@@ -237,7 +236,15 @@ For example, by executing a `CALL apoc.vectordb.pinecone.query(...) YIELD metada
so that we do not return the other values that we do not need.
====

The nodes returned by a vector database procedure can be passed to `apoc.ml.rag` as follows:

[source,cypher]
----
CALL apoc.vectordb.pinecone.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
----

.Delete vectors (it leverages https://docs.pinecone.io/reference/api/data-plane/delete[this API])
[source,cypher]
@@ -1,4 +1,3 @@

== Qdrant

Here is a list of all available Qdrant procedures,
@@ -224,7 +223,15 @@ For example, by executing a `CALL apoc.vectordb.qdrant.query(...) YIELD metadata
so that we do not return the other values that we do not need.
====

The nodes returned by a vector database procedure can be passed to `apoc.ml.rag` as follows:

[source,cypher]
----
CALL apoc.vectordb.qdrant.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
----

.Delete vectors (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/points/operation/delete_vectors[this API])
[source,cypher]
@@ -1,4 +1,3 @@

== Weaviate

Here is a list of all available Weaviate procedures,
@@ -240,7 +239,15 @@ For example, by executing a `CALL apoc.vectordb.weaviate.query(...) YIELD metada
so that we do not return the other values that we do not need.
====

The nodes returned by a vector database procedure can be passed to `apoc.ml.rag` as follows:

[source,cypher]
----
CALL apoc.vectordb.weaviate.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD score, node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
----

.Delete vectors (it leverages https://weaviate.io/developers/weaviate/api/rest#tag/objects/delete/objects/\{className\}/\{id\}[this API])
[source,cypher]
59 changes: 34 additions & 25 deletions docs/asciidoc/modules/ROOT/pages/ml/openai.adoc
@@ -251,7 +251,6 @@ CALL apoc.ml.openai.chat([{"role": "user", "content": "Explain the importance of
----



== Query with natural language

This procedure `apoc.ml.query` takes a question in natural language and returns the results of that query.
@@ -522,8 +521,8 @@ It uses the `chat/completions` API which is https://platform.openai.com/docs/api
.Query call
[source,cypher]
----
CALL apoc.ml.fromQueries(['MATCH (n:Movie) RETURN n', 'MATCH (n:Person) RETURN n'],
{apiKey: <apiKey>})
YIELD value
RETURN *
----
@@ -544,8 +543,8 @@ RETURN *
.Query call with path
[source,cypher]
----
CALL apoc.ml.fromQueries(['MATCH (n:Movie) RETURN n', 'MATCH p=(n:Movie)--() RETURN p'],
{apiKey: <apiKey>})
YIELD value
RETURN *
----
@@ -556,7 +555,7 @@ RETURN *
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| "models relationships in the movie industry, connecting :Person nodes to :Movie nodes.
It represents actors, directors, writers, producers, and reviewers connected to movies they are involved with.
Similar to a social network graph but specialized for the entertainment industry.
Each relationship type corresponds to common roles in movie production and reviewing.
@@ -592,7 +591,7 @@ RETURN *

== Query with Retrieval-augmented generation (RAG) technique

The `apoc.ml.rag` procedure takes a list of paths or a vector index name, the relevant attributes, and a natural language question,
and uses them to create a prompt that implements the Retrieval-augmented generation (RAG) technique.

See https://aws.amazon.com/what-is/retrieval-augmented-generation/[here] for more info about the RAG process.
@@ -620,7 +619,7 @@ It uses the `chat/completions` API which is https://platform.openai.com/docs/api
| embeddings | to search similar embeddings stored into a node vector index (in case of `embeddings: "NODE"`) or relationship vector index (in case of `embeddings: "REL"`) | no, default `"FALSE"`
| topK | number of neighbors to find for each node (in case of `embeddings: "NODE"`) or relationships (in case of `embeddings: "REL"`) | no, default `40`
| apiKey | OpenAI API key | in case `apoc.openai.key` is not defined
| prompt | the base prompt to be augmented with the context | no, default is:

"You are a customer service agent that helps a customer with answering questions about a service.
Use the following context to answer the `user question` at the end.
@@ -629,7 +628,7 @@ If you don't know the answer, just say \`Sorry, I don't know`, don't try to make
|===
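
As an illustration of the defaults listed in the table above, the following Java sketch resolves `embeddings` and `topK` from a configuration map. `RagConfigSketch` and its methods are hypothetical names used only for this example. It is not how APOC reads its configuration internally, only a minimal model of the documented defaults (`"FALSE"` and `40`).

```java
import java.util.Map;

// Hypothetical helper, not part of APOC: models the documented defaults only.
class RagConfigSketch {

    // Mirrors the three documented values of the `embeddings` option.
    enum EmbeddingMode { FALSE, NODE, REL }

    // Returns the configured mode, or FALSE when the key is absent.
    static EmbeddingMode embeddings(Map<String, Object> conf) {
        return EmbeddingMode.valueOf(String.valueOf(conf.getOrDefault("embeddings", "FALSE")));
    }

    // Returns the configured number of neighbors, or 40 when the key is absent.
    static int topK(Map<String, Object> conf) {
        return ((Number) conf.getOrDefault("topK", 40)).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> conf = Map.of("embeddings", "NODE", "topK", 20);
        System.out.println(embeddings(conf) + " / " + topK(conf));
    }
}
```

An empty map falls back to both defaults, which corresponds to calling `apoc.ml.rag` with a list of paths and no vector index.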


Using the `apoc.ml.rag` procedure, we can reduce AI hallucinations (i.e. false or misleading responses)
by providing relevant and up-to-date information to the procedure via the first parameter.

For example, by executing the following procedure (with the `gpt-3.5-turbo` model, last updated in January 2022)
@@ -650,7 +649,7 @@ CALL apoc.ml.openai.chat([
| The gold medal in curling at the 2022 Winter Olympics was won by the Swedish men's team and the Russian women's team.
|===

We can therefore use the RAG technique to obtain correct results.
For example, with the following dataset (with data taken from https://en.wikipedia.org/wiki/Curling_at_the_2022_Winter_Olympics[this Wikipedia page]):

.wikipedia dataset
@@ -673,9 +672,9 @@ we can execute:
----
MATCH path=(:Athlete)-[:HAS_MEDAL]->(Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
["name", "country", "medal", "title", "year"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey}
) YIELD value
RETURN value
@@ -695,9 +694,9 @@ or:
----
MATCH path=(:Athlete)-[:HAS_MEDAL]->(Discipline)
WITH collect(path) AS paths
CALL apoc.ml.rag(paths,
["name", "country", "medal", "title", "year"],
"Which athletes won the silver medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey}
) YIELD value
RETURN value
@@ -714,9 +713,9 @@ We can also pass a string query returning paths/relationships/nodes, for example

[source,cypher]
----
CALL apoc.ml.rag("MATCH path=(:Athlete)-[:HAS_MEDAL]->(Discipline) WITH collect(path) AS paths",
["name", "country", "medal", "title", "year"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey}
) YIELD value
RETURN value
@@ -746,9 +745,9 @@ and some (:RagEmbedding) nodes with the `text` properties, we can execute:

[source,cypher]
----
CALL apoc.ml.rag("rag-embeddings",
["text"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey, embeddings: "NODE", topK: 20}
) YIELD value
RETURN value
@@ -771,10 +770,20 @@ and some [:RagEmbedding] relationships with the `text` properties, we can execut

[source,cypher]
----
CALL apoc.ml.rag("rag-rel-embeddings",
["text"],
"Which athletes won the gold medal in mixed doubles's curling at the 2022 Winter Olympics?",
{apiKey: $apiKey, embeddings: "REL", topK: 20}
) YIELD value
RETURN value
----

The nodes returned by a vector database procedure can be passed to `apoc.ml.rag` as follows, for example with the Qdrant database:

[source,cypher]
----
CALL apoc.vectordb.qdrant.getAndUpdate($host, $collection, [<id1>, <id2>], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, $question, $confPrompt) YIELD value
RETURN value
----
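
To make the data flow of the query above more concrete, here is a minimal Java sketch of how a RAG prompt could be assembled from the retrieved entities, the attribute list, and the question. It is a conceptual illustration only: `RagPromptSketch`, `buildContext`, and `buildPrompt` are hypothetical names, the prompt layout is an assumption rather than the text APOC actually sends, and the sample property values are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of APOC: shows the general shape of RAG prompt assembly.
class RagPromptSketch {

    // Keeps only the requested attributes of each retrieved entity, one line per entity.
    static String buildContext(List<Map<String, Object>> entities, List<String> attributes) {
        return entities.stream()
                .map(props -> attributes.stream()
                        .filter(props::containsKey)
                        .map(key -> key + ": " + props.get(key))
                        .collect(Collectors.joining(", ")))
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining("\n"));
    }

    // Appends the context and the user question to the base prompt (layout is an assumption).
    static String buildPrompt(String basePrompt, String context, String question) {
        return basePrompt + "\n\nContext:\n" + context + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) {
        // Made-up properties standing in for the nodes yielded by getAndUpdate.
        List<Map<String, Object>> nodes = List.of(
                Map.of("city", "Berlin", "foo", "one", "readID", "1"),
                Map.of("city", "London", "foo", "two", "readID", "2"));
        String context = buildContext(nodes, List.of("city", "foo"));
        System.out.println(buildPrompt("Use the following context to answer the question.",
                context, "Which city has foo equals to one?"));
    }
}
```

Properties that are not listed in the attributes (here `readID`) are left out of the context, which is the role the second argument plays in the `apoc.ml.rag` call.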
48 changes: 44 additions & 4 deletions extended-it/src/test/java/apoc/vectordb/ChromaDbTest.java
@@ -1,7 +1,9 @@
package apoc.vectordb;

import apoc.ml.Prompt;
import apoc.util.TestUtil;
import org.junit.AfterClass;
import org.junit.Assume;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.ClassRule;
@@ -16,30 +18,39 @@
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

import static apoc.ml.Prompt.API_KEY_CONF;
import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.util.MapUtil.map;
import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testResult;
import static apoc.vectordb.VectorDbHandler.Type.CHROMA;
import static apoc.vectordb.VectorDbTestUtil.EntityType.FALSE;
import static apoc.vectordb.VectorDbTestUtil.EntityType.NODE;
import static apoc.vectordb.VectorDbTestUtil.EntityType.REL;
import static apoc.vectordb.VectorDbTestUtil.assertBerlinResult;
import static apoc.vectordb.VectorDbTestUtil.assertLondonResult;
import static apoc.vectordb.VectorDbTestUtil.assertNodesCreated;
import static apoc.vectordb.VectorDbTestUtil.assertRagWithVectors;
import static apoc.vectordb.VectorDbTestUtil.assertReadOnlyProcWithMappingResults;
import static apoc.vectordb.VectorDbTestUtil.assertRelsCreated;
import static apoc.vectordb.VectorDbTestUtil.dropAndDeleteAll;
import static apoc.vectordb.VectorDbTestUtil.EntityType.*;
import static apoc.vectordb.VectorDbTestUtil.getAuthHeader;
import static apoc.vectordb.VectorDbTestUtil.ragSetup;
import static apoc.vectordb.VectorEmbeddingConfig.ALL_RESULTS_KEY;
import static apoc.vectordb.VectorEmbeddingConfig.MAPPING_KEY;
import static apoc.vectordb.VectorMappingConfig.*;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.fail;
import static org.junit.Assert.assertTrue;
import static org.neo4j.configuration.GraphDatabaseSettings.DEFAULT_DATABASE_NAME;
import static org.neo4j.configuration.GraphDatabaseSettings.SYSTEM_DATABASE_NAME;

public class ChromaDbTest {
private static final AtomicReference<String> COLL_ID = new AtomicReference<>();
private static final ChromaDBContainer CHROMA_CONTAINER = new ChromaDBContainer("chromadb/chroma:0.4.25.dev137");
private static final String READONLY_KEY = "my_readonly_api_key";
private static final Map<String, String> READONLY_AUTHORIZATION = getAuthHeader(READONLY_KEY);

private static String HOST;

@@ -60,7 +71,7 @@ public static void setUp() throws Exception {
CHROMA_CONTAINER.start();

HOST = "localhost:" + CHROMA_CONTAINER.getMappedPort(8000);
TestUtil.registerProcedure(db, ChromaDb.class, VectorDb.class);
TestUtil.registerProcedure(db, ChromaDb.class, VectorDb.class, Prompt.class);

testCall(db, "CALL apoc.vectordb.chroma.createCollection($host, 'test_collection', 'cosine', 4)",
map("host", HOST),
@@ -123,7 +134,7 @@ public void getVectorsWithoutVectorResult() {
assertNull(row.get("id"));
});
}

@Test
public void deleteVector() {
testCall(db, """
@@ -421,4 +432,33 @@ public void queryVectorsWithSystemDbStorage() {

assertNodesCreated(db);
}

@Test
public void queryVectorsWithRag() {
String openAIKey = ragSetup(db);

Map<String, Object> conf = map(ALL_RESULTS_KEY, true,
HEADERS_KEY, READONLY_AUTHORIZATION,
MAPPING_KEY, map(NODE_LABEL, "Rag",
ENTITY_KEY, "readID",
METADATA_KEY, "foo")
);

testResult(db,
"""
CALL apoc.vectordb.chroma.getAndUpdate($host, $collection, ['1', '2'], $conf) YIELD node, metadata, id, vector
WITH collect(node) as paths
CALL apoc.ml.rag(paths, $attributes, "Which city has foo equals to one?", $confPrompt) YIELD value
RETURN value
"""
,
map(
"host", HOST,
"conf", conf,
"collection", COLL_ID.get(),
"confPrompt", map(API_KEY_CONF, openAIKey),
"attributes", List.of("city", "foo")
),
VectorDbTestUtil::assertRagWithVectors);
}
}
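
The parameter map used by `queryVectorsWithRag` can be pictured with plain Java collections, without the APOC test helpers. The sketch below is an approximation under stated assumptions: the literal key names (`allResults`, `headers`, `mapping`, `nodeLabel`, `entityKey`, `metadataKey`) are assumed values of the constants imported in the test, the `Authorization: Bearer ...` header is an assumed shape for what `getAuthHeader` returns, and `RagParamsSketch` is a hypothetical class name. Only `apiKey` appears literally in the documentation above.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: rebuilds the test parameters with literal keys (key names are assumptions).
class RagParamsSketch {

    // Configuration passed as $conf to apoc.vectordb.chroma.getAndUpdate.
    static Map<String, Object> vectorDbConf(String readOnlyToken) {
        return Map.of(
                "allResults", true,
                // Assumed header shape; the real one comes from getAuthHeader(...)
                "headers", Map.of("Authorization", "Bearer " + readOnlyToken),
                "mapping", Map.of(
                        "nodeLabel", "Rag",
                        "entityKey", "readID",
                        "metadataKey", "foo"));
    }

    // Full parameter map for the Cypher query used in the test.
    static Map<String, Object> queryParams(String host, String collectionId,
                                           String readOnlyToken, String openAiKey) {
        return Map.of(
                "host", host,
                "collection", collectionId,
                "conf", vectorDbConf(readOnlyToken),
                "confPrompt", Map.of("apiKey", openAiKey),
                "attributes", List.of("city", "foo"));
    }

    public static void main(String[] args) {
        // Placeholder values; a real run needs a reachable Chroma host and a valid OpenAI key.
        System.out.println(queryParams("localhost:8000", "collection-id", "my_readonly_api_key", "sk-placeholder"));
    }
}
```

The two configuration maps stay separate on purpose: `$conf` controls how vectors are fetched and mapped to graph entities, while `$confPrompt` only carries the settings for the `apoc.ml.rag` call.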