Fixes #4087: Add vector info procedures (#4142)
* Fixes #4087: Add vector info procedures

* changes review

* fix tests and added getInfoNotExistentCollection tests
vga91 authored Jul 31, 2024
1 parent fd4a4ec commit 97b4fff
Showing 17 changed files with 370 additions and 20 deletions.
@@ -7,6 +7,7 @@ note that the list and the signature procedures are consistent with the others,
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.chroma.info(hostOrKey, collection, $config) | Gets information about the specified collection, or throws an HTTP 500 error if the collection does not exist
| apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/api/v1/collections`.
@@ -38,6 +39,19 @@ With hostOrKey=null, the default is 'http://localhost:8000'.

== Examples

.Get collection info (it leverages https://docs.trychroma.com/reference/py-client#get_collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.info(hostOrKey, 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| {"name": "test_collection", "metadata": {"size": 4, "hnsw:space": "cosine"}, "database": "default_database", "id": "74ebe008-1ccb-4d3d-8c5d-cdd7cfa526c2", "tenant": "default_tenant"}
|===

.Create a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
[source,cypher]
----
@@ -6,6 +6,7 @@ Here is a list of all available Milvus procedures:
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.milvus.info(hostOrKey, collection, $config) | Gets information about the specified collection, or returns a response with code 100 if the collection does not exist
| apoc.vectordb.milvus.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/v2/vectordb/collections/create`.
@@ -39,6 +40,25 @@ With hostOrKey=null, the default host is 'http://localhost:19530'.

Here is a list of examples using a local installation with the default port `19531`.

.Get collection info (it leverages https://milvus.io/docs/manage-collections.md#View-Collections[this API])
[source,cypher]
----
CALL apoc.vectordb.milvus.info($host, 'test_collection', '', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| {"data": {"shardsNum": 1, "aliases": [], "autoId": false, "description": "", "partitionsNum": 1, "collectionName": "test_collection",
"indexes": [{"metricType": "COSINE", "indexName": "vector", "fieldName": "vector"}],
"load": "LoadStateLoading", "consistencyLevel": "Bounded",
"fields": [{"partitionKey": false, "autoId": false, "name": "id", "description": "", "id": 100, "type": "Int64", "primaryKey": true},
{"partitionKey": false, "autoId": false, "name": "vector", "description": "", "id": 101, "params": [{"value": 4, "key": "dim"}], "type": "FloatVector", "primaryKey": false}
],
"collectionID": "451046728334049293", "enableDynamicField": true, "properties": []}, "message": "", "code": 200
}
|===
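
As a minimal sketch of how a caller might interpret the `value` map shown above: the integration tests in this commit expect `code` to be 200 for an existing collection and 100 for a missing one. The class and method names below are hypothetical and are not part of APOC; only the two code values come from this commit.

```java
import java.util.Map;

// Hypothetical helper, not part of APOC: interprets the "code" field of the
// map returned by apoc.vectordb.milvus.info.
final class MilvusInfoResponse {
    // Values asserted in this commit's MilvusTest: 200 = found, 100 = not found.
    static final long FOUND = 200L;
    static final long NOT_FOUND = 100L;

    // Returns true only when the response carries a numeric code equal to FOUND.
    static boolean collectionExists(Map<String, Object> value) {
        Object code = value.get("code");
        return code instanceof Number && ((Number) code).longValue() == FOUND;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = Map.of("code", 200L, "message", "");
        Map<String, Object> missing = Map.of("code", 100L);
        System.out.println(collectionExists(existing));
        System.out.println(collectionExists(missing));
    }
}
```

Because a missing Milvus collection is reported through the response body rather than an exception, a check of this kind is needed wherever the other vector databases would instead fail the procedure call.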

.Create a collection (it leverages https://milvus.io/api-reference/restful/v2.4.x/v2/Collection%20(v2)/Create.md[this API])
[source,cypher]
@@ -6,6 +6,7 @@ Here is a list of all available Pinecone procedures:
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.pinecone.info(hostOrKey, collection, $config) | Gets information about the specified collection, or throws a 404 error if the collection does not exist
| apoc.vectordb.pinecone.createCollection(hostOrKey, index, similarity, size, $config) |
Creates an index, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/indexes`.
@@ -54,6 +55,26 @@ image::pinecone-index.png[width=800]

The following examples assume we want to create and manage an index called `test-index`.

.Get collection info (it leverages https://docs.pinecone.io/reference/api/control-plane/describe_collection[this API])
[source,cypher]
----
CALL apoc.vectordb.pinecone.info(hostOrKey, 'test-collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| { "dimension": 3,
"environment": "us-east1-gcp",
"name": "tiny-collection",
"size": 3126700,
"status": "Ready",
"vector_count": 99
}
|===


.Create an index (it leverages https://docs.pinecone.io/reference/api/control-plane/create_index[this API])
[source,cypher]
----
@@ -7,6 +7,7 @@ note that the list and the signature procedures are consistent with the others,
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.qdrant.info(hostOrKey, collection, $config) | Gets information about the specified collection, or throws a FileNotFoundException if the collection does not exist
| apoc.vectordb.qdrant.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/collections/<collection param>`.
@@ -39,6 +40,29 @@ With hostOrKey=null, the default is 'http://localhost:6333'.

== Examples

.Get collection info (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/get_collection[this API])
[source,cypher]
----
CALL apoc.vectordb.qdrant.info(hostOrKey, 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| {"result": {"optimizer_status": "ok", "points_count": 2, "vectors_count": 2, "segments_count": 8, "indexed_vectors_count": 0,
"config": {"params": {"on_disk_payload": true, "vectors": {"size": 4, "distance": "Cosine"}, "shard_number": 1, "replication_factor": 1, "write_consistency_factor": 1},
"optimizer_config": {"max_optimization_threads": 1, "indexing_threshold": 20000, "deleted_threshold": 0.2, "flush_interval_sec": 5, "memmap_threshold": null, "default_segment_number": 0, "max_segment_size": null, "vacuum_min_vector_number": 1000}, "quantization_config": null,
"hnsw_config": {"max_indexing_threads": 0, "full_scan_threshold": 10000, "ef_construct": 100, "m": 16, "on_disk": false},
"wal_config": {"wal_segments_ahead": 0, "wal_capacity_mb": 32}
},
"status": "green",
"payload_schema": {}
},
"time": 1.2725E-4, "status": "ok"
}
|===
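
The Qdrant response nests the useful settings several levels deep (for example `result.config.params.vectors.size`). A minimal sketch, assuming the `value` column is read on the client as nested `java.util.Map` instances, of a null-safe way to walk such a path. The class and method names are hypothetical and are not part of APOC.

```java
import java.util.Map;

// Hypothetical helper, not part of APOC: reads a nested field from the map
// returned by apoc.vectordb.qdrant.info.
final class QdrantInfoReader {
    // Follows the given keys through nested maps.
    // Returns null when a key is absent or an intermediate value is not a map.
    static Object path(Map<String, Object> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // A trimmed-down copy of the example result above.
        Map<String, Object> vectors = Map.of("size", 4, "distance", "Cosine");
        Map<String, Object> params = Map.of("vectors", vectors);
        Map<String, Object> config = Map.of("params", params);
        Map<String, Object> result = Map.of("config", config, "points_count", 2);
        Map<String, Object> value = Map.of("result", result, "status", "ok");

        System.out.println(path(value, "result", "config", "params", "vectors", "size"));
        System.out.println(path(value, "result", "config", "params", "vectors", "distance"));
        System.out.println(path(value, "result", "missing", "size"));
    }
}
```

Returning null for a broken path keeps the caller from having to cast and null-check at every level of the response.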

.Create a collection (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/create_collection[this API])
[source,cypher]
----
@@ -7,6 +7,7 @@ note that the list and the signature procedures are consistent with the others,
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.weaviate.info(hostOrKey, collection, $config) | Gets information about the specified collection, or throws a FileNotFoundException if the collection does not exist
| apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/schema`.
@@ -40,6 +41,33 @@ With hostOrKey=null, the default is 'http://localhost:8080/v1'.

== Examples

.Get collection info (it leverages https://weaviate.io/developers/weaviate/api/rest#tag/schema/get/schema/{className}[this API])
[source, cypher]
----
CALL apoc.vectordb.weaviate.info($host, 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| {"vectorizer": "none",
"invertedIndexConfig": {"bm25": {"b": 0.75, "k1": 1.2}, "stopwords": {"additions": null, "removals": null, "preset": "en"}, "cleanupIntervalSeconds": 60},
"vectorIndexConfig": {"ef": -1, "dynamicEfMin": 100, "pq": {"centroids": 256, "trainingLimit": 100000, "encoder": {"type": "kmeans", "distribution": "log-normal"},
"enabled": false, "bitCompression": false, "segments": 0
},
"distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": {"enabled": false},
"vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64},
"multiTenancyConfig": {"enabled": false},
"vectorIndexType": "hnsw", "replicationConfig": {"factor": 1},
"shardingConfig": {"desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id"},
"class": "TestCollection",
"properties": [{"name": "city", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]},
{"name": "foo", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}
]
}
|===

.Create a collection (it leverages https://weaviate.io/developers/weaviate/api/rest#tag/schema/post/schema[this API])
[source,cypher]
----
22 changes: 20 additions & 2 deletions extended-it/src/test/java/apoc/vectordb/ChromaDbTest.java
@@ -19,6 +19,7 @@

import static apoc.ml.Prompt.API_KEY_CONF;
import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.util.ExtendedTestUtil.assertFails;
import static apoc.util.MapUtil.map;
import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testResult;
@@ -29,7 +30,6 @@
import static apoc.vectordb.VectorDbTestUtil.assertBerlinResult;
import static apoc.vectordb.VectorDbTestUtil.assertLondonResult;
import static apoc.vectordb.VectorDbTestUtil.assertNodesCreated;
import static apoc.vectordb.VectorDbTestUtil.assertRagWithVectors;
import static apoc.vectordb.VectorDbTestUtil.assertReadOnlyProcWithMappingResults;
import static apoc.vectordb.VectorDbTestUtil.assertRelsCreated;
import static apoc.vectordb.VectorDbTestUtil.dropAndDeleteAll;
@@ -41,7 +41,6 @@
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;
import static org.neo4j.configuration.GraphDatabaseSettings.DEFAULT_DATABASE_NAME;
import static org.neo4j.configuration.GraphDatabaseSettings.SYSTEM_DATABASE_NAME;

@@ -50,6 +49,7 @@ public class ChromaDbTest {
private static final ChromaDBContainer CHROMA_CONTAINER = new ChromaDBContainer("chromadb/chroma:0.4.25.dev137");
private static final String READONLY_KEY = "my_readonly_api_key";
private static final Map<String, String> READONLY_AUTHORIZATION = getAuthHeader(READONLY_KEY);
private static final String COLLECTION_NAME = "test_collection";

private static String HOST;

@@ -109,6 +109,24 @@ public static void tearDown() throws Exception {
public void before() {
dropAndDeleteAll(db);
}

@Test
public void getInfo() {
testResult(db, "CALL apoc.vectordb.chroma.info($host, $collection, $conf) ",
map("host", HOST, "collection", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true)),
r -> {
Map<String, Object> row = (Map<String, Object>) r.next().get("value");
assertEquals(COLLECTION_NAME, row.get("name"));
});
}

@Test
public void getInfoNotExistentCollection() {
assertFails(db, "CALL apoc.vectordb.chroma.info($host, 'wrong_collection', $conf) ",
map("host", HOST, "collection", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true)),
"Server returned HTTP response code: 500"
);
}

@Test
public void getVectors() {
22 changes: 22 additions & 0 deletions extended-it/src/test/java/apoc/vectordb/MilvusTest.java
@@ -117,6 +117,28 @@ public void before() {
dropAndDeleteAll(db);
}

@Test
public void getInfo() {
testResult(db, "CALL apoc.vectordb.milvus.info($host, 'test_collection', '', $conf) ",
map("host", HOST, "conf", map(FIELDS_KEY, FIELDS)),
r -> {
Map<String, Object> row = r.next();
Map value = (Map) row.get("value");
assertEquals(200L, value.get("code"));
});
}

@Test
public void getInfoNotExistentCollection() {
testResult(db, "CALL apoc.vectordb.milvus.info($host, 'wrong_collection', '', $conf) ",
map("host", HOST, "conf", map(FIELDS_KEY, FIELDS)),
r -> {
Map<String, Object> row = r.next();
Map value = (Map) row.get("value");
assertEquals(100L, value.get("code"));
});
}

@Test
public void getVectorsWithoutVectorResult() {
testResult(db, "CALL apoc.vectordb.milvus.get($host, 'test_collection', [1], $conf) ",
33 changes: 29 additions & 4 deletions extended-it/src/test/java/apoc/vectordb/QdrantTest.java
@@ -1,5 +1,6 @@
package apoc.vectordb;

import apoc.ml.Prompt;
import apoc.util.TestUtil;
import apoc.util.Util;
import org.junit.AfterClass;
@@ -17,15 +18,15 @@
import java.util.List;
import java.util.Map;

import apoc.ml.Prompt;
import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.ml.Prompt.API_KEY_CONF;
import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.util.ExtendedTestUtil.assertFails;
import static apoc.util.MapUtil.map;
import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testResult;
import static apoc.vectordb.VectorDbHandler.Type.QDRANT;
import static apoc.vectordb.VectorDbTestUtil.EntityType.NODE;
import static apoc.vectordb.VectorDbTestUtil.EntityType.FALSE;
import static apoc.vectordb.VectorDbTestUtil.EntityType.NODE;
import static apoc.vectordb.VectorDbTestUtil.EntityType.REL;
import static apoc.vectordb.VectorDbTestUtil.assertBerlinResult;
import static apoc.vectordb.VectorDbTestUtil.assertLondonResult;
@@ -43,7 +44,8 @@
import static org.junit.Assert.assertNull;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.fail;
import static org.neo4j.configuration.GraphDatabaseSettings.*;
import static org.neo4j.configuration.GraphDatabaseSettings.DEFAULT_DATABASE_NAME;
import static org.neo4j.configuration.GraphDatabaseSettings.SYSTEM_DATABASE_NAME;

public class QdrantTest {
private static final String ADMIN_KEY = "my_admin_api_key";
@@ -117,6 +119,29 @@ public static void tearDown() throws Exception {
public void before() {
dropAndDeleteAll(db);
}

@Test
public void getInfo() {
testResult(
db,
"CALL apoc.vectordb.qdrant.info($host, 'test_collection', $conf)",
map("host", HOST, "conf", ADMIN_HEADER_CONF),
r -> {
Map<String, Object> res = r.next();
Map value = (Map) res.get("value");
assertEquals("ok", value.get("status"));
});
}

@Test
public void getInfoNotExistentCollection() {
assertFails(
db,
"CALL apoc.vectordb.qdrant.info($host, 'wrong_collection', $conf)",
map("host", HOST, "conf", ADMIN_HEADER_CONF),
"java.io.FileNotFoundException"
);
}

@Test
public void getVectorsWithReadOnlyApiKey() {
35 changes: 29 additions & 6 deletions extended-it/src/test/java/apoc/vectordb/WeaviateTest.java
@@ -22,6 +22,7 @@

import static apoc.ml.Prompt.API_KEY_CONF;
import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.util.ExtendedTestUtil.assertFails;
import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testCallEmpty;
import static apoc.util.TestUtil.testResult;
@@ -56,6 +57,7 @@ public class WeaviateTest {
private static final List<String> FIELDS = List.of("city", "foo");
private static final String ADMIN_KEY = "jane-secret-key";
private static final String READONLY_KEY = "ian-secret-key";
private static final String COLLECTION_NAME = "TestCollection";

private static final WeaviateContainer WEAVIATE_CONTAINER = new WeaviateContainer("semitechnologies/weaviate:1.24.5")
.withEnv("AUTHENTICATION_APIKEY_ENABLED", "true")
@@ -114,10 +116,10 @@ public static void setUp() throws Exception {
MapUtil.map("host", HOST, "id1", ID_1, "id2", ID_2, "conf", ADMIN_HEADER_CONF),
r -> {
ResourceIterator<Map> values = r.columnAs("value");
assertEquals("TestCollection", values.next().get("class"));
assertEquals("TestCollection", values.next().get("class"));
assertEquals("TestCollection", values.next().get("class"));
assertEquals("TestCollection", values.next().get("class"));
assertEquals(COLLECTION_NAME, values.next().get("class"));
assertEquals(COLLECTION_NAME, values.next().get("class"));
assertEquals(COLLECTION_NAME, values.next().get("class"));
assertEquals(COLLECTION_NAME, values.next().get("class"));
assertFalse(values.hasNext());
});

@@ -134,8 +136,8 @@ public static void setUp() throws Exception {

@AfterClass
public static void tearDown() throws Exception {
testCallEmpty(db, "CALL apoc.vectordb.weaviate.deleteCollection($host, 'TestCollection', $conf)",
MapUtil.map("host", HOST, "conf", ADMIN_HEADER_CONF)
testCallEmpty(db, "CALL apoc.vectordb.weaviate.deleteCollection($host, $collectionName, $conf)",
MapUtil.map("host", HOST, "collectionName", COLLECTION_NAME, "conf", ADMIN_HEADER_CONF)
);

WEAVIATE_CONTAINER.stop();
@@ -147,6 +149,27 @@ public void before() {
dropAndDeleteAll(db);
}

@Test
public void getInfo() {
testResult(db, "CALL apoc.vectordb.weaviate.info($host, $collectionName, $conf)",
map("host", HOST, "collectionName", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true, HEADERS_KEY, READONLY_AUTHORIZATION)),
r -> {
Map<String, Object> row = r.next();
Map value = (Map) row.get("value");
assertEquals(COLLECTION_NAME, value.get("class"));
});
}

@Test
public void getInfoNotExistentCollection() {
assertFails(
db,
"CALL apoc.vectordb.weaviate.info($host, 'wrong_collection', $conf)",
map("host", HOST, "collectionName", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true, HEADERS_KEY, READONLY_AUTHORIZATION)),
"java.io.FileNotFoundException"
);
}

@Test
public void getVectorsWithReadOnlyApiKey() {
testResult(db, "CALL apoc.vectordb.weaviate.get($host, 'TestCollection', [$id1], $conf)",