[NOID] Fixes #4087: Add vector info procedures (#4142)
* Fixes #4087: Add vector info procedures

* changes review

* fix tests and added getInfoNotExistentCollection tests
vga91 committed Dec 4, 2024
1 parent 9302bf4 commit 62f2bd7
Showing 8 changed files with 176 additions and 1 deletion.
@@ -7,6 +7,7 @@ note that the list and the signature procedures are consistent with the others,
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.chroma.info(hostOrKey, collection, $config) | Gets information about the specified collection, or throws an HTTP 500 error if it does not exist
| apoc.vectordb.chroma.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/api/v1/collections`.
@@ -38,6 +39,19 @@ With hostOrKey=null, the default is 'http://localhost:8000'.

=== Examples

.Get collection info (it leverages https://docs.trychroma.com/reference/py-client#get_collection[this API])
[source,cypher]
----
CALL apoc.vectordb.chroma.info(hostOrKey, 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| {"name": "test_collection", "metadata": {"size": 4, "hnsw:space": "cosine"}, "database": "default_database", "id": "74ebe008-1ccb-4d3d-8c5d-cdd7cfa526c2", "tenant": "default_tenant"}
|===

.Create a collection (it leverages https://docs.trychroma.com/usage-guide#creating-inspecting-and-deleting-collections[this API])
[source,cypher]
----
@@ -7,6 +7,7 @@ note that the list and the signature procedures are consistent with the others,
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.qdrant.info(hostOrKey, collection, $config) | Gets information about the specified collection, or throws a FileNotFoundException if it does not exist
| apoc.vectordb.qdrant.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/collections/<collection param>`.
@@ -39,6 +40,29 @@ With hostOrKey=null, the default is 'http://localhost:6333'.

=== Examples

.Get collection info (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/get_collection[this API])
[source,cypher]
----
CALL apoc.vectordb.qdrant.info(hostOrKey, 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| {"result": {"optimizer_status": "ok", "points_count": 2, "vectors_count": 2, "segments_count": 8, "indexed_vectors_count": 0,
"config": {"params": {"on_disk_payload": true, "vectors": {"size": 4, "distance": "Cosine"}, "shard_number": 1, "replication_factor": 1, "write_consistency_factor": 1},
"optimizer_config": {"max_optimization_threads": 1, "indexing_threshold": 20000, "deleted_threshold": 0.2, "flush_interval_sec": 5, "memmap_threshold": null, "default_segment_number": 0, "max_segment_size": null, "vacuum_min_vector_number": 1000}, "quantization_config": null,
"hnsw_config": {"max_indexing_threads": 0, "full_scan_threshold": 10000, "ef_construct": 100, "m": 16, "on_disk": false},
"wal_config": {"wal_segments_ahead": 0, "wal_capacity_mb": 32}
},
"status": "green",
"payload_schema": {}
},
"time": 1.2725E-4, "status": "ok"
}
|===

.Create a collection (it leverages https://qdrant.github.io/qdrant/redoc/index.html#tag/collections/operation/create_collection[this API])
[source,cypher]
----
@@ -7,6 +7,7 @@ note that the list and the signature procedures are consistent with the others,
[opts=header, cols="1, 3"]
|===
| name | description
| apoc.vectordb.weaviate.info(hostOrKey, collection, $config) | Gets information about the specified collection, or throws a FileNotFoundException if it does not exist
| apoc.vectordb.weaviate.createCollection(hostOrKey, collection, similarity, size, $config) |
Creates a collection, with the name specified in the 2nd parameter, and with the specified `similarity` and `size`.
The default endpoint is `<hostOrKey param>/schema`.
@@ -40,6 +41,33 @@ With hostOrKey=null, the default is 'http://localhost:8080/v1'.

=== Examples

.Get collection info (it leverages https://weaviate.io/developers/weaviate/api/rest#tag/schema/get/schema/{className}[this API])
[source, cypher]
----
CALL apoc.vectordb.weaviate.info($host, 'test_collection', {<optional config>})
----

.Example results
[opts="header"]
|===
| value
| {"vectorizer": "none",
"invertedIndexConfig": {"bm25": {"b": 0.75, "k1": 1.2}, "stopwords": {"additions": null, "removals": null, "preset": "en"}, "cleanupIntervalSeconds": 60},
"vectorIndexConfig": {"ef": -1, "dynamicEfMin": 100, "pq": {"centroids": 256, "trainingLimit": 100000, "encoder": {"type": "kmeans", "distribution": "log-normal"},
"enabled": false, "bitCompression": false, "segments": 0
},
"distance": "cosine", "skip": false, "dynamicEfFactor": 8, "bq": {"enabled": false},
"vectorCacheMaxObjects": 1000000000000, "cleanupIntervalSeconds": 300, "dynamicEfMax": 500, "efConstruction": 128, "flatSearchCutoff": 40000, "maxConnections": 64},
"multiTenancyConfig": {"enabled": false},
"vectorIndexType": "hnsw", "replicationConfig": {"factor": 1},
"shardingConfig": {"desiredVirtualCount": 128, "desiredCount": 1, "actualCount": 1, "function": "murmur3", "virtualPerPhysical": 128, "strategy": "hash", "actualVirtualCount": 128, "key": "_id"},
"class": "TestCollection",
"properties": [{"name": "city", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]},
{"name": "foo", "description": "This property was generated by Weaviate's auto-schema feature on Wed Jul 10 12:50:18 2024", "indexFilterable": true, "tokenization": "word", "indexSearchable": true, "dataType": ["text"]}
]
}
|===
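
The three `info` procedures return differently shaped maps, as the Chroma, Qdrant and Weaviate example results above show. As a rough sketch only (the class and helper names below are hypothetical and not part of APOC), client code that already holds the returned `value` map could read the distance metric from each shape as follows. The key paths are copied from the example results on this page and may differ between vector database versions.

```java
import java.util.Map;

// Hypothetical helper, not part of APOC: reads the distance metric
// from the "value" map returned by each info procedure.
public class CollectionInfoSummary {

    /** Walks nested maps key by key; returns null as soon as a level is missing. */
    @SuppressWarnings("unchecked")
    static Object path(Map<String, Object> root, String... keys) {
        Object current = root;
        for (String key : keys) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<String, Object>) current).get(key);
        }
        return current;
    }

    private static String asString(Object value) {
        return value == null ? null : value.toString();
    }

    // Chroma example: {"metadata": {"hnsw:space": "cosine", ...}, ...}
    static String chromaDistance(Map<String, Object> value) {
        return asString(path(value, "metadata", "hnsw:space"));
    }

    // Qdrant example: {"result": {"config": {"params": {"vectors": {"distance": "Cosine"}}}}}
    static String qdrantDistance(Map<String, Object> value) {
        return asString(path(value, "result", "config", "params", "vectors", "distance"));
    }

    // Weaviate example: {"vectorIndexConfig": {"distance": "cosine", ...}, ...}
    static String weaviateDistance(Map<String, Object> value) {
        return asString(path(value, "vectorIndexConfig", "distance"));
    }

    public static void main(String[] args) {
        Map<String, Object> chroma = Map.of(
                "name", "test_collection",
                "metadata", Map.of("size", 4, "hnsw:space", "cosine"));
        System.out.println(chromaDistance(chroma)); // prints: cosine
    }
}
```

Returning `null` for a missing key, rather than throwing, keeps the helper tolerant of responses that omit optional sections.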

.Create a collection (it leverages https://weaviate.io/developers/weaviate/api/rest#tag/schema/post/schema[this API])
[source,cypher]
----
27 changes: 27 additions & 0 deletions full-it/src/test/java/apoc/full/it/vectordb/ChromaDbTest.java
@@ -1,5 +1,12 @@
package apoc.full.it.vectordb;

import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

import static apoc.ml.Prompt.API_KEY_CONF;
import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.util.ExtendedTestUtil.assertFails;
import static apoc.util.MapUtil.map;
import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testResult;
@@ -8,6 +15,7 @@
import static apoc.vectordb.VectorDbTestUtil.assertBerlinResult;
import static apoc.vectordb.VectorDbTestUtil.assertLondonResult;
import static apoc.vectordb.VectorDbTestUtil.assertNodesCreated;
import static apoc.vectordb.VectorDbTestUtil.assertReadOnlyProcWithMappingResults;
import static apoc.vectordb.VectorDbTestUtil.assertRelsCreated;
import static apoc.vectordb.VectorDbTestUtil.dropAndDeleteAll;
import static apoc.vectordb.VectorDbUtil.ERROR_READONLY_MAPPING;
@@ -42,6 +50,7 @@
public class ChromaDbTest {
    private static final AtomicReference<String> COLL_ID = new AtomicReference<>();
    private static final ChromaDBContainer CHROMA_CONTAINER = new ChromaDBContainer("chromadb/chroma:0.4.25.dev137");
    private static final String COLLECTION_NAME = "test_collection";

    private static String HOST;

@@ -101,6 +110,24 @@ public void before() {
        dropAndDeleteAll(db);
    }

    @Test
    public void getInfo() {
        testResult(db, "CALL apoc.vectordb.chroma.info($host, $collection, $conf) ",
                map("host", HOST, "collection", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true)),
                r -> {
                    Map<String, Object> row = (Map<String, Object>) r.next().get("value");
                    assertEquals(COLLECTION_NAME, row.get("name"));
                });
    }

    @Test
    public void getInfoNotExistentCollection() {
        assertFails(db, "CALL apoc.vectordb.chroma.info($host, 'wrong_collection', $conf) ",
                map("host", HOST, "collection", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true)),
                "Server returned HTTP response code: 500"
        );
    }

@Test
public void getVectors() {
testResult(
43 changes: 43 additions & 0 deletions full-it/src/test/java/apoc/full/it/vectordb/QdrantTest.java
@@ -1,6 +1,26 @@
package apoc.full.it.vectordb;

import apoc.ml.Prompt;
import apoc.util.TestUtil;
import apoc.util.Util;
import org.junit.AfterClass;
import org.junit.Assume;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.ClassRule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;
import org.neo4j.dbms.api.DatabaseManagementService;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.test.TestDatabaseManagementServiceBuilder;
import org.testcontainers.qdrant.QdrantContainer;

import java.util.List;
import java.util.Map;

import static apoc.ml.Prompt.API_KEY_CONF;
import static apoc.ml.RestAPIConfig.HEADERS_KEY;
import static apoc.util.ExtendedTestUtil.assertFails;
import static apoc.util.MapUtil.map;
import static apoc.util.TestUtil.testCall;
import static apoc.util.TestUtil.testResult;
@@ -119,6 +139,29 @@ public void before() {
        dropAndDeleteAll(db);
    }

    @Test
    public void getInfo() {
        testResult(
                db,
                "CALL apoc.vectordb.qdrant.info($host, 'test_collection', $conf)",
                map("host", HOST, "conf", ADMIN_HEADER_CONF),
                r -> {
                    Map<String, Object> res = r.next();
                    Map value = (Map) res.get("value");
                    assertEquals("ok", value.get("status"));
                });
    }

    @Test
    public void getInfoNotExistentCollection() {
        assertFails(
                db,
                "CALL apoc.vectordb.qdrant.info($host, 'wrong_collection', $conf)",
                map("host", HOST, "conf", ADMIN_HEADER_CONF),
                "java.io.FileNotFoundException"
        );
    }

@Test
public void getVectorsWithReadOnlyApiKey() {
testResult(
22 changes: 22 additions & 0 deletions full-it/src/test/java/apoc/full/it/vectordb/WeaviateTest.java
@@ -46,6 +46,7 @@ public class WeaviateTest {
    private static final List<String> FIELDS = List.of("city", "foo");
    private static final String ADMIN_KEY = "jane-secret-key";
    private static final String READONLY_KEY = "ian-secret-key";
    private static final String COLLECTION_NAME = "TestCollection";

    private static final WeaviateContainer WEAVIATE_CONTAINER = new WeaviateContainer(
            "semitechnologies/weaviate:1.24.5")
@@ -142,6 +143,27 @@ public void before() {
        dropAndDeleteAll(db);
    }

    @Test
    public void getInfo() {
        testResult(db, "CALL apoc.vectordb.weaviate.info($host, $collectionName, $conf)",
                map("host", HOST, "collectionName", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true, HEADERS_KEY, READONLY_AUTHORIZATION)),
                r -> {
                    Map<String, Object> row = r.next();
                    Map value = (Map) row.get("value");
                    assertEquals(COLLECTION_NAME, value.get("class"));
                });
    }

    @Test
    public void getInfoNotExistentCollection() {
        assertFails(
                db,
                "CALL apoc.vectordb.weaviate.info($host, 'wrong_collection', $conf)",
                map("host", HOST, "collectionName", COLLECTION_NAME, "conf", map(ALL_RESULTS_KEY, true, HEADERS_KEY, READONLY_AUTHORIZATION)),
                "java.io.FileNotFoundException"
        );
    }
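
The tests in this commit assert two different failure signatures for a missing collection: a message containing `Server returned HTTP response code: 500` for Chroma, and `java.io.FileNotFoundException` for Qdrant and Weaviate. A caller that wants to handle both in one place could use something like the sketch below. `MissingCollectionCheck` and `looksLikeMissingCollection` are hypothetical names, not APOC API, and the Chroma check is only a heuristic, since a 500 response can also signal an unrelated server error.

```java
import java.io.FileNotFoundException;

// Hypothetical helper, not part of APOC: decides whether a failure from an
// apoc.vectordb.*.info call looks like "collection does not exist".
public class MissingCollectionCheck {

    // Message fragment asserted by the Chroma test in this commit.
    static final String CHROMA_500 = "Server returned HTTP response code: 500";

    // Class name asserted by the Qdrant and Weaviate tests in this commit;
    // it can appear inside a wrapping exception's message.
    static final String NOT_FOUND_CLASS = "java.io.FileNotFoundException";

    static boolean looksLikeMissingCollection(Throwable error) {
        // Walk the cause chain, because the original exception is often wrapped.
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof FileNotFoundException) {
                return true;
            }
            String message = t.getMessage();
            if (message != null
                    && (message.contains(CHROMA_500) || message.contains(NOT_FOUND_CLASS))) {
                // Heuristic only: a 500 may also mean a different server-side failure.
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("procedure failed",
                new FileNotFoundException("collection not found"));
        System.out.println(looksLikeMissingCollection(wrapped)); // prints: true
        System.out.println(looksLikeMissingCollection(new IllegalStateException("boom"))); // prints: false
    }
}
```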

@Test
public void getVectorsWithReadOnlyApiKey() {
testResult(
15 changes: 15 additions & 0 deletions full/src/main/java/apoc/vectordb/VectorDbUtil.java
@@ -15,6 +15,21 @@
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;

import java.net.HttpURLConnection;
import java.net.URL;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import static apoc.ml.RestAPIConfig.BASE_URL_KEY;
import static apoc.ml.RestAPIConfig.BODY_KEY;
import static apoc.ml.RestAPIConfig.ENDPOINT_KEY;
import static apoc.ml.RestAPIConfig.METHOD_KEY;
import static apoc.util.SystemDbUtil.withSystemDb;
import static apoc.vectordb.VectorEmbeddingConfig.MAPPING_KEY;
import static apoc.vectordb.VectorMappingConfig.MODE_KEY;
import static apoc.vectordb.VectorMappingConfig.MappingMode.READ_ONLY;

public class VectorDbUtil {

public static final String ERROR_READONLY_MAPPING =
4 changes: 3 additions & 1 deletion full/src/main/resources/extended.txt
@@ -229,6 +229,8 @@ apoc.vectordb.weaviate.get
apoc.vectordb.weaviate.getAndUpdate
apoc.vectordb.weaviate.query
apoc.vectordb.weaviate.queryAndUpdate
apoc.vectordb.custom.get
apoc.vectordb.weaviate.info
apoc.vectordb.pinecone.info
apoc.vectordb.milvus.info
apoc.vectordb.custom
apoc.vectordb.configure
