chore: Upload Embeddings Doc (#719)
* Added code example to NLS upload

* created uploading embeddings doc

* Fixed URL link issues and updated titles

* Created instructions for upload embeddings

* Updated uploaded embeddings to show embedding name
brianshen3 authored Nov 5, 2024
1 parent 0772ba8 commit c45a166
Showing 4 changed files with 76 additions and 6 deletions.
3 changes: 3 additions & 0 deletions docs/assets/images/upload-embeddings-enable.gif
11 changes: 5 additions & 6 deletions docs/automations/set-up-natural-language-search.md
@@ -37,12 +37,11 @@ In this document, we will go over main components of the below
and steps you need to take to tailor it for your application.

!!! Example
-    The [`kolena`](https://github.com/kolenaIO/kolena) repository contains a runnable
-    [example](https://github.com/kolenaIO/kolena/tree/trunk/examples/dataset/search_embeddings) for
-    embeddings extraction and
-    upload. This builds off the data uploaded in the
+    The [`kolena`](https://github.com/kolenaIO/kolena) repository includes a
+    [code example](https://github.com/kolenaIO/kolena/tree/trunk/examples/dataset/search_embeddings) for
+    extracting and uploading embeddings. It builds on data from the
     [semantic_segmentation](https://github.com/kolenaIO/kolena/tree/trunk/examples/dataset/semantic_segmentation)
-    example dataset, and is best run after this data has been uploaded to your Kolena environment.
+    example dataset, so ensure the dataset is uploaded to your Kolena environment before running the code example.

Uploading embeddings to Kolena can be done in four simple steps:

@@ -56,7 +55,7 @@ Uploading embeddings to Kolena can be done in four simple steps:
The package can be installed via `pip` or `uv` and requires your Kolena token, which can be created
on the [:kolena-developer-16: Developer](https://app.kolena.com/redirect/developer) page.

-We first [retrieve and set](../installing-kolena.md#initialization) our `KOLENA_TOKEN` environment variable.
+We first [retrieve and set](../installing-kolena.md) our `KOLENA_TOKEN` environment variable.
This is used by the uploader for authentication against your Kolena instance.

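As a minimal sketch, the token can be exported in your shell session before running the uploader; the value shown here is a placeholder, not a real token:

```shell
# Set the Kolena API token for the current shell session.
# Replace the placeholder with the token from the Developer page.
export KOLENA_TOKEN="<your-kolena-token>"
```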
67 changes: 67 additions & 0 deletions docs/dataset/advanced-usage/upload-embeddings.md
@@ -0,0 +1,67 @@
---
icon: kolena/classification-16
---

# :kolena-classification-16: Uploading Custom Embeddings

This guide explains how to upload your own embeddings to Kolena using the Kolena SDK.
Before starting, ensure you have the SDK installed;
[installation instructions are available here.](https://docs.kolena.com/installing-kolena/)

## Step 1: Import the Embedding Upload Function

To upload embeddings, use the `upload_dataset_embeddings` function from Kolena. You can import
it with the following code:
```python
from kolena._experimental.search import upload_dataset_embeddings
```

## Step 2: Prepare the Required DataFrame

The DataFrame you upload should have:

- Unique identifier column(s): typically the `locator` field, which serves as a unique identifier for each entry.
  Multiple ID fields can be combined, such as `locator` + `person_id`.
- Embedding column: each embedding must be the same size across all rows.
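When a single column does not uniquely identify a row, the identifier can span multiple columns. Here is a minimal sketch of that case; the locators, `person_id` values, and the 512-dimension size are all hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical rows where `locator` alone is not unique: one image contains
# several people, so `locator` + `person_id` together identify a row.
df_embedding = pd.DataFrame(
    {
        "locator": ["s3://bucket/img1.png", "s3://bucket/img1.png", "s3://bucket/img2.png"],
        "person_id": [0, 1, 0],
    }
)

# Every embedding must have the same size; 512 is an arbitrary choice here.
df_embedding["embedding"] = [np.zeros(512) for _ in range(len(df_embedding))]

# Sanity checks before uploading: identifiers are unique, sizes are consistent.
assert not df_embedding.duplicated(subset=["locator", "person_id"]).any()
assert df_embedding["embedding"].map(len).nunique() == 1
```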

### Example code

Here’s an example where we download the `instance-seg` dataset from Kolena,
then add a placeholder embedding (a zero-filled array):
```python
import numpy as np

from kolena.dataset import download_dataset

dataset = "instance-seg"
df = download_dataset(dataset)
df_embedding = df[["locator"]].copy()  # keep only the unique identifier column
df_embedding["embedding"] = [np.zeros((1, 512))] * len(df_embedding)  # placeholder embeddings
```
!!! Note
Replace the placeholder embeddings with embeddings generated from your own embedding model.
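One way to do that replacement is to map an embedding function over the identifier column. The `embed_image` function below is a stand-in for a real model call (for example, an image encoder); it derives a deterministic pseudo-embedding from the locator so the sketch runs without any model dependency:

```python
import zlib

import numpy as np
import pandas as pd

def embed_image(locator: str) -> np.ndarray:
    # Stand-in for a real model call. A CRC32 of the locator seeds the RNG,
    # so the same locator always yields the same pseudo-embedding.
    rng = np.random.default_rng(zlib.crc32(locator.encode()))
    vec = rng.standard_normal(512)
    return vec / np.linalg.norm(vec)  # L2-normalize, common for similarity search

df_embedding = pd.DataFrame({"locator": ["s3://bucket/img1.png", "s3://bucket/img2.png"]})
df_embedding["embedding"] = df_embedding["locator"].map(embed_image)
```

In practice, `embed_image` would wrap your own model's inference call; everything else stays the same.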

## Step 3: Upload the DataFrame using Kolena SDK

With the DataFrame prepared, use the `upload_dataset_embeddings` function to upload it to Kolena.

```python
upload_dataset_embeddings(dataset_name="instance-seg", key="my-embedding-model", df_embedding=df_embedding)
```

The `dataset_name` parameter specifies the target dataset where the embeddings will be uploaded.
The `key` parameter is a unique identifier for the embeddings being uploaded, allowing multiple sets of embeddings
to be associated with the same dataset. Finally, `df_embedding` is the DataFrame object
prepared in Step 2 that contains the data you want to upload.
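Because `key` distinguishes embedding sets, you can upload embeddings from several models against the same dataset. A sketch of that pattern follows; the locators and key names are hypothetical, and the upload calls are shown as comments since they require a live Kolena instance:

```python
import numpy as np
import pandas as pd

locators = ["s3://bucket/img1.png", "s3://bucket/img2.png"]

# Hypothetical embeddings from two different models for the same dataset.
df_model_a = pd.DataFrame({"locator": locators})
df_model_a["embedding"] = [np.zeros(512)] * len(df_model_a)

df_model_b = pd.DataFrame({"locator": locators})
df_model_b["embedding"] = [np.ones(512)] * len(df_model_b)

# Each upload uses a distinct `key`, so both sets coexist on the dataset
# and can be selected independently in Studio:
# upload_dataset_embeddings(dataset_name="instance-seg", key="model-a", df_embedding=df_model_a)
# upload_dataset_embeddings(dataset_name="instance-seg", key="model-b", df_embedding=df_model_b)
```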

## Step 4: Verify Your Embeddings in Kolena Studio

To confirm the embeddings uploaded successfully:

- Open Kolena Studio.
- In the top right corner, click on "Off" beside the embeddings toggle to enable embeddings view.
- Select the embedding `key` used in the upload function; in the example above, `my-embedding-model`.
- Choose from the visualization options: UMAP, t-SNE, or PCA.

![Enabling Embeddings on Studio](../../assets/images/upload-embeddings-enable.gif)

If you have trouble creating embeddings, refer to our example code for
[generating image embeddings and uploading to Kolena](https://github.com/kolenaIO/kolena/tree/trunk/examples/dataset/search_embeddings).
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -25,6 +25,7 @@ nav:
- Automatically Extract Image Properties: automations/extract-image-metadata.md
- Automatically Extract Bounding Box Properties: automations/extract-bounding-box-metadata.md
- Setting Up Natural Language Search: automations/set-up-natural-language-search.md
+- Uploading Custom Embeddings: dataset/advanced-usage/upload-embeddings.md
- Object Detection with Kolena: dataset/object-detection.md
- LLM Powered Data Processing: dataset/advanced-usage/llm-prompt-extraction.md
- Custom Queries and Fields: dataset/advanced-usage/custom-queries.md
