docs (#1465)

* up * up * up * add chunk enrichment part
SciPhi-AI · Oct 23, 2024 · edf8220
1 parent bc53dc9
commit edf8220

Showing 3 changed files with 70 additions and 2 deletions.
diff --git a/docs/documentation/cli/graph.mdx b/docs/documentation/cli/graph.mdx
@@ -129,8 +129,6 @@ r2r get-triples --collection-id my-collection --offset 0 --limit 10 --triple-ids
     </ParamField>
   </Accordion>
 </AccordionGroup>
-````
-
 
 ### Delete Graph
 

diff --git a/docs/documentation/python-sdk/graphrag.mdx b/docs/documentation/python-sdk/graphrag.mdx
@@ -500,6 +500,52 @@ client.get_tuned_prompt(
 
 The tuning process provides an LLM with chunks from each document in the collection. The relative sample size can therefore be controlled by adjusting the document and chunk limits.
 
+## Deduplicate Entities
+
+```python
+client.deduplicate_entities(
+    collection_id='122fdf6a-e116-546b-a8f6-e4cb2e2c0a09',
+    entity_deduplication_settings=entity_deduplication_settings
+)
+```
+
+<AccordionGroup>
+  <Accordion title="Response">
+    <ResponseField name="response" type="dict">
+      The response from the R2R system confirming that the entity deduplication task was queued.
+      ```json
+      {
+        "message": "Entity deduplication task queued successfully.",
+        "task_id": "6e27dfca-606d-422d-b73f-2d9e138661b4"
+      }
+      ```
+    </ResponseField>
+  </Accordion>
+</AccordionGroup>
+
+
+<ParamField path="collection_id" type="Union[UUID, str]">
+  The ID of the collection to deduplicate entities for.
+</ParamField>
+
+<ParamField path="entity_deduplication_settings" type="EntityDeduplicationSettings">
+  The settings for the entity deduplication process.
+  <Expandable title="EntityDeduplicationSettings">
+    <ParamField path="kg_entity_deduplication_type" type="str">
+      The type of deduplication to perform. The only valid value is currently "by_name"; more deduplication types will be added in the future.
+    </ParamField>
+    <ParamField path="kg_entity_deduplication_prompt" type="str">
+      The prompt to use for entity deduplication.
+    </ParamField>
+    <ParamField path="generation_config" type="GenerationConfig">
+      The configuration for text generation during entity deduplication.
+    </ParamField>
+    <ParamField path="max_description_input_length" type="int">
+      The maximum length, in characters (not tokens), of a graph node's description.
+      This cap keeps the input within the LLM's context window while generating descriptions.
+    </ParamField>
+  </Expandable>
+</ParamField>
 
 ## Search and RAG
 

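To make the `EntityDeduplicationSettings` fields above more concrete, here is a minimal sketch of what "by_name" deduplication with a character cap might look like. It is an illustration only, not R2R's implementation: the `merge_by_name` helper, the prompt name, the model choice, and the sample entities are all hypothetical, and the settings are written as a plain dict on the assumption that the client accepts one.

```python
from collections import defaultdict

# Settings mirroring the documented EntityDeduplicationSettings fields.
# The prompt name and model are placeholders, not R2R defaults.
entity_deduplication_settings = {
    "kg_entity_deduplication_type": "by_name",
    "kg_entity_deduplication_prompt": "my_deduplication_prompt",  # hypothetical
    "generation_config": {"model": "openai/gpt-4o-mini"},
    "max_description_input_length": 40,  # kept small for demonstration
}


def merge_by_name(entities, max_description_input_length):
    """Group entities that share a name and cap the combined description.

    Hypothetical helper: the cap is applied in characters, not tokens,
    matching the documented meaning of max_description_input_length.
    """
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["name"]].append(entity["description"])

    merged = {}
    for name, descriptions in grouped.items():
        combined = " ".join(descriptions)
        merged[name] = combined[:max_description_input_length]
    return merged


entities = [
    {"name": "Aristotle", "description": "Greek philosopher."},
    {"name": "Aristotle", "description": "Student of Plato and tutor of Alexander."},
    {"name": "Plato", "description": "Founder of the Academy."},
]

merged = merge_by_name(
    entities, entity_deduplication_settings["max_description_input_length"]
)
```

In this sketch the two "Aristotle" entries collapse into one, and their combined description is cut at 40 characters before it would be passed to an LLM.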
diff --git a/docs/documentation/python-sdk/ingestion.mdx b/docs/documentation/python-sdk/ingestion.mdx
@@ -26,6 +26,8 @@ We recommend this method for achieving the highest quality ingestion results.
 
 </Note>
 
+
+
 ### Ingest Files
 
 
@@ -48,6 +50,9 @@ ingest_response = client.ingest_files(
         "max_characters": 512, # hard maximum
         "combine_under_n_chars": 64, # hard minimum
         "overlap": 100,
+        "chunk_enrichment_settings": {
+            "enable_chunk_enrichment": False,
+        }
     }
 )
 
@@ -224,6 +229,25 @@ Refer to the [ingestion configuration](/documentation/configuration/ingestion/pa
 
 </ParamField>
 
+<Note>
+
+Contextual chunk enrichment is now supported. See the [contextual enrichment cookbook](/cookbooks/contextual-enrichment) for details.
+
+It is disabled by default and must be enabled in your ingestion config:
+
+```toml
+[ingestion.chunk_enrichment_settings]
+    enable_chunk_enrichment = true # disabled by default
+    strategies = ["semantic", "neighborhood"]
+    forward_chunks = 3            # Look ahead 3 chunks
+    backward_chunks = 3           # Look behind 3 chunks
+    semantic_neighbors = 10       # Find 10 semantically similar chunks
+    semantic_similarity_threshold = 0.7  # Minimum similarity score
+    generation_config = { model = "openai/gpt-4o-mini" }
+```
+
+</Note>
+
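As a rough illustration of what the `forward_chunks`, `backward_chunks`, `semantic_neighbors`, and `semantic_similarity_threshold` settings above suggest, the following sketch selects context chunks for a given chunk. Both helper functions are hypothetical and are not part of the R2R API; in particular, treating the threshold as inclusive is an assumption based on the "Minimum similarity score" comment.

```python
def neighborhood_indices(index, total_chunks, forward_chunks=3, backward_chunks=3):
    """Return indices of chunks within the look-behind/look-ahead window.

    Hypothetical helper: the window is clipped at the document boundaries
    and excludes the chunk itself.
    """
    start = max(0, index - backward_chunks)
    end = min(total_chunks, index + forward_chunks + 1)
    return [i for i in range(start, end) if i != index]


def semantic_candidates(scores, semantic_neighbors=10, semantic_similarity_threshold=0.7):
    """Return up to `semantic_neighbors` chunk indices, best match first.

    `scores` maps chunk index -> similarity to the target chunk.
    Assumption: a score equal to the threshold is kept.
    """
    kept = [(i, s) for i, s in scores.items() if s >= semantic_similarity_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in kept[:semantic_neighbors]]


# Chunk 5 of a 20-chunk document: three chunks behind, three ahead.
window = neighborhood_indices(5, 20)

# Only chunks scoring at least 0.7 survive, ordered by similarity.
similar = semantic_candidates({1: 0.9, 2: 0.65, 3: 0.75})
```

With the default values from the config, a chunk in the middle of a document would get up to six neighboring chunks plus up to ten semantically similar ones as enrichment context.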
 ### Ingest Chunks
 
 Ingest pre-parsed text chunks into your R2R system: