docs (#1465)
* up

* up

* up

* add chunk enrichment part
shreyaspimpalgaonkar authored Oct 23, 2024
1 parent bc53dc9 commit edf8220
Showing 3 changed files with 70 additions and 2 deletions.
2 changes: 0 additions & 2 deletions docs/documentation/cli/graph.mdx
@@ -129,8 +129,6 @@ r2r get-triples --collection-id my-collection --offset 0 --limit 10 --triple-ids
</ParamField>
</Accordion>
</AccordionGroup>
````

### Delete Graph

46 changes: 46 additions & 0 deletions docs/documentation/python-sdk/graphrag.mdx
@@ -500,6 +500,52 @@ client.get_tuned_prompt(

The tuning process provides an LLM with chunks from each document in the collection. The relative sample size can therefore be controlled by adjusting the document and chunk limits.
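
As a rough illustration of how the two limits interact, the sketch below computes an upper bound on the number of chunks that could reach the LLM. It assumes the chunk limit applies per document; the function name and the parameter names `documents_limit` and `chunks_limit` are hypothetical and are not confirmed by this diff.

```python
def max_sampled_chunks(
    documents_in_collection: int, documents_limit: int, chunks_limit: int
) -> int:
    """Upper bound on sampled chunks, assuming the chunk limit is per document."""
    if min(documents_in_collection, documents_limit, chunks_limit) < 0:
        raise ValueError("counts and limits must be non-negative")
    # No more documents can be sampled than exist in the collection.
    documents_sampled = min(documents_in_collection, documents_limit)
    return documents_sampled * chunks_limit


# Example: 250 documents, at most 100 sampled, at most 10 chunks from each.
upper_bound = max_sampled_chunks(
    documents_in_collection=250, documents_limit=100, chunks_limit=10
)
```

Under these assumptions, lowering either limit shrinks the sample proportionally, which is the lever the paragraph above describes.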

## Deduplicate Entities

```python
client.deduplicate_entities(
    collection_id='122fdf6a-e116-546b-a8f6-e4cb2e2c0a09',
    entity_deduplication_settings=entity_deduplication_settings
)
```

<AccordionGroup>
<Accordion title="Response">
<ResponseField name="response" type="dict">
The response from the R2R system once the entity deduplication task has been queued.
```json
{
  "message": "Entity deduplication task queued successfully.",
  "task_id": "6e27dfca-606d-422d-b73f-2d9e138661b4"
}
```
</ResponseField>
</Accordion>
</AccordionGroup>


<ParamField path="collection_id" type="Union[UUID, str]">
The ID of the collection to deduplicate entities for.
</ParamField>

<ParamField path="entity_deduplication_settings" type="EntityDeduplicationSettings">
The settings for the entity deduplication process.
<Expandable title="EntityDeduplicationSettings">
<ParamField path="kg_entity_deduplication_type" type="str">
The type of deduplication to perform. The only valid value at present is "by_name"; more deduplication types will be added in the future.
</ParamField>
<ParamField path="kg_entity_deduplication_prompt" type="str">
The prompt to use for entity deduplication.
</ParamField>
<ParamField path="generation_config" type="GenerationConfig">
The configuration for text generation during entity deduplication.
</ParamField>
<ParamField path="max_description_input_length" type="int">
The maximum length of a graph node's description, measured in characters (not tokens).
This cap keeps the input within the LLM's context window while descriptions are generated.
</ParamField>
</Expandable>
</ParamField>
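
To make the fields above concrete, here is a minimal sketch that assembles a settings payload as a plain dictionary. Whether the SDK accepts a plain `dict` in place of an `EntityDeduplicationSettings` object is an assumption, and the helper name `build_entity_deduplication_settings`, the prompt name, and the default length are hypothetical placeholders rather than documented defaults.

```python
from typing import Any, Dict, Optional

# Per the parameter description above, "by_name" is the only valid type for now.
VALID_DEDUPLICATION_TYPES = {"by_name"}


def build_entity_deduplication_settings(
    prompt_name: str,
    generation_config: Optional[Dict[str, Any]] = None,
    deduplication_type: str = "by_name",
    max_description_input_length: int = 65536,  # hypothetical value, in characters
) -> Dict[str, Any]:
    """Assemble a settings dict using the field names documented above."""
    if deduplication_type not in VALID_DEDUPLICATION_TYPES:
        raise ValueError(f"Unsupported deduplication type: {deduplication_type!r}")
    if max_description_input_length <= 0:
        raise ValueError("max_description_input_length must be a positive character count")
    return {
        "kg_entity_deduplication_type": deduplication_type,
        "kg_entity_deduplication_prompt": prompt_name,
        "generation_config": dict(generation_config or {}),
        "max_description_input_length": max_description_input_length,
    }


# Hypothetical prompt name, for illustration only.
entity_deduplication_settings = build_entity_deduplication_settings(
    prompt_name="my_entity_deduplication_prompt",
    generation_config={"model": "openai/gpt-4o-mini"},
)
```

Validating the deduplication type locally is a design choice of this sketch: it surfaces an unsupported value before a task is queued, instead of after.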

## Search and RAG

24 changes: 24 additions & 0 deletions docs/documentation/python-sdk/ingestion.mdx
@@ -26,6 +26,8 @@ We recommend this method for achieving the highest quality ingestion results.

</Note>



### Ingest Files


@@ -48,6 +50,9 @@ ingest_response = client.ingest_files(
        "max_characters": 512, # hard maximum
        "combine_under_n_chars": 64, # hard minimum
        "overlap": 100,
        "chunk_enrichment_settings": {
            "enable_chunk_enrichment": False,
        }
    }
)

@@ -224,6 +229,25 @@ Refer to the [ingestion configuration](/documentation/configuration/ingestion/pa

</ParamField>

<Note>

We have added support for contextual chunk enrichment! You can learn more about it in the [contextual enrichment cookbook](/cookbooks/contextual-enrichment).

Currently, you need to enable it in your ingestion config:

```toml
[ingestion.chunk_enrichment_settings]
enable_chunk_enrichment = true # disabled by default
strategies = ["semantic", "neighborhood"]
forward_chunks = 3 # Look ahead 3 chunks
backward_chunks = 3 # Look behind 3 chunks
semantic_neighbors = 10 # Find 10 semantically similar chunks
semantic_similarity_threshold = 0.7 # Minimum similarity score
generation_config = { model = "openai/gpt-4o-mini" }
```

</Note>
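
The settings in the note above can be read as two selection rules: a positional window around each chunk and a similarity filter. The sketch below shows one plausible interpretation, not R2R's actual implementation. The function names are hypothetical, and treating the threshold as inclusive is an assumption based on the "Minimum similarity score" comment.

```python
from typing import List, Sequence, Tuple


def neighborhood_indices(
    index: int, total_chunks: int, forward_chunks: int = 3, backward_chunks: int = 3
) -> List[int]:
    """Indices of chunks inside the look-behind/look-ahead window, excluding `index`."""
    if not 0 <= index < total_chunks:
        raise IndexError("chunk index out of range")
    # Clamp the window to the bounds of the document.
    start = max(0, index - backward_chunks)
    stop = min(total_chunks, index + forward_chunks + 1)
    return [i for i in range(start, stop) if i != index]


def semantic_neighbor_indices(
    scores: Sequence[Tuple[int, float]],
    semantic_neighbors: int = 10,
    semantic_similarity_threshold: float = 0.7,
) -> List[int]:
    """Keep the highest-scoring chunks at or above the threshold, up to the cap."""
    eligible = [(i, s) for i, s in scores if s >= semantic_similarity_threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in eligible[:semantic_neighbors]]
```

For a chunk in the middle of a long document, the window covers three chunks on each side; near the start or end it is clipped, so fewer neighbors are returned.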

### Ingest Chunks

Ingest pre-parsed text chunks into your R2R system:
