Adds note on reindexing existing data for semantic_text usage #113590

Merged
Changes from 3 commits
@@ -89,6 +89,17 @@ PUT semantic-embeddings
It will be used to generate the embeddings based on the input text.
Every time you ingest data into the related `semantic_text` field, this endpoint will be used for creating the vector representation of the text.

[NOTE]
====
If you're using web crawlers or connectors to generate indices, you have to <<indices-put-mapping,update the index mappings>>
for these indices to include the `semantic_text` field. Once the mapping is updated, you must run a full web crawl or a
full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings,
enabling you to perform semantic search on the updated data.
====
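
As a minimal sketch of the mapping update described in the note above: the index name `my-crawler-index`, the field name `semantic_content`, and the {infer} endpoint ID `my-inference-endpoint` are hypothetical placeholders, not names from this tutorial. Replace them with the values from your own deployment.

[source,console]
------------------------------------------------------------
PUT my-crawler-index/_mapping
{
  "properties": {
    "semantic_content": {
      "type": "semantic_text",
      "inference_id": "my-inference-endpoint"
    }
  }
}
------------------------------------------------------------

The request only adds the field to the mapping. Existing documents get embeddings only after the full crawl or full sync has run.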




[discrete]
[[semantic-text-load-data]]
@@ -118,6 +129,13 @@ Create the embeddings from the text by reindexing the data from the `test-data`
The data in the `content` field will be reindexed into the `content` semantic text field of the destination index.
The reindexed data will be processed by the {infer} endpoint associated with the `content` semantic text field.

[NOTE]
====
This step uses the reindex API to simulate data ingestion. If you are working with data that has already been indexed,
rather than using the `test-data` set, reindexing is required to ensure that the data is processed by the {infer} endpoint
and the necessary embeddings are generated.
====

[source,console]
------------------------------------------------------------
POST _reindex?wait_for_completion=false