From 8a95b5fafc35cb42a01e9b24ce0f792ab83f4eab Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Thu, 26 Sep 2024 12:29:46 +0200
Subject: [PATCH 1/4] Adds note on reindexing existing data for semantic_text usage

---
 .../semantic-search-semantic-text.asciidoc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
index de9a35e0d29b8..2c67b46fffec2 100644
--- a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
+++ b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
@@ -118,6 +118,13 @@ Create the embeddings from the text by reindexing the data from the `test-data`
 The data in the `content` field will be reindexed into the `content` semantic text field of the destination index.
 The reindexed data will be processed by the {infer} endpoint associated with the `content` semantic text field.
 
+[NOTE]
+====
+This step uses the reindex API to simulate data ingestion. If you have existing data already indexed
+(i.e., you’re not using the `test-data` set but your own data), reindexing is required to ensure that
+the data is processed by the {infer} endpoint and the necessary embeddings are generated.
+====
+
 [source,console]
 ------------------------------------------------------------
 POST _reindex?wait_for_completion=false

From 84ecc1705f8e3ddc9fda39652d4a0cf09c0021b2 Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Mon, 7 Oct 2024 08:11:48 +0200
Subject: [PATCH 2/4] Adds note about full crawl and full sync

---
 .../semantic-search-semantic-text.asciidoc | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
index 2c67b46fffec2..5d403b1119077 100644
--- a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
+++ b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
@@ -89,6 +89,17 @@ PUT semantic-embeddings
 It will be used to generate the embeddings based on the input text.
 Every time you ingest data into the related `semantic_text` field, this endpoint will be used for creating the vector representation of the text.
 
+[NOTE]
+====
+If you're using web crawlers or connectors to generate indices, you have to <> the existing index mapping
+for the indices generated by web crawlers and connectors to include the `semantic_text` field. Once the mapping is updated,
+you must run a full crawl if you are using a web crawler or a full sync if you are using a connector. This ensures that all
+existing documents are reprocessed and updated with the new semantic embeddings, enabling you to perform semantic search on
+the updated data.
+====
+
+
+
 
 [discrete]
 [[semantic-text-load-data]]

From 37e772e5e5767932fbcbae79705630298f629bc8 Mon Sep 17 00:00:00 2001
From: kosabogi
Date: Mon, 7 Oct 2024 09:48:47 +0200
Subject: [PATCH 3/4] Style guide related fix

---
 .../search-your-data/semantic-search-semantic-text.asciidoc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
index 5d403b1119077..6e71ca0b61f68 100644
--- a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
+++ b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
@@ -131,9 +131,9 @@ The reindexed data will be processed by the {infer} endpoint associated with the
 
 [NOTE]
 ====
-This step uses the reindex API to simulate data ingestion. If you have existing data already indexed
-(i.e., you’re not using the `test-data` set but your own data), reindexing is required to ensure that
-the data is processed by the {infer} endpoint and the necessary embeddings are generated.
+This step uses the reindex API to simulate data ingestion. If you are working with data that has already been indexed,
+rather than using the test-data set, reindexing is required to ensure that the data is processed by the {infer} endpoint
+and the necessary embeddings are generated.
 ====
 
 [source,console]

From 461afbc6e2a7edb6a66773dc6838b2d29513465c Mon Sep 17 00:00:00 2001
From: kosabogi <105062005+kosabogi@users.noreply.github.com>
Date: Tue, 8 Oct 2024 06:57:17 +0200
Subject: [PATCH 4/4] Update docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
---
 .../semantic-search-semantic-text.asciidoc | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
index 6e71ca0b61f68..7658a2a94dbb2 100644
--- a/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
+++ b/docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc
@@ -91,16 +91,15 @@ Every time you ingest data into the related `semantic_text` field, this endpoint
 
 [NOTE]
 ====
-If you're using web crawlers or connectors to generate indices, you have to <> the existing index mapping
-for the indices generated by web crawlers and connectors to include the `semantic_text` field. Once the mapping is updated,
-you must run a full crawl if you are using a web crawler or a full sync if you are using a connector. This ensures that all
-existing documents are reprocessed and updated with the new semantic embeddings, enabling you to perform semantic search on
-the updated data.
+If you're using web crawlers or connectors to generate indices, you have to
+<> for these indices to
+include the `semantic_text` field. Once the mapping is updated, you'll need to run
+a full web crawl or a full connector sync. This ensures that all existing
+documents are reprocessed and updated with the new semantic embeddings,
+enabling semantic search on the updated data.
 ====
 
 
-
-
 [discrete]
 [[semantic-text-load-data]]
 ==== Load data
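
Reviewer sketch (not part of the patch): the two notes added in this series describe a mapping update followed by a reindex. The Python below only builds the JSON bodies such requests might carry, so the shape is easier to check in review. It is a minimal sketch under stated assumptions: the index name `my-crawler-index` and the endpoint ID `my-elser-endpoint` are hypothetical, and the batch size of 10 is an illustrative value; `test-data`, `semantic-embeddings`, and the `content` field come from the tutorial text quoted in the hunks.

```python
import json


def semantic_text_mapping(field: str, inference_id: str) -> dict:
    """Body for `PUT <index>/_mapping` that adds a semantic_text field.

    `inference_id` names the inference endpoint that generates the embeddings.
    """
    return {
        "properties": {
            field: {"type": "semantic_text", "inference_id": inference_id}
        }
    }


def reindex_body(source_index: str, dest_index: str, batch_size: int = 10) -> dict:
    """Body for `POST _reindex?wait_for_completion=false`.

    `source.size` is the reindex batch size; 10 is an assumed, illustrative value.
    """
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


# Hypothetical endpoint ID; the field and index names follow the tutorial.
mapping = semantic_text_mapping("content", "my-elser-endpoint")
reindex = reindex_body("test-data", "semantic-embeddings")

# Hypothetical index name for a crawler- or connector-generated index.
print("PUT my-crawler-index/_mapping")
print(json.dumps(mapping, indent=2))
print("POST _reindex?wait_for_completion=false")
print(json.dumps(reindex, indent=2))
```

For a crawler or connector index, only the mapping body would apply, followed by a full crawl or full sync as the second note describes; the reindex body matches the case covered by the first note.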