Commit a7f2e9f: Apply suggestions from code review
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Yaliang Wu <ylwu@amazon.com>
ylwu-amzn and kolchfa-aws authored Feb 28, 2024 (1 parent: 33ef3a6)
# Topic

This doc describes how to build semantic search in Amazon-managed OpenSearch Service with [AWS CloudFormation](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cfn-template.html) and SageMaker.
If you are not using Amazon OpenSearch, refer to [sagemaker_connector_blueprint](https://github.com/opensearch-project/ml-commons/blob/main/docs/remote_inference_blueprints/sagemaker_connector_blueprint.md) and [OpenSearch semantic search](https://opensearch.org/docs/latest/search-plugins/semantic-search/).

The CloudFormation integration automates the manual process described in the [semantic_search_with_sagemaker_embedding_model tutorial](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/tutorials/aws/semantic_search_with_sagemaker_embedding_model.md).

The CloudFormation template creates an IAM role and then uses a Lambda function to create an AI connector and model.

Make sure your SageMaker model inputs follow the format that the [default pre-processing function](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/#preprocessing-function) requires. The model input must be an array of strings:
```
["hello world", "how are you"]
```
Additionally, make sure the model output follows the format that the [default post-processing function](https://opensearch.org/docs/latest/ml-commons-plugin/remote-models/blueprints/#post-processing-function) requires. The model output must be an array of arrays, where each array corresponds to the embedding of an input string:
```
[
  [
    <float_array_for_input_1>
  ],
  [
    <float_array_for_input_2>
  ]
]
```
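The input/output contract above can be checked with a few lines of Python before you wire up a connector (a sketch; the function names and toy values are illustrative, not part of any API):

```python
def is_valid_model_input(payload):
    # The default pre-processing function expects an array of strings.
    return isinstance(payload, list) and all(isinstance(s, str) for s in payload)

def is_valid_model_output(payload, num_inputs):
    # The default post-processing function expects one float array per input string.
    return (
        isinstance(payload, list)
        and len(payload) == num_inputs
        and all(
            isinstance(row, list) and all(isinstance(x, float) for x in row)
            for row in payload
        )
    )

model_input = ["hello world", "how are you"]
model_output = [[-0.048, 0.112, 0.033], [0.027, -0.091, 0.054]]  # toy values

assert is_valid_model_input(model_input)
assert is_valid_model_output(model_output, len(model_input))
```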

If your model input/output differs from the required default, you can build your own pre/post-processing function using a [Painless script](https://opensearch.org/docs/latest/api-reference/script-apis/exec-script/).

For example, the Amazon Bedrock Titan embedding model ([blueprint](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/remote_inference_blueprints/bedrock_connector_titan_embedding_blueprint.md#2-create-connector-for-amazon-bedrock)) input is
```
{ "inputText": "your_input_text" }
```
The Neural Search plugin sends the model input to ml-commons as follows:
```
{ "text_docs": [ "your_input_text1", "your_input_text2"] }
```
Thus, you need to build a pre-processing function to transform `text_docs` into `inputText`:
```
"pre_process_function": """
    StringBuilder builder = new StringBuilder();
    builder.append("\"");
    String first = params.text_docs[0];
    builder.append(first);
    builder.append("\"");
    def parameters = "{" +"\"inputText\":" + builder + "}";
return "{" +"\"parameters\":" + parameters + "}";"""
```
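To see the transformation in isolation, the same pre-processing logic can be mirrored in plain Python (a sketch; like the Painless script, it wraps only the first text doc):

```python
import json

def pre_process(params):
    # Mirror of the Painless pre-processing: take the first text doc and
    # wrap it as the "inputText" field that the Titan model expects.
    first = params["text_docs"][0]
    parameters = {"inputText": first}
    return {"parameters": parameters}

request = pre_process({"text_docs": ["your_input_text1", "your_input_text2"]})
print(json.dumps(request))
# {"parameters": {"inputText": "your_input_text1"}}
```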

The default Amazon Bedrock Titan embedding model output has the following format:
```
{
"embedding": <float_array>
}
```
However, the Neural Search plugin expects the following format:
```
{
"name": "sentence_embedding",
"data_type": "FLOAT32",
"shape": [<embedding_dimension>],
"data": <float_array>
}
```
Similarly, you need to build a post-processing function to transform the Bedrock Titan embedding model output into the format that the Neural Search plugin requires:

```
"post_process_function": """
      def name = "sentence_embedding";
      def dataType = "FLOAT32";
      if (params.embedding == null || params.embedding.length == 0) {
          return params.message;
      }
      def shape = [params.embedding.length];
      def json = "{" +
                 "\"name\":\"" + name + "\"," +
                 "\"data_type\":\"" + dataType + "\"," +
                 "\"shape\":" + shape + "," +
                 "\"data\":" + params.embedding +
                 "}";
      return json;
"""
```
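The post-processing transformation can likewise be mirrored in Python to make the mapping explicit (a sketch of the intended shape of the output, not the Painless runtime):

```python
def post_process(params):
    # Mirror of the Painless post-processing: wrap the raw embedding in the
    # tensor format that the Neural Search plugin expects.
    embedding = params["embedding"]
    return {
        "name": "sentence_embedding",
        "data_type": "FLOAT32",
        "shape": [len(embedding)],
        "data": embedding,
    }

result = post_process({"embedding": [0.1, 0.2, 0.3]})
assert result["name"] == "sentence_embedding"
assert result["shape"] == [3]
```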

Note: Replace the placeholders that start with the prefix `your_` with your own values.

# Steps

## 0. Create an OpenSearch cluster

Go to the AWS OpenSearch console UI and create an OpenSearch domain.

Note the domain ARN; you'll use it in the next step.

## 1. Map backend role

The AWS OpenSearch Integration CloudFormation template uses a Lambda function to create an AI connector with an IAM role. You need to
map the IAM role to `ml_full_access` to grant it the required permissions.
For details, refer to [semantic_search_with_sagemaker_embedding_model#map-backend-role](https://github.com/opensearch-project/ml-commons/blob/2.x/docs/tutorials/aws/semantic_search_with_sagemaker_embedding_model.md#22-map-backend-role).

You can find the IAM role in the `Lambda Invoke OpenSearch ML Commons Role Name` field in the CloudFormation template (see the screenshot in step 2.1).

The default IAM role is `LambdaInvokeOpenSearchMLCommonsRole`, so you need to map the `arn:aws:iam::your_aws_account_id:role/LambdaInvokeOpenSearchMLCommonsRole` backend role to `ml_full_access`.

For a quick start, you can also map all roles to `ml_full_access` using the wildcard `arn:aws:iam::your_aws_account_id:role/*`.

Because `all_access` has more permissions than `ml_full_access`, it's OK to map the backend role to `all_access`.
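If your domain uses fine-grained access control with basic auth, the mapping above can also be applied through the Security API. Below is a minimal sketch using only the Python standard library; the domain endpoint and account ID are placeholders, and the actual request is left commented out because it needs your credentials (note that `PUT` replaces the whole existing mapping for the role):

```python
import json
import urllib.request

# Placeholders -- substitute your own domain endpoint and AWS account ID.
domain = "https://your_opensearch_domain_endpoint"
body = {
    "backend_roles": [
        "arn:aws:iam::your_aws_account_id:role/LambdaInvokeOpenSearchMLCommonsRole"
    ]
}

# PUT /_plugins/_security/api/rolesmapping/ml_full_access
req = urllib.request.Request(
    url=f"{domain}/_plugins/_security/api/rolesmapping/ml_full_access",
    data=json.dumps(body).encode("utf-8"),
    method="PUT",
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment and add auth to actually send it
```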


## 2. Run CloudFormation template

You can find the CloudFormation template integration in the AWS OpenSearch console.

![Alt text](images/semantic_search/semantic_search_remote_model_Integration_1.png)

For all of the following options, you can find the OpenSearch AI connector and model IDs in the CloudFormation stack `Outputs` when it completes.

If you see any failure, you can find the log in the SageMaker console by searching `Log Groups` for the CloudFormation stack name.

### 2.1 Option 1: Deploy a pretrained model to SageMaker

You can deploy a pretrained Hugging Face sentence transformer embedding model from the [DJL](https://djl.ai/) model repository.

Fill out the following fields as described. Keep the default values for all fields not mentioned below:

1. You must fill in your `Amazon OpenSearch Endpoint`.
2. You can use the default settings in the `Sagemaker Configuration` section for a quick start. If necessary, you can change these values. For all supported SageMaker instance types, see the [SageMaker pricing page](https://aws.amazon.com/sagemaker/pricing/).
3. You must leave `SageMaker Endpoint Url` empty. If you provide a URL in this field, the template will not deploy the model to SageMaker to create a new inference endpoint.
4. You can leave the `Custom Image` field empty. The default is `djl-inference:0.22.1-cpu-full`. For all available images, see the [deep learning containers documentation](https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html).
5. You must leave `Custom Model Data Url` empty for this option.
6. The default value of `Custom Model Environment` is `djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2`. For all supported models, see the Appendix of this doc.

![Alt text](images/semantic_search/semantic_search_remote_model_Integration_2.png)


### 2.2 Option 2: Create a model using your existing SageMaker inference endpoint

If you already have a SageMaker inference endpoint, you can create a remote model directly using this endpoint.

Fill out the following fields as described. Keep the default values for all fields not mentioned below:
1. You must fill in your `Amazon OpenSearch Endpoint`.
2. You must fill in your `SageMaker Endpoint Url`.
3. You must leave `Custom Image`, `Custom Model Data Url`, and `Custom Model Environment` empty.
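Before pointing the template at an existing endpoint, it's worth confirming that the endpoint actually honors the default input/output contract (array of strings in, array of float arrays out). A sketch using `boto3` (assumed available; the endpoint name and region are placeholders):

```python
import json

def build_request(texts):
    # The default SageMaker embedding contract: an array of strings in.
    return json.dumps(texts)

def invoke(endpoint_name, texts, region="us-east-1"):
    # boto3 is imported lazily so the pure helpers above work without it.
    import boto3

    client = boto3.client("sagemaker-runtime", region_name=region)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_request(texts),
    )
    # Expect an array of float arrays, one per input string.
    return json.loads(response["Body"].read())

if __name__ == "__main__":
    embeddings = invoke("your_sagemaker_endpoint_name", ["hello world", "how are you"])
    print(len(embeddings))  # expect one embedding per input string
```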

![Alt text](images/semantic_search/semantic_search_remote_model_Integration_3.png)


# Appendix
## Hugging Face sentence transformer embedding models available in the DJL model repo
```
djl://ai.djl.huggingface.pytorch/sentence-transformers/LaBSE/
djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L12-v1/
...
```
