[Inference API] Add Docs for Amazon Bedrock Support for the Inference API #110594

Merged
1 change: 1 addition & 0 deletions docs/reference/inference/inference-apis.asciidoc
@@ -25,6 +25,7 @@ include::delete-inference.asciidoc[]
include::get-inference.asciidoc[]
include::post-inference.asciidoc[]
include::put-inference.asciidoc[]
include::service-amazon-bedrock.asciidoc[]
include::service-azure-ai-studio.asciidoc[]
include::service-azure-openai.asciidoc[]
include::service-cohere.asciidoc[]
1 change: 1 addition & 0 deletions docs/reference/inference/put-inference.asciidoc
@@ -34,6 +34,7 @@ The create {infer} API enables you to create an {infer} endpoint and configure a

The following services are available through the {infer} API. Click the links to review the configuration details of the services:

* <<infer-service-amazon-bedrock,Amazon Bedrock>>
* <<infer-service-azure-ai-studio,Azure AI Studio>>
* <<infer-service-azure-openai,Azure OpenAI>>
* <<infer-service-cohere,Cohere>>
173 changes: 173 additions & 0 deletions docs/reference/inference/service-amazon-bedrock.asciidoc
@@ -0,0 +1,173 @@
[[infer-service-amazon-bedrock]]
=== Amazon Bedrock {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `amazonbedrock` service.

[discrete]
[[infer-service-amazon-bedrock-api-request]]

==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-amazon-bedrock-path-params]]

==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `completion`,
* `text_embedding`.
--

[discrete]
[[infer-service-amazon-bedrock-api-request-body]]
==== {api-request-body-title}

`service`::
(Required, string) The type of service supported for the specified task type.
In this case,
`amazonbedrock`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `amazonbedrock` service.
--

`access_key`:::
(Required, string)
A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.

`secret_key`:::
(Required, string)
A valid AWS secret key that is paired with the `access_key`.
To create or manage access and secret keys, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html[Managing access keys for IAM users] in the AWS documentation.

IMPORTANT: You need to provide the access and secret keys only once, during the {infer} model creation.
The <<get-inference-api>> does not retrieve your access or secret keys.
After creating the {infer} model, you cannot change the associated key pairs.
If you want to use a different access and secret key pair, delete the {infer} model and recreate it with the same name and the updated keys.

`provider`:::
(Required, string)
The model provider for your deployment.
Note that some providers may support only certain task types.
Supported providers include:

* `amazontitan` - available for `text_embedding` and `completion` task types
* `anthropic` - available for `completion` task type only
* `ai21labs` - available for `completion` task type only
* `cohere` - available for `text_embedding` and `completion` task types
* `meta` - available for `completion` task type only
* `mistral` - available for `completion` task type only

`model`:::
(Required, string)
The base model ID or the ARN of a custom model based on a foundational model.
The base model IDs can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock model IDs] documentation.
Note that the model ID must be available for the provider chosen, and your IAM user must have access to the model.

`region`:::
(Required, string)
The region that your model or ARN is deployed in.
The list of available regions per model can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html[Model support by AWS region] documentation.

`rate_limit`:::
(Optional, object)
By default, the `amazonbedrock` service sets the number of requests allowed per minute to `240`.
This helps to minimize the number of rate limit errors returned from Amazon Bedrock.

To modify this, set the `requests_per_minute` setting of this object in your service settings:
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]
--

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `completion` task type
[%collapsible%closed]
=====

`max_new_tokens`:::
(Optional, integer)
Sets a maximum number for the output tokens to be generated.

Defaults to 64.

`temperature`:::
(Optional, float)
A number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic, at temperature 1.0 most random.

Should not be used if `top_p` or `top_k` is specified.

`top_p`:::
(Optional, float)
Alternative to `temperature`. A number in the range of 0.0 to 1.0, to eliminate low-probability tokens. Top-p uses nucleus sampling to select top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence.

Should not be used if `temperature` is specified.


`top_k`:::

(Optional, float)
Only available for `anthropic`, `cohere`, and `mistral` providers.
Alternative to `temperature`. Limits samples to the top-K most likely words, balancing coherence and variability.

Should not be used if `temperature` is specified.

=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
There are no `task_settings` available for the `text_embedding` task type.
=====
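
For illustration, here is a sketch of how the `completion` options above might be combined in a request body fragment (the values are arbitrary examples, not recommendations):

[source,js]
------------------------------------------------------------
"task_settings": {
    "max_new_tokens": 256,
    "temperature": 0.2
}
------------------------------------------------------------
// NOTCONSOLE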

[discrete]
[[inference-example-amazonbedrock]]
==== Amazon Bedrock service example

The following example shows how to create an {infer} endpoint called `amazon_bedrock_embeddings` to perform a `text_embedding` task type.

Choose chat completion and embeddings models you have access to from the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base models].


[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/amazon_bedrock_embeddings
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws_access_key>",
        "secret_key": "<aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-embed-text-v2:0"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
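
Once created, the endpoint can be queried with the perform {infer} API; a minimal sketch, assuming the `amazon_bedrock_embeddings` endpoint above exists (the input text is an arbitrary example):

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/amazon_bedrock_embeddings
{
    "input": "The quick brown fox jumped over the lazy dog"
}
------------------------------------------------------------
// TEST[skip:TBD]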

The next example shows how to create an {infer} endpoint called `amazon_bedrock_completion` to perform a `completion` task type.

[source,console]
------------------------------------------------------------
PUT _inference/completion/amazon_bedrock_completion
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws_access_key>",
        "secret_key": "<aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-text-premier-v1:0"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
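
As with the embeddings endpoint, the completion endpoint can then be called with the perform {infer} API; a minimal sketch (the prompt is an arbitrary example):

[source,console]
------------------------------------------------------------
POST _inference/completion/amazon_bedrock_completion
{
    "input": "What is the capital of France?"
}
------------------------------------------------------------
// TEST[skip:TBD]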