[Inference API] Add Docs for Amazon Bedrock Support for the Inference API #110594
@@ -0,0 +1,173 @@
[[infer-service-amazon-bedrock]]
=== Amazon Bedrock {infer} service

Creates an {infer} endpoint to perform an {infer} task with the `amazonbedrock` service.

[discrete]
[[infer-service-amazon-bedrock-api-request]]
==== {api-request-title}

`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-amazon-bedrock-api-path-params]]
==== {api-path-parms-title}

`<inference_id>`::
(Required, string)
include::inference-shared.asciidoc[tag=inference-id]

`<task_type>`::
(Required, string)
include::inference-shared.asciidoc[tag=task-type]
+
--
Available task types:

* `completion`,
* `text_embedding`.
--

[discrete]
[[infer-service-amazon-bedrock-api-request-body]]
==== {api-request-body-title}

`service`::
(Required, string) The type of service supported for the specified task type.
In this case,
`amazonbedrock`.

`service_settings`::
(Required, object)
include::inference-shared.asciidoc[tag=service-settings]
+
--
These settings are specific to the `amazonbedrock` service.
--

`access_key`:::
(Required, string)
A valid AWS access key that has permissions to use Amazon Bedrock and access to models for inference requests.

`secret_key`:::
(Required, string)
A valid AWS secret key that is paired with the `access_key`.
To create or manage access and secret keys, see https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html[Managing access keys for IAM users] in the AWS documentation.

IMPORTANT: You need to provide the access and secret keys only once, during the {infer} model creation.
The <<get-inference-api>> does not retrieve your access or secret keys.
After creating the {infer} model, you cannot change the associated key pairs.
If you want to use a different access and secret key pair, delete the {infer} model and recreate it with the same name and the updated keys.

`provider`:::
(Required, string)
The model provider for your deployment.
Note that some providers may support only certain task types.
Supported providers include:

* `amazontitan` - available for `text_embedding` and `completion` task types
* `anthropic` - available for `completion` task type only
* `ai21labs` - available for `completion` task type only
* `cohere` - available for `text_embedding` and `completion` task types
* `meta` - available for `completion` task type only
* `mistral` - available for `completion` task type only

`model`:::
(Required, string)
The base model ID, or the ARN of a custom model based on a foundation model.
The base model IDs can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock model IDs] documentation.
Note that the model ID must be available for the chosen provider, and your IAM user must have access to the model.

`region`:::
(Required, string)
The region that your model or ARN is deployed in.
The list of available regions per model can be found in the https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html[Model support by AWS region] documentation.

`rate_limit`:::
(Optional, object)
By default, the `amazonbedrock` service sets the number of requests allowed per minute to `240`.
This helps to minimize the number of rate limit errors returned from Amazon Bedrock.
To modify this, set the `requests_per_minute` setting of this object in your service settings (see also the sketch after this parameter list):
+
--
include::inference-shared.asciidoc[tag=request-per-minute-example]
--

`task_settings`::
(Optional, object)
include::inference-shared.asciidoc[tag=task-settings]
+
.`task_settings` for the `completion` task type
[%collapsible%closed]
=====

`max_new_tokens`:::
(Optional, integer)
Sets the maximum number of output tokens to be generated.
Defaults to 64.

`temperature`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that specifies the sampling temperature to use, controlling the apparent creativity of generated completions.

Should not be used if `top_p` or `top_k` is specified.

`top_p`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that is an alternative to `temperature`.
It uses nucleus sampling: the model considers only the tokens whose cumulative probability falls within the `top_p` value.

Should not be used if `temperature` or `top_k` is specified.

`top_k`:::
(Optional, float)
Only available for `anthropic`, `cohere`, and `mistral` providers.
An alternative to `temperature` or `top_p` that limits sampling to the `top_k` most likely tokens.

Should not be used if `temperature` or `top_p` is specified.

=====
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====

There are no `task_settings` available for the `text_embedding` task type.

=====
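
For illustration only, the following is a minimal sketch of a `completion` endpoint that combines the optional completion task settings described above with a custom rate limit. The endpoint name `amazon_bedrock_completion_tuned` and all setting values are illustrative placeholders, not recommendations.

[source,console]
------------------------------------------------------------
# Illustrative sketch - endpoint name and values are placeholders
PUT _inference/completion/amazon_bedrock_completion_tuned
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws_access_key>",
        "secret_key": "<aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-text-premier-v1:0",
        "rate_limit": {
            "requests_per_minute": 120
        }
    },
    "task_settings": {
        "max_new_tokens": 256,
        "temperature": 0.2
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
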
[discrete]
[[inference-example-amazonbedrock]]
==== Amazon Bedrock service example

The following example shows how to create an {infer} endpoint called `amazon_bedrock_embeddings` to perform a `text_embedding` task type.

Choose chat completion and embeddings models that you have access to from the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base models].

[source,console]
------------------------------------------------------------
PUT _inference/text_embedding/amazon_bedrock_embeddings
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws_access_key>",
        "secret_key": "<aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-embed-text-v2:0"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
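
Once the endpoint is created, you can send text to it to generate embeddings. The following is a usage sketch, assuming the `amazon_bedrock_embeddings` endpoint from the example above exists; the input string is arbitrary.

[source,console]
------------------------------------------------------------
# Usage sketch - any input text works
POST _inference/text_embedding/amazon_bedrock_embeddings
{
    "input": "What is Elastic?"
}
------------------------------------------------------------
// TEST[skip:TBD]
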

The next example shows how to create an {infer} endpoint called `amazon_bedrock_completion` to perform a `completion` task type.

[source,console]
------------------------------------------------------------
PUT _inference/completion/amazon_bedrock_completion
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws_access_key>",
        "secret_key": "<aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-text-premier-v1:0"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]
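
As noted in the request body section, the access and secret keys of an existing endpoint cannot be changed in place. The following sketch shows one way to rotate the credentials of the `amazon_bedrock_completion` endpoint from the example above: delete it, then recreate it with the same {infer} ID and the updated keys. The key values are placeholders.

[source,console]
------------------------------------------------------------
# Rotation sketch - key values are placeholders
DELETE _inference/completion/amazon_bedrock_completion

PUT _inference/completion/amazon_bedrock_completion
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<updated_aws_access_key>",
        "secret_key": "<updated_aws_secret_key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-text-premier-v1:0"
    }
}
------------------------------------------------------------
// TEST[skip:TBD]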