[Inference API] Add Docs for Amazon Bedrock Support for the Inference API #110594
Conversation
Documentation preview:
Just highlighting some Azure AI Studio references. (Sorry, I just saw that this is a draft, but I'm leaving the comments here so we don't forget them :) )
Creates an {infer} endpoint to perform an {infer} task with the `amazonbedrock` service.

[discrete]
[[infer-service-azure-ai-studio-api-request]]
Suggested change:
- [[infer-service-azure-ai-studio-api-request]]
+ [[infer-service-amazon-bedrock-api-request]]
`PUT /_inference/<task_type>/<inference_id>`

[discrete]
[[infer-service-azure-ai-studio-api-path-params]]
Suggested change:
- [[infer-service-azure-ai-studio-api-path-params]]
+ [[infer-service-amazon-bedrock-path-params]]
`rate_limit`:::
(Optional, object)
By default, the `azureaistudio` service sets the number of requests allowed per minute to `240`.
Suggested change:
- By default, the `azureaistudio` service sets the number of requests allowed per minute to `240`.
+ By default, the `amazonbedrock` service sets the number of requests allowed per minute to `240`.
`rate_limit`:::
(Optional, object)
By default, the `azureaistudio` service sets the number of requests allowed per minute to `240`.
This helps to minimize the number of rate limit errors returned from Azure AI Studio.
Suggested change:
- This helps to minimize the number of rate limit errors returned from Azure AI Studio.
+ This helps to minimize the number of rate limit errors returned from Amazon Bedrock.
Argh - great catches ;) That's what I get for copy / pasting
@elasticmachine run docs build
Pinging @elastic/es-docs (Team:Docs)
+
.`task_settings` for the `text_embedding` task type
[%collapsible%closed]
=====
@markjhoy I think this unclosed `====` block might be breaking your build :)
Ah thanks! I could not figure out for the life of me where that error was coming from!
This is looking good. I found a few minor errors and suggested some rephrasings. I've also hopefully identified the formatting issue that's failing the docs build. Once these updates are made and we can preview the formatting for tabs, this will be ready for final review!
A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
Should not be used if `temperature` or `top_k` is specified.

`top_p`:::
Suggested change:
- `top_p`:::
+ `top_k`:::

Assuming the first top_p is the correct one 😉
`max_new_tokens`:::
(Optional, integer)
Provides a hint for the maximum number of output tokens to be generated.
Suggested change:
- Provides a hint for the maximum number of output tokens to be generated.
+ Sets a maximum number for the output tokens to be generated.

Not sure what "hint" means here; the rewording tries to clarify.
`temperature`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that specifies the sampling temperature to use that controls the apparent creativity of generated completions.
Suggested change:
- A number in the range of 0.0 to 1.0 that specifies the sampling temperature to use that controls the apparent creativity of generated completions.
+ A number between 0.0 and 1.0 that controls the apparent creativity of the results. At temperature 0.0 the model is most deterministic, at temperature 1.0 most random.
`top_p`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
Suggested change:
- A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
+ Alternative to `temperature`. A number in the range of 0.0 to 1.0, to eliminate low-probability tokens. Top-p uses nucleus sampling to select top tokens whose sum of likelihoods does not exceed a certain value, ensuring both variety and coherence.
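The nucleus-sampling behavior this suggestion describes can be sketched in a few lines (a generic illustration of top-p filtering, not Bedrock's implementation; the probability values are made up):

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability
    reaches top_p (nucleus sampling); zero out the rest and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

probs = [0.5, 0.25, 0.15, 0.10]
nucleus = top_p_filter(probs, 0.7)  # keeps the top two tokens (0.5 + 0.25 >= 0.7)
```

Low-probability tail tokens are eliminated, while the surviving head of the distribution keeps its relative proportions, which is the "variety and coherence" trade-off the suggested wording mentions.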
`top_p`:::
(Optional, float)
A number in the range of 0.0 to 1.0 that is an alternative value to temperature that causes the model to consider the results of the tokens with nucleus sampling probability.
Should not be used if `temperature` or `top_k` is specified.
Reading around it looks like top-p and top-k can be used in combination?
FYI - you're correct here... theoretically, you can use all three, but you shouldn't use `temperature` and `top_p` at the same time. For reference, see the parameters in Amazon's Anthropic docs.
`top_p`:::
(Optional, float)
Only available for `anthropic`, `cohere`, and `mistral` providers.
A number in the range of 0.0 to 1.0 that is an alternative value to temperature or top_p that causes the model to consider the results of the tokens with nucleus sampling probability.
Suggested change:
- A number in the range of 0.0 to 1.0 that is an alternative value to temperature or top_p that causes the model to consider the results of the tokens with nucleus sampling probability.
+ Alternative to `temperature`. Limits samples to the top-K most likely words, balancing coherence and variability.
+ A number in the range of 0.0 to 1.0.
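For contrast with top-p, top-k keeps a fixed count of candidate tokens rather than a probability mass. A minimal sketch of the generic technique (note: `k` is an integer here, which is an assumption on my part, since most top-k implementations take a token count even though the quoted text gives a 0.0 to 1.0 range):

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, zero the rest, renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

probs = [0.5, 0.25, 0.15, 0.10]
limited = top_k_filter(probs, 2)  # only the two most likely tokens survive
```

This is why the two settings compose in theory: top-k first caps the candidate count, and top-p (or temperature) then shapes sampling within that set, as discussed above.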
The following example shows how to create an {infer} endpoint called `amazon_bedrock_embeddings` to perform a `text_embedding` task type.

The list of chat completion and embeddings models that you can choose from should be a https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base model] you have access to.
Suggested change:
- The list of chat completion and embeddings models that you can choose from should be a https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base model] you have access to.
+ Choose chat completion and embeddings models you have access to from the https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html[Amazon Bedrock base models].

nit: keep sentence short
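Given the `PUT /_inference/<task_type>/<inference_id>` form quoted earlier, the `amazon_bedrock_embeddings` example the docs refer to would look roughly like this (a hedged sketch only; the `service_settings` field names, region, provider, and model ID shown here are assumptions, not confirmed in this thread):

```
PUT _inference/text_embedding/amazon_bedrock_embeddings
{
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "<aws-access-key>",
        "secret_key": "<aws-secret-key>",
        "region": "us-east-1",
        "provider": "amazontitan",
        "model": "amazon.titan-embed-text-v1"
    }
}
```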
LGTM from writing perspective!
… API (elastic#110594)
* Add Amazon Bedrock Inference API to docs
* fix example errors
* update semantic search tutorial; add changelog
* fix typo
* fix error; accept suggestions
💚 Backport successful
Add docs in support of Amazon Bedrock support in the Inference API: #110248