
[Inference API] Add Amazon Bedrock support to Inference API #110248

Merged

Conversation

@markjhoy (Contributor) commented Jun 27, 2024

Adds Amazon Bedrock text embeddings and chat completion support for the inference API.

Prerequisites to Model Creation

  • An AWS Account with Amazon Bedrock access
  • A pair of access and secret keys used to access Amazon Bedrock

Inference Model Creation:

PUT _inference/{task_type}/{inference_model_id}
{
  "service": "amazonbedrock",
  "service_settings": {
    "access_key": "{{aws_access_key}}",
    "secret_key": "{{aws_secret_key}}",
    "region": "<<aws_region>>",
    "provider": "<<provider>>",
    "model": "<<model_id>>",
    <<ADDITIONAL SERVICE SETTINGS (see below)>>
  },
  "task_settings": {
    <<TASK SETTINGS (see below)>>
  }
}

Service Settings

Where provider is one of:

  • for text embeddings:
    • amazontitan
    • cohere
  • for chat completions:
    • amazontitan
    • anthropic
    • ai21labs
    • cohere
    • meta
    • mistral

The model value should be the ID of a base model you have access to, or the ARN of the model if you are using a custom model based on a base model.

Also, be sure that the model you are using is supported in the region you specify.

text_embedding tasks for Amazon Bedrock have the following additional, optional service settings:

  • dimensions: the output dimensions to use for the inference
  • max_input_tokens: the maximum number of input tokens
  • similarity: the similarity measure to use

completion tasks for Amazon Bedrock do not have any additional service settings.
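As a concrete illustration of the settings above, a creation request body can be assembled as follows (a Python sketch; the helper name and all key, region, and model values are placeholders, not part of this PR):

```python
import json

def bedrock_service_settings(access_key, secret_key, region, provider, model, **extra):
    # Build the service_settings object described above; extra holds the
    # optional embeddings settings (dimensions, max_input_tokens, similarity).
    settings = {
        "access_key": access_key,
        "secret_key": secret_key,
        "region": region,
        "provider": provider,
        "model": model,
    }
    settings.update(extra)
    return settings

body = {
    "service": "amazonbedrock",
    "service_settings": bedrock_service_settings(
        "AWS_ACCESS_KEY",           # placeholder
        "AWS_SECRET_KEY",           # placeholder
        "us-east-1",
        provider="amazontitan",
        model="amazon.titan-embed-text-v1",  # illustrative base model ID
        max_input_tokens=8192,
    ),
}
print(json.dumps(body, indent=2))
```

The resulting body would be sent as the payload of the PUT request shown earlier.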

Task Settings

text_embedding tasks for Amazon Bedrock do not have any task settings, and none should be sent.

completion tasks have the following task settings available (all optional):

  • max_new_tokens: the maximum number of new tokens to produce
  • temperature: the temperature (0.0 to 1.0) to use
  • top_p: the top P value to use (0.0 to 1.0)
  • top_k: an alternative to top_p, from 0.0 to 1.0 (only available for Anthropic, Cohere, and Mistral)
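For example, a completion endpoint that stores default task settings could be created with a body like this (an illustrative Python sketch; credentials and model ID are placeholders, and the chosen values are arbitrary):

```python
import json

# All the task settings are optional; these defaults are arbitrary examples.
task_settings = {
    "max_new_tokens": 256,
    "temperature": 0.2,
    "top_p": 0.9,
}

body = {
    "service": "amazonbedrock",
    "service_settings": {
        "access_key": "AWS_ACCESS_KEY",   # placeholder
        "secret_key": "AWS_SECRET_KEY",   # placeholder
        "region": "us-east-1",
        "provider": "anthropic",
        "model": "MODEL_ID_OR_ARN",       # placeholder
    },
    "task_settings": task_settings,
}
print(json.dumps(body, indent=2))
```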

Performing Inference

POST _inference/{task_type}/{inference_model_id}
{
  "input": <single string, or array of strings>
}
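The input field accepts either a single string or an array of strings; a small illustrative sketch of building the request body:

```python
import json

def inference_body(inputs):
    # inputs may be a single string or a list of strings,
    # mirroring the request shape above.
    return {"input": inputs}

single = inference_body("The quick brown fox")
batch = inference_body(["first document", "second document"])
print(json.dumps(single))
print(json.dumps(batch))
```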

Manual Testing Completed:

Tested Embeddings Providers:

  • Amazon Titan
  • Cohere

Tested Chat Completion Providers:

  • Amazon Titan
  • Anthropic
  • AI21 Labs
  • Cohere
  • Meta
  • Mistral

@markjhoy markjhoy force-pushed the markjhoy/add_amazon_bedrock_inference_api branch from 6deb3ba to 705f79d Compare June 29, 2024 15:44
@markjhoy markjhoy force-pushed the markjhoy/add_amazon_bedrock_inference_api branch from 4318dd4 to c39e552 Compare July 2, 2024 00:13
@markjhoy markjhoy added >non-issue :ml Machine learning Team:ML Meta label for the ML team :EnterpriseSearch/Application Enterprise Search Team:Enterprise Search Meta label for Enterprise Search team Team:Search - Inference labels Jul 2, 2024
@markjhoy markjhoy requested review from a team, timgrein and jonathan-buttner July 2, 2024 02:00
@markjhoy markjhoy marked this pull request as ready for review July 2, 2024 02:01
@markjhoy markjhoy requested a review from a team as a code owner July 2, 2024 02:01
@elasticsearchmachine (Collaborator): Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine (Collaborator): Pinging @elastic/ent-search-eng (Team:Enterprise Search)

@markjhoy markjhoy requested a review from davidkyle July 4, 2024 14:46
@davidkyle (Member) left a comment:

LGTM

Epic PR 👏

@maxhniebergall (Member) left a comment:

Overall, great PR! Very well organized and easy to understand. I do have a concern about the embedding task settings.

@@ -0,0 +1 @@

Member:

is this intentionally left empty?

Contributor Author:

I think so -- these license files were brought over from the repository-s3 module pieces, which also use the AWS SDK (and are blank there too).

Comment on lines +39 to +42
private final Integer dimensions;
private final Boolean dimensionsSetByUser;
private final Integer maxInputTokens;
private final SimilarityMeasure similarity;
Member:

Shouldn't these be task settings, since they only apply to the embedding task?

Contributor Author:

Intuitively, I would think so, but all our services (OpenAI, Cohere, etc.) have these in the service settings - I assume because these are rarely (if ever) changed when performing inference... @jonathan-buttner or @davidkyle - thoughts here? I'll keep them in the service settings for now, but just a note for the future if we ever want to change these to be task settings...

Member:

Cool, we will have to think about refactoring this in the future.

Contributor:

Shouldn't these be task settings, since they only apply to the embedding task?

Hmm I'm not sure what you mean @maxhniebergall . I believe these settings are only used for embedding tasks.

Do you mean that they should be task settings because they should be modifiable after an inference endpoint is created?

I don't think we'd want a user to be able to change these after setting up an inference endpoint (I suppose max input tokens maybe?). If a user generated text embeddings using a certain number of dimensions and expected similarity I would think it'd break the index if we stored different dimension sizes or potentially a different similarity during an inference request.

@maxhniebergall (Member) commented Jul 5, 2024:

Is that what the difference between task settings and service settings is? Maybe we should rename them to MutableSettings and ImmutableSettings? I had thought that service settings were for every task in a service, and task settings were for when the settings were specific to a task.

@jonathan-buttner (Contributor) commented Jul 5, 2024:

Is that what the difference between task settings and service settings is?

Yep!

Maybe we should rename them to MutableSettings and ImmutableSettings?

That's true, it isn't intuitive that service settings aren't mutable and task settings are. I think the convention we went with was to reflect the field name from body of the request. And we add the task type in the class name to indicate what it's for.

I guess we could have the reverse problem if we named them mutable and immutable, as you wouldn't know immediately what they refer to without looking at the class contents. I suppose we could include it in the name too.

Member:

Right, these are also serialized as part of the request, so we can't change the naming without a breaking change.

import java.util.List;
import java.util.Map;

public final class AmazonBedrockProviderCapabilities {
Member:

Awesome! Love how all of this info is in one spot

var actionCreator = new AmazonBedrockActionCreator(amazonBedrockSender, this.getServiceComponents(), timeout);
if (model instanceof AmazonBedrockModel baseAmazonBedrockModel) {
var maxBatchSize = getEmbeddingsMaxBatchSize(baseAmazonBedrockModel.provider());
var batchedRequests = new EmbeddingRequestChunker(input, maxBatchSize, EmbeddingRequestChunker.EmbeddingType.FLOAT)
Member:

Do all of the bedrock models use Float embeddings?

Contributor Author:

Yep - both Amazon Titan and Cohere use floats in their output vectors

import java.util.Map;

public final class AmazonBedrockProviderCapabilities {
private static final int DEFAULT_MAX_CHUNK_SIZE = 2048;
Member:

Why is this value here instead of in AmazonBedrockConstants?

import static org.elasticsearch.xpack.inference.services.amazonbedrock.AmazonBedrockConstants.TOP_K_FIELD;
import static org.elasticsearch.xpack.inference.services.amazonbedrock.AmazonBedrockConstants.TOP_P_FIELD;

public record AmazonBedrockChatCompletionRequestTaskSettings(
Member:

Why do we need this class and AmazonBedrockChatCompletionTaskSettings?

Contributor Author:

The request task settings are ones that can be overridden during an inference POST request to Elasticsearch; AmazonBedrockChatCompletionTaskSettings are the default task settings a user can set up when they create the model (PUT).
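Generically, the override behavior described here means per-request task settings, when present, take precedence over the defaults stored on the endpoint. The merge helper below is an illustrative sketch, not the PR's implementation:

```python
def merged_task_settings(default_settings, request_settings):
    # Per-request values override endpoint defaults; unset fields fall through.
    merged = dict(default_settings)
    for key, value in request_settings.items():
        if value is not None:
            merged[key] = value
    return merged

defaults = {"temperature": 0.2, "max_new_tokens": 256}
overrides = {"temperature": 0.9}
print(merged_task_settings(defaults, overrides))
# temperature comes from the request; max_new_tokens from the endpoint defaults
```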

@jonathan-buttner (Contributor) left a comment:

Great work Mark! Thanks for all the refactoring. I left a comment about the dimensions set by user. The others are just nits.

x-pack/plugin/inference/build.gradle Outdated Show resolved Hide resolved
static final String DIMENSIONS_SET_BY_USER = "dimensions_set_by_user";

private final Integer dimensions;
private final Boolean dimensionsSetByUser;
Contributor:

I might have missed it, but do we, or do we plan to, send this to an external provider? I think I only saw that we use it in the updateModelWithEmbeddingDetails() method. If we don't send it to an external provider, then the user-set dimensions might mess up the semantic text field 🤔. What I mean is, if a user sets it to 5 but the actual stored embedding has 1000 dimensions, I'm not sure what happens.

I think my vote would be to remove this if we don't plan on using it in the near future. If we do plan on using it soon, we might want to prevent a user from setting it for now 🤷‍♂️

Contributor Author:

I think this is an oversight on my part... for Amazon Titan embeddings you can pass the dimensions, but Cohere does not have this... I'll fix this up, with some validations.

Contributor:

Ah ok cool 👍 yeah don't forget to pass it along in the titan request, or at least I missed where we are doing that.

Contributor Author:

Sooooo - here's something silly... the Amazon Titan G1 models don't allow the dimensions parameter, but version 2 does:

Titan Embeddings G1 - Text doesn't support the use of inference parameters. The following sections detail the request and response formats and provides a code example.

I think for now I'll omit the dimensions altogether, because there's no easy, definitive way to tell which model the end user is using (we could check whether the model ID string matches the v2 version, but if they are using a custom ARN, it'd require more hoops to jump through than I think is worth it at this point in time...)
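To illustrate why that check is awkward: a plain base-model ID can be matched by prefix, but a custom-model ARN carries no reliable hint about the underlying base model. The helper and IDs below are purely illustrative:

```python
def looks_like_titan_v2(model: str) -> bool:
    # Heuristic only: works for plain base-model IDs, not for custom ARNs.
    if model.startswith("arn:aws:bedrock:"):
        # A custom-model ARN doesn't reveal which base model it wraps, so we
        # can't safely decide whether the dimensions parameter is supported.
        return False
    return model.startswith("amazon.titan-embed-text-v2")

print(looks_like_titan_v2("amazon.titan-embed-text-v2:0"))                    # True
print(looks_like_titan_v2("arn:aws:bedrock:us-east-1:123:custom-model/abc"))  # False
```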

Contributor Author:

If we do plan on using it soon, we might want to prevent a user from setting it for now

I'll go this route for now as it's probably the safest.

Contributor:

I think for now, I'll omit the dimensions all together because there's no definitive way we can tell which model the end user is using easily

👍 sounds good


ValidationException validationException = new ValidationException();

var temperature = extractOptionalDoubleInRange(
Contributor:

nit: I know I previously (in a different PR) asked that we add functionality to validate the range. I think I've changed my position on that 😅. To better support provider changes in these fields, it probably makes sense to not validate them on our end and let the upstream provider return an error if they're not in a specific range/positive/negative etc.

I know we have this in a few places, so you don't need to change it here, but maybe going forward we rely more on the upstream provider to do this for us and try to ensure the returned error message is clear enough for the user to know what needs to be changed.
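For reference, the kind of local range check under discussion can be sketched like this (the helper name echoes the one in the diff, but the implementation is illustrative, not the PR's code):

```python
def extract_optional_double_in_range(settings, key, lo, hi, errors):
    # Return settings[key] if present and within [lo, hi];
    # record a validation error and return None otherwise.
    value = settings.get(key)
    if value is None:
        return None
    value = float(value)
    if not (lo <= value <= hi):
        errors.append(f"[{key}] must be between {lo} and {hi}; got {value}")
        return None
    return value

errors = []
temperature = extract_optional_double_in_range({"temperature": 500.0}, "temperature", 0.0, 1.0, errors)
print(temperature, errors)
```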

Contributor Author:

While this is still in the works, I'll see if it's easy enough to change this...

Contributor Author:

I think for now I'll keep this as-is... providing an invalid value does produce an error message in the exception from the SDK, but it's quite buried...:

(about 30 more lines above this here...)
...
(RequestExecutorService.java:192)\n\tat org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.http.sender.AmazonBedrockRequestExecutorService.start(AmazonBedrockRequestExecutorService.java:19)\n\tat org.elasticsearch.inference@8.16.0-SNAPSHOT/org.elasticsearch.xpack.inference.external.amazonbedrock.AmazonBedrockRequestSender.lambda$start$0(AmazonBedrockRequestSender.java:89)\n\tat org.elasticsearch.server@8.16.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:917)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)\n\tat java.base/java.lang.Thread.run(Thread.java:1570)\nCaused by: java.util.concurrent.ExecutionException: com.amazonaws.services.bedrockruntime.model.ValidationException: 1 validation error detected: Value '500.0' at 'inferenceConfig.temperature' failed to satisfy constraint: Member must have value less than or equal to 1 (Service: AmazonBedrockRuntime; Status Code: 400; Error Code: ValidationException; Request ID: fe7b8dbd-31f9-4821-a663-d00319d36e89; Proxy: null)

Contributor:

Sounds good.


@markjhoy markjhoy merged commit 52e591d into elastic:main Jul 5, 2024
15 checks passed
@markjhoy markjhoy added v8.15.1 and removed v8.15.0 labels Jul 5, 2024
markjhoy added a commit to markjhoy/elasticsearch that referenced this pull request Jul 5, 2024
…110248)

* Initial commit; setup Gradle; start service

* initial commit

* minor cleanups, builds green; needs tests

* bug fixes; tested working embeddings & completion

* use custom json builder for embeddings request

* Ensure auto-close; fix forbidden API

* start of adding unit tests; abstraction layers

* adding additional tests; cleanups

* add requests unit tests

* all tests created

* fix cohere embeddings response

* fix cohere embeddings response

* fix lint

* better test coverage for secrets; inference client

* update thread-safe syncs; make dims/tokens + int

* add tests for dims and max tokens positive integer

* use requireNonNull;override settings type;cleanups

* use r/w lock for client cache

* remove client reference counting

* update locking in cache; client errors; noop doc

* remove extra block in internalGetOrCreateClient

* remove duplicate dependencies; cleanup

* add fxn to get default embeddings similarity

* use async calls to Amazon Bedrock; cleanups

* use Clock in cache; simplify locking; cleanups

* cleanups around executor; remove some instanceof

* cleanups; use EmbeddingRequestChunker

* move max chunk size to constants

* oof - swapped transport vers w/ master node req

* use XContent instead of Jackson JsonFactory

* remove gradle versions; do not allow dimensions

(cherry picked from commit 52e591d)
@markjhoy (Contributor Author) commented Jul 5, 2024

💚 All backports created successfully

Status Branch Result
8.15

Questions ?

Please refer to the Backport tool documentation

markjhoy added a commit to markjhoy/elasticsearch that referenced this pull request Jul 5, 2024
…110248)

* Initial commit; setup Gradle; start service

* initial commit

* minor cleanups, builds green; needs tests

* bug fixes; tested working embeddings & completion

* use custom json builder for embeddings request

* Ensure auto-close; fix forbidden API

* start of adding unit tests; abstraction layers

* adding additional tests; cleanups

* add requests unit tests

* all tests created

* fix cohere embeddings response

* fix cohere embeddings response

* fix lint

* better test coverage for secrets; inference client

* update thread-safe syncs; make dims/tokens + int

* add tests for dims and max tokens positive integer

* use requireNonNull;override settings type;cleanups

* use r/w lock for client cache

* remove client reference counting

* update locking in cache; client errors; noop doc

* remove extra block in internalGetOrCreateClient

* remove duplicate dependencies; cleanup

* add fxn to get default embeddings similarity

* use async calls to Amazon Bedrock; cleanups

* use Clock in cache; simplify locking; cleanups

* cleanups around executor; remove some instanceof

* cleanups; use EmbeddingRequestChunker

* move max chunk size to constants

* oof - swapped transport vers w/ master node req

* use XContent instead of Jackson JsonFactory

* remove gradle versions; do not allow dimensions
@elasticsearchmachine (Collaborator)

💚 Backport successful

Status Branch Result
8.15

@elasticsearchmachine (Collaborator)

💔 Backport failed

The backport operation could not be completed due to the following error:

An unexpected error occurred when attempting to backport this PR.

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 110248

elasticsearchmachine pushed a commit that referenced this pull request Jul 5, 2024
…#110545)

Labels
backport pending :EnterpriseSearch/Application Enterprise Search :ml Machine learning >non-issue Team:Enterprise Search Meta label for Enterprise Search team Team:ML Meta label for the ML team v8.15.1 v8.16.0