[Feature]: Support for SageMaker-required endpoints #11557

nathan-az · 2024-12-27T07:27:05Z

🚀 The feature, motivation and pitch

This was discussed before and was not supported due to AWS needing to manage the images.

I'm wondering if there is interest in at least including routing source code for the required SageMaker endpoints (/invocations and /ping) in the vLLM source.

The main benefit is that the standard vLLM OpenAI image would be automatically compatible with SageMaker endpoints. Currently, interested users have to go through LMI, or fork vLLM and add these endpoints themselves.

If there is interest and support from vLLM maintainers, I'm happy to contribute this to the openai entrypoints:

a ping endpoint rerouting to /health
an invocations endpoint that routes to the expected existing endpoint (or with an additional parameter so the user dictates the target)

My understanding is that these are the only two requirements for SageMaker support.
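To make the proposal concrete, here is a minimal sketch of the routing decision only, not an actual server and not the code that was eventually merged. The function name resolve_target is hypothetical, and choosing the target by inspecting the request body (messages vs. prompt) is an assumption about what "the expected existing endpoint" could mean.

```python
import json
from typing import Optional

# Per the proposal: /ping reroutes to the existing health check.
PING_TARGET = "/health"


def resolve_target(path: str, body: Optional[bytes] = None) -> str:
    """Return the existing endpoint a SageMaker request would be forwarded to.

    Hypothetical helper: it only decides the target path, it serves nothing.
    """
    if path == "/ping":
        return PING_TARGET
    if path == "/invocations":
        payload = json.loads(body or b"{}")
        if not isinstance(payload, dict):
            raise ValueError("request body must be a JSON object")
        # Assumption: the shape of the request tells us which API was meant.
        if "messages" in payload:
            return "/v1/chat/completions"
        if "prompt" in payload:
            return "/v1/completions"
        raise ValueError("cannot infer a target endpoint from the request body")
    raise ValueError(f"not a SageMaker-required path: {path}")
```

The alternative mentioned above, an extra parameter that lets the user dictate the target, would replace the body inspection with a lookup of that parameter.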


parthiban-manick · 2025-01-28T08:26:01Z

@nathan-az The vllm/openai:v0.7.0 image fails in a SageMaker environment with the error: api_server.py: error: unrecognized arguments: serve

nathan-az · 2025-01-28T10:48:46Z

@parthiban-manick The base images published by the vLLM team don't natively support SageMaker, since SageMaker requires a different environment, such as using port 8080 for serving. The solution here is a different Dockerfile target, with a different entrypoint. You can see the environment differences here.

A key note for users: since SageMaker doesn't allow specifying CLI args, you instead set environment variables prefixed by SM_VLLM_ to configure your engine. For example, if you want to set the equivalent of --max-num-seqs 4, you specify "SM_VLLM_MAX_NUM_SEQS": "4" in your SageMaker endpoint environment dictionary.
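As a rough illustration of that naming convention, the sketch below converts prefixed environment variables into CLI-style arguments. It is not the actual SageMaker entrypoint from the PR; the function name env_to_cli_args is hypothetical, and sorting the variable names is only there to make the output deterministic.

```python
from typing import List, Mapping

PREFIX = "SM_VLLM_"


def env_to_cli_args(environ: Mapping[str, str]) -> List[str]:
    """Turn SM_VLLM_* variables into ['--flag-name', 'value', ...]."""
    args: List[str] = []
    for name in sorted(environ):
        if not name.startswith(PREFIX):
            continue  # unrelated variables such as PATH are ignored
        # SM_VLLM_MAX_NUM_SEQS -> --max-num-seqs
        flag = "--" + name[len(PREFIX):].lower().replace("_", "-")
        args.extend([flag, environ[name]])
    return args


print(env_to_cli_args({"SM_VLLM_MAX_NUM_SEQS": "4", "PATH": "/usr/bin"}))
```

In a real entrypoint the input would be os.environ, and the resulting list would be appended to the server's launch command.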

Unfortunately this PR simply adds the required endpoint functionality, the entrypoint, and the Dockerfile target. I do not maintain SageMaker-specific images for vLLM, and as of right now neither does AWS or the vLLM team. You will need to build and publish the image yourself to a repository accessible to SageMaker.

From memory, the basic steps are:

1. Clone the vLLM repo.
2. Build the Dockerfile yourself:
   i. See the vllm-sagemaker target.
   ii. Your command will look something like: docker build --target vllm-sagemaker -t vllm-sagemaker .
   iii. You may have to specify some build args, such as maximising num_jobs and nvcc_threads for your build environment. I also found I had to set RUN_WHEEL_CHECK to false.
3. Publish it to a private ECR repository in the same AWS region where you are using SageMaker.
4. Use that image URI in SageMaker.
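For the last step, a small sketch of how the image URI and the engine settings might be bundled together. The helper container_definition is hypothetical. The returned dictionary uses the Image and Environment keys that, to my knowledge, boto3's SageMaker create_model call accepts for its PrimaryContainer argument, but treat that as an assumption and check it against the AWS docs. The image URI in the example is a placeholder.

```python
from typing import Dict, Mapping


def container_definition(image_uri: str, engine_args: Mapping[str, object]) -> Dict[str, object]:
    """Build a container definition whose environment carries the engine args.

    Each engine arg name (e.g. 'max-num-seqs') becomes an SM_VLLM_-prefixed
    variable, and every value is converted to a string.
    """
    environment = {
        "SM_VLLM_" + name.upper().replace("-", "_"): str(value)
        for name, value in engine_args.items()
    }
    return {"Image": image_uri, "Environment": environment}


definition = container_definition(
    "<account-id>.dkr.ecr.<region>.amazonaws.com/vllm-sagemaker:latest",  # placeholder
    {"max-num-seqs": 4},
)
print(definition["Environment"])
```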

If I am misunderstanding and you are using it in SageMaker differently, feel free to provide more detail, although you should note I am not an AWS employee, nor am I an expert in SageMaker.

parthiban-manick · 2025-01-28T11:01:00Z

@nathan-az Thanks a lot for your reply. We will try it out and let you know.

parthiban-manick · 2025-02-04T03:20:49Z

@nathan-az We successfully built a Docker image that supports SageMaker and deployed it. However, we're encountering issues when trying to set environment variables. While we're able to set key-value pair environment variables through the SageMaker endpoint environment dictionary, we're unable to set positional arguments (like --enable-auto-tool-choice) that aren't key-value pairs. Is this feature under development, or is there a better place to report this issue? Thanks in advance

nathan-az · 2025-02-04T03:48:07Z

@parthiban-manick Unfortunately this is simply a limitation on SageMaker's side: it does not allow CLI args to be set explicitly and requires everything to be done via environment variables (as far as I can tell from the docs).

I have no plans to develop this further; I simply lack the time.

That said, if you want to contribute it, supporting positional args could likely be done by modifying the custom SageMaker entrypoint to extract them from the environment variables, perhaps by using some reserved keyword for the value, or a different prefix. (This partially depends on whether order matters for vLLM's positional args.)
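A minimal sketch of the reserved-keyword idea, under stated assumptions: the sentinel value "__FLAG__" and the function name are both hypothetical, and this is not code from the vLLM repository. A variable whose value equals the sentinel is emitted as a bare flag; any other value is emitted as a flag followed by its value.

```python
from typing import List, Mapping

PREFIX = "SM_VLLM_"
FLAG_SENTINEL = "__FLAG__"  # hypothetical reserved value meaning "no argument"


def env_to_cli_args_with_flags(environ: Mapping[str, str]) -> List[str]:
    """Convert SM_VLLM_* variables to CLI args, supporting value-less flags."""
    args: List[str] = []
    for name in sorted(environ):  # sorted only for deterministic output
        if not name.startswith(PREFIX):
            continue
        flag = "--" + name[len(PREFIX):].lower().replace("_", "-")
        value = environ[name]
        if value == FLAG_SENTINEL:
            args.append(flag)  # e.g. --enable-auto-tool-choice
        else:
            args.extend([flag, value])
    return args


print(env_to_cli_args_with_flags({
    "SM_VLLM_ENABLE_AUTO_TOOL_CHOICE": "__FLAG__",
    "SM_VLLM_MAX_NUM_SEQS": "4",
}))
```

Because every emitted argument is a named "--" option, ordering should not matter to a typical argument parser. Truly positional arguments would need an explicit ordering scheme, which this sketch does not attempt.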

parthiban-manick · 2025-02-04T05:42:59Z

@nathan-az Sure. First, we will make the changes and test them in our environment.

nathan-az added the feature request label Dec 27, 2024

This was referenced Dec 28, 2024

[Misc] Minimum requirements for SageMaker compatibility #11575

Closed

[Misc] Minimum requirements for SageMaker compatibility #11576

Merged

simon-mo closed this as completed in #11576 Jan 2, 2025
