[Feature]: Support for SageMaker-required endpoints #11557

Closed
1 task done
nathan-az opened this issue Dec 27, 2024 · 6 comments · Fixed by #11576

Comments

@nathan-az
Contributor

nathan-az commented Dec 27, 2024

🚀 The feature, motivation and pitch

This was discussed before and was not supported due to AWS needing to manage the images.

I'm wondering if there is interest in at least including routing source code for the required SageMaker endpoints (/invocations and /ping) in the vLLM source.

The main benefit would be that the standard OpenAI-compatible vLLM image should be automatically compatible with SageMaker endpoints. Currently, interested users have to go through LMI, or fork vLLM and add these routes themselves.

If there is interest and support from vLLM maintainers, I'm happy to contribute this to the openai entrypoints:

  • a ping endpoint rerouting to /health
  • an invocations endpoint that routes to the expected existing endpoint (or with an additional parameter so the user dictates the target)

My understanding is that these are the only two requirements for SageMaker support.
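
As a rough illustration of the two routes proposed above, here is a minimal sketch using only the Python standard library. It is not vLLM's implementation: the handler class, the `pick_target` helper, and the rule for guessing the target from the request body ("messages" means chat, "prompt" means completion) are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler


def pick_target(payload):
    """Guess which existing OpenAI-style route an /invocations body is meant for."""
    # Assumption: chat requests carry "messages", plain completions carry "prompt".
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    if "messages" in payload:
        return "/v1/chat/completions"
    if "prompt" in payload:
        return "/v1/completions"
    raise ValueError("cannot infer target endpoint from request body")


class SageMakerHandler(BaseHTTPRequestHandler):
    """Hypothetical handler exposing the two routes SageMaker calls."""

    def _reply(self, status, body):
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # SageMaker health check: answer the way /health would.
        if self.path == "/ping":
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            target = pick_target(payload)
        except ValueError as exc:  # also covers malformed JSON
            self._reply(400, {"error": str(exc)})
            return
        # A real server would pass the payload to the existing handler for `target`.
        self._reply(200, {"routed_to": target})

    def log_message(self, *args):
        pass  # keep the sketch quiet
```

To try it locally, something like `HTTPServer(("0.0.0.0", 8080), SageMakerHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve both routes on port 8080.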

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@parthiban-manick

parthiban-manick commented Jan 28, 2025

vllm/openai:v0.7.0 fails in the SageMaker environment with: api_server.py: error: unrecognized arguments: serve. @nathan-az

@nathan-az
Contributor Author

nathan-az commented Jan 28, 2025

@parthiban-manick The base images published by the vLLM team don't natively support SageMaker, since SageMaker requires a different environment, such as using port 8080 for serving. The solution here is a different Dockerfile target, with a different entrypoint. You can see the environment differences here.

A key note for users: since SageMaker doesn't allow specifying CLI args, you instead set environment variables prefixed with SM_VLLM_ to configure the engine. For example, to set the equivalent of --max-num-seqs 4, specify "SM_VLLM_MAX_NUM_SEQS": "4" in your SageMaker endpoint environment dictionary.
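
The naming rule above can be sketched as a small helper that turns engine arguments into the environment dictionary. The function name `to_sagemaker_env` is made up for this example; only the SM_VLLM_ prefix and the --max-num-seqs mapping come from the comment itself.

```python
def to_sagemaker_env(engine_args):
    """Map vLLM-style CLI options to SM_VLLM_-prefixed environment variables."""
    env = {}
    for name, value in engine_args.items():
        # "--max-num-seqs" -> "MAX_NUM_SEQS"
        suffix = name.lstrip("-").replace("-", "_").upper()
        # SageMaker environment values are strings.
        env["SM_VLLM_" + suffix] = str(value)
    return env


environment = to_sagemaker_env({"--max-num-seqs": 4})
print(environment)  # {'SM_VLLM_MAX_NUM_SEQS': '4'}
```

The resulting dictionary is what you would pass as the container environment when creating the SageMaker model or endpoint.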

Unfortunately, this PR simply adds the required endpoint functionality, the entrypoint, and the Dockerfile target. I do not maintain SageMaker-specific images for vLLM, and as of right now neither AWS nor the vLLM team does. You will need to build and publish the image yourself to a repository accessible to SageMaker.

From memory, the basic steps are:

  1. Clone the vLLM repo.
  2. Build the Dockerfile yourself:
    i. See the vllm-sagemaker target.
    ii. Your command will look something like docker build --target vllm-sagemaker -t vllm-sagemaker .
    iii. You may have to specify some build args, such as maximising max_jobs and nvcc_threads for your build environment. I also found I had to set RUN_WHEEL_CHECK to false.
  3. Publish it to a private ECR repository in the same AWS region in which you are using SageMaker.
  4. Use that image URI in SageMaker.
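
The build and publish steps above can be sketched as two small helpers that assemble the docker command and the ECR image URI without running anything. The function names, the account ID, the region, and the repository name are placeholders for this example, and the build-arg names in the usage should be checked against the Dockerfile in your checkout before relying on them.

```python
import shlex


def docker_build_command(tag="vllm-sagemaker", build_args=None, context="."):
    """Assemble the `docker build` invocation for the vllm-sagemaker target."""
    cmd = ["docker", "build", "--target", "vllm-sagemaker", "-t", tag]
    for name, value in (build_args or {}).items():
        cmd += ["--build-arg", f"{name}={value}"]
    cmd.append(context)
    return cmd


def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Format the image URI for a private ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Build-arg names below are assumptions; verify them in the Dockerfile.
build = docker_build_command(
    build_args={"max_jobs": 8, "nvcc_threads": 2, "RUN_WHEEL_CHECK": "false"}
)
print(shlex.join(build))

# Placeholder account, region, and repository.
uri = ecr_image_uri("123456789012", "us-east-1", "vllm-sagemaker")
print(shlex.join(["docker", "tag", "vllm-sagemaker", uri]))
print(shlex.join(["docker", "push", uri]))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; pass the lists to `subprocess.run(cmd, check=True)` once the values match your environment.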

If I am misunderstanding and you are using it in SageMaker differently, feel free to provide more detail, although you should note I am not an AWS employee, nor am I an expert in SageMaker.

@parthiban-manick

@nathan-az Thanks a lot for your reply. We will try it out and let you know.

@parthiban-manick

@nathan-az We successfully built a Docker image that supports SageMaker and deployed it. However, we're encountering issues when trying to set environment variables. While we're able to set key-value pair environment variables through the SageMaker endpoint environment dictionary, we're unable to set positional arguments (like --enable-auto-tool-choice) that aren't key-value pairs. Is this feature under development, or is there a better place to report this issue? Thanks in advance

@nathan-az
Contributor Author

nathan-az commented Feb 4, 2025

@parthiban-manick Unfortunately, this is simply a limitation on SageMaker's side: it does not allow setting CLI args explicitly and requires everything to be done via environment variables (as far as I can tell from the docs).

I have no plans to develop this further; I simply lack the time.

That said, if you want to contribute it, supporting positional args could likely be done by modifying the custom SageMaker entrypoint to extract them from the environment variables, perhaps by using a reserved keyword for the value or a different prefix. (This partially depends on whether order matters for vLLM's positional args.)
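
A minimal sketch of the reserved-keyword idea, written in Python rather than as a change to the actual entrypoint script. The sentinel value `__FLAG__` and the function name are hypothetical; nothing like this exists in vLLM as far as this thread shows. Options such as --enable-auto-tool-choice take no value, so the sketch emits them without one.

```python
PREFIX = "SM_VLLM_"
FLAG_SENTINEL = "__FLAG__"  # hypothetical reserved value meaning "no argument"


def env_to_cli_args(environ, prefix=PREFIX):
    """Translate SM_VLLM_* environment variables into a CLI argument list."""
    args = []
    # Sort for a deterministic order; named options do not depend on order.
    for key in sorted(environ):
        if not key.startswith(prefix):
            continue
        # "SM_VLLM_MAX_NUM_SEQS" -> "--max-num-seqs"
        option = "--" + key[len(prefix):].lower().replace("_", "-")
        args.append(option)
        value = environ[key]
        if value != FLAG_SENTINEL:
            args.append(value)
    return args


example_env = {
    "SM_VLLM_MAX_NUM_SEQS": "4",
    "SM_VLLM_ENABLE_AUTO_TOOL_CHOICE": "__FLAG__",
    "PATH": "/usr/bin",
}
print(env_to_cli_args(example_env))
# ['--enable-auto-tool-choice', '--max-num-seqs', '4']
```

In an entrypoint, the result of `env_to_cli_args(os.environ)` would be appended to the server launch command. A separate prefix for valueless options would work equally well and avoids reserving a value.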

@parthiban-manick

@nathan-az Sure. First, we will make the changes and test them in our environment.
