[Feature]: Support for SageMaker-required endpoints #11557
Comments
`vllm/openai:v0.7.0` throws:
@parthiban-manick The base images published by the vLLM team don't natively support SageMaker, since SageMaker requires a different environment, such as serving on port 8080. The solution here is a different Dockerfile target with a different entrypoint. You can see the environment differences here. A key note for users: since SageMaker doesn't allow specifying CLI args, you instead set environment variables with a designated prefix.

Unfortunately, this PR only adds the required endpoint functionality, the entrypoint, and the Dockerfile target. I do not maintain SageMaker-specific images for vLLM, and as of right now neither does AWS nor the vLLM team. You will need to build the image yourself and publish it to a repository accessible to SageMaker. From memory, the basic steps are to build the image from the SageMaker Dockerfile target and push it to a registry that SageMaker can pull from.
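As a rough illustration of the environment-variable convention described above, the translation from prefixed env vars to CLI args might look like the sketch below. The `SM_VLLM_` prefix here is an assumption for illustration; check vLLM's actual SageMaker entrypoint for the real convention.

```python
# Illustrative prefix only; the actual prefix used by vLLM's SageMaker
# entrypoint may differ.
PREFIX = "SM_VLLM_"

def env_to_cli_args(environ):
    """Translate prefixed env vars into key-value CLI args.

    E.g. SM_VLLM_MAX_MODEL_LEN=4096 -> ["--max-model-len", "4096"].
    """
    args = []
    for key, value in sorted(environ.items()):
        if key.startswith(PREFIX):
            # SM_VLLM_MAX_MODEL_LEN -> --max-model-len
            flag = "--" + key[len(PREFIX):].lower().replace("_", "-")
            args.extend([flag, value])
    return args
```

Non-prefixed variables (e.g. `PATH`) are ignored, so ordinary SageMaker environment settings don't leak into the vLLM command line.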
If I am misunderstanding and you are using it in SageMaker differently, feel free to provide more detail, although note that I am not an AWS employee, nor am I a SageMaker expert.
@nathan-az Thanks a lot for your reply. We will try it out and let you know.
@nathan-az We successfully built a Docker image that supports SageMaker and deployed it. However, we're encountering issues when trying to set environment variables. While we're able to set key-value pair environment variables through the SageMaker endpoint environment dictionary, we're unable to set positional arguments (like `--enable-auto-tool-choice`) that aren't key-value pairs. Is this feature under development, or is there a better place to report this issue? Thanks in advance.
@parthiban-manick Unfortunately this is simply a limitation on SageMaker's side: it doesn't allow explicitly setting CLI args and requires everything to be done via environment variables (as far as I can tell from the docs). I have no plans to develop this further; I simply lack the time. That said, if you want to contribute it, supporting positional args could likely be done by modifying the custom SageMaker entrypoint to extract them from the environment variables, perhaps by using some reserved keyword for the value, or a different prefix. (This partially depends on whether order matters for vLLM's positional args.)
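One possible shape of the reserved-keyword idea suggested above is sketched below. Everything here is hypothetical: the `SM_VLLM_` prefix and the `"true"` sentinel for value-less flags are made-up conventions for illustration, not vLLM's actual entrypoint behavior.

```python
# Hypothetical conventions for illustration only.
PREFIX = "SM_VLLM_"
FLAG_SENTINEL = "true"  # reserved value marking a bare flag with no argument

def env_to_args(environ):
    """Translate prefixed env vars into CLI args, supporting bare flags."""
    args = []
    for key, value in sorted(environ.items()):
        if not key.startswith(PREFIX):
            continue
        flag = "--" + key[len(PREFIX):].lower().replace("_", "-")
        if value == FLAG_SENTINEL:
            # e.g. SM_VLLM_ENABLE_AUTO_TOOL_CHOICE=true -> --enable-auto-tool-choice
            args.append(flag)
        else:
            args.extend([flag, value])
    return args
```

A real implementation would also need to decide how to pass a literal value of `"true"`, and whether arg order matters to vLLM's parser.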
@nathan-az Sure. First, we will make the changes and test them in our environment.
🚀 The feature, motivation and pitch
This was discussed before and was not supported due to AWS needing to manage the images.
I'm wondering if there is interest in at least including routing source code for the required SageMaker endpoints (`/invocations` and `/ping`) in the vLLM source. The main benefit would be that the standard OpenAI vLLM image becomes automatically compatible with SageMaker endpoints. Currently, interested users have to go through LMI, or fork vLLM and add these themselves.
If there is interest and support from vLLM maintainers, I'm happy to contribute this to the OpenAI entrypoints:

- a `/ping` endpoint rerouting to `/health`
- an `/invocations` endpoint that routes to the expected existing endpoint (or takes an additional parameter so the user dictates the target)

My understanding is that these are the only two requirements for SageMaker support.
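The proposed rerouting can be sketched independently of any web framework as a pure path-resolution function. The default `/invocations` target and the caller-supplied override are assumptions for illustration, not existing vLLM behavior:

```python
def resolve_route(path, requested_target=None):
    """Map SageMaker's required paths onto existing OpenAI-server endpoints.

    /ping         -> /health (liveness check)
    /invocations  -> a default inference endpoint, or a target the caller
                     dictates (e.g. via a request header or parameter)
    anything else -> unchanged
    """
    if path == "/ping":
        return "/health"
    if path == "/invocations":
        # Assumed default target for illustration.
        return requested_target or "/v1/chat/completions"
    return path
```

In an actual server this logic would live in route handlers that forward the request body and response, but the mapping itself is this small.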