
Add vllm serve to wrap vllm.entrypoints.openai.api_server #4167

Closed · wants to merge 5 commits

Conversation

@simon-mo (Collaborator) commented Apr 18, 2024

Easier to type. The usage will now be:

(base) xmo@simon-devbox:~/vllm$ vllm serve --help
usage: vllm serve <model_tag> [options]

positional arguments:
  model                 The model tag to serve

options:
  -h, --help            show this help message and exit
  --host HOST           host name
  --port PORT           port number
  --uvicorn-log-level {debug,info,warning,error,critical,trace}
                        log level for uvicorn
  --allow-credentials   allow credentials
  --allowed-origins ALLOWED_ORIGINS
                        allowed origins
  --allowed-methods ALLOWED_METHODS
                        allowed methods
.....
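For context, a `vllm` command is usually exposed by registering a console-script entry point that dispatches to the CLI module (here `vllm/scripts.py`, which the PR references below). A minimal sketch of what that packaging change might look like, assuming a `main()` function in `vllm/scripts.py`; the PR's actual packaging change may differ:

```python
# setup.py (sketch): expose `vllm` as a console script so that
# `vllm serve ...` runs the CLI defined in vllm/scripts.py.
from setuptools import setup

setup(
    name="vllm",
    # ... other packaging metadata ...
    entry_points={
        "console_scripts": [
            "vllm=vllm.scripts:main",
        ],
    },
)
```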

benchmarks/benchmark_serving.py (review thread resolved)
@@ -26,6 +26,8 @@

TIMEOUT_KEEP_ALIVE = 5 # seconds

engine: AsyncLLMEngine = None
Reviewer (Collaborator):

Shouldn't this be optional?
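For illustration, a minimal sketch of the annotation the reviewer seems to be suggesting (assuming the usual `AsyncLLMEngine` import path; not code from this PR):

```python
from typing import Optional

from vllm.engine.async_llm_engine import AsyncLLMEngine

# The module-level engine is only constructed once the server starts,
# so annotating it as Optional makes the initial None value explicit.
engine: Optional[AsyncLLMEngine] = None
```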


if __name__ == "__main__":
    # NOTE(simon):
    # This section should be in sync with vllm/scripts.py for CLI entrypoints.
Reviewer (Collaborator):

Any way to add a simple regression test for this?
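For illustration, one possible shape for such a test (hypothetical, not part of this PR), assuming the `vllm` console script is installed in the test environment:

```python
import subprocess


def test_vllm_serve_help():
    # `vllm serve --help` should exit cleanly and advertise the new usage string.
    result = subprocess.run(
        ["vllm", "serve", "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    assert result.returncode == 0
    assert "vllm serve <model_tag> [options]" in result.stdout
```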

usage="vllm serve <model_tag> [options]")
make_arg_parser(serve_parser)
# Override the `--model` optional argument, make it positional.
serve_parser.add_argument("model", type=str, help="The model tag to serve")
Reviewer (Collaborator):

What happens if someone runs `vllm serve --model ...`?
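For illustration, a standalone reproduction of the question (hypothetical parser, not the PR's actual code): with argparse, the `--model` option inherited from `make_arg_parser` and the new positional `model` write to the same destination, and the positional remains required.

```python
import argparse

parser = argparse.ArgumentParser(prog="vllm serve")
# Stand-in for the --model option inherited from make_arg_parser().
parser.add_argument("--model", type=str, default="facebook/opt-125m")
# The positional argument added by this PR; it writes to the same `model` dest.
parser.add_argument("model", type=str, help="The model tag to serve")

# `vllm serve --model foo` still leaves the required positional unfilled,
# so argparse exits with "the following arguments are required: model".
try:
    parser.parse_args(["--model", "foo"])
except SystemExit:
    print("rejected: --model alone does not satisfy the positional argument")

# `vllm serve foo` fills the positional, so args.model == "foo".
args = parser.parse_args(["foo"])
print(args.model)
```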

serve_parser.set_defaults(func=run_server)

args = parser.parse_args()
if hasattr(args, "func"):
Reviewer (Collaborator):

This part of the code is confusing. Could you add a comment to explain what it does?
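For illustration, a self-contained sketch of the dispatch pattern the snippet appears to use (hypothetical layout, not the PR's actual file): each subparser registers its handler via `set_defaults(func=...)`, and the top-level entrypoint only dispatches when a subcommand was chosen.

```python
import argparse


def run_server(args: argparse.Namespace) -> None:
    print(f"would serve {args.model}")


def main() -> None:
    parser = argparse.ArgumentParser(prog="vllm")
    subparsers = parser.add_subparsers()

    serve_parser = subparsers.add_parser(
        "serve", usage="vllm serve <model_tag> [options]")
    serve_parser.add_argument("model", type=str, help="The model tag to serve")
    # Attach the handler so that `args.func` exists only when `serve` was given.
    serve_parser.set_defaults(func=run_server)

    args = parser.parse_args()
    if hasattr(args, "func"):
        # A subcommand was selected: dispatch to its handler.
        args.func(args)
    else:
        # Bare `vllm` with no subcommand: print help instead of crashing.
        parser.print_help()


if __name__ == "__main__":
    main()
```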

@@ -95,8 +95,7 @@ template, or the template in string form. Without a chat template, the server will not work
and all chat requests will error.

```bash
-python -m vllm.entrypoints.openai.api_server \
-  --model ... \
+vllm serve ... \
  --chat-template ./path-to-chat-template.jinja
```

Reviewer (Member):

Based on #4709, the `:prog:` value under the CLI args section (line 111) should be updated to `vllm serve`.

@sasha0552 (Contributor) commented:

Is there any update on this? Having the command simplifies the installation of vllm to

pipx install vllm

# now you can use the command
vllm serve --help

with the wonderful pipx tool, which manages virtual environments automatically. In its current state, vllm cannot be used with pipx, because pipx only supports exposing commands.

@DarkLight1337 (Member) commented May 16, 2024

It would be nice if #4794 were also made available via the CLI (perhaps `vllm batch`?).

@EthanqX (Contributor) commented Jun 5, 2024

Please refer to #5090 for the complete new CLI.
