Would you have a recommendation on how to most easily set up an API endpoint that can dynamically batch requests (e.g. like vLLM)?
I realise this is probably quite involved, but perhaps you have some suggestions on quickest paths to hack a working solution.
I am wondering about this too, and have started looking into writing a handler.py for deployment on Hugging Face Inference Endpoints.
Just created a pull request to add support to vLLM: vllm-project/vllm#4228
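For anyone who wants a quick hack in the meantime, here is a minimal sketch of request-level dynamic batching using only Python's standard library. It is not vLLM's continuous batching and not this project's API: `DynamicBatcher`, `fake_model`, `max_batch_size` and `max_wait_s` are hypothetical names made up for illustration, and `fake_model` stands in for whatever batched generate call your model exposes. The idea is that each request is put on a queue with a future, and a single worker drains the queue into a batch until it is full or a short deadline passes.

```python
import asyncio
from typing import Callable, List


class DynamicBatcher:
    """Groups concurrent requests into batches (hypothetical helper, not a library API)."""

    def __init__(self, batch_fn: Callable[[List[str]], List[str]],
                 max_batch_size: int = 8, max_wait_s: float = 0.01):
        self.batch_fn = batch_fn              # blocking function: list of prompts -> list of outputs
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s          # how long to wait for more requests to arrive
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, prompt: str) -> str:
        """Called once per incoming request; resolves when its batch has been processed."""
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def run(self) -> None:
        """Worker loop: collect a batch, run the model once, hand results back."""
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self.queue.get()]  # block until at least one request exists
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), timeout))
                except asyncio.TimeoutError:
                    break
            prompts = [p for p, _ in batch]
            try:
                # Run the blocking model call in a thread so the event loop stays responsive.
                outputs = await loop.run_in_executor(None, self.batch_fn, prompts)
                for (_, fut), out in zip(batch, outputs):
                    if not fut.done():
                        fut.set_result(out)
            except Exception as exc:
                # Propagate a model failure to every request in the batch.
                for _, fut in batch:
                    if not fut.done():
                        fut.set_exception(exc)


def fake_model(prompts: List[str]) -> List[str]:
    """Stand-in for a real batched generate call (assumption for this sketch)."""
    return [p.upper() for p in prompts]


async def demo():
    batch_sizes = []

    def recording_model(prompts: List[str]) -> List[str]:
        batch_sizes.append(len(prompts))      # record how requests were grouped
        return fake_model(prompts)

    batcher = DynamicBatcher(recording_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    results = await asyncio.gather(*(batcher.submit(f"req {i}") for i in range(6)))
    worker.cancel()
    return results, batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

To turn this into an endpoint, an async web handler would call `await batcher.submit(prompt)` per request while `batcher.run()` runs as a background task; that wiring is left out here because it depends on the framework you pick.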