BentoML - v1.0.22
🍱 The BentoML v1.0.22 release brings a list of much-anticipated updates.
-
Added support for Pydantic 2 for better validation performance.
-
Added support for CUDA 12 versions in builds and containerization.
-
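Introduced service lifecycle events, which let you add custom logic to `on_deployment`, `on_startup`, and `on_shutdown`. State can be managed through the context `ctx` variable during the `on_startup` and `on_shutdown` events and during request serving in the API.

```python
import bentoml

# `svc`, `create_object`, and `cleanup_state` are assumed to be defined elsewhere.

@svc.on_deployment
def on_deployment():
    pass

@svc.on_startup
def on_startup(ctx: bentoml.Context):
    # Create the shared object and keep it in the service state.
    ctx.state["object_key"] = create_object()

@svc.on_shutdown
def on_shutdown(ctx: bentoml.Context):
    # Release the shared object when the service stops.
    cleanup_state(ctx.state["object_key"])

@svc.api  # IO descriptors (input=..., output=...) omitted for brevity
def predict(input_data, ctx):
    obj = ctx.state["object_key"]  # same object that on_startup stored
    pass
```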
-
Added support for traffic control for both the API Server and Runners. Timeout and maximum concurrency can now be set in the BentoML configuration.

```yaml
api_server:
  traffic:
    timeout: 10          # API Server request timeout in seconds
    max_concurrency: 32  # Maximum concurrent requests in the API Server
runners:
  iris:
    traffic:
      timeout: 10          # Runner request timeout in seconds
      max_concurrency: 32  # Maximum concurrent requests in the Runner
```
-
Improved `bentoml push` performance for large Bentos.
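To make the lifecycle events in the list above more concrete, here is a toy sketch in plain Python of how state set during startup can be read while serving requests and released at shutdown. `ToyService` and `ToyContext` are hypothetical stand-ins written for this illustration; they are not BentoML classes and do not reflect how BentoML implements the hooks. The `on_deployment` hook is left out of the sketch.

```python
from typing import Any, Callable, Dict, List


class ToyContext:
    """Hypothetical stand-in for a service context: it only carries a state dict."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}


class ToyService:
    """Hypothetical mini service that mimics hook decorators (not BentoML itself)."""

    def __init__(self) -> None:
        self._startup: List[Callable[[ToyContext], None]] = []
        self._shutdown: List[Callable[[ToyContext], None]] = []
        self.ctx = ToyContext()

    def on_startup(self, fn: Callable[[ToyContext], None]):
        self._startup.append(fn)
        return fn

    def on_shutdown(self, fn: Callable[[ToyContext], None]):
        self._shutdown.append(fn)
        return fn

    def run(self, handler: Callable[[Any, ToyContext], Any], requests: List[Any]) -> List[Any]:
        """Run startup hooks, serve every request, then always run shutdown hooks."""
        for hook in self._startup:
            hook(self.ctx)
        try:
            return [handler(req, self.ctx) for req in requests]
        finally:
            for hook in self._shutdown:
                hook(self.ctx)


svc = ToyService()
events: List[str] = []  # records the order in which things happen


@svc.on_startup
def load_resource(ctx: ToyContext) -> None:
    ctx.state["object_key"] = {"scale": 2}
    events.append("startup")


@svc.on_shutdown
def release_resource(ctx: ToyContext) -> None:
    ctx.state.pop("object_key", None)
    events.append("shutdown")


def predict(input_data: int, ctx: ToyContext) -> int:
    events.append("request")
    return input_data * ctx.state["object_key"]["scale"]


results = svc.run(predict, [1, 2, 3])
```

The point of the pattern is that an expensive object is created once, shared by every request through the context, and cleaned up even if a request raises.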
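As a rough illustration of what the `timeout` and `max_concurrency` traffic settings in the list above mean, the following sketch applies both limits to an async handler using only the standard library. `TrafficLimiter` and `TooManyRequests` are hypothetical names invented for this example. The release notes do not say whether BentoML rejects or queues requests above the concurrency limit; this sketch assumes rejection.

```python
import asyncio


class TooManyRequests(Exception):
    """Raised by this sketch when the concurrency limit is already reached."""


class TrafficLimiter:
    """Hypothetical limiter: caps in-flight calls and bounds each call's duration."""

    def __init__(self, timeout: float, max_concurrency: int) -> None:
        self.timeout = timeout
        self.max_concurrency = max_concurrency
        self._in_flight = 0

    async def call(self, handler, *args):
        # Assumption: excess requests are rejected immediately rather than queued.
        if self._in_flight >= self.max_concurrency:
            raise TooManyRequests("concurrency limit reached")
        self._in_flight += 1
        try:
            # Cancel the handler if it runs longer than the configured timeout.
            return await asyncio.wait_for(handler(*args), timeout=self.timeout)
        finally:
            self._in_flight -= 1


async def slow_echo(value: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return value


async def demo():
    limiter = TrafficLimiter(timeout=0.05, max_concurrency=2)
    # Three calls start together: one fast, one too slow, one over the limit.
    return await asyncio.gather(
        limiter.call(slow_echo, "a", 0.01),
        limiter.call(slow_echo, "b", 0.2),
        limiter.call(slow_echo, "c", 0.01),
        return_exceptions=True,
    )


outcomes = asyncio.run(demo())
```

Here the first call completes, the second exceeds the timeout, and the third is turned away because two calls are already in flight.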
🚀 One more thing: the team is delighted to unveil its latest endeavor, OpenLLM. This project lets you effortlessly build with state-of-the-art open-source or fine-tuned Large Language Models.
-
Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.

```bash
openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```
-
Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.

```python
llm_runner = openllm.Runner("dolly-v2")
```
-
Builds LLM applications into the Bento format, which can be deployed to BentoCloud or containerized into OCI images.

```bash
openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```
Our dedicated team is working hard to pioneer more integrations of advanced models for upcoming OpenLLM releases. Stay tuned for further developments.