BentoML - v1.0.22

@ssheng ssheng released this 12 Jun 20:44
· 964 commits to main since this release
89e5fda

🍱 The BentoML v1.0.22 release brings a list of highly anticipated updates.

  • Added support for Pydantic 2 for better validation performance.

  • Added support for CUDA 12 versions in builds and containerization.

  • Introduced service lifecycle events, which let you attach custom logic to on_deployment, on_startup, and on_shutdown. State can be managed through the context variable ctx during the on_startup and on_shutdown events, and during request serving in the API.

    @svc.on_deployment
    def on_deployment():
      pass
    
    @svc.on_startup
    def on_startup(ctx: bentoml.Context):
      ctx.state["object_key"] = create_object()
    
    @svc.on_shutdown
    def on_shutdown(ctx: bentoml.Context):
      cleanup_state(ctx.state["object_key"])
    
    @svc.api
    def predict(input_data, ctx: bentoml.Context):
      obj = ctx.state["object_key"]  # object created in on_startup
      pass
  • Added support for traffic control for both the API Server and Runners. Timeout and maximum concurrency can now be set in the BentoML configuration.

    api_server:
      traffic:
        timeout: 10 # API Server request timeout in seconds
        max_concurrency: 32 # Maximum number of concurrent requests in the API Server
    
    runners:
      iris:
        traffic:
          timeout: 10 # Runner request timeout in seconds
          max_concurrency: 32 # Maximum number of concurrent requests in the Runner
  • Improved the performance of bentoml push for large Bentos.
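
To make the two traffic settings above more concrete, here is a minimal, library-independent sketch of what a request timeout and a concurrency cap mean for a single async handler. It does not use BentoML and is not BentoML's implementation. The names `call_with_traffic_limits` and `slow_handler` are hypothetical, and rejecting requests once the cap is reached is an assumption made for illustration, not a statement about how BentoML handles overload.

```python
import asyncio


async def call_with_traffic_limits(handler, payload, limiter, timeout):
    """Run handler(payload) under a concurrency cap and a per-request timeout."""
    if limiter.locked():
        # Every slot is in use. This sketch rejects the request instead of
        # queueing it (an illustrative choice, not BentoML's documented behavior).
        return "rejected"
    async with limiter:
        try:
            # Cancel the handler if it runs longer than `timeout` seconds.
            return await asyncio.wait_for(handler(payload), timeout=timeout)
        except asyncio.TimeoutError:
            return "timed out"


async def slow_handler(seconds):
    """Hypothetical handler that just sleeps for the given number of seconds."""
    await asyncio.sleep(seconds)
    return f"done after {seconds}s"


async def main():
    limiter = asyncio.Semaphore(2)  # plays the role of max_concurrency: 2
    return await asyncio.gather(
        call_with_traffic_limits(slow_handler, 0.01, limiter, timeout=0.2),
        call_with_traffic_limits(slow_handler, 1.0, limiter, timeout=0.2),
        call_with_traffic_limits(slow_handler, 0.01, limiter, timeout=0.2),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a cap of two, the first request completes, the second exceeds its timeout and is cancelled, and the third is turned away because both slots are already taken.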

🚀 One more thing: the team is delighted to unveil our latest endeavor, OpenLLM. This project lets you effortlessly build with state-of-the-art open-source or fine-tuned large language models.

  • Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.

    openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
  • Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.

    llm_runner = openllm.Runner("dolly-v2")
  • Builds LLM applications into the Bento format, which can be deployed to BentoCloud or containerized into OCI images.

    openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]

Our team is working hard to pioneer more integrations of advanced models for upcoming releases of OpenLLM. Stay tuned for further developments.