Yatai 2.0 Proposal #504

parano · 2024-02-26T23:06:02Z

Introduction and background

It has been over a year since the initial release of Yatai 1.0. Thank you for all your trust and support for the project!

Recognition and Highlights

Introducing Yatai 1.0: https://bentoml.com/blog/yatai-10-model-deployment-on-kubernetes-made-easy
Yatai and Kubeflow native integration: https://www.kubeflow.org/docs/external-add-ons/serving/bentoml/
MLOps Infrastructure at Mission Lane: https://medium.com/mission-lane-tech-blog/mlops-infrastructure-at-mission-lane-7e780d99496e
ML at scale with Yatai by CJ Express: https://medium.com/cj-express-tech-tildi/deploy-machine-learning-at-scale-with-yatai-2671b56d0c32
MLOps at LINE: https://bentoml.com/blog/mlops-with-bentoml

We've learned a lot from working with the BentoML developer community and are planning a major update to the Yatai project. In this post, I'd like to share our thoughts on the direction we are taking, gather feedback, and call for OSS contributors to join us in building Yatai version 2.0.

Learnings

Setup complexity: the number one complaint from Yatai users is that the K8s setup is far too intrusive from a DevOps and operational perspective. Currently Yatai requires multiple stateful components (e.g. a database) and multiple namespaces (for image building, system components, and deployed workloads). Yatai's own user system doesn't work well with the RBAC system in Kubernetes. Much of this setup complexity buys only relatively small UX gains.
Custom image building process: this is the top requested feature in the BentoML Slack community (see also Deploy a "custom Bentoml image" using "yatai-deployment". #483): can a user put a pre-built BentoML Docker image's registry URL in the BentoDeployment YAML instead of a bento tag? The Yatai image builder is very hard to optimize or debug (e.g. builder OOM Killed #457) without heavy investment on the cloud infrastructure side, making it less useful than originally intended.
Customizing the deployment process: adding custom labels or resource annotations to your deployed containers, e.g. Support labels from values.yaml #486.
Stateful components like databases are either very expensive (e.g. using RDS) or unreliable and hard to operate (in-cluster instances).
The Elastic 2 license is relatively restrictive and limits who we can partner with to build this project.
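
To make the custom-image and custom-label requests above concrete, here is a minimal Python sketch that assembles a BentoDeployment-style manifest as a plain dictionary. The apiVersion value, the spec.image and spec.labels fields, and the helper name build_bento_deployment are assumptions made for illustration only; they are not taken from the current Yatai CRD, and the 2.0 design may look different.

```python
import json


def build_bento_deployment(name, namespace, image, labels=None):
    """Return a BentoDeployment-style manifest as a plain dict.

    ASSUMPTION: `spec.image` and `spec.labels` are hypothetical fields used
    to illustrate the feature requests; they are not the real Yatai schema.
    """
    if not image:
        raise ValueError("image must be a non-empty registry URL")
    return {
        "apiVersion": "example.invalid/v1",  # placeholder, not the real API group
        "kind": "BentoDeployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pre-built image URL supplied by the user instead of a bento tag.
            "image": image,
            # Custom labels the user wants on the deployed workload.
            "labels": dict(labels or {}),
        },
    }


manifest = build_bento_deployment(
    name="demo",
    namespace="default",
    image="registry.example.com/team/demo:1.0",  # made-up registry URL
    labels={"team": "ml-platform"},
)
print(json.dumps(manifest, indent=2))
```

With a field like this, the controller could skip the image builder entirely and hand the image straight to the workload it creates.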

Goals of Yatai 2.0

The main goal for 2.0 is to focus on our core value proposition: Yatai was built for scaling BentoML deployments on Kubernetes, and that's the main reason most teams come to Yatai. We'd like to double down on that single value proposition and make it work extremely well toward the promise of scalable AI deployment. This also means we may remove features that increase complexity without contributing to the core benefits.

Simplify setup for DevOps teams

Offer a single “BentoDeployment CRD controller” component via Helm and fully embrace the cloud native design, simplifying both the onboarding and advanced customizations.
Allow custom integration with other cloud-native tools, such as Knative, ArgoCD, Istio, Prometheus/Grafana, Jaeger, Elasticsearch, Loki, etc.
Remove stateful components (RDS, Docker Registry) and replace them with support for a custom Docker registry and an S3-based model/bento store.

Feature Highlights

Support for Distributed Service deployment mode in BentoML 1.2
Offer optional "playbooks" for additional features such as ingress settings, monitoring setup, deployment dashboard, and model store integrations.
Offer a Kubernetes-native workflow that integrates well with the broader ecosystem (e.g. support for K8s RBAC authorization, custom ingress control, and ArgoCD deployment pipelines)
Support for external message queue for long-running inference tasks and async API endpoints
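
The external message queue item above has no design yet. As a rough illustration of the submit-then-poll pattern that async endpoints commonly follow, the sketch below uses Python's standard-library queue and a worker thread as a stand-in for an external queue. The class InMemoryTaskQueue and its methods are hypothetical names, not part of Yatai or BentoML.

```python
import queue
import threading
import uuid


class InMemoryTaskQueue:
    """Stand-in for an external message queue (hypothetical, for illustration)."""

    def __init__(self, handler):
        self._handler = handler          # function that runs the inference
        self._tasks = queue.Queue()
        self._results = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, payload):
        """Enqueue a task and return its ID immediately."""
        task_id = str(uuid.uuid4())
        with self._lock:
            self._results[task_id] = {"status": "pending", "result": None}
        self._tasks.put((task_id, payload))
        return task_id

    def status(self, task_id):
        """Return a copy of the current status record for a task."""
        with self._lock:
            return dict(self._results[task_id])

    def join(self):
        """Block until every submitted task has been processed."""
        self._tasks.join()

    def _run(self):
        # Worker loop: pull tasks, run the handler, record the outcome.
        while True:
            task_id, payload = self._tasks.get()
            try:
                entry = {"status": "done", "result": self._handler(payload)}
            except Exception as exc:
                entry = {"status": "failed", "result": str(exc)}
            with self._lock:
                self._results[task_id] = entry
            self._tasks.task_done()
```

A client would call submit, receive a task ID right away, and poll status until the record reports "done" or "failed". A real deployment would swap the in-process queue for an external broker so that tasks survive pod restarts.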

Open Governance.

Move from the Elastic 2 license to the Apache 2.0 license
Explore partnership opportunities with open source foundations. Contact us!
Call for contributors. Join the #yatai channel, introduce yourself and share your thoughts.

Tentative Timeline and Milestones

March-May, 2024: Gather CFP feedback and finish the initial design draft
May-August, 2024: Community meeting on project updates
September-October, 2024: Yatai 2.0 Beta Release

Migration to 2.0

Due to the change in scope, we expect Yatai 2.0 to have some APIs that are incompatible with 1.0. The exact migration plan needs additional design work and depends on some pending design decisions. We will offer office hours around the time of the 2.0 release to assist your team with the migration process.

Call for Contributors

BentoML is a small team supporting many customers and community users. We'd love to get your help in building Yatai 2.0! We will need help with writing code, docs, testing, and early feedback on its design.

To get started, please join the #yatai channel in the BentoML Slack community, introduce yourself, and join our Yatai community meeting.


phitoduck · 2024-03-16T16:04:13Z

So exciting!

Re: removing stateful components.

Would this include secrets management?

Re: the change in scope will mean breaking changes

Is this primarily referring to

the removal of stateful components
the deprecation of the ImageBuilder workflow, where Docker images are built from bentos once they are selected for a deployment?

Question: Are additional potential enhancements to Yatai outside the scope of this proposal? E.g.

Async endpoints option
A/B testing of bentos
Shipping of logs and metrics to 3rd party backends e.g. Datadog/NewRelic

Question: would development of Yatai enhancements (such as these) be blocked until the release of 2.0?

Question: Is it possible that Yatai 2.0 could be decoupled from Kubernetes? I.e. run on other orchestrators such as OpenShift, AWS ECS, etc.? (and Kubernetes as well)

hutm · 2024-05-30T23:16:44Z

is there a timeline for 2.0?

parano self-assigned this Feb 26, 2024

parano mentioned this issue Feb 26, 2024

Support BentoML 1.2 in Yatai #505

Open
