Kubeflow 1.2 Blog post #455
@hamelsmu Hey Hamel - might you potentially help with some formatting of this post? Some of the images aren't copying in either. Here is the original doc. I just copied and pasted but maybe that was wrong. https://docs.google.com/document/d/1fMAqZ9hv0p22TA6uMM2gpii65pMPiKJAKxSQ6e6RFho/edit?ts=5fb4991b
@jbottum do you have an open PR with the blog post you are trying to publish?
@hamelsmu thanks for your reply, not yet, I thought the instructions said to open an issue 1st. Also, I lost all the hyperlinks. So now I open a PR?
Yeah someone needs to help you convert it into markdown and open a PR
@hamelsmu got it converted to .md and pasted in, have some more editing to do and then will open PR, now have hyperlinks but not images.
I think this issue is resolved since we have 1.2 blog post released last couple of weeks ago. /close |
@PatrickXYS: Closing this issue.
Announcing Kubeflow v1.2 release
David’s First Draft: Three years (!!) ago, we (Jeremy Lewi, Vish Kannan and David Aronchick) stood on stage at Kubecon to introduce Kubeflow for the first time. We could not possibly have imagined what would come about - thousands of GitHub stars, tens of thousands of commits, and a community that has built the most flexible and scalable platform for machine learning. And, best of all, it’s not backed by an enormous company that requires you to “upgrade” in order to use it; we gave it all away for free! Here’s to everything you all have done, and we could not be more excited about the NEXT three years (and the three years beyond that). Thank you!
The Kubeflow Community’s delivery of the Kubeflow 1.2 software release includes ~100 user requested enhancements to improve model building, training, tuning, ML pipelining and serving. This post includes a Release Highlights Section, which details significant 1.2 features as contributed by the Kubeflow application working groups (WG), SIGs, and ecosystem partners. The Kubeflow 1.2 changelog provides a quick view of the 1.2 deliveries.
The Release was validated, tested and documented by the developers, and it is now being validated, tested and documented by users, cloud providers and commercial support partners on popular platforms (e.g., AWS, Azure, GCP and IBM). The Community is working on a more sustainable approach to owning and maintaining test infrastructure.
For Release 1.2, AWS built and contributed a shared test infrastructure, which gives WG owners enough permissions to identify problems and test proposed solutions to completion. Most WGs (AutoML, Training-Operators, KFServing, Deployments, Manifests) have already migrated their tests to this infrastructure. As a result, test-infra blocking time has fallen significantly, which is good for users and contributors.
Getting Involved
The Community continues to grow and we invite new users and contributors to join the Working Groups and Community Meetings. The following provides some helpful links to those looking to get involved with the Kubeflow Community:
If you have questions and/or run into issues, please leverage the Kubeflow Slack channel and/or submit bugs via Kubeflow on GitHub.
What’s next
The Community has started discussions on Kubeflow 1.3. Arrikto has agreed to lead the 1.3 Release Management process and the Community will continue to capture input from users and contributors as features are defined, developed and delivered. Onward and upward!
Special thanks to Constantinos Venetsanopoulos (Arrikto), Animesh Singh (IBM), Jiaxin Shan (AWS), Yao Xiao (AWS), David Aronchick (Azure), Dan Sun (Bloomberg), Andrey Velichkevich (Cisco), Matthew Wicks (Eliiza), Willem Pienaar (Feast), Yuan Gong (Google), James Wu (Google), Jeremy Lewi (Google), Josh Bottum (Arrikto), Rui Vasconcelos (Canonical), Jeff Fogarty (US Bank) , Karl Shriek (AlexanderThamm), and Clive Cox (Seldon) for their help on 1.2 and this post.
Release Highlights Section
Working Group: AutoML / Katib
Working Group Meeting Notes: Katib Working Group Meeting Notes
Overall benefit: Better model accuracy, Better infrastructure utilization
Overall description: Katib 0.10, with the new v1beta1 API, has been released in Kubeflow 1.2. Katib automates the configuration of hyperparameters to deliver more accurate models that use less infrastructure, and its Early Stopping techniques simplify the process of finding an optimized set of parameters for your model. Custom Kubernetes CRD support makes it possible to orchestrate complex pipelines within a Katib Experiment.
Version:
Feature Name: Early Stopping
Feature Description: Save your cluster resources by using Katib’s Early Stopping techniques. The Median Stopping Rule algorithm is currently supported.
Feature Benefit: You don’t need to modify your training source code to use the feature! Early Stopping can be used with every Katib algorithm.
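The early-stopping setup above can be sketched in an Experiment spec; the names and values below are illustrative, and the exact fields (in particular the algorithm setting name) should be checked against the Katib v1beta1 reference:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: random-with-early-stopping   # hypothetical Experiment name
  namespace: kubeflow
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random
  earlyStopping:
    algorithmName: medianstop        # the Median Stopping Rule
    algorithmSettings:
      - name: min_trials_required    # assumed setting name; see the Katib docs
        value: "5"
  maxTrialCount: 12
  parallelTrialCount: 3
```

Because early stopping works on the metrics that Trials already report, no change to the training source code is needed.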
Feature Name: Support custom CRD in the new Trial template.
Feature Description: Two simple steps let you integrate your custom Kubernetes resource with Katib. The new Trial template design is valid YAML and provides a flexible way to pass your hyperparameters.
Feature Benefit: Define a Tekton Pipeline in your Katib Experiment. You can pass hyperparameters even if your model config is a JSON scikit-learn Pipeline.
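As a sketch of the new Trial template design (the resource names here are hypothetical), a Tekton PipelineRun can serve as the Trial resource, with hyperparameters substituted via trialParameters placeholders:

```yaml
trialTemplate:
  trialParameters:
    - name: learningRate
      description: Learning rate for the training step
      reference: lr                  # parameter name from spec.parameters
  trialSpec:
    apiVersion: tekton.dev/v1beta1
    kind: PipelineRun
    spec:
      pipelineRef:
        name: train-pipeline         # hypothetical Tekton Pipeline
      params:
        - name: lr
          value: ${trialParameters.learningRate}
```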
Feature Name: Resume Experiments
Feature Description: Implements various methods to resume Katib Experiments. The Experiment’s Suggestion data is saved in a custom volume, and the Katib config lets you modify your volume settings.
Feature Benefit: Free your cluster resources after your Experiment is finished.
Feature Name: Multiple Ways to Extract Metrics
Feature Description: You can specify metrics strategies for your Experiment. Katib computes the Experiment objective based on these values. You are able to view detailed metric info for each Trial.
Feature Benefit: Get correct optimization results when your model produces the necessary metric value only at the final training step.
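A minimal sketch of such a strategy, assuming a metricStrategies field under the v1beta1 Experiment objective (check the Katib reference for the exact schema):

```yaml
spec:
  objective:
    type: maximize
    objectiveMetricName: accuracy
    metricStrategies:
      - name: accuracy
        value: latest   # use the last reported value rather than the max
```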
Working Group: KFServing
Working Group Meeting Notes: KFServing Working Group Meeting Notes
Overall benefit: KFServing enables serverless inferencing on Kubernetes and provides performant, high abstraction interfaces for common machine learning (ML) frameworks like TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve production model serving use cases.
Overall description: Kubeflow 1.2 includes KFServing v0.4.1, where the focus has been on enabling KFServing on OpenShift and providing more features, such as a batcher module deployed as a sidecar, the Triton inference server renaming and integrations, an Alibi explainer upgrade to 0.4.0, a logger update to the CloudEvents V1 protocol, and customizable URL paths on the data plane. Additionally, the minimum Istio version is now v1.3.1, and the Knative version has moved to 0.14.3. More details can be found here and here.
Feature Name: Add batcher module as sidecar #847 @zhangrongguo
Feature Description: The KFServer Batcher accepts user requests, batches them, and then sends them to the InferenceService. Batcher Feature Description
Feature Benefit: Faster response times to inference requests and improved infrastructure utilization
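As an illustration (the service name, storage path and field values are placeholders; consult the KFServing batcher docs for the exact schema), the batcher sidecar is enabled by adding a batcher section to the predictor:

```yaml
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sklearn-iris-batcher         # hypothetical service name
spec:
  default:
    predictor:
      batcher:
        maxBatchSize: 32             # group up to 32 requests per batch
        maxLatency: 5000             # flush a partial batch after 5000 ms
      sklearn:
        storageUri: "gs://my-bucket/models/sklearn/iris"   # illustrative model path
```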
Feature Name: Alibi explainer upgrade to 0.4.0 #803 @cliveseldon
Feature Description: The enhancements include a KernelSHAP explainer for black-box model SHAP scores and documentation for the LinearityMeasure algorithm. This delivery includes a new API for explainer and explanation objects, which provide a variety of improvements, but are breaking changes.
Feature Benefit: This delivery improves the ability to understand which features impact model accuracy along with improving operations.
Feature Name/Description : Triton inference server rename and integrations #747 @deadeyegoodwin
Working Group: Pipelines
Working Group Meeting Notes: http://bit.ly/kfp-meeting-notes
Overall benefit: Simplify the process of creating a model when you have new data and new code
Overall description: Kubeflow Pipelines is a platform for building and deploying portable, scalable machine learning (ML) workflows based on containers. The Kubeflow Pipelines platform consists of:
The following are the goals of Kubeflow Pipelines:
Kubeflow Pipelines is stabilizing over a few patch releases. At the same time, we have made significant progress standardizing the pipeline IR (intermediate representation), which will serve as a unified pipeline definition for different execution engines.
Version:
Feature Name: Kubeflow Pipelines with Tekton backend available
Feature Description: After an extensive effort, we have Kubeflow Pipelines running on Tekton end-to-end and available in open source. Additionally, it's available by default with the Kubeflow deployment on IBM Cloud and can be deployed on OpenShift.
Feature Benefit: If you are an existing Tekton user, a fan of Tekton, or running OpenShift Pipelines, you can get Kubeflow Pipelines running on top of it. More details here
https://developer.ibm.com/blogs/kubeflow-pipelines-with-tekton-and-watson/
Feature Name: Stabilizing Kubeflow Pipelines 1.0.x
Feature Description: We are stabilizing Kubeflow Pipelines over a few patch releases: the Kubeflow Pipelines 1.0.4 Changelog lists ~20 fixes and ~5 minor features.
Working Group: Notebooks
Working Group Meeting Notes: TBD
Overall benefit: Interactive, experimental coding environment for model development
Overall description: Notebooks provide an advanced, interactive coding environment that users and teams can share, leveraging Kubernetes namespaces for isolation and resource utilization
Version:
(IN 1.2)
Feature Name: Affinity/Toleration configs #5237
Feature Description: Adds the ability for Kubeflow administrators to set groups of Affinity/Toleration configs which users can pick from a dropdown.
Feature Benefit: Allows more fine-grained selection of how Notebook pods are scheduled.
Merged code features; artifacts available in 1.2.1 or later (NOT IN 1.2)
Feature Name: Refactor Notebook
Feature Description: The details of the refactoring are defined in these deliveries:
Feature Benefit: Refactoring will enable easier future integration with other web apps, such as Tensorboard and the volume manager.
(NOT IN 1.2)
Feature Name: Stop and Restart Notebooks while maintaining state, #4857
Feature Description: Implementation of a "shut down server" button in the central dashboard that scales the stateful set for the server down to zero and a "start server" button that scales it back up again. Note - this feature will be supported in 1.2.1 or later.
Feature Benefit: Save work, save infrastructure resources
Working Group: Training-Operators
Working Group Meeting Notes: TBD
Overall benefit: Faster model development using operators that simplify distributed computing
The Training Operator contributors provided the following fixes and improvements in Kubeflow 1.2:
SIG: Model Management
Overall benefit: The ability to find model versions and their subcomponents including metadata
SIG Meeting Notes: Model Management SIG Meeting Notes
Overall description:
The SIG was initiated to define and develop a Kubeflow solution for model management, which will make it easier to organize and find models and their artifacts. In addition, several contributors are submitting proposals on how to define data types for ML models and data, with the goal of driving wider metadata standards and interoperability of models between ML platforms, clouds and frameworks. The proposals aim to define an ontology for model and data types, along with tooling to search and organize that metadata.
Proposals from Kubeflow Pipelines contributors, the Model Management SIG, Seldon, and an MLSpec from David Aronchick (Azure) are under discussion. Please find links to those proposals below:
EcoSystem: Seldon
Overall benefit: Deploy, Scale, Update models built with Kubeflow.
Overall description: Seldon handles scaling of production machine learning models and provides advanced machine learning capabilities out of the box including Advanced Metrics, Request Logging, Explainers, Outlier Detectors, A/B Tests, and Canaries.
Kubeflow 1.2 comes with Seldon’s 1.4 release. This release of Seldon adds further capabilities for model deployment and inferencing, including batch and streaming interfaces to a deployed model. It also allows fine-grained control of how a deployed model interfaces with Kubernetes, with the addition of KEDA and Pod Disruption Budget options. Finally, it begins a process of compatibility with KFServing by allowing use of the V2 Dataplane supported by Seldon, KFServing and NVIDIA Triton.
Version: 1.4.0
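A minimal SeldonDeployment for serving a model looks roughly like the following (the deployment name and model URI are illustrative placeholders):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sklearn-iris                   # hypothetical deployment name
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        implementation: SKLEARN_SERVER         # prepackaged scikit-learn server
        modelUri: gs://my-bucket/sklearn/iris  # illustrative model location
```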
Feature Name: Stream and Batch support
Feature Description: Streaming support for native Kafka integration. Batch prediction support from and to cloud storage.
Feature Benefit: Allows Seldon users to interact with their models via RPC, Streaming or Batch as needed.
Feature Name: Extended kubernetes control via KEDA and PDBs
Feature Description: Allows fine-grained control of deployed models via autoscaling with KEDA metrics and the addition of Pod Disruption Budgets.
Feature Benefit: Manage models at scale in a production cluster.
Feature Name: Alpha V2 Dataplane.
Feature Description: Run custom Python models using an updated Python server along with support for the V2 Dataplane (NVIDIA Triton, KFServing, Seldon).
Feature Benefit: Utilize a standard powerful protocol that is supported cross project.
EcoSystem: Kale
Overall benefit: A Kubeflow workflow tool that simplifies ML pipeline building and versioning directly from a Notebook or IDE, e.g. VSCode
Kale GitHub repo: https://github.com/kubeflow-kale/kale
Kale Tutorials: https://www.arrikto.com/tutorials/
Overall description: Kale lets you deploy Jupyter Notebooks that run on your laptop or in the cloud to Kubeflow Pipelines, without requiring any Kubeflow SDK boilerplate. You can define pipelines just by annotating the Notebook’s code cells and clicking a deployment button in the Jupyter UI. Kale takes care of converting the Notebook to a valid Kubeflow Pipelines deployment, resolving data dependencies and managing the pipeline’s lifecycle.
Version:
Feature Name: Dog Breed Classification example
Feature Description: Tutorial for simplified pipeline to build a model for Image Classification
Feature Benefit: Faster understanding of ML workflows to deliver models with hyperparameter tuning
Feature Name: Katib integration with Kale
Feature Description: Automated hyperparameter tuning and reproducible Katib trials using pipelines
Feature Benefit: Better model accuracy and easy reproducibility and debugging
Feature Name: Pipeline Step Caching for Katib Trials using Kale’s integration with Rok
Feature Description: Kale recognizes when a pipeline step has been run before, fetches the completed results from Rok, and inserts them into the pipeline processing
Feature Benefit: Faster hyperparameter tuning, reduced infrastructure utilization
EcoSystem: Feast
Overall benefit: Feast allows teams to register, ingest, serve, and monitor machine learning features in production.
Working Group Meeting Notes: https://tinyurl.com/kf-feast-sig
Overall description: The latest release of Feast was a concerted effort by the Feast community to make Feast available in more environments than Google Cloud. We’ve removed all hard couplings to managed services and made it possible to run Feast both on AWS and locally.
Version: Feast 0.8
Feature Name: Support for AWS
Feature Description: Feast 0.8 now comes with support for deployment on AWS, with native support for job management on EMR, and support for both S3 and Kinesis as data sources.
Feature Benefit: Finally makes it possible for Kubeflow users on AWS to run Feast
Feature Name: Batch-only ingestion
Feature Description: Allows teams to ingest data into stores without passing the data through a stream.
Feature Benefit: Allows for a more performant ingestion compared to the stream-first approach.
Feature Name: Local-only mode
Feature Description: Makes it possible to run Feast without any external infrastructure, using only Docker Compose or Minikube
Feature Benefit: Lowers the barrier to entry for new users, and makes it easier to test and develop Feast
EcoSystem: On-Prem SIG
Description: The on-prem SIG was officially created during this release with the intent to develop best practices for Kubeflow deployment in on-prem installations. With the new release, the SIG has also secured testing infrastructure in order to provide a well-tested reference architecture.
SIG Meeting Notes: https://bit.ly/2LyTh14
Slack channel: https://kubeflow.slack.com/archives/C01C9NPD15H
Platform: AWS
Description: Better reliability, better testing coverage by enabling E2E tests for Kubeflow AWS deployment, better Kubeflow notebook user experience.
Platform: IBM
Description: Pipelines and security have been the key focus for Kubeflow on IBM Cloud in this release. On the Pipelines side, Kubeflow Pipelines with Tekton is available for deployment on IBM Cloud Kubernetes Service and is included by default with the Kubeflow deployment on IBM Cloud. On the security side, we have enabled integration with IBM Cloud App ID as an authentication provider instead of Dex. When App ID is used, identity management is delegated to IBM Cloud with its built-in identity providers (Cloud Directory, SAML, social log-in with Google or Facebook, etc.) or custom providers. Additionally, to secure Kubeflow authentication with HTTPS, we have provided integration instructions using the IBM Cloud Network Load Balancer.
Platform: GCP
Description: Better UX and reliability for installation and upgrade. Upgraded the Cloud Config Connector in the management cluster to the latest version.
Platform: Azure
Description:
Platform: OpenShift
Description: Our focus for this release was to create the OpenShift stack that can install Kubeflow components on OpenShift 4.x. We architected the stack so users can pick and choose the components they would like to install by adding or removing kustomizeConfig entries in the kfdef. Components currently supported are Istio, single-user Pipelines, Jupyter notebooks with a custom Tensorflow notebook image, the profile controller with a custom image, Katib, the PyTorch and Tensorflow job operators, and Seldon. You can install Kubeflow 1.2 on OpenShift from the Open Data Hub community operator in the OpenShift Catalog using the OpenShift kfdef.
Platform: MicroK8s
Description: Kubeflow is a built-in add-on to MicroK8s, and now includes Istio v1.5 as default.
Platform: MiniKF
Description: MiniKF is currently being tested with Kubeflow 1.2, and an updated MiniKF version will be provided after validation testing and documentation have completed. Please find more information on MiniKF here: https://www.arrikto.com/get-started/. You can also find tutorials that will guide you through end-to-end data science examples here: https://www.arrikto.com/tutorials