Implement a Reusable E2E Kubeflow ML Lifecycle #3728
The only persona shown here is ML Engineer, which in my opinion is not correct, as Data Preparation can be done by a Data Engineer. Similarly, Model Development, Hyperparameter Tuning, and Model Training can/will be done by a data scientist.

I added the persona to highlight explicitly how an ideal user should think about this workflow. Though maybe this could be amended to add more personas. I worry about the clarity though.

@franciscojavierarceo these are my thoughts as well, as this gets political with who does what and there is no simple answer. Hence I was wondering whether we should get into personas at all.

Yeah, that makes sense. I definitely understand how it can be a rabbit hole. I am generally customer-centric, so my goal was really just to elicit the value prop for people who are quickly thinking "why should I, as someone who builds models, care about Kubeflow?"

Yes, the main goal and motivation of this page is to explain the value of the Kubeflow ecosystem to our users.

The only persona shown here is ML Engineer, which in my opinion is not correct, as Data Preparation can be done by a Data Engineer. Similarly, Model Development, Hyperparameter Tuning, and Model Training can/will be done by a data scientist.

I think the issue here is that the lines are blurred, and there is no prescriptive authority as to how this works. What I would do is call that out. "To scale, you have to specialize," but right now MLOps (and Kubeflow) are incubating, so the average user wears many hats. If an MLE, a data engineer, or a computer engineer wants to do data prep, nothing stops them as long as they aren't leaving other work untouched. Ultimately this is a business and engineering management conversation.

+1 @chasecadet As mentioned in another comment, I've worked at several places where the MLE was responsible for all of this.

@chasecadet @franciscojavierarceo the question is not who does what, as that is very subjective; the question is whether we should get into personas at all.

Yeah, that makes sense. Really I just wanted to provide high-level clarity about the value proposition of Kubeflow for MLEs or data scientists or whatever they're called this week.

@vikas-saxena02 @franciscojavierarceo THIS IS GREAT. So here is the philosophical/KF values question. My biggest power as a solutions architect is saying "My customers commonly do XYZ". So we need to decide: are we doing this textbook style, "this is the world we live in", where we need to point to an authority (@andreyvelich and I were discussing "whose ML lifecycle are we referencing"), or do we make this more community and experience based, where we say "We commonly see MLEs within the Kubeflow community leverage these tools aligned to what we have defined as the ML lifecycle based on community feedback", etc. Andrey was mentioning that the ML lifecycle we are using was sourced from the CNCF white paper by other professionals who worked to define it. That is totally fine, but we need to give the lineage of our information, call out when it can be considered subjective, and also frame what we are defining as something based on what we have seen and agreed upon in our community (something that is powerful but is not necessarily the be-all and end-all) and how new users can align themselves to it. We can also provide a place to discuss and challenge our ML lifecycle opinions, but if we say "we commonly see data engineers using X", then it's not necessarily us telling you what to do, but mentioning what we have seen so far and opening the door to new perspectives. This also helps us stay out of people's scopes if they say "well, the KF community said that this is an MLE tool, so I didn't use it for data engineering and/or told off my data engineer". We have to be careful when we are being prescriptive because we could be liable and lose credibility as a community. If this is our "current world view, open for discussion/growth", we invite discussion and contribution instead of enforcing our world view.
Now, that being said, we can 1000% defend our viewpoint as we continue to gather data and understand how organizations do MLOps with KF, and not just let anyone reinvent the lifecycle, but still keep the door open in case someone does have a viewpoint the community can discuss that makes sense to adopt or call out.

Yeah, I think it's a great idea to call out that in practice the lines end up blurrier between DE/MLE/DS for some orgs than for others. I definitely welcome feedback and iteration on this! I think having this guidance is very useful though, as it can provide a lot more clarity to the end user on why an MLOps team may be recommending Kubeflow. Andrej and I drafted this based on the CNCF diagram and modified it a little bit but, again, the language around personas across the industry is pretty fuzzy, so I think sharing it with an asterisk is very helpful. It would also be valuable to hiring managers/executives who are trying to make staffing decisions but may not have a nuanced view of things.

I generally agree with these points @chasecadet, but again it is out of scope of this PR. We can always iterate and improve our architecture page if we agree with the Kubeflow community.

The only persona shown here is ML Engineer, which in my opinion is not correct, as Data Preparation can be done by a Data Engineer. Similarly, Model Development, Hyperparameter Tuning, and Model Training can/will be done by a data scientist.

That's right, but in different use cases Data Processing can be done by ML Engineers, especially when Spark is integrated with Jupyter Notebooks.

I understand that data preparation is done by data engineers, but considering we need to show an e2e flow that covers all Kubeflow components, and we just brought the Spark Operator into the ecosystem, we should cover data preparation too.

@rimolive @andreyvelich I am 100% with you on that, and the answer depends on the org structure or the MLOps literature one follows. My question really is: from a tool/platform perspective, should we be putting personas in the documentation, given that a lot of this is a grey area? Also, given SparkOperator is fully onboard with Kubeflow, should we put it in the main architecture diagram or not? I have put this as a comment on the main PR as well.

From my perspective this is out of scope of this PR. This PR is an initial change to the architecture page to make sure our lifecycle diagrams represent the up-to-date version of Kubeflow components. Also, the CNCF white paper already has a personas explanation, which might be useful for orgs who are looking at Kubernetes as the primary platform for AI/ML infra: https://www.cncf.io/wp-content/uploads/2024/03/cloud_native_ai24_031424a-2.pdf

I would also suggest splitting the Model Serving box in two, i.e. Model Serving and Model Monitoring/Drift Detection, as KServe has components to do that.

Model Monitoring and Drift Detection are part of model serving from my point of view. If we want to split this block, we should say Online Inference vs. Batch Inference, but I am not sure we need to explain such details. I hope that more detailed diagrams can be shown in the KServe docs.

@andreyvelich as a consultant I can vouch that not many people know that KServe has drift detection capabilities, hence my request to put it there.

That's right, which is why they should explore the individual components' docs for it. It is just impossible to show everything in this end-to-end ML lifecycle diagram.

In line 40, the definition of Data Preparation can be reworded to say:
In the Data Preparation step you ingest/raw data and transfer it to perform feature engineering to extract ML features for the offline feature store, and prepare training data for model development. Usually, this step is associated with data processing tools such as Spark, Dask, Flink, or Ray.

What do you mean by "you ingest/raw data"?

Sorry, that was a typo.

I guess the idea of this statement is to say that you use Spark to ingest raw data and process it.
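
For illustration only, here is a rough sketch of the three activities the proposed Data Preparation wording names: ingesting raw data, feature engineering for an offline feature store, and preparing training data. It is not taken from the PR or the Kubeflow docs. The record fields, function names, and the in-memory dictionary standing in for an offline feature store are all hypothetical, and plain Python is used as a stand-in for Spark, Dask, Flink, or Ray so that the sketch stays self-contained.

```python
from statistics import mean

# Hypothetical raw records, standing in for data ingested from a source system.
RAW_EVENTS = [
    {"user_id": "u1", "amount": "10.0", "label": "1"},
    {"user_id": "u1", "amount": "30.0", "label": "0"},
    {"user_id": "u2", "amount": "5.0", "label": "0"},
    {"user_id": "u2", "amount": None, "label": "1"},  # incomplete row, dropped below
]


def ingest(records):
    """Ingest raw data: drop rows with a missing amount and cast field types."""
    cleaned = []
    for record in records:
        if record["amount"] is None:
            continue
        cleaned.append(
            {
                "user_id": record["user_id"],
                "amount": float(record["amount"]),
                "label": int(record["label"]),
            }
        )
    return cleaned


def build_features(rows):
    """Feature engineering: per-user aggregates, keyed by user_id.

    The returned dict is a hypothetical stand-in for an offline feature store table.
    """
    amounts_by_user = {}
    for row in rows:
        amounts_by_user.setdefault(row["user_id"], []).append(row["amount"])
    return {
        user: {"txn_count": len(amounts), "avg_amount": mean(amounts)}
        for user, amounts in amounts_by_user.items()
    }


def build_training_data(rows, features):
    """Join features back onto labelled rows to form (txn_count, avg_amount, label) tuples."""
    return [
        (
            features[row["user_id"]]["txn_count"],
            features[row["user_id"]]["avg_amount"],
            row["label"],
        )
        for row in rows
    ]


rows = ingest(RAW_EVENTS)
offline_features = build_features(rows)
training_data = build_training_data(rows, offline_features)
```

In a real setup each function would more likely be a distributed job (for example, one submitted through the Spark Operator), with the features written to an actual feature store rather than kept in memory.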