ADR 0010 ODH/Caikit/TGIS integration #20
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
# Open Data Hub - ODH, Caikit, and TGIS architecture | ||
|
||
|
||
| | | | ||
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------ | | ||
| Date | 2023-Sept-13 | | ||
| Scope | OpenDataHub and Caikit/TGIS integration architecture | | ||
| Status | Accepted | | ||
| Authors | [Sean Pryor](@Xaenalt), | | ||
| Supersedes | N/A | | ||
| Superseded by: | N/A | | ||
| Tickets | | | ||
| Other docs: | https://lucid.app/lucidchart/06fbfa85-ac66-40f7-9e60-1aa1d1ae426b/edit?invitationId=inv_74fb2b71-c771-405e-909c-e813b7d65623 | | ||
|
||
## What | ||
|
||
This ADR describes the architecture of the joint IBM and Red Hat integration of ODH and Caikit/TGIS into the AI stack. | ||
Review comment: Are we good mentioning company ascription here? It doesn't seem relevant for ODH. |
||
|
||
## Why | ||
|
||
Caikit and TGIS are two parts of the IBM software stack used for training and serving Large Language Models (LLMs). Integrating them gives ODH a stack that specifically addresses LLM use cases. | ||
|
||
## Goals | ||
|
||
* Integration of Caikit/TGIS as runtime backends for KServe and CodeFlare/Ray. | ||
* Open sourcing and integration of the Caikit API for clients. | ||
|
||
## Non-Goals | ||
|
||
## How | ||
Review comment: How will Caikit be deployed on a k8s cluster? Will it be additional pods/services that get deployed to the cluster? Will it have its own controller/operator? Will it have its own CRDs that it manages? If so, what will they do? How will it be integrated with the ODH Operator? Will it be a new component that the DSC will need to deploy? Will it be able to function if a user has only deployed Ray or KServe and not both? What is the relationship between a Caikit CR and the Ray/KServe objects? Will it be like a DSPA where an instance of Caikit will need to be deployed in every Data Science Project? Is there something that a user needs to do to make the Caikit SDK work with Ray/KServe, or will all of the compatibility be handled on the user's end (e.g. Elyra handles 100% of the translation from an "Elyra Pipeline" to a kfp-tekton compatible pipeline in the running notebook so dsp never needs to "understand" Elyra)? Will anything be required by the Ray or KServe stacks to get Caikit to function, or will it slot in on top of them as they exist today? Some sort of rough architecture diagram would probably be very helpful here. |
||
|
||
Users will have a few ways to interact with the software stack. Caikit will be used both as a backend software runtime, which is used by the Caikit SDK that users can code against to create their models. These models can be trained in Ray using the Caikit runtime stack as the training backend on the nodes. Caikit will also be integrated as a serving runtime under KServe. All of these components can be interacted with using the standard OpenShift APIs, creating CRs in OpenShift, etc. Additionally, Caikit will also expose an API that can run on the cluster, allowing for several convenience features such as moving a model between training and serving, as well as some tracking. These features will be implemented in the same manner, creating CRs and calling OpenShift APIs. | ||
Review comment: This sentence only lists one option. The second option probably got moved into a separate sentence while it was being edited so "both" no longer makes sense here.
Review comment: What CRs? What will they do? |
||
|
||
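As a rough illustration of what "creating CRs" for serving might look like, the sketch below builds a KServe `InferenceService` manifest as a plain Python dict. The ADR does not specify any of these objects, so everything concrete here is an assumption for illustration only: the runtime name `caikit-tgis-runtime`, the model format `caikit`, the namespace, and the storage URI are hypothetical placeholders, and the helper `build_inference_service` is not part of any Caikit or KServe SDK.

```python
import json


def build_inference_service(name, namespace, runtime, model_format, storage_uri):
    """Return a KServe InferenceService manifest as a plain dict.

    Hypothetical helper: it only assembles the manifest and does not
    talk to a cluster.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    # Assumed model format and runtime names; the ADR
                    # does not define them.
                    "modelFormat": {"name": model_format},
                    "runtime": runtime,
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = build_inference_service(
    name="example-llm",
    namespace="example-project",
    runtime="caikit-tgis-runtime",  # hypothetical ServingRuntime name
    model_format="caikit",  # hypothetical model format
    storage_uri="s3://example-bucket/models/example-llm",  # placeholder
)

# Print the manifest so it can be inspected or saved to a file.
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with the usual OpenShift tooling, which matches the ADR's point that users can keep working through the standard APIs rather than only through a Caikit-specific one.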
## Open Questions | ||
|
||
## Alternatives | ||
|
||
* One alternative discussed was to have the Caikit API be the only method to interact with resources on the cluster. However, this approach would severely limit the utility of Caikit, because it would require the community to use the Caikit API rather than the familiar APIs of KServe/Ray. Presenting them together allows users to pick and choose how to interact with the software stack, and doesn't lock out any of the important features. | ||
|
||
## Security and Privacy Considerations | ||
|
||
The sidecar approach, in which Caikit and TGIS are colocated in a pod, narrows the security surface. It also keeps the shared volume from being a point of security concern. | ||
|
||
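The sidecar layout described above can be sketched as a pod manifest in which both containers mount the same volume. This is a minimal sketch under stated assumptions, not the actual deployment: the ADR does not say which volume type is used, so the `emptyDir` volume, the mount path, the container names, and the image references are all hypothetical. An `emptyDir` volume is scoped to the pod, which is one way the shared volume could stay off the wider security surface.

```python
def build_sidecar_pod(name, caikit_image, tgis_image, mount_path="/shared-models"):
    """Return a Pod manifest with Caikit and TGIS sharing one volume.

    Hypothetical helper for illustration; volume type and paths are
    assumptions, not taken from the ADR.
    """
    volume_name = "shared-models"

    def mount():
        # Each container gets its own mount entry for the same volume.
        return {"name": volume_name, "mountPath": mount_path}

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {"name": "caikit", "image": caikit_image, "volumeMounts": [mount()]},
                {"name": "tgis", "image": tgis_image, "volumeMounts": [mount()]},
            ],
            # Assumed volume type: emptyDir lives and dies with the pod.
            "volumes": [{"name": volume_name, "emptyDir": {}}],
        },
    }


pod = build_sidecar_pod(
    name="example-caikit-tgis",
    caikit_image="registry.example.com/caikit:latest",  # placeholder image
    tgis_image="registry.example.com/tgis:latest",  # placeholder image
)

# Both containers reference the same pod-scoped volume by name.
shared = {c["volumeMounts"][0]["name"] for c in pod["spec"]["containers"]}
print(shared)
```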
## Risks | ||
|
||
## Stakeholder Impacts | ||
|
||
|
||
| Group | Key Contacts | Date | Impacted? | | ||
| ----------------------------------- | ------------------------------- | ------ | ----------- | | ||
| Red Hat Model Serving Team          | Sean Pryor, JooHo Lee | | Yes | | ||
| Red Hat Distributed Workloads Team  | Anish Asthana, Antonin Stefanutti | | Yes | | ||
| IBM Caikit Team | Gabe Goodhart, Gaurav Kumbhat | | Yes | | ||
| IBM TGIS Team | Nick Hill | | No | | ||
|
||
## References | ||
* https://lucid.app/lucidchart/06fbfa85-ac66-40f7-9e60-1aa1d1ae426b/edit?invitationId=inv_74fb2b71-c771-405e-909c-e813b7d65623 | ||
|
||
## Reviews | ||
|
||
|
||
| Reviewed by | Date | Notes | | ||
| ------------- | ------ | ------- | | ||
| name | date | ? | |
Review comment: Why is this saying accepted already? :-)