From d1086e2d502019a290d148f7579d377bac664ae1 Mon Sep 17 00:00:00 2001
From: Sean Pryor
Date: Tue, 26 Sep 2023 14:30:42 -0400
Subject: [PATCH 1/2] ADR 10 ODH/Caikit/TGIS integration

---
 ODH-ADR-0010-caikit-tgis-architecture.md | 64 ++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
 create mode 100644 ODH-ADR-0010-caikit-tgis-architecture.md

diff --git a/ODH-ADR-0010-caikit-tgis-architecture.md b/ODH-ADR-0010-caikit-tgis-architecture.md
new file mode 100644
index 0000000..08978fb
--- /dev/null
+++ b/ODH-ADR-0010-caikit-tgis-architecture.md
@@ -0,0 +1,64 @@
+# Open Data Hub - ODH, Caikit, and TGIS architecture
+
+
+| | |
+| ---------------- | ------------------------------------------------------------------------------------------------------------------------------ |
+| Date | 2023-Sept-13 |
+| Scope | OpenDataHub and Caikit/TGIS integration architecture |
+| Status | Accepted |
+| Authors | [Sean Pryor](@Xaenalt), |
+| Supersedes | N/A |
+| Superseded by: | N/A |
+| Tickets | |
+| Other docs: | https://lucid.app/lucidchart/06fbfa85-ac66-40f7-9e60-1aa1d1ae426b/edit?invitationId=inv_74fb2b71-c771-405e-909c-e813b7d65623 |
+
+## What
+
+This ADR describes the architecture of the joint IBM-RedHat integration of ODH and Caikit/TGIS into the AI stack.
+
+## Why
+
+Caikit and TGIS are two parts of the IBM software stack used for training and serving Large Language Models (LLMs). Integrating them gives ODH a stack that specifically addresses LLM use cases.
+
+## Goals
+
+* Integration of Caikit/TGIS as runtime backends for KServe and CodeFlare/Ray.
+* Open sourcing and integration of the Caikit API for clients.
+
+## Non-Goals
+
+## How
+
+Users will have a few ways to interact with the software stack. Caikit will be used as a backend software runtime, which is used by the Caikit SDK that users can code against to create their models. These models can be trained in Ray using the Caikit runtime stack as the training backend on the nodes. Caikit will also be integrated as a serving runtime under KServe. All of these components can be interacted with using the standard OpenShift APIs, for example by creating CRs in OpenShift. Additionally, Caikit will expose an API that can run on the cluster, allowing for several convenience features such as moving a model between training and serving, as well as some tracking. These features will be implemented in the same manner, by creating CRs and calling OpenShift APIs.
+
+## Open Questions
+
+## Alternatives
+
+* One alternative discussed was to have the Caikit API be the only method to interact with resources on the cluster. However, the downside to this approach is that it would severely limit the utility of Caikit by requiring the community to use it rather than the familiar APIs of KServe/Ray. Presenting them together allows users to pick and choose how to interact with the software stack, and doesn't lock out any of the important features.
+
+## Security and Privacy Considerations
+
+The sidecar approach, having Caikit and TGIS colocated in a pod, allows for a narrowing of the security surface. As a result, the shared volume is not a point of security concern.
+
+## Risks
+
+## Stakeholder Impacts
+
+
+| Group | Key Contacts | Date | Impacted? |
+| ----------------------------------- | ------------------------------- | ------ | ----------- |
+| RedHat Model Serving Team | Sean Pryor, JooHo Lee | | Yes |
+| RedHat Distributed Workloads Team | Anish Asthana | | Yes |
+| IBM Caikit Team | Gabe Goodhart, Gaurav Kumbhat | | Yes |
+| IBM TGIS Team | Nick Hill | | No |
+
+References
+* https://lucid.app/lucidchart/06fbfa85-ac66-40f7-9e60-1aa1d1ae426b/edit?invitationId=inv_74fb2b71-c771-405e-909c-e813b7d65623
+
+## Reviews
+
+
+| Reviewed by | Date | Notes |
+| ------------- | ------ | ------- |
+| name | date | ? |

From da11454bee9332ef2f0cf7ee90ca4655bebf5604 Mon Sep 17 00:00:00 2001
From: Sean Pryor
Date: Tue, 26 Sep 2023 16:40:07 -0400
Subject: [PATCH 2/2] Update ODH-ADR-0010-caikit-tgis-architecture.md

Co-authored-by: Anish Asthana
---
 ODH-ADR-0010-caikit-tgis-architecture.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ODH-ADR-0010-caikit-tgis-architecture.md b/ODH-ADR-0010-caikit-tgis-architecture.md
index 08978fb..ec0417d 100644
--- a/ODH-ADR-0010-caikit-tgis-architecture.md
+++ b/ODH-ADR-0010-caikit-tgis-architecture.md
@@ -49,7 +49,7 @@ The sidecar approach, having Caikit and TGIS colocated in a pod, allows for a na
 | Group | Key Contacts | Date | Impacted? |
 | ----------------------------------- | ------------------------------- | ------ | ----------- |
 | RedHat Model Serving Team | Sean Pryor, JooHo Lee | | Yes |
-| RedHat Distributed Workloads Team | Anish Asthana | | Yes |
+| RedHat Distributed Workloads Team | Anish Asthana, Antonin Stefanutti | | Yes |
 | IBM Caikit Team | Gabe Goodhart, Gaurav Kumbhat | | Yes |
 | IBM TGIS Team | Nick Hill | | No |
 
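
The ADR's "How" and "Security and Privacy Considerations" sections describe Caikit as a KServe serving runtime, colocated with TGIS in one pod and sharing a volume. As an illustration only, that layout could be expressed as a KServe `ServingRuntime` custom resource. The Python sketch below builds such a manifest as a plain dictionary. The runtime name, image references, volume name, mount path, and model format name are all hypothetical placeholders; the ADR does not specify any of them.

```python
import json


def build_serving_runtime(name: str, caikit_image: str, tgis_image: str) -> dict:
    """Return a ServingRuntime manifest with Caikit and TGIS in one pod.

    Assumption: both containers mount the same emptyDir volume, so the
    shared data never leaves the pod. All names below are illustrative.
    """
    volume_name = "shared-models"   # hypothetical volume name
    mount_path = "/shared_models"   # hypothetical mount path

    def container(container_name: str, image: str) -> dict:
        # Each container gets its own copy of the mount definition.
        return {
            "name": container_name,
            "image": image,
            "volumeMounts": [{"name": volume_name, "mountPath": mount_path}],
        }

    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "ServingRuntime",
        "metadata": {"name": name},
        "spec": {
            # Hypothetical model format name, for illustration only.
            "supportedModelFormats": [{"name": "caikit", "autoSelect": True}],
            "volumes": [{"name": volume_name, "emptyDir": {}}],
            "containers": [
                container("caikit", caikit_image),
                container("tgis", tgis_image),
            ],
        },
    }


if __name__ == "__main__":
    manifest = build_serving_runtime(
        "caikit-tgis-example",
        "registry.example.invalid/caikit:latest",  # placeholder image
        "registry.example.invalid/tgis:latest",    # placeholder image
    )
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be submitted to the cluster like any other CR, which matches the ADR's point that the components are driven through standard OpenShift APIs rather than a Caikit-only entry point.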