ADR 0010 ODH/Caikit/TGIS integration #20

Closed
wants to merge 2 commits into from

Xaenalt
Member

@Xaenalt Xaenalt commented Sep 26, 2023

Overview of the architecture and diagram of the ODH+Caikit+TGIS architecture

Description

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@Xaenalt
Member Author

Xaenalt commented Sep 26, 2023

@jwforres @etirelli PTAL

@anishasthana
Member

Imo we should have the lucidchart be included in this ADR itself

@Xaenalt
Member Author

Xaenalt commented Sep 26, 2023

> Imo we should have the lucidchart be included in this ADR itself

It's linked in "other docs" and in the "references" section, is there a way to include it more generally?

Co-authored-by: Anish Asthana <anishasthana1@gmail.com>
@astefanutti

astefanutti commented Oct 5, 2023

> > Imo we should have the lucidchart be included in this ADR itself
>
> It's linked in "other docs" and in the "references" section, is there a way to include it more generally?

I'd recommend exporting the diagram to SVG, committing the SVG file alongside the ADR, and including the SVG in the ADR markdown file, as suggested by @anishasthana.

Also, as Lucidchart is being retired at Red Hat, I'd recommend exporting the diagram in VSDX format so it can be imported into other diagramming tools like draw.io.

| ---------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| Date | 2023-Sept-13 |
| Scope | OpenDataHub and Caikit/TGIS integration architecture |
| Status | Accepted |

Why is this saying accepted already? :-)


## Non-Goals

## How

How will Caikit be deployed on a k8s cluster? Will it be additional pods/services that get deployed to the cluster?

Will it have its own controller/operator? Will it have its own CRDs that it manages? If so, what will they do?

How will it be integrated with the ODH Operator? Will it be a new component that the DSC will need to deploy?

Will it be able to function if a user has only deployed Ray or KServe and not both?

What is the relationship between a Caikit CR and the Ray/KServe objects? Will it be like a DSPA, where an instance of Caikit will need to be deployed in every Data Science Project?

Is there something that a user needs to do to make the Caikit SDK work with Ray/KServe, or will all of the compatibility be handled on the user's end (e.g. Elyra handles 100% of the translation from an "Elyra Pipeline" to a kfp-tekton compatible pipeline in the running notebook, so dsp never needs to "understand" Elyra)?

Will anything be required by the Ray or KServe stacks to get Caikit to function or will it slot in on top of them as they exist today?

Some sort of rough architecture diagram would probably be very helpful here.


## How

Users will have a few ways to interact with the software stack. Caikit will be used both as a backend software runtime, which is used by the Caikit SDK that users can code against to create their models. These models can be trained in Ray using the Caikit runtime stack as the training backend on the nodes. Caikit will also be integrated as a serving runtime under KServe. All of these components can be interacted with using the standard OpenShift APIs, creating CRs in OpenShift, etc. Additionally, Caikit will also expose an API that can run on the cluster, allowing for several convenience features such as moving a model between training and serving, as well as some tracking. These features will be implemented in the same manner, creating CRs and calling OpenShift APIs.

Caikit will be used both as

This sentence only lists one option. The second option probably got moved into a separate sentence while it was being edited so "both" no longer makes sense here.


creating CRs in OpenShift

What CRs? What will they do?
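
The thread does not answer this question. As a rough illustration only, and assuming the serving path reuses KServe's existing `ServingRuntime` and `InferenceService` resources rather than new Caikit-specific CRDs (the ADR text quoted here does not confirm this), the CRs might look like the manifests built below. The resource names, container image, storage URI, and the `caikit` model format name are hypothetical placeholders.

```python
import json


def caikit_serving_runtime(name: str, image: str) -> dict:
    """Build a KServe ServingRuntime manifest for a Caikit container.

    Assumption: Caikit is registered as a model format named "caikit".
    """
    return {
        "apiVersion": "serving.kserve.io/v1alpha1",
        "kind": "ServingRuntime",
        "metadata": {"name": name},
        "spec": {
            "supportedModelFormats": [{"name": "caikit", "autoSelect": True}],
            "containers": [{"name": "kserve-container", "image": image}],
        },
    }


def caikit_inference_service(name: str, runtime: str, storage_uri: str) -> dict:
    """Build a KServe InferenceService that serves a model via the given runtime."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": "caikit"},
                    "runtime": runtime,
                    "storageUri": storage_uri,
                }
            }
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    runtime = caikit_serving_runtime(
        "caikit-runtime", "example.com/caikit-tgis:latest"
    )
    isvc = caikit_inference_service(
        "example-llm",
        runtime["metadata"]["name"],
        "s3://example-bucket/models/example-llm",
    )
    print(json.dumps([runtime, isvc], indent=2))
```

Under that assumption, the printed JSON could be applied to a cluster with the usual OpenShift tooling. Whether Caikit also needs its own CRDs or controller remains an open question in this review.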


## What

This ADR describes the architecture of the joint IBM-RedHat integration of ODH and Caikit/TGIS into the AI stack.

Are we good mentioning company ascription here?

It doesn't seem relevant for ODH.

@Xaenalt
Member Author

Xaenalt commented Oct 25, 2023

[Image: RHOAI + Caikit Architecture v1 0]

@etirelli etirelli added the Stale label Jul 9, 2024

This PR was closed because it has been stale for 21+7 days with no activity.

@github-actions github-actions bot closed this Jul 18, 2024
6 participants