
RHOAIENG-4528 - Customizable kfp-launcher with a config map #630

Closed
wants to merge 1 commit

Conversation

Contributor

@amadhusu amadhusu commented Apr 19, 2024

The issue resolved by this Pull Request:

Resolves RHOAIENG-4528

Description of your changes:

kfp-launcher is a KFP component that is responsible for fulfilling the "Executor wrapper" and "Publisher" roles as described in the KFP v2 System Design document:

Executor: the user container that runs the user-specified image with its command and arguments.
Publisher: publishes metadata read from the executor to MLMD. The publisher is built as a statically compiled Go binary and injected as the entrypoint of the executor Pod.

kfp-launcher requires a ConfigMap to exist in the namespace where it runs. This ConfigMap contains the pipeline root and object storage configuration, and it must be named "kfp-launcher".

We currently deploy a default copy of the kfp-launcher ConfigMap via DSPO, but we also want users to be able to provide their own ConfigMap, so that they can specify multiple object storage sources and paths (as described in https://issues.redhat.com/browse/RHOAIENG-4528). This PR adds that functionality.

We don't want to assume anything about the structure of the kfp-launcher ConfigMap (and we know it will likely change in the future), so this implementation simply copies the data contents of the user-provided ConfigMap into the kfp-launcher ConfigMap.
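For illustration, a minimal sketch of what that copy could look like on the controller side (the helper name and wiring here are assumptions for readability, not the exact code in this PR):

package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// copyCustomKfpLauncherData is a hypothetical helper: it fetches the
// user-provided ConfigMap and returns its Data verbatim, so the rendered
// "kfp-launcher" ConfigMap ends up with exactly the user's contents.
// It makes no assumptions about which keys are present.
func copyCustomKfpLauncherData(ctx context.Context, c client.Client, name, namespace string) (map[string]string, error) {
	userCM := &corev1.ConfigMap{}
	if err := c.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, userCM); err != nil {
		return nil, err
	}
	// Copy every key as-is; the structure of the data is deliberately treated as opaque.
	data := make(map[string]string, len(userCM.Data))
	for k, v := range userCM.Data {
		data[k] = v
	}
	return data, nil
}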

Testing instructions

  1. Deploy DSPO
  2. Apply the ConfigMap shown below (config_map.yaml) to the namespace where you will deploy the DSPA.
  3. Deploy the DSPA shown below (dspa.yaml).
  4. After all the DSPA Pods are running, verify that the kfp-launcher ConfigMap's data matches the ConfigMap applied earlier.
  5. Deploy the config/samples/v2/dspa-simple/ DSPA in another namespace.
  6. After all the DSPA Pods are running, verify that the kfp-launcher ConfigMap is created with autogenerated values, as was always the case.
  7. Run a Pipeline to verify that the simple-dspa runs as it should.

Attachments:

config_map.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: custom-config
  labels:
    app: ds-pipeline-test
    component: data-science-pipelines
data:
    s3: |
      defaultPipelineRoot: s3://rhods-dsp-dev
      endpoint: https://s3.amazonaws.com/
      region: us-east-2
      secretName: aws-artifact-secret
      accessKeyKey: accesskey
      secretKeyKey: secretkey
dspa.yaml
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: sample
spec:
  apiServer:
    customKfpLauncherConfigMap: custom-config
  dspVersion: v2
  objectStorage:
    minio:
      deploy: true
      image: 'quay.io/opendatahub/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance'
  mlpipelineUI:
    image: quay.io/opendatahub/ds-pipelines-frontend:latest

Checklist

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@dsp-developers
Contributor

A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-630
An OCP cluster where you are logged in as cluster admin is required.

To use this image run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/630/head
git checkout -b pullrequest 3c04640cefd84fc0e47a6a024783670e2b59881f
oc new-project opendatahub
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-630"

More instructions here on how to deploy and test a Data Science Pipelines Application.

@amadhusu amadhusu changed the title WIP: RHOAIENG-4528 - Customizable kfp-launcher with a config map RHOAIENG-4528 - Customizable kfp-launcher with a config map Apr 23, 2024
@dsp-developers
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-630

@amadhusu amadhusu requested a review from gmfrasca April 25, 2024 14:17
@dsp-developers
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-630

Contributor

@hbelmiro hbelmiro left a comment

/lgtm

@HumairAK
Contributor

/hold

@dsp-developers
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-630

Signed-off-by: Achyut Madhusudan <amadhusu@redhat.com>
@dsp-developers
Contributor

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-630

@VaniHaripriya
Contributor

Tested as per the instructions; both scenarios work as expected. However, the Persistence Agent pod keeps crashing, and I am unable to create pipeline runs. Attaching the log file.
ds-pipeline-persistenceagent-sample-846cd74f8f-h7kw2-ds-pipeline-persistenceagent.log

Contributor

@gregsheremeta gregsheremeta left a comment

overall approach is correct!

@@ -1,5 +1,9 @@
apiVersion: v1
data:
{{ if .APIServer.CustomKfpLauncherConfig }}
providers: |
Contributor

(Context: "pipeline root" is a terrible name, and what it actually means is "the base folder in my object storage where all my stuff will go.")

Hm, looks like the user can only set the fields under providers -- i.e., they can't change defaultPipelineRoot. That seems incorrect to me.

  1. If I'm allowed to override this ConfigMap with my own for the purposes of making the object storage connection exactly what I want, I should absolutely be able to set my pipeline root (base object storage path) right alongside it in that same override ConfigMap.

  2. Given the (original -- I suggested a change) description of the field above, as a user, I would be caught off guard if only a subset of the ConfigMap was replaced. I'm expecting to be able to control everything under data.

Comment on lines +89 to +90
// CustomKfpLauncherConfig is a custom config file that you can provide
// for the api server to use instead of the one provided with DSPO.
Contributor

Suggested change
// CustomKfpLauncherConfig is a custom config file that you can provide
// for the api server to use instead of the one provided with DSPO.
// Allows the user to fully replace the contents of the kfp-launcher ConfigMap.
// kfp-launcher requires a ConfigMap to exist in the namespace where it runs.
// This ConfigMap contains pipeline root and object storage configuration.
// This ConfigMap must be named "kfp-launcher". We currently deploy a default copy
// of the kfp-launcher ConfigMap via DSPO, but a user may want to provide their own
// ConfigMap configuration, so that they can specify multiple object storage sources
// and paths. The "data" contents of the "kfp-launcher" ConfigMap will be fully replaced
// with the "data" contents of the ConfigMap specified here.

Contributor

sorry about the spaces. use tabs :)

// If the custom kfp-launcher configmap is not available, that is OK
if !apierrs.IsNotFound(err) {
log.Error(err, fmt.Sprintf("Encountered error when attempting to fetch ConfigMap: [%s], Error: %v", cfg, err))
return err
Contributor

If it's OK, I think we should try to proceed with the defaults as a fallback, just as if the user hadn't specified any ConfigMap at all.

So, don't return an error here -- log it and then keep going. If you return, none of the param setting below happens.

I'd log something like "User specified a CustomKfpLauncherConfig, but it's not found. Falling back to using defaults."
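For clarity, a sketch of the suggested control flow, reusing the err, cfg, and log identifiers from the snippet above (the exact log wording is illustrative only):

// Sketch of the suggested fallback: only an unexpected error aborts the
// reconcile; a missing user-specified ConfigMap is logged and the
// DSPO-generated defaults are kept.
if err != nil {
	if apierrs.IsNotFound(err) {
		log.Info(fmt.Sprintf("User specified a CustomKfpLauncherConfig [%s], but it's not found. Falling back to using defaults.", cfg))
		// fall through: keep the default kfp-launcher contents and continue setting params
	} else {
		log.Error(err, fmt.Sprintf("Encountered error when attempting to fetch ConfigMap: [%s]", cfg))
		return err
	}
}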

Contributor

Yeah, my suggestion was to throw the error, but I'm fine with the fallback approach as long as we provide sufficient logs for users to deduce why their ConfigMap was not picked up.

Contributor

gotcha. Yeah I like falling back here :) Thanks!

@gregsheremeta
Contributor

gregsheremeta commented Jul 26, 2024

Also, this requires a test to ensure the generated kfp-launcher ConfigMap contains the contents we expect when we override the default one.
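For illustration, a minimal sketch of such a test using controller-runtime's fake client, assuming a helper like the copyCustomKfpLauncherData sketched in the PR description above (the helper and all names here are illustrative, not this repo's actual test utilities):

package controllers

import (
	"context"
	"reflect"
	"testing"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

func TestCustomKfpLauncherDataIsCopiedVerbatim(t *testing.T) {
	// The user-provided ConfigMap whose data we expect to see in kfp-launcher.
	want := map[string]string{"s3": "defaultPipelineRoot: s3://rhods-dsp-dev\n"}
	userCM := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "custom-config", Namespace: "test-ns"},
		Data:       want,
	}

	scheme := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(scheme)
	c := fake.NewClientBuilder().WithScheme(scheme).WithObjects(userCM).Build()

	got, err := copyCustomKfpLauncherData(context.Background(), c, "custom-config", "test-ns")
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	if !reflect.DeepEqual(got, want) {
		t.Errorf("kfp-launcher data = %v, want %v", got, want)
	}
}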

@gregsheremeta
Contributor

moving to #681

/close

@gregsheremeta
Contributor

/close

Contributor

openshift-ci bot commented Aug 5, 2024

@gregsheremeta: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci bot closed this Aug 5, 2024
@gregsheremeta gregsheremeta reopened this Aug 5, 2024
@gregsheremeta
Contributor

/close

@openshift-ci openshift-ci bot closed this Aug 5, 2024
Contributor

openshift-ci bot commented Aug 5, 2024

@gregsheremeta: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
