
Remove External Connection Secrets from operator responsibilities #118

Merged
13 commits merged into opendatahub-io:main on Nov 8, 2023

Conversation

@gmfrasca (Member) commented May 15, 2023

Resolves #151


Description

DSPO should not be responsible for maintaining the connection secrets for an external database or object store. Remove these secrets from the maintained artifacts list.

How Has This Been Tested?

For all tests, assume the DSPA is named sample.

There are essentially 9 test cases/DSPAs here:

**Default Values Test**

  • both Default MariaDB and Minio options [expected: will create ds-pipeline-db-sample and ds-pipeline-s3-sample secrets]

**Object Storage Tests**

  • DSPA with spec.objectStorage.minio.s3CredentialsSecret specified but the secret specified in secretName does not exist yet [expected: DSPO will create this secret and generate random credentials for the accesskey and secretkey]
  • DSPA with spec.objectStorage.minio.s3CredentialsSecret specified and the secret specified in secretName already exists [expected: DSPO will create Minio pod but reuse values found in this secret]
  • DSPA with spec.objectStorage.externalStorage.s3CredentialsSecret specified but the secret specified in secretName does not exist yet [expected: DSPO will not spin up any pods, and will return an error]
  • DSPA with spec.objectStorage.externalStorage.s3CredentialsSecret specified and the secret specified in secretName already exists [expected: DSPO will spin up a DSPA and successfully connect to the External S3 Storage] (see the sketch below)
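
For illustration, a minimal sketch of the external object storage case above. Only spec.objectStorage.externalStorage.s3CredentialsSecret and its secretName field are taken from this PR; the other externalStorage fields (host, bucket, scheme, accessKey, secretKey) and every value shown are assumptions, and custom-s3-secret is a hypothetical secret name:

apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: sample
spec:
  objectStorage:
    externalStorage:
      host: s3.example.com            # assumed field/value: external S3 endpoint
      bucket: sample-bucket           # assumed field/value: target bucket
      scheme: https                   # assumed field/value: http or https
      s3CredentialsSecret:
        secretName: custom-s3-secret  # pre-existing secret; with this change DSPO does not create or manage it
        accessKey: accesskey          # assumed field: key within the secret that holds the access key
        secretKey: secretkey          # assumed field: key within the secret that holds the secret key

If the referenced secret does not exist, the expectation above is that DSPO returns an error instead of generating one.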

**Database Tests**

  • DSPA with spec.database.mariaDB.passwordSecret specified but the secret specified in name does not exist yet [expected: DSPO will create this secret and generate random credentials for the password]
  • DSPA with spec.database.mariaDB.passwordSecret specified and the secret specified in name already exists [expected: DSPO will create MariaDB pod but reuse the password found in this secret]
  • DSPA with spec.database.externalDB.passwordSecret specified but the secret specified in name does not exist yet [expected: DSPO will not spin up any pods, and will return an error]
  • DSPA with spec.database.externalDB.passwordSecret specified and the secret specified in name already exists [expected: DSPO will spin up a DSPA and successfully connect to the External Database] (see the sketch below)
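
Likewise, a minimal sketch of the external database case above. Only spec.database.externalDB.passwordSecret and its name/key fields appear in this PR; the remaining externalDB fields (host, port, username, pipelineDBName) and every value shown are assumptions, and custom-db-secret is a hypothetical secret name:

apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: sample
spec:
  database:
    externalDB:
      host: mariadb.example.com    # assumed field/value: external database host
      port: "3306"                 # assumed field/value
      username: mlpipeline         # assumed field/value
      pipelineDBName: mlpipelines  # assumed field/value
      passwordSecret:
        name: custom-db-secret     # pre-existing secret; with this change DSPO does not create or manage it
        key: password              # key within the secret that holds the database password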

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

@gmfrasca changed the title from "WIP: Remove External Connection Secrets from operator responsibilities" to "Remove External Connection Secrets from operator responsibilities" on May 23, 2023
@gmfrasca (Member, Author) commented:

Secrets used for testing:

ds-pipeline-db-sample:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: ds-pipeline-db-sample
data:
  password: {{ base64-encoded Database password }}

mlpipeline-minio-artifact:

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: mlpipeline-minio-artifact
data:
  accesskey: {{ base64-encoded S3 Access Key }}
  host: {{ base64-encoded S3 Endpoint URL }}
  port: {{ base64-encoded S3 Endpoint Port }}
  secretkey: {{ base64-encoded S3 Secret Key }}
  secure: {{ base64-encoded boolean true/false (true if https, false if http) }}

@HumairAK (Contributor) left a comment:

At the moment this will not work for the DB, because the deployment for the apiserver (and KFP UI) will still get the secret it was previously deploying passed in as env vars; the culprit is here.

Also, you may want to run through this code now that we are no longer deploying our own secret, as it may need to reflect that change.

@HumairAK (Contributor) commented Jun 6, 2023

Similarly for this, you will likely need to update this.

@dsp-developers (Contributor) commented:

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-118


Review comments on controllers/dspipeline_params.go (resolved).
@dsp-developers (Contributor) commented:

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-118

@HumairAK (Contributor) commented:

Currently this doesn't work for S3 storage because we are not passing OBJECTSTORECONFIG_CREDENTIALSSECRET to the API server, so I think you'll need to do that. Right now it's still expecting the default secret, so pipelines are failing with Error: secret "mlpipeline-minio-artifact" not found.

opendatahub-io/data-science-pipelines@8d49200#diff-47d6ab6bca869e0559465170316707ceda42f65103bd0d4e93519294305e62d1R50
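
A hypothetical sketch of what passing that variable through the API server Deployment could look like; only OBJECTSTORECONFIG_CREDENTIALSSECRET comes from the comment above, while the container name, the value, and the surrounding structure are assumptions for illustration:

containers:
  - name: ds-pipeline-api-server        # assumed container name
    env:
      # hypothetical wiring: the value names the user-provided S3 credentials secret
      - name: OBJECTSTORECONFIG_CREDENTIALSSECRET
        value: custom-s3-secret         # hypothetical secret name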

@dsp-developers (Contributor) commented:

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-118


@HumairAK (Contributor) commented Oct 4, 2023

I have pretty thoroughly tested this and logged the outcomes:
Check marks indicate the expected outcome was met; unchecked items indicate it was not.

I'm not too concerned about the default cases; they seem more like edge cases.


Verification check list:

Standard Test suite (STS):

  • Simple DSPA deployed successfully
  • Run Iris Pipeline Successfully
  • Check S3 storage for artifacts passing/storage
  • Retrieve completed run (db test)

For the "DSPA + Secret at the same time" deploy cases, the Secrets are listed after the DSPA in the same YAML file.

sample.yaml
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: case3
  namespace: dspa
spec:
  apiServer:
    image: quay.io/opendatahub/ds-pipelines-api-server@sha256:650f196f212664c5d4bd9a293fcd5fd0dca40e1d5c3af67f57b2e69a436070b2
  database:
    mariaDB:
      passwordSecret:
        key: case3dbpsw
        name: case3-dbsecret
  objectStorage:
    minio:
      image: 'quay.io/opendatahub/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance'
  mlpipelineUI:
    image: 'quay.io/opendatahub/odh-ml-pipelines-frontend-container:beta-ui'
---
kind: Secret
apiVersion: v1
metadata:
  name: case3-dbsecret
stringData:
  case3dbpsw: 
type: Opaque

Default Connections

  1. Deploy without any secrets specified
    • Case: Standard Test suite
    • Case: All secrets cleaned up on DSPA delete
  2. Deploy custom DB secret specified, Secret first, then DSPA
    • Case: Standard Test suite
    • Case: DB secret NOT cleaned up on DSPA delete
    • Case: DB Secret is specified in DSPA, but not deployed, should throw error
  3. Deploy custom DB secret specified, DSPA + Secret at the same time
    • Case: Standard Test suite
  4. Deploy custom S3 secret specified, Secret first, then DSPA
    • Constraints: s3 secret name != mlpipeline-minio-artifact (see the secret sketch after this list)
    • Case: Standard Test suite
    • Case: S3 secret NOT cleaned up on DSPA delete
    • Case: S3 Secret is specified in DSPA, but not deployed, should throw error
  5. Deploy custom S3 secret specified, DSPA + Secret at the same time
    • Constraints: s3 secret name != mlpipeline-minio-artifact
    • Case: Standard Test suite
  6. Deploy custom S3 AND default DB secret specified, Secret first, then DSPA
    • Constraints: s3 secret name != mlpipeline-minio-artifact
    • Case: Standard Test suite
    • Case: S3 secrets NOT cleaned up on DSPA delete
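
For the custom S3 secret cases above (constrained to a name other than mlpipeline-minio-artifact), a minimal secret sketch that mirrors the mlpipeline-minio-artifact example earlier in this PR; the name custom-minio-secret and all values are illustrative assumptions:

kind: Secret
apiVersion: v1
metadata:
  name: custom-minio-secret                    # deliberately not mlpipeline-minio-artifact
stringData:
  accesskey: exampleAccessKey                  # illustrative value
  secretkey: exampleSecretKey                  # illustrative value
  host: minio-sample.dspa.svc.cluster.local    # illustrative value
  port: "9000"                                 # illustrative value
  secure: "false"                              # true if https, false if http
type: Opaque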

External Connections

  1. Deploy using S3/DB external with custom secrets, Secret first, then DSPA

    • Constraints: s3 secret name != mlpipeline-minio-artifact
    • Case: Standard Test suite
    • Case: Secrets NOT cleaned up on DSPA delete
  2. Deploy using S3/DB external with custom secrets, DSPA + Secret at the same time

    • Constraints: s3 secret name != mlpipeline-minio-artifact
    • Case: Standard Test suite
    • Case: Secrets NOT cleaned up on DSPA delete

    Note: this case results in the DSPO operator blocking on a thread for a couple of minutes;
    it then deploys fine.

    2023-10-04T19:26:19Z	DEBUG	Unable to retrieve secret [dbsecret].
    2023-10-04T19:26:19Z	INFO	Encountered error when parsing CR: [DB Password from secret [dbsecret] for key [customkey] was not successfully retrieved, ensure that the secret with this key exist.]	{"namespace": "dspa", "dspa_name": "case2"}
    2023-10-04T19:26:19Z	DEBUG	DataSciencePipelinesApplication Reconciler called.	{"namespace": "dspa", "dspa_name": "case2"}
    2023-10-04T19:26:19Z	DEBUG	Unable to retrieve secret [dbsecret].
    2023-10-04T19:26:19Z	INFO	Encountered error when parsing CR: [DB Password from secret [dbsecret] for key [customkey] was not successfully retrieved, ensure that the secret with this key exist.]	{"namespace": "dspa", "dspa_name": "case2"}
    2023-10-04T19:28:19Z	DEBUG	DataSciencePipelinesApplication Reconciler called.	{"namespace": "dspa", "dspa_name": "case2"}
    
  3. Deploy using S3/DB external with no secrets

    • Case: Simple DSPA fails to deploy

    Note: this does block a thread for ~2 min, but with a max concurrency count > 1 this is fine,
    unless n DSPAs are blocked and the max concurrency count == n.

Upgrade testing:

Steps (with default MariaDB):

  • Install ODH (v1.8)
  • Install DSPA Kfdef with ODH-Dashboard & DS Pipelines (main) (see the KfDef sketch after this list)
  • Create Data Connection + Pipeline Server
  • Upgrade Kfdef to leverage custom secrets change in DSPO
  • Confirm Pipelines run successfully
  • Reboot API Server to confirm connection still works (repull secrets into pod)
  • Confirm Pipelines run successfully
  • Confirm artifact storage works in iris example
  • Confirm Runs are stored in db
  • Deleted old minio artifact secret
  • Restarted API Server pod
  • Confirm Runs still work, artifacts are stored in s3
  • Confirm deleting pipeline server does not delete data connection secret
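
The "Install DSPA Kfdef" step (used in both upgrade flows) would use a KfDef along these lines; this is a hedged sketch only, and the namespace, application names, manifest paths, and repo URI below are assumptions rather than the exact manifests used in this test:

apiVersion: kfdef.apps.kubeflow.org/v1
kind: KfDef
metadata:
  name: odh-dsp                 # assumed name
  namespace: opendatahub        # assumed namespace
spec:
  applications:
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: odh-dashboard                     # assumed path
      name: odh-dashboard
    - kustomizeConfig:
        repoRef:
          name: manifests
          path: data-science-pipelines-operator   # assumed path
      name: data-science-pipelines-operator
  repos:
    - name: manifests
      uri: https://github.com/opendatahub-io/odh-manifests/tarball/master   # assumed repo/branch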

Steps (with External DB):

  • Install ODH (v1.8)
  • Install DSPA Kfdef with ODH-Dashboard & DS Pipelines (main)
  • Create Data Connection + Pipeline Server
  • Upgrade Kfdef to leverage custom secrets change in DSPO
  • Confirm Pipelines run successfully
  • Reboot API Server to confirm connection still works (repull secrets into pod)
  • Confirm Pipelines run successfully
  • Confirm artifact storage works in iris example
  • Confirm Runs are stored in db
  • Deleted old minio artifact secret
  • Restarted API Server pod
  • Confirm Runs still work, artifacts are stored in s3
  • Confirm deleting pipeline server does not delete data connection secret
  • Confirm deleting pipeline server does not delete db secret (created by dashboard)

Note: When doing this via the ODH Dashboard, the secret will be deleted, but it is the Dashboard that deletes it, not DSPO; you can confirm this by deleting only the DSPA CR rather than deleting via the Dashboard.

@HumairAK (Contributor) commented Oct 5, 2023

We want this check:

 Confirm deleting pipeline server does not delete db secret (created by dashboard)

@gmfrasca I did not confirm the above for the MariaDB case. I'm not sure if ODH Dashboard creates any secrets for the default MariaDB deployment; please confirm this, and if the Dashboard does indeed use its own secret, then we need to handle these cases too:

Case: DB secret NOT cleaned up on DSPA delete
Case: DB Secret is specified in DSPA, but not deployed, should throw error (not crash)

@dsp-developers (Contributor) commented:

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-118


@HumairAK (Contributor) commented Nov 8, 2023

Retested all cases and confirmed upgrades with the ODH v2 operator. The DSPA seamlessly switches to using the secret created via the ODH Dashboard; it does not clean up the mlpipeline-minio-artifact secret, but that secret is no longer used.

/lgtm
/approve

@openshift-ci bot added the lgtm label Nov 8, 2023
@openshift-ci bot (Contributor) commented Nov 8, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HumairAK

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label Nov 8, 2023
@openshift-merge-bot bot merged commit 40a8930 into opendatahub-io:main Nov 8, 2023
5 checks passed
Labels: approved, lgtm, qe/verify (labels to inform QE on issues to verify)

Successfully merging this pull request may close these issues:

  • DSPO should not create its own secret for object storage / data connection

4 participants