
(feat): add tls to api server #656

Merged: 2 commits into opendatahub-io:main on Jul 17, 2024

Conversation

@HumairAK (Contributor) commented May 27, 2024

The issue resolved by this Pull Request:

Resolves: https://issues.redhat.com/browse/RHOAIENG-4968

Depends on: opendatahub-io/data-science-pipelines#40

Description of your changes:

Adds a new component-wide TLS enablement field to the DSPA: PodToPodTLS. It is enabled by default and ensures that the API server serves over TLS. This also updates the Persistence Agent (PA) and KFP UI, as well as the OAuth proxy in the API server pod, to recognize the CA bundle.
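For illustration, a minimal DSPA snippet with the new field might look like the sketch below (the field placement and default are inferred from this PR's description, not copied from the final CRD):

```yaml
# Illustrative sketch only; exact field placement may differ in the final API.
apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: sample
spec:
  dspVersion: v2
  podToPodTLS: true   # enabled by default; set to false to disable pod-to-pod TLS
```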

One potentially controversial decision here is having the OAuth proxy sidecar communicate with the API server via the service DNS name instead of localhost. The reason is that the certificates used to serve TLS for the API server come from the OpenShift service signer, and those certs only list the service name, not localhost, as a subject. My assumption is that this shouldn't matter: kube-proxy and related network optimizations should keep intra-pod traffic from leaving the pod's worker node.
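To see why localhost fails hostname verification against a service-signer cert, here is a self-contained sketch (all names are made up; the real certs come from the OpenShift service signer) that generates a throwaway cert with a service-style SAN and inspects it:

```shell
# Generate a throwaway self-signed cert whose only SANs are service DNS names,
# mimicking an OpenShift service-serving cert (names here are hypothetical).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/tls.key -out /tmp/tls.crt \
  -subj "/CN=ds-pipeline-sample.dspa.svc" \
  -addext "subjectAltName=DNS:ds-pipeline-sample.dspa.svc,DNS:ds-pipeline-sample.dspa.svc.cluster.local"

# Inspect the SANs: "localhost" is absent, so a client dialing
# https://localhost:8888 cannot verify this cert, while a client
# dialing the service DNS name can.
openssl x509 -in /tmp/tls.crt -noout -ext subjectAltName
```

This mirrors the `curl: (60) SSL: no alternative certificate subject name matches target host name 'localhost'` error seen later in this thread.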

Another thing to note: the PA deployment is updated to recognize certs from two paths now, and because we use RHEL containers, the RHEL default SSL cert path is also recognized here.

Other changes are to the health check, allowing it to poll based on the transfer protocol.

Testing instructions

  1. Deploy DSPO from this PR
  2. Deploy DSPA using images built from here https://github.com/opendatahub-io/data-science-pipelines/pull/40/files
  3. Confirm API Server pod logs show TLS serving messages:
"TLS cert key/pair loaded"
...
"Starting RPC server (TLS enabled)"
"Starting Https Proxy "

When podToPodTLS: false, these log lines omit TLS mentions. Note that when podToPodTLS is not specified in the DSPA, HTTPS and TLS gRPC are enabled by default.

  4. Port-forward the API server and curl this route; confirm you see self-signed cert warnings:
dspaNamespace=dspa # fill this out
dspaName=pr-40       # fill this out
oc port-forward -n ${dspaNamespace} service/ds-pipeline-${dspaName} 8888:8888
curl --location --request GET 'https://localhost:8888/apis/v2beta1/runs' --header "Authorization: Bearer $(oc whoami --show-token)"

Note the https scheme. If you attempt to use http, you should get an error response:

curl --location --request GET 'http://localhost:8888/apis/v2beta1/runs' --header "Authorization: Bearer $(oc whoami --show-token)"

You should see "Client sent an HTTP request to an HTTPS server."
This confirms that the API server container is now behind a TLS cert.
Try again with TLS verification disabled:

curl --insecure --location --request GET 'https://localhost:8888/apis/v2beta1/runs' --header "Authorization: Bearer $(oc whoami --show-token)"

This should return {} if you have no runs in this DSP.

  5. Provide the signer cert pulled from the kube-root-ca.crt ConfigMap to curl via --cacert and try again; attempt to list runs via the route and confirm this works:
cd $(mktemp -d)

# the following command requires yq; if you don't have it, just copy the value from the kube-root-ca.crt ConfigMap in the DSPA namespace
oc -n ${dspaNamespace} get configmap kube-root-ca.crt  -o yaml | yq '.data."ca.crt"' > kube-root-ca.crt 
DS_ROUTE=https://$(oc get routes -n ${dspaNamespace} ds-pipeline-${dspaName} --template={{.spec.host}})
curl --cacert kube-root-ca.crt --location --request GET ${DS_ROUTE}/apis/v2beta1/runs --header "Authorization: Bearer $(oc whoami --show-token)"
  6. Inspect the kfp-ui; it should have successfully connected to the server
  7. Verify the persistence agent works by starting a run and confirming that the run's status updates (e.g. a green check mark)
  8. Now try deploying a DSPA with podToPodTLS: false and confirm DSP works as before

Note that in the integration tests podToPodTLS is disabled, because we rely on the OpenShift service signing certs, which are not available in a KinD environment.

Checklist

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

@openshift-ci openshift-ci bot requested review from hbelmiro and rimolive May 27, 2024 21:45
@dsp-developers (Contributor)

A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-656
An OCP cluster where you are logged in as cluster admin is required.

To use this image run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/656/head
git checkout -b pullrequest 70dba66fba5a22f6c2848554c523d84e071dd0e3
oc new-project opendatahub
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-656"

More instructions here on how to deploy and test a Data Science Pipelines Application.

@dsp-developers (Contributor)

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-656

@HumairAK HumairAK changed the title Rhoaieng 4968 WIP: Rhoaieng 4968 May 27, 2024
@HumairAK HumairAK changed the title WIP: Rhoaieng 4968 WIP: Feat add tls to api server May 27, 2024
@HumairAK HumairAK changed the title WIP: Feat add tls to api server Feat add tls to api server May 28, 2024
@HumairAK HumairAK changed the title Feat add tls to api server (Feat): add tls to api server May 29, 2024
@HumairAK HumairAK changed the title (Feat): add tls to api server (feat): add tls to api server May 29, 2024
@VaniHaripriya (Contributor)

Deployed DSPO and created a DSPA using the YAML file below.

apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: sample
spec:
  dspVersion: v2
  apiServer:
    image: "quay.io/opendatahub/ds-pipelines-api-server:pr-40"
    argoDriverImage: "quay.io/opendatahub/ds-pipelines-driver:pr-40"
    argoLauncherImage: "quay.io/opendatahub/ds-pipelines-launcher:pr-40"
    deploy: true
    enableSamplePipeline: true
  persistenceAgent:
    image: "quay.io/opendatahub/ds-pipelines-persistenceagent:pr-40"
  scheduledWorkflow:
    image: "quay.io/opendatahub/ds-pipelines-scheduledworkflow:pr-40"
  mlmd:  
    deploy: true  # Optional component
    grpc:
      image: "quay.io/opendatahub/mlmd-grpc-server:latest"
    envoy:
      image: "registry.redhat.io/openshift-service-mesh/proxyv2-rhel8:2.3.9-2"
  mlpipelineUI:
    deploy: true  # Optional component 
    image: "quay.io/opendatahub/ds-pipelines-frontend:pr-40"
  objectStorage:
    minio:
      deploy: true
      image: 'quay.io/opendatahub/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance'

Created a pipeline run, and it failed with the error below:

F0606 14:37:55.090582 17 main.go:79] KFP driver: driver.Container(pipelineName=iris-training-pipeline, runID=5e551b24-e239-459a-9b2c-a647b3e15fc0, task="create-dataset", component="comp-create-dataset", dagExecutionID=2, componentSpec) failed: failure while getting executionCache: failed to list tasks: rpc error: code = Unavailable desc = connection error: desc = "error reading server preface: read tcp 10.128.3.8:49910->172.30.11.199:8887: read: connection reset by peer"
time="2024-06-06T14:37:55.936Z" level=info msg="sub-process exited" argo=true error="<nil>"

Other tests:

$ curl --cacert openshift-service-ca.crt --location --request GET https://localhost:8888/apis/v2beta1/runs --header "Authorization: Bearer $(oc whoami -t)"
curl: (60) SSL: no alternative certificate subject name matches target host name 'localhost'

@HumairAK (Contributor, Author) commented Jun 6, 2024

@VaniHaripriya (Contributor)

Tested as per the instructions; the first seven steps worked as expected. The pipeline run fails with the attached error.
iris-training-pipeline-ctq6p-2740947217-main.log

@gregsheremeta (Contributor) commented Jul 15, 2024

going to be adding some more changes on how we mount cabundles, as this is needed for driver/launcher pods for when they build client connections to apiserver

don't see this anywhere yet -- just making sure I'm not missing it

edit: it might be ConfigureCABundle, but I'm not seeing how that bundle makes it into driver and launcher

edit2: ok yep I see the calls to ConfigureCABundle for driver and executor in both container.go and dag.go

@HumairAK (Contributor, Author)

don't see this anywhere yet -- just making sure I'm not missing it

Sorry, I didn't follow up; I found that this was in fact not needed, so the comment is irrelevant.

@gregsheremeta (Contributor)

/lgtm

I can tag this with approve once @VaniHaripriya verifies it.

@HumairAK (Contributor, Author)

sorry didn't follow up, I found that this was in fact not needed so the comment is irrelevant

I was wrong about being wrong here; I've added a new commit that resolves this issue. To sum up: we need to add the service-ca CA bundle to our custom CA bundle (which gets mounted to the API server, driver, and launcher). We get this from another ConfigMap; the relevant code additions are in params here: ffdd9ad#diff-5fd0ead87d572dec71ec0194647f49e295c220aa3798b993d75e3b491f64a17a
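The mounting pattern described above can be sketched as follows (the ConfigMap name and mount path here are hypothetical, not the operator's actual manifests):

```yaml
# Illustrative sketch only: a ConfigMap carrying the combined CA bundle
# (custom CAs plus service-ca) mounted into a consumer container such as
# the API server, driver, or launcher.
apiVersion: v1
kind: Pod
metadata:
  name: example-apiserver
spec:
  containers:
    - name: apiserver
      image: example/apiserver:latest
      volumeMounts:
        - name: ca-bundle
          mountPath: /dsp-custom-certs   # path is hypothetical
          readOnly: true
  volumes:
    - name: ca-bundle
      configMap:
        name: dsp-trusted-ca             # name is hypothetical
```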

the rest of the changes in the commit are additional test cases.

@VaniHaripriya (Contributor)

Tested both scenarios (podToPodTLS: false and podToPodTLS: true); everything works as expected.

@VaniHaripriya VaniHaripriya added the verified (For deploying/executing changes and verifying results) label Jul 17, 2024
* add openshift ingress cabundle to pa/apiserver/ui
* add ui tls enabled kfp server client

Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
@gregsheremeta (Contributor)

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Jul 17, 2024
@openshift-ci bot commented Jul 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gregsheremeta

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@HumairAK HumairAK merged commit 174be7f into opendatahub-io:main Jul 17, 2024
5 of 6 checks passed