Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: disables features not needed for Istio deployments #96

Merged

Conversation

bartoszmajsak
Copy link

@bartoszmajsak bartoszmajsak commented Apr 17, 2023

Description

In order to handle the creation of Notebooks that should belong to the service mesh we have to ensure this controller does not create unnecessary resources. For this purpose, when opendatahub.io/service-mesh annotation is present and set to true on the Notebook resource it will make sure that:

  • no OAuth Proxy related resources are created
  • no routes are created

It will fail the admission of a notebook pod for the injection of proxy sidecars if annotation opendatahub.io/service-mesh is present and set to true.

I made a few minor adjustments in the test code as I needed to re-use several resources, such as NetworkPolicy.

How Has This Been Tested?

  • make test to run integration tests
  • e2e tests can be executed using the following commands:
export K8S_NAMESPACE=opendatahub
kubectl create ns $K8S_NAMESPACE
make deploy-service-mesh 
make deploy-with-mesh -e IMG=quay.io/maistra-dev/odh-notebook-controller -e TAG=enable_ossm
make e2e-test-service-mesh
make undeploy-with-mesh undeploy-service-mesh

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@openshift-ci
Copy link

openshift-ci bot commented Apr 17, 2023

Hi @bartoszmajsak. Thanks for your PR.

I'm waiting for a opendatahub-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bartoszmajsak bartoszmajsak changed the title feat: disables features conflicting with Istio feat: disables features not need for Istio deployments Apr 17, 2023
@bartoszmajsak bartoszmajsak marked this pull request as ready for review April 19, 2023 12:47
@openshift-ci openshift-ci bot requested review from harshad16 and LaVLaS April 19, 2023 12:47
@bartoszmajsak
Copy link
Author

@harshad16 @LaVLaS could you please take a look when you find some time? Many thanks!

@VaishnaviHire
Copy link
Member

@bartoszmajsak Can we update the e2e tests to capture the changes included in this PR?

@bartoszmajsak
Copy link
Author

@bartoszmajsak Can we update the e2e tests to capture the changes included in this PR?

Thanks for looking at it @VaishnaviHire. I will update e2e tests as well. Can we ok-to-test in the meanwhile so I can see if there's anything else I overlooked?

@VaishnaviHire
Copy link
Member

/ok-to-test

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for testing purposes - not sure if that's the place it should be put. If not I can move it somewhere else and adjust the build process.

maistra.io/gateway-name: odh-gateway
maistra.io/gateway-namespace: odh-notebook-controller-system
spec:
host: opendatahub.apps-crc.testing
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is replaced on the fly when executing make setup-service-mesh target.

protocol: HTTPS
tls:
mode: SIMPLE
credentialName: odh-dashboard-cert
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the overlays here are used mainly for testing purposes I can adjust tests to be using http if desired.

.PHONY: e2e-test
e2e-test: ## Run e2e tests for the controller
go test ./e2e/ -run ^TestE2ENotebookController -v --nb-namespace=${K8S_NAMESPACE} ${E2E_TEST_FLAGS}
e2e-test: e2e-test-oauth ## Run e2e tests for the controller with oauth proxy
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to keep existing target behavior - to run only tests scenarios where oauth-proxy approach is defined. Happy to adjust based on your needs for CI. There's already a script in place to run the service mesh scenario https://github.com/opendatahub-io/kubeflow/pull/96/files#diff-13d60fc682972fcc720de0f06e1826afc91ba40af5d5c8bd987b370d44d0a332

// Verify if the configmap has key ISTIO_GATEWAY with value 'kubeflow/kubeflow-gateway'
require.EqualValuesf(t, "kubeflow/kubeflow-gateway",
configmap.Data["ISTIO_GATEWAY"], "error getting ISTIO_GATEWAY in the configmap")
break
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to suggest here checking only for the config map existence, as its values can differ by scenario. If the config map is misconfigured the test will fail so that should be enough for end-to-end verification in my eyes.

@bartoszmajsak
Copy link
Author

@VaishnaviHire I added e2e tests, please let me know if anything else is pending on my end.

In order to handle creation of Notebooks which should belong to the service mesh we have to ensure this controller does not create unnecessary resources. For this purpose, when opendatahub.io/service-mesh annotation is present on the Notebook resource it will make sure that:

- no OAuth Proxy related resources are created
- no routes are created

It will fail admission of a notebook pod for the injection of proxy sidecars if annotation opendatahub.io/service-mesh is present and set to true as well.

I made few minor adjustements in the test code as I needed to re-use
several resources, such as NetworkPolicy.
One reason being IDEs such as GoLand have troubles recongizing private funcs, but it is mainly due to the fact that it is a purely test code
@VaishnaviHire
Copy link
Member

VaishnaviHire commented Jun 14, 2023

@bartoszmajsak Overall code changes look good to me. On the local test run I am seeing following error when creating Notebook instance

Error creating Notebook resource thoth-minimal-service-mesh-notebook: Internal error occurred: failed calling webhook "notebooks.opendatahub.io": failed to call webhook: Post "https://odh-notebook-controller-webhook-service.opendatahub.svc:443/mutate-notebook-v1?timeout=10s": context deadline exceeded, trying again

Did you come across this while testing?

Note that I am running the tests in the opendatahub namespaces

@bartoszmajsak
Copy link
Author

bartoszmajsak commented Jun 14, 2023

@bartoszmajsak Overall code changes look good to me. On the local test run I am seeing following error when creating Notebook instance

Error creating Notebook resource thoth-minimal-service-mesh-notebook: Internal error occurred: failed calling webhook "notebooks.opendatahub.io": failed to call webhook: Post "https://odh-notebook-controller-webhook-service.opendatahub.svc:443/mutate-notebook-v1?timeout=10s": context deadline exceeded, trying again

Did you come across this while testing?

Note that I am running the tests in the opendatahub namespaces

@VaishnaviHire Leaving aside the fact that Makefile is not propagating the ns correctly between the steps (it's using default odh-notebook-controller-system ns regardless) there might be one related problem. I ran those tests on CRC and I noticed that sometimes there was an issue with istio-cni which was leading to the notebook pod stuck in ContainerCreating state. The solution was to simply restart istio-cni pod - see https://access.redhat.com/solutions/6995845. You can check if you see cni-related events in the openshift console for example.

I will fix the ns propagation.

@bartoszmajsak
Copy link
Author

bartoszmajsak commented Jun 15, 2023

@VaishnaviHire I prepared a similar bash script for ci tests - https://github.com/opendatahub-io/kubeflow/pull/96/files#diff-13d60fc682972fcc720de0f06e1826afc91ba40af5d5c8bd987b370d44d0a332

Should this be incorporated as part of this work?

@bartoszmajsak
Copy link
Author

@VaishnaviHire @harshad16 Could someone please give it another look?

@shalberd
Copy link

shalberd commented Jun 23, 2023

that sometimes there was an issue with istio-cni which was leading to the notebook pod stuck in ContainerCreating state

Yeah, Istio can be a bit finicky :-)

@bartoszmajsak excellent work, and a big requirement for i.e. using an external auth provider like IBMAppID in conjunction with Istio / ServiceMesh, as well as for support for evaluating RBAC via the kubeflow profiles controller medium-term. Psyched to see your contribution. Wielkie dzię and merci vielmal; best regards from Lake Lauerz.

@bartoszmajsak bartoszmajsak changed the title feat: disables features not need for Istio deployments feat: disables features not needed for Istio deployments Jun 23, 2023
@bartoszmajsak
Copy link
Author

@bartoszmajsak excellent work, and a big requirement for i.e. using an external auth provider like IBMAppID in conjunction with Istio / ServiceMesh, as well as for support for evaluating RBAC via the kubeflow profiles controller medium-term. Psyched to see your contribution. Wielkie dzię and merci vielmal; best regards from Lake Lauerz.

Thanks @shalberd! Also for pointing out the profile controller - I will take a closer look to see how we can leverage it. Lake Lauerz is great - I must return there for SUP this summer!

@shalberd
Copy link

shalberd commented Jun 27, 2023

@bartoszmajsak if you ever want to connect with IBM's Sebastian Lehrig @lehrig, he's got heaps of experience making Kubeflow, including RBAC with Kubeflow Profiles controller, work on Openshift (within a PowerPC initiative). @raffaelespazzoli worked in detail on it, being the detective on Openshift

@lehrig
Copy link

lehrig commented Jun 27, 2023

thanks for the flowers @shalberd - kudos particularly goes to Raffaele who did lots of the ground work: https://cloud.redhat.com/blog/operationalizing-kubeflow-in-openshift

Copy link
Member

@harshad16 harshad16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.
i had a little trouble installing , when tried to do re-install
however i see that was due to my cluster.

/lgtm
/approve

thanks 👍

@openshift-ci
Copy link

openshift-ci bot commented Jul 5, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harshad16

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Jul 5, 2023
@openshift-merge-robot openshift-merge-robot merged commit 92739d6 into opendatahub-io:v1.6-branch Jul 5, 2023
@bartoszmajsak bartoszmajsak deleted the enable_ossm branch July 5, 2023 15:36
@bartoszmajsak
Copy link
Author

i had a little trouble installing , when tried to do re-install
however i see that was due to my cluster.

Thanks for taking care of it @harshad16. If you still have some details about the failures, such as logs, I can have a look and think about how this could be improved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants