refactor, chore: refactor charm to use `Deployment` for workload, also bumps training-operator 1.7->1.8 #167

DnPlas · 2024-06-19T17:03:55Z

This PR merges the changes in the `KF-5692-training-1.8-dev-branch` branch into `main`:

* pin integration test dependencies, refactor constants in tests (#155) (#164)
* refactor: deploy the training-operator with kubernetes resources (#161)
* chore: bump training-operator v1.7 -> v1.8 (#162)
* refactor: apply a workload Service instead of using juju created one (#173)
* tests: skip test_upgrade due to #170 (training-operator cannot be upgraded from 1.7/stable to recent version) (#171)
* build, tests: bump charmed-kubeflow-chisme 0.4.0 -> 0.4.1 (#172)

Fixes #159

* pin integration test dependencies, refactor constants in tests (#155)

  Pins dependencies in the integration tests to their corresponding channels for this release.

  Ref: canonical/bundle-kubeflow#866

  Co-authored-by: Andrew Scribner <ca.scribner+1@gmail.com>

* refactor: deploy the training-operator with kubernetes resources

  This commit changes how the training-operator is deployed: instead of using a sidecar container to run the workload, the charm now applies the Deployment and all the Kubernetes resources that the training-operator controller needs to manage training resources. We are introducing this change in preparation for the upcoming 1.8 version, which adds a hard dependency: the training-operator workload cannot start unless a Kubernetes Secret is mounted in a volume. For more details, please refer to #159.

Build charmed-kubeflow-chisme for requirements-integration.txt. Part of charmed-kubeflow-chisme#104

* tests: skip test_upgrade due to #170

  #170 is affecting the execution of this test, but since the fix has to land in Juju, there is not much we can do at the moment other than skip the test. Part of #170.

* refactor: apply a workload Service instead of using juju created one (#173)

  To avoid inconsistent behaviour, it is preferable to apply and use a Service owned by the charm, so that it can be rendered as the controller needs.

  This commit introduces the following changes:

  * The charm now renders and applies a ValidatingWebhookConfiguration resource for the training-operator CRDs.
  * The charm renders the Service so that it also serves on port 9443 for the webhook service.
  * The oci-image is updated to v1.8 of the training-operator.
  * The training-operator Deployment now has a volume mount for the Secret that the cert-controller uses to generate and rotate certificates for the ValidatingWebhookConfiguration.
  * The training-operator Deployment now takes an argument so that the webhook service uses the training-operator workload's Service instead of the default one.
  * The examples directory is updated with examples from the kubeflow/training-operator v1.8-branch.

  Fixes #159

* feat: relate to dashboard and add documentation link CKF 1.9
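The squash message above lists the resources the charm now owns instead of relying on a sidecar and the Juju-created Service: a Deployment that mounts the webhook-certificate Secret, a Service that also serves on port 9443, and a ValidatingWebhookConfiguration for the training-operator CRDs. As a rough illustration only, the Python sketch below builds such manifests as plain dicts. It is not the charm's code (which is not shown on this page); the Secret name, mount path, labels, namespace, the `--webhook-service-name` flag, the metrics port and the webhook path are all assumptions, labeled as such in the comments.

```python
import json
from typing import Dict, List

# Every name below is an illustrative assumption, not the charm's real value.
APP_NAME = "training-operator"                      # assumed Deployment/Service name
NAMESPACE = "kubeflow"                              # assumed model namespace
WEBHOOK_PORT = 9443                                 # webhook port named in the PR
WEBHOOK_SECRET = "training-operator-webhook-cert"   # hypothetical Secret name
CERT_DIR = "/tmp/k8s-webhook-server/serving-certs"  # hypothetical mount path
LABELS = {"app.kubernetes.io/name": APP_NAME}       # hypothetical pod labels


def render_deployment(image: str) -> Dict:
    """Deployment whose pod mounts the webhook-certificate Secret as a volume."""
    container = {
        "name": APP_NAME,
        "image": image,
        # Hypothetical flag telling the webhook to use the charm-owned Service.
        "args": [f"--webhook-service-name={APP_NAME}"],
        "ports": [{"name": "webhook", "containerPort": WEBHOOK_PORT}],
        "volumeMounts": [{"name": "cert", "mountPath": CERT_DIR, "readOnly": True}],
    }
    pod_spec = {
        "containers": [container],
        # Per the PR, the 1.8 workload does not start without this Secret volume.
        "volumes": [{"name": "cert", "secret": {"secretName": WEBHOOK_SECRET}}],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP_NAME, "namespace": NAMESPACE},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": LABELS},
            "template": {"metadata": {"labels": LABELS}, "spec": pod_spec},
        },
    }


def render_service() -> Dict:
    """Charm-owned Service that selects the Deployment's pods and exposes 9443."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": APP_NAME, "namespace": NAMESPACE},
        "spec": {
            "selector": LABELS,
            "ports": [
                {"name": "monitoring-port", "port": 8080, "targetPort": 8080},  # assumed
                {"name": "webhook-server", "port": WEBHOOK_PORT, "targetPort": WEBHOOK_PORT},
            ],
        },
    }


def render_webhook_config(kinds: List[str]) -> Dict:
    """ValidatingWebhookConfiguration sending CREATE/UPDATE of each kind to the Service.

    caBundle is omitted: the PR leaves certificate handling to the cert-controller.
    """
    webhooks = []
    for kind in kinds:
        lower = kind.lower()
        service = {
            "name": APP_NAME,
            "namespace": NAMESPACE,
            "port": WEBHOOK_PORT,
            "path": f"/validate-kubeflow-org-v1-{lower}",  # assumed path scheme
        }
        webhooks.append({
            "name": f"validation.{lower}.kubeflow.org",
            "admissionReviewVersions": ["v1"],
            "sideEffects": "None",
            "clientConfig": {"service": service},
            "rules": [{
                "apiGroups": ["kubeflow.org"],
                "apiVersions": ["v1"],
                "operations": ["CREATE", "UPDATE"],
                "resources": [f"{lower}s"],
            }],
        })
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingWebhookConfiguration",
        "metadata": {"name": f"validator.{APP_NAME}"},
        "webhooks": webhooks,
    }


if __name__ == "__main__":
    manifests = [
        render_deployment("example.com/training-operator:v1.8"),  # hypothetical image
        render_service(),
        render_webhook_config(["PyTorchJob", "TFJob"]),
    ]
    print(json.dumps(manifests, indent=2))
```

The point of the sketch is the wiring rather than the exact values: the Secret volume name matches the container's `volumeMounts`, the Service selector matches the pod labels, and each webhook's `clientConfig.service` points at that same Service on port 9443.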

NohaIhab

LGTM
I have tested the upgrade path described in #170, which is the one we will have to follow, by:

1. Deploying training-operator 1.7/stable
2. Waiting until it is active, then removing the charm
3. Redeploying with the channel from this PR
4. Running a training job from /examples
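The manual check described in this review can be scripted. Below is a minimal sketch, assuming Juju 3.x (for `juju wait-for`), `kubectl` configured for the same cluster, and that the charm is deployed with `--trust`. The channel under test, the example manifest and the target namespace are not named on this page, so they are hypothetical parameters. By default it runs dry and only prints the commands.

```python
import json
import shlex
import subprocess
import time
from typing import List, Tuple

APP = "training-operator"
# Assumes Juju 3.x, which provides `juju wait-for`.
WAIT_ACTIVE = ["juju", "wait-for", "application", APP, '--query=status=="active"', "--timeout=20m"]

Commands = List[List[str]]


def upgrade_path_commands(new_channel: str, example_manifest: str,
                          namespace: str) -> Tuple[Commands, Commands]:
    """Return (teardown, redeploy) command lists mirroring the manual check."""
    teardown = [
        ["juju", "deploy", APP, "--channel=1.7/stable", "--trust"],
        WAIT_ACTIVE,
        ["juju", "remove-application", APP],
    ]
    redeploy = [
        ["juju", "deploy", APP, f"--channel={new_channel}", "--trust"],
        WAIT_ACTIVE,  # extra wait so the training job is not submitted too early
        ["kubectl", "apply", "-n", namespace, "-f", example_manifest],
    ]
    return teardown, redeploy


def wait_until_removed(app: str, timeout: float = 1200, poll: float = 10) -> None:
    """Poll `juju status` until the application no longer appears in it."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(["juju", "status", "--format=json"],
                             check=True, capture_output=True, text=True).stdout
        if app not in json.loads(out).get("applications", {}):
            return
        time.sleep(poll)
    raise TimeoutError(f"{app} was still present after {timeout}s")


def _step(cmd: List[str], dry_run: bool) -> None:
    """Print one command and execute it unless this is a dry run."""
    print("$", shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


def run(teardown: Commands, redeploy: Commands, dry_run: bool = True) -> None:
    """Run the teardown, wait for the removal to finish, then redeploy."""
    for cmd in teardown:
        _step(cmd, dry_run)
    print(f"# wait until {APP} is gone from `juju status`")
    if not dry_run:
        wait_until_removed(APP)
    for cmd in redeploy:
        _step(cmd, dry_run)
```

For example, `run(*upgrade_path_commands("<channel>", "examples/<job>.yaml", "<namespace>"))` only prints the steps; passing `dry_run=False` executes them. The explicit wait between removal and redeploy matters because `juju remove-application` returns before the application is actually gone, and redeploying under the same name too early would fail.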

DnPlas requested a review from a team as a code owner June 19, 2024 17:03

github-actions bot added Libraries: Out of sync labels Jun 19, 2024

DnPlas force-pushed the KF-5692-training-1.8-dev-branch branch 3 times, most recently from 5bb0371 to 32ecee6 June 19, 2024 17:07

DnPlas mentioned this pull request Jun 26, 2024 in #170: training-operator cannot be upgraded from 1.7/stable to recent version (open)

DnPlas and others added 6 commits July 4, 2024 15:54

build, tests: bump charmed-kubeflow-chisme 0.4.0 -> 0.4.1 (#172)

2b16e45

tests: skip test_upgrade due to #170 (#171)

3e6de53

DnPlas force-pushed the KF-5692-training-1.8-dev-branch branch from 350b34e to 8381b07 July 4, 2024 13:55

DnPlas enabled auto-merge (squash) July 4, 2024 19:51

DnPlas disabled auto-merge July 4, 2024 21:26

DnPlas enabled auto-merge (squash) July 4, 2024 21:27

feat: relate to dashboard and add documentation link CKF 1.9 (#177)

8212eda

NohaIhab approved these changes Jul 9, 2024

DnPlas merged commit d9197c5 into main Jul 9, 2024
7 checks passed

DnPlas deleted the KF-5692-training-1.8-dev-branch branch July 9, 2024 10:01
