
Investigate imago image updating, as well as Semantic Versioning #961

Closed
Jose-Matsuda opened this issue Mar 23, 2022 · 5 comments
Labels: size/M 2-3 days

Comments

@Jose-Matsuda
Contributor

Jose-Matsuda commented Mar 23, 2022

CREATE ASSOCIATED TASKS

Change the pods created by the notebooks to use imagePullPolicy: Always --> so that when we restart the pods, they pull the latest image
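A minimal Python sketch of this change: it builds a JSON Patch that sets imagePullPolicy: Always on one container of a Notebook. It assumes the containers live under /spec/template/spec/containers, the same path used in the kubectl patch example further down this thread. The helper name pull_always_patch is hypothetical.

```python
import json


def pull_always_patch(container_index: int = 0) -> str:
    """Build a JSON Patch (RFC 6902) that sets imagePullPolicy: Always.

    Hypothetical helper; assumes the Notebook CR keeps its containers
    under /spec/template/spec/containers.
    """
    ops = [
        {
            # "add" creates the field if it is missing and replaces it if present
            "op": "add",
            "path": f"/spec/template/spec/containers/{container_index}/imagePullPolicy",
            "value": "Always",
        }
    ]
    return json.dumps(ops)


if __name__ == "__main__":
    # Output is meant for: kubectl patch notebook <name> --type='json' -p='<output>'
    print(pull_always_patch())
```

The pull policy only takes effect when a container starts, so running pods keep their current image until they are restarted.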


Part of #957

This description will change as I gain more information.

Read the Supply Chain Security / Immutable chapter of the Hacking Kubernetes book --> nothing in that chapter is closely related to this task
Investigate "imago sort of image updates, semver"
-- This would be a complete overhaul of aaw-kubeflow-containers

https://github.com/philpep/imago

Compare and contrast this with other approaches (for example, using the configmap as a source of truth)

@Jose-Matsuda Jose-Matsuda added the size/M 2-3 days label Mar 23, 2022
@Jose-Matsuda Jose-Matsuda self-assigned this Mar 23, 2022
@Jose-Matsuda
Contributor Author

Jose-Matsuda commented Mar 23, 2022

Information for Imago

https://github.com/philpep/imago (uses https://github.com/containers/image/tree/v5.0.0/docker)


The setup seems simple: you provide the credentials for the Docker registry (either ACR or Artifactory; I would assume ACR, since it would probably be faster), and then create a ServiceAccount (SA) and CronJob (CJ) in the cluster to let it do its thing. Of course, it does not currently support Notebook objects, but perhaps we want to look into the approach and see what it's like.

I don't think this is too different from what we are planning to do for user workloads. They say:
imago looks for Deployments, DaemonSets, StatefulSet and CronJob configuration, get the latest sha256 digest from registry and update containers specifications to set image to the corresponding registry/image@sha256:... notation. It track the original image specification in the imago-config-spec annotation.
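To make the quoted behaviour concrete, here is a minimal Python sketch of the rewrite step only. Given an image reference and a digest already fetched from the registry (the lookup itself, which imago does through containers/image, is out of scope), it produces the registry/image@sha256:... form and records the original spec. The function names are hypothetical, and storing the raw image string in the imago-config-spec annotation is a simplification; imago's actual annotation format may differ.

```python
from typing import Dict

ANNOTATION = "imago-config-spec"  # annotation name quoted from the imago README


def repository_of(image: str) -> str:
    """Strip any tag or digest from an image reference, keeping the repository."""
    # Drop a digest such as "@sha256:abc..."
    repo = image.split("@", 1)[0]
    # A tag is a ":" after the last "/", so registry ports ("host:5000/img") survive
    if repo.rfind(":") > repo.rfind("/"):
        repo = repo[: repo.rfind(":")]
    return repo


def pin_to_digest(image: str, digest: str, annotations: Dict[str, str]) -> str:
    """Return the image rewritten as repository@digest and remember the original spec."""
    # Record the original only once, so re-pinning keeps the human-readable tag
    annotations.setdefault(ANNOTATION, image)
    return f"{repository_of(image)}@{digest}"
```

Keeping the original tag in an annotation is what lets a later run look the tag up again, even though the container spec itself now only holds a digest.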

This is a tad more complicated than using the configmap as a source of truth. Additionally, we may not always want to update a notebook object to the latest image (say we have an image built from a branch that contains breaking or test changes).

I would also hesitate to use this with our platform or non-workload images, as we usually don't want those on the latest version due to incompatibilities.

@Jose-Matsuda
Contributor Author

Jose-Matsuda commented Mar 24, 2022

My current strategy, from before looking into imago and semantic versioning:

This applies only to the aaw-kubeflow-containers images; other images are out of scope (contrib images we do not control).

I was going to use the jupyter-web-app-config configmap as the source of truth for our most "up to date" image, because it also determines which image users get by default. If users want to be proactive, they can delete their machine and create a new one, so that their task is not affected by the image update.

That said, if the image in the configmap (CM) is also vulnerable, we would not reschedule or update to it anyway, because it would be a waste of time. Of course, this would also push us to update our configmap more quickly after CVE fixes are made (we could also send an email notification for this).
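The decision logic above can be sketched in Python as follows. This is a hypothetical helper, not existing code: it assumes the default image has already been read from jupyter-web-app-config, that the scanner supplies a set of vulnerable image references, and (reading "also vulnerable" literally) that only notebooks whose current image is flagged are candidates. The managed_prefixes parameter is an illustrative stand-in for "is this an aaw-kubeflow-containers image".

```python
from typing import Iterable, Set


def should_update(
    current_image: str,
    default_image: str,
    vulnerable_images: Set[str],
    managed_prefixes: Iterable[str],
) -> bool:
    """Decide whether a notebook should be moved to the configmap's default image."""
    # Only aaw-kubeflow-containers images are in scope; contrib images are skipped
    if not any(current_image.startswith(p) for p in managed_prefixes):
        return False
    # Only notebooks whose current image is flagged as vulnerable are candidates
    if current_image not in vulnerable_images:
        return False
    # Already on the default image: nothing to do
    if current_image == default_image:
        return False
    # If the default is also vulnerable, rescheduling would be a waste of time
    return default_image not in vulnerable_images
```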

I honestly think this is the simplest and best approach, even if we do incorporate semantic versioning, because it ties the update target to what users create by default (which, in their eyes, should be the most 'up to date' image). Any lag there can be solved or lessened if we just update the configmap right away. (That said, I could also probably just use the master tag.)

@Jose-Matsuda
Contributor Author

Jose-Matsuda commented Mar 25, 2022

Semantic Versioning

https://semver.org/

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
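A small Python sketch of the increment rules quoted above, restricted to plain MAJOR.MINOR.PATCH strings (pre-release and build labels are not handled). Bumping a field resets every field to its right, and parsed versions compare field by field as integers, so 1.10.0 sorts after 1.9.9. The function names are illustrative.

```python
import re
from typing import Tuple

SEMVER = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse(version: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into an integer tuple."""
    m = SEMVER.fullmatch(version)
    if m is None:
        raise ValueError(f"not a plain MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in m.groups())
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Increment one field and reset the fields to its right."""
    major, minor, patch = parse(version)
    if part == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # backwards compatible functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```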

Look into how our CI and process would change in GitHub Actions and similar tooling

Thoughts

In the "Tag Images with Real repository" step, we already tag the image with master, so, following Blair's idea in aaw-kubeflow-containers 214, we could rename master to edge (or whatever name is chosen), since the step does this anyway. (This seems to be done only for commits to the main branch; it does not make this tag for other branches (null).)
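A hypothetical sketch of how the tagging step could compute tags if master were renamed to edge and semver tags were added. These are our assumptions, not the current workflow: every build gets its short commit SHA as a tag (as in k8sc/jupyterlab-cpu:78a9b939 later in this thread), the rolling tag is only applied on the main branch, and a release additionally gets long-lived vMAJOR.MINOR and vMAJOR tags.

```python
from typing import List, Optional


def image_tags(short_sha: str, on_main: bool, version: Optional[str] = None,
               rolling_tag: str = "edge") -> List[str]:
    """Compute the tags to push for one build (hypothetical tagging scheme)."""
    tags = [short_sha]  # assumption: every build is tagged with its commit SHA
    if on_main:
        tags.append(rolling_tag)  # would replace today's "master" tag
        if version is not None:
            major, minor, _patch = version.split(".")
            # Full version plus long-lived MINOR and MAJOR tags
            tags += [f"v{version}", f"v{major}.{minor}", f"v{major}"]
    return tags
```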

QUESTIONS

  1. Once we increment to a new MAJOR version, will we continue to support the older one (vulnerability fixes), maybe with something like a one-year EoL? That is, if we move from 1.5.x to 2.0.0, will we still release 1.5.x+1? If so, will we keep these images in the drop-down?
  2. I guess that upon a breaking change, we would also create a new branch for non-edge images, maybe like Kubeflow with a 1.X.Y branch.

Possible solutions (specifically for semantic versioning)
https://github.com/semantic-release/semantic-release with the use of plugins like https://github.com/eclass/semantic-release-docker or https://github.com/esatterwhite/semantic-release-docker
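semantic-release decides the next version from commit messages; its default commit analyzer follows the Angular convention. The sketch below approximates that mapping in plain Python to illustrate what our commit messages would need to look like. It is a simplification, not the plugin's implementation: fix: and perf: give a patch, feat: a minor, a BREAKING CHANGE note a major, and reverts and custom release rules are ignored.

```python
import re
from typing import Iterable, Optional

HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?: ")
RANK = {None: 0, "patch": 1, "minor": 2, "major": 3}


def release_type(messages: Iterable[str]) -> Optional[str]:
    """Return the highest bump implied by a set of commit messages, or None."""
    best = None
    for msg in messages:
        if "BREAKING CHANGE" in msg:
            kind = "major"
        else:
            m = HEADER.match(msg)
            commit_type = m.group("type") if m else None
            kind = {"feat": "minor", "fix": "patch", "perf": "patch"}.get(commit_type)
        # Keep the most significant bump seen so far
        if RANK[kind] > RANK[best]:
            best = kind
    return best
```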


How does this affect my task?

My first thought is that we may have to use the Azure CLI to take full advantage of this, because we cannot place property sets or any other metadata on the remote repository; we can only do so if the image is cached. For the specific task of notebook security scanning, I can't find a good enough reason to implement it, aside from, say, letting users see in the UI which specific version they are on (instead of just a tag).

How about combining the approaches: we still use the configmap as the source of truth, but also keep the semantic tagging? This way users still see something meaningful in the image tags, and I guess we would keep the previous major release in the dropdown (i.e., still in the configmap, so I can keep using it as the source of truth instead of looking through the ACR).
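If the previous major release stays in the dropdown, the configmap's image list could be derived from the released versions. A hypothetical sketch, assuming plain MAJOR.MINOR.PATCH version strings and that we keep the newest release of each of the two most recent MAJOR versions.

```python
from typing import Dict, List, Tuple


def dropdown_versions(released: List[str], majors_to_keep: int = 2) -> List[str]:
    """Pick the newest release of each of the most recent MAJOR versions."""
    latest: Dict[int, Tuple[int, int, int]] = {}
    for v in released:
        major, minor, patch = (int(x) for x in v.split("."))
        key = (major, minor, patch)
        # Tuples compare field by field, matching semver precedence here
        if major not in latest or key > latest[major]:
            latest[major] = key
    keep = sorted(latest, reverse=True)[:majors_to_keep]
    return [".".join(str(x) for x in latest[m]) for m in keep]
```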

@Jose-Matsuda Jose-Matsuda changed the title from "Investigate imago sort of image updates" to "Investigate imago image updating, as well as Semantic Versioning" Mar 31, 2022
@Jose-Matsuda
Contributor Author

Jose-Matsuda commented Mar 31, 2022

Results of technical elaboration (per Brendan's notes, to be expanded on)

Want

V1 (Long-lived tag)

Avoid breaking changes
Don't have to update spawner config very often
Benefits from semver

Imago approach

Auto-rollout bug fixes to all users
Stop supporting old broken (but not insecure) images
Almost never have to patch notebook image tags

With the long-lived tag, images would be updated as bug fixes become available. This is where imago comes into play: it determines whether there is an image to update to (even if the tag stays the same, the image underneath may not).
Note that with the "long-lived" tag, if you try to update and the underlying image is the same, nothing changes (so there is no real effect on the user).
For example:
kubectl patch Notebook testpatch --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"k8sc/jupyterlab-cpu:78a9b939"}]'
notebook.kubeflow.org/testpatch patched (no change)
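kubectl reports "(no change)" when the patched object is identical to the original. The Python below is a minimal sketch of that comparison for the exact patch above: it applies only JSON Patch replace operations to a copy of a hypothetical Notebook dict and compares the result. It does not handle RFC 6901 escaping (~0, ~1) or check that the target exists, so it is an illustration, not a JSON Patch implementation.

```python
import copy
import json
from typing import Any, Dict, List


def apply_replace(obj: Dict[str, Any], patch: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply only 'replace' operations from a JSON Patch to a copy of obj."""
    result = copy.deepcopy(obj)
    for op in patch:
        if op["op"] != "replace":
            raise ValueError("this sketch only handles 'replace'")
        # Split "/a/b/0/c" into segments; list indices arrive as numeric strings
        *parents, leaf = op["path"].lstrip("/").split("/")
        target: Any = result
        for seg in parents:
            target = target[int(seg)] if isinstance(target, list) else target[seg]
        if isinstance(target, list):
            target[int(leaf)] = op["value"]
        else:
            target[leaf] = op["value"]
    return result


if __name__ == "__main__":
    notebook = {"spec": {"template": {"spec": {"containers": [
        {"image": "k8sc/jupyterlab-cpu:78a9b939"}]}}}}
    patch = json.loads('[{"op": "replace", "path": "/spec/template/spec/containers/0/image",'
                       ' "value": "k8sc/jupyterlab-cpu:78a9b939"}]')
    # Same tag in, same object out: prints "no change"
    print("no change" if apply_replace(notebook, patch) == notebook else "patched")
```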

Needs!

Dev registry #292
Conda in $HOME, BUT istio chmod hell (it slows things down, but if we do it on a weekend, users may not care) #285
Imago works on pods/deployments/statefulsets and doesn't care about CRDs --> what actually matters here is the image running on the pod. We don't care about the image on the Notebook or even the StatefulSet, since neither seems to reference the actual image ID, and because the image tag does not change, that is fine.
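Since what matters is the image actually running on the pod, a check can compare the digest the kubelet reports in each pod's status.containerStatuses[].imageID with the digest the long-lived tag currently resolves to in the registry. A minimal sketch; the registry lookup is assumed to happen elsewhere and is passed in as a hypothetical container-name-to-digest mapping.

```python
from typing import Any, Dict, List


def running_digests(pod: Dict[str, Any]) -> Dict[str, str]:
    """Map container name -> sha256 digest the kubelet reports as running."""
    digests = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        image_id = status.get("imageID", "")
        # imageID looks like "docker-pullable://repo@sha256:..." or "repo@sha256:..."
        if "@" in image_id:
            digests[status["name"]] = image_id.rsplit("@", 1)[1]
    return digests


def stale_containers(pod: Dict[str, Any], registry_digest: Dict[str, str]) -> List[str]:
    """Containers whose running digest differs from what the tag now points to."""
    running = running_digests(pod)
    return sorted(name for name, d in running.items()
                  if name in registry_digest and registry_digest[name] != d)
```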

Weekly update? Directly trigger the job for CVE fixes?
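One hypothetical way to combine both options: trigger the job directly when a CVE fix is available, and otherwise run at most weekly, on a weekend (following the note above that users may not care on weekends). A sketch of that policy, not a decided design; the function name and inputs are illustrative.

```python
from datetime import datetime, timedelta
from typing import Optional


def rollout_due(now: datetime, last_run: Optional[datetime], cve_fix_available: bool) -> bool:
    """Hypothetical policy: CVE fixes go out immediately, otherwise weekly on a weekend."""
    if cve_fix_available:
        return True  # directly trigger the job for CVE fixes
    if last_run is not None and now - last_run < timedelta(days=7):
        return False  # already ran within the last week
    return now.weekday() >= 5  # Saturday (5) or Sunday (6)
```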

@Jose-Matsuda
Contributor Author

Closing as we've elaborated; I will update this with any other comments.

The other TODO is to CREATE THE ASSOCIATED TASKS.
