Continuous build of docker images and updating kustomize manifests #450
GCB now has direct integration via GitHub App triggers. So if we install that GitHub App in our project then we can trigger GCB builds in response to PRs. The GCB build could then create K8s resources. This is very similar to how our prow infra works today: we use Prow to trigger Prow jobs which run run_e2e_workflow.py, which in turn submits a bunch of Argo workflows based on prow_config.yaml. We could do something similar but use GCB to invoke run_e2e_workflow.py.
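As a rough sketch of that idea, a cloudbuild.yaml checked into the repo could invoke the script directly; the builder image and the flags passed to run_e2e_workflow.py below are illustrative assumptions, not the exact invocation Prow uses today.

```yaml
# Hedged sketch of a GCB config that a GitHub App trigger could run on each PR.
# Builder image and script flags are assumptions for illustration.
steps:
- name: 'python:3.7'
  entrypoint: 'python'
  args:
  - 'py/kubeflow/testing/run_e2e_workflow.py'
  - '--project=$PROJECT_ID'          # $PROJECT_ID is a built-in GCB substitution
  - '--config_file=prow_config.yaml'
```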
With kubeflow/kubeflow#4029 we have a pretty good POC for CD of the jupyter web app image.
I have created the CI/CD for Kubeflow Applications card in the Engprod project to track this.
@jlewi thanks. I should be able to do some work on this in the next few days.
Design doc is here: bit.ly/kfcd. It looks like it's a bit outdated. It would be good to update it and then socialize our thinking at the community meeting.
@jlewi I don't know if it's just me but that design doc link redirects to http://www.thelaptop-computers.info/2009/11/watauga-county-sheriff%E2%80%99s-office-arrests-two-suspected-burglars-go-blue-ridge/ which is not relevant.
Status Update:
Next steps
@kkasravi I wrote up my current thinking in this doc: PTAL
I had commented on restructuring the PipelineRun to embed a pipelineSpec and resourceSpec rather than a pipelineRef and resourceRefs here: #544 (comment). I'll comment on the doc as well.
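For reference, the embedded form looks roughly like this, assuming the Tekton version in use supports inline specs; all resource and task names here are placeholders:

```yaml
# Rough sketch of a PipelineRun with an embedded pipelineSpec instead of a
# pipelineRef; names are placeholders, not the actual resources in this repo.
apiVersion: tekton.dev/v1alpha1
kind: PipelineRun
metadata:
  name: ci-profile-controller
spec:
  serviceAccountName: kubeflow-bot  # renamed from serviceAccount in Tekton 0.9
  pipelineSpec:                     # embedded, rather than pipelineRef
    tasks:
    - name: build-and-update-manifests
      taskRef:
        name: build-push-update
```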
…cluster (#4568)
* Get rid of the PVC used to pass the image digest file between the build and update-manifests steps
  * Creating a PVC just creates operational complexity
  * We combine the build and update-manifests steps into one task. We can then use /workspace (a pod volume) to pass data like the image digest file between the steps
* Update the PipelineRun to work with version 0.9 of Tekton
  * The field serviceAccount has been renamed serviceAccountName
  * TaskRun no longer supports outputImageDir, so we remove it; we will have to use Tekton to pass the image digest file
* Remove namespace.yaml and secrets.yaml from the kustomize package
  * The secrets should be created out of band and not checked in
    * So the behavior should be to deploy the kustomize package into a namespace that already exists with the appropriate secrets
  * Checking in secrets is confusing
    * If we check in dummy secrets then users will be confused about whether the secrets are valid
    * Furthermore, the file secrets.yaml is an invitation to end up checking real secrets into source control
* Configure some values to use gcr.io/kubeflow-images-public
* Disable the Istio sidecar in the pipelines
* For kaniko we don't need the secret to be named a certain way; we just need to set GOOGLE_APPLICATION_CREDENTIALS to point to the correct value (see the sketch after this list)
  * We change kaniko to use the user-gcp-sa secret that Kubeflow creates
  * We shouldn't need an image pull secret since kubeflow-images-public is public
  * GOOGLE_APPLICATION_CREDENTIALS should be used for pushing images
* Change the name of the secret containing ssh credentials for kubeflow-bot to kubeflow-bot-github-ssh
* rebuild-manifests.sh should use /workspace to get the image digest rather than the PVC
* Simplify rebuild-manifests.sh
  * Tekton will mount the .ssh information in /tekton/home/.ssh, so we just need to create a symbolic link to /root/.ssh
  * The image digest file should be fetched from /workspace and not some PVC
  * Set the GITHUB_TOKEN environment variable using secrets so that we don't need to use kubectl get to fetch it
  * We need to make the clone of kubeflow/manifests a non-shallow clone before we can push changes to the remote repo
* I was able to successfully run the profile controller workflow and create a PR: kubeflow/manifests#669

Next steps:
* This PR only updated the profile controller
* We need to refactor how the PipelineRuns are laid out
  * I think we may want the PipelineRuns to be separate from reused resources like Tasks
* rebuild-manifests.sh should only regenerate tests for changed files
* The created PRs don't satisfy the Kubeflow CLA check

Related to: kubeflow/testing#450
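To make the kaniko point concrete, a minimal sketch of such a Task step follows; the secret key name (user-gcp-sa.json), the image destination, and the paths are assumptions for illustration, not the exact Task from the PR.

```yaml
# Hedged sketch: kaniko authenticates via GOOGLE_APPLICATION_CREDENTIALS, so the
# secret's name doesn't matter, only where it is mounted. Paths are illustrative.
apiVersion: tekton.dev/v1alpha1
kind: Task
metadata:
  name: build-push
spec:
  steps:
  - name: build-and-push
    image: gcr.io/kaniko-project/executor
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /secret/user-gcp-sa.json
    args:
    - --dockerfile=/workspace/src/Dockerfile
    - --context=/workspace/src
    - --destination=gcr.io/kubeflow-images-public/profile-controller:latest
    - --digest-file=/workspace/image-digest  # pass the digest to later steps via the pod volume
    volumeMounts:
    - name: gcp-credentials
      mountPath: /secret
  volumes:
  - name: gcp-credentials
    secret:
      secretName: user-gcp-sa
```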
* The infrastructure for continuously rebuilding our docker images and updating our kustomize manifests has now been generalized; see kubeflow/testing#450 and https://github.com/kubeflow/testing/tree/master/apps-cd
* This is the old code for updating the jupyter web app and is no longer needed.
* Define a v0.8 release
  * Update applications.yaml with a v0.8 release (a hypothetical sketch follows this list)
  * The purpose of this PR is to check that just by defining the appropriate release we can begin building images from release branches and updating the release branch of kubeflow/manifests
  * Related to #450 - Continuous delivery of Kubeflow applications
* Create a python script for opening up the PR; this script replaces the bash script rebuild-manifests.sh that was used previously
  * The new script doesn't assume that the base branch for PRs is master. We need this to support updating release branches.
* Create a profile in skaffold.yaml for running on the release cluster
* Create an image_util package to parse image URLs
* Use the Docker image for apps-cd to run create_manifests_pr.py
  * Add kustomize, go, and some other tools we need
  * In the docker image create a symbolic link for .ssh so we can pick up the ssh credentials created by Tekton
* Define a 1.0 release now that the branches have been cut
  * Related to kubeflow/kubeflow#4685
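The schema of applications.yaml isn't shown in this thread, so purely as an illustration of the idea, a release entry might look something like this; every field name here is hypothetical (the real file lives under apps-cd in kubeflow/testing):

```yaml
# Hypothetical sketch of applications.yaml; field names are illustrative only.
releases:
- name: v0.8
  sourceBranch: v0.8-branch     # branch of the app repo to build images from
  manifestsBranch: v0.8-branch  # branch of kubeflow/manifests that PRs should target
applications:
- name: profile-controller
  repo: kubeflow/kubeflow
  sourceDir: components/profile-controller
  image: gcr.io/kubeflow-images-public/profile-controller
```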
This is working. Several PRs updating the 1.0 applications have been successfully merged. The only remaining thing to do is updating the instance of the release infrastructure in the prod namespace.
Closing this issue. |
We need a good way to continuously build our docker images and then update our kustomize manifests to use the updated images.
This is critical for maintaining velocity. One of the big problems we are seeing with releases is that changes pile up and don't get exercised until we start cutting releases, because we haven't been updating our kustomize manifests.
Also, as the number of applications scales, the toil of building docker images and then updating manifests becomes significant. This is especially true during releases, as we try to rapidly push out fixes.
There is a POC based on the jupyter web app here:
https://github.com/kubeflow/kubeflow/tree/master/releasing/auto-update
We'd like to make it super easy for people to define new workflows to auto-build their application. In an ideal world they would just check in a YAML file with a couple of configurations, e.g. something like the sketch below.
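A hedged illustration of what such a per-application config might contain; every field name here is hypothetical, since the schema was never settled in this issue:

```yaml
# Hypothetical per-application auto-build config; field names are illustrative only.
name: jupyter-web-app
repo: kubeflow/kubeflow
sourceDir: components/jupyter-web-app        # where the Dockerfile lives
image: gcr.io/kubeflow-images-public/jupyter-web-app
manifestsPath: jupyter/jupyter-web-app       # kustomize dir in kubeflow/manifests to update
```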
A couple of things we are still missing:
/cc @scottilee @animeshsingh @kkasravi @jinchihe