Commit f5a03d1: UPSTREAM: <carry>: Add the upstream code rebase document
Signed-off-by: Ricardo M. Oliveira <rmartine@redhat.com>
Committed by rimolive on Aug 27, 2024; adds REBASE.opendatahub.md (145 lines).

# KFP -> DSP Rebase process

This document describes the process to upgrade Data Science Pipelines (DSP) code from a specific Kubeflow Pipelines (KFP) version tag. The following repositories must be updated, in the following order:

- https://github.com/opendatahub-io/data-science-pipelines
- https://github.com/opendatahub-io/argo-workflows
- https://github.com/opendatahub-io/data-science-pipelines-operator

## Checklist

- The new rebase branch has been created from the upstream tag
- The new rebase branch includes relevant carries from target branch
- The upstream tag is pushed to kubeflow/pipelines to ensure that build artifacts are versioned correctly

## Getting Started

### Data Science Pipelines repository

### Preparing the local repo clone

Clone from a personal fork, then add remotes for the upstream (kubeflow) and opendatahub repositories, fetching their branches:

```
git clone git@github.com:<user id>/data-science-pipelines
cd data-science-pipelines
git remote add --fetch kubeflow https://github.com/kubeflow/pipelines
git remote add --fetch opendatahub https://github.com/opendatahub-io/data-science-pipelines
```

### Creating a new local branch for the new rebase

Create a new branch from the target release tag:

```
TAG=2.2.0
git checkout -b rebase-$TAG $TAG
```

Merge the opendatahub/master branch into the `rebase-$TAG` branch with the `ours` merge strategy. This discards all changes from the other branch (opendatahub/master) and creates a merge commit. It leaves the content of your branch unchanged, and when you next merge with the other branch, Git will only consider changes made from this point forward. (Do not confuse this with the `ours` conflict-resolution option of the recursive merge strategy, the `-X ours` option.)

```
git merge -s ours opendatahub/master
```
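The effect of `-s ours` can be seen in a throwaway repository; the branch names below are illustrative stand-ins for `rebase-$TAG` and `opendatahub/master`:

```shell
# Demonstrate 'git merge -s ours' in a scratch repo: the merged branch's
# changes are discarded, and only a merge commit is recorded.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo upstream > file.txt
git add file.txt
git commit -qm "base"
git checkout -qb rebase-demo            # stand-in for rebase-$TAG
git checkout -q main
echo downstream > file.txt
git commit -qam "downstream change"     # stand-in for opendatahub/master
git checkout -q rebase-demo
git merge -q -s ours main -m "merge opendatahub/master with -s ours"
cat file.txt
```

The final `cat` still prints `upstream`: main's change was discarded, although `git log` now records the merge commit.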

### Creating a spreadsheet of carry commits from the previous release

- Given the upstream tag (e.g. v2.2.0) of the most recent rebase and the name of the branch targeted for the rebase (e.g. opendatahub/master), generate a TSV file containing the set of carry commits that need to be considered for picking:

```
# printf interprets \t as a tab; a plain echo would write the backslashes literally
printf 'Comment\tSHA\tAction\tClean\tSummary\tCommit Link\tPR Link\n' > ~/Documents/${TAG}.tsv
git log $( git merge-base opendatahub/master $TAG )..opendatahub/master \
  --ancestry-path --reverse --no-merges \
  --pretty='tformat:%x09%h%x09%x09%x09%s%x09https://github.com/opendatahub-io/data-science-pipelines/commit/%h?w=1' \
  | grep -E $'\t''UPSTREAM: .*'$'\t' \
  | sed -E 's~UPSTREAM: ([0-9]+)(:.*)~UPSTREAM: \1\2\thttps://github.com/kubeflow/pipelines/pull/\1~' \
  >> ~/Documents/$TAG.tsv
```
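The grep/sed stages can be sanity-checked on fabricated log output before running them against the real repository; the SHAs, summaries, and PR number below are made up:

```shell
# Two fake tab-separated log lines in the tformat layout used above:
# a <carry> commit (left untouched) and a numbered commit (gains a PR link).
TAB=$(printf '\t')
out=$(printf '\t1a2b3c4\t\t\tUPSTREAM: <carry>: keep OpenShift tweak\thttps://example.com/commit/1a2b3c4\n\t5d6e7f8\t\t\tUPSTREAM: 1234: backported fix\thttps://example.com/commit/5d6e7f8\n' |
  grep -E "${TAB}UPSTREAM: .*${TAB}" |
  sed -E 's~UPSTREAM: ([0-9]+)(:.*)~UPSTREAM: \1\2\thttps://github.com/kubeflow/pipelines/pull/\1~')
printf '%s\n' "$out"
```

Only the numbered commit gains the trailing PR link; the `<carry>` line passes through unchanged.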

### Picking commits from the previous rebase branch to the new branch

Go through the spreadsheet and for every commit set one of the appropriate actions:

- p, to pick the commit
- s, to squash it (add a comment with the sha of the target)
- d, to drop the commit (if it is not obvious, comment why)

Set up conditional formatting in the Google Sheet to color these rows appropriately.

Commits carried on rebase branches have commit messages prefixed as follows:

- `UPSTREAM: <carry>:`
  - A persistent carry that should probably be picked for the subsequent rebase branch. In general, these commits are used to modify behavior for consistency or compatibility with OpenShift.
- `UPSTREAM: <drop>:`
  - A carry that should probably not be picked for the subsequent rebase branch. In general, these commits are used to maintain the codebase in ways that are branch-specific, like the update of generated files or dependencies.
- `UPSTREAM: 77870:`
  - The number identifies a PR in upstream Kubeflow Pipelines (i.e. https://github.com/kubeflow/pipelines/pull/<pr id>). A commit with this message should only be picked into the subsequent rebase branch if the commits of the referenced PR are not included in the upstream branch. To check whether a given commit is included in the upstream branch, open the referenced upstream PR and check any of its commits for the release tag (e.g. v1.25.0) targeted by the new rebase branch.
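Instead of inspecting the PR in the browser, `git merge-base --is-ancestor` can answer the same question mechanically once you know a SHA from the referenced PR. A scratch-repo sketch (all names illustrative):

```shell
# In a scratch repo: tag the tip, then test whether an older commit
# (standing in for an upstream PR's commit) is contained in the tag.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.email demo@example.com
git config user.name demo
echo a > f.txt; git add f.txt; git commit -qm "fix from upstream PR"
pr_sha=$(git rev-parse HEAD)
echo b > f.txt; git commit -qam "later work"
git tag 2.2.0                      # stand-in for the new rebase tag
if git merge-base --is-ancestor "$pr_sha" 2.2.0; then
  echo "included upstream: drop the carry"
else
  echo "not included: pick the carry"
fi
```

Here the older commit is an ancestor of the tag, so this prints "included upstream: drop the carry".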

With these guidelines in mind, pick the appropriate commits from the previous rebase branch into the new rebase branch. Create a filter view in the spreadsheet showing the rows where Action == p or Action == s, and paste the SHAs into the `git cherry-pick` command. Use `tr '\n' ' ' <<< "<line_separated_commits>"` to turn the newline-separated list from the copy&paste into a space-separated one.
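For example, with a fabricated set of SHAs pasted from the filter view:

```shell
# Newline-separated SHAs, as copied from the spreadsheet (made-up values).
shas='1a2b3c4
5d6e7f8
9a0b1c2'
picks=$(echo "$shas" | tr '\n' ' ')
echo "git cherry-pick $picks"
```

This prints a single `git cherry-pick` line with the SHAs space-separated, ready to run on the rebase branch.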

Where it makes sense to do so, squash carried changes that are tightly coupled to simplify future rebases. If the commit message of a carry does not conform to expectations, feel free to revise and note the change in the spreadsheet row for the commit.

If you pick all of the pick and squash commits first and push them for review, it is easier for you and your reviewers to check the code changes; squash them at the end.

When filling in the Clean column in the spreadsheet, use the following numbers to express the complexity of the pick:

- 0 - clean
- 1 - format fixups
- 2 - code fixups
- 3 - logic changes

Explicit commit rules:

- Anything touching openshift-hack/, OpenShift-specific READMEs or similar files should be squashed into one commit named "UPSTREAM: <carry>: Add OpenShift specific files"
- Updating generated files coming from kubernetes should be a `<drop>` commit
- Generated changes should never be mixed with non-generated changes. If a carry is ever seen to contain generated changes, those changes should be dropped.

### Create the Pull-Request in opendatahub-io/data-science-pipelines repository

Create a PR with the result of the previous tasks, using the following description: `UPSTREAM: <carry>: Rebase code to kfp x.y.z`

## Argo Workflows Repo

If the KFP code you are rebasing uses a newer Argo Workflows version, you must also update the opendatahub-io/argo-workflows repository.

### Preparing the local repo clone

Clone from a personal fork, then add remotes for the upstream (argoproj) and opendatahub repositories, fetching their branches:

```
git clone git@github.com:<user id>/argo-workflows
cd argo-workflows
git remote add --fetch argoproj https://github.com/argoproj/argo-workflows
git remote add --fetch opendatahub https://github.com/opendatahub-io/argo-workflows
```

### Creating a new local branch for the new rebase

Create a new branch from the target release tag:

```
TAG=v3.4.17
git checkout -b argo-upgrade $TAG
```

### Create the Pull-Request in opendatahub-io/argo-workflows repository

Create a PR with the result of the previous tasks with the following description: `Upgrade argo-workflows code to x.y.z`

## Data Science Pipelines Operator repository

### Apply the DataSciencePipelinesApplication CustomResource from the opendatahub-io/data-science-pipelines Pull-Request

With the Pull-Request opened in the opendatahub-io/data-science-pipelines repository, you can get a DataSciencePipelinesApplication (DSPA) CustomResource, with the resulting image builds, from the bot comment in that Pull-Request.

### Fix the Data Science Pipelines Operator code

Check if there are any breaking changes, and fix the code wherever needed.
One obvious change would be the image tag references in the params.env file.
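A sketch of that tag bump, using a made-up params.env; the variable names, registry, and tags here are hypothetical, so check the real file in the data-science-pipelines-operator repository:

```shell
# Hypothetical params.env; the real keys and registries may differ.
cat > params.env <<'EOF'
IMAGES_APISERVER=quay.io/opendatahub/ds-pipelines-api-server:v2.1.0
IMAGES_PERSISTENCEAGENT=quay.io/opendatahub/ds-pipelines-persistenceagent:v2.1.0
EOF
# Point every image reference at the new release tag.
sed -i.bak 's|:v2\.1\.0$|:v2.2.0|' params.env
cat params.env
```

After the edit, every image line ends in the new tag; the `.bak` copy keeps the original for comparison.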

### Create the Pull-Request in opendatahub-io/data-science-pipelines-operator repository

Create a PR with the changes in previous task with the following description: `Prepare to upgrade to the next DSP Release`


## Follow-up work
WIP

## Updating with rebase.sh (experimental)
WIP
