-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore periodic prow job for upgrade tests #5193
Comments
Hello! I run e2e on my machine and try to figure out how the e2e test works. Now I think I am a little familiar with this e2e test. But when I run the e2e test I found some weird things and was not sure if it is a bug. First, there is some unnormal log and I find the related issue #4777 but seems like the bug is not resolved. But I think it is a good chance to fix it and happy to do this. Second I did not find the function apply_resources in the repository tektoncd/pipeline or tektoncd/plumbing, I guess maybe the function is deleted but forget to delete the reference here ? pipeline/test/e2e-tests-upgrade.sh Line 69 in c1d4702
|
I noticed that in the e2e-tests-upgrade.sh the older version is v0.5.2. pipeline/test/e2e-tests-upgrade.sh Line 28 in c1d4702
I think maybe v0.5.2 is too old that is not suitable to be the older version in the upgrade e2e test? I guess maybe no one will upgrade the server to the latest from the oldest version in their production environment? The above are some of my humble opinions, not sure if it is reasonable. 😂 |
I find that the e2e-tests-upgrade.sh script is referenced here as a cron job but I don't find the related running job in tektoncd/pipeline github actions. Could you help show me where I can get the relevant running job? @lbernick Thanks very much! |
Upgrade testing is something we've been discussing with @abayer in the productivity WG. We also need to make sure we have enough visibility on periodic jobs. If an upgrade job is running it has not been maintained or seen by anyone for a long time afaik. We where discussing perhaps we should include it in the release process. |
@yuzp1996 Nice detective work! I agree pipeline v0.5.2 is too old and we should definitely make sure the cron job is 1) running and 2) sending notifications somewhere when it fails |
Looking at our prow dashboard, upgrade tests have been failing with Since these tests appear to be broken anyway, might be a good opportunity to move them to Tekton :) |
I am sorry I don't understand what moving them to Tekton means. Is it something like dogfooding? Another CI system based on Tekton? |
Hi, @lbernick @afrittoli looks like there will be a large change for the e2e upgrade test. I am interested in e2e upgrade test. Is there anything I can do to help now? 😁 |
What I meant here is that instead of having this test run on prow, we can run it on our dogfooding kubernetes cluster via a Tekton TaskRun or PipelineRun. One of our medium-term goals is to move Tekton test infrastructure away from using prow and use Tekton to test Tekton instead.
A great first step would be seeing if you can get the existing scripts working on a k8s cluster! It looks like the tests are currently broken with the error message I've pasted above. Once we have a working script we can get started on adding this as a recurring TaskRun/PipelineRun in our dogfooding cluster. |
OK! I commented the pipeline/test/e2e-tests-upgrade.sh Line 32 in 3c5d33b
Looks like something went wrong in |
I find that Prow job will run the command Then I pull the image to find out if the files Related Issue: Related file: Solution: |
I made a change in tektoncd/plumbing#1175 and maybe it can resolve the problem when running the e2e-upgrade-test.sh. |
If directory ${GOPATH}/src/github.com/google/licenseclassifier exists and is not empty we should skip clone again or will cause an error. Related Issue: tektoncd/pipeline#5193 Signed-off-by: yuzhipeng <yuzp1996@qq.com>
If directory ${GOPATH}/src/github.com/google/licenseclassifier exists and is not empty we should skip clone again or will cause an error. Related Issue: tektoncd/pipeline#5193 Signed-off-by: yuzhipeng <yuzp1996@qq.com>
I noticed our unit tests also clone the license classifier but don't run into the same problem-- check out the first log line here. Also, the upgrade tests use a version of the test-runner that is 4 months older than the version the rest of our CI uses, and uses go 1.16 instead of go 1.18. I wonder if this could be related, but I don't have a way to know what was in those images. @afrittoli @abayer can you comment here? |
If directory ${GOPATH}/src/github.com/google/licenseclassifier exists and is not empty we should skip clone again or will cause an error. Related Issue: tektoncd/pipeline#5193 Signed-off-by: yuzhipeng <yuzp1996@qq.com>
If directory ${GOPATH}/src/github.com/google/licenseclassifier exists and is not empty we should skip clone again or will cause an error. Related Issue: tektoncd/pipeline#5193 Signed-off-by: yuzhipeng <yuzp1996@qq.com>
If directory ${GOPATH}/src/github.com/google/licenseclassifier exists and is not empty we should skip clone again or will cause an error. Related Issue: tektoncd/pipeline#5193 Signed-off-by: yuzhipeng <yuzp1996@qq.com>
@abayer are you still able to update the prow config to use the new image? Hoping that upgrade tests can run successfully after this change, but even if they do I think it's worth making their failures more discoverable (e.g. post on tekton slack). |
I'm pretty sure I already did, lemme check. |
Yup, I did. =) |
It looks like upgrade tests are still failing on the prow dashboard with
which now that I look at it is a different error than before-- so might need some more updates to the image |
This comment updates the current progress:
|
This seems to be the expected behavior of the existing upgrade tests where the resources of configmaps are missing. Thanks to pointers from @lbernick , when testing with a fresh kind cluster without the previous config. Currently status: tried to figure out the reasons why specific tests failed |
Current update:
Previous assumptions:
Approaches to this question:
And my questions are about:
|
/close |
@JeromeJu: You can't close an active issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
commenting for the record, looks like tektoncd/plumbing#1279 is the PR that got upgrade tests working. |
Scenario 2 of the upgrade test aims to install pipeline server from the preivous release, create resources from that version and test on those resources against the client of the current release. This commit restores the apply_resources function to enable the 2nd scenario of the upgrade test work by applying the resources from the old server version of pipeline. It was preivously renamed and removed in tektoncd#2685. It fixes the missing piece where the required resources from the previous release have not been created to accomplish scenario 2 of the upgrade test. part of: tektoncd#5193
Scenario 2 of the upgrade test aims to install pipeline server from the preivous release, create resources from that version and test on those resources against the client of the current release. This commit restores the apply_resources function to enable the 2nd scenario of the upgrade test work by applying the resources from the old server version of pipeline. It was preivously renamed and removed in tektoncd#2685. It fixes the missing piece where the required resources from the previous release have not been created to accomplish scenario 2 of the upgrade test. More specifically, it creates resources in v1 examples that are not under beta or no-ci. part of: tektoncd#5193
Scenario 2 of the upgrade test aims to install pipeline server from the preivous release, create resources from that version and test on those resources against the client of the current release. This commit restores the apply_resources function to enable the 2nd scenario of the upgrade test work by applying the resources from the old server version of pipeline. It was preivously renamed and removed in tektoncd#2685. It fixes the missing piece where the required resources from the previous release have not been created to accomplish scenario 2 of the upgrade test. More specifically, it creates resources in v1 examples that are not under beta or no-ci. part of: tektoncd#5193
Scenario 2 of the upgrade test aims to install pipeline server from the preivous release, create resources from that version and test on those resources against the client of the current release. This commit restores the apply_resources function to enable the 2nd scenario of the upgrade test work by applying the resources from the old server version of pipeline. It was preivously renamed and removed in tektoncd#2685. It fixes the missing piece where the required resources from the previous release have not been created to accomplish scenario 2 of the upgrade test. More specifically, it creates resources in v1 examples that are not under beta or no-ci. part of: tektoncd#5193
Scenario 2 of the upgrade test aims to install pipeline server from the preivous release, create resources from that version and test on those resources against the client of the current release. This commit restores the apply_resources function to enable the 2nd scenario of the upgrade test work by applying the resources from the old server version of pipeline. It was preivously renamed and removed in tektoncd#2685. It fixes the missing piece where the required resources from the previous release have not been created to accomplish scenario 2 of the upgrade test. More specifically, it creates resources in v1 examples that are not under beta or no-ci. part of: tektoncd#5193
Recently, we have had a few issues of regressions that were not detected by CI.
For example, #4834 introduced changes that caused a new client to be incompatible with an old server.
In addition, some breaking changes were introduced where resources already stored on a cluster became invalid when the server version was upgraded. #4779 added new defaults via the mutating webhook, and then #4818 validated the existence of these defaults. However, existing resources didn't receive these defaults, so they failed validation.
We already have some upgrade tests.
These tests have two parts:
We should add a step where we run integration tests on the older server version (before upgrading the server). This will address the first type of regression detailed here. When we do make a breaking change purposefully (after deprecating a field), I would imagine we'd remove tests for this functionality, so I don't think adding this check would stop us from removing deprecated fields.
The second type of regression (existing resources becoming invalid) seems like it should have been caught by the upgrade tests. I'm not sure why these tests didn't catch this issue-- maybe upgrading the server reapplies the webhook defaults.
The text was updated successfully, but these errors were encountered: