[zuul] Update the tempest job to use test_operator #665
Conversation
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/3761837e16fb4d7fa8587259956ddd11 ✔️ nova-operator-content-provider SUCCESS in 55m 56s
recheck
Something broke in Zuul but the job actually passed. However, the allowed and excluded lists are not propagating, so it only ran the default set.
That's my bad; I forgot to update the names of the vars used to pass in the include_list and exclude_list. The update is pushed.
recheck. The Depends-On link was not correct.
Previously, the tempest role from ci_framework ran the tempest container via podman on the controller. test_operator uses the same image and runs tempest in a pod on the OCP cluster.
Depends-On: openstack-k8s-operators/ci-framework#1065
cifmw_tempest_tests_allowed -> cifmw_test_operator_tempest_include_list
cifmw_tempest_tests_skipped -> cifmw_test_operator_tempest_exclude_list
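For context, a minimal sketch of how the renamed variables could be wired into the zuul job definition; the job name and test patterns below are placeholders, only the cifmw_test_operator_tempest_* variable names come from this change:

```yaml
# Hypothetical job definition for illustration only; job name and test
# patterns are placeholders, not taken from this PR.
- job:
    name: nova-operator-tempest-multinode
    vars:
      # formerly cifmw_tempest_tests_allowed
      cifmw_test_operator_tempest_include_list: |
        tempest.api.compute
      # formerly cifmw_tempest_tests_skipped
      cifmw_test_operator_tempest_exclude_list: |
        tempest.api.compute.admin
```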
Branch updated from c991769 to 30bb0e7 (compare)
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/947d0966917a461d91da7308ba39f656 ✔️ nova-operator-content-provider SUCCESS in 2h 32m 43s
rdo-check
check-github
recheck. There are a lot of rabbitmq and mysql connection errors in the failed tempest run which are unrelated to how we ran tempest, so that could explain many of the test failures.
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/f27a2eb76f4447368ea23a662d4b34d8 ✔️ nova-operator-content-provider SUCCESS in 2h 34m 41s
Tempest logs are here now: https://review.rdoproject.org/zuul/build/b6577dec951b4bb3a7f746d47b5fae85/log/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/pods/tempest-tests-6ksc4/logs/tempest-tests-tests-runner.log
The execution was interrupted by the job timeout, but I see a lot of failures in the executed tests that make the execution slower, e.g.:
I think the main reason is too much parallelism. In the current podman-based executor we run tempest in 4 processes.
Ya, we likely need to drop it down to 3-4; I'll check with Emma in the morning and see if she wants us to take over this patch. One downside to the test-operator is that it only copies the full tempest logs to the logs folder if the tempest execution does not time out. Where it runs properly it provides the HTML report and the raw tempest logs in a separate tests log dir, but where waiting for the job to complete fails it does not collect those logs properly.
I'm going to take a look at this ^^. Also, about the concurrency: it would probably be good to drop it down to 4; I'm going to propose a patch. We've encountered similar issues related to it.
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/075656d74dcc4638b0ce17d392ef0008 ✔️ nova-operator-content-provider SUCCESS in 3h 27m 06s
Hi @elfiesmelfie, this should fix the timeout issue -> openstack-k8s-operators/ci-framework#1112. I checked the logs and it seems that the tempest pod finished the execution of the tempest tests but the role kept waiting [1].
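For anyone hitting the same symptom, here is an illustrative Ansible wait loop of the kind involved (not the actual ci-framework role code; the selector and namespace are placeholders). If the completion condition never matches the pod's real terminal state, the task keeps retrying until the job timeout even though tempest has finished:

```yaml
# Illustrative sketch only; not taken from ci-framework.
- name: Wait for the tempest-tests pod to finish
  kubernetes.core.k8s_info:
    kind: Pod
    namespace: openstack
    label_selectors:
      - app=tempest-tests
  register: tempest_pod
  # If this condition checks the wrong field or value, the loop runs until
  # retries are exhausted even though the tempest run already completed.
  until: >-
    tempest_pod.resources | length > 0 and
    tempest_pod.resources[0].status.phase in ['Succeeded', 'Failed']
  retries: 120
  delay: 60
```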
rdo-check
check-github
check-rdo is what I wanted.
Looking at the failed tempest tests, the 500 errors from keystone correspond to mysql exceptions caused by the galera cluster not being writeable. Looking at the mysql pod events, we can see it was restarted at least 4 times: Warning Unhealthy 152m kubelet Startup probe failed: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111). So I don't think the deployed DB was stable and functional when tempest ran.
I can see in the pod logs that nova and keystone were unable to connect to the DB for a protracted period of time. I believe we have a tracker for a similar issue in other CI jobs, so I think the test failures are directly a result of the DB being inaccessible.
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/4d723d57417b42c583616b847a0e40c2 ✔️ nova-operator-content-provider SUCCESS in 2h 14m 39s
Much, much better: only one test failed this time, test_mtu_sized_frames.
This change reduces the concurrency to 4 and extends the tempest timeout to 7200.
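A rough sketch of what that tuning could look like as job vars; the variable names below are assumptions extrapolated from the cifmw_test_operator_tempest_* naming above (check the ci-framework test_operator role for the exact spelling), and only the values 4 and 7200 come from this change:

```yaml
vars:
  # Assumed variable names, modeled on the cifmw_test_operator_tempest_* convention.
  cifmw_test_operator_tempest_concurrency: 4   # drop parallelism to avoid overloading the deployment
  cifmw_test_operator_timeout: 7200            # extend how long the job waits for the tempest pod
```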
Branch updated from 8a81459 to 36f1faa (compare)
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/008a6023b5b04df880b8ec49928d0404 ✔️ nova-operator-content-provider SUCCESS in 2h 15m 21s
These are the lost test cases with this PR:
I'm OK with not running them here.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: elfiesmelfie, gibizer
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
check-rdo
Merged commit 0610489 into openstack-k8s-operators:main
Previously, the tempest role from ci_framework ran the tempest container via podman on the controller.
test_operator uses the same image and runs tempest in a pod on the OCP cluster.
Depends-On: openstack-k8s-operators/openstack-operator#659