[zuul] Update the tempest job to use test_operator #665

Conversation

@elfiesmelfie (Contributor) commented Feb 2, 2024

Previously, the tempest role from ci_framework ran the tempest container via podman on the controller. test_operator uses the same image but runs tempest in a pod on the OCP cluster.

Depends-On: openstack-k8s-operators/openstack-operator#659
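
For orientation, a rough sketch of the kind of resource the test_operator consumes to run tempest in a pod; the field names, image, and values below are illustrative assumptions, not this job's actual configuration:

```yaml
# Hypothetical sketch only: a Tempest custom resource of the sort the
# test_operator reconciles into a tempest pod on the OCP cluster.
# Field names, the image, and the values are assumptions for illustration.
apiVersion: test.openstack.org/v1beta1
kind: Tempest
metadata:
  name: tempest-tests
  namespace: openstack
spec:
  containerImage: quay.io/podified-antelope-centos9/openstack-tempest:current-podified
  tempestRun:
    includeList: |
      tempest.api.compute
```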


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/3761837e16fb4d7fa8587259956ddd11

✔️ nova-operator-content-provider SUCCESS in 55m 56s
✔️ nova-operator-kuttl SUCCESS in 36m 48s
nova-operator-tempest-multinode RETRY_LIMIT in 4s

@gibizer (Contributor) commented Feb 2, 2024

recheck

@SeanMooney (Contributor) commented:

something broke in zuul but the job actually passed.

https://logserver.rdoproject.org/65/665/796bce2a61df96939f273e23905f29684659f55f/github-check/nova-operator-tempest-multinode/39fd880/controller/ci-framework-data/tests/test_operator/stestr_results.html

however the allowed and excluded lists are not propagating, so it only ran the default set

@elfiesmelfie (Contributor, Author) commented:

> however the allowed and excluded lists are not propagating, so it only ran the default set

That's my bad; I forgot to update the names of the vars used to pass in the include_list and exclude_list. The update is pushed.

@elfiesmelfie (Contributor, Author) commented:

recheck

The Depends-On link was not correct.

Previously, the tempest role from ci_framework ran the tempest container
via podman on the controller.
test_operator uses the same image but runs tempest in a pod on the OCP
cluster.

Depends-On: openstack-k8s-operators/ci-framework#1065
cifmw_tempest_tests_allowed -> cifmw_test_operator_tempest_include_list
cifmw_tempest_tests_skipped -> cifmw_test_operator_tempest_exclude_list
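
As a minimal sketch of how the renamed variables could be wired into the job; the .zuul.yaml layout and the example test patterns here are assumptions, only the variable names come from this change:

```yaml
# Illustrative only: the variable names are the renamed ones above;
# the surrounding job layout and the test patterns are assumptions.
- job:
    name: nova-operator-tempest-multinode
    vars:
      cifmw_test_operator_tempest_include_list: |
        tempest.api.compute
      cifmw_test_operator_tempest_exclude_list: |
        tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_mtu_sized_frames
```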

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/947d0966917a461d91da7308ba39f656

✔️ nova-operator-content-provider SUCCESS in 2h 32m 43s
✔️ nova-operator-kuttl SUCCESS in 36m 38s
nova-operator-tempest-multinode FAILURE in 2h 14m 20s

@SeanMooney (Contributor) commented:

rdo-check

@SeanMooney (Contributor) commented:

check-github

@SeanMooney (Contributor) commented:

recheck

There are a lot of rabbitmq and mysql connection errors in the failed tempest run which are unrelated to how we ran tempest, so that could explain many of the test failures.

Review thread on .zuul.yaml (resolved)

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f27a2eb76f4447368ea23a662d4b34d8

✔️ nova-operator-content-provider SUCCESS in 2h 34m 41s
✔️ nova-operator-kuttl SUCCESS in 43m 41s
nova-operator-tempest-multinode FAILURE in 2h 16m 37s

@gibizer (Contributor) commented Feb 5, 2024

Tempest logs are here now: https://review.rdoproject.org/zuul/build/b6577dec951b4bb3a7f746d47b5fae85/log/controller/ci-framework-data/logs/openstack-k8s-operators-openstack-must-gather/namespaces/openstack/pods/tempest-tests-6ksc4/logs/tempest-tests-tests-runner.log

The execution was interrupted by the job timeout. But I see a lot of failures in the executed tests that make the execution slower, e.g.:

{7} tempest.api.compute.security_groups.test_security_group_rules_negative.SecurityGroupRulesNegativeTestJSON.test_create_security_group_rule_with_invalid_port_range [67.534572s] ... FAILED
{6} tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_building_state [98.352015s] ... FAILED

I think the main reason is too much parallelism. In the current podman-based executor we run tempest in 4 processes
https://logserver.rdoproject.org/66/666/d9aa648cb7613bb124fd9d878fe97ab00d77664d/github-check/nova-operator-tempest-multinode/12ca46c/controller/ci-framework-data/tests/tempest/podman_tempest.log
but it seems the test operator uses 8.

@SeanMooney (Contributor) commented:

ya we likely need to drop it down to 3-4

I'll check with Emma in the morning and see if she wants us to take over this patch.

one downside to the test-operator is it only copies the full tempest logs to the logs folder if the tempest execution does not time out.

Where it runs properly it provides the HTML report and the raw tempest logs in a separate tests log dir, but where waiting for the job to complete fails it does not collect those logs properly.

@lpiwowar commented Feb 6, 2024

> one downside to the test-operator is it only copies the full tempest logs to the logs folder if the tempest execution does not time out.

I'm going to take a look at this ^^.

Also, about the concurrency: it would probably be good to drop it down to 4. I'm going to propose a patch; we've encountered similar issues related to it.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/075656d74dcc4638b0ce17d392ef0008

✔️ nova-operator-content-provider SUCCESS in 3h 27m 06s
✔️ nova-operator-kuttl SUCCESS in 36m 38s
nova-operator-tempest-multinode FAILURE in 3h 09m 10s

@SeanMooney (Contributor) commented:

rdo-check

@SeanMooney (Contributor) commented:

check-github

@SeanMooney (Contributor) commented:

check-rdo is what I wanted

@SeanMooney (Contributor) commented:

Looking at the failed tempest tests, the 500 errors from keystone correspond to mysql exceptions caused by the galera cluster not being writeable.

Looking at the mysql pod events we can see it was restarted at least 4 times.

Warning Unhealthy 152m kubelet Startup probe failed: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)
Normal Pulled 151m (x4 over 152m) kubelet Container image "quay.io/podified-antelope-centos9/openstack-mariadb@sha256:095e75c0a028bf5ba83af90882ce1836e00fc198038f776ee1104f6b1232da93" already present on machine
Normal Started 151m (x4 over 152m) kubelet Started container galera
Normal Created 151m (x4 over 152m) kubelet Created container galera
Warning BackOff 151m (x8 over 152m) kubelet Back-off restarting failed container
Warning Unhealthy 57s (x4 over 102m) kubelet Liveness probe failed: command timed out
Warning Unhealthy 57s (x4 over 3m38s) kubelet Readiness probe failed: command timed out

So I don't think the deployed db was stable and functional when tempest ran.

@SeanMooney (Contributor) commented:

I can see in the pod logs that nova and keystone were unable to connect to the db for a protracted period of time. I believe we have a tracker for a similar issue in other ci jobs, so I think the test failures are directly a result of the db being inaccessible.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/4d723d57417b42c583616b847a0e40c2

✔️ nova-operator-content-provider SUCCESS in 2h 14m 39s
nova-operator-kuttl RETRY_LIMIT in 50m 53s
nova-operator-tempest-multinode FAILURE in 1h 56m 25s

@SeanMooney (Contributor) commented:

Much, much better: only one test failed this time, test_mtu_sized_frames, and I'm not sure we configure tempest to run that properly as we have to take into account the lower MTU in CI.

this change reduces the concurrency to 4
and extends the tempest timeout to 7200
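
A hedged sketch of what that change could look like in the job vars; the values come from the commit message above, but the exact variable names are assumptions and may differ in ci-framework:

```yaml
# Illustrative only: the variable names are assumed, not verified against
# ci-framework; the values (4 workers, 7200 s timeout) match the commit
# message above.
- job:
    name: nova-operator-tempest-multinode
    vars:
      cifmw_test_operator_concurrency: 4
      cifmw_test_operator_timeout: 7200
```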

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/008a6023b5b04df880b8ec49928d0404

✔️ nova-operator-content-provider SUCCESS in 2h 15m 21s
nova-operator-kuttl RETRY_LIMIT in 50m 53s
✔️ nova-operator-tempest-multinode SUCCESS in 1h 55m 01s

@gibizer (Contributor) commented Feb 8, 2024

These are the lost test cases with this PR:

473,480d472
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_create_volume_with_invalid_size
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_create_volume_without_passing_size
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_create_volume_with_size_zero
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_delete_invalid_volume_id
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_delete_volume_without_passing_volume_id
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_get_volume_without_passing_volume_id
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_volume_delete_nonexistent_volume_id
< tempest.api.compute.volumes.test_volumes_negative.VolumesNegativeTest.test_volume_get_nonexistent_volume_id
491d482
< tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_mtu_sized_frames

I'm OK with not running them here.

openshift-ci bot commented Feb 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elfiesmelfie, gibizer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the approved label Feb 8, 2024
@SeanMooney (Contributor) commented:

check-rdo

openshift-merge-bot bot merged commit 0610489 into openstack-k8s-operators:main Feb 9, 2024
7 checks passed