Test operator behavior after failover #2012

david-kow · 2019-10-17T12:26:34Z

The operator is a stateless service, so we expect it to be resilient to downtime, but we don't currently test for it. We should at least have a sanity test that checks the operator still behaves correctly when it is restarted at random. This is somewhat related to general chaos testing.

anyasabo · 2020-05-07T14:08:29Z

Related: #709

david-kow · 2020-05-07T15:05:59Z

We should create a separate test pipeline that randomly deletes the operator pod while the entire E2E test suite is running. This way we don't introduce noise into existing pipelines, and we get good coverage of different test scenarios. We should log and preserve the timestamps at which the operator pod is removed so that potential issues can be investigated afterwards.
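
To make the proposal concrete, here is a minimal sketch of such a chaos loop. It is not the implementation that was eventually merged. The function names, the wait bounds, and the namespace and label selector in the usage note are assumptions for illustration, and it assumes `kubectl` is on the PATH and pointed at the test cluster.

```python
import random
import subprocess
import threading
from datetime import datetime, timezone


def kubectl_delete_operator_pod(namespace, selector):
    """Delete the pod(s) matching the selector (assumes kubectl is on PATH)."""
    subprocess.run(
        ["kubectl", "delete", "pod", "-n", namespace, "-l", selector, "--wait=false"],
        check=True,
    )


def chaos_loop(delete_pod, stop, min_wait=60.0, max_wait=300.0, rng=None):
    """Call delete_pod at random intervals until stop is set.

    Returns the list of UTC timestamps (ISO 8601) at which a deletion was
    triggered, so they can be stored with the test artifacts.
    """
    rng = rng or random.Random()
    deletions = []
    # Event.wait returns False on timeout (keep going) and True once stop is set.
    while not stop.wait(rng.uniform(min_wait, max_wait)):
        timestamp = datetime.now(timezone.utc).isoformat()
        delete_pod()
        deletions.append(timestamp)
        print(f"{timestamp} operator pod deleted", flush=True)
    return deletions
```

One way to use it would be to run `chaos_loop` in a background thread for the duration of the E2E suite, passing something like `lambda: kubectl_delete_operator_pod("elastic-system", "control-plane=elastic-operator")` as `delete_pod` (both values are assumed here and should match the actual deployment), then set `stop` once the suite finishes and archive the returned timestamps next to the test logs.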

barkbay · 2020-09-24T07:07:45Z

Fixed by #3706

david-kow added the loe:medium and >test labels Oct 17, 2019

pebrc removed the loe:medium label Apr 27, 2020

barkbay mentioned this issue Aug 14, 2020

Add operator election support #3632

Merged

barkbay self-assigned this Aug 26, 2020

barkbay mentioned this issue Sep 24, 2020

Test operator behavior after failover #3706

Merged

2 tasks

barkbay closed this as completed Sep 24, 2020
