Fix flaky e2e test TestFailoverPlayground/*stop_podinfo_on_eu_cluster
#1684
The FailoverPlayground test in which `podinfo` is stopped on one of the clusters has been flaky. Example run 1 and run 2.

The test deploys a GSLB with the failover strategy on two clusters. The app is then stopped on the EU cluster, and the test sometimes fails because the following line returns an error: `err = instanceEU.WaitForAppIsStopped()`. This function calls a chain of functions that ends in the `WaitForApp` function, which expects the app to have 0 replicas and 0 `DNSEndpoint` targets. In some situations this is the case. However, since the app is still running on the US cluster, in other situations the failover happens very quickly (a good feature of K8GB) and the US targets are already part of the `DNSEndpoint` targets: example 1, example 2.

In the `DNSEndpoint` taken from the links above, the targets are the IP addresses of the US cluster. This is an expected output and should be accepted.

To better understand the proposed solution, consider how the `DNSEndpoint` resource looked before the failover: in addition to the main `playground-failover.cloud.example.com` domain, there was also a `localtargets-playground-failover.cloud.example.com` domain. This `localtargets-*` domain disappears once the app is stopped, which indicates that the controller learnt that the app was scaled to 0 replicas.

The proposed fix is therefore to check the targets of the `localtargets-*` domain instead. This check shows that the K8GB controller behaved as expected, and it does not depend on the synchronization of records between clusters.
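The idea behind the fix can be illustrated with a minimal, self-contained Go sketch. It is not the code changed in this PR: the `Endpoint` type, the `LocalTargetsCleared` function and the IP addresses are hypothetical and only model the `dnsName` and `targets` fields of a `DNSEndpoint` entry. The sketch assumes the stopped state is recognised by the `localtargets-*` entry being absent or having no targets, regardless of what the main domain points to.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified stand-in for one entry of a
// DNSEndpoint resource (only the DNS name and its targets are modelled).
type Endpoint struct {
	DNSName string
	Targets []string
}

// localTargetsPrefix is the prefix of the per-cluster record described above.
const localTargetsPrefix = "localtargets-"

// LocalTargetsCleared reports whether the localtargets-* record for host is
// gone or empty. Targets of the main domain are deliberately ignored, because
// after a fast failover they may already point to the other cluster.
func LocalTargetsCleared(endpoints []Endpoint, host string) bool {
	for _, ep := range endpoints {
		if ep.DNSName == localTargetsPrefix+host && len(ep.Targets) > 0 {
			return false
		}
	}
	return true
}

func main() {
	host := "playground-failover.cloud.example.com"

	// Before the app is stopped: both records exist on the EU cluster.
	// The IP addresses are made up (documentation ranges).
	before := []Endpoint{
		{DNSName: localTargetsPrefix + host, Targets: []string{"192.0.2.10"}},
		{DNSName: host, Targets: []string{"192.0.2.10"}},
	}

	// After the app is stopped and failover already happened: the main
	// domain carries US targets, the localtargets-* record is gone.
	after := []Endpoint{
		{DNSName: host, Targets: []string{"198.51.100.20"}},
	}

	fmt.Println("stopped before scale-down:", LocalTargetsCleared(before, host))
	fmt.Println("stopped after failover:", LocalTargetsCleared(after, host))
}
```

A check that required the main domain to have 0 targets would reject the `after` state, which is the race that made the test flaky; looking only at the `localtargets-*` record accepts it.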