Backport e2e timeout bump (#2391)
* Increase timeout for all Pods to be eventually ready (#2348)

Some of our E2E tests fail because the CheckExpectedPodsEventuallyReady
test reaches its 5min timeout.
Even under good conditions, it takes more than 3 minutes for some rolling
upgrades to be completely applied (e.g.
TestMutationNodeSetReplacementWithChangeBudget).

Depending on external factors (slow Pod scheduling, slow
PersistentVolume binding, etc.), we can easily reach the fixed 5min
timeout.

I propose we increase the timeout to 15min for this particular check.
This is an arbitrary value (unfortunately), but I think we're OK with
the eventually consistent nature of k8s Pod scheduling.

We could make the test smarter (continue waiting if we see there's some
small progress), but we'd still have to pick some arbitrary timeout
values anyway, so let's keep things simple.

* Use a 15min RollingUpgradeTimeout for keystore checks in E2E tests (#2388)

* Use the RollingUpgradeTimeout for keystore checks in E2E tests

Since we added a 30sec preStop wait, rolling upgrades take longer than
before. We recently updated the rolling upgrade timeout to 15 minutes,
but did not do it for the keystore rolling upgrade test which is written
differently.

* fix comment
sebgl authored Jan 9, 2020
1 parent 7da3f19 commit ee64ca2
Showing 3 changed files with 22 additions and 9 deletions.
14 changes: 12 additions & 2 deletions test/e2e/test/elasticsearch/checks_k8s.go
@@ -9,6 +9,7 @@ import (
 	"fmt"
 	"reflect"
 	"sort"
+	"time"

 	esv1 "github.com/elastic/cloud-on-k8s/pkg/apis/elasticsearch/v1"
 	"github.com/elastic/cloud-on-k8s/pkg/controller/common/certificates"
@@ -22,6 +23,15 @@ import (
 	"k8s.io/apimachinery/pkg/types"
 )

+const (
+	// RollingUpgradeTimeout is used for checking a rolling upgrade is complete.
+	// Most tests require less than 5 minutes for all Pods to be running and ready,
+	// but it occasionally takes longer for various reasons (long Pod creation time, long volume binding, etc.).
+	// We use a longer timeout here to not be impacted too much by those external factors, and only fail
+	// if things seem to be stuck.
+	RollingUpgradeTimeout = 15 * time.Minute
+)
+
 func (b Builder) CheckK8sTestSteps(k *test.K8sClient) test.StepList {
 	return test.StepList{
 		CheckCertificateAuthority(b, k),
@@ -229,9 +239,9 @@ func CheckESPassword(b Builder, k *test.K8sClient) test.Step {
 func CheckExpectedPodsEventuallyReady(b Builder, k *test.K8sClient) test.Step {
 	return test.Step{
 		Name: "All expected Pods should eventually be ready",
-		Test: test.Eventually(func() error {
+		Test: test.UntilSuccess(func() error {
 			return checkExpectedPodsReady(b, k)
-		}),
+		}, RollingUpgradeTimeout),
 	}
 }

4 changes: 2 additions & 2 deletions test/e2e/test/elasticsearch/checks_keystore.go
@@ -19,7 +19,7 @@ import (
 func CheckESKeystoreEntries(k *test.K8sClient, b Builder, expectedKeys []string) test.Step {
 	return test.Step{
 		Name: "Elasticsearch secure settings should eventually be set in all nodes keystore",
-		Test: test.Eventually(func() error {
+		Test: test.UntilSuccess(func() error {
 			pods, err := k.GetPods(test.ESPodListOptions(b.Elasticsearch.Namespace, b.Elasticsearch.Name)...)
 			if err != nil {
 				return err
@@ -59,6 +59,6 @@ func CheckESKeystoreEntries(k *test.K8sClient, b Builder, expectedKeys []string)
 			}

 			return nil
-		}),
+		}, RollingUpgradeTimeout),
 	}
 }
13 changes: 8 additions & 5 deletions test/e2e/test/utils.go
@@ -66,16 +66,19 @@ func ExitOnErr(err error) {
 	}
 }

-// Eventually runs the given function until success,
-// with a default timeout
+// Eventually runs the given function until success with a default timeout.
 func Eventually(f func() error) func(*testing.T) {
+	return UntilSuccess(f, ctx.TestTimeout)
+}
+
+// UntilSuccess executes f until it succeeds, or the timeout is reached.
+func UntilSuccess(f func() error, timeout time.Duration) func(*testing.T) {
 	return func(t *testing.T) {
-		defaultTimeout := Ctx().TestTimeout
-		fmt.Printf("Retries (%s timeout): ", defaultTimeout)
+		fmt.Printf("Retries (%s timeout): ", timeout)
 		err := retry.UntilSuccess(func() error {
 			fmt.Print(".") // super modern progress bar 2.0!
 			return f()
-		}, defaultTimeout, DefaultRetryDelay)
+		}, timeout, DefaultRetryDelay)
 		fmt.Println()
 		require.NoError(t, err)
 	}