Bug 1986003: Retry kubeconfig checks, when kube-apiserver is temporarily unavailable #26377
fmt.Sprintf(`oc --kubeconfig "%s" get namespace kube-system`, kubeconfigPath)).Output()
framework.Logf(out)
// retry error when kube-apiserver was temporarily unavailable
retry := strings.Contains(out, "The connection to the server localhost:6443 was refused")
Would you be open to adding a precondition before running the test instead?
Not sure this is the only error we might get; for example, on an IPv6 cluster we might get [::1]:6443: connect: connection refused.
We could check that the cluster is in a stable condition (not progressing, the pods on the same revision) for X min.
Checking progressing might slow this down, but we can add that as a last resort. Retries seem simpler, because theoretically we can even pass the test when a rollout is in progress.
Added check for ipv6, though
wasn't me :) |
framework.Logf("Verifying kubeconfig %q on master %s", master.Name)
out, err := oc.AsAdmin().Run("debug").Args("node/"+master.Name, "--", "chroot", "/host", "/bin/bash", "-euxo", "pipefail", "-c", fmt.Sprintf(`oc --kubeconfig "%s" get namespace kube-system`, kubeconfigPath)).Output()
retry, err := testNode(oc, kubeconfig, master.Name)
if retry {
is one retry enough?
yeah, after consideration I did 2 😉
@aojea updated, ptal |
Force-pushed from a3d8ff8 to db592fb.
or @p0lyn0mial |
retry := strings.Contains(out, "The connection to the server localhost:6443 was refused") ||
strings.Contains(out, "[::1]:6443: connect: connection refused")
Why are the messages so different between protocols? Just curious; with curl they are more consistent:
$ curl localhost:6443
curl: (7) Failed to connect to localhost port 6443: Connection refused
$ curl [::1]:6443
curl: (7) Failed to connect to ::1 port 6443: Connection refused
$ curl 127.0.0.1:6443
curl: (7) Failed to connect to 127.0.0.1 port 6443: Connection refused
lgtm, just a question to double check the "retry error messages" |
@soltysh: This pull request references Bugzilla bug 1986003, which is valid. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Force-pushed from db592fb to dc68551.
I just checked that, and you're right: the message should be identical, only the address will change. It's coming from here: https://github.com/kubernetes/kubernetes/blob/cbb5ea8210596ada1efce7e7a271ca4217ae598e/staging/src/k8s.io/kubectl/pkg/cmd/util/helpers.go#L237-L243, so I've updated the PR accordingly. With:
_, err := net.Dial("tcp", "localhost:6443")
fmt.Println(err)
_, err = net.Dial("tcp6", "[::1]:6443")
fmt.Println(err)
I got:
fmt.Sprintf(`oc --kubeconfig "%s" get namespace kube-system`, kubeconfigPath)).Output()
framework.Logf(out)
// retry error when kube-apiserver was temporarily unavailable, this matches oc error coming from:
// https://github.com/kubernetes/kubernetes/blob/cbb5ea8210596ada1efce7e7a271ca4217ae598e/staging/src/k8s.io/kubectl/pkg/cmd/util/helpers.go#L237-L243
The message you've linked ends with a question mark, but the regex works fine:
https://play.golang.org/p/aBGVR3_3YsQ
Yeah, I didn't want to be that precise 😉
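To illustrate why a deliberately loose pattern is enough here, below is a minimal sketch of matching the refusal message for any server address. The pattern and the name refusedRe are assumptions for illustration and are not the regex from this PR; the sample strings assume the message shape discussed in this thread, including the trailing question.

```go
package main

import (
	"fmt"
	"regexp"
)

// refusedRe (hypothetical) matches the refusal message regardless of the
// address, so localhost:6443 and [::1]:6443 are both covered. It is left
// unanchored, so whatever follows "was refused" is ignored.
var refusedRe = regexp.MustCompile(`The connection to the server \S+ was refused`)

func shouldRetry(out string) bool {
	return refusedRe.MatchString(out)
}

func main() {
	// Sample outputs, assumed for illustration only.
	samples := []string{
		"The connection to the server localhost:6443 was refused - did you specify the right host or port?",
		"The connection to the server [::1]:6443 was refused - did you specify the right host or port?",
		"Error from server (NotFound): namespaces \"kube-system\" not found",
	}
	for _, s := range samples {
		fmt.Println(shouldRetry(s))
	}
	// prints "true", "true", "false"
}
```

Leaving the pattern unanchored trades precision for robustness: the match does not break if the suffix of the message changes, at the cost of accepting any text after "was refused".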
those symbols on regexps 😬
🤣
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aojea, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test e2e-gcp |
/retest-required Please review the full test history for this PR and help us cut down flakes. |
/override ci/prow/e2e-agnostic-cmd |
@soltysh: Overrode contexts on behalf of soltysh: ci/prow/e2e-agnostic-cmd, ci/prow/e2e-aws-single-node In response to this:
@soltysh: Some pull requests linked via external trackers have merged:
The following pull requests linked via external trackers have not merged:
These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with Bugzilla bug 1986003 has not been moved to the MODIFIED state. In response to this:
/assign @p0lyn0mial
since you complained about this to me some time ago, maybe not necessarily about this particular problem, but it's a good starting point 😉 I noticed it fails pretty frequently.