Confusing use of TooManyRequests error for eviction #106286
Comments
/sig api-machinery
/remove-sig api-machinery
@liggitt, are you sure? As far as I've been able to work out, the 429 HTTP response is correct and the PDB error is a red herring. The PDB error we observed doesn't even make sense, given that the number of healthy pods (12) was more than the number needed (11).
(My experience is that I spent a bunch of time digging into possible causes of the PDB error before realizing that I'd been ignoring the HTTP response code - and once I started paying attention to that, everything started to make sense. Of course, you understand this area far better; I just don't want you to fall into the same trap that I did!)
Any message coming back with PDB language in it belongs to sig-apps.
/triage accepted
It looks like eviction has some creative uses of:
kubernetes/pkg/registry/core/pod/storage/eviction.go, lines 294 to 305 (at 786316f)
Also this observed generation mismatch (stale cache?):
kubernetes/pkg/registry/core/pod/storage/eviction.go, lines 419 to 422 (at 786316f)
Violating the disruption budget is more debatable, but I still don't think a 429 error is right:
kubernetes/pkg/registry/core/pod/storage/eviction.go, lines 429 to 433 (at 786316f)
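For readers without the tree checked out, here is a hedged sketch of the pattern those lines point at: a 429 (TooManyRequests) StatusError that carries a DisruptionBudget cause. The helper name and the exact message text below are illustrative, not verbatim from eviction.go.

```go
// Sketch of the eviction 429 pattern: a TooManyRequests StatusError whose
// Details.Causes carry a disruption-budget message. Names and wording are
// illustrative, not copied from eviction.go.
package evictionsketch

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newEvictionTooManyRequests mirrors the shape of the check discussed above:
// the request is rejected with HTTP 429 plus a cause explaining the budget.
func newEvictionTooManyRequests(pdbName string, desiredHealthy, currentHealthy int32) error {
	err := apierrors.NewTooManyRequests(
		"Cannot evict pod as it would violate the pod's disruption budget.", 0)
	if err.ErrStatus.Details == nil {
		err.ErrStatus.Details = &metav1.StatusDetails{}
	}
	err.ErrStatus.Details.Causes = append(err.ErrStatus.Details.Causes, metav1.StatusCause{
		// The real code uses a named policy CauseType constant; a plain string value here.
		Type: metav1.CauseType("DisruptionBudget"),
		Message: fmt.Sprintf("The disruption budget %s needs %d healthy pods and has %d currently",
			pdbName, desiredHealthy, currentHealthy),
	})
	return err
}
```

That combination is why a caller sees both an HTTP 429 and PDB wording in the same response body.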
That was added in https://github.com/kubernetes/kubernetes/pull/94381/files#r507982781, with some discussion there. I think the short answer is that we wanted to drive an automatic retry (which the server returning TooManyRequests with a RetryAfter does), and returning a conflict when the user did not set a precondition on the request did not seem correct.
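To make the intended client behaviour concrete, here is a minimal sketch, assuming a client that treats the 429 as retriable and honours the suggested delay. The evictWithRetry wrapper and the 5-second fallback are my own illustration, not kubectl's or client-go's actual retry logic.

```go
// Sketch of the client-side behaviour the 429 + RetryAfter choice is meant
// to drive: back off and retry instead of treating the eviction as failed.
package evictretry

import (
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// evictWithRetry keeps retrying while the server answers 429, honouring any
// suggested delay; evict is whatever function issues the eviction call.
func evictWithRetry(evict func() error) error {
	for {
		err := evict()
		if err == nil || !apierrors.IsTooManyRequests(err) {
			return err
		}
		delay := 5 * time.Second // fallback when no Retry-After is suggested
		if seconds, ok := apierrors.SuggestsClientDelay(err); ok && seconds > 0 {
			delay = time.Duration(seconds) * time.Second
		}
		fmt.Printf("eviction throttled by PDB, retrying in %s\n", delay)
		time.Sleep(delay)
	}
}
```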
If we put aside the 429 for a moment and focus on the error's cause in question:
source: kubernetes/pkg/registry/core/pod/storage/eviction.go, lines 429 to 433 (at 786316f)
Isn't it weird that this state (needs 11 healthy pods and has 12 currently) is an error? The code returns this error cause when pdb.Status.DisruptionsAllowed == 0. Is this value wrong in this case; should it have been 1? I couldn't find how this is calculated, but I don't know the code that well.
kubernetes/pkg/controller/disruption/disruption.go, lines 944 to 962 (at bb838fd)
If the …
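For context, a minimal sketch (my own paraphrase; the function name is hypothetical, not a copy of the referenced lines) of the arithmetic the disruption controller appears to use. As far as I can tell from eviction.go, the eviction endpoint also decrements pdb.Status.DisruptionsAllowed itself when it admits an eviction, and recently evicted but not-yet-deleted pods are excluded from the healthy count, so between controller resyncs the status can still read CurrentHealthy=12 / DesiredHealthy=11 while DisruptionsAllowed has already reached 0.

```go
// Hypothetical paraphrase of the controller's arithmetic around the
// referenced lines: DisruptionsAllowed is roughly currentHealthy minus
// desiredHealthy, zeroed when negative or when no pods are expected yet.
package disruptionsketch

func computeDisruptionsAllowed(currentHealthy, desiredHealthy, expectedCount int32) int32 {
	disruptionsAllowed := currentHealthy - desiredHealthy
	if expectedCount <= 0 || disruptionsAllowed <= 0 {
		disruptionsAllowed = 0
	}
	return disruptionsAllowed
}
```

By that arithmetic, 12 healthy with 11 needed would give DisruptionsAllowed = 1, which is why a stale status, or a DisruptionsAllowed already decremented by earlier evictions, looks like the more likely explanation for the reported state.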
What happened?
When the Kubernetes API server returns a 429 (throttled), the response may include misleading cause information.
We observed that a pod eviction API call received a response that included both of the following fragments:

and

Note that the message given above is confusing, as there are more currently healthy pods than the required number.

What did you expect to happen?
When a request is rejected due to throttling (429 HTTP response), either …
How can we reproduce it (as minimally and precisely as possible)?
Provoke a 429 HTTP response and inspect the response.
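A hedged reproduction sketch, assuming a pod covered by a PDB whose status already shows DisruptionsAllowed == 0 (for example, minAvailable equal to the current replica count). The pod name, namespace, and kubeconfig path are placeholders; EvictV1 is only available in newer client-go releases (older ones expose the equivalent call with the policy/v1beta1 Eviction type).

```go
// Issue an eviction against a PDB-protected pod and inspect the error:
// the interesting part is a 429 whose Details.Causes carry the
// DisruptionBudget message described in this issue.
package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"path/filepath"

	policyv1 "k8s.io/api/policy/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Placeholder pod protected by a PDB with no disruptions allowed.
	eviction := &policyv1.Eviction{
		ObjectMeta: metav1.ObjectMeta{Name: "my-pod-0", Namespace: "default"},
	}
	err = client.CoreV1().Pods("default").EvictV1(context.TODO(), eviction)
	if err == nil {
		fmt.Println("eviction admitted")
		return
	}

	fmt.Println("HTTP 429 (TooManyRequests)?", apierrors.IsTooManyRequests(err))
	var statusErr *apierrors.StatusError
	if errors.As(err, &statusErr) && statusErr.ErrStatus.Details != nil {
		for _, cause := range statusErr.ErrStatus.Details.Causes {
			fmt.Printf("cause %s: %s\n", cause.Type, cause.Message)
		}
	}
}
```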
Anything else we need to know?
Issue #88535 seems to be similar - it also describes a 429 response where inclusion of a cause resulted in confusion.
Kubernetes version
Version: 1.20.9
Cloud provider
Azure Kubernetes Service
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)