Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributor: Return 529 for ingestion rate limit when serviceOverloadErrorEnabled #6549

Merged
merged 7 commits into from
Nov 14, 2023

Conversation

ying-jeanne
Copy link
Contributor

@ying-jeanne ying-jeanne commented Nov 2, 2023

This has been overlooked. The flag was introduced in this PR. We also need to implement a similar approach for the ingestion rate limit to prevent data loss when rate limiting occurs.

Comment is equally added into doc, since with load test, I have observed that returning 529 instead of 429 could result in enhancing thundering herd. See test result

Metrics data accumulates for 5 minutes with 1000 agents, after which the network is enabled to transmit data to Mimir. This results in a 429 error code, impacting the request rate on the gateway:
Screenshot 2023-11-10 at 00 28 53

with 529 error code, request rate on gateway:
Screenshot 2023-11-10 at 00 29 05

What this PR does

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated
  • na Documentation added
  • na CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@ying-jeanne ying-jeanne changed the title Distributor: return also 529 for ingestion rate limit when serviceOve… Distributor: return also 529 for ingestion rate limit when serviceOverloadErrorEnabled Nov 2, 2023
@ying-jeanne ying-jeanne changed the title Distributor: return also 529 for ingestion rate limit when serviceOverloadErrorEnabled Distributor: return 529 for ingestion rate limit when serviceOverloadErrorEnabled Nov 2, 2023
@ying-jeanne ying-jeanne changed the title Distributor: return 529 for ingestion rate limit when serviceOverloadErrorEnabled Distributor: Return 529 for ingestion rate limit when serviceOverloadErrorEnabled Nov 2, 2023
@ying-jeanne ying-jeanne marked this pull request as ready for review November 2, 2023 21:09
@ying-jeanne ying-jeanne requested a review from a team as a code owner November 2, 2023 21:09
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change deserves a changelog entry.

Copy link
Contributor

@aknuds1 aknuds1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree w/ Peter there should be a changelog entry. Should this be backported to v2.10 since it fixes #5752, which is in that release?

@aknuds1 aknuds1 added bug Something isn't working component/distributor labels Nov 3, 2023
@pstibrany
Copy link
Member

Should this be backported to v2.10 since it fixes #5752, which is in that release?

This is experimental feature, even we don't use it anywhere internally yet.

@ying-jeanne
Copy link
Contributor Author

@aknuds1 @pstibrany It is important to give it whatever its deserved. PR updated, could you give it another look. thanks

pkg/util/validation/limits.go Outdated Show resolved Hide resolved
@ying-jeanne ying-jeanne enabled auto-merge (squash) November 14, 2023 14:14
@ying-jeanne ying-jeanne merged commit bb5ef32 into main Nov 14, 2023
28 checks passed
@ying-jeanne ying-jeanne deleted the retry-on-rate-limit branch November 14, 2023 14:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working component/distributor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants