-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Distributor: Add limits for Otel write request byte size #8574
Conversation
40c6a2e
to
2f4d3d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comments. I did a quick review since I saw some fundamental problems. Please ping me for another review when my comments have been addressed.
d8183b2
to
6951714
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating these docs!
8534200
to
51e14cd
Compare
51e14cd
to
2c9e58f
Compare
|
||
How it **works**: | ||
|
||
- The distributor implements an upper limit on the message size of incoming OTel write requests after decompression regardless of the compression type. Refer to [OTLP collector compression details](https://github.com/open-telemetry/opentelemetry-collector/blob/main/config/configgrpc/README.md#client-configuration) for more information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Referenced documentation seems irrelevant here, because it points to gRPC client config, but Mimir doesn't accept OTLP/gRPC requests.
I don't think we need to mention "compression" at all in this runbook. It doesn't do any harm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, gracais
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated link to confighttp
pkg/distributor/distributor.go
Outdated
@@ -237,6 +240,7 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet, logger log.Logger) { | |||
cfg.RetryConfig.RegisterFlags(f) | |||
|
|||
f.IntVar(&cfg.MaxRecvMsgSize, "distributor.max-recv-msg-size", 100<<20, "Max message size in bytes that the distributors will accept for incoming push requests to the remote write API. If exceeded, the request will be rejected.") | |||
f.IntVar(&cfg.MaxOTLPRequestSize, maxOTLPRequestSizeFlag, 80<<20, "Maximum OTLP request size in bytes that the distributors accept. Requests exceeding this limit are rejected.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How was the 80 MiB selected? Do we have some data about size of the request and processing time?
Intuitively 80 MiB still feels very large. With previous 100 MiB we sometimes see translation to Mimir request taking up to 10 seconds, so perhaps better value would be somewhere close to 10 MiB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned in the description of the PR, it is tentative. Since 413 error is not retryable, we need to be cautious. if 100MiB/10s, how about 2s of response time ~ 20 MiB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we need to be careful when rolling this out. It may be safer to use original value of 100 MiB for now, so that this doesn't break existing setups, and then we can experiment with correct value in our setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me revert my reverted commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing my feedback. I left few more minor comments.
ff229ed
to
59c7075
Compare
59c7075
to
1599188
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Reviewing again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with a couple of suggested clarifications wrt. compression.
Thank you! @pstibrany @aknuds1 @tacole02. Let's merge and now set it lower along the time and reject some requests🤞 |
b83a721
to
b101031
Compare
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
b101031
to
7e92bcc
Compare
What this PR does
Which issue(s) this PR fixes or relates to
The current distributor message limit is 100MB See doc, Which demonstrates too much for compressed Otel write request. This could cause problem for otel, end up in big decompressed message and timeout. Knowing that otlp collector could have different compression algorithm and also can also be configured to be no compression. See doc, introduce new uncompressed request byte size in order to reject abusive write request as early as possible.
Why not using existing
distributor.max-recv-msg-size
?About default value
Adding runbook dedicated to the error
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.