-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
services/horizon/internal: Add limit on number of horizon requests in flight #5244
services/horizon/internal: Add limit on number of horizon requests in flight #5244
Conversation
services/horizon/internal/flags.go
Outdated
@@ -68,7 +68,8 @@ const ( | |||
// StellarTestnet is a constant representing the Stellar test network | |||
StellarTestnet = "testnet" | |||
|
|||
defaultMaxHTTPRequestSize = uint(200 * 1024) | |||
defaultMaxConcurrentRequests = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what the number was at when we OOMed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was a little over 3000 when the instance OOMed as can be observed in this graph
ConfigKey: &config.MaxConcurrentRequests, | ||
OptType: types.Uint, | ||
FlagDefault: defaultMaxConcurrentRequests, | ||
Usage: "sets the limit on the maximum number of concurrent http requests, default is 1000, to disable the limit set to 0. " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just for non-breaking backwards compatibility, should the default be 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the default limit is really high that it should never be observed during normal usage. so, although it is technically a breaking change, I think it would be better to have this default so we can protect against OOMs by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we get ~2k/rps on the whole cluster, so yeah 1k/machine should be way more than plenty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, left one suggestion for consideration on default value
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
Add limit on number of horizon requests in flight.
Why
We observed a case where a staging horizon node received too many requests which caused it to OOM. See https://stellarfoundation.slack.com/archives/C02B04RMK/p1710326856362359?thread_ts=1710235095.516249&cid=C02B04RMK for more details.
Known limitations
[N/A]