Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

services/horizon/internal: Add limit on number of horizon requests in flight #5244

Merged
merged 2 commits into from
Mar 14, 2024

Conversation

tamirms
Copy link
Contributor

@tamirms tamirms commented Mar 13, 2024

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Add limit on number of horizon requests in flight.

Why

We observed a case where a staging horizon node received too many requests which caused it to OOM. See https://stellarfoundation.slack.com/archives/C02B04RMK/p1710326856362359?thread_ts=1710235095.516249&cid=C02B04RMK for more details.

Known limitations

[N/A]

@tamirms tamirms requested a review from a team March 13, 2024 19:50
@@ -68,7 +68,8 @@ const (
// StellarTestnet is a constant representing the Stellar test network
StellarTestnet = "testnet"

defaultMaxHTTPRequestSize = uint(200 * 1024)
defaultMaxConcurrentRequests = 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what the number was at when we OOMed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was a little over 3000 when the instance OOMed as can be observed in this graph

ConfigKey: &config.MaxConcurrentRequests,
OptType: types.Uint,
FlagDefault: defaultMaxConcurrentRequests,
Usage: "sets the limit on the maximum number of concurrent http requests, default is 1000, to disable the limit set to 0. " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for non-breaking backwards compatibility, should the default be 0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default limit is really high that it should never be observed during normal usage. so, although it is technically a breaking change, I think it would be better to have this default so we can protect against OOMs by default

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we get ~2k/rps on the whole cluster, so yeah 1k/machine should be way more than plenty

Copy link
Contributor

@sreuland sreuland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, left one suggestion for consideration on default value

@tamirms tamirms merged commit 98afa8d into stellar:release-horizon-v2.29.0 Mar 14, 2024
29 checks passed
@tamirms tamirms deleted the throttle-limit branch March 14, 2024 11:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants