[Security Solutions][Detection Engine] Increases pre-packaged socket timeout and chunks the requests #94531

FrankHassanabad · 2021-03-12T22:54:19Z

Summary

Increases the socket timeout for pre-packaged rules and chunks the requests. Existing e2e tests should cover the changes. Interestingly enough, when the server sends back a 408, Chrome re-sends the same request, which can saturate sockets and the network. With the increased timeout, Chrome does not re-send the request on timeout.

Right now, as far as I know, there is no way to increase the connection timeouts for the alerting framework/saved objects. That would be an additional safety measure on top of chunking the requests. Chunked requests limit the number of concurrent requests and ensure that the pre-packaged rule updates do not exhaust ephemeral ports.
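
As a rough illustration of why chunking caps concurrency, here is a minimal, self-contained sketch. It is not the code from this PR: `chunkArray`, `runInChunks`, and `measurePeakConcurrency` are hypothetical names, and the fake task stands in for a real rule update. Each group is started only after the previous group has fully settled, so the number of in-flight tasks never exceeds the chunk size.

```typescript
// Hypothetical helper: split `items` into consecutive groups of at most `size`.
function chunkArray<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run `task` for every item, starting a new group only after the previous
// group has completed. At most `size` tasks are in flight at any moment.
async function runInChunks<T, R>(
  items: readonly T[],
  size: number,
  task: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (const group of chunkArray(items, size)) {
    const groupResults = await Promise.all(group.map((item) => task(item)));
    results.push(...groupResults);
  }
  return results;
}

// Demo only: a fake "update" that records how many calls overlap in time.
async function measurePeakConcurrency(total: number, size: number): Promise<number> {
  let inFlight = 0;
  let peak = 0;
  const fakeUpdate = async (id: number): Promise<number> => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 1));
    inFlight -= 1;
    return id;
  };
  const ids = Array.from({ length: total }, (_, i) => i);
  await runInChunks(ids, size, fakeUpdate);
  return peak;
}
```

With 120 items and a chunk size of 50, this runs three groups (50, 50, and 20 items) and never has more than 50 tasks in flight. The trade-off is that one slow task holds up the start of the next group.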

See this issue discussed in the links below:
sindresorhus/ky#233
https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0
https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20

Manual testing
You can bump up the rule version numbers manually through a search and replace and then install them. You can add a console.trace() to the backend and slow down the requests to ensure they are not happening more than once.

Trace: 
    at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11)
    at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:93:5)
    at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27
    at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30)
    at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11)
    at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28)
    at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20)
    at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20)
    at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32)
    at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9)
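
The manual check described above can be approximated with a small wrapper. This is only a sketch under assumptions: `withTraceAndDelay` is a hypothetical helper, not part of Kibana, and the wrapped function stands in for whatever backend call you want to observe. It prints a stack trace on every call, counts the calls, and adds an artificial delay so that a client-side retry would show up as a second trace.

```typescript
// Hypothetical wrapper for manual testing: logs a stack trace on every call,
// counts calls, and delays the wrapped function to simulate a slow backend.
function withTraceAndDelay<A extends unknown[], R>(
  label: string,
  delayMs: number,
  fn: (...args: A) => Promise<R>
): { wrapped: (...args: A) => Promise<R>; callCount: () => number } {
  let calls = 0;
  const wrapped = async (...args: A): Promise<R> => {
    calls += 1;
    // A second trace for the same user action suggests the request was re-sent.
    console.trace(`${label} call #${calls}`);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    return fn(...args);
  };
  return { wrapped, callCount: () => calls };
}
```

After triggering the action once, `callCount()` should still be 1 when the delayed call finishes. A higher count points to the request being issued more than once.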

Checklist

Delete any items that are not applicable to this PR.

Unit or functional tests were updated or added to match the most common scenarios


kibanamachine · 2021-03-13T00:54:30Z

💚 Build Succeeded

Metrics [docs]

✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @FrankHassanabad

madirey

LGTM! Thanks!!


kibanamachine · 2021-03-15T15:30:30Z

💔 Backport failed

❌ 7.12: Commit could not be cherrypicked due to conflicts
✅ 7.x / #94587

Successful backport PRs will be merged automatically after passing CI.

To backport manually, check out the target branch and run:
node scripts/backport --pr 94531

banderror

Thank you 👍

banderror · 2021-03-15T16:01:14Z

x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts

+/**
+ * The number of rules to update at a time is set to 50. In slow environments such as
+ * cloud, we were seeing timeout issues when the number of rule updates was > 100.
+ *
+ * Since there are no timeout options at the alerting API level right now, we are
+ * at the mercy of the Elasticsearch client/server default timeouts, and what
+ * we are doing could be considered a workaround for not being able to increase the timeouts.
+ *
+ * However, other bad effects and saturation of connections beyond 50 would make this a "noisy neighbor"
+ * if we didn't limit its number of connections as we increase the number of rules that can be
+ * installed at a time.
+ *
+ * Lastly, we saw weird issues where Chrome, on upstream 408 timeouts, will re-call the REST route,
+ * which in turn could create additional connections we want to avoid.
+ *
+ * See file import_rules_route.ts for another area where 50 was chosen; I chose
+ * 50 here to mimic it. If you see this issue re-opened or something similar to it, consider
+ * reducing the 50 below to a lower number.
+ *
+ * See the original ticket here:
+ * https://github.com/elastic/kibana/issues/94418
+ */
+export const UPDATE_CHUNK_SIZE = 50;


Thank you for the detailed explanation, including the "whys": why this was needed and why exactly 50.

banderror · 2021-03-15T16:04:09Z

x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts

+  const ruleChunks = chunk(UPDATE_CHUNK_SIZE, rules);
+  for (const ruleChunk of ruleChunks) {
+    const rulePromises = createPromises(alertsClient, savedObjectsClient, ruleChunk, outputIndex);
+    await Promise.all(rulePromises);
+  }


It's time to push bulk CRUD APIs again it seems :) #53144


…of the REST backend (#94531) (#94607)
* Updated to allow chunked queries and to increase the timeouts of the REST backend (#94531)
* Wrong import as alerting is now called alerts. bad merge


elasticmachine · 2021-03-16T18:42:51Z

Pinging @elastic/security-solution (Team: SecuritySolution)

FrankHassanabad added 2 commits March 12, 2021 15:52

Updated to allow chunked queries and to increase the timeouts of the …

7f08f4f

…REST backend

Merge branch 'master' into bug-fix-timeouts

35d3253

FrankHassanabad self-assigned this Mar 12, 2021

FrankHassanabad changed the title ~~Updated to allow chunked queries and to increase the timeouts of the …~~ [Security Solutions][Detection Engine] Increases pre-packaged socket timeout and chunks the requests Mar 15, 2021

FrankHassanabad linked an issue Mar 15, 2021 that may be closed by this pull request

[Security Solution] Error is displaying when updating the Elastic rules from 7.11.0 to 7.12.0. #94418

Closed

FrankHassanabad added Feature:Detection Rules Security Solution rules and Detection Engine release_note:fix v8.0.0 v7.13.0 v7.12.0 labels Mar 15, 2021

FrankHassanabad marked this pull request as ready for review March 15, 2021 14:16

FrankHassanabad requested a review from a team as a code owner March 15, 2021 14:16

FrankHassanabad added the auto-backport Deprecated - use backport:version if exact versions are needed label Mar 15, 2021

FrankHassanabad enabled auto-merge (squash) March 15, 2021 14:16

madirey approved these changes Mar 15, 2021


FrankHassanabad merged commit bb26564 into elastic:master Mar 15, 2021

kibanamachine mentioned this pull request Mar 15, 2021

[7.x] Updated to allow chunked queries and to increase the timeouts of the REST backend (#94531) #94587

Merged

banderror reviewed Mar 15, 2021


FrankHassanabad mentioned this pull request Mar 15, 2021

[7.12] Updated to allow chunked queries and to increase the timeouts of the REST backend (#94531) #94607

Merged

timroes added the Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. label Mar 16, 2021

banderror mentioned this pull request Mar 22, 2021

[Discuss] [Security Solution] [Alerting] HTTP route RFC for unified rule management #95060

Open

FrankHassanabad deleted the bug-fix-timeouts branch December 2, 2021 17:59
