-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solutions][Detection Engine] Increases pre-packaged socket timeout and chunks the requests #94531
[Security Solutions][Detection Engine] Increases pre-packaged socket timeout and chunks the requests #94531
Conversation
💚 Build SucceededMetrics [docs]
To update your PR or re-run it, just comment with: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!!
…REST backend (elastic#94531) ## Summary Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout. Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests. See this issue talked about below: sindresorhus/ky#233 https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0 https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20 **Manual testing** You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once. ``` Trace: at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11) at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9) at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:93:5) at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27 at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30) at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11) at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28) at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20) at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20) at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32) at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9) ``` ### Checklist Delete any items that are not applicable to this PR. - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you 👍
/** | ||
* How many rules to update at a time is set to 50 from errors coming from | ||
* the slow environments such as cloud when the rule updates are > 100 we were | ||
* seeing timeout issues. | ||
* | ||
* Since there is not timeout options at the alerting API level right now, we are | ||
* at the mercy of the Elasticsearch server client/server default timeouts and what | ||
* we are doing could be considered a workaround to not being able to increase the timeouts. | ||
* | ||
* However, other bad effects and saturation of connections beyond 50 makes this a "noisy neighbor" | ||
* if we don't limit its number of connections as we increase the number of rules that can be | ||
* installed at a time. | ||
* | ||
* Lastly, we saw weird issues where Chrome on upstream 408 timeouts will re-call the REST route | ||
* which in turn could create additional connections we want to avoid. | ||
* | ||
* See file import_rules_route.ts for another area where 50 was chosen, therefore I chose | ||
* 50 here to mimic it as well. If you see this re-opened or what similar to it, consider | ||
* reducing the 50 above to a lower number. | ||
* | ||
* See the original ticket here: | ||
* https://github.com/elastic/kibana/issues/94418 | ||
*/ | ||
export const UPDATE_CHUNK_SIZE = 50; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed explanation, including the "whys": why this was needed and why exactly 50.
const ruleChunks = chunk(UPDATE_CHUNK_SIZE, rules); | ||
for (const ruleChunk of ruleChunks) { | ||
const rulePromises = createPromises(alertsClient, savedObjectsClient, ruleChunk, outputIndex); | ||
await Promise.all(rulePromises); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's time to push bulk CRUD APIs again it seems :) #53144
…REST backend (elastic#94531) Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout. Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests. See this issue talked about below: sindresorhus/ky#233 https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0 https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20 **Manual testing** You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once. ``` Trace: at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11) at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9) at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:93:5) at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27 at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30) at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11) at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28) at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20) at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20) at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32) at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9) ``` Delete any items that are not applicable to this PR. - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
…of the REST backend (#94531) (#94607) * Updated to allow chunked queries and to increase the timeouts of the REST backend (#94531) Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout. Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests. See this issue talked about below: sindresorhus/ky#233 https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0 https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20 **Manual testing** You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once. ``` Trace: at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11) at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9) at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:93:5) at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27 at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30) at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11) at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28) at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20) at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20) at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32) at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9) ``` Delete any items that are not applicable to this PR. - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios * Wrong import as alerting is now called alerts. bad merge
…REST backend (#94531) (#94587) ## Summary Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout. Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests. See this issue talked about below: sindresorhus/ky#233 https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0 https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20 **Manual testing** You can bump up the rule version numbers manually through a search and replace and then install them. You can add a `console.trace()` to the backend and slow down the requests to ensure they are not happening more than once. ``` Trace: at updatePrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/rules/update_prepacked_rules.ts:34:11) at createPrepackagedRules (/Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:140:9) at runMicrotasks (<anonymous>) at processTicksAndRejections (internal/process/task_queues.js:93:5) at /Users/frankhassanabad/projects/kibana/x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/add_prepackaged_rules_route.ts:66:27 at Router.handle (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:272:30) at handler (/Users/frankhassanabad/projects/kibana/src/core/server/http/router/router.ts:227:11) at exports.Manager.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/toolkit.js:60:28) at Object.internals.handler (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:46:20) at exports.execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/handler.js:31:20) at Request._lifecycle (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:371:32) at Request._execute (/Users/frankhassanabad/projects/kibana/node_modules/@hapi/hapi/lib/request.js:279:9) ``` ### Checklist Delete any items that are not applicable to this PR. - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios Co-authored-by: Frank Hassanabad <frank.hassanabad@elastic.co>
Pinging @elastic/security-solution (Team: SecuritySolution) |
Summary
Increases the pre-packaged socket timeout and chunks the requests. Existing e2e tests should cover the changes. Interesting enough, when the server sends back a 408, Chrome will re-send the same request again which can cause socket/network saturations. By increasing the timeout, Chrome will not resend the same request again on timeout.
Right now, there is not a way to increase the timeouts for the alerting framework/saved objects as far as I know for connections. That would be an additional safety measure in additional to doing chunked requests. Chunked requests will ensure that the pre-packaged rule does not exhaust ephemeral ports and limit the concurrent requests.
See this issue talked about below:
sindresorhus/ky#233
https://groups.google.com/a/chromium.org/g/chromium-dev/c/urswDsm6Pe0
https://medium.com/@lighthopper/connection-retry-schedule-in-chrome-browser-a9c814b7dc20
Manual testing
You can bump up the rule version numbers manually through a search and replace and then install them. You can add a
console.trace()
to the backend and slow down the requests to ensure they are not happening more than once.Checklist
Delete any items that are not applicable to this PR.