-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) #174411
[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) #174411
Conversation
/ci |
/ci |
…lItemIds()` service method
# Conflicts: # x-pack/plugins/fleet/server/services/artifacts/artifacts.test.ts # x-pack/plugins/fleet/server/services/artifacts/mocks.ts
Pinging @elastic/fleet (Team:Fleet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting all these changes together.
The improvements and all the extra logging here are really helpful.
I've played with it locally and didn't see blocking errors on the task.
I was able to reach the +10k fleet artifacts in Manifest Manager.
Did not check all the artifacts output but our tests should do the work, specially these two:
x-pack/test/security_solution_endpoint/apps/integrations/artifact_entries_list.ts
x-pack/test/security_solution_endpoint/apps/integrations/endpoint_exceptions.ts
Left few questions, let me know what do you think.
} | ||
return acc; | ||
}, []) | ||
); | ||
} | ||
} | ||
|
||
// If any non conflict error, it returns only the errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this, are we returning artifacts
in { artifacts }
response that contain errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly. bulk*
type of methods should always return both items that were successfully created and any errors that were encountered. What was happening here on our side was that if artifacts were actually created, but (for example) one generated an error, then we would never update our Manifest with the ones that were created - which would then lead to them being orphan artifacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, should we also update then the Manifest Manager (pushArtifacrts()
) side, so we handle those that contain errors and only add to the manifest those that have been correctly created?
/** | ||
* Artifacts Configuration for package policy update concurrency | ||
*/ | ||
packagerTaskPackagePolicyUpdateBatchSize: schema.number({ defaultValue: 10, max: 50, min: 1 }), | ||
packagerTaskPackagePolicyUpdateBatchSize: schema.number({ defaultValue: 25, max: 50, min: 1 }), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we ok changing this value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think its ok and for customers with large data, they probably want to increase it to 50
this.logger.debug( | ||
`${ManifestTaskConstants.TYPE} task run took ${endTime - startTime}ms` | ||
|
||
this.logger.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be at debug
level? otherwise, it could be logging this timing every minute (by default) even there is nothing to do on the task
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to .info
so that we can always get information about how long the task actually took to run without having ot ask a customer to set it debug level. This (IMO) is helpful, especially since we don't have .cancel()
logic, to understand how long these tasks are actually running at customer's env.
const { artifacts: fleetArtifacts, errors: createErrors } = | ||
await this.artifactClient.bulkCreateArtifacts(artifactsToCreate); | ||
|
||
this.logger.info(`Count of artifacts created: ${fleetArtifacts?.length ?? 0}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be also at debug level? Or only show it if there are new artifacts created?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can go either way on this one. I just thought it would be good for the task to report what it did
return []; | ||
} catch (err) { | ||
this.logger.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be an error? Not sure but want to double check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. yes, this should be .error()
. I'll correct it.
this.packagerTaskPackagePolicyUpdateBatchSize | ||
await policyUpdateBatchProcessor.complete(); | ||
|
||
this.logger.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be a debug
and only display info logs if there are policy changes (as done below)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same reason here - I wanted the task to output what it actually did in a brief way
const badArtifactIds: string[] = []; | ||
const errors: string[] = []; | ||
const artifactDeletionProcess = new QueueProcessor<string>({ | ||
batchSize: this.packagerTaskPackagePolicyUpdateBatchSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we include new config value for it or just let the default (10)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is doing a bulk delete by id (if I'm not wrong), should it be a higher value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean?
This this.packagerTaskPackagePolicyUpdateBatchSize
is coming from the config - its passed to the ManifestManager when initializing the instance of it :
packagerTaskPackagePolicyUpdateBatchSize: config.packagerTaskPackagePolicyUpdateBatchSize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right. But it refers to package policy update batch size, and this area is about cleaning orphan artifacts, so maybe we can add a new config or let the default value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but figure we could just use the same one rather than introduce more config properties for batch size. I don't think it matters though. Do you have specific concerns?
if you rather have one, we can add it to the other issue we have to further brainstorm efficiencies for artifact creation, rather than to further delay this PR. Is that ok with you?
…st-manger-with-large-data
…h-large-data' into task/olm-fix-manifest-manger-with-large-data
@paul-tavares hey, sorry for the delay, I went through the code, but I don't really understand how Manifest Manager works so not sure I am should be reviewing this. Do you mind if somebody else take a look at it instead of me? |
…st-manger-with-large-data # Conflicts: # x-pack/plugins/fleet/server/services/package_policy.ts
💚 Build Succeeded
Metrics [docs]Public APIs missing comments
Public APIs missing exports
Unknown metric groupsAPI count
ESLint disabled in files
ESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: |
💔 All backports failed
Manual backportTo create the backport manually run:
Questions ?Please refer to the Backport tool documentation |
* main: (224 commits) [Http] Replace `buildNr` with `buildSha` in static asset paths (#175898) [Ops] Fix GCS bucket access for future buildkite agents (#174756) [api-docs] 2024-02-07 Daily api_docs build (#176362) skip flaky suite (#176002) skip failing es promotion suite (#176359) [Cloud Security] [Grouping] Add URL Params support to the grouping components (#175749) chore(NA): update versions after v8.12.2 bump (#176309) chore(NA): update versions after v7.17.19 bump (#176313) skip failing test suite (#176352) [SLO] Enable burn rate alert by default during creation via UI (#176317) [Fleet] Add the uptime capability to observability projects (#176285) [Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411) [ResponseOps] Alert creation delay based on user definition (#175851) [data views] Default field formatters based on field meta values (#174973) [Cloud Security]Detection Rules counter on Rules Flyout (#176041) [Security Solution] Data Quality Dashboard persistence (#175673) [Ent Search] Connector client copy cleanup (#176290) [ML] Anomaly Detection: Adds actions menu to anomaly markers in Single Metric Viewer chart. (#175556) [ML] Anomaly Detection: Fix `values-dots` colors (#176303) [Fleet] Logstash Output - being compliant to RFC-952 (#176298) ...
…th large (>10k) (elastic#174411) ## Summary ### Fleet Changes: - Two new utilities that return `AsyncIterator`'s: - one for working with ElasticSearch `.search()` method - one for working with SavedObjects `.find()` method - NOTE: although the `SavedObjects` client already supports getting back an `find` interface that returns an `AysncIterable`, I was not convenient to use in our use cases where we are returning the data from the SO back to an external consumer (services exposed by Fleet). We need to be able to first process the data out of the SO before returning it to the consumer, thus having this utility facilitates that. - both handle looping through ALL data in a given query (even if >10k) - new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an `AsyncIterator` enabling one to loop through all artifacts (even if >10k) - new `fetchAllItemIds()` method in `PackagePolicyService`: return an `AsyncIterator` enabling one to loop through all item IDs (even if >10k) - new `fetchAllItems()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all package policies (even if >10k) ### Endpoint Changes: - Retrieval of existing artifacts as well as list of all policies and policy IDs now use new methods introduced into fleet services (above) - Added new config property - `xpack.securitySolution.packagerTaskTimeout` - to enable customer to adjust the timeout value for how long the artifact packager task can run. Default has been set to `20m` - Efficiencies around batch processing of updates to Policies and artifact creation - improved logging ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
💚 All backports created successfully
Note: Successful backport PRs will be merged automatically after passing CI. Questions ?Please refer to the Backport tool documentation |
…th large (>10k) (elastic#174411) ## Summary ### Fleet Changes: - Two new utilities that return `AsyncIterator`'s: - one for working with ElasticSearch `.search()` method - one for working with SavedObjects `.find()` method - NOTE: although the `SavedObjects` client already supports getting back an `find` interface that returns an `AysncIterable`, I was not convenient to use in our use cases where we are returning the data from the SO back to an external consumer (services exposed by Fleet). We need to be able to first process the data out of the SO before returning it to the consumer, thus having this utility facilitates that. - both handle looping through ALL data in a given query (even if >10k) - new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an `AsyncIterator` enabling one to loop through all artifacts (even if >10k) - new `fetchAllItemIds()` method in `PackagePolicyService`: return an `AsyncIterator` enabling one to loop through all item IDs (even if >10k) - new `fetchAllItems()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all package policies (even if >10k) ### Endpoint Changes: - Retrieval of existing artifacts as well as list of all policies and policy IDs now use new methods introduced into fleet services (above) - Added new config property - `xpack.securitySolution.packagerTaskTimeout` - to enable customer to adjust the timeout value for how long the artifact packager task can run. Default has been set to `20m` - Efficiencies around batch processing of updates to Policies and artifact creation - improved logging ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios (cherry picked from commit 9150f9f) # Conflicts: # x-pack/plugins/fleet/server/services/package_policy.ts
…orks with large (>10k) (#174411) (#176531) # Backport This will backport the following commits from `main` to `8.12`: - [[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411)](#174411) <!--- Backport version: 8.9.8 --> ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport) <!--BACKPORT [{"author":{"name":"Paul Tavares","email":"56442535+paul-tavares@users.noreply.github.com"},"sourceCommit":{"committedDate":"2024-02-06T21:48:12Z","message":"[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411)\n\n## Summary\r\n\r\n### Fleet Changes:\r\n\r\n- Two new utilities that return `AsyncIterator`'s:\r\n - one for working with ElasticSearch `.search()` method\r\n - one for working with SavedObjects `.find()` method\r\n- NOTE: although the `SavedObjects` client already supports getting back\r\nan `find` interface that returns an `AysncIterable`, I was not\r\nconvenient to use in our use cases where we are returning the data from\r\nthe SO back to an external consumer (services exposed by Fleet). We need\r\nto be able to first process the data out of the SO before returning it\r\nto the consumer, thus having this utility facilitates that.\r\n- both handle looping through ALL data in a given query (even if >10k)\r\n- new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an\r\n`AsyncIterator` enabling one to loop through all artifacts (even if\r\n>10k)\r\n- new `fetchAllItemIds()` method in `PackagePolicyService`: return an\r\n`AsyncIterator` enabling one to loop through all item IDs (even if >10k)\r\n- new `fetchAllItems()` method in `PackagePolicyService`: returns an\r\n`AsyncIterator` enabling one to loop through all package policies (even\r\nif >10k)\r\n\r\n\r\n### Endpoint Changes:\r\n\r\n- Retrieval of existing artifacts as well as list of all policies and\r\npolicy IDs now use new methods introduced into fleet services (above)\r\n- Added new config property -\r\n`xpack.securitySolution.packagerTaskTimeout` - to enable customer to\r\nadjust the timeout value for how long the artifact packager task can\r\nrun. Default has been set to `20m`\r\n- Efficiencies around batch processing of updates to Policies and\r\nartifact creation\r\n- improved logging\r\n\r\n\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios","sha":"9150f9fa2f110bcd54f90f21554bad5e6d92fd0f","branchLabelMapping":{"^v8.13.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["release_note:skip","Team:Fleet","Team:Defend Workflows","ci:cloud-deploy","v8.13.0","v8.12.2"],"number":174411,"url":"https://github.com/elastic/kibana/pull/174411","mergeCommit":{"message":"[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411)\n\n## Summary\r\n\r\n### Fleet Changes:\r\n\r\n- Two new utilities that return `AsyncIterator`'s:\r\n - one for working with ElasticSearch `.search()` method\r\n - one for working with SavedObjects `.find()` method\r\n- NOTE: although the `SavedObjects` client already supports getting back\r\nan `find` interface that returns an `AysncIterable`, I was not\r\nconvenient to use in our use cases where we are returning the data from\r\nthe SO back to an external consumer (services exposed by Fleet). We need\r\nto be able to first process the data out of the SO before returning it\r\nto the consumer, thus having this utility facilitates that.\r\n- both handle looping through ALL data in a given query (even if >10k)\r\n- new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an\r\n`AsyncIterator` enabling one to loop through all artifacts (even if\r\n>10k)\r\n- new `fetchAllItemIds()` method in `PackagePolicyService`: return an\r\n`AsyncIterator` enabling one to loop through all item IDs (even if >10k)\r\n- new `fetchAllItems()` method in `PackagePolicyService`: returns an\r\n`AsyncIterator` enabling one to loop through all package policies (even\r\nif >10k)\r\n\r\n\r\n### Endpoint Changes:\r\n\r\n- Retrieval of existing artifacts as well as list of all policies and\r\npolicy IDs now use new methods introduced into fleet services (above)\r\n- Added new config property -\r\n`xpack.securitySolution.packagerTaskTimeout` - to enable customer to\r\nadjust the timeout value for how long the artifact packager task can\r\nrun. Default has been set to `20m`\r\n- Efficiencies around batch processing of updates to Policies and\r\nartifact creation\r\n- improved logging\r\n\r\n\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios","sha":"9150f9fa2f110bcd54f90f21554bad5e6d92fd0f"}},"sourceBranch":"main","suggestedTargetBranches":["8.12"],"targetPullRequestStates":[{"branch":"main","label":"v8.13.0","labelRegex":"^v8.13.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/174411","number":174411,"mergeCommit":{"message":"[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411)\n\n## Summary\r\n\r\n### Fleet Changes:\r\n\r\n- Two new utilities that return `AsyncIterator`'s:\r\n - one for working with ElasticSearch `.search()` method\r\n - one for working with SavedObjects `.find()` method\r\n- NOTE: although the `SavedObjects` client already supports getting back\r\nan `find` interface that returns an `AysncIterable`, I was not\r\nconvenient to use in our use cases where we are returning the data from\r\nthe SO back to an external consumer (services exposed by Fleet). We need\r\nto be able to first process the data out of the SO before returning it\r\nto the consumer, thus having this utility facilitates that.\r\n- both handle looping through ALL data in a given query (even if >10k)\r\n- new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an\r\n`AsyncIterator` enabling one to loop through all artifacts (even if\r\n>10k)\r\n- new `fetchAllItemIds()` method in `PackagePolicyService`: return an\r\n`AsyncIterator` enabling one to loop through all item IDs (even if >10k)\r\n- new `fetchAllItems()` method in `PackagePolicyService`: returns an\r\n`AsyncIterator` enabling one to loop through all package policies (even\r\nif >10k)\r\n\r\n\r\n### Endpoint Changes:\r\n\r\n- Retrieval of existing artifacts as well as list of all policies and\r\npolicy IDs now use new methods introduced into fleet services (above)\r\n- Added new config property -\r\n`xpack.securitySolution.packagerTaskTimeout` - to enable customer to\r\nadjust the timeout value for how long the artifact packager task can\r\nrun. Default has been set to `20m`\r\n- Efficiencies around batch processing of updates to Policies and\r\nartifact creation\r\n- improved logging\r\n\r\n\r\n\r\n### Checklist\r\n\r\n- [x] [Unit or functional\r\ntests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)\r\nwere updated or added to match the most common scenarios","sha":"9150f9fa2f110bcd54f90f21554bad5e6d92fd0f"}},{"branch":"8.12","label":"v8.12.2","labelRegex":"^v(\\d+).(\\d+).\\d+$","isSourceBranch":false,"state":"NOT_CREATED"}]}] BACKPORT-->
…th large (>10k) (elastic#174411) ## Summary ### Fleet Changes: - Two new utilities that return `AsyncIterator`'s: - one for working with ElasticSearch `.search()` method - one for working with SavedObjects `.find()` method - NOTE: although the `SavedObjects` client already supports getting back an `find` interface that returns an `AysncIterable`, I was not convenient to use in our use cases where we are returning the data from the SO back to an external consumer (services exposed by Fleet). We need to be able to first process the data out of the SO before returning it to the consumer, thus having this utility facilitates that. - both handle looping through ALL data in a given query (even if >10k) - new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an `AsyncIterator` enabling one to loop through all artifacts (even if >10k) - new `fetchAllItemIds()` method in `PackagePolicyService`: return an `AsyncIterator` enabling one to loop through all item IDs (even if >10k) - new `fetchAllItems()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all package policies (even if >10k) ### Endpoint Changes: - Retrieval of existing artifacts as well as list of all policies and policy IDs now use new methods introduced into fleet services (above) - Added new config property - `xpack.securitySolution.packagerTaskTimeout` - to enable customer to adjust the timeout value for how long the artifact packager task can run. Default has been set to `20m` - Efficiencies around batch processing of updates to Policies and artifact creation - improved logging ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
…th large (>10k) (elastic#174411) ## Summary ### Fleet Changes: - Two new utilities that return `AsyncIterator`'s: - one for working with ElasticSearch `.search()` method - one for working with SavedObjects `.find()` method - NOTE: although the `SavedObjects` client already supports getting back an `find` interface that returns an `AysncIterable`, I was not convenient to use in our use cases where we are returning the data from the SO back to an external consumer (services exposed by Fleet). We need to be able to first process the data out of the SO before returning it to the consumer, thus having this utility facilitates that. - both handle looping through ALL data in a given query (even if >10k) - new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an `AsyncIterator` enabling one to loop through all artifacts (even if >10k) - new `fetchAllItemIds()` method in `PackagePolicyService`: return an `AsyncIterator` enabling one to loop through all item IDs (even if >10k) - new `fetchAllItems()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all package policies (even if >10k) ### Endpoint Changes: - Retrieval of existing artifacts as well as list of all policies and policy IDs now use new methods introduced into fleet services (above) - Added new config property - `xpack.securitySolution.packagerTaskTimeout` - to enable customer to adjust the timeout value for how long the artifact packager task can run. Default has been set to `20m` - Efficiencies around batch processing of updates to Policies and artifact creation - improved logging ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Summary
Fleet Changes:
AsyncIterator
's:.search()
method.find()
methodSavedObjects
client already supports getting back anfind
interface that returns anAysncIterable
, I was not convenient to use in our use cases where we are returning the data from the SO back to an external consumer (services exposed by Fleet). We need to be able to first process the data out of the SO before returning it to the consumer, thus having this utility facilitates that.fetchAllArtifacts()
method inArtifactsClient
: Returns anAsyncIterator
enabling one to loop through all artifacts (even if >10k)fetchAllItemIds()
method inPackagePolicyService
: return anAsyncIterator
enabling one to loop through all item IDs (even if >10k)fetchAllItems()
method inPackagePolicyService
: returns anAsyncIterator
enabling one to loop through all package policies (even if >10k)Endpoint Changes:
xpack.securitySolution.packagerTaskTimeout
- to enable customer to adjust the timeout value for how long the artifact packager task can run. Default has been set to20m
Checklist