[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) #174411

paul-tavares · 2024-01-06T00:32:25Z

Summary

Fleet Changes:

- Two new utilities that return `AsyncIterator`s:
  - one for working with the Elasticsearch `.search()` method
  - one for working with the SavedObjects `.find()` method
    - NOTE: although the SavedObjects client already supports returning a `find` interface that produces an `AsyncIterable`, it was not convenient for our use cases, where we return the data from the SO to an external consumer (services exposed by Fleet). We need to be able to process the data from the SO before returning it to the consumer, and this utility makes that possible.
  - both handle looping through ALL data in a given query (even if >10k)
- New `fetchAllArtifacts()` method in `ArtifactsClient`: returns an `AsyncIterator` enabling one to loop through all artifacts (even if >10k)
- New `fetchAllItemIds()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all item IDs (even if >10k)
- New `fetchAllItems()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all package policies (even if >10k)
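
For context on the ">10k" in the title: Elasticsearch rejects `from` + `size` paging beyond the `index.max_result_window` index setting, which defaults to 10,000, so looping over all data means paging with a cursor instead. Below is a minimal sketch of that looping idea as a TypeScript async generator, assuming a hypothetical `fetchPage` callback that wraps a `.search()` call sorted on a unique field and returns each hit's sort values. It is not the PR's `createEsSearchIterable()` or `createSoFindIterable()` code; `Hit`, `FetchPage`, and `searchAllPages` are illustrative names. The optional `mapItem` step reflects the NOTE above: data is transformed before it reaches the consumer.

```typescript
// Sketch: page through every hit of a query with a `search_after` cursor
// instead of `from`/`size`, so the default 10,000-result window never becomes a limit.
// `Hit`, `FetchPage` and `searchAllPages` are hypothetical names.

type SortValues = Array<string | number>;

interface Hit<T> {
  source: T;
  // Sort values of this hit; the last hit's values become the next cursor.
  sort: SortValues;
}

type FetchPage<T> = (searchAfter: SortValues | undefined, size: number) => Promise<Array<Hit<T>>>;

async function* searchAllPages<T, R = T>(
  fetchPage: FetchPage<T>,
  pageSize: number,
  // Optional per-item transform applied before a page is handed to the consumer.
  mapItem: (item: T) => R = (item) => item as unknown as R
): AsyncGenerator<R[]> {
  let cursor: SortValues | undefined;

  while (true) {
    const hits = await fetchPage(cursor, pageSize);
    if (hits.length === 0) {
      return;
    }
    yield hits.map((hit) => mapItem(hit.source));
    // A short page means there is nothing left to fetch.
    if (hits.length < pageSize) {
      return;
    }
    cursor = hits[hits.length - 1].sort;
  }
}

// Usage against an in-memory fake "index" of 25,000 documents sorted by `id`.
const fakeDocs = Array.from({ length: 25000 }, (_, i) => ({ id: i }));

const fakeSearch: FetchPage<{ id: number }> = async (searchAfter, size) => {
  const start = searchAfter === undefined ? 0 : Number(searchAfter[0]) + 1;
  return fakeDocs.slice(start, start + size).map((doc) => ({ source: doc, sort: [doc.id] }));
};

async function countAllArtifacts(): Promise<number> {
  let total = 0;
  for await (const page of searchAllPages(fakeSearch, 1000, (doc) => `artifact-${doc.id}`)) {
    total += page.length;
  }
  return total;
}
```

Paging stops on an empty or short page. Against a real index, `search_after` is usually paired with a point in time (PIT) so that every page comes from a consistent snapshot.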

Endpoint Changes:

- Retrieval of existing artifacts, as well as the list of all policies and policy IDs, now uses the new methods introduced into the Fleet services (above)
- Added a new config property, `xpack.securitySolution.packagerTaskTimeout`, to let customers adjust the timeout for how long the artifact packager task can run. The default is `20m`
- Efficiencies around batch processing of updates to policies and artifact creation
- Improved logging
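
The batch-processing utilities themselves are not shown in this thread; the review diffs only show `new QueueProcessor<string>({ batchSize: ... })` and `await policyUpdateBatchProcessor.complete()`. The sketch below is a hypothetical `BatchQueue` illustrating the general pattern those calls suggest: items are buffered, handed off in groups of `batchSize`, and `complete()` flushes the remainder and waits for everything to finish. Its names and method signatures are assumptions, not the PR's API.

```typescript
// Hypothetical batch queue (not the PR's QueueProcessor): items are buffered and
// handed to `processBatch` in groups of `batchSize`; `complete()` flushes the rest.
interface BatchQueueOptions<T> {
  batchSize: number;
  processBatch: (batch: T[]) => Promise<void>;
}

class BatchQueue<T> {
  private buffer: T[] = [];
  // Tail of the chain of batch promises; batches run one after another, in order.
  private pending: Promise<void> = Promise.resolve();

  constructor(private readonly options: BatchQueueOptions<T>) {
    if (options.batchSize < 1) {
      throw new Error('batchSize must be at least 1');
    }
  }

  addToQueue(...items: T[]): void {
    this.buffer.push(...items);
    // Hand off every full batch as soon as it is available.
    while (this.buffer.length >= this.options.batchSize) {
      this.schedule(this.buffer.splice(0, this.options.batchSize));
    }
  }

  // Flushes any partial batch and resolves once all batches have been processed.
  // If a batch rejects, later batches are skipped and this rejects with that error.
  async complete(): Promise<void> {
    if (this.buffer.length > 0) {
      this.schedule(this.buffer.splice(0));
    }
    await this.pending;
  }

  private schedule(batch: T[]): void {
    this.pending = this.pending.then(() => this.options.processBatch(batch));
  }
}

// Usage: 103 policy IDs with a batch size of 25 -> batches of 25, 25, 25, 25 and 3.
async function runBatchExample(): Promise<number[]> {
  const batchSizes: number[] = [];
  const queue = new BatchQueue<string>({
    batchSize: 25,
    processBatch: async (batch) => {
      batchSizes.push(batch.length);
    },
  });

  for (let i = 0; i < 103; i++) {
    queue.addToQueue(`policy-${i}`);
  }
  await queue.complete();
  return batchSizes;
}
```

Batches run sequentially here. Inside `processBatch`, a caller could update every item of a batch concurrently (for example with `Promise.all`), which is one way a batch size can double as a concurrency limit, as the config comment on `packagerTaskPackagePolicyUpdateBatchSize` suggests.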

Checklist

- [x] Unit or functional tests were updated or added to match the most common scenarios

paul-tavares · 2024-01-08T22:03:20Z

/ci

paul-tavares · 2024-01-18T16:07:19Z

/ci

elasticmachine · 2024-01-26T10:03:22Z

Pinging @elastic/fleet (Team:Fleet)

dasansol92

Thanks for putting all these changes together.
The improvements and all the extra logging here are really helpful.
I've played with it locally and didn't see blocking errors on the task.

I was able to reach more than 10k Fleet artifacts in the Manifest Manager.

I did not check all of the artifact output, but our tests should cover it, especially these two:

- x-pack/test/security_solution_endpoint/apps/integrations/artifact_entries_list.ts
- x-pack/test/security_solution_endpoint/apps/integrations/endpoint_exceptions.ts

I left a few questions; let me know what you think.

dasansol92 · 2024-01-31T09:23:31Z

x-pack/plugins/fleet/server/services/artifacts/artifacts.ts

          }
          return acc;
        }, [])
      );
    }
  }

-  // If any non conflict error, it returns only the errors


By removing this, are we now returning artifacts that contain errors in the `{ artifacts }` response?

Yes, exactly. `bulk*` methods should always return both the items that were successfully created and any errors that were encountered. What was happening on our side was that if artifacts were actually created but (for example) one generated an error, we would never update our Manifest with the ones that were created, which would then leave them as orphaned artifacts.

Got it. Should we then also update the Manifest Manager side (`pushArtifacts()`) so that we handle the artifacts that contain errors and only add those that were successfully created to the manifest?
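
A minimal sketch of the behavior discussed in this thread, under stated assumptions: a bulk operation reports successes and failures side by side, and the caller adds only the successes to the manifest. The types, `bulkCreate()`, and `pushToManifest()` are hypothetical, the manifest is modeled as a plain `Set` of IDs, and per-item `Promise.allSettled` stands in for the per-item results of a real bulk request. This is not Fleet's `bulkCreateArtifacts()` or the Manifest Manager's `pushArtifacts()`.

```typescript
// Hypothetical shapes; Fleet's real artifact types carry more fields.
interface NewArtifact {
  identifier: string;
  body: string;
}

interface CreatedArtifact extends NewArtifact {
  id: string;
}

type CreateOne = (artifact: NewArtifact) => Promise<CreatedArtifact>;

// Attempts every artifact and keeps successes and failures apart, so one
// failure never hides the artifacts that were actually created.
async function bulkCreate(
  artifacts: NewArtifact[],
  createOne: CreateOne
): Promise<{ artifacts: CreatedArtifact[]; errors: Error[] }> {
  const results = await Promise.allSettled(artifacts.map((artifact) => createOne(artifact)));
  const created: CreatedArtifact[] = [];
  const errors: Error[] = [];

  for (const result of results) {
    if (result.status === 'fulfilled') {
      created.push(result.value);
    } else {
      errors.push(result.reason instanceof Error ? result.reason : new Error(String(result.reason)));
    }
  }
  return { artifacts: created, errors };
}

// Caller side: only successfully created artifacts are added to the manifest;
// the errors are returned so they can be logged or retried.
async function pushToManifest(
  manifest: Set<string>,
  toCreate: NewArtifact[],
  createOne: CreateOne
): Promise<Error[]> {
  const { artifacts, errors } = await bulkCreate(toCreate, createOne);
  for (const artifact of artifacts) {
    manifest.add(artifact.id);
  }
  return errors;
}

// Usage: the "bad" artifact fails to be created; the other two still reach the manifest.
const fakeCreateOne: CreateOne = async (artifact) => {
  if (artifact.identifier === 'bad') {
    throw new Error(`failed to create ${artifact.identifier}`);
  }
  return { ...artifact, id: `${artifact.identifier}-id` };
};
```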

dasansol92 · 2024-01-31T09:30:11Z

x-pack/plugins/security_solution/server/config.ts

  /**
   * Artifacts Configuration for package policy update concurrency
   */
-  packagerTaskPackagePolicyUpdateBatchSize: schema.number({ defaultValue: 10, max: 50, min: 1 }),
+  packagerTaskPackagePolicyUpdateBatchSize: schema.number({ defaultValue: 25, max: 50, min: 1 }),


Are we ok changing this value?

Personally, I think it's ok, and customers with large data will probably want to increase it to 50.
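
To make the trade-off concrete, the sketch below mirrors the constraints from the diff (default 25, min 1, max 50) in plain TypeScript rather than `@kbn/config-schema`, and assumes, for illustration only, that policies are updated one batch at a time, so the number of sequential rounds is the policy count divided by the batch size, rounded up. `resolveBatchSize()` and `updateRounds()` are hypothetical helpers.

```typescript
// Hypothetical helpers; the real setting is validated by @kbn/config-schema.
const DEFAULT_BATCH_SIZE = 25;
const MIN_BATCH_SIZE = 1;
const MAX_BATCH_SIZE = 50;

// Applies the default when unset and rejects values outside [min, max].
function resolveBatchSize(configured?: number): number {
  if (configured === undefined) {
    return DEFAULT_BATCH_SIZE;
  }
  if (configured < MIN_BATCH_SIZE || configured > MAX_BATCH_SIZE) {
    throw new Error(
      `batch size must be between ${MIN_BATCH_SIZE} and ${MAX_BATCH_SIZE}, got ${configured}`
    );
  }
  return configured;
}

// Sequential rounds needed to update `policyCount` policies, assuming one batch at a time.
function updateRounds(policyCount: number, batchSize: number): number {
  return Math.ceil(policyCount / batchSize);
}
```

Under that assumption, 1,000 policies take 100 rounds at the old default of 10, 40 at 25, and 20 at the maximum of 50.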

dasansol92 · 2024-01-31T09:31:36Z

x-pack/plugins/security_solution/server/endpoint/lib/artifacts/task.ts

-              this.logger.debug(
-                `${ManifestTaskConstants.TYPE} task run took ${endTime - startTime}ms`
+
+              this.logger.info(


Should this be at debug level? Otherwise, it could log this timing every minute (by default) even when there is nothing for the task to do.

I changed it to `.info` so that we can always see how long the task actually took to run without having to ask a customer to set the debug log level. IMO this is helpful, especially since we don't have `.cancel()` logic, for understanding how long these tasks are actually running in customers' environments.
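
A sketch of the approach described here, assuming a hypothetical `runTimed()` wrapper and a minimal logger interface (Kibana's real `Logger` has more methods): the duration is always logged at info level, using the same message shape as the diff, and logging from a `finally` block means a run that throws still reports how long it took.

```typescript
// Minimal logger shape assumed for this sketch.
interface MinimalLogger {
  info(message: string): void;
}

// Runs `task` and always logs its duration at info level, even if it throws.
async function runTimed<T>(
  logger: MinimalLogger,
  taskType: string,
  task: () => Promise<T>
): Promise<T> {
  const startTime = Date.now();
  try {
    return await task();
  } finally {
    const endTime = Date.now();
    logger.info(`${taskType} task run took ${endTime - startTime}ms`);
  }
}

// Usage: collect log lines in memory instead of writing them out.
const logLines: string[] = [];
const memoryLogger: MinimalLogger = { info: (message) => logLines.push(message) };
```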

dasansol92 · 2024-01-31T09:34:52Z

...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts

    const { artifacts: fleetArtifacts, errors: createErrors } =
      await this.artifactClient.bulkCreateArtifacts(artifactsToCreate);

+    this.logger.info(`Count of artifacts created: ${fleetArtifacts?.length ?? 0}`);


Should this also be at debug level? Or only be logged when new artifacts are created?

I can go either way on this one. I just thought it would be good for the task to report what it did

dasansol92 · 2024-01-31T09:36:26Z

...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts

      return [];
    } catch (err) {
+      this.logger.debug(


should this be an error? Not sure but want to double check

good catch. yes, this should be .error(). I'll correct it.

dasansol92 · 2024-01-31T09:38:53Z

...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts

-      this.packagerTaskPackagePolicyUpdateBatchSize
+    await policyUpdateBatchProcessor.complete();
+
+    this.logger.info(


should this be a debug and only display info logs if there are policy changes (as done below)?

same reason here - I wanted the task to output what it actually did in a brief way

dasansol92 · 2024-01-31T09:42:30Z

...ns/security_solution/server/endpoint/services/artifacts/manifest_manager/manifest_manager.ts

+    const badArtifactIds: string[] = [];
+    const errors: string[] = [];
+    const artifactDeletionProcess = new QueueProcessor<string>({
+      batchSize: this.packagerTaskPackagePolicyUpdateBatchSize,


Should we include a new config value for it, or just leave the default (10)?

Also, this is doing a bulk delete by id (if I'm not wrong), should it be a higher value?

What do you mean?

This `this.packagerTaskPackagePolicyUpdateBatchSize` comes from the config; it's passed to the ManifestManager when its instance is initialized:

kibana/x-pack/plugins/security_solution/server/plugin.ts

Line 536 in 560a3c9

packagerTaskPackagePolicyUpdateBatchSize: config.packagerTaskPackagePolicyUpdateBatchSize,

Yes, you are right. But it refers to the package policy update batch size, and this area is about cleaning up orphaned artifacts, so maybe we could add a new config property or leave the default value.

I thought about that, but figured we could just reuse the same one rather than introduce more config properties for batch size. I don't think it matters, though. Do you have specific concerns?

If you'd rather have one, we can add it to the other issue we have for brainstorming further efficiencies in artifact creation, rather than delaying this PR further. Is that ok with you?
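
To illustrate the reviewer's point that each batch here is a bulk delete by ID, the sketch below deletes IDs in chunks and counts the round trips: a larger batch size means fewer requests. `BulkDeleteById`, `chunk()`, and `deleteInBatches()` are hypothetical; the `badArtifactIds` and `errors` arrays borrow their names from the diff, but how the real cleanup code fills them is not shown in this thread.

```typescript
// Hypothetical bulk-delete signature: one request, one result per ID.
type BulkDeleteById = (ids: string[]) => Promise<Array<{ id: string; error?: string }>>;

// Splits `items` into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) {
    throw new Error('size must be at least 1');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Deletes IDs one chunk per request and records the per-ID failures each request
// reports, so one bad ID does not stop the remaining deletions.
async function deleteInBatches(
  ids: string[],
  batchSize: number,
  bulkDelete: BulkDeleteById
): Promise<{ requests: number; badArtifactIds: string[]; errors: string[] }> {
  const badArtifactIds: string[] = [];
  const errors: string[] = [];
  const batches = chunk(ids, batchSize);

  for (const batch of batches) {
    const results = await bulkDelete(batch);
    for (const { id, error } of results) {
      if (error !== undefined) {
        badArtifactIds.push(id);
        errors.push(`${id}: ${error}`);
      }
    }
  }
  return { requests: batches.length, badArtifactIds, errors };
}
```

With 120 orphaned IDs, a batch size of 25 issues 5 requests, while a batch size of 100 would issue 2.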

tomsonpl · 2024-02-01T11:17:06Z

@paul-tavares hey, sorry for the delay. I went through the code, but I don't really understand how the Manifest Manager works, so I'm not sure I should be reviewing this. Do you mind if somebody else takes a look at it instead?

kibana-ci · 2024-02-05T17:09:45Z

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id	before	after	diff
`fleet`	1104	1110	+6

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id	before	after	diff
`fleet`	51	54	+3

Unknown metric groups

API count

id	before	after	diff
`fleet`	1221	1229	+8

ESLint disabled in files

id	before	after	diff
`fleet`	11	12	+1

ESLint disabled line counts

id	before	after	diff
`securitySolution`	473	472	-1

Total ESLint disabled count

id	before	after	diff
`fleet`	55	56	+1
`securitySolution`	546	545	-1
total			-0

History

💛 Build #190880 was flaky a4d683e
💔 Build #190520 failed fe3aad4
💔 Build #190497 failed cb80498
💔 Build #190465 failed e4eec9a
💔 Build #190304 failed 76adcf2

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @paul-tavares

kibanamachine · 2024-02-06T21:53:21Z

💔 All backports failed

Status	Branch	Result
❌	8.12	Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 174411

Questions ?

Please refer to the Backport tool documentation

* main: (224 commits) [Http] Replace `buildNr` with `buildSha` in static asset paths (#175898) [Ops] Fix GCS bucket access for future buildkite agents (#174756) [api-docs] 2024-02-07 Daily api_docs build (#176362) skip flaky suite (#176002) skip failing es promotion suite (#176359) [Cloud Security] [Grouping] Add URL Params support to the grouping components (#175749) chore(NA): update versions after v8.12.2 bump (#176309) chore(NA): update versions after v7.17.19 bump (#176313) skip failing test suite (#176352) [SLO] Enable burn rate alert by default during creation via UI (#176317) [Fleet] Add the uptime capability to observability projects (#176285) [Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411) [ResponseOps] Alert creation delay based on user definition (#175851) [data views] Default field formatters based on field meta values (#174973) [Cloud Security]Detection Rules counter on Rules Flyout (#176041) [Security Solution] Data Quality Dashboard persistence (#175673) [Ent Search] Connector client copy cleanup (#176290) [ML] Anomaly Detection: Adds actions menu to anomaly markers in Single Metric Viewer chart. (#175556) [ML] Anomaly Detection: Fix `values-dots` colors (#176303) [Fleet] Logstash Output - being compliant to RFC-952 (#176298) ...

…th large (>10k) (elastic#174411) ## Summary ### Fleet Changes: - Two new utilities that return `AsyncIterator`'s: - one for working with ElasticSearch `.search()` method - one for working with SavedObjects `.find()` method - NOTE: although the `SavedObjects` client already supports getting back an `find` interface that returns an `AysncIterable`, I was not convenient to use in our use cases where we are returning the data from the SO back to an external consumer (services exposed by Fleet). We need to be able to first process the data out of the SO before returning it to the consumer, thus having this utility facilitates that. - both handle looping through ALL data in a given query (even if >10k) - new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an `AsyncIterator` enabling one to loop through all artifacts (even if >10k) - new `fetchAllItemIds()` method in `PackagePolicyService`: return an `AsyncIterator` enabling one to loop through all item IDs (even if >10k) - new `fetchAllItems()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all package policies (even if >10k) ### Endpoint Changes: - Retrieval of existing artifacts as well as list of all policies and policy IDs now use new methods introduced into fleet services (above) - Added new config property - `xpack.securitySolution.packagerTaskTimeout` - to enable customer to adjust the timeout value for how long the artifact packager task can run. Default has been set to `20m` - Efficiencies around batch processing of updates to Policies and artifact creation - improved logging ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

paul-tavares · 2024-02-08T17:24:52Z

💚 All backports created successfully

Status	Branch	Result
✅	8.12

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

…th large (>10k) (elastic#174411) ## Summary ### Fleet Changes: - Two new utilities that return `AsyncIterator`'s: - one for working with ElasticSearch `.search()` method - one for working with SavedObjects `.find()` method - NOTE: although the `SavedObjects` client already supports getting back an `find` interface that returns an `AysncIterable`, I was not convenient to use in our use cases where we are returning the data from the SO back to an external consumer (services exposed by Fleet). We need to be able to first process the data out of the SO before returning it to the consumer, thus having this utility facilitates that. - both handle looping through ALL data in a given query (even if >10k) - new `fetchAllArtifacts()` method in `ArtifactsClient`: Returns an `AsyncIterator` enabling one to loop through all artifacts (even if >10k) - new `fetchAllItemIds()` method in `PackagePolicyService`: return an `AsyncIterator` enabling one to loop through all item IDs (even if >10k) - new `fetchAllItems()` method in `PackagePolicyService`: returns an `AsyncIterator` enabling one to loop through all package policies (even if >10k) ### Endpoint Changes: - Retrieval of existing artifacts as well as list of all policies and policy IDs now use new methods introduced into fleet services (above) - Added new config property - `xpack.securitySolution.packagerTaskTimeout` - to enable customer to adjust the timeout value for how long the artifact packager task can run. Default has been set to `20m` - Efficiencies around batch processing of updates to Policies and artifact creation - improved logging ### Checklist - [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios (cherry picked from commit 9150f9f) # Conflicts: # x-pack/plugins/fleet/server/services/package_policy.ts

…orks with large (>10k) (#174411) (#176531) # Backport This will backport the following commits from `main` to `8.12`: - [[Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411)](#174411)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

paul-tavares added the release_note:skip, Team:Defend Workflows, and v8.13.0 labels Jan 6, 2024

paul-tavares self-assigned this Jan 6, 2024

paul-tavares mentioned this pull request Jan 9, 2024

[8.12][Security Solution][Endpoint] Fix Artifact packager task retrieval of artifacts from Fleet when there are >10k records #174536

Closed

1 task

paul-tavares added the ci:cloud-deploy Create or update a Cloud deployment label Jan 18, 2024

paul-tavares added 22 commits January 24, 2024 15:44

Improve logging in artifact manager and task

560a3c9

Improve log message in the task

60138db

New fetchAllArtifacts() fleet service function

c8f3526

Add fetchAll() to FleetArtifactsClient

18e0ccf

again - better debug message output

a1067e5

Add fetchAll() to the Endpoint artifact client and use it in ManifestManager

490d343

new common utility to create ES iterable - createEsSearchIterable()

f31f814

new createSoFindIterable() utility

664c9a6

remove try block from createEsSearchIterable()

7c8c6ea

Add fetchAllItemIds() to package_policy_service

a80ad42

ManifestManager: change listEndpointPolicyIds() to use new `fetchAllItemIds()` service method

d65ca83

package policy utility to map SO to PackagePolicy

b212b09

add fetchAllItems() to packagePolicy service

da28767

Utility to process data in batches

5193172

refactor ManifestManager .tryDispatch() to use BatchProcessor

e50ae71

small improvements to deleteArtifacts()

5d3d2d9

Improve cleanup()

29ffa68

Improve log message in the task

4a322e5

Add packagerTaskTimout to security solution server config

ec8420e

placeholder for tests in fleet artifacts

cfea5b4

tests for fetchAllArtifacts()

c827c8c

# Conflicts: # x-pack/plugins/fleet/server/services/artifacts/artifacts.test.ts # x-pack/plugins/fleet/server/services/artifacts/mocks.ts

fix manifest manager tests

9f9cc44

paul-tavares and others added 2 commits January 29, 2024 08:35

Merge branch 'main' into task/olm-fix-manifest-manger-with-large-data

9c5b292

Fix mocks for new AsyncIterator service methods

89621f0

paul-tavares added the v8.12.2 label Jan 29, 2024

paul-tavares added 3 commits January 30, 2024 09:57

fix failing test

9b3e3f6

Fix fleet tests

f78112c

fix types in mock

76adcf2

dasansol92 reviewed Jan 31, 2024

paul-tavares and others added 5 commits January 31, 2024 08:56

Merge branch 'main' into task/olm-fix-manifest-manger-with-large-data

e4eec9a

fix mock types and PR review feedback

6ee98e1

Merge remote-tracking branch 'upstream/main' into task/olm-fix-manifest-manger-with-large-data

96836dd

Merge remote-tracking branch 'origin/task/olm-fix-manifest-manger-with-large-data' into task/olm-fix-manifest-manger-with-large-data

cb80498

fix var name

fe3aad4

dasansol92 approved these changes Feb 1, 2024

paul-tavares and others added 2 commits February 1, 2024 14:31

Merge remote-tracking branch 'upstream/main' into task/olm-fix-manifest-manger-with-large-data # Conflicts: # x-pack/plugins/fleet/server/services/package_policy.ts

a4d683e

Merge branch 'main' into task/olm-fix-manifest-manger-with-large-data

d2187a5

paul-tavares merged commit 9150f9f into elastic:main Feb 6, 2024
38 checks passed

paul-tavares deleted the task/olm-fix-manifest-manger-with-large-data branch February 6, 2024 21:48

paul-tavares mentioned this pull request Feb 8, 2024

[8.12] [Security Solution][Endpoint] Fix Manifest Manger so that it works with large (>10k) (#174411) #176531

Merged
