[Fleet] Run agent policy schema in batches during fleet setup + add `xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` config #150688

hop-dev · 2023-02-09T12:27:29Z

Summary

As part of the Fleet plugin setup, we check whether any agent policies have an out-of-date schema_version and upgrade them. We encountered an error when this upgrade ran against a large number of agent policies, because we attempted the upgrade in one large batch.

This pull request performs the schema upgrade in batches of 100 by default and also adds the config value xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize to make the batch size configurable.
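For readers skimming the thread, the batching idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `chunk`, `upgradeInBatches`, `upgradeBatch` and `log` are hypothetical names, and the batch size is assumed to be passed in from the `xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` setting.

```typescript
// Hypothetical sketch: these names are illustrative and are not Fleet's actual code.
const DEFAULT_BATCH_SIZE = 100; // default batch size described in this PR

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error(`batch size must be a positive integer, got ${size}`);
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Upgrade one batch at a time, waiting for each to finish before starting the next.
async function upgradeInBatches<T>(
  outdated: T[],
  upgradeBatch: (batch: T[]) => Promise<void>,
  batchSize: number = DEFAULT_BATCH_SIZE,
  log: (msg: string) => void = () => {}
): Promise<void> {
  const batches = chunk(outdated, batchSize);
  for (let i = 0; i < batches.length; i++) {
    log(`Upgrading agent policy batch ${i + 1} of ${batches.length}`);
    await upgradeBatch(batches[i]);
  }
}
```

Running the batches sequentially rather than in parallel keeps the load on Elasticsearch and Kibana bounded by the batch size, which is the trade-off discussed later in this conversation.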

I have also added more debug logging to show progress, and reduced the response payload of one of our requests which was very large.

Dev testing

To test this you need an environment with lots of agent policies (> 2k) where schema_version is not set. To create an environment with a large number of agent policies I have added a new param to the agent creation script. I ran:

cd x-pack/plugins/fleet
node scripts/create_agents --count 20  --kibana http://127.0.0.1:5601/mark --status online --delete --batches 3000 --concurrentBatches 100

This generates 3000 agent policies, each with 20 agents.

I then modified the agent policies so that they require an upgrade. As system_indices_superuser, run:

POST /.kibana/_update_by_query
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "ingest-agent-policies"
          }
        }
      ]
    }
  },
  "script": {
    "source": "ctx._source['ingest-agent-policies'].remove('schema_version')",
    "lang": "painless"
  }
}

Restarting Kibana will run the setup and perform the schema upgrade in batches.

elasticmachine · 2023-02-09T15:18:45Z

Pinging @elastic/fleet (Team:Fleet)

hop-dev · 2023-02-09T15:20:00Z

x-pack/plugins/fleet/server/services/agent_policy.ts

@@ -315,15 +315,22 @@ class AgentPolicyService {
    soClient: SavedObjectsClientContract,
    options: ListWithKuery & {
      withPackagePolicies?: boolean;
+      fields?: string[];


Added the ability to restrict the agent policy fields returned, to reduce payload size.
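To illustrate the effect of a `fields` option like the one in the diff above, here is a small sketch. It is an assumption-laden illustration only: `pickFields` and `PolicyRecord` are hypothetical, and the real change passes `fields` through to the underlying query rather than trimming objects in memory.

```typescript
// Hypothetical helper: shows the effect of restricting returned fields.
// It is not how Fleet implements the `fields` option.
type PolicyRecord = Record<string, unknown>;

function pickFields(policy: PolicyRecord, fields?: string[]): PolicyRecord {
  // With no field list, return the full policy unchanged.
  if (!fields) {
    return policy;
  }
  const trimmed: PolicyRecord = {};
  for (const field of fields) {
    if (field in policy) {
      trimmed[field] = policy[field];
    }
  }
  return trimmed;
}

// Example: keep only the fields needed to decide whether an upgrade is required.
const fullPolicy: PolicyRecord = {
  id: "policy-1",
  name: "Example policy",
  schema_version: "1.0.0",
  package_policies: ["large", "nested", "payload"],
};
const slimPolicy = pickFields(fullPolicy, ["id", "schema_version"]);
```

When thousands of policies are listed at once, dropping large nested attributes in this way is what keeps the response small.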

hop-dev · 2023-02-09T15:20:26Z

x-pack/plugins/fleet/scripts/create_agents/create_agents.ts

@@ -25,6 +25,8 @@ const printUsage = () =>
    [--kibana]: full url of kibana instance to create agents and policy in e.g http://localhost:5601/mybase, defaults to http://localhost:5601


These are just changes to the test script to make creating environments with lots of agent policies easier.

juliaElastic · 2023-02-09T15:42:13Z

x-pack/plugins/fleet/server/services/setup/upgrade_agent_policy_schema_version.ts

@@ -26,13 +27,23 @@ function getOutdatedAgentPoliciesBatch(soClient: SavedObjectsClientContract) {
 // deploy outdated policies to .fleet-policies index
 // bump oudated SOs schema_version
 export async function upgradeAgentPolicySchemaVersion(soClient: SavedObjectsClientContract) {


Great improvements!
It would be great to add an integration test with a small batch size.
I'm curious how long it takes to update a large set of agent policies in batches of 100. Hopefully not too long.

It does take a long time (e.g. 50s per 1000 agent policies in my local dev env), but it's such an expensive operation that I didn't dare put the default higher at the risk of overwhelming Elasticsearch and Kibana. My reasoning was that I suspect the vast majority of users have fewer than 100 agent policies anyway.
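As a rough back-of-the-envelope check on those numbers, the sketch below estimates batch count and duration. It assumes the time scales linearly with the number of policies at the quoted 50s per 1000, which was measured only in one local dev environment; `estimateUpgrade` is a hypothetical helper, not part of Fleet.

```typescript
// Hypothetical estimate, assuming upgrade time scales linearly with policy count.
interface UpgradeEstimate {
  batches: number;
  seconds: number;
}

function estimateUpgrade(
  policyCount: number,
  batchSize: number = 100, // default batch size from this PR
  secondsPer1000: number = 50 // figure quoted above for a local dev environment
): UpgradeEstimate {
  return {
    batches: Math.ceil(policyCount / batchSize),
    seconds: (policyCount / 1000) * secondsPer1000,
  };
}

// The 3000-policy test environment from the description:
const estimate = estimateUpgrade(3000);
console.log(estimate); // 30 batches, about 150 seconds
```

A deployment with fewer than 100 agent policies would finish in a single batch under the same assumptions.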

kpollich

LGTM outside of Julia's very accurate nitpick 👍

kibana-ci · 2023-02-09T17:16:35Z

💚 Build Succeeded

Buildkite Build
Commit: 1304086

Metrics [docs]

✅ unchanged

History

💚 Build #106883 succeeded 9cb8d3b
💔 Build #106828 failed aecd936fb6b684f8657eab62fb099e80a9764484

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

…xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` config (elastic#150688). Closes elastic#150538. (cherry picked from commit 6e06452)

kibanamachine · 2023-02-09T17:21:54Z

💔 Some backports could not be created

Status	Branch	Result
❌	7.17	Backport failed because of merge conflicts
✅	8.7

Note: Successful backport PRs will be merged automatically after passing CI.

Manual backport

To create the backport manually run:

node scripts/backport --pr 150688

Questions ?

Please refer to the Backport tool documentation

… add `xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` config (#150688) (#150750) # Backport This will backport the following commits from `main` to `8.7`: - [[Fleet] Run agent policy schema in batches during fleet setup + add `xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` config (#150688)](#150688)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)  Co-authored-by: Mark Hopkin <mark.hopkin@elastic.co>

hop-dev · 2023-02-09T20:38:37Z

💚 All backports created successfully

Status	Branch	Result
✅	8.6

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

…xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` config (elastic#150688). Closes elastic#150538. (cherry picked from commit 6e06452) # Conflicts: # x-pack/plugins/fleet/scripts/create_agents/create_agents.ts

… add `xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` config (#150688) (#150781) # Backport This will backport the following commits from `main` to `8.6`: - [[Fleet] Run agent policy schema in batches during fleet setup + add `xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize` config (#150688)](#150688)  ### Questions ? Please refer to the [Backport tool documentation](https://github.com/sqren/backport)

hop-dev added 2 commits February 9, 2023 13:38

create agents in batches for test data generation

336efe1

make agent policy version upgrade handle high number of policies

9cb8d3b

hop-dev force-pushed the 150538-agent-policy-upgrade-resilience branch from aecd936 to 9cb8d3b Compare February 9, 2023 13:38

hop-dev added the backport:all-open, release_note:enhancement, and Team:Fleet labels Feb 9, 2023

hop-dev changed the title ~~150538 agent policy upgrade resilience~~ [Fleet] Run agent policy schema in batches during fleet setup + add xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize config Feb 9, 2023

hop-dev marked this pull request as ready for review February 9, 2023 15:18

hop-dev requested a review from a team as a code owner February 9, 2023 15:18

hop-dev commented Feb 9, 2023

juliaElastic reviewed Feb 9, 2023

x-pack/plugins/fleet/server/services/setup/upgrade_agent_policy_schema_version.ts Outdated

juliaElastic reviewed Feb 9, 2023

kpollich approved these changes Feb 9, 2023

x-pack/plugins/fleet/server/services/setup/upgrade_agent_policy_schema_version.ts Outdated

move default batch size to const

1304086

hop-dev enabled auto-merge (squash) February 9, 2023 16:10

hop-dev merged commit 6e06452 into elastic:main Feb 9, 2023

kibanamachine added the v8.8.0 label Feb 9, 2023

kibanamachine mentioned this pull request Feb 9, 2023

[8.7] [Fleet] Run agent policy schema in batches during fleet setup + add xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize config (#150688) #150750

Merged

kibanamachine added the v8.7.0 label Feb 9, 2023

hop-dev mentioned this pull request Feb 9, 2023

[8.6] [Fleet] Run agent policy schema in batches during fleet setup + add xpack.fleet.setup.agentPolicySchemaUpgradeBatchSize config (#150688) #150781

Merged

hop-dev deleted the 150538-agent-policy-upgrade-resilience branch February 9, 2023 21:18

kibanamachine added the v8.6.2 label Feb 9, 2023

juliaElastic mentioned this pull request Jun 14, 2023

[Fleet]: Kibana upgrade failed from 8.7.1>8.8.0 BC8 when multiple agent policies with integrations exist. #158361

Closed
